IMMENSE.LY

IT’S THAT BIG

Run Length Encoding

I was actually somewhat surprised that Run Length Encoding (they call it “String Compression”) was included in Cracking the Coding Interview, because it’s not particularly a difficult algorithm to write. I would have thought that writing Huffman Encoding would have been more interesting in interviews given the entire chapter on graphs and trees in the book.

The key to RLE is to remember the last character as you’re iterating through the string. If the next character is different, write out the current character and append a count of the number of times you have seen it.

Here’s the code:

def runlength(s):
    counter = 0
    lastChar = ''
    newStr = ""

    for currChar in s:
        if currChar == lastChar:
            counter += 1
        else:
            if lastChar != '':
                newStr += lastChar + str(counter)
            counter = 1
            lastChar = currChar

    newStr += lastChar + str(counter)

    return newStr

I didn’t bother with the check for whether the returned string is shorter than the string which was passed in, however, it’s easy enough to do.

But, there is a slight problem here, and I’m guessing that they’re expecting interviewers to play this up. The runtime complexity for iterating through the string is going to be O(N), but the string copy, in theory, is going to be O(N2) every time we concatenate the new string. Or rather, it would be in anything prior to Python 2.4 which changed the behaviour of how concatenation works. And I guess the question is do you really want to be fighting with the interviewer in the middle of a technical interview over the subtleties of how Python string concatenation works? The answer to that is “maybe” depending on the interviewer, but this is starting to feel like a test of wits with iocane powder.

Anyway, long story short, if we used a list() and appended to it instead of appending to the string, we could ensure that the function remains O(N). Use a "".join(myStr) at the end and pass the new resultant string back.