Unique Strings

I’ve been perusing Cracking the Coding Interview lately, and thought I’d share some of my solutions to a few of the problems. If you’re planning on doing a coding interview, I’d definitely recommend attemping to come up with your own solutions, but I figured these might be useful as I wrote them in Python and the book is pretty heavy on Java.

I’m actually not a huge Java fan for doing interviews. Most of the Java programmers that I’ve interviewed over the years tended to get caught up in syntax or fighting the compiler on CoderPad and things have a habit of not turning out well. I think this is exacerbated by a lot of engineers being used to writing in an IDE and then struggling with an unfamiliar interface. When I’m conducting interviews, I actually don’t really care what language a person is using, although I give the interviewee that caveat that if I don’t know the language I’m probably going to ask a lot more questions.

Anyway, first up is the Unique Strings problem, which I guess is kind of like FizzBuzz. I think the point is really to just get you to warmed up to move on to more difficult problems. In Python I cooked up a bunch of different ways to do this.

First up is using a set:

def allUnique1(s):
    return len(set(s)) == len(s)

This uses the fact that sets will omit duplicates, so if you pass it a string with the same character twice, you’re going to end up with a shorter length set. In terms of runtime complexity, this is O(n), but it does require making an extra copy of the string inside of the set, however, if the string was just ASCII characters, the set would be pretty tiny. If it was in Unicode, then it could in theory be a lot longer, but if you’re feeding in a huge string, the set is still going to be pretty tiny.

The second solution relies on sorting:

def allUnique2(s):
    s = sorted(s)
    for cnt in range(len(s)-1):
        if s[cnt] == s[cnt+1]:
            return False
    return True

Python uses “Timsort” which is a hybrid between Mergesort and Quicksort. After sorting the list in place, we iterate through it checking adjacent cells to see if they’re the same. Time complexity here is n*log(n) on average for the sorting. I’m actually not 100% sure of the space required behind the scenes with sorted(), although ultimately we’re assigning over the top of our string, so any used memory would be freed up. Also, I decided to count down here instead of counting up, but you could go either way.

The third and fourth solutions involve using collections.Counter().

from collections import Counter
def allUnique3(s):
    c = Counter(s)
    for v in c.itervalues():
        if v > 1:
            return False
    return True


from collections import Counter
def allUnique4(s):
    c = Counter(s)
    return len([v for v in c.iterkeys()]) == len(s)

These are both similiar but allUnique3() will perform slightly better because it doesn’t have to count the all of the values inside of the counter if it detects a duplicate. allUnique4() is essentially the same thing as allUnique1(), just done with a Counter instead of a set.

Were I a gambling man, I would guess that an interviewer is expecting an O(n2) solution which iterates through the string multiple times to find the duplicate and then asking to refine on that. That’s fine for a lower level language like C or Assembly, but in Python I think allUnique1() would probably be your best bet in almost every case.