[Interview Question] Longest Substring Without Repeating Characters

[toc]

?️ Company Tags: As reported by numerous programmers across the globe, this question has been asked in coding interviews/rounds by companies like:

Amazon
Adobe
Bloomberg
Yelp

So, if you are preparing for your upcoming coding interview, then you might well come across this question in your coding round. Can you solve it optimally?

Problem Formulation

Given a string “s”. Find the longest substring without repeating any characters.

⚠️Constraints:

0 <= s.length <= 5 * 10⁴
s consists of English letters, digits, symbols, and spaces.

Note: In formal language theory and computer science, a substring is a contiguous sequence of characters within a string.

“string” is a substring of “substring”

(source: Wikipedia)

?Examples

Let’s have a look at some examples to improve our understanding of this problem.

Example 1

Input s = "xyzxyzyy"
Output: 3
Explanation: The longest substring is "xyz", with a length of 3.

Example 2

Input: s = "kkkkk"
Output: 1
Explanation: The longest substring is "k", with a length of 1.

Example 3

Input: s = "2455lmno#%kk"
Output: 8
Explanation: The longest substring is "5lmno#%k", with a length of 8.
Notice that the answer must be a substring, "245lmno#%k" is a subsequence and not a substring.

Example 4

Input: s = ""
Output: 0
Explanation: This is an edge case with a null string.

Example 5

Input: s = "tweet"
Output: 3
Explanation: The longest substring is "twe", with a length of 3.

? Tidbit:
❖ A subsequence of a string is a new string that is formed from the original string by deleting some (can be none) of the characters without disturbing the relative positions of the remaining characters. Whereas a substring is a “contiguous sequence” of characters within a string.
❖ A substring is also a subsequence but not vice-versa. Example: "ace" is a subsequence of "abcde" but it is not a substring. "abc" is a substring as well as a subsequence of "abcde".

?️Naïve Approach: Using A Brute Force Algorithm

The most straightforward solution to this problem is to use the brute force method of searching the unique characters.

❖ Approach: The basic idea of this algorithm is to scan all the substrings one by one and check if it contains any duplicate character. You need all unique characters within the resultant substring. Thus, you have to store the longest substring without any repeating characters in a variable and then return it.

We can iterate through all possible substrings with the help of a nested loop.
If no duplicate character is found within the current substring, we update the answer with the length of the maximum substring.
However, if a duplicate character is encountered, we break out of the inner loop and the next substring is taken into account.

The following diagram illustrates the approach that is being followed here:

❖ Let’s look at the code:

def largest_substring(s):
    lsub = 0
    for i in range(len(s)):
        curr = ""
        for j in range(i, len(s)):
            if s[j] not in curr:
                curr += s[j]
                lsub = max(lsub, len(curr))
            else:
                break
    return lsub

Let’s execute this code on our examples:

# Example 1
s = "xyzxyzyy"
print(largest_substring(s))
#3

# Example 2
s = "kkkkk"
print(largest_substring(s))
#1

# Example 3
s = "2455lmno#%kk"
print(largest_substring(s))
#8

# Example 4
s = ""
print(largest_substring(s))
#0

# Example 5
s = "tweet"
print(largest_substring(s))
#3

Hurrah! ? It passed all the test cases.

❖ Analysis: Consider a string “s” with size “n”. In this case, there will be (n * (n+1)/2) possible substrings. Hence, the nested for loop has a complexity of O(n^2). Thus, this approach has a time complexity of O(n^2).

❖ Discussion: Although this pretty much works but it is not an efficient solution. In this approach, we are repeatedly checking every substring for unique characters. But do we need to check each substring?

?️Solution 2: Sliding Window

❖ Approach:

We can optimize the brute force method by using the sliding window technique. In this solution, we will keep traversing the string from left to right till we don’t encounter any repeating character. To know the length of the current window, we will use a couple of pointers/indexes. We will also keep a map to store the count of the unique characters and keep on updating that as we go on expanding or shrinking the sliding window.

❖ Let’s look at the algorithm:

Initialize two pointers i and j at 0. These pointers will allow us to determine the size of the sliding window.
Define a set to store the unique characters (Set does not allow any duplicate values) and a variable “lon” to store the length of the longest substring.
Start scanning the string:
- If the current character has occurred before (not present in set), add the character to the set and increment the j pointer and also update the variable “lon” which stores the answer.
- Else if the current character has been repeated (present in the set) at an index before i, set the “lon” as the current length of the sliding window and remove the character at index i, i.e., s[i].
Return the variable “lon”.

Here’s an example to illustrate the above algorithm:

Explanation:

Initially, the current index and the end index point at the first index. Hence, we start with the first index of the string and store it in the set char.
We then shift the pointer j to the right. Thus, the current window expands and the length of the substring is simultaneously incremented and stored in a variable that keeps a track of the length of the longest substring. The process is repeated until a repeating character is found. In this case, the repeating character is found at the 3^rd iteration.
Once a repeating character is found, the character at the i^th index is removed from the set. In this case, [T] gets removed at the end of the 3^rd iteration. Thus the set now contains [W, E] after the 3^rd iteration. This process is repeated and after the entire string has been traversed, you will have the length of the largest substring stored within the output variable.

❖ Now, let us have a look at the code:

def largest_substring(s):
    i = j = lon = 0
    chars = set()
    while j < len(s):
        if s[j] not in chars:
            chars.add(s[j])
            j = j + 1
            lon = max(lon, len(chars))
        else:
            chars.remove(s[i])
            i = i + 1
    return lon

❖ Test cases: Let’s execute the examples on this code to check if it works.

# Example 1
s = "xyzxyzyy"
print(largest_substring(s))
#3

# Example 2
s = "kkkkk"
print(largest_substring(s))
#1

# Example 3
s = "2455lmno#%kk"
print(largest_substring(s))
#8

# Example 4
s = ""
print(largest_substring(s))
#0

# Example 5
s = "tweet"
print(largest_substring(s))
#3

Perfect! It passed all test cases.

❖ Time Complexity Analysis:

In this solution, we have to traverse the string only once, and hence the time complexity will be linear → O(n).

In order to check that no character repeats inside a window, we have used set data structure. The lookup time for this is O(1).
In the worst-case, each character in the string will be visited twice, accounting for a complexity of O(2*n).
Thus, the total runtime complexity = O(1)+O(2*n) ~ O(n).

?️Optimal Solution: Using a Dictionary

❖ Approach:

We can optimize the above code slightly by using a dictionary. The previous solution requires a maximum of 2n steps. But it can be further optimized to require only n steps. Using this approach, you can skip more characters immediately when a repeating character is found. You can do this by mapping each character to its index.
Reason: If s[j] is a duplicate character in the range [i, j) with index j’, you don’t have to increase i one at a time. Instead, you can simply skip all the elements in the range [i, j’] and set i to be j’ + 1 directly.

Here’s an illustration of the concept:

Explanation:

The index of every character is stored as key-value pairs within the dictionary hmap. The variable lon which is used to store the length of the longest substring is also updated such that lon stores the result of max(lon,j-i+1).
- Note: Initially, lon = 0
As soon as a character is repeated, the elements within the range [i,j’] are skipped and i is set to j’+1. In this case, the repeating character is found at the 4^th iteration. Thus, all the characters within the range [0,2] are skipped and i is set to point at the 3^rd index.
- Note: j' represents the index of the repeating character. In this example, j’ = 2 ( 4th iteration) for the repeating character E and j’= 1 (5th iteration) for repeating character T.
After a complete execution of the loop, the length of the largest element will be stored within the variable “lon”.

❖ Now, let’s look at the code:

def largest_substring(s):
    i = lon = 0
    hmap = {}
    for j in range(0, len(s)):
        if s[j] in hmap:
            i = max(i, hmap[s[j]] + 1)
        hmap[s[j]] = j
        lon = max(lon, j-i+1)
    return lon

❖ Test Case Verification

# Example 1
s = "xyzxyzyy"
print(largest_substring(s))
#3

# Example 2
s = "kkkkk"
print(largest_substring(s))
#1

# Example 3
s = "2455lmno#%kk"
print(largest_substring(s))
#8

# Example 4
s = ""
print(largest_substring(s))
#0

# Example 5
s = "tweet"
print(largest_substring(s))
#3

❖ Complexity Analysis: Using this approach you have to scan the string from left to ring only once. This means that the loop will undergo n iterations. Thus this approach has a linear time complexity, i.e., O(n).

The following table will help you to visualize the complexity with respect to length of the string.

INPUT	OUTPUT	O(n)
xyzxyzyy	3	O(3)
kkkkk	1	O(1)
2455lmno#%kk	8	O(8)
	0	O(1)
tweet	5	O(5)

Conclusion

I hope you enjoyed this coding interview question. Please stay tuned and subscribe for more interesting coding problems.

Recommended: Finxter Computer Science Academy

Do you want to master the most popular Python IDE fast?
This course will take you from beginner to expert in PyCharm in ~90 minutes.
For any software developer, it is crucial to master the IDE well, to write, test and debug high-quality code with little effort.

Join the PyCharm Masterclass now, and master PyCharm by tomorrow!

✍️ Post Credits: Shubham Sayon and Rashi Agarwal