💬 Challenge: Given a Python string. Find the longest substring for which holds that all of its characters are in alphabetical order so that the first character comes earlier in the alphabet than the second character, and so on.
Here are three examples:
'helloworld', longest substring:
'abcde', longest substring:
'zyxw', longest substring:
You get the point. 🙂
Next, I’ll show you three methods, one is fast, one is faster, and one is blazingly fast:
Feel free to read on to learn how they work and how they compare against each other in our final runtime evaluation (Who will win?). It’s super insightful! 💡
Method 1: Using a Simple For Loop on Strings
The easiest way to get the longest substring in alphabetical order is to iterate over all characters in the original string and keep track of two variables:
longest– the longest substring found so far, and
current– the current substring that could become the longest.
In each iteration, you check if you can add the latest character to the current substring by comparing it against the last character of the current substring.
As soon as the loop terminates, you’ve found the longest substring in alphabetical order.
Here’s one implementation of this idea:
def longest_substring(s): longest = '' current = '' for c in s: if current == '' or c >= current[-1]: current += c else: current = c longest = max(current, longest, key=len) return longest
Here’s how this works for our initial examples:
print(longest_substring('helloworld')) # ellow print(longest_substring('abcde')) # abcde print(longest_substring('zyxw')) # w
Explanation: Let’s examine some important details of this solution next.
if current == '' or c >= current[-1]:
This line first checks if the current variable is empty which only holds once, in the beginning. In that specific case, the expression already evaluates to
True before checking the second condition
c >= current[-1], so Python doesn’t raise an indexing error due to the empty current variable.
This specific Python feature is called “Short-Circuiting”. And you should definitely understand it as an expert coder!
Using this approach also allows us to pass an empty string into the function without obtaining any error:
print(longest_substring('')) # ''
Here’s another interesting line from the code solution:
longest = max(current, longest, key=len)
Instead, you could also do something like this using the ternary operator:
longest = current if len(current)>len(longest) else longest
Both solutions are equally efficient.
💡 Runtime Complexity: The runtime complexity of this approach is linear in the number of characters of the input string, i.e, O(n). You iterate once over all characters. Each loop body has constant runtime complexity because you only compare individual characters that are independent of the solution size. Therefore, we have O(n) * O(1) = O(n) runtime complexity.
Method 2: Using Lists to Speed Things Up
The previous method has linear runtime complexity which is the best we can do, assymptotically. However, we can still improve the “constant factors”, i.e., make the algorithm more efficient using a small optimization.
You can observe in the previous code snippet that we use a lot of string concatenation operations, i.e.,
current += c, which creates a new string in each loop iteration. Just to append a single character to the string!
In fact, depending on the implementation, this could turn out to kill our “linear runtime complexity” property in the first place!
Anyway, there’s an easy workaround: use Python lists instead of strings. Lists are mutable whereas strings are immutable.
💡 Recommended Tutorial: Mutable vs Immutable Objects in Python
This means you can append new characters at the end of a list object in constant runtime complexity no matter how large the list is. But you can never append a new character at the end of a string!
def longest_substring(s): longest =  current =  for c in s: if not current or c >= current[-1]: current.append(c) else: current = [c] longest = current if len(current)>len(longest) else longest return ''.join(longest) print(longest_substring('helloworld')) # ellow print(longest_substring('abcde')) # abcde print(longest_substring('zyxw')) # z print(longest_substring('')) # ''
Method 3: Using the Regex Engine
You can use a simple regular expression that adds an asterisk operator after each character in the alphabet to find all substrings that are alphabetically ordered. Store it in the
substrings variable. Then use the
max(substrings, key=len) function to obtain the longest of those substrings.
Here’s the regex:
I must say – I hate this solution. 😅 It’s hard-coded and uses an external dependency without a real need.
However, it works beautifully. It’s concise, readable, and bold. So, why not?
import re def longest_substring(s): substrings = re.findall('a*b*c*d*e*f*g*h*i*j*k*l*m*n*o*p*q*r*s*t*u*v*w*x*y*z*', s) return max(substrings, key=len) print(longest_substring('helloworld')) # ellow print(longest_substring('abcde')) # abcde print(longest_substring('zyxw')) # z print(longest_substring('')) # ''
💡 Info: You can read the asterisk
A* quantifier as zero-or-more regex: the preceding regex
A is matched an arbitrary number of times. So, you’re matching an arbitrary number of
'a' characters, then an arbitrary number of
'b' characters, and so on.
If you want to learn more about the
re.findall() function, check out our guide on the Finxter blog.
For example, the input string
'helloworld' results in the following output of the
['h', 'ellow', 'or', 'l', 'd', '']. These are all alphabetically ordered substrings of the string
'helloworld'. We take the longest one which is
The fastest method to find the longest substring in alphabetical order is, surprisingly, the regex method (Method 3) which reduces runtime by more than an order of magnitude: Method 1 needed 0.093 seconds, Method 2 needed 0.038 seconds, and Method 3 needed 0.0064.
Regex to the rescue! 💪⚡🔥
I have evaluated this on strings with lengths 26, 260, 2600, 26000, and 260000 using the following code on my notebook (Acer, Intel i7-8565U, 8GB DDR4 RAM):
import re import time import matplotlib.pyplot as plt def longest_substring_1(s): longest = '' current = '' for c in s: if current == '' or c >= current[-1]: current += c else: current = c longest = max(current, longest, key=len) return longest def longest_substring_2(s): longest =  current =  for c in s: if not current or c >= current[-1]: current.append(c) else: current = [c] longest = current if len(current)>len(longest) else longest return ''.join(longest) def longest_substring_3(s): substrings = re.findall('a*b*c*d*e*f*g*h*i*j*k*l*m*n*o*p*q*r*s*t*u*v*w*x*y*z*', s) return max(substrings, key=len) xs = [1, 10, 100, 1000, 10000] runtimes_1, runtimes_2, runtimes_3 = , ,  for i in xs: s = 'abcdefghijklmnopqrstuvwxyz'*i # Elapsed Runtime Method 1 t1 = time.time() longest_substring_1(s) t2 = time.time() runtimes_1.append(t2-t1) # Elapsed Runtime Method 2 t1 = time.time() longest_substring_2(s) t2 = time.time() runtimes_2.append(t2-t1) # Elapsed Runtime Method 3 t1 = time.time() longest_substring_3(s) t2 = time.time() runtimes_3.append(t2-t1) plt.plot(xs, runtimes_1, 'o-', label='Method 1') plt.plot(xs, runtimes_2, 'x--', label='Method 2') plt.plot(xs, runtimes_3, 'v-', label='Method 3') plt.xscale('log') plt.ylabel('Elapsed Runtime (s)') plt.xlabel('String size in multiples of 26 chars') plt.legend() plt.show()
Thanks for reading through the whole article. ❤️ Feel free to join our email academy and download our free Python cheat sheets:
Image Credit: https://www.motorbiketireshop.com/
While working as a researcher in distributed systems, Dr. Christian Mayer found his love for teaching computer science students.
To help students reach higher levels of Python success, he founded the programming education website Finxter.com that has taught exponential skills to millions of coders worldwide. He’s the author of the best-selling programming books Python One-Liners (NoStarch 2020), The Art of Clean Code (NoStarch 2022), and The Book of Dash (NoStarch 2022). Chris also coauthored the Coffee Break Python series of self-published books. He’s a computer science enthusiast, freelancer, and owner of one of the top 10 largest Python blogs worldwide.
His passions are writing, reading, and coding. But his greatest passion is to serve aspiring coders through Finxter and help them to boost their skills. You can join his free email academy here.