5 Best Ways to Sort a List of Strings Containing Numbers (Python)


💡 Problem Formulation: Working with datasets often involves sorting lists, which gets tricky when the strings in a list contain numbers.

For instance, you might have a list like ["item2", "item12", "item1"] and want it sorted so that the numerical part of the strings dictates the order, resulting in ["item1", "item2", "item12"].

How can you achieve this in Python, considering the default sort would treat the numbers lexicographically, yielding an unintuitive ["item1", "item12", "item2"]?
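To see the problem concretely, the short sketch below runs the default sort on the example list. The variable names are illustrative only.

```python
# Default string sorting compares character by character,
# so "item12" lands before "item2" because "1" < "2".
items = ["item2", "item12", "item1"]

default_order = sorted(items)
print(default_order)  # ['item1', 'item12', 'item2']
```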

Here are five methods to solve this sorting problem.

Method 1: Using a Custom Key Function

In Python, the sort() method of lists accepts a key argument that allows you to specify a function to be called on each list item before making comparisons. The key function can be crafted to extract numerical values from strings and use them for sorting.

Here’s an example:

import re

def numerical_key(s):
    return int(re.search(r'\d+', s).group())

items = ["apple10", "apple2", "banana1"]
items.sort(key=numerical_key)
print(items)

Output:

['banana1', 'apple2', 'apple10']

This code defines a numerical_key function that uses the re module to find the first sequence of digits in each string and converts it to an integer. When passed as the key argument to sort(), it ensures the numbers within the strings are compared numerically, not lexicographically. Note that only the number is used for ordering: the text prefix is ignored, which is why 'banana1' comes before 'apple2' in the output.
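One caveat: re.search() returns None when a string contains no digits, so numerical_key would raise an AttributeError on such input. A minimal sketch of a more forgiving variant follows. The name safe_numerical_key and the decision to place digit-free strings first are assumptions made for illustration, not part of the method above.

```python
import re

def safe_numerical_key(s):
    # Hypothetical variant of numerical_key: tolerate strings without digits.
    match = re.search(r'\d+', s)
    # Tuple key: digit-free strings (flag 0) sort before numbered ones (flag 1).
    return (1, int(match.group())) if match else (0, 0)

items = ["apple10", "apple", "apple2", "banana1"]
result = sorted(items, key=safe_numerical_key)
print(result)  # ['apple', 'banana1', 'apple2', 'apple10']
```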

Method 2: Using the natsort Library

natsort is a third-party library designed to sort lists “naturally,” treating numbers embedded in strings as numbers rather than as characters. It’s especially useful for lists that cannot be easily handled with custom key functions.

Here’s an example:

from natsort import natsorted

items = ["version_1.9.1", "version_1.10.0", "version_1.9.2"]
sorted_items = natsorted(items)
print(sorted_items)

By simply calling natsorted() from the natsort library, our list is sorted with the numerical values interpreted correctly, so the 1.9.x versions come before 1.10.0. Unlike sort(), natsorted() returns a new list and leaves the original unchanged.

Method 3: Parsing Numbers Manually 👉 NO EXTERNAL LIBRARY!

If you want to avoid external dependencies and prefer handling number parsing manually, you can create a function that splits strings into segments of numbers and non-numbers, then sorts by converting numeric segments to integers.

Here’s an example:

import re

def parse_num(s):
    # Split into alternating text and digit runs, e.g. "x10y" -> ['x', 10, 'y']
    return [int(text) if text.isdigit() else text.lower() for text in re.split(r'(\d+)', s)]

items = ["x10y", "x2y", "x1y"]
items.sort(key=parse_num)
print(items)

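The parse_num function divides each string into a list of alternating text and digit segments, converting the digit segments into integers and lowercasing the text. Because Python compares lists element by element, using this list as the sorting key compares text with text and numbers with numbers, so "x2y" sorts before "x10y".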
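To make the mechanism visible, the sketch below prints the key that parse_num builds for one string and then sorts a list with mixed prefixes and letter case. The sample file names are made up for illustration.

```python
import re

def parse_num(s):
    # Same key function as in the example: digit runs become ints, text is lowercased.
    return [int(text) if text.isdigit() else text.lower()
            for text in re.split(r'(\d+)', s)]

print(parse_num("x10y"))  # ['x', 10, 'y']

files = ["img12.png", "IMG2.png", "doc3.txt", "img1.png"]
natural = sorted(files, key=parse_num)
print(natural)  # ['doc3.txt', 'img1.png', 'IMG2.png', 'img12.png']
```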

Method 4: Using functools.cmp_to_key

The functools module provides a cmp_to_key utility that converts an old-style comparison function (one that returns a negative number, zero, or a positive number) into a key function. This is useful when upgrading legacy code or when the comparison logic is complex.

Here’s an example:

from functools import cmp_to_key
import re

def compare_items(a, b):
    a_num = int(re.search(r'\d+', a).group())
    b_num = int(re.search(r'\d+', b).group())
    return (a_num > b_num) - (a_num < b_num)

items = ["item202", "item20", "item3"]
items.sort(key=cmp_to_key(compare_items))
print(items)

By defining a comparison function, compare_items, which extracts numbers and compares them directly, you can use cmp_to_key to transform this function into a key function for sorting.
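A comparison function becomes more attractive once the ordering rule has several steps. The sketch below, using the hypothetical name compare_natural, orders by the number first and falls back to plain string comparison when the numbers are equal. The tie-breaking rule is an assumption chosen for illustration.

```python
from functools import cmp_to_key
import re

def compare_natural(a, b):
    # Hypothetical comparator: number first, then the whole string as tie-breaker.
    a_num = int(re.search(r'\d+', a).group())
    b_num = int(re.search(r'\d+', b).group())
    if a_num != b_num:
        return -1 if a_num < b_num else 1
    # (a > b) - (a < b) yields -1, 0, or 1 for strings as well.
    return (a > b) - (a < b)

items = ["pear20", "item202", "item20", "item3"]
ordered = sorted(items, key=cmp_to_key(compare_natural))
print(ordered)  # ['item3', 'item20', 'pear20', 'item202']
```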

Also check out my article on this:

👉 Python List Sort Key

Bonus One-Liner Method 5: Using a Lambda Key with sort()

Sometimes, the simplest methods are the most satisfying. If you know that every string in your list starts with non-digits followed by digits, a one-liner can do the trick with sort().

Here’s an example:

import re

items = ["stage3", "stage11", "stage1"]
items.sort(key=lambda x: (x.rstrip('0123456789'), int(re.search(r'\d+$', x).group())))
print(items)

The lambda function strips away trailing digits and isolates the numeric suffix of each string. The sort() method then sorts items first by their non-numeric prefix and then by the numeric value of the suffix.
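Because the key is a tuple, the prefix is compared first and the number only breaks ties within the same prefix. A small sketch with two made-up prefixes shows the effect. It assumes every string ends in at least one digit, since re.search() would otherwise return None.

```python
import re

items = ["stage3", "level10", "stage11", "level2", "stage1"]

# Key is (text prefix, numeric suffix); assumes each string ends in digits.
by_prefix_then_number = sorted(
    items,
    key=lambda x: (x.rstrip('0123456789'), int(re.search(r'\d+$', x).group())),
)
print(by_prefix_then_number)
# ['level2', 'level10', 'stage1', 'stage3', 'stage11']
```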

Summary/Discussion

  • Method 1 uses a custom key function; it needs only the standard library and is efficient for simple cases, but it orders by the first number alone.
  • Method 2 leverages natsort, an external library; very powerful and handles complex cases but requires an external dependency.
  • Method 3 parses the strings manually; it’s flexible for diverse string structures but is more complex to implement and maintain.
  • Method 4 takes advantage of functools.cmp_to_key; useful for adapting comparison functions but may be overkill for simpler cases.
  • Method 5 is a compact one-liner using a lambda key; it’s clean and succinct but might not be as readable for those unfamiliar with lambdas or regex.
