5 Best Ways to Sort a List of Strings by Numeric Part in Python

How to Sort a List of Strings by Numeric Value in Python

💡 Problem Formulation: Python developers often need to sort lists in which each element is a string containing a numeric part. The default sort does not give the expected order because strings are compared lexicographically, character by character, not numerically. For instance, sorting ['item25', 'item3', 'item100'] the default way yields ['item100', 'item25', 'item3'], whereas a numeric sort should produce ['item3', 'item25', 'item100']. This article explores five methods to accomplish this task.
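To make the problem concrete, here is a minimal sketch of what the default sort does with the sample list. The variable names are illustrative only.

```python
# Default string sorting compares character by character,
# so 'item100' sorts before 'item25' because '1' < '2'.
items = ['item25', 'item3', 'item100']
lexicographic = sorted(items)

print(lexicographic)  # ['item100', 'item25', 'item3']
```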

Method 1: Using a Custom Sort Key with Regular Expressions

Sorting by numeric value can be achieved by using a regular expression inside a custom sort key function. Python's re module provides pattern matching, which can be used to locate the numeric part of each string. Both the .sort() method and the sorted() function accept such a key and order the list by the extracted numbers.

Here's an example:

import re

my_list = ['item25', 'item3', 'item100']
my_list.sort(key=lambda x: int(re.search(r'\d+', x).group()))

print(my_list)

Output:

['item3', 'item25', 'item100']

In this code, we define a lambda function that uses re.search() to find the first run of one or more digits in each string and converts it to an integer. The list is then sorted in place using this integer as the sort key. This works well for lists with a consistent structure where the numeric part can be described by a regular expression. Note that re.search() returns None for a string that contains no digits, so calling .group() on it raises an AttributeError.
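If some strings might not contain any digits, the key function can fall back to a default value. The sketch below is one possible approach, not part of the original method: the helper name numeric_key and the choice to place digit-free strings last (via float('inf')) are assumptions made for illustration.

```python
import re

def numeric_key(s, default=float('inf')):
    """Return the first run of digits in s as an int, or default if there is none."""
    match = re.search(r'\d+', s)
    return int(match.group()) if match else default

mixed = ['item25', 'misc', 'item3', 'item100']
result = sorted(mixed, key=numeric_key)

print(result)  # ['item3', 'item25', 'item100', 'misc']
```

Using infinity as the default pushes strings without a number to the end; passing default=-1 would place them first instead.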

Method 2: Using the "natsort" Library

For those seeking an external library solution, the natsort library provides natural sorting capabilities, handling the numeric parts intelligently. This library treats numerical sections of strings as numbers during sorting, which leads to a natural ordering, much like a human would sort.

Here's an example:

from natsort import natsorted

my_list = ['item25', 'item3', 'item100']
sorted_list = natsorted(my_list)

print(sorted_list)

Output:

['item3', 'item25', 'item100']

Here, by using the natsorted() function from the natsort library, we don't need to write a custom key function, which makes the code simpler and more readable. This is especially useful for complex string structures, such as strings with several numeric parts. However, this method requires installing an external library (pip install natsort).
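To illustrate the idea behind natural sorting without the dependency, the sketch below builds a rough standard-library approximation. It is not natsort's implementation and covers far fewer cases; the helper name natural_key and the case-insensitive comparison are assumptions made for this example.

```python
import re

def natural_key(s):
    # re.split with a capturing group yields alternating chunks:
    # text, digits, text, digits, ... (the first chunk may be empty).
    chunks = re.split(r'(\d+)', s)
    # Odd positions hold digit runs, which are compared as integers.
    return [int(chunk) if i % 2 else chunk.lower()
            for i, chunk in enumerate(chunks)]

files = ['b2', 'a10', 'a2', 'b1']
natural = sorted(files, key=natural_key)

print(natural)  # ['a2', 'a10', 'b1', 'b2']
```

Unlike the single-number keys in the other methods, this key takes the text parts into account as well, so 'a10' sorts before 'b1'.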

Method 3: Using a Custom Key Function without Regular Expressions

If you prefer not to use regular expressions or external libraries, you can write a custom key function that parses the numeric part out of each string manually. This might involve iterating over the string and extracting the numeric characters using string methods.

Here's an example:

def extract_number(s):
    return int(''.join(filter(str.isdigit, s)))

my_list = ['item25', 'item3', 'item100']
my_list.sort(key=extract_number)

print(my_list)

Output:

['item3', 'item25', 'item100']

In this code snippet, the extract_number() function removes all non-digit characters from a string and converts the remaining digits to an integer. The built-in filter() function with str.isdigit is a straightforward way to achieve this. Sorting the list with this function as the key orders it by numeric value. This method needs no imports and no regular expressions, but it assumes each string contains exactly one group of digits: all digits in the string are joined together, and a string with no digits raises a ValueError.
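Because filter(str.isdigit, s) keeps every digit in the string, the result can differ from the regex approach of Method 1 when a string contains more than one number. The short comparison below shows the difference; the sample label 'v2_item10' and the helper names are made up for illustration.

```python
import re

def extract_all_digits(s):
    # Joins every digit in the string into one number.
    return int(''.join(filter(str.isdigit, s)))

def extract_first_number(s):
    # Takes only the first consecutive run of digits.
    return int(re.search(r'\d+', s).group())

label = 'v2_item10'
print(extract_all_digits(label))    # 210
print(extract_first_number(label))  # 2
```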

Method 4: Using List Comprehensions and Zipping

This method extracts the numbers with a list comprehension, zips them with the original strings, and sorts the resulting tuples. It is a form of the decorate-sort-undecorate pattern, combining list manipulation with tuple sorting.

Here's an example:

my_list = ['item25', 'item3', 'item100']
sorted_list = [x for _, x in sorted(zip([int(''.join(filter(str.isdigit, i))) for i in my_list], my_list))]

print(sorted_list)

Output:

['item3', 'item25', 'item100']

We extract the digits from each string with a list comprehension and filter(), and convert them to integers. We then zip this list of integers with the original list and sort the resulting (number, string) tuples; tuples with equal numbers are ordered by their string. Finally, another list comprehension pulls the strings back out. While compact, this method may be harder to read than a key function.
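One subtle difference from the key-based methods is how ties are handled. Sorting zipped tuples falls back to comparing the strings when two numbers are equal, while a key-based sort is stable and keeps the original order. The sketch below uses a hypothetical helper named digits and made-up sample data to show this.

```python
def digits(s):
    # Same extraction as in the example above.
    return int(''.join(filter(str.isdigit, s)))

data = ['b7', 'a7', 'c1']

# Tuples (7, 'b7') and (7, 'a7') tie on the number, so the strings decide.
zipped = [s for _, s in sorted(zip(map(digits, data), data))]

# A key-based sort is stable: 'b7' stays ahead of 'a7'.
keyed = sorted(data, key=digits)

print(zipped)  # ['c1', 'a7', 'b7']
print(keyed)   # ['c1', 'b7', 'a7']
```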

Bonus One-Liner Method 5: Using Sorted with a Custom Inline Function

For a quick, inline solution, you can use the sorted() function with a lambda function directly as the key. This one-liner approach combines string manipulations and sorting in a succinct way.

Here's an example:

my_list = ['item25', 'item3', 'item100']
sorted_list = sorted(my_list, key=lambda x: int(''.join(filter(str.isdigit, x))))

print(sorted_list)

Output:

['item3', 'item25', 'item100']

This one-liner uses the same technique as in Method 3 with filter() and str.isdigit to extract numerics but does it directly within the sorted() function call. It's concise and utilizes the power of Python's lambda functions for a quick sort operation without the need for defining a separate function. It's perfect for simple scripting or when the list format is consistent and simplicity is preferred.
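The examples in this article use both sorted() and .sort(), and the two behave differently: sorted() returns a new list and leaves the input untouched, while .sort() reorders the list in place and returns None. A brief sketch, with the helper name digits chosen for illustration:

```python
def digits(s):
    return int(''.join(filter(str.isdigit, s)))

original = ['item25', 'item3', 'item100']

new_list = sorted(original, key=digits)  # returns a new sorted list
print(original)  # ['item25', 'item3', 'item100']
print(new_list)  # ['item3', 'item25', 'item100']

returned = original.sort(key=digits)  # sorts in place, returns None
print(returned)  # None
print(original)  # ['item3', 'item25', 'item100']
```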

Summary/Discussion

  • Method 1: Custom Sort Key with Regular Expressions. Highly effective and precise. Relies on regex knowledge. Best for complex patterns.
  • Method 2: "natsort" Library. Simplifies code. Requires external library. Great for complex or mixed string structures.
  • Method 3: Custom Key Function without Regular Expressions. Uses standard libraries. More verbose code. Good for those avoiding regex.
  • Method 4: Using List Comprehensions and Zipping. Pythonic. May sacrifice readability for brevity. Designed for Python enthusiasts.
  • Method 5: Bonus One-Liner. Quick and easy. Sufficient for straightforward problems. Not suitable for complex sorting requirements.