5 Best Ways to Sort a List of Strings by the Numeric Part in Python

💡 Problem Formulation: Developers often encounter the need to order lists of strings that contain numeric data, especially when dealing with filenames or identifiers that follow a certain nomenclature. For instance, given a list such as ["item2", "item12", "item1"], Python’s default sorting would yield ["item1", "item12", "item2"] due to lexicographical ordering. However, the desired result is to have the list ordered by the numeric part as ["item1", "item2", "item12"].
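To make the problem concrete, the short sketch below reproduces the default behavior described above. The variable name default_order is only illustrative.

```python
strings = ["item2", "item12", "item1"]

# The default sort compares strings character by character, so "item12"
# lands before "item2": at the first differing position, "1" < "2".
default_order = sorted(strings)
print(default_order)  # ['item1', 'item12', 'item2']

# The same effect in isolation: as text "12" sorts before "2",
# while as integers 12 is greater than 2.
print("12" < "2")  # True
print(12 < 2)      # False
```

Every method that follows works around this by comparing the numeric part as an integer instead of as text.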

Method 1: Using Regular Expressions

This approach uses the re module to separate each string into its text and numeric parts. The sorted() function accepts a key parameter; passing a function that returns the text parts unchanged and the numeric parts as integers makes the numbers compare by value rather than character by character.

Here’s an example:

import re

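def numerical_sort(value):
    # The capture group makes re.split keep the digit runs in the result,
    # so text sits at even indices and digit runs at odd indices.
    parts = re.split(r'(\d+)', value)
    # Convert every digit run to int so numbers compare by value.
    parts[1::2] = [int(p) for p in parts[1::2]]
    return parts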

strings = ["item2", "item12", "item1"]
sorted_strings = sorted(strings, key=numerical_sort)
print(sorted_strings)

Output:

['item1', 'item2', 'item12']

The numerical_sort function divides strings into non-numeric and numeric parts, then transforms the numeric parts into integers. This process allows the sorted() function to compare the numeric values accurately, leading to the correct order of the strings.
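When only the first number in each string matters, a more compact key is possible. The following is a minimal sketch under that assumption; the helper name first_number and the fallback value of -1 for strings without digits are illustrative choices, not part of the original method.

```python
import re

strings = ["item2", "item12", "item1", "readme"]

def first_number(s):
    # re.search returns the first run of digits, or None if there is none.
    match = re.search(r"\d+", s)
    # Assumption: strings without any digits should sort before the rest.
    return int(match.group()) if match else -1

sorted_strings = sorted(strings, key=first_number)
print(sorted_strings)  # ['readme', 'item1', 'item2', 'item12']
```

The fallback avoids an exception on entries such as "readme"; choose a different default if such entries should sort last.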

Method 2: Using the natsort Library

For developers preferring not to reinvent the wheel, the natsort library offers a simple solution for natural sorting. After installing via pip with pip install natsort, the natsorted() function can be used to order strings containing numbers in a human-friendly way.

Here’s an example:

from natsort import natsorted

strings = ["item2", "item12", "item1"]
sorted_strings = natsorted(strings)
print(sorted_strings)

Output:

['item1', 'item2', 'item12']

Here, the natsorted() function internally handles parsing the numeric parts and ordering the items naturally. It offers a very convenient one-liner solution for sorting strings.

Method 3: Custom Sort Function

Defining a custom sort function allows you to control sorting behavior explicitly. Here, we utilize the sorted() function again, but with a more direct approach where we manually parse the string digits and directly convert them to integers within the sorting key, avoiding regular expressions.

Here’s an example:

strings = ["item2", "item12", "item1"]

def extract_number(string):
    return int(''.join(filter(str.isdigit, string)))

sorted_strings = sorted(strings, key=extract_number)
print(sorted_strings)

Output:

['item1', 'item2', 'item12']

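This script defines an extract_number function that drops every non-digit character and converts the remaining digits to an integer, which is then used as the sort key. Note that all digits in the string are concatenated, so "a1b2" yields 12, and a string with no digits raises a ValueError because int('') is invalid.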
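If the strings carry other digits besides the number you want to sort by, a variation that reads only the trailing digits may fit better. The sketch below stays regex-free; the function name trailing_number and its default parameter are hypothetical, and it assumes the relevant number is always a suffix.

```python
def trailing_number(s, default=0):
    # Strip trailing digits to find where the numeric suffix begins.
    prefix = s.rstrip("0123456789")
    digits = s[len(prefix):]
    # Fall back to the default when the string has no numeric suffix.
    return int(digits) if digits else default

strings = ["v10_item1", "v2_item3", "v5_item2", "notes"]
sorted_strings = sorted(strings, key=trailing_number)
print(sorted_strings)  # ['notes', 'v10_item1', 'v5_item2', 'v2_item3']
```

Here the version prefixes (v10, v2, v5) are ignored and only the item number decides the order.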

Method 4: Using a Third-Party Utility Function

In addition to specific libraries like natsort, numerous utility functions provided by various third-party libraries can be adapted for natural sorting of strings. One must be careful to select a function well-suited for the specific format of the strings to be sorted.

Here’s an example:

# Assuming a utility function 'natural_key' is available from a third-party library.
from third_party_lib import natural_key

strings = ["item2", "item12", "item1"]
sorted_strings = sorted(strings, key=natural_key)
print(sorted_strings)

Note: Replace ‘third_party_lib’ and ‘natural_key’ with actual library and function.

Output:

['item1', 'item2', 'item12']

In this placeholder code, ‘natural_key’ is a hypothetical utility function that behaves similarly to the custom sort functions described previously. The usage is straightforward: you pass ‘natural_key’ as the key parameter to Python’s sorted() function.
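To try the pattern without a library, you can define a local stand-in with the same call shape. The natural_key below is a hypothetical implementation written for illustration, not the API of any particular package; it additionally assumes that case should be ignored.

```python
import re

def natural_key(s):
    # Hypothetical stand-in: split into alternating text and digit runs.
    # With a capture group, digit runs always land at odd indices.
    parts = re.split(r"(\d+)", s)
    # Lowercase the text parts (assumption: case-insensitive ordering)
    # and convert the digit runs to integers.
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]

strings = ["Item10", "item2", "ITEM1"]
sorted_strings = sorted(strings, key=natural_key)
print(sorted_strings)  # ['ITEM1', 'item2', 'Item10']
```

Because text and numbers always occupy the same positions in every key, Python never has to compare an int with a str when the strings share the same overall shape.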

Bonus One-Liner Method 5: Using List Comprehensions and sorted()

Python enthusiasts often prefer concise, one-liner solutions. This method pairs each string with its numeric part, converted to an integer, sorts the pairs, and extracts the strings again with a list comprehension.

Here’s an example:

strings = ["item2", "item12", "item1"]
sorted_strings = [x for _, x in sorted((int(''.join(filter(str.isdigit, s))), s) for s in strings)]
print(sorted_strings)

Output:

['item1', 'item2', 'item12']

This elegant one-liner transforms each string into a tuple containing the extracted integer and the original string, sorts the list of tuples, and then uses a list comprehension to retrieve the strings in the sorted order.
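One subtlety of sorting tuples is worth a quick sketch: when two strings share the same number, the tuple comparison falls through to the strings themselves, whereas a key function leaves tied items in their original order. The helper name digits_as_int is illustrative.

```python
strings = ["b7", "a7", "c3"]

def digits_as_int(s):
    # Same extraction as the one-liner: keep the digits, convert to int.
    return int("".join(filter(str.isdigit, s)))

# Tuple-based sort: ties on the number are broken alphabetically.
by_tuple = [s for _, s in sorted((digits_as_int(s), s) for s in strings)]
print(by_tuple)  # ['c3', 'a7', 'b7']

# Key-based sort: Python's sort is stable, so ties keep the input order.
by_key = sorted(strings, key=digits_as_int)
print(by_key)  # ['c3', 'b7', 'a7']
```

Which behavior is preferable depends on whether the original order of equal-numbered items carries meaning.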

Summary/Discussion

  • Method 1: Regular Expressions. Offers granular control and needs only the standard library. Can be complex for users unfamiliar with regular expressions.
  • Method 2: natsort Library. Simplicity itself in one line. Requires an additional dependency.
  • Method 3: Custom Sort Function. No third-party dependencies and is fairly straightforward. Requires writing a custom function.
  • Method 4: Third-Party Utility Function. Leverages existing solutions. The availability and suitability of functions can vary, and they may require some research.
  • Bonus Method 5: One-Liner. Extremely concise. Readability may be reduced for those less familiar with Python.