Python | Sort a List of Strings by Part of String


When working with lists of strings, it’s often necessary to sort them based not on the entire string, but on a specific segment of each string.

This is particularly useful when working with filenames, dates, and other structured data embedded in strings. In this article, we will demonstrate several methods to sort a list of strings by a specific part of each string in Python.

✅ Problem Formulation: Suppose you’re given a list of strings that contain dates in the YYYY-MM-DD format, concatenated with a unique identifier, such as ["2023-03-01-AB123", "2023-01-15-XY987", "2022-12-19-QW564"]. Your task is to sort this list by the date part of each string.

Method 1: Using lambda and split

Python’s lambda functions are small anonymous functions defined with the lambda keyword. By combining a lambda function with the split() method, you can create a custom key function that sorts a list of strings based on the specific part of the string you’re interested in.

Here’s an example:

data = ["2023-03-01-AB123", "2023-01-15-XY987", "2022-12-19-QW564"]
sorted_data = sorted(data, key=lambda x: x.split('-')[:3])
print(sorted_data)

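This code snippet sorts the list data by the date part, assuming the date is always formatted as YYYY-MM-DD and is separated from the identifier by a hyphen. The lambda function splits each string at hyphens and uses the first three elements (year, month, and day) as the sorting key. The key is a list of strings, so the comparison is lexicographic; it yields chronological order only because the year, month, and day are zero-padded to a fixed width.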
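The same lambda pattern can target a different part of the string or reverse the order. The following is a minimal sketch, assuming the identifier is always the last hyphen-separated field; the variable names by_identifier and newest_first are chosen here for illustration only.

```python
data = ["2023-03-01-AB123", "2023-01-15-XY987", "2022-12-19-QW564"]

# rsplit with maxsplit=1 splits once from the right, so the last piece is the identifier.
by_identifier = sorted(data, key=lambda x: x.rsplit('-', 1)[-1])
print(by_identifier)
# ['2023-03-01-AB123', '2022-12-19-QW564', '2023-01-15-XY987']

# reverse=True keeps the date key but returns the newest date first.
newest_first = sorted(data, key=lambda x: x.split('-')[:3], reverse=True)
print(newest_first)
# ['2023-03-01-AB123', '2023-01-15-XY987', '2022-12-19-QW564']
```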

Method 2: Using a custom function

Instead of using a lambda, you might define a full-fledged function to process the strings and provide a sorting key. This can make the code more readable and easier to maintain, particularly if the logic for extracting the substring is complex.

Here’s an example:

def get_date_key(string):
    return string.split('-')[:3]

data = ["2023-03-01-AB123", "2023-01-15-XY987", "2022-12-19-QW564"]
sorted_data = sorted(data, key=get_date_key)
print(sorted_data)

By defining the function get_date_key, this code snippet does the same as the previous method but increases readability. The function clearly describes that it’s obtaining a “date key” from each string for sorting.
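A named function pays off once the extraction needs more than one step. As a sketch, suppose (this is an assumption, not part of the problem above) that some dates are not zero-padded, such as 2023-3-1. Comparing the parts as strings would then misorder them, so the hypothetical helper get_numeric_date_key below converts each part to an integer first.

```python
def get_numeric_date_key(string):
    # Hypothetical helper: compare year, month, and day as numbers, not text,
    # so that month 3 sorts before month 10.
    year, month, day = string.split('-')[:3]
    return (int(year), int(month), int(day))

data = ["2023-10-5-AB123", "2023-3-1-XY987", "2022-12-19-QW564"]
sorted_data = sorted(data, key=get_numeric_date_key)
print(sorted_data)
# ['2022-12-19-QW564', '2023-3-1-XY987', '2023-10-5-AB123']
```

With a plain string key, '10' would compare as smaller than '3', placing October before March.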

Method 3: Using Regular Expressions

Regular expressions provide a powerful way to match patterns within strings. In Python, the re module can help to extract date parts or other specific patterns from each string for sorting.

Here’s an example:

import re

def date_key(string):
    return re.search(r'\d{4}-\d{2}-\d{2}', string).group()

data = ["2023-03-01-AB123", "2023-01-15-XY987", "2022-12-19-QW564"]
sorted_data = sorted(data, key=date_key)
print(sorted_data)

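The date_key function uses the re.search() method to find the first substring that looks like a date and uses that as the key for sorting. It’s a good option if the date is not consistently positioned in each string. Note that re.search() returns None when a string contains no match, so calling .group() on such a string raises an AttributeError.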
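To illustrate both points, here is a hedged sketch in which the date appears at different positions and one string has no date at all. The helper name safe_date_key, the sample strings, and the choice to place unmatched strings last are assumptions made for this example.

```python
import re

DATE_PATTERN = re.compile(r'\d{4}-\d{2}-\d{2}')

def safe_date_key(string):
    match = DATE_PATTERN.search(string)
    # (False, date) sorts before (True, ""), so strings without a date go last.
    return (match is None, match.group() if match else "")

data = ["AB123_2023-03-01", "no-date-here", "2023-01-15-XY987", "log-QW564-2022-12-19.txt"]
sorted_data = sorted(data, key=safe_date_key)
print(sorted_data)
# ['log-QW564-2022-12-19.txt', '2023-01-15-XY987', 'AB123_2023-03-01', 'no-date-here']
```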

Method 4: Using itemgetter with a slice

The operator module’s itemgetter function accepts a slice object, so it can build a key function that extracts a fixed range of characters from each string. This can be handy when the substring to sort by is not separated neatly by a delimiter or when working with fixed-width fields.

Here’s an example:

from operator import itemgetter

data = ["2023-03-01-AB123", "2023-01-15-XY987", "2022-12-19-QW564"]
sorted_data = sorted(data, key=itemgetter(slice(0, 10)))
print(sorted_data)

Using the slice object within itemgetter, this code snippet defines the range of characters to be used as the sorting key. It’s a good choice when dealing with strings of predictable structures.
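itemgetter also accepts several arguments, in which case it returns a tuple of the extracted pieces. The sketch below uses that to sort by month and day first and by year second; the name month_day_then_year and this particular ordering are illustrative assumptions.

```python
from operator import itemgetter

data = ["2023-03-01-AB123", "2023-01-15-XY987", "2022-12-19-QW564"]

# Characters 5 to 9 hold "MM-DD"; characters 0 to 3 hold "YYYY".
month_day_then_year = itemgetter(slice(5, 10), slice(0, 4))
print(month_day_then_year(data[0]))  # ('03-01', '2023')

sorted_data = sorted(data, key=month_day_then_year)
print(sorted_data)
# ['2023-01-15-XY987', '2023-03-01-AB123', '2022-12-19-QW564']
```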

Bonus One-Liner Method 5: Using list comprehension and tuple unpacking

This method uses list comprehension and tuple unpacking to create an intermediate list of tuples, where each tuple consists of the sorting key and the original string, then sorts based on the key and extracts the sorted strings.

data = ["2023-03-01-AB123", "2023-01-15-XY987", "2022-12-19-QW564"]
sorted_data = [x for _, x in sorted((x.split('-')[:3], x) for x in data)]
print(sorted_data)

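This one-liner pairs each string with its key in a generator expression, sorts the pairs, and then unpacks them in a list comprehension to recover the sorted strings. Because tuples are compared element by element, strings with equal dates are ordered by the full string rather than by their original position in the list.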
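The difference from a key-based sort only shows up when two strings share a date. A small sketch, using sample data with a duplicated date that is assumed for illustration:

```python
data = ["2023-01-15-XY987", "2022-12-19-QW564", "2023-01-15-AB123"]

# sorted() with key= is stable: equal dates keep their original relative order.
with_key = sorted(data, key=lambda x: x.split('-')[:3])
print(with_key)
# ['2022-12-19-QW564', '2023-01-15-XY987', '2023-01-15-AB123']

# Decorated tuples fall back to comparing the full string when the dates are equal.
decorated = [x for _, x in sorted((x.split('-')[:3], x) for x in data)]
print(decorated)
# ['2022-12-19-QW564', '2023-01-15-AB123', '2023-01-15-XY987']
```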

Summary/Discussion

  • Method 1 (lambda and split):
    • Strength: Compact and convenient for simple extractions.
    • Weakness: Can become unreadable with more complex extraction logic.
  • Method 2 (custom function):
    • Strength: Clear and maintainable, good for complex extractions.
    • Weakness: Requires additional overhead of defining a function.
  • Method 3 (Regular Expressions):
    • Strength: Very powerful, can match complex and varied patterns.
    • Weakness: May have performance overhead, can be difficult to read and maintain.
  • Method 4 (itemgetter with a slice):
    • Strength: Works well with fixed-width fields and structured strings.
    • Weakness: Not intuitive for complex or irregularly structured data.
  • Method 5 (list comprehension and tuple unpacking):
    • Strength: Concise one-liner for simple cases.
    • Weakness: Can be less readable, not suitable for all cases.

Choosing the right sorting technique largely depends on the structure of your data and your specific needs in terms of performance and code maintainability. Each method has its place and can be the most efficient way to achieve the desired sorting in different scenarios.
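As a quick sanity check, the approaches above can be run side by side on the example list to confirm that they agree. This is only a sketch; the results list is a name introduced here for the comparison.

```python
import re
from operator import itemgetter

data = ["2023-03-01-AB123", "2023-01-15-XY987", "2022-12-19-QW564"]

results = [
    sorted(data, key=lambda x: x.split('-')[:3]),                            # split
    sorted(data, key=lambda x: re.search(r'\d{4}-\d{2}-\d{2}', x).group()),  # regex
    sorted(data, key=itemgetter(slice(0, 10))),                              # slice
    [x for _, x in sorted((x.split('-')[:3], x) for x in data)],             # decorate
]

# Every approach should produce the same ordering for this input.
assert all(result == results[0] for result in results)
print(results[0])
# ['2022-12-19-QW564', '2023-01-15-XY987', '2023-03-01-AB123']
```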