5 Best Ways to Sort an Array by Increasing Frequency of Elements in Python

Rate this post

πŸ’‘ Problem Formulation: In Python, sorting an array by the increasing frequency of its elements presents a unique challenge. We must account for both the value of the elements and how often they occur. Given an input such as [4, 6, 2, 2, 6, 4, 4, 4], the desired output would be [6, 6, 2, 2, 4, 4, 4, 4], with the elements sorted according to their increasing frequency.

Method 1: Using Collections Counter and Sort

This method employs the Collections.Counter class to count the frequency of elements and a custom sort function. The elements are sorted primarily by frequency and secondarily by their value to ensure that those with the same frequency are sorted in ascending order.

Here’s an example:

from collections import Counter

def sort_by_frequency(arr):
    count = Counter(arr)
    return sorted(arr, key=lambda x: (count[x], x))

# Example usage
arr = [4, 6, 2, 2, 6, 4, 4, 4]
print(sort_by_frequency(arr))

Output:

[6, 6, 2, 2, 4, 4, 4, 4]

The code defines a function that takes an array and returns it sorted by frequency. The sorting is done by using a lambda function as the key, which looks up the count of each element for the primary sort and uses the element value for the secondary sort.

Method 2: Using DefaultDict

The collections.defaultdict is utilized to store the frequency of elements. A custom sorting function is then applied that leverages these counts. It’s similar to Method 1 but offers an alternative way to create the frequency dictionary.

Here’s an example:

from collections import defaultdict

def sort_by_frequency_defaultdict(arr):
    freq_dict = defaultdict(int)
    for item in arr:
        freq_dict[item] += 1
    return sorted(arr, key=lambda x: (freq_dict[x], x))

# Example usage
arr = [4, 6, 2, 2, 6, 4, 4, 4]
print(sort_by_frequency_defaultdict(arr))

Output:

[6, 6, 2, 2, 4, 4, 4, 4]

This snippet creates a defaultdict for storing frequencies and then sorts the array similarly to the first method: by frequency and then by element value.

Method 3: Using Heapq with Frequency Tuple

The heapq library can be used to sort based on multiple criteria. In this method, we first create a frequency map and then turn each element into a tuple of (frequency, element), upon which the heap sorting is performed.

Here’s an example:

import heapq
from collections import Counter

def sort_by_frequency_heapq(arr):
    count = Counter(arr)
    freq_elements = [(freq, num) for num, freq in count.items()]
    heapq.heapify(freq_elements)
    result = []
    while freq_elements:
        freq, num = heapq.heappop(freq_elements)
        result.extend([num]*freq)
    return result

# Example usage
arr = [4, 6, 2, 2, 6, 4, 4, 4]
print(sort_by_frequency_heapq(arr))

Output:

[2, 2, 6, 6, 4, 4, 4, 4]

The code constructs a heap of tuples representing frequency and value, and then successively pops from the heap to build the result list, ensuring that it is sorted by increasing frequency.

Method 4: Using Lambda Function and List Comprehension

This technique leverages a lambda function within a list comprehension for sorting. The code is concise and Pythonic, using no external libraries.

Here’s an example:

def sort_by_frequency_lambda(arr):
    return sorted(arr, key=lambda x: (arr.count(x), x))

# Example usage
arr = [4, 6, 2, 2, 6, 4, 4, 4]
print(sort_by_frequency_lambda(arr))

Output:

[6, 6, 2, 2, 4, 4, 4, 4]

The given code uses list comprehension to apply a sorting key that counts the frequency of each element inline. Although concise, this is less efficient for larger lists due to the repeated counting of elements.

Bonus One-Liner Method 5: Using Sorted with a Sorted Key Function

This one-liner approach is a much more Pythonic way of solving the problem using a compound key in the sorted function.

Here’s an example:

arr = [4, 6, 2, 2, 6, 4, 4, 4]
print(sorted(arr, key=lambda x: (arr.count(x), x)))

Output:

[6, 6, 2, 2, 4, 4, 4, 4]

The one-liner code simply applies Python’s sorted function with a compound key, combining the count and value of the elements as the sorting key. This is elegant but suffers from the same inefficiency as Method 4 for large arrays.

Summary/Discussion

  • Method 1: Collections Counter and Sort. Efficient and easy to understand. Handles large lists well. However, requires the import of the Counter class.
  • Method 2: DefaultDict. Similar to Method 1 with an alternative way to create the frequency dictionary. Equally efficient.
  • Method 3: Heapq with Frequency Tuple. Provides an interesting use of heap sorting. Good for understanding heap operations. Not as intuitive or direct as Counter methods.
  • Method 4: Lambda Function and List Comprehension. Very Pythonic and easy to write but inefficient for large data sets due to the repeated counting operation.
  • Method 5: One-Liner. Very concise and readable. Shares the same inefficiency with large arrays as Method 4.