5 Best Ways to Program to Count Minimum Deletions Needed to Make Character Frequencies Unique in Python

March 4, 2024 by Emily Rosemary Collins

💡 Problem Formulation: The problem entails writing a Python program that determines the least number of character deletions required to make each character in a given string appear with a unique frequency. For instance, given the input string “aaabbbbcc”, the output would be 1 because deleting one ‘b’ would make the frequencies unique (3 ‘a’s, 3 ‘b’s, and 2 ‘c’s).

Method 1: Using a Frequency Map and Sorting

This method involves creating a mapping of character frequencies, converting it into a list, and then sorting it. The program then iterates through the sorted list and adjusts frequencies by deleting characters where necessary to ensure uniqueness.

Here’s an example:

def min_deletion_freq_unique(s):
    from collections import Counter
    freqs = list(Counter(s).values())
    freqs.sort(reverse=True)
    deletions = 0
    for i in range(1, len(freqs)):
        if freqs[i] >= freqs[i-1]:
            deletions += freqs[i] - freqs[i-1] + 1
            freqs[i] = freqs[i-1] - 1
    return deletions

# Example usage
print(min_deletion_freq_unique("aaabbbbcc"))

The output of this code snippet is:

In this code, Counter is used to create a histogram of the characters in the input, which is then converted to a list of frequencies. The list is sorted in descending order to compare each frequency with its predecessor and adjust it to be unique, counting the number of deletions made in the process.

Method 2: Priority Queue

This approach uses a priority queue (or heap) to maintain the frequencies of the characters. When two characters have the same frequency, we increase the number of deletions and decrease the frequency until it is unique.

Here’s an example:

import heapq

def min_deletion_using_heap(s):
    freqs = list(Counter(s).values())
    heapq._heapify_max(freqs)
    deletions = 0
    while freqs:
        top = heapq._heappop_max(freqs)
        if freqs and top == freqs[0]:
            if top > 1:
                heapq.heappush(freqs, top - 1)
            deletions += 1
        heapq._heapify_max(freqs)
    return deletions

# Example usage
print(min_deletion_using_heap("aaabbbbcc"))

The output of this code snippet is:

By leveraging a max heap (priority queue), the largest frequency is always at the front. If the next largest is equal, we decrement the frequency by one and account for it as a deletion, re-heapifying each time to maintain the heap property.

Method 3: Greedy Approach with Set

A greedy approach involves iteratively checking whether a frequency is already in a set of seen frequencies and reducing it until it’s unique, counting deletions as they occur. This can be done without pre-sorting.

Here’s an example:

def min_deletions_greedy(s):
    freqs = Counter(s).values()
    seen = set()
    deletions = 0
    for freq in freqs:
        while freq in seen:
            freq -= 1
            deletions += 1
        if freq > 0:
            seen.add(freq)
    return deletions

# Example usage
print(min_deletions_greedy("aaabbbbcc"))

The output of this code snippet is:

Here, we loop through each frequency and continuously decrement the frequency until it’s not in the set of seen frequencies, accumulating deletions. This method effectively reaches a unique state without requiring sorting.

Method 4: Optimizing Deletions with No Sorting

This optimization of the greedy approach does not sort the frequencies, and instead, it directly makes each frequency as low as needed to be unique, utilizing a while loop.

Here’s an example:

def min_deletions_optimized(s):
    freqs = Counter(s).values()
    seen = set()
    deletions = 0
    for freq in freqs:
        while freq > 0 and freq in seen:
            freq -= 1
            deletions += 1
        seen.add(freq)
    return deletions

# Example usage
print(min_deletions_optimized("aaabbbbcc"))

The output of this code snippet is:

This method works similarly to the previous greedy method but continues to reduce the current frequency until it’s either unique or zero, bypassing frequencies that don’t cause collisions. It is efficient since it avoids unnecessary iterations and no sorting is needed.

Bonus One-Liner Method 5: Using a Generator Expression and List Comprehension

A compact one-liner to solve this problem uses a combination of set comprehension to establish uniqueness and a generator expression for counting deletions.

Here’s an example:

def min_deletions_oneliner(s):
    return sum(f - y for y, f in enumerate(sorted((Counter(s).values()),reverse=True)) if f > y)

# Example usage
print(min_deletions_oneliner("aaabbbbcc"))

The output of this code snippet is:

This Pythonic one-liner computes the deletions required to make frequencies unique by enforcing that the nth frequency must be less than n (it sorts the frequencies in descending order and enforces this property using enumeration).

Summary/Discussion

Method 1: Frequency Map and Sorting. Strengths: Intuitive and straightforward. Weaknesses: Involves sorting, which can be inefficient for large datasets.
Method 2: Priority Queue. Strengths: Maintains a dynamic order of uniqueness with efficient updates. Weaknesses: Re-heapification steps can be costly.
Method 3: Greedy Approach with Set. Strengths: Does not require sorting, works intuitively. Weaknesses: Potentially less efficient than other methods since it may require iterating multiple times for each frequency.
Method 4: Optimizing Deletions without Sorting. Strengths: Efficiently reaches a unique state without the need for sorting. Weaknesses: May still iterate multiple times but is generally more efficient than the basic greedy approach.
Bonus Method 5: One-Liner. Strengths: Compact and elegant. Weaknesses: Less readable and can be confusing for those not familiar with Python’s list comprehensions and generator expressions.