5 Best Ways to Extract Percentages from Strings in Python

💡 Problem Formulation: In many applications, strings contain percentage values that need to be isolated for analysis or further processing. A common task is extracting these percentage figures efficiently. For instance, given the input string “The battery is at 80% and your disk usage is at only 15%”, the desired output would be a list of percentages: [‘80%’, ‘15%’].

Method 1: Using Regular Expressions with re.findall()

The re.findall() method from Python’s regular expression module re is a powerful tool for pattern matching. It can be used to find all occurrences of a pattern within a string. In this case, the pattern is a percentage figure, typically represented by one or more digits followed by a percent sign.

Here’s an example:

import re

text = "Profit growth was 10% in Q1, though it jumped to 15% in Q2."
percentages = re.findall(r'\d+%', text)

print(percentages)

Output:

['10%', '15%']

This code snippet utilizes the function re.findall() to search the input string for all matches that consist of one or more digits (\d+) followed by a percent sign (%), and returns them as a list.
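Note that \d+% matches whole numbers only, so a value such as "12.5%" would come back as just '5%'. A minimal sketch of one way to handle decimals and convert the results to numbers follows; the sample text and the variable names pattern and values are illustrative assumptions, not part of the original example.

```python
import re

# Hypothetical sample text containing decimal percentages.
text = "Inflation rose 3.5% while wages grew 2% and savings fell 0.25%."

# The optional non-capturing group (?:\.\d+)? admits a decimal part.
pattern = r'\d+(?:\.\d+)?%'

percentages = re.findall(pattern, text)

# Strip the trailing '%' to get plain floats for arithmetic.
values = [float(p.rstrip('%')) for p in percentages]

print(percentages)
print(values)
```

Keeping the group non-capturing matters here: with a capturing group, re.findall() would return the group contents instead of the full match.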

Method 2: Using Regular Expressions with re.finditer()

While re.findall() is straightforward, re.finditer() provides an iterator yielding match objects that contain detailed information about each match. This can be useful if additional information about the match (such as its position in the string) is needed.

Here’s an example:

import re

text = "Progress was slow at 5%, but sped up to 20% later."
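# No trailing \b: a '%' followed by a space or comma is not a word boundary,
# so r'\b\d+%\b' would find nothing in this text.
percentages = [match.group() for match in re.finditer(r'\b\d+%', text)]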

print(percentages)

Output:

['5%', '20%']

Using re.finditer(), we process each match object from an iterator, applying match.group() to extract the matched string. This approach is slightly more verbose but offers more control and information about each match.
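To illustrate the extra information a match object carries, here is a small sketch that records where each percentage sits in the string, using the match object's start() and end() methods. The tuple layout and the name matches are choices made for this illustration.

```python
import re

text = "Progress was slow at 5%, but sped up to 20% later."

# Each match object exposes the matched text and its start/end offsets.
matches = [(m.group(), m.start(), m.end()) for m in re.finditer(r'\d+%', text)]

for value, start, end in matches:
    # text[start:end] reproduces the matched value.
    print(f"{value} spans characters {start} to {end}")
```

The offsets can be used to slice out surrounding context, for example text[max(0, start - 10):end], which re.findall() cannot provide.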

Method 3: Using String Methods with a List Comprehension

This method combines Python’s string methods, such as split() and endswith(), with a list comprehension to identify percentage values within a string. It’s a more manual approach but does not require importing any additional modules.

Here’s an example:

text = "We're at 45% completion, aiming for 100%!"
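# Strip trailing punctuation so that a token like '100%!' becomes '100%'.
words = [word.rstrip('.,;:!?') for word in text.split()]
percentages = [word for word in words if word.endswith('%')]

print(percentages)

Output:

['45%', '100%']

This code splits the string into words, strips trailing punctuation from each one, and then uses a list comprehension to keep only the words that end with the percent symbol. Without the rstrip() step, the token '100%!' would be skipped because it ends with '!' rather than '%'.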
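The endswith('%') test accepts any token that ends in a percent sign, including non-numeric ones. A minimal sketch, still without imports, that also checks the part before the sign with str.isdigit() might look like this; the sample text and the punctuation set passed to strip() are assumptions chosen for the illustration.

```python
# Hypothetical sample text with a non-numeric token that ends in '%'.
text = "Growth: 45% (target n/a%), final 100%!"

percentages = []
for word in text.split():
    # Drop surrounding punctuation such as '(', ')', ',' and '!'.
    token = word.strip('.,;:!?()')
    # Keep the token only if everything before the '%' is digits.
    if token.endswith('%') and token[:-1].isdigit():
        percentages.append(token)

print(percentages)
```

This stays within plain string methods, but it handles whole numbers only; decimals or signs would need further checks or a regular expression.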

Method 4: Using the filter() Function

The built-in filter() function can be used to create an iterator from elements of an iterable for which a function returns True. Here, it’s combined with a lambda function to check for the ‘%’ symbol at the end of the string elements in a list.

Here’s an example:

text = "Down by 3% today, up by 7% the previous day."
words = text.split()
percentages = list(filter(lambda word: word.endswith('%'), words))

print(percentages)

Output:

['3%', '7%']

The filter() function applies the lambda to each element in the list, returning an iterator that contains only the elements for which the lambda evaluates to True, in this case the words that end with ‘%’. The result is then converted back to a list.
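Because filter() returns a lazy iterator, items are only examined as they are requested. The sketch below shows that behavior with a named predicate in place of the lambda; the names ends_with_percent, lazy, first, and remaining are hypothetical and exist only for this illustration.

```python
text = "Down by 3% today, up by 7% the previous day."

def ends_with_percent(word):
    """Return True for tokens whose last character is '%'."""
    return word.endswith('%')

# Nothing is evaluated yet; filter() only wraps the word list.
lazy = filter(ends_with_percent, text.split())

first = next(lazy)       # scans only as far as the first match
remaining = list(lazy)   # consumes whatever is left in the iterator

print(first)
print(remaining)
```

This can be handy when only the first percentage is needed, since the rest of the words are never tested. Keep in mind that the iterator is exhausted after one pass.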

Bonus One-Liner: Method 5: Extracting with List Comprehension and re.match()

For concise code lovers, a one-liner list comprehension with a condition that uses re.match() to validate the format can quickly extract percentages. This is compact and efficient but sacrifices some readability.

Here’s an example:

import re

text = "Early results: 90% positive, 10% negative."
percentages = [word for word in text.split() if re.match(r'\b\d+%$', word)]

print(percentages)

Output:

['90%', '10%']

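This one-liner splits the text with split() and keeps each word that matches the pattern. Because re.match() only anchors at the start of the word, the trailing $ is what ensures the word ends right after the percent sign; the leading \b is redundant here, since matching already begins at the first character. Note that a token with trailing punctuation, such as '10%,', would not match.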
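As an alternative sketch, re.fullmatch() requires the entire word to match the pattern, so neither \b nor $ is needed. This is a variation on the example above rather than a replacement for it.

```python
import re

text = "Early results: 90% positive, 10% negative."

# fullmatch() succeeds only if the whole word is digits followed by '%'.
percentages = [word for word in text.split() if re.fullmatch(r'\d+%', word)]

print(percentages)
```

The pattern is shorter and states the intent directly: the word must be a percentage and nothing else.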

Summary/Discussion

    Method 1: Regular Expressions with re.findall(). Provides a simple and robust solution. Does not include context or positions.
    Method 2: Regular Expressions with re.finditer(). Offers detailed match information. More complex and slightly less readable than re.findall().
    Method 3: String Methods with List Comprehension. Does not require regex. Limited to simple patterns and might need modifications for complex strings, such as stripping adjacent punctuation.
    Method 4: Using the filter() Function. Functional approach. Less concise than list comprehensions. Requires converting the result to a list explicitly.
    Method 5: One-Liner with List Comprehension and re.match(). Quick and compact. Less readable; best for simple scenarios where clarity is not paramount.