How To Extract Numbers From A String In Python?

4.2/5 - (5 votes)

The easiest way to extract numbers from a Python string s is to use the expression re.findall('\d+', s). For example, re.findall('\d+', 'hi 100 alice 18 old 42') yields the list of strings ['100', '18', '42'] that you can then convert to numbers using int() or float().

There are some tricks and alternatives, so keep reading to learn about them. πŸ‘‡

In particular, you’ll learn about the following methods to extract numbers from a given string in Python:

Problem Formulation

Extracting digits or numbers from a given string might come up in your coding journey quite often. For instance, you may want to extract certain numerical figures from a CSV file, or you need to separate complex digits and figures from given patterns.

Having said that, let us dive into our mission-critical question:

Problem: Given a string. How to extract numbers from the string in Python?

Example: Consider that you have been given a string and you want to extract all the numbers from the string as given in the following example:

Given is the following string:

s = 'Extract 100, 1000 and 10000 from this string'

This is your desired output:

[100, 1000, 10000]

Let us discuss the methods that we can use to extract the numbers from the given string:

Method 1: Using Regex Module

The most efficient approach to solving our problem is to leverage the power of the re module. You can easily use Regular Expressions (RegEx) to check or verify if a given string contains a specified pattern (be it a digit or a special character, or any other pattern).

Thus to solve our problem, we must import the regex module, which is already included in Python’s standard library, and then with the help of the findall() function we can extract the numbers from the given string.

β—ˆ Learn More: re.findall() is an easy-to-use regex function that returns a list containing all matches. To learn more about re.findall() check out our blog tutorial here.

Let us have a look at the following code to understand how we can use the regex module to solve our problem:

import re

sentence = 'Extract 100 , 100.45 and 10000 from this string'
s = [float(s) for s in re.findall(r'-?\d+\.?\d*', sentence)]
print(s)

Output

[100.0, 100.45, 10000.0]

This is a Python code that uses the re module, which provides support for regular expressions in Python, to extract numerical values from a string.

Code explanation: πŸ‘‡

The line s = [float(s) for s in re.findall(r'-?\d+\.?\d*', sentence)] uses the re.findall() function from the re module to search the sentence string for numerical values.

Specifically, it looks for strings of characters that match the regular expression pattern r'-?\d+.?\d*'. This pattern matches an optional minus sign, followed by one or more digits, followed by an optional decimal point, followed by zero or more digits.

The re.findall() function returns a list of all the matching strings.

The list comprehension [float(s) for s in re.findall(r'-?\d+\.?\d*', sentence)] takes the list of matching strings returned by findall and converts each string to a floating-point number using the float() function. This resulting list of floating-point numbers is then assigned to the variable s.

πŸ‘‰ Recommended: Python List Comprehension

Method 2: Split and Append The Numbers To A List using split() and append()

Another workaround for our problem is to split the given string using the split() function and then extract the numbers using the built-in float() method then append the extracted numbers to the list.

Note:

  • split() is a built-in python method which is used to split a string into a list.
  • append() is a built-in method in python that adds an item to the end of a list.

Now that we have the necessary tools to solve our problem based on the above concept let us dive into the code to see how it works:

sentence = 'Extract 100 , 100.45 and 10000 from this string'

s = []
for t in sentence.split():
    try:
        s.append(float(t))
    except ValueError:
        pass
print(s)

Output

[100.0, 100.45, 10000.0]

Method 3: Using isdigit() Function In A List Comprehension

Another approach to solving our problem is to use the isdigit() inbuilt function to extract the digits from the string and then store them in a list using a list comprehension.

The isdigit() function is used to check if a given string contains digits. Thus if it finds a character that is a digit, then it returns True. Otherwise, it returns False.

Let us have a look at the code given below to see how the above concept works:

sentence = 'Extract 100 , 100.45 and 10000 from this string'
s = [int(s) for s in str.split(sentence) if s.isdigit()]
print(s)

Output

[100, 10000]

☒ Alert! This technique is best suited to extract only positive integers. It won’t work for negative integers, floats, or hexadecimal numbers.

Method 4: Using Numbers from String Library

This is a quick hack if you want to avoid spending time typing explicit code to extract numbers from a string.

You can import a library known as nums_from_string and then use it to extract numbers from a given string. It contains several regex rules with comprehensive coverage and can be a very useful tool for NLP researchers.

Since the nums_from_string library is not a part of the standard Python library, you have to install it before use. Use the following command to install this useful library:

pip install nums_from_string

The following program demonstrates the usage of nums_from_string :

import nums_from_string

sentence = 'Extract 100 , 100.45 and 10000 from this string'
print(nums_from_string.get_nums(sentence))

Output

[100.0, 100.45, 10000.0]

Conclusion

Thus from the above discussions, we found that there are numerous ways of extracting a number from a given string in python.

My personal favorite, though, would certainly be the regex module re.

You might argue that using other methods like the isdigit() and split() functions provide simpler and more readable code and faster. However, as mentioned earlier, it does not return numbers that are negative (in reference to Method 2) and also does not work for floats that have no space between them and other characters like '25.50k' (in reference to Method 2).

Furthermore, speed is kind of an irrelevant metric when it comes to log parsing. Now you see why regex is my personal favorite in this list of solutions.

If you are not very supportive of the re library, especially because you find it difficult to get a strong grip on this concept (just like me in the beginning), here’s THE TUTORIAL for you to become a regex master. 🀯🀯

I hope you found this article useful and added some value to your coding journey. Please stay tuned for more interesting stuff in the future.