Problem Formulation and Solution Overview
To make it more interesting, we have the following running scenario:
Snippet of emps.csv
 | emp_name |
0 | Martin (Robert) Simpson |
1 | Howie (George) Smith |
2 | Alice (May) Jones |
3 | Micah (Ray) Hamilton |
4 | Joey (Jon) Howard |
Let’s give them two (2) options to pick from.
Method 1: Use find() and slicing
This example uses the find() method to locate text inside a string. It returns an integer with the index of the first occurrence if found, or -1 if not found. Slicing is then used to remove the text and format the results.
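To see what find() and slicing return before combining them, here is a minimal sketch. The variable names (open_pos, close_pos, missing, before, after) are made up for this illustration and are not part of the article's code.

```python
name = 'Martin (Robert) Simpson'

open_pos = name.find('(')    # index of the first '('
close_pos = name.find(')')   # index of the first ')'
missing = name.find('[')     # '[' does not occur, so find() returns -1

print(open_pos, close_pos, missing)  # 7 14 -1

# Slicing: everything before '(' and everything after ')'
before = name[:open_pos]
after = name[close_pos + 1:]
print(repr(before), repr(after))     # 'Martin ' ' Simpson'
```

Notice that both slices still carry a space next to where the brackets were. That is why the examples in this article either remove the spaces first or call strip().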
For this example, let’s use the first employee’s name to test with.
Martin (Robert) Simpson |
Option 1: Remove Text and Brackets
This option will format the name as: Martin Simpson.
employee = 'Martin (Robert) Simpson'.replace(' ', '')
strt_pos = employee.find('(')
stop_pos = employee.find(')')
option_1 = f'{employee[:strt_pos]} {employee[stop_pos+1:]}'
print(option_1)
The first line in the above code declares an employee’s name and chains the replace() method to remove all space characters from the string. This gives us the following output.
Martin(Robert)Simpson |
The following two (2) lines locate the first occurrence of the ( and ) characters. The results save to strt_pos and stop_pos, respectively. If output to the terminal, the following displays.
6 13 |
We can conclude that the ( character was found at position 6, and the ) character was found at position 13.
The last two (2) lines remove the text inside the brackets (as well as the brackets). The output is formatted as first name and last name using an f-string and slicing. The results save to option_1 and are output to the terminal.
Martin Simpson |
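One thing to watch for: if a name has no brackets, find() returns -1 for both positions, and the slices would turn 'Howie Smith' into 'Howie Smit Howie Smith'. The sketch below guards against that. The helper name remove_bracketed() is hypothetical and only illustrates one way to handle this case.

```python
def remove_bracketed(name: str) -> str:
    """Hypothetical helper: return name without its first '(...)' group."""
    start = name.find('(')
    # Only look for ')' after the '(' that was found
    stop = name.find(')', start + 1) if start != -1 else -1
    if start == -1 or stop == -1:
        # No complete pair of brackets: leave the name unchanged
        return name
    # Join the parts around the brackets with a single space
    return f'{name[:start].strip()} {name[stop + 1:].strip()}'.strip()

print(remove_bracketed('Martin (Robert) Simpson'))  # Martin Simpson
print(remove_bracketed('Howie Smith'))              # Howie Smith
```

Because the helper uses strip() on each part, it does not need the spaces removed beforehand.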
Let’s put this code to work and update the emps.csv file created earlier.
import pandas as pd

df = pd.read_csv('emps.csv')

for i in range(len(df)):
    name = df.at[i, 'emp_name']
    strt_pos = name.find('(')
    stop_pos = name.find(')')
    # Keep the text before '(' and after ')', trimming stray spaces
    df.at[i, 'emp_name'] = f"{name[:strt_pos].strip()} {name[stop_pos+1:].strip()}"

print(df)
The first line in the above code imports the pandas library.
The following line reads in the emps.csv file to the DataFrame, df.
A for loop is instantiated to iterate through each row in the DataFrame column emp_name.
Inside this loop, we take the code written earlier and fine-tune it. Instead of using replace(), we use strip() to remove any extra spaces, and slicing is used to format the data. The updated emp_name column displays below.
 | emp_name |
0 | Martin Simpson |
1 | Howie Smith |
2 | Alice Jones |
3 | Micah Hamilton |
4 | Joey Howard |
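If you prefer to avoid an explicit loop, the same clean-up can likely be done in one pass with the pandas string accessor. This is a sketch under assumptions, not the article's method: it swaps in a regex pattern and builds a small in-memory DataFrame as a stand-in for emps.csv.

```python
import pandas as pd

# In-memory stand-in for emps.csv (same sample names as above)
df = pd.DataFrame({'emp_name': [
    'Martin (Robert) Simpson',
    'Howie (George) Smith',
    'Alice (May) Jones',
]})

# Remove optional leading whitespace plus the bracketed text in one pass
df['emp_name'] = df['emp_name'].str.replace(r'\s*\([^()]*\)', '', regex=True)

print(df)
```

Names without brackets pass through unchanged, because the pattern simply finds nothing to replace.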
Note: The above code snippets will need to be modified to meet your specific requirements.
Option 2: Remove Text Keep Brackets
This option will format the name as Martin () Simpson. This may be done to accommodate the eventual use of a middle initial. For example:
Martin (R.) Simpson
employee = 'Martin (Robert) Simpson'.replace(' ', '')
strt_pos = employee.find('(')
stop_pos = employee.find(')')
# Use stop_pos rather than a hard-coded index so this works for any name
option_2 = f'{employee[:strt_pos]} {employee[strt_pos]}{employee[stop_pos]} {employee[stop_pos+1:]}'
print(option_2)
This code works similarly to Option 1 above. However, in the f-string, the ( and ) characters are added between the first and last names.
Martin () Simpson |
Method 2: Use split()
This example uses the split() method to split a string on whitespace, save the pieces as a List, and rebuild the name without the text within the brackets.
Option 1: Remove Text and Brackets
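employee = 'Martin (Robert) Simpson'.split()

# Option 1: first and last name only
option_1 = f'{employee[0]} {employee[2]}'
print(option_1)

# Option 2: keep only the first and last character of '(Robert)', i.e. the brackets
option_2 = f'{employee[0].strip()} {employee[1][0]}{employee[1][-1]} {employee[2]}'
print(option_2)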
The first line uses Python’s built-in split() method to break the string on a specified separator. If no argument is passed, the string is split on any run of whitespace. The results save as a List to employee.
['Martin', '(Robert)', 'Simpson'] |
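The difference between calling split() with no argument and calling split(' ') only shows up when the data contains repeated spaces. Here is a small sketch to illustrate. The variable names parts_default and parts_space are made up for this example.

```python
name = 'Martin  (Robert)   Simpson'  # note the repeated spaces

parts_default = name.split()     # splits on any run of whitespace
parts_space = name.split(' ')    # splits on every single space

print(parts_default)  # ['Martin', '(Robert)', 'Simpson']
print(parts_space)    # ['Martin', '', '(Robert)', '', '', 'Simpson']
```

With no argument, the list always holds clean tokens, so indexes such as employee[0] and employee[2] stay predictable.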
The following line configures option_1. This option removes the text inside the brackets (including the brackets). The result is output to the terminal.
Martin Simpson |
Option 2: Remove Text Keep Brackets
The last line displays option_2, which removes the data between the brackets and leaves the brackets as is.
Martin () Simpson |
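Indexing with employee[0] and employee[2] assumes every name has exactly three parts. As a hedged alternative, the sketch below drops any token wrapped in brackets and keeps the rest, however many parts there are. The helper name drop_bracket_tokens() is hypothetical, and it assumes the bracketed text contains no spaces.

```python
def drop_bracket_tokens(name: str) -> str:
    """Hypothetical helper: drop any token that is fully wrapped in brackets."""
    kept = [p for p in name.split() if not (p.startswith('(') and p.endswith(')'))]
    return ' '.join(kept)

print(drop_bracket_tokens('Martin (Robert) Simpson'))   # Martin Simpson
print(drop_bracket_tokens('Howie Smith'))               # Howie Smith
print(drop_bracket_tokens('Alice (May) Van Jones'))     # Alice Van Jones
```

If the bracketed text can contain spaces, such as (Robert James), the find() or re.sub() approaches are a safer fit.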
Method 3: Use re.sub()
This example uses the re.sub() function from Python’s built-in re (regex) library to remove the data inside the brackets.
Option 1: Remove Text and Brackets
import re

employee = 'Martin (Robert) Simpson'.replace(' ', '')
option_1 = re.sub(r'\([^()]*\)', ' ', employee)
print(option_1)
The first line in the above code imports the regex library. This library allows us to use the re.sub() function to remove the data inside brackets as well as the brackets themselves.
The following line removes all space characters using replace(). If output to the terminal, the following displays.
Martin(Robert)Simpson |
Next, a regex pattern is defined to match the ( and ) characters as well as the characters within them. The match is replaced with a space character to separate the first name from the last, and the results are saved to option_1 and output to the terminal.
Martin Simpson |
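Removing every space first works for a two-part name, but it would also glue a multi-word last name together ('Alice (May) Van Jones' becomes 'Alice VanJones'). As a sketch of one alternative, the pattern below absorbs the whitespace in front of the bracket instead, so replace() is not needed. The name BRACKETED is made up for this example.

```python
import re

# \s*      optional whitespace before the bracket
# \(       a literal opening bracket
# [^()]*   any run of characters that are not brackets
# \)       a literal closing bracket
BRACKETED = re.compile(r'\s*\([^()]*\)')

result = BRACKETED.sub('', 'Martin (Robert) Simpson')
print(result)                                    # Martin Simpson
print(BRACKETED.sub('', 'Alice (May) Van Jones'))  # Alice Van Jones
```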
Option 2: Remove Text Keep Brackets
Another option is to remove the data inside the brackets () and leave the brackets as is.
import re

employee = 'Martin (Robert) Simpson'
# A raw string (r'...') keeps the backslashes intact for the regex engine
option_2 = re.sub(r'\(.*?\)', '()', employee)
print(option_2)
Martin () Simpson |
Can you spot the difference between Option 1 and Option 2?
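One difference is the pattern itself: Option 1 uses [^()]* while Option 2 uses .*? between the brackets. On the sample name both behave the same, so the sketch below uses two made-up inputs (a name with two bracket groups and one with nested brackets) to show where they diverge. A greedy .* is included for comparison.

```python
import re

text = 'Martin (Robert) Simpson (Jr.)'

greedy = re.sub(r'\(.*\)', '()', text)       # .* runs to the LAST ')'
lazy = re.sub(r'\(.*?\)', '()', text)        # .*? stops at the FIRST ')'
negated = re.sub(r'\([^()]*\)', '()', text)  # [^()]* cannot cross a bracket

print(greedy)   # Martin ()
print(lazy)     # Martin () Simpson ()
print(negated)  # Martin () Simpson ()

nested = 'Martin (Rob (Bob)) Simpson'
lazy_nested = re.sub(r'\(.*?\)', '()', nested)
negated_nested = re.sub(r'\([^()]*\)', '()', nested)
print(lazy_nested)     # Martin ()) Simpson
print(negated_nested)  # Martin (Rob ()) Simpson
```

Neither pattern fully handles nested brackets, so data like that would need extra handling.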
Bonus: Update a DataFrame Column
What if Rivers Clothing would like the names formatted as follows: Martin (R.) Simpson?
import pandas as pd

df = pd.read_csv('emps.csv')

for i in range(len(df)):
    name = df.at[i, 'emp_name']
    strt_pos = name.find('(')
    stop_pos = name.find(')')
    # Keep the brackets and only the first letter inside them, followed by a period
    df.at[i, 'emp_name'] = f"{name[:strt_pos + 1]}{name[strt_pos + 1]}.{name[stop_pos:]}"

print(df)
 | emp_name |
0 | Martin (R.) Simpson |
1 | Howie (G.) Smith |
2 | Alice (M.) Jones |
3 | Micah (R.) Hamilton |
4 | Joey (J.) Howard |
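The same formatting can probably be done with re.sub() and a replacement function. This is only a sketch: the helper name abbreviate_bracketed() is hypothetical, and it assumes the bracketed text starts with the letter you want to keep.

```python
import re

def abbreviate_bracketed(name: str) -> str:
    """Hypothetical helper: shorten '(Robert)' to '(R.)'."""
    # Group 1 captures the text inside the brackets (at least one character)
    return re.sub(r'\(([^()]+)\)', lambda m: f'({m.group(1)[0]}.)', name)

names = ['Martin (Robert) Simpson', 'Howie (George) Smith', 'Joey Howard']
for n in names:
    print(abbreviate_bracketed(n))
# Martin (R.) Simpson
# Howie (G.) Smith
# Joey Howard
```

A name without brackets, like the last one, is returned unchanged.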
Summary
This article has provided three (3) methods, plus a bonus example, for removing data between brackets so you can select the best fit for your coding requirements.
Good Luck & Happy Coding!