How to Remove Text Within Parentheses in a Python String?

Problem Formulation and Solution Overview

This article will show you how to remove text within parentheses in Python.

To make it more interesting, we have the following running scenario:

Rivers Clothing has a CSV file listing all of its employees. Each name is currently formatted as first name (middle name) last name (for example, Martin (Robert) Simpson). However, the company would like the format changed.

Snippet of emps.csv

   emp_name
0  Martin (Robert) Simpson
1  Howie (George) Smith
2  Alice (May) Jones
3  Micah (Ray) Hamilton
4  Joey (Jon) Howard

Let’s give them two (2) options to pick from.


💬 Question: How would we write code to remove text within parentheses?

We can accomplish this task by one of the following options:


Method 1: Use find() and slicing

This example uses the find() method to locate the parentheses and slicing to remove the text between them. The find() method returns the index of the first occurrence of the given character, or -1 if it is not found.

For this example, let’s use the first employee’s name to test with.

Martin (Robert) Simpson

Option 1: Remove Text and Brackets

This option will format the name as: Martin Simpson.

employee = 'Martin (Robert) Simpson'.replace(' ', '')

strt_pos = employee.find('(')
stop_pos = employee.find(')')

option_1 = f'{employee[:strt_pos]} {employee[stop_pos+1:]}'
print(option_1)

The first line in the above code declares an employee’s name and chains the replace() method to remove all space characters from the string. This gives us the following output.

Martin(Robert)Simpson

The next two (2) lines locate the first occurrence of the ( and ) characters and save the positions to strt_pos and stop_pos, respectively. If output to the terminal, the following displays.

6
13

We can conclude that the ( character was found at position 6, and the ) character was found at position 13.

The next line removes the text inside the brackets (as well as the brackets themselves) and formats the result as first name and last name using an f-string and slicing. The result is saved to option_1 and output to the terminal.

Martin Simpson
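The steps above assume the name always contains both ( and ). Because find() returns -1 when a character is missing, a name without parentheses would be sliced incorrectly. A minimal sketch of a guarded version, using a hypothetical helper named remove_parenthesized(), that also keeps the original spacing instead of removing every space first:

```python
def remove_parenthesized(name):
    """Hypothetical helper: remove the first '(...)' group and its brackets."""
    strt_pos = name.find('(')
    stop_pos = name.find(')', strt_pos + 1)
    # find() returns -1 when a character is missing, so return the name unchanged
    if strt_pos == -1 or stop_pos == -1:
        return name
    # Trim the spaces around each slice, then rejoin with a single space
    return f'{name[:strt_pos].strip()} {name[stop_pos + 1:].strip()}'.strip()


print(remove_parenthesized('Martin (Robert) Simpson'))  # Martin Simpson
print(remove_parenthesized('Martin Simpson'))           # Martin Simpson
```

Since the spaces are only trimmed around the slices, multi-word first or last names (for example, Mary Ann (Lee) Van Dyke) keep their inner spaces.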

Let’s put this code to work and update the names from the emps.csv file shown earlier.

import pandas as pd

df = pd.read_csv('emps.csv')

for i in range(len(df)):
    name = df.loc[i, 'emp_name']
    strt_pos = name.find('(')
    stop_pos = name.find(')')
    # Keep the text before '(' and after ')', trimming the extra spaces
    df.loc[i, 'emp_name'] = f"{name[:strt_pos].strip()} {name[stop_pos+1:].strip()}"

print(df)
# df.to_csv('emps.csv', index=False)  # uncomment to save the changes back to the file

The first line in the above code imports the pandas library.

The following line reads the emps.csv file into the DataFrame df.

👉 Recommended Tutorial: How to Read a CSV File in Python?

A for loop then iterates through each row of the DataFrame column emp_name.

Inside the loop, we reuse the code written earlier with one change: instead of removing every space with replace(), we use strip() to trim the extra spaces around each slice before the first and last names are joined. The updated DataFrame displays below.

   emp_name
0  Martin Simpson
1  Howie Smith
2  Alice Jones
3  Micah Hamilton
4  Joey Howard
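The loop above processes one row at a time. As an alternative (a swapped-in technique, not the approach used in this method), pandas’ Series.str.replace() can apply a regular expression to the whole column at once. A minimal sketch, building the DataFrame inline from the emps.csv snippet so it runs without the file:

```python
import pandas as pd

# Sample data copied from the emps.csv snippet; in practice use pd.read_csv('emps.csv')
df = pd.DataFrame({'emp_name': ['Martin (Robert) Simpson', 'Howie (George) Smith',
                                'Alice (May) Jones', 'Micah (Ray) Hamilton',
                                'Joey (Jon) Howard']})

# Remove any whitespace before a '(...)' group, plus the group itself
df['emp_name'] = df['emp_name'].str.replace(r'\s*\([^()]*\)', '', regex=True)
print(df)
```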

💡 The above code snippets will need to be modified to meet your specific requirements.

Option 2: Remove Text Keep Brackets

This option formats the name as Martin () Simpson, which leaves room to add a middle name or initial later, for example:
Martin (R.) Simpson

employee = 'Martin (Robert) Simpson'.replace(' ', '')

strt_pos = employee.find('(')
stop_pos = employee.find(')')

option_2 = f'{employee[:strt_pos]} {employee[strt_pos]}{employee[stop_pos]} {employee[stop_pos+1:]}'
print(option_2)

This code works similarly to Option 1 above. However, the f-string also inserts the characters found at strt_pos and stop_pos (the ( and ) characters) between the first and last names.
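Removing all spaces first is not strictly required for this option. Assuming the name contains one ( followed later by a ), a minimal sketch that slices the original string directly, keeping everything up to and including ( and everything from ) onward:

```python
employee = 'Martin (Robert) Simpson'

strt_pos = employee.find('(')
stop_pos = employee.find(')')

# Keep the text through '(' and from ')' to the end, dropping the middle name
option_2 = employee[:strt_pos + 1] + employee[stop_pos:]
print(option_2)  # Martin () Simpson
```

Because the original spacing is preserved, first or last names that contain spaces are left intact.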

Martin () Simpson

Method 2: Use split()

This example uses the split() method to break the string into a list of words and then rebuilds the name without the text inside the brackets.

Option 1: Remove Text and Brackets

employee = 'Martin (Robert) Simpson'.split()

option_1 = f'{employee[0]} {employee[2]}'
print(option_1)

option_2 = f'{employee[0]} {employee[1][0]}{employee[1][-1]} {employee[2]}'
print(option_2)

The first line uses Python’s built-in split() method to break the string on a specified separator. If no argument is passed, the string is split on runs of whitespace. The results are saved as a list to employee.

['Martin', '(Robert)', 'Simpson']
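Calling split() with no argument behaves differently from split(' '). A short comparison, using a made-up string that contains a double space and a tab:

```python
messy = 'Martin  (Robert)\tSimpson'

# No argument: split on runs of any whitespace (spaces, tabs, newlines)
print(messy.split())     # ['Martin', '(Robert)', 'Simpson']

# Explicit ' ': split on every single space; tabs are not separators
print(messy.split(' '))  # ['Martin', '', '(Robert)\tSimpson']
```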

The next line builds option_1 from the first ([0]) and third ([2]) list elements, which drops the middle name along with its brackets. The result is output to the terminal.

Martin Simpson
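Indexing with [0] and [2] assumes every name has exactly three words. A minimal sketch that instead drops any word wrapped in brackets, using a hypothetical helper named drop_bracketed_words() and assuming the bracketed middle name is a single word (a multi-word middle name such as (Mary Ann) would not be caught):

```python
def drop_bracketed_words(name):
    """Hypothetical helper: remove words that start with '(' and end with ')'."""
    words = name.split()
    kept = [w for w in words if not (w.startswith('(') and w.endswith(')'))]
    return ' '.join(kept)


print(drop_bracketed_words('Martin (Robert) Simpson'))  # Martin Simpson
print(drop_bracketed_words('Mary Ann (Lee) Van Dyke'))  # Mary Ann Van Dyke
```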

Option 2: Remove Text Keep Brackets

The last two lines build and display option_2. It keeps the first ([0]) and last ([-1]) characters of the middle element, the ( and ), and drops the text between them.

Martin () Simpson

Method 3: Use re.sub()

This example uses the sub() function from Python’s re (regular expression) module to remove the data inside the brackets.

Option 1: Remove Text and Brackets

import re

employee = 'Martin (Robert) Simpson'.replace(' ', '')
option_1 = re.sub(r'\([^()]*\)', ' ', employee)
print(option_1)

The first line in the above code imports the re module, which provides the re.sub() function used to remove the data inside the brackets as well as the brackets themselves.

The next line declares the employee’s name and removes all space characters using replace(). If output to the terminal, the following displays.

Martin(Robert)Simpson

Next, re.sub() replaces each match of the pattern \([^()]*\) (an opening parenthesis, any run of characters other than parentheses, and a closing parenthesis) with a single space character. That space separates the first name from the last, and the result is saved to option_1 and output to the terminal.

Martin Simpson
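Removing every space up front and then adding one back works for this name, but it would also merge multi-word first or last names (Mary Ann would become MaryAnn). Assuming the bracketed group is preceded by a space, a minimal sketch that lets the pattern consume that leading whitespace instead:

```python
import re

employee = 'Martin (Robert) Simpson'

# \s* also matches the space before '(', so no replace() call is needed
option_1 = re.sub(r'\s*\([^()]*\)', '', employee)
print(option_1)  # Martin Simpson
```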

Option 2: Remove Text Keep Brackets

Another option is to remove the data inside the brackets () and leave the brackets as is.

import re

employee = 'Martin (Robert) Simpson'
option_2 = re.sub(r'\(.*?\)', '()', employee)
print(option_2)

Martin () Simpson

🤔 Can you spot the difference between Option 1 and Option 2?
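One difference lies in the patterns themselves. \([^()]*\) only matches a bracket pair with no other brackets inside it, while the non-greedy \(.*?\) runs from a ( to the nearest ). Both give the same result on the employee names, but they differ on nested brackets, as this small sketch with a made-up string shows:

```python
import re

text = 'a (b (c) d) e'

# Innermost pair only: '(c)' is removed, the outer brackets remain
print(re.sub(r'\([^()]*\)', '', text))  # a (b  d) e

# Non-greedy: matches '(b (c)', leaving an unmatched ')'
print(re.sub(r'\(.*?\)', '', text))     # a  d) e
```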

https://youtu.be/3MtrUf81k6c

Bonus: Update a DataFrame Column

What if Rivers Clothing would like the names formatted with a middle initial, as in Martin (R.) Simpson?

import pandas as pd

df = pd.read_csv('emps.csv')

for i in range(len(df)):
    name = df.loc[i, 'emp_name']
    strt_pos = name.find('(')
    stop_pos = name.find(')')
    # Keep everything through the first letter of the middle name, add '.', then keep ')' onward
    df.loc[i, 'emp_name'] = f"{name[:strt_pos+2]}.{name[stop_pos:]}"

print(df)

   emp_name
0  Martin (R.) Simpson
1  Howie (G.) Smith
2  Alice (M.) Jones
3  Micah (R.) Hamilton
4  Joey (J.) Howard
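The same result could also be produced without a loop. As a swapped-in alternative, a minimal sketch using Series.str.replace() with a capture group: (\w) captures the first character of the middle name, and \1 in the replacement inserts it back followed by a period. The DataFrame is built inline from the emps.csv snippet so the example runs without the file:

```python
import pandas as pd

# Sample data copied from the emps.csv snippet; in practice use pd.read_csv('emps.csv')
df = pd.DataFrame({'emp_name': ['Martin (Robert) Simpson', 'Howie (George) Smith',
                                'Alice (May) Jones', 'Micah (Ray) Hamilton',
                                'Joey (Jon) Howard']})

# Capture the first character inside '(...)' and rebuild the group as '(X.)'
df['emp_name'] = df['emp_name'].str.replace(r'\((\w)[^()]*\)', r'(\1.)', regex=True)
print(df)
```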

Summary

This article has provided three (3) methods, each with two options, plus a bonus DataFrame example for removing data between brackets, so you can select the best fit for your coding requirements.

Good Luck & Happy Coding!


Programming Humor – Python

“I wrote 20 short programs in Python yesterday. It was wonderful. Perl, I’m leaving you.” (xkcd)