Problem Formulation and Solution Overview
Watts Security has contacted you for assistance. They have been given a flat file containing breached user account records. Upon review, they notice that every field except the last one ends with a newline character (\n). You have been asked to write a script to clean the data.
Watts has provided you with one (1) fictitious row of the data file to work with.
['592-073-402\n','MableB\n','shei5MeQu\n','9210\n','mableb@acme.org']
Question: How would we remove the newline character from List elements?
We can accomplish this task by one of the following options:
- Method 1: Use List Comprehension and strip()
- Method 2: Use List Comprehension and slicing
- Method 3: Use List Comprehension and replace()
- Method 4: Use a Lambda and map()
- Bonus: Put the Script to Work
Preparation
Add the following code to the top of each code snippet. The Bonus code uses the pickle module, so this import allows it to run error-free.
import pickle
Method 1: Use List Comprehension and strip()
List Comprehension and strip() is an efficient way to remove leading and trailing whitespace, such as the newline character, from each List element.
rec = ['592-073-402\n','MableB\n','shei5MeeQu\n','9210\n','mableb@acme.org']
rec = [r.strip() for r in rec]
print(rec)
This code loops through each List element, strips the newline character, and saves the result back to rec.
Output
['592-073-402', 'MableB', 'shei5MeeQu', '9210', 'mableb@acme.org']
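Called with no argument, strip() removes all leading and trailing whitespace (spaces and tabs as well as newlines). If a field could legitimately begin or end with spaces, a narrower option is rstrip('\n'), which removes only trailing newline characters. A minimal comparison, using a hypothetical padded value:

```python
# Hypothetical field with a leading space and a trailing newline
field = ' MableB\n'

# strip() removes all leading and trailing whitespace (spaces, tabs, newlines)
print(repr(field.strip()))        # 'MableB'

# rstrip('\n') removes only trailing newline characters
print(repr(field.rstrip('\n')))   # ' MableB'

# The same narrower cleanup applied to the sample record
rec = ['592-073-402\n', 'MableB\n', 'shei5MeeQu\n', '9210\n', 'mableb@acme.org']
rec = [r.rstrip('\n') for r in rec]
print(rec)
```

For the sample record both approaches give the same result. The difference only matters when the data contains other surrounding whitespace.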
Method 2: Use List Comprehension and Slicing
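List comprehension and slicing are a compact combination for removing a trailing character, such as the newline character, from list elements. The slice r[:-1] drops the last character unconditionally, so the code only slices elements that actually end with \n. Otherwise, the last field (the email address) would lose its final letter and become 'mableb@acme.or'.
rec = ['592-073-402\n','MableB\n','shei5MeeQu\n','9210\n','mableb@acme.org']
# Slice off the last character only when it is a newline
rec = [r[:-1] if r.endswith('\n') else r for r in rec]
print(rec)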
The Finxter Academy’s favorite method!
This code loops through each List element, slices off the trailing newline character where one exists, and saves the result back to rec.
Output
['592-073-402', 'MableB', 'shei5MeeQu', '9210', 'mableb@acme.org']
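Slicing with [:-1] removes the last character whether or not it is a newline. On Python 3.9 or later, str.removesuffix() performs the "remove the trailing newline only if it is there" check without writing the condition by hand. A sketch, assuming Python 3.9+:

```python
# Requires Python 3.9+ (str.removesuffix was added in 3.9)
rec = ['592-073-402\n', 'MableB\n', 'shei5MeeQu\n', '9210\n', 'mableb@acme.org']

# removesuffix('\n') removes exactly one trailing '\n' when present;
# elements without it (the email address) are returned unchanged
rec = [r.removesuffix('\n') for r in rec]
print(rec)
```

Note that removesuffix() removes a single occurrence of the suffix, whereas rstrip('\n') removes every trailing newline.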
Method 3: Use List Comprehension and replace()
List Comprehension and replace() is another way to remove a specific character, such as the newline character, from each List element.
rec = ['592-073-402\n','MableB\n','shei5MeeQu\n','9210\n','mableb@acme.org']
rec = [r.replace("\n", "") for r in rec]
print(rec)
This code loops through each List element, replaces each newline character with an empty string, and saves the result back to rec.
Output
['592-073-402', 'MableB', 'shei5MeeQu', '9210', 'mableb@acme.org']
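Unlike strip() and slicing, replace() removes every occurrence of the newline, including any inside the string, not only a trailing one. Whether that is desirable depends on the data. A small comparison using a hypothetical multi-line value:

```python
# Hypothetical field containing an embedded newline as well as a trailing one
note = 'line one\nline two\n'

# replace() removes every '\n' in the string
print(repr(note.replace('\n', '')))      # 'line oneline two'

# rstrip('\n') removes only the trailing newline and keeps the embedded one
print(repr(note.rstrip('\n')))           # 'line one\nline two'

# replace() also accepts an optional count that limits how many occurrences change
print(repr(note.replace('\n', ' ', 1)))  # 'line one line two\n'
```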
Method 4: Use Lambda and map()
The map() function applies a Lambda to each List element, and the Lambda calls strip() to remove the newline character. map() returns a lazy map object, which list() then converts to a List.
rec = ['592-073-402\n','MableB\n','shei5MeeQu\n','9210\n','mableb@acme.org']
rec = list(map(lambda x: x.strip(), rec))
print(rec)
This code applies the Lambda to each List element and saves the resulting List back to rec.
Output
['592-073-402', 'MableB', 'shei5MeeQu', '9210', 'mableb@acme.org']
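Because str.strip can be called as an ordinary function that takes the string as its argument, the Lambda is optional here: map() can apply str.strip directly. A minimal sketch:

```python
rec = ['592-073-402\n', 'MableB\n', 'shei5MeeQu\n', '9210\n', 'mableb@acme.org']

# str.strip(x) is equivalent to x.strip(), so no Lambda is needed
rec = list(map(str.strip, rec))
print(rec)
```

Both forms produce the same List. Passing the method directly avoids creating an extra function object and reads slightly more clearly.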
Putting the Script to Work
After testing the above methods, you decide Method 3 is the best solution for this situation. But you have only verified it works on a single List! Watts Security needs to run this script against thousands of records!
Let’s create our own sample text file: users.txt.
Note: To follow along, create a flat file containing the data below and place it in the current working directory.
File Contents
592-07-4024\n,rionterly1991\n,shei5MeQu\n,9210\n,mableb@acme.ca
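In this file, the \n after each field is not a newline character. It is two ordinary characters, a backslash followed by the letter n, which is why the Bonus code searches for "\\n" instead of "\n". In Python source, '\\n' is the two-character sequence and '\n' is a single newline. A quick check, using a hypothetical field copied from the file:

```python
literal = '\\n'   # backslash followed by the letter n: two characters
newline = '\n'    # a single newline character

print(len(literal), len(newline))  # 2 1
print(literal == newline)          # False

# A field as it appears in users.txt
field = 'rionterly1991\\n'
print(field.replace('\n', ''))     # unchanged, still ends with a backslash and n
print(field.replace('\\n', ''))    # rionterly1991
```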
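fixed = []
with open('users.txt') as fp:
    for line in fp:
        # Drop the real line ending, then split the line into fields
        rec = line.rstrip('\n').split(',')
        # Remove the literal two-character sequence backslash + n from each field
        rec = [r.replace("\\n", "") for r in rec]
        fixed.append(rec)
print(fixed)
The code reads users.txt one line at a time and performs the following:
- Removes the real line ending with rstrip('\n'), splits line on the field separator (,), and saves the fields to rec. Without the rstrip('\n') call, the last field of every record would keep a trailing newline (for example, 'mableb@acme.ca\n').
- Uses List Comprehension and replace() to loop through each field, remove the \n sequence, and save the output back to rec. In users.txt, each \n is typed as two ordinary characters (a backslash followed by n), so the search string is written "\\n" rather than "\n".
- Appends the cleaned record to fixed.
For testing purposes, the output is sent to the terminal.
[['592-07-4024', 'rionterly1991', 'shei5MeQu', '9210', 'mableb@acme.ca'], ['283-82-2139', 'chends1964', 'Ui4ohgae', '3989', 'stanleyd@acme.ca'], ...]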
Let’s save the updated data to a pickle file.
with open('fixed.pickle', 'wb') as fp:
    pickle.dump(fixed, fp)
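To confirm that the pickle file holds the cleaned records, it can be loaded back with pickle.load(). The sketch below is self-contained: it writes a small sample users.txt into a temporary directory, cleans it, saves fixed.pickle, and reads it back. The raw layout of the second line is an assumption, built from the second record in the output above in the same format as the first line.

```python
import os
import pickle
import tempfile

# Sample data in the users.txt format: a literal backslash-n (two characters)
# after each of the first four fields, and a real newline ending each line.
# The second line's raw form is assumed for illustration.
sample = (
    '592-07-4024\\n,rionterly1991\\n,shei5MeQu\\n,9210\\n,mableb@acme.ca\n'
    '283-82-2139\\n,chends1964\\n,Ui4ohgae\\n,3989\\n,stanleyd@acme.ca\n'
)

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'users.txt')
    dst = os.path.join(tmp, 'fixed.pickle')

    with open(src, 'w') as fp:
        fp.write(sample)

    # Clean each line: drop the line ending, split on commas,
    # then remove the literal backslash-n sequences
    fixed = []
    with open(src) as fp:
        for line in fp:
            rec = line.rstrip('\n').split(',')
            fixed.append([r.replace('\\n', '') for r in rec])

    with open(dst, 'wb') as fp:
        pickle.dump(fixed, fp)

    # Read the pickle back to confirm the round trip
    with open(dst, 'rb') as fp:
        loaded = pickle.load(fp)

print(loaded == fixed)  # True
print(loaded[0])
```

Using a temporary directory keeps the example from overwriting a real users.txt or fixed.pickle in the current working directory.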
Summary
These four (4) methods of removing the newline character from List elements should give you enough information to select the best one for your coding requirements.
Good Luck & Happy Coding!