Summary: The simplest way to split a string and remove the newline characters is to use a list comprehension with an if condition that eliminates the newline strings.
text = '\n-hello\n-Finxter'
words = text.split('-')

# Method 1: list comprehension
res = [x.strip('\n') for x in words if x != '\n']
print(res)

# Method 2: map() and filter()
li = list(map(str.strip, words))
res = list(filter(bool, li))
print(res)

# Method 3: re.findall() (raw string so that \s reaches the regex engine unchanged)
import re
words = re.findall(r'([^-\s]+)', text)
print(words)
# ['hello', 'Finxter']
Problem: Say you use the split function to split a string on all occurrences of a certain pattern. If the pattern appears at the beginning, in the middle, or at the end of the string next to a newline character, the resulting list will contain newline characters, either as standalone items or attached to the required substrings. How can you get rid of these newline characters automatically?
text = '\n\tabc\n\txyz\n\tlmn\n'
words = text.split('\t')
# ['\n', 'abc\n', 'xyz\n', 'lmn\n']
Note the newline characters in the resulting list: one standalone '\n' item and a trailing '\n' on every other item. The expected output is:
['abc', 'xyz', 'lmn']
Method 1: Use a List Comprehension
The most straightforward solution to this problem is to clean up the resulting list using a list comprehension with a condition, such as [x.strip('\n') for x in words if x != '\n']. To be specific, the strip('\n') call removes the newline characters attached to each item, while the if condition eliminates any item that consists of nothing but a single newline character.
text = '\n\tabc\n\txyz\n\tlmn\n'
words = text.split('\t')
res = [x.strip('\n') for x in words if x != '\n']
print(res)
# ['abc', 'xyz', 'lmn']
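One edge case is worth noting: the condition x != '\n' only drops items that are exactly one newline character, so an item such as '\n\n' or '' would survive as an empty string. As a hedged variation (the helper name split_clean is hypothetical and not part of the original solution), the sketch below strips first and then filters on the stripped result:

```python
def split_clean(text, sep):
    """Split text on sep, strip newlines from each piece, and drop empty pieces."""
    # Strip first, then keep only the pieces that are still non-empty.
    stripped = (piece.strip('\n') for piece in text.split(sep))
    return [piece for piece in stripped if piece]


print(split_clean('\n\tabc\n\txyz\n\tlmn\n', '\t'))
# ['abc', 'xyz', 'lmn']

# Doubled newlines and adjacent separators are handled as well.
print(split_clean('\n\n\tabc\n\t\txyz', '\t'))
# ['abc', 'xyz']
```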
Method 2: Use map() and filter()
- Python’s built-in map() function transforms one or more iterables into a new one by applying a “transformator function” to the i-th elements of each iterable. The arguments are the transformator function object and one or more iterables. If you pass n iterables as arguments, the transformator function must be an n-ary function taking n input arguments. The return value is an iterable map object of transformed, and possibly aggregated, elements.
- Python’s built-in filter() function is used to keep only the elements that pass a filtering condition. It takes two arguments: function and iterable. The function assigns a Boolean value to each element in the iterable to check whether the element will pass the filter or not. filter() returns an iterator with the elements that pass the filtering condition.
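To make the two built-ins concrete, here is a minimal sketch on a list of numbers. The variable names and lambda functions are illustrative only:

```python
numbers = [1, 2, 3, 4, 5, 6]

# map() applies the function to every element and returns a lazy map object.
squares = list(map(lambda n: n * n, numbers))
print(squares)  # [1, 4, 9, 16, 25, 36]

# With two iterables, the function must accept two arguments.
sums = list(map(lambda a, b: a + b, [1, 2, 3], [10, 20, 30]))
print(sums)  # [11, 22, 33]

# filter() keeps only the elements for which the function returns a truthy value.
evens = list(filter(lambda n: n % 2 == 0, numbers))
print(evens)  # [2, 4, 6]
```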
Approach: An alternative solution is to first use map() with str.strip to get rid of the newline characters attached to each item of the split list, and then apply filter() to the stripped list, as in filter(bool, li), to filter out any empty string '' and other elements that evaluate to False.
text = '\n\tabc\n\txyz\n\tlmn\n'
words = text.split('\t')
li = list(map(str.strip, words))
res = list(filter(bool, li))
print(res)
# ['abc', 'xyz', 'lmn']
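The two steps can also be chained without the intermediate list. This is only a sketch of an equivalent form: it passes None as the first argument to filter(), which keeps truthy items just as bool does. Keep in mind that str.strip called without arguments removes all leading and trailing whitespace (spaces and tabs included), not only newlines.

```python
text = '\n\tabc\n\txyz\n\tlmn\n'
words = text.split('\t')

# map() is lazy, so filter() consumes the stripped items one by one.
res = list(filter(None, map(str.strip, words)))
print(res)  # ['abc', 'xyz', 'lmn']

# str.strip removes every kind of surrounding whitespace, not only '\n'.
mixed = list(filter(None, map(str.strip, ['  a \n', '\t', 'b'])))
print(mixed)  # ['a', 'b']
```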
Method 3: Use re.findall() Instead
A simple and Pythonic solution is to use re.findall(pattern, string) with the inverse of the pattern used for splitting the string. If pattern A is used as the split pattern, everything that does not match pattern A can be passed to re.findall() to retrieve the split list directly.
Here’s an example that uses a negated character class [^\s]+ to find all runs of non-whitespace characters, which excludes both the tab separator and the newline characters:
import re

text = '\n\tabc\n\txyz\n\tlmn\n'
# Raw string so that \s reaches the regex engine unchanged
words = re.findall(r'([^\s]+)', text)
print(words)
# ['abc', 'xyz', 'lmn']
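The re.findall(pattern, string) function scans string from left to right, searching for all non-overlapping matches of the pattern. It returns a list of strings in the order in which the matches were found. If the pattern contains exactly one capturing group, each list item is the text matched by that group; in the example above, the group spans the whole match.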
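The following sketch illustrates this behavior. The patterns are chosen for illustration only; \S is the standard shorthand for [^\s].

```python
import re

text = '\n\tabc\n\txyz\n\tlmn\n'

# No group: each list item is the full match.
full_matches = re.findall(r'\S+', text)
print(full_matches)  # ['abc', 'xyz', 'lmn']

# One capturing group: each list item is the group's content.
first_chars = re.findall(r'(\S)\S*', text)
print(first_chars)  # ['a', 'x', 'l']

# Matches do not overlap: 'aaaa' contains 'aa' twice, not three times.
pairs = re.findall(r'aa', 'aaaa')
print(pairs)  # ['aa', 'aa']
```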
Exercise: Split String and Remove Empty Strings
Problem: Say you have a string that has been split with the split method on all occurrences of a given pattern. The pattern appears at the beginning and at the end of the string. How can you get rid of the empty strings automatically?
s = '_hello_world_'
words = s.split('_')
print(words)
# ['', 'hello', 'world', '']
Note the empty strings in the resulting list.
import re

s = '_hello_world_'
words = s.split('_')

# Method 1: Using List Comprehension
print([x for x in words if x != ''])

# Method 2: Using filter
print(list(filter(bool, words)))

# Method 3: Using re.findall (raw string so that \s reaches the regex engine unchanged)
print(re.findall(r'([^_\s]+)', s))
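The three methods agree on this input, but they are not interchangeable in general. As a hedged illustration with a made-up input string, the sketch below shows that the \s in the character class makes the regex treat inner whitespace as a separator as well:

```python
import re

s = '_hello world_foo_'

# split + filter keeps the inner space because only '_' is a separator.
by_split = [x for x in s.split('_') if x != '']
print(by_split)  # ['hello world', 'foo']

# [^_\s]+ treats whitespace as a separator too, so 'hello world' is cut in two.
by_regex = re.findall(r'[^_\s]+', s)
print(by_regex)  # ['hello', 'world', 'foo']

# Dropping \s from the character class restores agreement with split().
by_regex_strict = re.findall(r'[^_]+', s)
print(by_regex_strict)  # ['hello world', 'foo']
```

If the substrings may contain spaces, the split-based methods or the stricter pattern are the safer choice.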
This brings us to the end of the tutorial. We have learned how to eliminate newline characters and empty strings from a split list in Python.