Python | Split String and Remove newline

Summary: The simplest way to split a string and remove the newline characters is to use a list comprehension with an if condition that drops the newline-only strings and strips the newlines from the remaining items.

Minimal Example

text = '\n-hello\n-Finxter'
words = text.split('-')

# Method 1
res = [x.strip('\n') for x in words if x!='\n']
print(res)

# Method 2
li = list(map(str.strip, words))
res = list(filter(bool, li))
print(res)

# Method 3
import re
words = re.findall(r'[^-\s]+', text)  # raw string so \s reaches the regex engine intact
print(words)

# ['hello', 'Finxter']

Problem Formulation

Problem: Say you use the split function to split a string on all occurrences of a certain pattern. If the pattern appears next to a newline character at the beginning, in the middle, or at the end of the string, the resulting list contains newline-only strings and items with newlines attached, in addition to the required substrings. How do you get rid of these newline characters automatically?

Example

text = '\n\tabc\n\txyz\n\tlmn\n'
words = text.split('\t')

# ['\n', 'abc\n', 'xyz\n', 'lmn\n']

Note the newline-only string and the trailing newline characters in the resulting list.

Expected Output:

['abc', 'xyz', 'lmn']

Method 1: Use a List Comprehension

The most direct solution is a list comprehension with a condition, such as [x.strip('\n') for x in words if x!='\n']. The strip function in the expression removes the newline characters attached to each item, while the if condition drops any item that consists of a single newline character only.

Code:

text = '\n\tabc\n\txyz\n\tlmn\n'
words = text.split('\t')
res = [x.strip('\n') for x in words if x!='\n']
print(res)

# ['abc', 'xyz', 'lmn']
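
The condition x!='\n' only drops items that are exactly one newline character. The following is a minimal sketch, assuming the input may also produce empty strings or doubled newlines (the sample string is made up for illustration and is not from the example above). Stripping first and then filtering on the stripped value covers those cases as well.

```python
# Hypothetical input: starts with the separator and contains a doubled newline.
text = '\tabc\n\n\txyz\n\t\n'
words = text.split('\t')
# words is ['', 'abc\n\n', 'xyz\n', '\n']

# Condition from Method 1: the leading empty string survives.
basic = [x.strip('\n') for x in words if x != '\n']

# Strip first, then keep only the items that still have content.
robust = [s for s in (x.strip('\n') for x in words) if s]

print(basic)   # ['', 'abc', 'xyz']
print(robust)  # ['abc', 'xyz']
```

For input shaped like the article's example, both versions return the same list.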

Method 2: Use map() and filter()

Prerequisite

  • The map() function builds a new iterable by applying a transformation function to the i-th elements of one or more iterables. Its arguments are the function object and the iterables. If you pass n iterables, the function must accept n arguments. The return value is a lazy map object (an iterator) that yields the transformed elements.
  • Python’s built-in filter() function keeps only the elements that pass a filtering condition. It takes two arguments: function and iterable. The function is called on each element, and its result is interpreted as a Boolean that decides whether the element passes. filter() returns an iterator over the elements that pass.
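
As a quick illustration of the two built-ins described above, here is a small sketch with made-up values (the variable names are only for this example):

```python
# map() with one iterable: apply str.upper to each element.
upper = list(map(str.upper, ['a', 'b']))

# map() with two iterables: the function receives one element from each.
sums = list(map(lambda x, y: x + y, [1, 2], [10, 20]))

# filter() keeps the elements for which the function returns a truthy value.
evens = list(filter(lambda n: n % 2 == 0, range(6)))

print(upper)  # ['A', 'B']
print(sums)   # [11, 22]
print(evens)  # [0, 2, 4]
```

Both functions return lazy iterators, which is why the results are wrapped in list() before printing.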

🌎Related Read:
(i) Python map()

(ii) Python filter()

Approach: An alternative solution is to first use map(str.strip, words) to strip the whitespace attached to each item of the split list, which turns newline-only items into empty strings, and then use filter(bool, li) to drop every empty string '' along with any other element that evaluates to False, such as None. Note that str.strip without arguments removes all leading and trailing whitespace, not just newline characters.

text = '\n\tabc\n\txyz\n\tlmn\n'
words = text.split('\t')
li = list(map(str.strip, words))
res = list(filter(bool, li))
print(res)

# ['abc', 'xyz', 'lmn']
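
Because str.strip with no arguments removes every kind of leading and trailing whitespace, it can change items that contain meaningful spaces. The sketch below uses a made-up input with padded spaces to show the difference. The lambda variant is one possible way to strip newlines only, not the only one.

```python
# Hypothetical input: the first item is padded with spaces.
text = '\n\t abc \n\txyz\n'
words = text.split('\t')
# words is ['\n', ' abc ', ...] with newlines still attached: ['\n', ' abc \n', 'xyz\n']

# str.strip with no arguments removes spaces as well as newlines.
all_ws = list(filter(bool, map(str.strip, words)))

# Stripping only '\n' keeps the padding spaces.
only_nl = list(filter(bool, map(lambda s: s.strip('\n'), words)))

print(all_ws)   # ['abc', 'xyz']
print(only_nl)  # [' abc ', 'xyz']
```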

Method 3: Use re.findall() Instead

A concise alternative is to use re.findall(pattern, string) with the inverse of the pattern used for splitting. If pattern A is the split pattern, a pattern that matches everything except A can be passed to re.findall() to retrieve the split list directly.

Here’s an example that uses the negated character class [^\s]+ to find every run of non-whitespace characters. Since \s covers both the tab used as the split pattern and the newline characters, no cleanup step is needed:

import re

text = '\n\tabc\n\txyz\n\tlmn\n'
words = re.findall(r'[^\s]+', text)  # raw string so \s reaches the regex engine intact
print(words)

# ['abc', 'xyz', 'lmn']

Note:

The re.findall(pattern, string) function scans string from left to right, searching for all non-overlapping matches of the pattern. It returns a list of the matched strings in the order in which they were found.
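
One caveat with the inverse-pattern idea: [^\s]+ treats every whitespace character as a separator, including spaces inside an item. The sketch below uses a made-up string to compare it with a narrower class that excludes only tabs and newlines. Which one fits depends on the data.

```python
import re

# Hypothetical input: the first item contains a space.
text = '\n\tfoo bar\n\tbaz\n'

# Any whitespace separates matches, so 'foo bar' is split in two.
any_ws = re.findall(r'[^\s]+', text)

# Only tabs and newlines separate matches, so the inner space is kept.
tab_nl = re.findall(r'[^\t\n]+', text)

print(any_ws)  # ['foo', 'bar', 'baz']
print(tab_nl)  # ['foo bar', 'baz']
```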

🌎Related Read: Python re.findall() – Everything You Need to Know

Exercise: Split String and Remove Empty Strings

Problem: Say you split a string with the split method on all occurrences of a given pattern, and the pattern appears at the beginning and at the end of the string. How do you get rid of the resulting empty strings automatically?

s = '_hello_world_'
words = s.split('_')
print(words)

# ['', 'hello', 'world', '']

Note the empty strings in the resulting list.

Expected Output:

['hello', 'world']

💡 Hint: Python Regex Split Without Empty String

Solution:

import re

s = '_hello_world_'
words = s.split('_')

# Method 1: Using List Comprehension
print([x for x in words if x!=''])

# Method 2: Using filter
print(list(filter(bool, words)))

# Method 3: Using re.findall
print(re.findall(r'[^_\s]+', s))

Conclusion

This brings us to the end of the tutorial. We have learned three ways to eliminate newline characters and empty strings from a split list in Python: a list comprehension, map() combined with filter(), and re.findall() with an inverse pattern. I hope it helped you and answered your questions.