How To Eliminate All The Whitespace From A String?

In this article, you’ll learn the ultimate answer to the following question:

How To Eliminate All The Whitespace From A String—on Both Ends, and In-Between Words?

Summary: Use the string methods join(), split(), strip(), rstrip(), lstrip() and/or replace(), in specific combinations, to remove any whitespace in a given string. The simplest way to remove all whitespace in a string is to use the split() method to create a list of non-whitespace words, and then join the words in that list back together.

The Official Python Website offers a brief explanation of these and other string methods for further reference.

Note: All the solutions provided below have been verified using Python 3.8.5.

Problem

Given the following string variable:

sentence = '\t\t hello    world \n'

Desired Output

Manipulate it to provide the following output:

>>> 'helloworld'

Background

In Python 3, strings are immutable sequences of Unicode code points. Alongside basic data types such as Booleans, integers and floats, strings are one of the most important data types in the Python programming language. Python provides a plethora of helper methods such as join(), split(), strip(), rstrip(), lstrip() and replace() to manipulate string objects. These string methods are explored below to solve the problem described above.

Method 1: string.split() and string.join()

A concise one-liner removes all whitespace from a string. Start with the string from the problem statement:

sentence = '\t\t hello    world \n'

Note: If you copy and paste the code above and get a syntax error in Python, it is likely because the ' (i.e. tick) character was replaced with a typographic (curly) quote along the way. Make sure that the plain ' (tick) or " (quote) character is used.

To remove all whitespace characters, use str.split() and str.join() as follows:

''.join(sentence.split())

The code as run in a Python shell looks as follows:

>>> sentence = '\t\t hello    world \n'
>>> ''.join(sentence.split())
'helloworld'

What is going on here?

By default, the str.split() method, called without any arguments, treats consecutive runs of whitespace characters as a single separator. If the string has leading or trailing whitespace, the split is done such that there are no empty strings at the start or end of the resulting list. So the following happens when one calls str.split() on the string variable sentence with the default arguments (i.e. None or nothing).

>>> sentence.split()
['hello', 'world']
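
The difference between the no-argument form and an explicit separator can be seen in a short sketch. It reuses the same sentence string; the variable names default_parts and explicit_parts are made up for illustration only.

```python
# Compare split() with no arguments against split(' ').
sentence = '\t\t hello    world \n'

# No arguments: runs of any whitespace act as one separator,
# and leading/trailing whitespace produces no empty strings.
default_parts = sentence.split()

# Explicit single space: only ' ' is a separator, so the tabs and the
# newline survive, and adjacent spaces produce empty strings.
explicit_parts = sentence.split(' ')

print(default_parts)   # ['hello', 'world']
print(explicit_parts)  # ['\t\t', 'hello', '', '', '', 'world', '\n']
```

This is why the one-liner relies on split() without arguments: joining explicit_parts with '' would leave the tabs and the newline in the result.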

Note how all the whitespace characters around the words hello and world were eliminated. Note also that the words were put into a list. This list is now handed to the str.join(iterable) method, which concatenates all the strings in iterable and returns a new string object. The string object that join is called on (e.g. the variable s2 or '', below) is used as the separator between the strings in the list.

Consider the following code snippet to tie everything together.

>>> sentence                  # This is the original string.
'\t\t hello    world \n'
>>> s1 = sentence.split()     # s1 is the list returned by the split method
>>> s1
['hello', 'world']
>>> s2 = ''                   # s2 is the separator (here, an empty string)
>>> s2
''
>>> s3 = s2.join(s1)          # s3 is the result of joining the elements of
>>> s3                        # the list s1, using the string s2 as a separator
'helloworld'
>>> 

Next, let’s see if this solution works on a bigger and more elaborate string:

>>> sentence = '''
... ## This is a curious case. Since the step is a -ve number all the indexing
... ## is done from the right side of the list. The start index is beyond the
... ## list, so the last letter '!' is included, the end index is a -ve number
... ## so the counting for the end index begins from the right side of the list.
... ## So the end of the list is the letter to the right of index -5 i.e. 'a'.
... ## Note that end index is excluded, so answer is '!ssa'
... '''
>>> 
>>> sentence
"\n## This is a curious case. Since the step is a -ve number all the indexing\n## is done from the right side of the list. The start index is beyond the\n## list, so the last letter '!' is included, the end index is a -ve number\n## so the counting for the end index begins from the right side of the list.\n## So the end of the list is the letter to the right of index -5 i.e. 'a'.\n## Note that end index is excluded, so answer is '!ssa'\n"
>>> 
>>> s2
''
>>> s3 = s2.join(sentence.split())
>>> 
>>> s3
"##Thisisacuriouscase.Sincethestepisa-venumberalltheindexing##isdonefromtherightsideofthelist.Thestartindexisbeyondthe##list,sothelastletter'!'isincluded,theendindexisa-venumber##sothecountingfortheendindexbeginsfromtherightsideofthelist.##Sotheendofthelististhelettertotherightofindex-5i.e.'a'.##Notethatendindexisexcluded,soansweris'!ssa'"
>>>

We see here again that the solution works perfectly on a longer string too. It got rid of all the whitespace in the string variable sentence. Note that sentence is a multi-line string, created using the '''...''' (i.e. triple-tick) notation.
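
For reuse, the one-liner can be wrapped in a small helper. This is only a minimal sketch; the function name remove_all_whitespace is an assumption made for this example and does not come from the Python standard library.

```python
def remove_all_whitespace(text):
    """Return text with every whitespace character removed."""
    # split() with no arguments discards all runs of whitespace;
    # ''.join() glues the remaining pieces together with nothing between them.
    return ''.join(text.split())


print(remove_all_whitespace('\t\t hello    world \n'))  # helloworld
print(repr(remove_all_whitespace(' \n\t ')))            # ''
```

A string that consists only of whitespace splits into an empty list, so the helper returns an empty string in that case.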

The following explanations show other, more tedious methods to remove whitespace. They are effective but not as practical as Method 1 for the specific problem at hand. The steps, however, are generic and may be applied elsewhere, for other substitutions.

Method 2: string.replace()

A more elaborate and tedious way to remove all whitespace in a string is to use the str.replace(old, new) method, as shown below.

The code as run in a Python shell looks as follows:

>>> sentence = '\t\t hello    world \n'
>>> sentence
'\t\t hello    world \n'
>>> 
>>> s1 = sentence.replace(' ', '')
>>> s1
'\t\thelloworld\n'
>>> s1.replace('\t', '')
'helloworld\n'
>>> s1
'\t\thelloworld\n'
>>> s2 = s1.replace('\t', '')
>>> s2
'helloworld\n'
>>> s3 = s2.replace('\n', '')
>>> s3
'helloworld'
>>>

What is going on here?

The str.replace(old, new) method replaces all occurrences of the substring old with the string new and returns a modified copy of the original string object. The original string itself is left unchanged. Let's see how this worked in the above code snippet.

In Method 2, the string variable sentence is shaped one step at a time to achieve the desired result. In the first step, the string " " (i.e. the space character) is eliminated by replacing it with "" (i.e. nothing). Note that the tab (i.e. \t) and the newline (i.e. \n) continue to exist in the string variable s1.

>>> sentence = '\t\t hello    world \n'
>>> sentence
'\t\t hello    world \n'
>>> 
>>> s1 = sentence.replace(' ', '')
>>> s1
'\t\thelloworld\n'

In the next step, the "\t" (i.e. the tab character) is eliminated by replacing it with "" (i.e. nothing, again). Note that the newline (i.e. \n) still exists in the string variable s2.

>>> s1
'\t\thelloworld\n'
>>> s2 = s1.replace('\t', '')
>>> s2
'helloworld\n'

In the last step, the "\n" (i.e. the newline character) is eliminated by replacing it with "" (i.e. nothing, yet again). This last step yields the desired result in the string variable s3.

>>> s2
'helloworld\n'
>>> s3 = s2.replace('\n', '')
>>> s3
'helloworld'
>>>

Next, let’s see if this solution works on a bigger and more elaborate string:

>>> sentence = '''
... ## This is a curious case. Since the step is a -ve number all the indexing
... ## is done from the right side of the list. The start index is beyond the
... ## list, so the last letter '!' is included, the end index is a -ve number
... ## so the counting for the end index begins from the right side of the list.
... ## So the end of the list is the letter to the right of index -5 i.e. 'a'.
... ## Note that end index is excluded, so answer is '!ssa'
... '''
>>> sentence
"\n## This is a curious case. Since the step is a -ve number all the indexing\n## is done from the right side of the list. The start index is beyond the\n## list, so the last letter '!' is included, the end index is a -ve number\n## so the counting for the end index begins from the right side of the list.\n## So the end of the list is the letter to the right of index -5 i.e. 'a'.\n## Note that end index is excluded, so answer is '!ssa'\n"
>>> 
>>> s1 = sentence.replace(' ', '')
>>> s1
"\n##Thisisacuriouscase.Sincethestepisa-venumberalltheindexing\n##isdonefromtherightsideofthelist.Thestartindexisbeyondthe\n##list,sothelastletter'!'isincluded,theendindexisa-venumber\n##sothecountingfortheendindexbeginsfromtherightsideofthelist.\n##Sotheendofthelististhelettertotherightofindex-5i.e.'a'.\n##Notethatendindexisexcluded,soansweris'!ssa'\n"
>>> s2 = s1.replace('\t', '')
>>> s2
"\n##Thisisacuriouscase.Sincethestepisa-venumberalltheindexing\n##isdonefromtherightsideofthelist.Thestartindexisbeyondthe\n##list,sothelastletter'!'isincluded,theendindexisa-venumber\n##sothecountingfortheendindexbeginsfromtherightsideofthelist.\n##Sotheendofthelististhelettertotherightofindex-5i.e.'a'.\n##Notethatendindexisexcluded,soansweris'!ssa'\n"
>>> s3 = s2.replace('\n', '')
>>> s3
"##Thisisacuriouscase.Sincethestepisa-venumberalltheindexing##isdonefromtherightsideofthelist.Thestartindexisbeyondthe##list,sothelastletter'!'isincluded,theendindexisa-venumber##sothecountingfortheendindexbeginsfromtherightsideofthelist.##Sotheendofthelististhelettertotherightofindex-5i.e.'a'.##Notethatendindexisexcluded,soansweris'!ssa'"
>>> 

We see here again that even though the solution is tedious compared to Method 1, it continues to work perfectly on a longer string too. It got rid of all the whitespace in the string variable sentence.
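
The chain of replace() calls can also be written as a loop. The sketch below is one possible way to do that, not the only one: it assumes the ASCII whitespace characters listed in string.whitespace are the ones to remove, and the function name remove_with_replace is invented for this example.

```python
import string


def remove_with_replace(text, chars=string.whitespace):
    """Remove each character in chars from text, one replace() call at a time."""
    for ch in chars:
        # Each call returns a new string; rebind text to keep the result.
        text = text.replace(ch, '')
    return text


print(remove_with_replace('\t\t hello    world \n'))  # helloworld
print(remove_with_replace('a\r\nb\x0bc'))             # abc
```

Unlike the three hand-written steps, the loop also covers the carriage return, vertical tab and form feed characters, because string.whitespace contains them.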

Method 3: replace(), lstrip(), and rstrip()

This final method is purely educational. It shows yet another elaborate and tedious way to remove all whitespace in a string, using the str.replace(old, new), str.lstrip([chars]) and str.rstrip([chars]) methods, as shown below.

The code as run in a Python shell looks as follows:

>>> sentence = '\t\t hello    world \n'
>>> sentence
'\t\t hello    world \n'
>>> 
>>> s1 = sentence.replace(" ", "")
>>> s1
'\t\thelloworld\n'
>>>
>>> s2 = s1.lstrip()
>>> s2
'helloworld\n'
>>>
>>> s3 = s2.rstrip()
>>> s3
'helloworld'
>>> 

What is going on here?

The str.lstrip([chars]) method returns a modified copy of the string object str with leading characters removed. The removed characters are specified in the set represented by the string chars. Whitespace is removed, by default, if chars is not specified or is None.

Similarly, the str.rstrip([chars]) method returns a modified copy of the string object str with trailing characters removed. The removed characters are specified in the set represented by the string chars. Whitespace is removed, by default, if chars is not specified or is None.
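
A brief sketch may help show that chars is treated as a set of characters rather than as a fixed prefix or suffix. The string s1 matches the intermediate value from the shell session above; the string padded is a made-up example.

```python
s1 = '\t\thelloworld\n'

# With no argument, leading (or trailing) whitespace is removed.
print(repr(s1.lstrip()))      # 'helloworld\n'
print(repr(s1.rstrip()))      # '\t\thelloworld'

# With an argument, only the listed characters are removed.
print(repr(s1.lstrip('\t')))  # 'helloworld\n'

# chars is a set: any mix of 'x' and 'y' is stripped, in any order.
padded = 'xxyhelloyx'
print(padded.lstrip('xy'))    # helloyx
print(padded.rstrip('xy'))    # xxyhello
```

Stripping stops at the first character that is not in the set, so characters in the middle of the string are never touched.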

In Method 3, the string variable sentence is shaped one step at a time to achieve the desired result (i.e. similar to Method 2). In the first step, the string " " (i.e. the space character) is eliminated by replacing it with "" (i.e. nothing). Note that the tab (i.e. \t) and the newline (i.e. \n) continue to exist in the string variable s1.

>>> sentence = '\t\t hello    world \n'
>>> sentence
'\t\t hello    world \n'
>>> 
>>> s1 = sentence.replace(" ", "")
>>> s1
'\t\thelloworld\n'

In the next step, the leading "\t" characters (i.e. the tabs) are eliminated by stripping whitespace from the start of the string (i.e. str.lstrip()). Note that the newline (i.e. \n) continues to exist in the string variable s2.

>>> s1
'\t\thelloworld\n'
>>>
>>> s2 = s1.lstrip()
>>> s2
'helloworld\n'

In the last step, the trailing "\n" (i.e. the newline character) is eliminated by stripping whitespace from the end of the string (i.e. str.rstrip()). This last step yields the desired result in the string variable s3.

>>> s2
'helloworld\n'
>>>
>>> s3 = s2.rstrip()
>>> s3
'helloworld'
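
As a hedged aside, the lstrip() and rstrip() steps can be combined with str.strip(), which removes whitespace from both ends. The sketch below also shows a limitation of Method 3: tabs or newlines between words are not at either end of the string, so stripping leaves them in place. The function name remove_spaces_then_strip is hypothetical.

```python
def remove_spaces_then_strip(text):
    """Method 3 in one expression: drop spaces, then strip both ends."""
    return text.replace(' ', '').strip()


print(remove_spaces_then_strip('\t\t hello    world \n'))  # helloworld

# An interior tab survives Method 3 ...
print(repr(remove_spaces_then_strip('hello \t world\n')))  # 'hello\tworld'

# ... while the split/join approach from Method 1 removes it.
print(repr(''.join('hello \t world\n'.split())))           # 'helloworld'
```

For the example string in this article both approaches give the same result, because its tabs and newline only appear at the ends.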

Finxter Academy

This blog was brought to you by Girish, a student of Finxter Academy. You can find his Upwork profile here.

Reference

All research for this blog article was done using the Python documentation and the shared knowledge base of the Stack Overflow and Finxter Academy communities.