How to Remove Control Characters from a String in Python?

Problem Formulation

Given a string s, create a new string based on s with all control characters, such as '\n' and '\t', removed.

What is a Control Character?

A control character, also called a non-printing character (NPC), is a character that doesn’t represent a written symbol. Examples are the newline character '\n' and the tab character '\t'. Loosely speaking, the characters that are not control characters are the printable characters.

In Unicode, control characters occupy the code points U+0000 to U+001F, U+007F, and U+0080 to U+009F.
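As a quick sanity check of these ranges, the following sketch scans every code point with Python's standard unicodedata module and collects those whose general category is 'Cc' (control). The helper name cc_code_points is made up for this illustration, and the result reflects the Unicode database bundled with your Python version.

```python
import sys
import unicodedata


def cc_code_points():
    """Return the sorted code points whose general category is 'Cc'."""
    return [cp for cp in range(sys.maxunicode + 1)
            if unicodedata.category(chr(cp)) == 'Cc']


control = cc_code_points()
print(len(control))                       # 65
print(hex(control[0]), hex(control[-1]))  # 0x0 0x9f
```

The 65 code points are the 32 in U+0000 to U+001F, plus U+007F, plus the 32 in U+0080 to U+009F.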

Solution Based on Unicode Category

The unicodedata module provides a function unicodedata.category(c) that returns the general category assigned to the character c as a string. The Unicode categories 'Cc' (control), 'Cf' (format), 'Cs' (surrogate), 'Co' (private use), and 'Cn' (unassigned) could all be seen as “control characters”, although you could argue that only 'Cc' is a true control character. In any case, you can customize our solution below based on your preferences.
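One way to make that choice explicit is to pass the categories to remove as a parameter. The sketch below is only an illustration: the function name remove_categories and its default argument are assumptions, not part of the unicodedata module.

```python
import unicodedata


def remove_categories(s, categories=('Cc',)):
    """Drop every character of s whose Unicode category is in categories."""
    return ''.join(c for c in s if unicodedata.category(c) not in categories)


# '\u200b' is ZERO WIDTH SPACE, which belongs to category 'Cf' (format).
sample = 'a\tb\u200bc'

print(unicodedata.category('\t'))      # Cc
print(unicodedata.category('\u200b'))  # Cf

only_cc = remove_categories(sample)                  # strips the tab only
cc_and_cf = remove_categories(sample, ('Cc', 'Cf'))  # strips both characters
print(cc_and_cf)                                     # abc
```

With the strict default, the invisible zero width space survives. Adding 'Cf' to the tuple removes it as well.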

If you take the broad view and treat every category starting with 'C' as a control character, the Python one-liner ''.join(c for c in s if unicodedata.category(c)[0] != 'C') removes all control characters from the original string s.

Here’s the final code that removes all control characters from a string:

import unicodedata


def remove_control_characters(s):
    return ''.join(c for c in s if unicodedata.category(c)[0] != 'C')


s = 'hello\nworld\tFinxters!'
print(s)

s = remove_control_characters(s)
print(s)

  • The join() method combines all strings in an iterable, using the string on which it is called as the separator. In our case, we join them on the empty string ''.
  • The generator expression c for c in s if unicodedata.category(c)[0] != 'C' yields all characters of s that are not in a category starting with the uppercase 'C'.
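To see the two building blocks in isolation, here is a small sketch that is independent of control characters. The sample values are made up for illustration.

```python
words = ['hello', 'world']

# join() places the separator string between the items of the iterable.
dashed = '-'.join(words)
print(dashed)   # hello-world

# With the empty string as separator, the items are simply concatenated.
glued = ''.join(words)
print(glued)    # helloworld

# A generator expression can filter characters before they are joined.
kept = ''.join(c for c in 'a1b2c3' if c.isalpha())
print(kept)     # abc
```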

Alternatively, you can write it using a simple for loop like this:

import unicodedata


def remove_control_characters(s):
    s_new = ''
    for c in s:
        # Keep only characters whose category does not start with 'C'
        if unicodedata.category(c)[0] != 'C':
            s_new = s_new + c
    return s_new



s = 'hello\nworld\tFinxters!'
print(s)

s = remove_control_characters(s)
print(s)

The output of both variants is:

# First print() statement before removal of control chars
hello
world	Finxters!

# Second print() statement after removal of control chars
helloworldFinxters!

You can see that the second output doesn’t contain any control characters.