Given a string s, create a new string based on s with all control characters, such as '\n' and '\t', removed.
What is a Control Character?
A control character, also called a non-printing character (NPC), is a character that doesn’t represent a written symbol. Examples are the newline character '\n' and the tab character '\t'. The complement of the control characters is the set of printable characters.
In Unicode, control characters (general category 'Cc') occupy the code points U+0000 to U+001F, U+007F (DEL), and U+0080 to U+009F.
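As a quick sanity check, the following sketch uses Python's standard unicodedata module to collect every code point whose general category is 'Cc'. The helper name cc_code_points is made up for this illustration. On a current Python 3 build it should report the code points U+0000 to U+001F, U+007F (DEL), and U+0080 to U+009F.

```python
import sys
import unicodedata


def cc_code_points():
    """Return all code points whose Unicode general category is 'Cc'."""
    return [cp for cp in range(sys.maxunicode + 1)
            if unicodedata.category(chr(cp)) == 'Cc']


points = cc_code_points()
print(len(points))                       # 65
print(hex(points[0]), hex(points[-1]))   # 0x0 0x9f
print(0x7F in points)                    # True
```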
Solution Based on Unicode Category
Python’s unicodedata module provides a function unicodedata.category(c) that returns the general category assigned to the character c as a string. The Unicode categories 'Cc' (control), 'Cf' (format), 'Cs' (surrogate), 'Co' (private use), and 'Cn' (unassigned) could be seen as “control characters”, although you could argue that only 'Cc' is a control character. In any case, you can customize the solution below based on your preferences.
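To make the customization concrete, here is a minimal sketch that first prints the category of a few sample characters and then filters by a configurable set of categories. The function name remove_categories and its categories parameter are assumptions for illustration, not part of unicodedata. The zero-width space '\u200b' serves as an example of a format character ('Cf'), one of the other categories starting with 'C'.

```python
import unicodedata

# General categories of a few sample characters.
for ch in ['a', '\n', '\t', '\u200b', ' ']:
    print(repr(ch), unicodedata.category(ch))


def remove_categories(s, categories=('Cc',)):
    """Drop every character whose general category is listed in `categories`."""
    return ''.join(c for c in s if unicodedata.category(c) not in categories)


sample = 'a\u200bb\nc'
strict = remove_categories(sample)                 # removes only 'Cc' characters
broad = remove_categories(sample, ('Cc', 'Cf'))    # also removes format characters
print(repr(strict))   # 'a\u200bbc'
print(repr(broad))    # 'abc'
```

With the default argument only the newline is dropped. Passing ('Cc', 'Cf') also removes the zero-width space.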
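If you treat every category that starts with 'C' as a control character, the Python one-liner ''.join(c for c in s if unicodedata.category(c)[0] != 'C') removes all control characters from the original string s. Note that category() returns a two-letter code such as 'Cc', so the check compares only its first letter.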
Here’s the final code that removes all control characters from a string:
    import unicodedata


    def remove_control_characters(s):
        # category() returns a two-letter code such as 'Cc'; compare its first letter.
        return ''.join(c for c in s if unicodedata.category(c)[0] != 'C')


    s = 'hello\nworld\tFinxters!'
    print(s)

    s = remove_control_characters(s)
    print(s)
- The join() function combines all characters in an iterable using the separator string on which it is called. In our case, we combine them on the empty string ''.
- The generator expression c for c in s if unicodedata.category(c)[0] != 'C' goes over all characters that are not in a category starting with the uppercase 'C'.
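To see these two building blocks in isolation, here is a small sketch that uses only built-ins. The sample values are made up for illustration.

```python
words = ['hello', 'world']

# join() is called on the separator string and glues the items together.
dashed = '-'.join(words)
glued = ''.join(words)
print(dashed)   # hello-world
print(glued)    # helloworld

# A generator expression yields only the characters that pass the filter.
no_tabs = ''.join(c for c in 'a\tb' if c != '\t')
print(no_tabs)  # ab
```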
Alternatively, you can write it using a simple for loop like this:
    import unicodedata


    def remove_control_characters(s):
        s_new = ''
        for c in s:
            # Keep c unless its category code ('Cc', 'Cf', ...) starts with 'C'.
            if unicodedata.category(c)[0] != 'C':
                s_new = s_new + c
        return s_new


    s = 'hello\nworld\tFinxters!'
    print(s)

    s = remove_control_characters(s)
    print(s)
The output of both variants is:
    # First print() statement before removal of control chars
    hello
    world   Finxters!

    # Second print() statement after removal of control chars
    helloworldFinxters!
You can see that the second output doesn’t contain any control characters.
While working as a researcher in distributed systems, Dr. Christian Mayer found his love for teaching computer science students.
To help students reach higher levels of Python success, he founded the programming education website Finxter.com that has taught exponential skills to millions of coders worldwide. He’s the author of the best-selling programming books Python One-Liners (NoStarch 2020), The Art of Clean Code (NoStarch 2022), and The Book of Dash (NoStarch 2022). Chris also coauthored the Coffee Break Python series of self-published books. He’s a computer science enthusiast, freelancer, and owner of one of the top 10 largest Python blogs worldwide.
His passions are writing, reading, and coding. But his greatest passion is to serve aspiring coders through Finxter and help them to boost their skills.