Problem Formulation: How to check if all characters of a string are uppercase?
Background: A string is a sequence of characters and is among the most commonly used data types in Python. Strings can be enclosed in either single or double quotes and are ‘immutable’, meaning they can’t be changed once created. There are various methods we can call on a string, and in this article we are going to focus on one task in particular: checking if all characters of a string are uppercase.
Example: To start with, let’s create two sample strings in Python:
example_1 = 'Hello, my name is Rikesh!'
example_2 = 'HELLO, MY NAME IS RIK@SH48!!'
As you can see, we have covered all our characters here – uppercase, lowercase, and then some special characters and digits as well.
Method 1: isupper()
This is a built-in string method in Python that returns a boolean value:
True if all cased characters (letters) in the string are uppercase and there is at least one of them, or
False if not.
Let’s pass our examples through this method and see what we get as an output:
>>> example_1.isupper()
False
Although our string contains an uppercase ‘H’ and ‘R’, the return is False as not ALL characters are uppercase. Now let’s try with our example_2:
>>> example_2.isupper()
True
Even though we have some special characters and digits, the return is True as all our alphabetic characters are indeed uppercase. This method ignores whitespace, digits and symbols; only a lowercase letter, or the absence of any letters at all, makes it return False.
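To make the edge cases concrete, here is a small sketch of how isupper() treats strings with and without letters. The sample strings are made up for illustration and are not the article's examples.

```python
# Illustrative strings (assumptions, not part of the article's examples)
samples = ['ABC 123!', 'Abc 123!', '123!', '']

# isupper() needs at least one cased character and no lowercase ones
results = {s: s.isupper() for s in samples}

for s, outcome in results.items():
    print(repr(s), outcome)
# 'ABC 123!' True
# 'Abc 123!' False
# '123!' False
# '' False
```

Note that a string with no letters at all returns False, even though it contains no lowercase characters.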
Method 2: Regex Match Uppercase Only
Python’s re module allows us to search and match our string against a pattern, character by character. Regex has no built-in notion of ‘uppercase’ in the way isupper() does; instead, we describe the characters we want with a character set. The set we will use covers only the ASCII letters A to Z, so digits, whitespace and other special characters fall outside it. Whilst this makes no difference to the string itself, it does change the result of the search.
There are two ways we can use the regex module to check for uppercase characters. Next, we’re exploring the first one.
Once the re module is imported, we can use regex to check our string and look for only uppercase matches. In the code below, the [A-Z] character set limits our match criteria to capitalised (uppercase) alphabetical characters only, in the range A to Z, and the + requires one or more of them. The $ ensures the match runs to the end of the string. As we just want to know if the string is all uppercase or not, we can convert the result to a boolean value:
import re
example_1 = 'Hello, my name is Rikesh!'
res = bool(re.match(r'[A-Z]+$', example_1))
print(res)  # False
This should not come as a surprise as our string clearly contains a mix of upper and lower case characters.
import re
example_2 = 'HELLO, MY NAME IS RIK@SH48!!'
res = bool(re.match(r'[A-Z]+$', example_2))
print(res)  # False
Okay, this one is probably not what you were expecting. It certainly isn’t what I was expecting! All our letters are clearly uppercase, so what has happened? Basically, our string contains special characters (‘@’, ‘!!’ and ‘,’), digits (48) and whitespace. None of these are in the uppercase A-Z range, so the match fails and we get False. Remember, we asked regex to match only if the string contains nothing but uppercase alphabetical characters.
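One way to see exactly which characters trip up the pattern is to list everything outside the A-Z range. This is a minimal sketch using re.findall with a negated character set; the variable name offenders is just for illustration.

```python
import re

example_2 = 'HELLO, MY NAME IS RIK@SH48!!'

# [^A-Z] matches any single character that is NOT an uppercase ASCII letter
offenders = sorted(set(re.findall(r'[^A-Z]', example_2)))
print(offenders)  # [' ', '!', ',', '4', '8', '@']
```

Notice that the space character appears in the list alongside the digits and symbols.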
If we now try the same function on a string containing uppercase alphabetic characters only, without special characters or digits, we get the following result:
import re
example_3 = 'HELLO MY NAME IS RIKESH'
res = bool(re.match(r'[A-Z]+$', example_3))
print(res)  # False
Even this does not work! Unfortunately, whitespace is not part of the A-Z range either, meaning this pattern is only applicable if we are sure our original string contains no special characters, digits or even whitespace. Adapting the pattern is certainly possible, but it seems overly complicated when we have a much more straightforward solution.
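For completeness, here is one possible adaptation, offered as a sketch rather than a recommendation. It assumes we want roughly the same behaviour as isupper() for ASCII letters: at least one uppercase letter and no lowercase letters, with anything else allowed. The function name is_upper_regex is hypothetical.

```python
import re

def is_upper_regex(text):
    """Return True if text has at least one A-Z letter and no a-z letters."""
    # [^a-z]* allows any characters except lowercase ASCII letters;
    # the single [A-Z] in the middle requires at least one uppercase letter.
    return bool(re.fullmatch(r'[^a-z]*[A-Z][^a-z]*', text))

print(is_upper_regex('HELLO MY NAME IS RIKESH'))       # True
print(is_upper_regex('HELLO, MY NAME IS RIK@SH48!!'))  # True
print(is_upper_regex('Hello, my name is Rikesh!'))     # False
```

This works, but it is noticeably harder to read than a plain call to isupper().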
Method 3: Regex Any Lowercase
We can use the way regex defines characters to our advantage by working with lowercase characters instead. The logic is that if there is a lowercase character in our string, not all characters are uppercase. This sidesteps the issue with special characters, digits and whitespace, assuming of course you did not want these to affect the outcome. For the purposes of this article we will assume our aim is to ensure all alphabetic characters in our string are uppercase rather than lowercase.
The way we can check if ANY alphabetic character in our string is lowercase with regex is as follows:
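import re
example_1 = 'Hello, my name is Rikesh!'
# re.search scans the whole string; re.match with \w*[a-z]\w* cannot look past punctuation such as ','
res = bool(re.search(r'[a-z]', example_1))
print(res)  # True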
Remember, we are looking for lowercase characters now, so even though we have a mix of upper and lower in this example, the return is True, i.e. the string does contain lowercase characters and so is not all uppercase.
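import re
example_2 = 'HELLO, MY NAME IS RIK@SH48!!'
# re.search scans the whole string; re.match with \w*[a-z]\w* cannot look past punctuation such as ','
res = bool(re.search(r'[a-z]', example_2))
print(res)  # False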
Even though our string contains special characters, digits and whitespace, the return is False because it contains no lowercase alphabetic characters. All alphabetic characters are uppercase.
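If we want to reuse this check, it can be wrapped in a small helper. The sketch below uses re.search, which scans the whole string, because a pattern tried only at the start with re.match (such as \w*[a-z]\w*) cannot look past punctuation. The helper name has_lowercase and the extra test string are assumptions for illustration.

```python
import re

def has_lowercase(text):
    """Return True if text contains at least one lowercase ASCII letter."""
    return bool(re.search(r'[a-z]', text))

tricky = 'HELLO, world'  # lowercase letters only appear after the comma

# re.match starts at position 0 and \w* cannot cross the ',' to reach 'world'
print(bool(re.match(r'\w*[a-z]\w*', tricky)))  # False
print(has_lowercase(tricky))                   # True

# Reverse check: no lowercase letters means all letters are uppercase
print(not has_lowercase('HELLO, MY NAME IS RIK@SH48!!'))  # True
```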
Method 4: ASCII and any()
The string module contains constants and functions for working with Python strings, and we can use it to search our string based on the ASCII characters we just touched upon.
As we have seen previously with regex, unless we are sure our original string contains no digits, special characters or even whitespace, searching based on whether all characters are uppercase can be problematic. In cases where we cannot be sure our string only contains alphabetical characters, we can once again use our ‘reverse check’ method: if the string contains any lowercase characters, we know not all characters are uppercase. The string.ascii_lowercase constant, which holds the letters 'a' to 'z', will help us do this.
We can use the built-in any() function to check if any characters in the string have the property we are looking for:
import string
example_1 = 'Hello, my name is Rikesh!'
res = any(s in string.ascii_lowercase for s in example_1)
print(res)  # True
As we have a mix of upper and lowercase alphabetic characters, the expression has returned True. Again, remember we are asking if any characters are lowercase.
import string
example_2 = 'HELLO, MY NAME IS RIK@SH48!!'
res = any(s in string.ascii_lowercase for s in example_2)
print(res)  # False
All of our alphabetic characters in this example are uppercase, so the expression returned False: there are no lowercase characters. The special characters and digits have been ignored.
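As a reusable version of the same idea, here is a minimal sketch. The helper name has_ascii_lowercase is hypothetical, and the accented test string is an assumption added to show one limitation: string.ascii_lowercase only covers a to z, so non-ASCII lowercase letters go unnoticed.

```python
import string

def has_ascii_lowercase(text):
    """Return True if any character of text is one of a-z."""
    return any(ch in string.ascii_lowercase for ch in text)

print(has_ascii_lowercase('Hello, my name is Rikesh!'))     # True
print(has_ascii_lowercase('HELLO, MY NAME IS RIK@SH48!!'))  # False

# '\u00e9' is a lowercase e with an acute accent, outside a-z
accented = 'CAF\u00e9'
print(has_ascii_lowercase(accented))  # False: the accented letter is missed
print(accented.isupper())             # False: isupper() does notice it
```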
The final method, based on the ord() function, has been left to the end for good reason, as I think it’s the most complicated of them all. It relies on the fact that every ASCII character has a corresponding numeric value, so we can check whether our characters are uppercase based on those values. For example, the ASCII values for uppercase letters range from 65-90 inclusive and for lowercase letters from 97-122 inclusive.
If we were to check if all characters are uppercase, we would have the problem we encountered before with special and numeric characters. We could, however, use the lowercase logic: if there is a lowercase character, they can’t all be uppercase. To get the ASCII value of a character we have to use the ord() function.
Let’s just test it to see:
>>> ord('A')
65
>>> ord('a')
97
So we can now check to see if any of the characters in our string fall within the lowercase range (97-122). To reiterate, if we instead required every character to fall within the uppercase range (65-90), it would flag not only lowercase characters but special characters and digits as well.
example_1 = 'Hello, my name is Rikesh!'
res = any(97 <= ord(s) <= 122 for s in example_1)
print(res)  # True
example_2 = 'HELLO, MY NAME IS RIK@SH48!!'
res = any(97 <= ord(s) <= 122 for s in example_2)
print(res)  # False
As we can see from our examples, example_1 does contain lowercase alphabetic characters, so we got a True return. Despite the fact that example_2 contains special characters and digits, we got a False return as there are no lowercase characters.
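The same check can be written without hard-coded numbers by asking ord() for the range boundaries. This is only a sketch; both helper names are hypothetical, and the second one simply inverts the first to answer the original question.

```python
def has_lowercase_ord(text):
    """Return True if any character's value falls in the a-z range."""
    low, high = ord('a'), ord('z')  # 97 and 122
    return any(low <= ord(ch) <= high for ch in text)

def all_letters_upper(text):
    """Reverse check: True when no lowercase ASCII letter is present."""
    return not has_lowercase_ord(text)

print(all_letters_upper('Hello, my name is Rikesh!'))     # False
print(all_letters_upper('HELLO, MY NAME IS RIK@SH48!!'))  # True
```

Wrapping the inversion in a named function makes it harder to misread a False result from the lowercase check.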
The aim of this article was to look at methods for checking if all characters of a string are uppercase. If that is our primary goal, the isupper() method appears to be the most straightforward, primarily because it focuses on alphabetic characters only and ignores anything else: digits, special characters, and whitespace.
Whilst the other methods can be more targeted, their usefulness really depends on how we want to define our ‘characters’ and what we are trying to achieve. On the basis that we want to focus purely on ensuring our alphabetic characters are uppercase rather than lowercase, they have limited usefulness and can provide misleading results. As we have seen, we can get around this by adapting our search criteria to focus on identifying lowercase alphabetic characters only. This seems like the opposite of what we are trying to achieve, so we need to make sure we interpret our False results correctly.
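To compare the approaches side by side, the sketch below runs isupper() and the three ‘reverse checks’ on the two sample strings, plus a digits-only string that is an assumption added purely for illustration.

```python
import re
import string

samples = [
    'Hello, my name is Rikesh!',
    'HELLO, MY NAME IS RIK@SH48!!',
    '1234',  # extra string with no letters, added for illustration
]

table = {}
for s in samples:
    table[s] = {
        'isupper': s.isupper(),
        'no lowercase (regex)': not re.search(r'[a-z]', s),
        'no lowercase (string)': not any(c in string.ascii_lowercase for c in s),
        'no lowercase (ord)': not any(97 <= ord(c) <= 122 for c in s),
    }
    print(repr(s), table[s])
```

For the string with no letters, isupper() returns False while the three reverse checks report True, so the two families of methods do not always agree.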
All things considered, it is difficult to find a reason not to use the isupper() method.