A RegEx to Match Bitcoin Addresses

What regular expressions can be used to match Bitcoin addresses?

A regular expression for validating Bitcoin addresses must check that the string is 26-35 characters long, starts with "1", "3", or "bc1", consists of uppercase or lowercase alphabetic and numeric characters, and doesn’t contain ambiguous characters. Not allowed are the uppercase letter "O", the uppercase letter "I", the lowercase letter "l", and the number "0".

The following regular expression satisfies these conditions:

([13]|bc1)[A-HJ-NP-Za-km-z1-9]{27,34}

It consists of the following parts:

  • The part ([13]|bc1) checks whether the string starts with '1', '3', or 'bc1'. Feel free to dive deeper into character sets and the logical OR relation on regular expressions.
  • The part [A-HJ-NP-Za-km-z1-9] is a character class that matches a single alphanumeric character, except "O", "I", "l", and "0".
  • The part {27,34} is called a quantifier; it matches 27 to 34 repetitions of the preceding character class. Note that there is no space after the comma: Python's re module treats {27, 34} as literal text rather than a quantifier.
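To make the three parts easier to see, here is a small sketch that writes the same pattern with Python's re.VERBOSE flag, which ignores whitespace and allows comments inside the pattern. The name ADDRESS_PATTERN is only chosen for this illustration.

```python
import re

# Same regex as above, split into its three parts.
# re.VERBOSE ignores whitespace and '#' comments outside character classes.
ADDRESS_PATTERN = re.compile(
    r"""
    ([13]|bc1)              # prefix: '1', '3', or 'bc1'
    [A-HJ-NP-Za-km-z1-9]    # one allowed character (no 'O', 'I', 'l', '0')
    {27,34}                 # repeat the character class 27 to 34 times
    """,
    re.VERBOSE,
)

print(ADDRESS_PATTERN.match('1BvBMSEYstWetqTFn5Au4m4GFg7xJaNVN2') is not None)
```

The verbose form behaves like the one-line pattern; it is only a more readable way to write it.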

Here’s a Python code snippet that shows how this regex can be used for testing different strings:

import re

# Raw string, so the pattern is passed to the regex engine unchanged
pattern = r'([13]|bc1)[A-HJ-NP-Za-km-z1-9]{27,34}'

bitcoin_addresses = [
    '1BvBMSEYstWetqTFn5Au4m4GFg7xJaNVN2',          # True
    '3J98t1WpEZ73CNmQviecrnyiWrnqRhWNLy',          # True
    'bc1qar1srrr0xfkvy5r643hydnw9re59gtzzwf5mdq',  # False ('0' char)
]

# re.match() tries to match the pattern at the beginning of each string
for addr in bitcoin_addresses:
    print(re.match(pattern, addr))

If you run the code, you obtain the following output:

# Output:
<re.Match object; span=(0, 34), match='1BvBMSEYstWetqTFn5Au4m4GFg7xJaNVN2'>
<re.Match object; span=(0, 34), match='3J98t1WpEZ73CNmQviecrnyiWrnqRhWNLy'>
None

The third string in the list doesn’t match because it contains the character '0', which the pattern does not allow.
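One thing to keep in mind: re.match() only anchors the pattern at the beginning of the string, so a string with extra characters after a valid-looking prefix still produces a match. If you want the whole string to fit the pattern, re.fullmatch() is one option. Here is a minimal sketch; the helper name looks_like_bitcoin_address and the too_long test string are made up for this example.

```python
import re

pattern = r'([13]|bc1)[A-HJ-NP-Za-km-z1-9]{27,34}'

def looks_like_bitcoin_address(candidate: str) -> bool:
    """Return True only if the entire string fits the pattern."""
    return re.fullmatch(pattern, candidate) is not None

# A valid-looking address with ten extra characters appended
too_long = '1BvBMSEYstWetqTFn5Au4m4GFg7xJaNVN2' + 'x' * 10

print(re.match(pattern, too_long) is not None)   # True: only the start is checked
print(looks_like_bitcoin_address(too_long))      # False: the whole string must fit
```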

Discussion

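While the above regular expression filters out many strings that surely are not Bitcoin addresses, it still accepts a lot of strings that are not valid addresses either (false positives). Why? Because Bitcoin addresses contain a checksum to prevent people from using invalid addresses, and a regular expression cannot verify that checksum. Also note that the character rules above describe the Base58 addresses starting with "1" or "3"; "bc1" addresses use the Bech32 encoding, whose character set differs (it includes "0", for example), so the "bc1" branch of the pattern is only a rough approximation.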

Here’s a short notice about this checksum issue (highlights by me):

💡 “Several of the characters inside a Bitcoin invoice are used as a checksum so that typographical errors can be automatically found and rejected. The checksum also allows Bitcoin software to confirm that a 33-character (or shorter) invoice is in fact valid and isn’t simply an invoice with a missing character.” (Source: Bitcoin Wiki)
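To give an idea of what such a checksum test involves, here is a minimal sketch for Base58Check addresses (the ones starting with "1" or "3"). It assumes the usual layout of 1 version byte, 20 payload bytes, and 4 checksum bytes taken from a double SHA-256 hash. The function name base58check_is_valid is made up for this illustration, and "bc1" addresses use a different encoding (Bech32) that this sketch does not cover.

```python
import hashlib

BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def base58check_is_valid(address: str) -> bool:
    """Sketch: verify the 4-byte checksum of a Base58Check address."""
    # Interpret the address as a big base-58 number
    num = 0
    for ch in address:
        idx = BASE58_ALPHABET.find(ch)
        if idx == -1:
            return False  # character outside the Base58 alphabet
        num = num * 58 + idx

    # Each leading '1' stands for one leading zero byte
    n_leading = len(address) - len(address.lstrip("1"))
    body = num.to_bytes((num.bit_length() + 7) // 8, "big")
    raw = b"\x00" * n_leading + body

    # Assumed layout: 1 version byte + 20 payload bytes + 4 checksum bytes
    if len(raw) != 25:
        return False

    payload, checksum = raw[:-4], raw[-4:]
    digest = hashlib.sha256(hashlib.sha256(payload).digest()).digest()
    return digest[:4] == checksum

print(base58check_is_valid("1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa"))
print(base58check_is_valid("1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNb"))
```

The second string differs from the first only in its last character, which is enough to break the checksum even though both strings pass the regular expression. A reasonable setup is to use the regex as a cheap first filter and run a checksum test like this afterwards.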

If you want to adopt Bitcoin as your savings instrument, check out our article How to Adopt Bitcoin as a Treasury Reserve Asset.

Do you want to master the regex superpower? Check out my new book The Smartest Way to Learn Regular Expressions in Python with the innovative 3-step approach for active learning: (1) study a book chapter, (2) solve a code puzzle, and (3) watch an educational chapter video.