Python | Split String Every “N” Characters

Summary: One of the easiest ways to split a string after every n characters is to use a list comprehension that slices the string in steps of n and stores each slice in a list. The items of this list are the required substrings.
A quick look at the solution: [given_string[i:i+n] for i in range(0, len(given_string), n)]

Minimal Example

given_string = 'abcdef'
n = 3
print([(given_string[i:i+n]) for i in range(0, len(given_string), n)])

# OUTPUT: ['abc', 'def']

Problem Formulation

📜Problem: Given a string, how will you split the string after every n characters?

Let’s visualize the problem with the help of an example:

Example: In the problem given below, you have to split the string after every 3 characters –

# Input:
s = "12345abcde6789fghi"
n = 3
# Output:
['123', '45a', 'bcd', 'e67', '89f', 'ghi']

Now that you have a clear picture of what the question asks you to do, let us dive into the solutions.

Method 1: Using a List Comprehension

Prerequisite: In order to understand the solution given below, it is essential to know what a list comprehension does. Simply put, a list comprehension in Python is a compact way of creating lists. The simple formula is [expression + context], where the “expression” determines what to do with each list element and the “context” determines which elements to select. The context can consist of an arbitrary number of for and if statements.
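As a small illustration of the expression and context parts, here is a minimal sketch. The names numbers and squares_of_evens are made up for this example and do not appear elsewhere in the article.

```python
numbers = [1, 2, 3, 4, 5, 6]

# Expression: x * x
# Context:    for x in numbers if x % 2 == 0
squares_of_evens = [x * x for x in numbers if x % 2 == 0]
print(squares_of_evens)

# OUTPUT: [4, 16, 36]
```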

🌎To learn more about list comprehensions, read this article on “List Comprehension in Python – A Helpful Illustrated Guide”

Approach: Split the given string after every n characters using a list comprehension, such that the list comprehension returns a new list in which each item is a substring of at most n characters of the given string.

Code:

# Given text
s = "12345abcde6789fghi"
n = 3
# Using list comprehension
op = [(s[i:i + n]) for i in range(0, len(s), n)]

# Printing the output
print(op)

# ['123', '45a', 'bcd', 'e67', '89f', 'ghi']

Let’s look at what the above code does by dissecting it into the expression and context part.

  • Expression: The expression s[i:i + n] returns the substring that starts at index i and contains at most n characters. The last substring may be shorter if the length of the string is not a multiple of n.
  • Context:
    • The context contains a for loop that iterates over the start indices produced by range(0, len(s), n), that is, 0 and every multiple of n that is smaller than the length of the string. The step size n (3 in this case) ensures that each slice begins exactly n characters after the previous one.
    • For example, in the above code, the loop variable i takes the value 0 in the first iteration, 3 in the second, 6 in the third, and so on until the end of the string is reached.
    • Finally, the list comprehension collects all the substrings in a new list, which can then be displayed as the output.
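To make the context part more tangible, the following sketch prints the start indices that range() produces for the example string and then shows what happens when the length is not a multiple of n. The variables starts, t and chunks are introduced only for this illustration.

```python
s = "12345abcde6789fghi"
n = 3

# Start indices visited by the for loop in the context
starts = list(range(0, len(s), n))
print(starts)
# OUTPUT: [0, 3, 6, 9, 12, 15]

# A string whose length (8) is not a multiple of n
t = "abcdefgh"
chunks = [t[i:i + n] for i in range(0, len(t), n)]
print(chunks)
# OUTPUT: ['abc', 'def', 'gh']
```

Slicing past the end of a string does not raise an error, which is why the last chunk is simply shorter.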

Multi-line Solution: 

The above code can also be written in a more explicit form by using a for loop instead of a list comprehension. The loop iterates over the same start indices (0, n, 2n, ...), and each slice is stored in a new list with the help of the append() method.

Code:

# Given text
text = "12345abcde6789fghi"
n = 3
# Empty list to store the resultant split strings
op = []
# For loop to cut the given string
for i in range(0, len(text), n):
    op.append(text[i:i + n])

# Printing the output
print(op)

# OUTPUT: ['123', '45a', 'bcd', 'e67', '89f', 'ghi']
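If you need this logic in several places, it can be wrapped in a small function. The sketch below is one possible way to do that; the name split_every and the check for a non-positive n are assumptions added for this example, not part of the original solution.

```python
def split_every(text, n):
    """Hypothetical helper: split text into chunks of at most n characters."""
    if n <= 0:
        # range() would raise for a step of 0 and return nothing for a negative step
        raise ValueError("n must be a positive integer")
    return [text[i:i + n] for i in range(0, len(text), n)]


print(split_every("12345abcde6789fghi", 3))
# OUTPUT: ['123', '45a', 'bcd', 'e67', '89f', 'ghi']

print(split_every("", 3))
# OUTPUT: []
```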

Method 2: Using zip_longest from Itertools Module

The itertools module provides functions that return iterators. One of them is zip_longest, which makes an iterator that aggregates elements from each of the iterables. The iteration continues until the longest iterable is exhausted, and missing values are filled in with fillvalue.

Syntax:

zip_longest(*iterables, fillvalue=None)

The function takes the following arguments:

  • The iterables parameter denotes the sequences over which we want to iterate.
  • The fillvalue parameter is a keyword argument that specifies the value used to fill the gaps when the iterables are of uneven length.
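Before applying zip_longest to the splitting problem, it may help to see how it behaves on two iterables of different length. In this sketch the variables letters, digits and pairs are illustrative only.

```python
from itertools import zip_longest

letters = "abc"
digits = "12345"

# The shorter iterable is padded with the fill value
pairs = list(zip_longest(letters, digits, fillvalue="-"))
print(pairs)

# OUTPUT: [('a', '1'), ('b', '2'), ('c', '3'), ('-', '4'), ('-', '5')]
```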

Code:

# Importing the function from the itertools module
from itertools import zip_longest


# Splitting string using zip_longest
def fun(n, i, fillvalue=None):
    # Groups the iterable into tuples of n items, for example:
    # fun(3, 'abcdefg', 'x') --> ('a', 'b', 'c') ('d', 'e', 'f') ('g', 'x', 'x')
    args = [iter(i)] * n
    return zip_longest(*args, fillvalue=fillvalue)


# Given text
my_string = "12345abcde6789fghi"
n = 3
# Join each group of characters back into a string
op = [''.join(group) for group in fun(n, my_string, '')]
# Printing the output
print(op)

# OUTPUT: ['123', '45a', 'bcd', 'e67', '89f', 'ghi']

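Note: The iter() function returns an iterator for the given argument. Because the list [iter(i)] * n holds n references to the same iterator, each tuple produced by zip_longest consumes n consecutive characters of the string.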
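The grouping trick relies on one iterator being shared by every position of the tuple. The following sketch makes that visible by pulling items by hand; the variable names it, refs, first_group and second_group are chosen just for this example.

```python
s = "abcdef"
it = iter(s)

# Multiplying the list copies the reference, not the iterator itself
refs = [it] * 3
all_same = refs[0] is refs[1] is refs[2]
print(all_same)
# OUTPUT: True

# Each call to next() advances the single shared iterator
first_group = tuple(next(ref) for ref in refs)
second_group = tuple(next(ref) for ref in refs)
print(first_group, second_group)
# OUTPUT: ('a', 'b', 'c') ('d', 'e', 'f')
```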

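Method 3: Using the re (Regex) Module

We can split the string after every n characters using the re.findall() function from Python's re (regular expression) module. The re.findall(pattern, string) function scans the string from left to right, searching for all non-overlapping matches of the pattern, and returns them as a list of strings in the order they were found. The pattern '.{1,3}' matches between one and three characters, as many as possible, so the final match can be shorter than three characters. Note that the dot does not match newline characters unless the re.DOTALL flag is passed.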

🌎Related Tutorial: “Python re.findall() – Everything You Need to Know.”

Code:

# Importing the re module
import re
# Using re.findall() method
r = re.findall('.{1,3}','12345abcde6789fghi')
print(r)

# OUTPUT: ['123', '45a', 'bcd', 'e67', '89f', 'ghi']
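The example above hardcodes the chunk size in the pattern. A possible variation, sketched below, builds the pattern from a variable n and shows the effect of the re.DOTALL flag on a string that contains a newline. The variable names pattern, chunks, text, without_flag and with_flag are assumptions made for this illustration.

```python
import re

s = "12345abcde6789fghi"
n = 4

# Build the pattern '.{1,4}' from n
pattern = ".{1,%d}" % n
chunks = re.findall(pattern, s, flags=re.DOTALL)
print(chunks)
# OUTPUT: ['1234', '5abc', 'de67', '89fg', 'hi']

# Without re.DOTALL the dot skips newlines, so the newline is lost
text = "ab\ncd"
without_flag = re.findall(".{1,2}", text)
with_flag = re.findall(".{1,2}", text, flags=re.DOTALL)
print(without_flag)
# OUTPUT: ['ab', 'cd']
print(with_flag)
# OUTPUT: ['ab', '\nc', 'd']
```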

Method 4: Using Textwrap

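Python's standard library provides a function that solves this problem for many strings without any hassle. The function is called wrap and it is part of the textwrap module. Simply pass the given string and the number of characters to the wrap function and it will split the string into pieces of at most n characters. Keep in mind that wrap is designed for wrapping text: it prefers to break at whitespace (and, by default, after hyphens) and drops whitespace at line boundaries, so the result matches fixed-size slicing only for strings without whitespace or hyphens, such as the one used here.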

Here’s a quick look at what the docstring for the wrap function says:

help(wrap)
'''
Help on function wrap in module textwrap:

wrap(text, width=70, **kwargs)
    Wrap a single paragraph of text, returning a list of wrapped lines.

    Reformat the single paragraph in 'text' so it fits in lines of no
    more than 'width' columns, and return a list of wrapped lines.  By
    default, tabs in 'text' are expanded with string.expandtabs(), and
    all other whitespace characters (including newline) are converted to
    space.  See TextWrapper class for available keyword args to customize
    wrapping behaviour.
'''

Okay! Let’s see the wrap function in action:

Code:

from textwrap import wrap
s = "12345abcde6789fghi"
n = 3
print(wrap(s, n))

# OUTPUT: ['123', '45a', 'bcd', 'e67', '89f', 'ghi']
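To see how wrap differs from plain slicing, the sketch below runs both on a string that contains spaces. The variable names text, wrapped and sliced_chunks are illustrative.

```python
from textwrap import wrap

text = "abc def ghi"
n = 3

# wrap() breaks at whitespace and drops it at line boundaries
wrapped = wrap(text, n)
print(wrapped)
# OUTPUT: ['abc', 'def', 'ghi']

# Slicing keeps every character, including the spaces
sliced_chunks = [text[i:i + n] for i in range(0, len(text), n)]
print(sliced_chunks)
# OUTPUT: ['abc', ' de', 'f g', 'hi']
```

If every character has to be preserved, the slicing approach from Method 1 is the safer choice.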

Method 5: Using list+map+join+zip

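Another approach to solve the given problem is to use a combination of the list(), map(), join() and zip() functions to split the string accordingly. Note that zip() stops as soon as one of its inputs is exhausted, so trailing characters are dropped when the length of the string is not a multiple of n. Follow the code given below that demonstrates how to solve the problem using these functions.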

Code:

s = "12345abcde6789fghi"
n = 3
print(list(map(''.join, zip(*[iter(s)]*n))))

# OUTPUT: ['123', '45a', 'bcd', 'e67', '89f', 'ghi']
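The example string has 18 characters, which is a multiple of 3, so nothing is lost. The sketch below uses a string of length 8 to show the truncation and one possible remedy based on zip_longest from Method 2. The variable names truncated and complete are chosen for this example.

```python
from itertools import zip_longest

s = "abcdefgh"
n = 3

# zip() discards the incomplete last group ('g', 'h')
truncated = list(map(''.join, zip(*[iter(s)] * n)))
print(truncated)
# OUTPUT: ['abc', 'def']

# zip_longest() pads the last group with empty strings instead
complete = list(map(''.join, zip_longest(*[iter(s)] * n, fillvalue='')))
print(complete)
# OUTPUT: ['abc', 'def', 'gh']
```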

Method 6: Using sliced

Yet another function that allows you to split the given string after every n characters is the sliced function of the third-party more_itertools package (installable with pip install more-itertools). The sliced function returns an iterator over the slices, so you need to convert it to a list containing the split substrings with the help of the list() constructor as shown below.

Code:

from more_itertools import sliced

s = "12345abcde6789fghi"
n = 3
print(list(sliced(s, n)))

# OUTPUT: ['123', '45a', 'bcd', 'e67', '89f', 'ghi']

As a matter of fact, the more_itertools module offers other options to solve the given problem. Here are two more ways to split the given string after every n characters –

import more_itertools as mit
s = "12345abcde6789fghi"
n = 3
print(["".join(c) for c in mit.chunked(s, n)])
# OUTPUT: ['123', '45a', 'bcd', 'e67', '89f', 'ghi']
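# step=n moves the window by n each time; fillvalue='' pads an incomplete last window
print(["".join(c) for c in mit.windowed(s, n, fillvalue='', step=n)])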
# OUTPUT: ['123', '45a', 'bcd', 'e67', '89f', 'ghi']

Method 7: Using islice from Itertools

Here’s another (not so Pythonic) solution that uses an itertools function known as islice to solve the given problem.

Code:

from itertools import islice

s = "12345abcde6789fghi"
n = 3


def split_fun(n, iterable):
    i = iter(iterable)
    piece = list(islice(i, n))
    while piece:
        yield ''.join(piece)
        piece = list(islice(i, n))


print(list(split_fun(n, s)))

# OUTPUT: ['123', '45a', 'bcd', 'e67', '89f', 'ghi']

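Note: The islice(iterable, n) function returns an iterator over at most the next n items of the given iterable. Since split_fun calls it repeatedly on the same iterator, each call picks up where the previous one stopped, and an empty result signals that the string has been fully consumed.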
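The behaviour of repeated islice calls on one iterator can be checked in isolation. In this sketch the names it, first, second, third and fourth are illustrative.

```python
from itertools import islice

it = iter("abcdefg")

# Each call consumes up to three more items from the same iterator
first = list(islice(it, 3))
second = list(islice(it, 3))
third = list(islice(it, 3))
fourth = list(islice(it, 3))

print(first, second, third, fourth)
# OUTPUT: ['a', 'b', 'c'] ['d', 'e', 'f'] ['g'] []
```

The empty list at the end is what terminates the while loop in split_fun.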

🌎Want to learn about the yield keyword in Python? Read this comprehensive guide: Yield Keyword in Python – A Simple Illustrated Guide

Conclusion

We have successfully solved the given problem using different approaches. I hope you enjoyed this article and that it helps you on your journey to become a better coder. Please subscribe and stay tuned for more interesting articles and solutions.

