How To Cut A String In Python?

Problem: Given a string, how do you split or cut it and extract the required characters?

In this article, we discuss several scenarios in which you split or cut a string and extract the portion you need. Let us go through each scenario and see how to cut the string to meet its requirement.

✨ Scenario 1

Problem Formulation

Given the following string:

s = 'http://www.example.com/?s=something&two=20'

Requirement:

You have to cut the given string (a URL) so that everything from the & onward is dropped, i.e., the output string should be as follows:

s = 'http://www.example.com/?s=something'

◈ Method 1: Using split() Method

split() is a built-in string method in Python that cuts a given string at each occurrence of a separator and returns the pieces as a list. You can specify any separator you need; by default, the string is split on whitespace.

Syntax:

string.split(separator, maxsplit)

  • separator is an optional parameter that specifies the delimiter. By default it is any run of whitespace characters. (In Python's own signature this parameter is named sep.)
  • maxsplit is an optional parameter that specifies the maximum number of splits to perform. By default its value is -1, which means "all occurrences".
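To make the two parameters concrete, here is a small sketch. The sample string and the variable names are made up for illustration and are not part of the original example.

```python
text = "one two  three four"  # note the double space before "three"

# No arguments: split on runs of whitespace; no empty strings appear.
by_whitespace = text.split()
print(by_whitespace)   # ['one', 'two', 'three', 'four']

# Explicit separator: every single space is a cut point,
# so the double space produces an empty string.
by_space = text.split(" ")
print(by_space)        # ['one', 'two', '', 'three', 'four']

# maxsplit=1: stop after the first cut; the rest stays in one piece.
first_cut = text.split(" ", 1)
print(first_cut)       # ['one', 'two  three four']
```

Note that the whitespace default and an explicit " " separator behave differently as soon as two separators are adjacent.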

The Solution: Call split() with the separator at which you want to cut the string, then pick the section you need from the list that split() returns. The following piece of code shows how this can be implemented:

s = 'http://www.example.com/?s=something&two=20'
print(s.split('&')[0])

Output:

http://www.example.com/?s=something

◈ Method 2: Using rfind() Method And Slicing The String

We need to extract the portion of the string that comes before the & character. A simple workaround is therefore to find the index of the & character with the help of the rfind() method and then slice the string up to that index.

Note: The rfind() method finds the last occurrence of a specified value and returns its index. It returns -1 if the value is not found.

The Solution

s = 'http://www.example.com/?s=something&two=20'
print(s[:s.rfind('&')])

Output:

http://www.example.com/?s=something
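Two details of rfind() are easy to miss: it cuts at the last &, not the first, and its -1 result for a missing separator silently slices off the final character. The sketch below illustrates both; the helper name cut_before_last and the extra sample URLs are hypothetical and only serve the illustration.

```python
def cut_before_last(text, sep):
    """Return text up to the last occurrence of sep, or text unchanged if sep is absent."""
    pos = text.rfind(sep)
    return text if pos == -1 else text[:pos]

multi = 'http://www.example.com/?s=something&two=20&three=30'
plain = 'http://www.example.com/'

# With several separators, only the part after the LAST one is dropped.
print(cut_before_last(multi, '&'))   # http://www.example.com/?s=something&two=20

# Without the guard, rfind() returns -1 and the slice drops the last character.
print(plain[:plain.rfind('&')])      # http://www.example.com

# With the guard, a string without the separator is returned unchanged.
print(cut_before_last(plain, '&'))   # http://www.example.com/
```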

◈ Method 3: Using index() Method

Another simple approach to cut the given string is to slice it using the index() method. The index(value) method returns the index of the first occurrence of value in the string and raises a ValueError if value is not found. Let us have a look at how to use index(value) to split our string.

s = 'http://www.example.com/?s=something&two=20'
print(s[:s.index('&')])

Output:

http://www.example.com/?s=something
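Because index() raises an exception when the separator is missing, it can help to wrap the slice. This is a minimal sketch; the helper name cut_before_first and the extra sample URLs are hypothetical.

```python
def cut_before_first(text, sep):
    """Return text up to the first occurrence of sep, or text unchanged if sep is absent."""
    try:
        return text[:text.index(sep)]
    except ValueError:
        # index() raises ValueError when sep does not occur in text.
        return text

multi = 'http://www.example.com/?s=something&two=20&three=30'
plain = 'http://www.example.com/'

# index() finds the FIRST '&', so everything after it is dropped.
print(cut_before_first(multi, '&'))   # http://www.example.com/?s=something

# No '&' present: the string comes back unchanged instead of raising.
print(cut_before_first(plain, '&'))   # http://www.example.com/
```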

In this scenario, cutting the string was quite simple, since there was a single delimiter and all we had to do was separate the string at the & character. What if you want to split the string on more than one character or sequence? That brings us to the next scenario!

✨ Scenario 2

Problem Formulation

Given a string consisting of numbers, letters, and special characters, how do you split the string wherever a special character or a number occurs?

Example

string = "Finxter$#! Academy Python111Freelancing"

Desired Output

['Finxter', 'Academy', 'Python', 'Freelancing']

◈ Method 1: Using re.split

The re.split(pattern, string) method matches all occurrences of the pattern in the string and divides the string along the matches, resulting in a list of the strings between the matches. For example, re.split('a', 'bbabbbab') results in the list of strings ['bb', 'bbb', 'b'].

The Solution

import re

s = "Finxter$#! Academy Python111Freelancing"
# Raw string, so the backslashes reach the regex engine unchanged
res = re.split(r'\d+|\W+', s)
print(res)

Output:

['Finxter', 'Academy', 'Python', 'Freelancing']

Note:

  • The \d special sequence matches any decimal digit, such as 0 through 9.
  • \W is a special sequence that matches any non-word character, i.e., anything that is not a letter, digit, or underscore. Here it is used to find the delimiters while splitting the string.

In case you want to keep the separators as well, please have a look at this tutorial, which answers your question in detail.
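As a brief sketch of one way to keep the separators: when the pattern is wrapped in a capturing group, re.split() also returns the matched text. The variable names below are illustrative only.

```python
import re

s = "Finxter$#! Academy Python111Freelancing"

# The parentheses form a capturing group, so each matched
# separator is kept in the result between the words.
parts = re.split(r'(\d+|\W+)', s)
print(parts)
# ['Finxter', '$#! ', 'Academy', ' ', 'Python', '111', 'Freelancing']

# Words sit at the even positions, separators at the odd ones.
words = parts[::2]
separators = parts[1::2]
print(words)        # ['Finxter', 'Academy', 'Python', 'Freelancing']
print(separators)   # ['$#! ', ' ', '111']
```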

◈ Method 2: Using itertools.groupby()

  • The itertools.groupby(iterable, key=None) function creates an iterator that returns tuples (key, group-iterator), one for each run of consecutive items that share the same value of key. We use the str.isalpha() function as the key function.
  • The str.isalpha() function returns True if the string is non-empty and consists only of alphabetic characters.
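To see what groupby() actually yields with str.isalpha as the key, here is a small sketch on a short, made-up string (the sample value and variable names are assumptions for illustration):

```python
from itertools import groupby

sample = "Py3!go"

# Each character is tested with str.isalpha(); consecutive characters
# with the same result (True or False) end up in the same group.
groups = [(key, ''.join(group)) for key, group in groupby(sample, str.isalpha)]
print(groups)   # [(True, 'Py'), (False, '3!'), (True, 'go')]
```

The groups with key True are the runs of letters; the groups with key False are the delimiters to discard.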

The Solution

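from itertools import groupby

s = "Finxter$#! Academy Python111Freelancing"
# groupby() yields (key, group) pairs; key is True for runs of letters.
# Keep only those runs and join each one back into a string.
r = [''.join(group) for is_alpha, group in groupby(s, str.isalpha) if is_alpha]
print(r)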

Output:

['Finxter', 'Academy', 'Python', 'Freelancing']

✨ Scenario 3

If you are dealing specifically with URLs, prefer Python's built-in URL library, urllib.parse, over manual string cutting.

Example: You want to remove two=20 from the query string of the URL given below:

s = 'http://www.example.com/?s=something&two=20'

Desired Output:

http://www.example.com/?s=something

Solution

  • Step 1: Parse the entire URL.
  • Step 2: Extract the query string.
  • Step 3: Convert it to a Python dictionary.
  • Step 4: Remove the key ‘two’ from the dictionary.
  • Step 5: Put it back into the query string.
  • Step 6: Stitch the URL back together.

Let us have a look at the following program which demonstrates the exact process as explained in the above steps. (Please follow the comments in the code!)

import urllib.parse

# Step 1: parse the entire URL
parse_result = urllib.parse.urlsplit("http://www.example.com/?s=something&two=20")
# Step 2: Extract the query string
query_s = parse_result.query
# Step 3: Convert it to a Python dictionary (each value is a list)
query_d = urllib.parse.parse_qs(query_s)
# Step 4: Remove the 'two' key from the dictionary
del query_d['two']
# Step 5: Put it back into the query string
# (doseq=True encodes each item of a list value as its own parameter)
new_query_s = urllib.parse.urlencode(query_d, doseq=True)
# Step 6: Stitch the URL back together
result = urllib.parse.urlunsplit((
    parse_result.scheme, parse_result.netloc,
    parse_result.path, new_query_s, parse_result.fragment))
print(result)

Output:

http://www.example.com/?s=something

The advantage of this procedure is that you have more control over the URL. For example, it removes the two argument even if it appears earlier in the query string ("two=20&s=something"), a case in which simply cutting the string at & would return the wrong part.
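The claim about parameter order can be checked with a small sketch that wraps the six steps in a function. The function name remove_query_param is hypothetical, and using pop() with a default (so a missing key does not raise) is a choice made for this sketch rather than part of the program above.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qs, urlencode

def remove_query_param(url, name):
    """Return url with every occurrence of the query parameter `name` removed."""
    parts = urlsplit(url)
    query = parse_qs(parts.query, keep_blank_values=True)
    query.pop(name, None)  # no error if the parameter is absent
    # SplitResult is a named tuple, so _replace() swaps in the new query string.
    return urlunsplit(parts._replace(query=urlencode(query, doseq=True)))

reordered = 'http://www.example.com/?two=20&s=something'

# Cutting at '&' keeps the wrong parameter when the order changes.
print(reordered.split('&')[0])                 # http://www.example.com/?two=20

# Parsing the query string removes the right one regardless of order.
print(remove_query_param(reordered, 'two'))    # http://www.example.com/?s=something
```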

Conclusion

In this article, you have learned several ways to split a string in Python: split(), rfind() or index() with slicing, re.split(), itertools.groupby(), and urllib.parse for URLs. Select the procedure that suits your requirements and implement it as demonstrated in the scenarios above. This brings us to the end of this article; please stay tuned and subscribe for more solutions and interesting discussions.
