Python Freelancing: My First Fiverr Gig and How I Solved It

Basic Webscraping Script in Python | Requests | BeautifulSoup | ArgParse

Sold Gig ($35)

This is the gig description I offered on my profile to get my first gig:

An email marketing company hired me to write a Python script that satisfies the following requirements.

Requirements

  1. What is the input? (file, file type, email, text,…) File with list of email addresses (one per line)
  2. What is the output? (file, file type, text, csv, …) File with all email addresses that are from a disposable email provider:
    https://gist.github.com/michenriksen/8710649
  3. Where does the input come from? (user input from the console, specific path,…) How should the input be processed? Where should the output go to? (console, file,…) File to File
  4. What should the script do if the input contains errors or is incomplete? Ignore line

Code

I recorded a video where I go over the code I developed:

Here’s the code I developed to filter email addresses from spam email providers and clean the email list from fake email addresses.

import requests
import sys
import argparse
from bs4 import BeautifulSoup

"""
Input: Text file containing email addresses, one address per line
Output: A file containing all email address from the input file
whose domain was found in the file under the URL
"""
__author__ = 'lukasrieger'

# constant default settings
URL = 'https://gist.github.com/michenriksen/8710649'
PATH_DOMAINS_LOCAL = 'disposable_domains.txt'
DEFAULT_INPUT = 'emails.txt'
DEFAULT_OUTPUT = 'filtered_emails.txt'



def refresh_domains_file():
    """
    This method gets the disposable domains list from the git repo
    as html and scrapes it. Finally all domains are written to a file.
    """
    html = requests.get(URL).content
    soup = BeautifulSoup(html, features="html.parser")
    tds = soup.findAll('td', class_='js-file-line')
    domains = [td.text + '\n' for td in tds]

    with open(PATH_DOMAINS_LOCAL, 'w') as file:
        file.writelines(domains)
        
    print(f'Refreshed disposable domains file under path {PATH_DOMAINS_LOCAL}')


def get_disposable_domains(refresh=False):
    """
    This method loads the entries from the disposable domains file
    into a list and returns the list. If the parameter refresh=True,
    the file is refreshed with the domains given in the git repo.
    """
    if refresh:
        # load data from git repo
        refresh_domains_file()
    
    domains = None
    with open(PATH_DOMAINS_LOCAL, 'r') as file:
        domains = file.readlines()
    # remove linebreaks
    return [domain[:-1] for domain in domains]


def check_mails(in_path, out_path, refresh=False):
    """
    Loads the list of disposable domains and
    checks each address from the input file for those domains.
    Only if the list of disposable domains contains the email's
    domain, the email address will be added to the outfile.
    """
    disposable_domains = get_disposable_domains(refresh=refresh)
    count = 0
    print(disposable_domains)
    with open(in_path, 'r') as in_file, open(out_path, 'w') as out_file:
        for email in in_file:
            try:
                prefix, suffix = email.split('@')
                #print(prefix, suffix, '|')
            except:
                print(f'Invalid email address: {email}')
                continue
                
            # remove blanks around the suffix
            if suffix.strip() in disposable_domains:
                out_file.write(email)
                count += 1
                
    return count



if __name__ == '__main__':
    print('Filtering emails...')

    parser = argparse.ArgumentParser(description='Filter email addresses by disposable domains.')
    parser.add_argument('-i', type=str, nargs='?', help='Path of input file with the email addresses.')
    parser.add_argument('-o', type=str, nargs='?', help='Path where the output will be put.')
    parser.add_argument('-r', action='store_true', help='Refresh local copy of the disposable domains file.')
    
    args = parser.parse_args()

    path_input = args.i if args.i else DEFAULT_INPUT
    path_output = args.o if args.o else DEFAULT_OUTPUT
    refresh = args.r
   
    try:
        mails_count = check_mails(path_input, path_output, refresh)
        print(f'Copied {mails_count} email addresses to the output file.')
        print('Done.')
    except:
        print(f'Sorry, an unexpected error ({sys.exc_info()[1]}) occurred!\nCall filtermails.py -h for help.')
    

You can run the code with this simple command:

$ python filtermails.py -i emails.txt -o fakeEmails.txt -r

The code is stored in a file named filtermails.py. The first argument emails.txt is the file of email addresses, one email address per line. The second argument is fakeEmail.txt which is the output file where all the fake emails are stored.

Where to Go From Here?

Enough theory, let’s get some practice!

To become successful in coding, you need to get out there and solve real problems for real people. That’s how you can become a six-figure earner easily. And that’s how you polish the skills you really need in practice. After all, what’s the use of learning theory that nobody ever needs?

Practice projects is how you sharpen your saw in coding!

Do you want to become a code master by focusing on practical code projects that actually earn you money and solve problems for people?

Then become a Python freelance developer! It’s the best way of approaching the task of improving your Python skills—even if you are a complete beginner.

Join my free webinar “How to Build Your High-Income Skill Python” and watch how I grew my coding business online and how you can, too—from the comfort of your own home.

Join the free webinar now!