This article focuses on analyzing the coronavirus dataset using Python language.
We are not using any of the Python data analysis libraries. Instead, we’ll use our raw Python skills to write a function, slicing, and indexing.
Also, we’ll use Python arithmetic operators such as sum()
and division.
Finally, we’ll use a lambda expression to perform the traditional looping method.
The Jupyter notebook is the preferred IDE (Integrated Development Environment) to write and execute code samples. The dataset we are using is from the data world website. You can download it from the link below.
Our dataset consists of some empty strings. Firstly, we must clean the dataset before performing arithmetic operations or data analysis.
Python open() and reader() Function
We will use CSV (Comma Separated Values) module to open and read the dataset. The csv
module defines the Python reader
method and other methods.
More on that here π https://docs.python.org/3/library/csv.html.
Letβs import the reader()
function from the python csv
module.
from csv import reader
Now, let’s open and read the coronavirus dataset by running the following code.
open_file = open('daily_coronavirus_full_data.csv') read_file = reader(open_file) list_covid_file = list(read_file)
- A Python
open()
function opens a file and returns our datasets into a variableopen_file
. - We are using the primary usage of the
reader()
function. A reader reads datasets in theopen_file
variable. - And
list_covid_file
displayed the contents of the dataset as a Python list.
Execute the following code:
list_covid_file
Here’s the output:
The above screenshot consists of a list of lists. The first item in the list is the header, followed by the rows of the datasets.
Indexing and Slicing
Now, retrieve any row or rows from the dataset using a slice()
function. Fetch the dataset header with the slice()
function.
Code Sample:
list_covid_file[0:1]
– retrieved a dataset contents from index 0 and end at index 1.- Index 0 is the first row, and index 1 is the second row in the dataset.
- However, the
slice()
function would ignore the index 1. - We used Python
print()
function to visualise the dataset header as it should in the csv file.
Output:
The above screenshot consists of ten different variables in the dataset head.
Python Negative Indexing: Get the last row or last element in the list using a negative index.
Code Sample:
print(list_covid_file[-1])
Output:
Using Python len() Function.
The len()
function returns the row number in the datasets. Letβs retrieve the length of our dataset using the len()
function.
Run:
len(covid_dataset)
The Python len()
function accepted dataset as a parameter, which returns the following output:
153482
Using List Comprehension
List comprehension returns a new iterable such as lists, tuples or strings, and it’s a short version of the traditional looping technique.
Code Sample:
get_row = [x for x in covid_dataset]
- We created a variable
get_row
- List comprehension has two angle brackets that consist of expressions that run each element in the list.
- Then, assign the outcome to the
get_row
variable.
Now, execute get_row
variable.
get_row
Output
You should notice from the above screenshot that we have empty strings (''
or '.'
) in the dataset. The next task is to replace all the empty strings(' '
, '.'
) with '0.0'
.
Replacing Empty Strings – Add the result to the list with an append() function
Code Sample:
The above screenshot is a reusable function.
- We created a custom function that accepts two parameters: dataset and row.
- And declared an empty list
fetch_new_data
. - Then iterate over the coronavirus data and assign row into a variable
dataset_row
. - We check if the row has empty strings (
''
,'.'
) - And if it’s true, assign a value
"0.0"
to all empty strings. - Then, we convert the row from the string into a
float()
. - And add the result
dataset_row
into a listfetch_new_data
using Pythonappend()
function.
Outside the loop, return a new list result fetch_new_data
.
Letβs, create an object of the generic_function
function.
Example Code:
get_dataset = generic_function(covid_dataset, 5)
The generic_function
function accepts two arguments: dataset and row 5, which itβs assigned to a variable get_dataset
.
Execute:
get_dataset
Output:
We replaced all empty strings with 0.0. We can do this repeatedly by checking any row with empty strings and replacing them with 0.0.
Python Arithmetic Operations
Using the sum() function
We will reuse a “generic_function
” function we created in an earlier example. Add the total number of deaths using a built-in Python sum()
function and return the total of all data points.
Code Sample:
get_all_deaths = sum(get_dataset)
- We created a variable called
get_all_deaths
. - The
sum()
function acceptedget_dataset
(object) created fromgeneric_function
. - Then, add all the data points in row 5 and assign it to a variable
get_all_deaths
.
Now run:
get_all_deaths
Output:
Average Number of Deaths Using len() Function
We will compute an average death by dividing the total number of deaths by the total length of row 5 (total death).
Code Sample:
avg_deaths = get_all_deaths/len(get_dataset)
- We created an
avg_deaths
variable. get_all_deaths
is the aggregate of all death in row 5 and divided by the length ofget_dataset
using the Pyhtonlen()
function.- Then, return the result in a variable
avg_death
.
Letβs execute average deaths:
avg_death
Output:
The above screenshot shows the average number of deaths in all countries.
Python round() function
Letβs execute:
round(avg_death, 2) # 47705.73
Rounded average death into 2 decimal points.
Python round()
function returns a floating-point and accepts two parameters; the number to rounded and the decimal place number.
Explore New Cases By Country
Python split() Function
Let’s continue exploring our dataset by analyzing the number of new cases in each country.
Below screenshot is a function that gets new cases of coronavirus by country.
The above function consists of the following:
- The function
new_cases_by_country
has three parameters: the dataset, locations, and new cases rows. - We create a variable empty
number_of_cases_by_contry
that holds the new results list. - We iterated over the coronavirus dataset and assigned location and new case rows to variables
location
&new_cases
. - And check if the location is equal to the country in the location row.
- Also, check if the new cases row has empty strings, and if it does, replace them with the value “0.0”.
- Concatenate location and
new_cases
and separate them with a pipe ("|"
)sign, which would help us manipulate the result later. - And add location and new cases into empty list β
number_of_cases_by_country
β using Python built-inappend()
function. - Then, we use a
split()
function to split the string into a list by splitting the output from the pipe ('|'
) sign. - Then, use a
return
keyword to return a new list ofnumber_of_cases_by_country
.
Let’s create an object of new_cases_by_country
.
Sample Code:
new_cases_by_location = new_cases_by_country(covid_dataset, 1, 2, 'United Kingdom')
- We add three arguments into
new_cases_by_country
function: dataset,new cases, location rows and country. - And assigned it to a variable,
new_cases_by_location
.
Execute the following:
new_cases_by_location
Output:
The above screenshot displayed the number of new cases in the United Kingdom.
We can explore the number of cases in each country using the new_cases_by_country
function.
Python Lambda Expression
Python map() Function
The lambda is an anonymous function that indicates a function without a given name. The following code sample combines python map()
and lambda expression to add all the new coronavirus cases in the United Kingdom.
Code Sample:
- We introduced an anonymous python function known as lambdas expression. Check python documentation for more details 6: expressions β Python 3.10.2 documentation.
- We created a variable
sum_new_cases
that holds all the United Kingdom new cases. - The
map()
function makes it simpler and more efficient to iterate over items. - We pass the lambda function and a list as an argument in a
map()
function. - Then, the variable
x[1]
gets the first index innew_cases_by_location
and converts it from string to Pythonfloat()
, and return a new list result.
Output:
Python max() Function
Python max()
function returns the highest number in the list.
Letβs use the max()
function to retrieve the highest number of new cases reported in the United Kingdom.
Code Sample:
max(sum_new_cases)
The variable “sum_new_cases
“ is the lambda expression object we created in the code sample above.
We passed the “sum_new_cases
” into the max()
function, which returns the highest number in the list.
Output:
The maximum number of cases reported in the United Kingdom is 221222.0
Conclusion
There is a lot of data exploration to cover but, this should furnish you with some primary usage of Python built-in functions, function declaration, and function reusability.
All this should come in handy when analyzing a vast dataset.
You can go beyond the code samples shown in this article and play around with the dataset to showcase your python skills.