Rebecca Nowack, Author at Be on the Right Side of Change

Spearman Rank Correlation in Python

July 1, 2022 by Rebecca Nowack

A Pearson correlation requires normally distributed, metric data. If your data is not normally distributed, or you have ordinal variables (like grades, a Likert scale, or a variable ranked from “low” to “high”), you can still calculate a correlation with the Spearman rank correlation. This can be done … Read more

Normal Distribution and Shapiro-Wilk Test in Python

June 4, 2022 by Rebecca Nowack

A normal distribution is a statistical prerequisite for parametric tests such as Pearson’s correlation, t-tests, and regression. You can check for normality visually with sns.displot(x, kde=True). The quickest way to run the Shapiro-Wilk test for normality is pingouin’s pg.normality(x). 💡 Note: Several publications note that normal distribution is the least important prerequisite for parametric tests and … Read more

Pearson Correlation in Python

June 4, 2022 by Rebecca Nowack

A good way to calculate Pearson’s r and its p-value (to report the significance of the correlation) in Python is scipy.stats.pearsonr(x, y). For a nice overview of the results, use pingouin’s pg.corr(x, y). What is Pearson’s “r” Measure? A statistical correlation with Pearson’s r measures the linear relationship between two numerical variables. The correlation coefficient r … Read more
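
To make the definition of r concrete, here is a minimal sketch that computes the coefficient directly from its textbook formula using only the standard library. It is an illustration, not the article's code. The function name pearson_r and the sample data are hypothetical, and unlike scipy.stats.pearsonr this sketch returns no p-value.

```python
from math import sqrt
from statistics import fmean


def pearson_r(x, y):
    """Pearson's r: the co-variation of x and y divided by the product of their spreads."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = fmean(x), fmean(y)
    # Sum of products of deviations from the means (numerator).
    co_dev = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations (denominator terms).
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    return co_dev / sqrt(ss_x * ss_y)


# Hypothetical data with a perfect positive linear relationship.
hours = [1.0, 2.0, 3.0, 4.0]
score = [2.0, 4.0, 6.0, 8.0]
r = pearson_r(hours, score)
print(r)
```

The result always lies between -1 and 1; values near either end indicate a strong linear relationship.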

How to Calculate z-scores in Python?

May 28, 2022 by Rebecca Nowack

The z-scores can be used to compare data measured on different scales and to normalize data for machine learning algorithms. 💡 Note: There are different methods to calculate the z-score. The quickest and easiest one is: scipy.stats.zscore(). What is the z-score? The z-score is used for normalization or standardization to make differently scaled … Read more
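
As a small sketch of what a z-score is, the example below subtracts the mean from each value and divides by the standard deviation, using only the standard library. The function name z_scores and the sample data are made up for illustration. It uses the population standard deviation, which matches the default (ddof=0) of scipy.stats.zscore().

```python
from statistics import fmean, pstdev


def z_scores(values):
    """Return (x - mean) / std for every x, using the population standard deviation."""
    mean = fmean(values)
    std = pstdev(values)
    if std == 0:
        raise ValueError("z-scores are undefined when all values are identical")
    return [(x - mean) / std for x in values]


# Hypothetical measurements on two very different scales.
heights_cm = [160.0, 170.0, 180.0]
salaries = [30000.0, 45000.0, 60000.0]

z_heights = z_scores(heights_cm)
z_salaries = z_scores(salaries)
print(z_heights)
print(z_salaries)
```

After standardization both variables have mean 0 and standard deviation 1, so they can be compared directly even though the raw units differ.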

The Ultimate Guide to Data Cleaning in Python and Pandas

May 18, 2022 by Rebecca Nowack

What is Data Cleaning? Data cleaning describes the process of turning messy data into clean datasets that can be used for research and data science purposes. For example, tidy data will be in a wide format: every column contains a variable, and every row contains one case. Also, data cleaning means getting rid of corrupt … Read more

Easy Exploratory Data Analysis (EDA) in Python with Visualization

May 1, 2022 by Rebecca Nowack

With Exploratory Data Analysis (EDA) functions in Python, it is easy to get a quick overview of a dataset. The goal of EDA is a statistical summary and graphical visualization of a dataset. This helps to discover patterns and missing values, and to extract further information for statistical modeling. The first step in the data … Read more