Spearman Rank Correlation in Python

Prerequisites for a Pearson correlation are normally distributed data and metric variables. If your data is not normally distributed, or you have variables with ordinal data (like grades, a Likert scale, or a ranked variable from “low” to “high”), you can still calculate a correlation with the Spearman rank correlation. This can be done …
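A minimal sketch of a Spearman rank correlation, assuming SciPy's `scipy.stats.spearmanr` is available. The survey data below is hypothetical and only illustrates ordinal variables. The second part shows why the method works on ordinal data: Spearman's rho is Pearson's r computed on the ranks instead of the raw values.

```python
from scipy import stats

# Hypothetical ordinal survey data (not from the article):
# satisfaction on a 1-5 Likert scale, grade on a 1 (best) to 5 (worst) scale
satisfaction = [1, 2, 2, 3, 4, 4, 5, 5]
grade = [5, 4, 4, 3, 3, 2, 1, 1]

# Spearman's rho and the p-value for the test of "no monotonic association"
rho, p_value = stats.spearmanr(satisfaction, grade)
print(f"Spearman rho = {rho:.3f}, p = {p_value:.4f}")

# Cross-check: Pearson's r on the ranks gives the same coefficient
# (tied values receive the average of the ranks they occupy)
r_on_ranks, _ = stats.pearsonr(stats.rankdata(satisfaction), stats.rankdata(grade))
print(f"Pearson r on ranks = {r_on_ranks:.3f}")
```

Both printed coefficients should agree. Here the association is strongly negative because higher satisfaction goes together with a better (lower) grade.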

Normal Distribution and Shapiro-Wilk Test in Python

Normal distribution is a statistical prerequisite for parametric tests like Pearson’s correlation, t-tests, and regression. Normality can be checked visually with seaborn’s sns.displot(x, kde=True). The quickest way to run a Shapiro-Wilk test for normality is pingouin’s pg.normality(x). 💡 Note: Several publications note that normal distribution is the least important prerequisite for parametric tests and …
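A rough sketch of the Shapiro-Wilk test using `scipy.stats.shapiro` instead of pingouin, to keep the dependencies minimal. The helper `check_normality` and the simulated samples are hypothetical. The null hypothesis is that the sample comes from a normal distribution, so a p-value below alpha suggests the data is not normal; a large p-value does not prove normality.

```python
import numpy as np
from scipy import stats


def check_normality(sample, alpha=0.05):
    """Hypothetical helper: Shapiro-Wilk test, compared against alpha."""
    w_stat, p_value = stats.shapiro(sample)
    # H0: the sample is drawn from a normal distribution.
    # p < alpha -> reject H0; p >= alpha -> no evidence against normality.
    return w_stat, p_value, p_value >= alpha


# Simulated data (assumption, not from the article)
rng = np.random.default_rng(42)
normal_sample = rng.normal(loc=50, scale=10, size=200)
skewed_sample = rng.exponential(scale=10, size=200)

for name, sample in [("normal", normal_sample), ("skewed", skewed_sample)]:
    w, p, looks_normal = check_normality(sample)
    print(f"{name}: W = {w:.3f}, p = {p:.4f}, normal at alpha=0.05: {looks_normal}")
```

The strongly right-skewed exponential sample should be flagged as non-normal, while the result for the normal sample depends on the random draw.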

Pearson Correlation in Python

A good way to calculate Pearson’s r in Python, together with the p-value needed to report the significance of the correlation, is scipy.stats.pearsonr(x, y). pingouin’s pg.corr(x, y) gives a nice overview of the results. What is Pearson’s “r” Measure? A statistical correlation with Pearson’s r measures the linear relationship between two numerical variables. The correlation coefficient r …
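A short sketch with `scipy.stats.pearsonr(x, y)`, the function named above. The study-hours data is made up for illustration. The manual calculation shows what r is: the sum of cross-products of deviations from the mean, divided by the square root of the product of the summed squared deviations.

```python
import numpy as np
from scipy import stats

# Hypothetical example data: hours studied vs. exam score
hours = [1, 2, 3, 4, 5, 6, 7, 8]
score = [52, 55, 61, 60, 68, 70, 75, 79]

# Pearson's r and the two-sided p-value
r, p_value = stats.pearsonr(hours, score)
print(f"Pearson r = {r:.3f}, p = {p_value:.6f}")

# Manual cross-check of the formula
x = np.asarray(hours, dtype=float)
y = np.asarray(score, dtype=float)
dx, dy = x - x.mean(), y - y.mean()
r_manual = np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2))
print(f"manual r = {r_manual:.3f}")
```

Both values should match; with this nearly linear data r is close to +1.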

How to Calculate z-scores in Python?

Z-scores can be used to compare data measured on different scales and to normalize data for machine learning algorithms. 💡 Note: There are different methods to calculate the z-score. The quickest and easiest one is scipy.stats.zscore(). What is the z-score? The z-score is used for normalization or standardization to make differently scaled …
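A minimal sketch comparing `scipy.stats.zscore()` with the formula z = (x − mean) / standard deviation. The data array is illustrative. One detail worth knowing: `zscore` uses the population standard deviation (`ddof=0`) by default, and `ddof=1` switches to the sample standard deviation.

```python
import numpy as np
from scipy import stats

# Illustrative data: mean = 5, population standard deviation = 2
data = np.array([2, 4, 4, 4, 5, 5, 7, 9], dtype=float)

# SciPy's z-scores (ddof=0 by default: population standard deviation)
z_scipy = stats.zscore(data)

# Same result by hand; np.std also defaults to ddof=0
z_manual = (data - data.mean()) / data.std()

# Values: -1.5, -0.5, -0.5, -0.5, 0, 0, 1, 2
print(z_scipy)

# Variant using the sample standard deviation (ddof=1)
z_sample = stats.zscore(data, ddof=1)
print(z_sample)
```

After standardization the values have mean 0 and standard deviation 1, so variables measured in different units become directly comparable.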

Easy Exploratory Data Analysis (EDA) in Python with Visualization

Exploratory Data Analysis (EDA) functions in Python make it easy to get a quick overview of a dataset. The goal of EDA is to summarize a dataset statistically and to visualize it graphically. This helps to discover patterns and missing values and to extract further information for statistical modeling. The first step in the data …
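A small pandas sketch of the statistical-summary side of EDA. The `quick_eda` helper and the toy DataFrame are hypothetical and not the article's own functions. Plotting is left as a commented-out line because it needs matplotlib.

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataset with two missing values and one categorical column
df = pd.DataFrame({
    "age": [23, 35, 31, np.nan, 46],
    "score": [3.1, 4.5, 2.8, 3.9, np.nan],
    "group": ["a", "b", "a", "b", "a"],
})


def quick_eda(frame):
    """Hypothetical helper: collect a basic statistical overview."""
    return {
        "shape": frame.shape,                      # (rows, columns)
        "dtypes": frame.dtypes,                    # data type per column
        "summary": frame.describe(),               # count, mean, std, min, quartiles, max
        "missing": frame.isna().sum(),             # missing values per column
        "n_unique_non_numeric": frame.select_dtypes(exclude="number").nunique(),
    }


report = quick_eda(df)
for key, value in report.items():
    print(f"--- {key} ---")
    print(value)

# Graphical overview (optional, requires matplotlib): histograms of numeric columns
# df.hist()
```

The `count` row of `describe()` already hints at missing data; `isna().sum()` makes it explicit per column.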