# Data Science

## What’s the Best NumPy Book?

Fear of missing out in data science? Data science and machine learning are taking over. Data-driven decision making penetrates every single company nowadays. Data science is indeed the “sexiest job in the 21st century”! There is one Python library which is the basis of any data-science-related computation you can undertake as a Python …

## np.nonzero() – A Simple Guide with Video

This article first explains how the NumPy nonzero() function works. It then applies the function to a practical data science problem: finding array elements that meet a condition. Syntax: numpy.nonzero(a). The np.nonzero(arr) function returns the indices of the elements of an array or Python …
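
As a rough illustration of the behavior described above, here is a minimal sketch. The sample array and the threshold of 4 are made up for this example and do not come from the article.

```python
import numpy as np

# Hypothetical sample data, chosen only for illustration
arr = np.array([0, 3, 0, 7, 5])

# np.nonzero returns a tuple with one index array per dimension
indices = np.nonzero(arr)
idx = indices[0]        # positions of the non-zero entries
values = arr[indices]   # the non-zero entries themselves

# Applied to a Boolean condition, it yields the positions where the condition holds
above_four = np.nonzero(arr > 4)[0]

print(idx, values, above_four)
```

Because the result is a tuple of index arrays, it can be passed straight back into the array as an index, which is what `arr[indices]` does here.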

## How to Change the Figure Size for a Seaborn Plot?

Seaborn is a comprehensive data visualization library for plotting statistical graphs in Python. It provides attractive default styles and color schemes for statistical plots. Seaborn is built on top of the matplotlib library and is closely integrated with pandas data structures. How to change …

## How to Select Multiple Columns in Pandas

The easiest way to select multiple columns in Pandas is to pass a list into the standard square-bracket indexing scheme. For example, the expression df[['Col_1', 'Col_4', 'Col_7']] would access columns 'Col_1', 'Col_4', and 'Col_7'. This is the most flexible and concise way for only a couple of columns. To learn about the three best ways …
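
The list-based selection above can be sketched as follows. The DataFrame contents are invented for this example; only the column names 'Col_1', 'Col_4', and 'Col_7' come from the text.

```python
import pandas as pd

# Hypothetical DataFrame; the values and the extra column 'Col_2' are illustrative
df = pd.DataFrame({
    'Col_1': [1, 2],
    'Col_2': [3, 4],
    'Col_4': [5, 6],
    'Col_7': [7, 8],
})

# Passing a list of column names inside the square brackets returns a DataFrame
subset = df[['Col_1', 'Col_4', 'Col_7']]

print(subset)
```

Note the double brackets: the outer pair is the indexing operator and the inner pair is the Python list of column names.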

## Python – Inverse of Normal Cumulative Distribution Function (CDF)

Problem Formulation: How to calculate the inverse of the normal cumulative distribution function (CDF) in Python? Method 1: scipy.stats.norm.ppf(). In Excel, NORMSINV is the inverse of the CDF of the standard normal distribution. In Python’s SciPy library, the ppf() method of the scipy.stats.norm object is the percent point function, which is another name for the …
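
A minimal sketch of the ppf() call described above, assuming SciPy is installed. The probability 0.975 is an arbitrary value picked for illustration.

```python
from scipy.stats import norm

# Illustrative probability, not taken from the article
p = 0.975

# ppf is the inverse of the CDF of the standard normal distribution
z = norm.ppf(p)       # roughly 1.96

# Round trip: applying the CDF to z should recover p
back = norm.cdf(z)

print(z, back)
```

With no extra arguments, `norm` refers to the standard normal distribution (mean 0, standard deviation 1), which matches what NORMSINV computes in Excel.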

## NumPy Broadcasting – A Simple Tutorial

Broadcasting describes how NumPy automatically brings two arrays with different shapes to a compatible shape during arithmetic operations. Generally, the smaller array is “repeated” multiple times until both arrays have the same shape. Broadcasting is memory-efficient as it doesn’t actually copy the smaller array multiple times. Here’s a minimal example: Let’s have a more gentle …

## Logistic Regression in Python Scikit-Learn

Logistic regression is a popular algorithm for classification problems (despite its name indicating that it is a “regression” algorithm). It is one of the most important algorithms in the machine learning space. Linear Regression Background: Let’s review linear regression. Given the training data, we compute a line that fits this training data so that …

## How to Convert a Boolean Array to an Integer Array in Python?

Problem Formulation: Given a NumPy array consisting of Boolean values, how do you convert it to an integer array? Convert each True value to the integer 1 and each False value to the integer 0. Here’s an example Boolean array: What you want is the following integer array: Let’s examine some methods to accomplish this easily. Method …
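
The excerpt cuts off before naming its methods, so the following is only one common approach, not necessarily the one the article presents. It uses NumPy's astype() on a made-up Boolean array.

```python
import numpy as np

# Hypothetical Boolean array, chosen only for illustration
bool_arr = np.array([True, False, True, True])

# astype(int) maps True to 1 and False to 0 and returns a new array
int_arr = bool_arr.astype(int)

print(int_arr)
```

The original array is left unchanged, because astype() returns a converted copy.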

## How to Concatenate Two NumPy Arrays?

Problem Formulation: Given two NumPy arrays a and b, how do you concatenate them? Method 1: np.concatenate(). NumPy’s concatenate() function joins a sequence of arrays along an existing axis. The arrays to join are passed together as a single sequence, such as a tuple or list, in the first argument. If you use the axis argument, you can specify along which axis the arrays should be joined. For …
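
A short sketch of np.concatenate() with and without the axis argument. The arrays a and b follow the naming in the text; their values, and the third array c, are invented for this example.

```python
import numpy as np

# Hypothetical arrays; only the names a and b come from the text
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6]])

# Default axis=0: b is appended as an extra row
rows = np.concatenate((a, b))

# axis=1: the arrays are joined side by side, so the row counts must match
c = np.array([[7], [8]])
cols = np.concatenate((a, c), axis=1)

print(rows)
print(cols)
```

All arrays must agree in every dimension except the one being joined, which is why b has two columns and c has two rows.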

## Division in Python

The double-frontslash // operator performs integer division and the single-frontslash / operator performs float division. An example for integer division is 40//11 = 3. An example for float division is 40/11 = 3.6363636363636362. A crucial lesson you need to master as a programmer is “division in Python”. What does it mean to divide in Python? …

## [Tutorial] K-Means Clustering with SKLearn in One Line

If there is one clustering algorithm you need to know – whether you are a computer scientist, data scientist, or machine learning expert – it’s the K-Means algorithm. In this tutorial drawn from my book Python One-Liners, you’ll learn the general idea and when and how to use it in a single line of Python …

## Smoothing Your Data with the Savitzky-Golay Filter and Python

This article deals with signal processing. More precisely, it shows how to smooth a data set that contains fluctuations, so that the resulting signal is easier to understand and analyze. To smooth a data set, we need to use a filter, i.e. a mathematical procedure that allows …