## What is a Recommendation System?

If you use Netflix or Amazon, you have already seen the results of recommendation systems: movie or item suggestions that fit your taste or needs. At its core, a recommendation system is a statistical algorithm that computes similarities based on previous choices or on item features, and uses them to suggest which movie a user might want to watch or what else they might want to buy.

## How Does a Recommendation System Work?

Assume that persons A and B both like movie M1, and person A also likes movie M2. We might then conclude that person B will probably like M2 as well. That is very little data and therefore a rather imprecise prediction; a real-world application would need much more data to make good recommendations. Yet the example illustrates the core idea: recommendation algorithms that infer a user's preferences from the preferences of users with similar taste are called **collaborative filtering**.
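To make the A/B example concrete, here is a minimal sketch in plain Python. It assumes each user is represented only by the set of movies they liked; the names `likes` and `recommend_for` are hypothetical. Users who share liked movies with the target user "vote" for the movies the target user hasn't seen yet, and each vote is weighted by how many liked movies the two users share.

```python
from collections import Counter

# Hypothetical toy data: each user maps to the set of movies they liked.
likes = {
    "A": {"M1", "M2"},
    "B": {"M1"},
}

def recommend_for(user, likes):
    """Suggest movies liked by users whose tastes overlap with `user`'s."""
    seen = likes[user]
    scores = Counter()
    for other, other_likes in likes.items():
        if other == user:
            continue
        overlap = len(seen & other_likes)  # number of liked movies in common
        if overlap == 0:
            continue  # no evidence that this user has a similar taste
        for movie in other_likes - seen:
            scores[movie] += overlap  # weight the vote by the shared taste
    return [movie for movie, _ in scores.most_common()]

print(recommend_for("B", likes))  # ['M2']
```

With real data, the overlap count would usually be replaced by a proper similarity measure such as a correlation of ratings, which is exactly where *corrwith()* comes in later.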

Another popular way to recommend items is so-called **content-based filtering**, which computes recommendations from the similarity of the items (here, movies) themselves. For movies, we could compare features such as genre, actors, … to compute similarity.

If a user liked a given movie, the probability is high that the user will also like similar movies. Thus, it makes sense to recommend movies with a high similarity to those the user liked.
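Here is a minimal content-based sketch, assuming each movie is described by a hypothetical set of feature tags. It measures similarity with the Jaccard index (the number of shared features divided by the number of all features of the two movies), which is just one of many possible similarity measures, and ranks the other movies by it.

```python
# Hypothetical feature tags for a few movies.
features = {
    "Spider Man": {"action", "superhero"},
    "James Bond": {"action", "spy"},
    "Titanic": {"romance", "drama"},
}

def jaccard(a, b):
    """Share of features two movies have in common (0.0 = none, 1.0 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def similar_movies(liked, features):
    """Rank all other movies by feature similarity to the liked one."""
    target = features[liked]
    scores = {
        title: jaccard(target, feats)
        for title, feats in features.items()
        if title != liked
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

print(similar_movies("Spider Man", features))
# [('James Bond', 0.3333333333333333), ('Titanic', 0.0)]
```

A user who liked Spider Man would therefore be shown James Bond first, because the two movies share the "action" tag.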

## Implementing a Recommendation System

If you want to understand the code below better, make sure to **sign up for our free email course “Introduction to Pandas and Data Science”** on our Email Academy. Throughout the course, we develop a recommendation system for movies. At its core, there is the method *corrwith()* from the Pandas library.
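To give a rough idea of how *corrwith()* can sit at the core of such a system, here is a minimal item-based sketch. It is not the course's code: it assumes a small user × movie ratings table without missing values, and the function `similar_to` is hypothetical. Movies whose ratings rise and fall together with the chosen movie's ratings are ranked highest.

```python
import pandas as pd

# Hypothetical ratings table: one row per user, one column per movie.
ratings = pd.DataFrame({
    "Spider Man": [3.5, 1.0, 4.5, 5.0],
    "James Bond": [1.0, 2.5, 5.0, 4.0],
    "Titanic": [5.0, 4.5, 1.0, 2.0],
})

def similar_to(movie, ratings, top_n=5):
    """Rank the other movies by how strongly their ratings correlate with `movie`'s."""
    # Correlate every movie column with the chosen movie's column (column-wise, axis=0).
    scores = ratings.corrwith(ratings[movie])
    # A movie always correlates perfectly with itself, so remove it before ranking.
    scores = scores.drop(movie)
    return scores.sort_values(ascending=False).head(top_n)

print(similar_to("Spider Man", ratings))
```

With these four users, James Bond (about 0.51) ranks above Titanic (about -0.70) for fans of Spider Man.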


## How to Use the Pandas corrwith() Method?

The Pandas DataFrame offers the method *corrwith()*, which computes pairwise correlations between two DataFrames or between a DataFrame and a Series. With the parameter axis, you can compute correlations either column-wise or row-wise. Its signature is `DataFrame.corrwith(other, axis=0, drop=False, method='pearson')`: only other is required, while the remaining parameters have default values. Newer pandas versions also accept a numeric_only parameter.

**The arguments in detail:**

1.) other: A Series or DataFrame with which to compute the correlation.

2.) axis: Pass 0 or ‘index’ to compute correlations column-wise, 1 or ‘columns’ for row-wise.

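3.) drop: If True, labels that are not present in both objects are dropped from the result. By default (False), they are kept and their correlation is NaN.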

4.) method: The algorithm used to compute the correlation. Either pass one of the strings ‘pearson’ (the default), ‘kendall’ or ‘spearman’, or implement your own measure and pass it as a callable that takes two 1-D NumPy arrays and returns a float.
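The following sketch, on two hypothetical toy tables, shows other as a DataFrame, the effect of drop, and a callable method. The callable `rank_corr` is our own hand-rolled Spearman-style measure (the Pearson correlation of the ranks); passing the string ‘spearman’ would use pandas' built-in version instead, which relies on SciPy being installed.

```python
import numpy as np
import pandas as pd

# Hypothetical tables that share the column labels "a" and "b" only.
left = pd.DataFrame({"a": [1, 2, 3, 4], "b": [4, 3, 2, 1], "c": [1, 3, 2, 4]})
right = pd.DataFrame({"a": [2, 4, 6, 8], "b": [1, 2, 3, 4], "d": [0, 1, 0, 1]})

# DataFrame vs. DataFrame: columns with matching labels are correlated pairwise.
# "c" and "d" exist in only one table, so they come back as NaN.
print(left.corrwith(right))

# drop=True keeps only the labels present in both tables ("a" and "b").
print(left.corrwith(right, drop=True))

def rank_corr(x, y):
    """Spearman-style correlation: Pearson correlation of the ranks of x and y."""
    return float(np.corrcoef(pd.Series(x).rank(), pd.Series(y).rank())[0, 1])

# A callable receives two 1-D arrays per column pair and must return a float.
print(left.corrwith(right, method=rank_corr))
```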

Here is a practical example:

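```python
import pandas as pd

# Each key is a movie; each list holds that movie's ratings from four users.
ratings = {
    'Spider Man': [3.5, 1.0, 4.5, 5.0],
    'James Bond': [1.0, 2.5, 5.0, 4.0],
    'Titanic': [5.0, 4.5, 1.0, 2.0]
}

# Ratings of a new movie by the same four users, in the same order.
new_movie_ratings = pd.Series([2.0, 2.5, 5.0, 3.5])

all_ratings = pd.DataFrame(ratings)

# Correlate the new movie's ratings with every movie column.
print(all_ratings.corrwith(new_movie_ratings))
```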

From a given dictionary of lists (ratings), we create a DataFrame with three columns and four rows. Each column holds one movie's ratings from all four users.

The Series new_movie_ratings holds the four users' ratings of a new movie.

Calling *corrwith()* on the DataFrame correlates the new ratings with the ratings of each existing movie.

The output of the snippet above is:

```
Spider Man    0.566394
James Bond    0.953910
Titanic      -0.962312
dtype: float64
```

As you can see, the new movie correlates most strongly with the James Bond movie. This means that a recommendation system working purely on ratings should recommend the James Bond movie to users who liked the new movie.
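To turn the correlations into an actual recommendation list, one simple option is to sort them in descending order. The sketch below repeats the toy data so that it runs on its own; keeping only positive correlations is our own assumption, not a fixed rule.

```python
import pandas as pd

all_ratings = pd.DataFrame({
    'Spider Man': [3.5, 1.0, 4.5, 5.0],
    'James Bond': [1.0, 2.5, 5.0, 4.0],
    'Titanic': [5.0, 4.5, 1.0, 2.0]
})
new_movie_ratings = pd.Series([2.0, 2.5, 5.0, 3.5])

correlations = all_ratings.corrwith(new_movie_ratings)

# Highest correlation first: the top entry is the strongest candidate.
ranking = correlations.sort_values(ascending=False)
print(ranking.index.tolist())  # ['James Bond', 'Spider Man', 'Titanic']

# Assumed cut-off: a negative score suggests opposite taste, so skip those movies.
print(ranking[ranking > 0].index.tolist())  # ['James Bond', 'Spider Man']

# The single best match.
print(correlations.idxmax())  # James Bond
```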

Yet, what exactly is correlation?

## What is Correlation?

Correlation describes the statistical relationship between two variables, that is, how they move in relation to one another. A correlation coefficient is a value between -1 and +1. **However, correlation is not causation!**
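By default, *corrwith()* computes the Pearson correlation coefficient. For paired values $x_i, y_i$ ($i = 1, \dots, n$) with means $\bar{x}$ and $\bar{y}$, it is

$$
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
$$

A from-scratch sketch of this formula in plain Python (the function name `pearson` is ours), applied to the Spider Man ratings and the new movie's ratings from the example above, reproduces the 0.566394 that *corrwith()* reported:

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [xi - mean_x for xi in x]  # deviations from the mean of x
    dy = [yi - mean_y for yi in y]  # deviations from the mean of y
    numerator = sum(a * b for a, b in zip(dx, dy))
    # Undefined (division by zero) if either sequence is constant.
    denominator = sqrt(sum(a * a for a in dx)) * sqrt(sum(b * b for b in dy))
    return numerator / denominator

spider_man = [3.5, 1.0, 4.5, 5.0]
new_movie = [2.0, 2.5, 5.0, 3.5]
print(round(pearson(spider_man, new_movie), 6))  # 0.566394
```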

There are three types of correlation:

**Positive correlation:**

A positive correlation is a value in the range 0.0 < c <= 1.0. A correlation of 1.0 means that whenever the first variable moves up, the second one moves up as well (for Pearson correlation, along a straight line). The lower the value, the weaker this relationship.

**Negative correlation:**

A negative correlation is a value in the range -1.0 <= c < 0.0. Negative correlation means that the two variables behave in opposite ways: if the first one moves up, the second one tends to move down.

**Zero or no correlation:**

A correlation of zero means there is no linear relationship (for Pearson) or no monotonic relationship (for Spearman and Kendall) between the two variables: knowing that the first variable moves up tells you nothing about whether the second one moves up or down. The variables may still be related in a non-linear way.
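To see the three cases side by side, here is a small sketch on made-up data using pandas' Series.corr(), which also uses Pearson by default. The last case shows that a zero Pearson correlation does not rule out a relationship: y = x² depends entirely on x, just not linearly.

```python
import pandas as pd

x = pd.Series([-2, -1, 0, 1, 2])

print(x.corr(2 * x + 1))  # positive: close to +1, y rises with x
print(x.corr(-3 * x))     # negative: close to -1, y falls as x rises
print(x.corr(x ** 2))     # zero: y depends on x, but not linearly
```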