## Preparation

Before any data manipulation can occur, four (4) new libraries will require installation.

- The *Pandas* library enables access to/from a *DataFrame*.
- The *NumPy* library supports multi-dimensional arrays and matrices in addition to a collection of mathematical functions.
- The *Matplotlib* library displays a visual graph of a plotted dataset.
- The *SciPy* library allows users to manipulate and visualize the data.

To install these libraries, navigate to an IDE terminal. At the command prompt (`$`), execute the code below. For the terminal used in this example, the command prompt is a dollar sign (`$`). Your terminal prompt may be different.

    $ pip install pandas

Hit the <Enter> key on the keyboard to start the installation process.

    $ pip install numpy

Hit the <Enter> key on the keyboard to start the installation process.

    $ pip install matplotlib

Hit the <Enter> key on the keyboard to start the installation process.

    $ pip install scipy

Hit the <Enter> key on the keyboard to start the installation process.

If the installations were successful, a message displays in the terminal indicating the same.
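To double-check the result from Python rather than from the terminal output, a short script can report which of the four packages are present. This is a minimal sketch using the standard library's `importlib.metadata`; the helper name `installed_versions` is made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version

# Package names as installed with pip above.
REQUIRED = ["pandas", "numpy", "matplotlib", "scipy"]


def installed_versions(names=REQUIRED):
    """Return {package name: version string, or None if it is not installed}."""
    report = {}
    for name in names:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    for name, found in installed_versions().items():
        print(f"{name}: {found or 'NOT INSTALLED'}")
```

Any package reported as `NOT INSTALLED` needs its `pip install` command run again.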

Feel free to view the PyCharm installation guide for the required libraries.

- How to install Pandas on PyCharm
- How to install NumPy on PyCharm
- How to install Matplotlib on PyCharm
- How to install Scipy on PyCharm

Add the following code to the top of each code snippet. This snippet will allow the code in this article to run error-free.

    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    import scipy

## DataFrame Plot Hexbin

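The `DataFrame.plot.hexbin()` method plots the relationship between two (2) numeric values. It is most useful when there is a large number of data points that would overlap in a scatter plot. Instead of drawing each point, the chart splits into hexagonal bins (`hexbins`), and each bin is colored by the points that fall inside it.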

💡 **Note**: With the default colormap, the darker the color hue, the more concentrated the points.

The syntax for this method is as follows:

    DataFrame.plot.hexbin(x, y, C=None, reduce_C_function=None, gridsize=None, **kwargs)

Parameter | Description |
---|---|
`x` | This parameter is a column label/position for x-points. |
`y` | This parameter is a column label/position for y-points. |
`C` | A column integer/string representing the value of an (x, y) point. |
`reduce_C_function` | This function reduces multiple values in a bin to a single value. |
`gridsize` | The number of hexagons in the x-direction. Grid size can also be a tuple with two (2) elements indicating x-y numbers. |
`**kwargs` | Keywords documented in `DataFrame.plot()`. |
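To see how the two forms of `gridsize` affect the chart, the sketch below draws the same data twice: once with an integer and once with a tuple. The data is randomly generated for illustration (an assumption, not the article's dataset), and the names `df_demo`, `ax_left`, and `ax_right` are made up.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic data for illustration only (not the article's dataset).
rng = np.random.default_rng(seed=0)
df_demo = pd.DataFrame({
    "x": rng.normal(size=2000),
    "y": rng.normal(size=2000),
})

fig, (ax_left, ax_right) = plt.subplots(ncols=2, figsize=(10, 4))

# Integer gridsize: 15 hexagons in the x-direction.
df_demo.plot.hexbin(x="x", y="y", gridsize=15, ax=ax_left)
ax_left.set_title("gridsize=15")

# Tuple gridsize: 30 hexagons in the x-direction, 8 in the y-direction.
df_demo.plot.hexbin(x="x", y="y", gridsize=(30, 8), ax=ax_right)
ax_right.set_title("gridsize=(30, 8)")

plt.show()
```

A larger grid size gives smaller hexagons and finer detail, at the cost of fewer points per bin.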

For this example, we have a CSV file containing the Sacramento, California, real-estate sales transactions over a five (5) day span. In addition, a **Hexbin** chart displays the square footage and house prices.

    df = pd.read_csv('real-estate.csv', usecols=['sq__ft', 'price'])
    ax = plt.gca()
    ax = df.plot.hexbin(x='sq__ft', y='price', gridsize=20, ax=ax)
    plt.show()

- Line [1] reads in two (2) columns from a comma-delimited CSV file and saves them to `df`.
- Line [2] gets the current axes (`gca()`) and saves it to `ax`.
- Line [3] does the following:
  - plots the **Hexbin** chart based on square footage and house prices
  - sets the grid size to 20
  - sets the ax variable created above
- Line [4] displays the **Hexbin** chart on-screen.

**Output**

The buttons on the bottom left can be used to further manipulate the chart.

💡 **Note**: Another way to create this chart is with the `plot()` method and the kind parameter set to the `'hexbin'` option.
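The alternative mentioned in the note can be sketched as follows. Because the CSV file is not included here, the example uses randomly generated stand-in values for square footage and price; the DataFrame name `homes` and the numbers are assumptions for illustration only.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in data: random square footage and prices.
rng = np.random.default_rng(seed=1)
sq_ft = rng.uniform(500, 4000, size=500)
price = sq_ft * 150 + rng.normal(0, 40000, size=500)
homes = pd.DataFrame({"sq__ft": sq_ft, "price": price})

# Same chart type, requested through plot() with kind='hexbin'.
ax = homes.plot(kind="hexbin", x="sq__ft", y="price", gridsize=20)
plt.show()
```

Both spellings accept the same keyword arguments, so the choice between them is a matter of style.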

This example uses the NumPy library to plot random numbers using Hexbin.

    n = 900
    x = np.random.uniform(-3, 3, size=n)
    y = np.random.uniform(20, 80, size=n)
    ob = np.random.randint(1, 5, size=n)
    df = pd.DataFrame({'x': x, 'y': y, 'ob': ob})
    ax = df.plot.hexbin(x='x', y='y', reduce_C_function=np.sum, gridsize=10, cmap="plasma")
    plt.show()

- Line [1] sets the number of data points to 900 and saves it to `n`.
- Line [2-3] uses `np.random.uniform` to evenly distribute numbers across a specified range.
- Line [4] uses `np.random.randint` to return random integers from the specified range (the upper bound is excluded).
- Line [5] creates a DataFrame based on the variables created above and saves it to `df`.
- Line [6] does the following:
  - plots the **Hexbin** chart based on the columns `x` and `y`
  - passes `np.sum` as `reduce_C_function`; this function only takes effect when a `C` column is also supplied, so here each bin is colored by its point count
  - sets the grid size to 10
  - sets the colormap (`cmap`) to plasma
- Line [7] displays the **Hexbin** chart on-screen.

**Output**

The buttons on the bottom left can be used to further manipulate the chart.

💡 **Note**: Another way to create this chart is with the `plot()` method and the kind parameter set to the `'hexbin'` option.
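In the example above, the `ob` column is created but never passed to the chart, and Matplotlib only applies `reduce_C_function` when `C` is given. The sketch below shows how the two parameters fit together, assuming the intent was to sum the `ob` values in each bin (an assumption, not stated in the original). The DataFrame name `obs` and the seeded generator are made up for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

n = 900
rng = np.random.default_rng(seed=42)  # seeded so the chart is reproducible
obs = pd.DataFrame({
    "x": rng.uniform(-3, 3, size=n),
    "y": rng.uniform(20, 80, size=n),
    "ob": rng.integers(1, 5, size=n),  # integers from 1 to 4
})

# C='ob' supplies one value per point; np.sum adds those values within each bin.
ax = obs.plot.hexbin(x="x", y="y", C="ob", reduce_C_function=np.sum,
                     gridsize=10, cmap="plasma")
plt.show()
```

With `C` set, the color of each hexagon reflects the summed `ob` values rather than the number of points. Swapping `np.sum` for `np.mean` or `np.max` changes how the values in a bin are combined.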

