Plotting Vector Fields and Gradients for ANN Gradient Descent

👉 This is a follow-up article to Gradient Descent in Neural Nets – A Simple Guide to ANN Learning – Finxter, where a lightweight introduction to Gradient Descent is given.

In this article, you will learn how to produce the graphs in that article, especially the vector fields!

Data visualization is an enlightening task in Exploratory Data Analysis and is based on finding relations and structure in the available data.

There is, however, much more to be plotted.

An example is the Gradient of a cost function. As you might have seen in your Machine Learning path (or in my previous article), a fundamental part of finding the right model is to minimize a function. To do that, one commonly relies on Gradient Descent.

So, why not plot the gradient to get a clue about where it is taking us?

Quivers: How to Plot 2D Vectors

Vectors and vector fields are plotted with Matplotlib's quiver() method, available both as matplotlib.pyplot.quiver() and as a method of an Axes object.

Let us start by creating a figure and then fine-tune the quiver as it is needed in order to obtain the right proportions.

import numpy as np
import matplotlib.pyplot as plt

plt.figure()
ax = plt.subplot(111)

A vector plotted by quiver has four main inputs: X, Y, U, V. X, Y are the coordinates of its tail, and U, V are the components of the arrow. How these components are drawn depends on the angles, scale and scale_units parameters. With the defaults, the direction is measured in screen space (angles='uv') and the length is rescaled automatically, so the arrow does not necessarily end at the point you might expect in data coordinates.

We illustrate it with an example.

Say we want to plot a vector with its tail at [0, 1] and components [2, 3]. We then include every entry as a 1D array:

ax.quiver([0], [1], [2], [3])

and voilà, our first vector!

(do not forget the plt.show() if you are not in a notebook)

Way more interesting is to plot several arrows (i.e., a quiver). For instance, let us set the vectors’ starting points as [1, 0], [2, 0], [3, 0], [4, 0] and their heads at [5, 3], [6, 3], [7, 3] and [8, 3].

We gather all first and second coordinates of the tail as 1D arrays:

X = [1, 2, 3, 4] 
Y = [0, 0, 0, 0]

As a first attempt, we do the same for the heads:

U = [5, 6, 7, 8]
V = [3, 3, 3, 3]

The resulting quiver is added to a new figure through the command:

plt.figure()
ax1 = plt.subplot(111)
ax1.quiver(X, Y, U, V)

The result is not satisfactory, though:

The arrows are clearly off: the first one, whose head should be at [5,3], ought to pass above the right-most tail (correctly plotted at [4,0]).

The easiest way to correct this distortion is to set the parameters angles, scale and scale_units, so that the arrows are drawn in data coordinates. In the end, our code will look like:

ax1.quiver(X, Y, U, V, angles='xy', scale=1, scale_units='xy')

However, with angles='xy', scale_units='xy' and scale=1, each arrow is drawn from (X, Y) to (X + U, Y + V), so U, V play the role of a tangent (displacement) vector rather than of the head's coordinates.

In the following sense: imagine that you are sitting at [4, 0] and want to move from there to [8, 3]. Your X, Y coordinates stay [4, 0], but the new U, V must be the vector of your movement from one point to the other:

U, V = [8 - 4, 3 - 0]

Using np.array’s magic, though, one easily obtains the new U, V coordinates:

X = np.array(X)
Y = np.array(Y)
U = np.array(U)
V = np.array(V)

U_new = U - X  # horizontal displacement: head x minus tail x
V_new = V - Y  # vertical displacement: head y minus tail y

Back to the plot:

plt.figure()
ax2 = plt.subplot(111)
ax2.quiver(X, Y, U_new, V_new, angles='xy', scale=1, scale_units='xy')

oops… out of range

plt.figure()
ax2 = plt.subplot(111)
ax2.quiver(X, Y, U_new, V_new, angles='xy', scale=1, scale_units='xy')
ax2.set_xlim((-1,10))
ax2.set_ylim((-1,5))
plt.show()

(Yeah!)
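To see the effect of the three parameters side by side, here is a minimal sketch that draws the same displacement vectors twice, once with quiver's defaults and once in data units, and marks the intended heads with dots. The names tails, heads and disp are ours, and the Agg backend is an assumption made only so that the snippet also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt
import numpy as np

# Tails and intended heads of the four arrows used above
tails = np.array([[1, 0], [2, 0], [3, 0], [4, 0]])
heads = np.array([[5, 3], [6, 3], [7, 3], [8, 3]])
disp = heads - tails  # displacement = head - tail

fig, (ax_default, ax_data) = plt.subplots(1, 2, figsize=(10, 4))

# Left: default settings (auto-scaled length, direction in screen space)
ax_default.quiver(tails[:, 0], tails[:, 1], disp[:, 0], disp[:, 1])
ax_default.set_title("defaults")

# Right: arrows drawn in data units, so each one ends at tail + displacement
ax_data.quiver(tails[:, 0], tails[:, 1], disp[:, 0], disp[:, 1],
               angles="xy", scale_units="xy", scale=1)
ax_data.set_title("angles='xy', scale_units='xy', scale=1")

# Mark the intended heads and use the same window in both panels
for ax in (ax_default, ax_data):
    ax.scatter(heads[:, 0], heads[:, 1], color="k", zorder=3)
    ax.set_xlim(-1, 10)
    ax.set_ylim(-1, 5)

fig.canvas.draw()  # render once; in an interactive session use plt.show()
```

In the right panel the arrow tips should land on the dots, while in the left panel they generally do not.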

As you saw, you can manage the pyplot.Axes object as you are used to: including a title, setting tick labels, adding coordinate labels, etc.

Before proceeding to vector fields, we observe that column slicing comes into good use when you have your vectors in a list/np.array:

XY = np.array( [ [1,0], [2,0], [3,0], [4,0] ])
UV = np.array( [ [5,3], [6,3], [7,3], [8,3] ])

UV_new = UV - XY  # displacement of each vector: head minus tail


plt.figure()
ax2 = plt.subplot(111)
ax2.quiver(XY[:,0], XY[:,1], UV_new[:,0], UV_new[:,1], angles='xy', scale=1, scale_units='xy')
ax2.set_xlim((-1,10))
ax2.set_ylim((-1,5))
plt.show()

The code above produces the same result.
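As a possible alternative to the explicit slices, and assuming each row of the array holds one [x, y] pair, transposing the array lets you unpack the columns in one go. This is only a sketch; the variable names mirror the ones used above.

```python
import numpy as np

XY = np.array([[1, 0], [2, 0], [3, 0], [4, 0]])  # tails, one row per vector
UV = np.array([[5, 3], [6, 3], [7, 3], [8, 3]])  # heads, one row per vector

# Transposing turns rows into columns, so each coordinate unpacks directly
X, Y = XY.T
U_new, V_new = (UV - XY).T

print(U_new, V_new)
```

The four arrays can then be passed to quiver exactly as before.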

Our next step is to plot a vector field. We keep using angles='xy' in the 2D plots; the 3D quiver, discussed further below, does not take this parameter.

How to Plot Vector Fields: np.meshgrid()

A vector field on a subset of the plane is a family of vectors, one for each point of the subset. It is usually given by an algebraic expression, and the gradient field is one example.

Just as a (tangent) vector represents an (infinitesimal) movement, a vector field represents a flow (meaning, every point is moving).

We illustrate with two vector fields: the first represents rotation around the origin and the second is the phase-space field of a pendulum (more information on the latter here – funnily enough, the first is a first approximation of the second, for small movements).

We start by defining the functions that describe the two fields. These functions will have the coordinates of a point as input and will output the vector that should be attached to that point.

def infinitesimal_rotation(x, y):
    u = y
    v = -x
    return [u,v] 

def phase_pendulum(x, y):
    u = y
    v = -np.sin(x)
    return[u, v]

Notice that we use np.sin instead of math.sin since we will input x as an array.

The second ingredient of the plot is the set of X, Y tail coordinates. Fortunately, NumPy offers a great solution for that.

All we need is to provide families of X and Y coordinates, say 30 markers from -5 to 5 in each direction.

The function np.meshgrid() takes the Cartesian product of these sets of coordinates and outputs two matrices with all possible combinations of the X-markers and the Y-markers:

# x and y markers
x = np.linspace(-5,5,30)
y = np.linspace(-5,5,30)

# np.meshgrid(x,y) outputs two 30x30 matrices which will be our X and Y inputs
X, Y = np.meshgrid(x,y)

Here we have 30×30 = 900 points evenly distributed in the square [-5,5]x[-5,5] of the xy-plane. Each point corresponds to one entry (i,j) of both matrices: for example, counting row by row, the x and y coordinates of the 34th point in the grid are given by the (2,4)th entry of X and the (2,4)th entry of Y, respectively (that is, X[1, 3] and Y[1, 3] with Python's 0-based indexing).
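The indexing can be checked directly. The sketch below rebuilds the same grid and reads one point in two ways; the names i, j, point, same_point and flat_index are ours.

```python
import numpy as np

x = np.linspace(-5, 5, 30)
y = np.linspace(-5, 5, 30)
X, Y = np.meshgrid(x, y)  # default indexing='xy': rows follow y, columns follow x

# Counting row by row, the 34th point sits in row 2, column 4 (1-based),
# that is, at index [1, 3] with Python's 0-based indexing
i, j = 1, 3
flat_index = np.ravel_multi_index((i, j), X.shape)  # 0-based position in the flattened grid

# Coordinates of that point, read from the two matrices
point = (X[i, j], Y[i, j])

# The same coordinates, read from the 1D markers
same_point = (x[j], y[i])
```

Note the order: the column index selects the x-marker and the row index selects the y-marker.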

Moreover, as mentioned, we can directly apply X, Y to our functions, obtaining the U, V coordinates directly:

U_rot, V_rot = infinitesimal_rotation(X,Y)
U_pend, V_pend = phase_pendulum(X,Y)
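Two properties of these fields can be checked numerically before plotting. The sketch below uses a small grid near the origin (the bound 0.1 is an arbitrary choice of ours) to confirm that the rotation field is perpendicular to the position vector, and that the pendulum field is close to it for small movements.

```python
import numpy as np

def infinitesimal_rotation(x, y):
    return [y, -x]

def phase_pendulum(x, y):
    return [y, -np.sin(x)]

# A small grid close to the origin
markers = np.linspace(-0.1, 0.1, 11)
X, Y = np.meshgrid(markers, markers)

U_rot, V_rot = infinitesimal_rotation(X, Y)
U_pend, V_pend = phase_pendulum(X, Y)

# The rotation field is perpendicular to the position vector (x, y):
# the dot product x*u + y*v vanishes at every grid point
dot = X * U_rot + Y * V_rot

# Near the origin sin(x) is close to x, so the two fields almost coincide
max_gap = np.max(np.abs(V_pend - V_rot))
print(np.abs(dot).max(), max_gap)
```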

Finally, the quiver method admits 2D array inputs (hurray!) so we are actually ready to put our fields into action (😀):

plt.figure() 
ax_rot = plt.subplot(111)
ax_rot.quiver(X, Y, U_rot, V_rot, angles='xy', scale=1, scale_units='xy')
ax_rot.set_xlim((-5,5))
ax_rot.set_ylim((-5,5))
plt.show()

plt.figure() 
ax_pend = plt.subplot(111)
ax_pend.quiver(X, Y, U_pend, V_pend, angles='xy', scale=1, scale_units='xy')
ax_pend.set_xlim((-5,5))
ax_pend.set_ylim((-5,5))
plt.show()

Cool, aren’t they?

Nevertheless, let us tweak them a bit, for fun's sake.

plt.figure() 
ax_rot = plt.subplot(111)
ax_rot.quiver(X, Y, U_rot, V_rot, color='r', angles='xy', scale=1, scale_units='xy', alpha=.6)
ax_rot.set_xlim((-5,5))
ax_rot.set_ylim((-5,5))
plt.show()

x_pend = np.linspace(-15,15,90)
y_pend = np.linspace(-5,5,30)

X, Y = np.meshgrid(x_pend,y_pend)

U_pend,V_pend = phase_pendulum(X,Y)



plt.figure(figsize=(15,5)) 
ax_pend = plt.subplot(111)
ax_pend.quiver(X, Y, U_pend, V_pend, angles='xy', scale=4, scale_units='xy', alpha=.8, headaxislength=3, headlength=3, width=.001)
ax_pend.set_xlim((-15,15))
ax_pend.set_ylim((-5,5))
ax_pend.set_title('Phase Space - Non-linear Pendulum')
ax_pend.set_xlabel('Position')
ax_pend.set_ylabel('Momentum')
plt.show()
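One further tweak, which is our suggestion and not something used in the figures of this article: when arrow lengths vary a lot, every arrow can be drawn with unit length while the magnitude is encoded as color through quiver's optional fifth positional argument C. The sketch below only prepares the arrays; the names speed, U_unit and V_unit are ours.

```python
import numpy as np

def phase_pendulum(x, y):
    return [y, -np.sin(x)]

X, Y = np.meshgrid(np.linspace(-15, 15, 90), np.linspace(-5, 5, 30))
U, V = phase_pendulum(X, Y)

# Magnitude of each vector; it is never zero here because y = 0 is not on the grid
speed = np.hypot(U, V)

# Unit-length directions
U_unit = U / speed
V_unit = V / speed

# With an Axes object ax, the arrays could then be passed as
# ax.quiver(X, Y, U_unit, V_unit, speed, angles='xy')
```

If a grid does contain a point where the field vanishes, the division needs a guard (for example, replacing zero magnitudes by one before dividing).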

🤩

3D plots and Gradients

Finally, matplotlib provides an analogous implementation for 3D plots. The one thing to keep in mind is that the 3D quiver has no angles parameter: U, V, W are always read as the components of the arrow in data coordinates, and the drawn length is controlled by the length parameter.

Before introducing the method, let us import matplotlib 3D Axes object and instantiate it.

from mpl_toolkits.mplot3d import axes3d

fig = plt.figure(figsize=(14,10))
ax = fig.add_subplot(111,projection='3d')

The quiver method now has 6 main inputs: X, Y, Z, U, V, W. Again, X, Y, Z stand for the coordinates of the tail, and U, V, W for the components of the arrow.

They can be passed as any array-like object (as meshgrid can be done with three arrays, for example).

P = [0, 1, 1]
Q = [1, 2, 1.5]

ax.quiver(*P, *Q)
ax.set_xlim((-1,3))
ax.set_ylim((-1,3))
ax.set_zlim((-1,3))
plt.show()

Although the scaling problem is not present anymore, visualizing 3D on a flat 2D screen is always an issue. If you are using a Notebook, though, you can appeal to the magic command

%matplotlib notebook

Include it in the very beginning of the Notebook. It will allow you to rotate the plot, which drastically improves the visualization.

Next, we recall the surface we plotted in the companion article:

We first define the function of this graph:

def f(x, y):
    return ((x-1)*(x+2)*x*(x-2) + 2*y ** 2)

Every point on the surface is of the form [x, y, f(x,y)]. Therefore, in order to plot it, we define a new grid and recover the third coordinate by applying the function to it.

x = np.linspace(-2, 2, 30)
y = np.linspace(-2, 2, 30)

X, Y = np.meshgrid(x, y)
Z = f(X, Y)

Thus reaching the actual surface plot:

surface = ax.plot_surface(X, Y, Z,  cmap='Blues', edgecolor='none', antialiased=False, alpha=.7)

To add the colorbar on the right, type:

fig.colorbar(surface, shrink=0.5, aspect=5)

Now, a vector can be added to the plotted surface harmlessly by using the coordinates of a surface point as X, Y, Z. For instance, you can fix a point in the xy-plane and then compute its f-value:

px = 2
py = 1.5
P = [px, py, f(px,py)]

Q = [1.5, 2, 1]

ax.quiver(*P, *Q, color='r')  # we change color for better visualization

To see the vector, though, one must rotate the surface. If you are not in Jupyter Notebooks or need to save the figure, it is useful to know the following command:

ax.view_init(elev=0., azim=30)

Nevertheless, this vector is not tangent to the surface. A vector is tangent to a surface when it is the velocity of a particle moving along the surface.

A particle moving in the direction above, on the other hand, would leave the surface immediately. That said, it is an easy task to correct the last vector, making it tangent. To understand what a tangent vector looks like in this case, assume that you are moving with coordinates $(x(t), y(t), z(t))$.

You will stay on the surface if and only if $z(t) = f(x(t), y(t))$ for every instant $t$. In particular, your velocity must be of the form

$$\big(x'(t),\ y'(t),\ z'(t)\big) = \Big(x'(t),\ y'(t),\ \frac{\partial f}{\partial x}\,x'(t) + \frac{\partial f}{\partial y}\,y'(t)\Big),$$

where the last coordinate's expression follows from the Chain Rule. Let us wrap this vector expression in a function:

def tangent_vector(x,y,u,v):
    '''
    Inputs a point in the plane xy and a direction vector and outputs the corresponding vector tangent to the graph
    '''
    grad_f = np.array([(x+2)*x*(x-2)+(x-1)*x*(x-2)+(x-1)*(x+2)*(x-2)+(x-1)*(x+2)*x , 4*y])
    q = np.array([u,v])
    return [u,v,np.dot(q,grad_f)]
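A quick numerical check of the Chain Rule expression: the third component returned by tangent_vector should match the rate of change of f along the direction (u, v). The sketch below estimates that rate with a central difference; the step h and the names numeric_w and w are our own choices.

```python
import numpy as np

def f(x, y):
    return (x - 1) * (x + 2) * x * (x - 2) + 2 * y ** 2

def tangent_vector(x, y, u, v):
    grad_f = np.array([(x+2)*x*(x-2) + (x-1)*x*(x-2) + (x-1)*(x+2)*(x-2) + (x-1)*(x+2)*x, 4*y])
    q = np.array([u, v])
    return [u, v, np.dot(q, grad_f)]

px, py = 2, 1.5   # base point in the xy-plane
u, v = 1.5, 2     # direction in the xy-plane
h = 1e-6          # small step (an arbitrary choice)

# Rate of change of f along (u, v), estimated with a central difference
numeric_w = (f(px + h * u, py + h * v) - f(px - h * u, py - h * v)) / (2 * h)

# Third component given by the Chain Rule
_, _, w = tangent_vector(px, py, u, v)

print(w, numeric_w)
```

The two printed numbers should agree up to a small numerical error.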

If we keep the first two coordinates of Q as the direction, we get:

Q_tangent = tangent_vector(2, 1.5, 1.5, 2)

ax.quiver(*P, *Q_tangent, color='r', arrow_length_ratio=0.15) # some aesthetic touches

# and we keep a side view
ax.view_init(elev=9., azim=95)

As a last task, we plot the gradient field in the xy-plane of the 3D plot. We start by defining a function that will return the 2D gradient vector of f:

def grad_f(x,y):
    '''
    Input a point in the plane and outputs the gradient at that point
    '''
    return np.array([(x+2)*x*(x-2)+(x-1)*x*(x-2)+(x-1)*(x+2)*(x-2)+(x-1)*(x+2)*x , 4*y])

💡 Do not forget: since we are plotting it in a horizontal plane below the surface, the third coordinate of every tail must correspond to the lowest value on the z-axis. In this case, it is -8:

U0,V0 = grad_f(X,Y)

Z0 = np.zeros_like(X) - 8 

W0 = np.zeros_like(X)  # zero vertical component keeps the arrows horizontal

ax.quiver(X, Y, Z0, U0, V0, W0, color='r', arrow_length_ratio=0.2, length=.1, normalize=True, linewidth=.6)

Note that the quiver parameters U, V, W here stand for the components of each arrow, not for its head, so the gradient is passed as it is, with a zero third component. Besides, if you want the reverse gradient, use -U0, -V0:

ax.quiver(X, Y, Z0, -U0, -V0, W0, color='r', arrow_length_ratio=0.2, length=.1, normalize=True, linewidth=.6)
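Two assumptions behind this plot can be checked without drawing anything: that the height -8 really lies below the surface, and that no gradient vector on the grid has zero length (which matters because normalize=True divides by the length). A minimal NumPy-only sketch, where the names lowest and shortest are ours:

```python
import numpy as np

def f(x, y):
    return (x - 1) * (x + 2) * x * (x - 2) + 2 * y ** 2

def grad_f(x, y):
    return np.array([(x+2)*x*(x-2) + (x-1)*x*(x-2) + (x-1)*(x+2)*(x-2) + (x-1)*(x+2)*x, 4*y])

x = np.linspace(-2, 2, 30)
y = np.linspace(-2, 2, 30)
X, Y = np.meshgrid(x, y)
Z = f(X, Y)

# Lowest height reached by the surface on this grid; it stays above -8
lowest = Z.min()

# Gradient at every grid point, one 30x30 array per component
U0, V0 = grad_f(X, Y)

# Length of the shortest gradient vector; it is positive because
# y = 0 is not one of the 30 markers, so 4*y never vanishes
shortest = np.hypot(U0, V0).min()

print(lowest, shortest)
```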

Yep! We’ve done it!

In a Nutshell

We just reviewed how to plot quivers (= set of arrows) in 2D and 3D axes figures. We can also use np.meshgrid() as an auxiliary tool to plot vector fields, as in the example (repeated below):

def phase_pendulum(x, y):
    u = y
    v = -np.sin(x)
    return[u, v]

x_pend = np.linspace(-15,15,90)
y_pend = np.linspace(-5,5,30)

X, Y = np.meshgrid(x_pend,y_pend)

U_pend,V_pend = phase_pendulum(X,Y)

plt.figure(figsize=(15,5)) 
ax_pend = plt.subplot(111)
ax_pend.quiver(X, Y, U_pend, V_pend, angles='xy', scale=4, scale_units='xy', alpha=.8, headaxislength=3, headlength=3, width=.001)
ax_pend.set_xlim((-15,15))
ax_pend.set_ylim((-5,5))
ax_pend.set_title('Phase Space - Non-linear Pendulum')
ax_pend.set_xlabel('Position')
ax_pend.set_ylabel('Momentum')
plt.show()

Besides, we can use it to plot vectors tangent to surfaces and gradient fields. Here is the full code in our example:

# Importing the libraries and instantiating a 3D axes
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import axes3d

fig = plt.figure(figsize=(14,10))
ax = fig.add_subplot(111,projection='3d')

# Defining the desired function
def f(x, y):
    return ((x-1)*(x+2)*x*(x-2) + 2*y ** 2)

# Getting a grid to plot
x = np.linspace(-2, 2, 30)
y = np.linspace(-2, 2, 30)

X, Y = np.meshgrid(x, y)
Z = f(X, Y)

# The actual plot
surface = ax.plot_surface(X, Y, Z,  cmap='Blues', edgecolor='none', antialiased=False, alpha=.7)

# To add the colorbar on the right, type:
fig.colorbar(surface, shrink=0.5, aspect=5)

# Making a vector tangent
def tangent_vector(x,y,u,v):
    '''
    Inputs a point in the plane xy and a direction vector and outputs the corresponding vector tangent to the graph
    '''
    grad_f = np.array([(x+2)*x*(x-2)+(x-1)*x*(x-2)+(x-1)*(x+2)*(x-2)+(x-1)*(x+2)*x , 4*y])
    q = np.array([u,v])
    return [u,v,np.dot(q,grad_f)]

# Plotting a tangent vector at the surface point P
P = [2, 1.5, f(2, 1.5)]
Q_tangent = tangent_vector(2, 1.5, 1.5, 2)
ax.quiver(*P, *Q_tangent, color='r', arrow_length_ratio=0.15) # some aesthetic touches

# Making a side view
ax.view_init(elev=9., azim=95)

# Plotting the Gradient field
def grad_f(x,y):
    '''
    Input a point in the plane and outputs the gradient at that point
    '''
    return np.array([(x+2)*x*(x-2)+(x-1)*x*(x-2)+(x-1)*(x+2)*(x-2)+(x-1)*(x+2)*x , 4*y])

U0,V0 = grad_f(X,Y)
Z0 = np.zeros_like(X) - 8 # Putting it at the bottom of the figure
W0 = np.zeros_like(X) # Zero vertical component keeps the arrows horizontal
ax.quiver(X, Y, Z0, U0, V0, W0, color='r', arrow_length_ratio=0.2, length=.1, normalize=True, linewidth=.6)
plt.show()

If you did not have enough fun, we leave here two exercises for the reader 😉

Plot the gradient field not as a vector field in the xy plane, but as a vector field tangent to the graph
Draw the path of a gradient descent (as a quiver)
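As a starting point for the second exercise, here is one possible sketch, certainly not the only way to do it. It runs plain gradient descent on f and turns consecutive iterates into tails and displacements, which is the form quiver expects with angles='xy'. The starting point, the learning rate and the number of steps are arbitrary choices of ours.

```python
import numpy as np

def f(x, y):
    return (x - 1) * (x + 2) * x * (x - 2) + 2 * y ** 2

def grad_f(x, y):
    return np.array([(x+2)*x*(x-2) + (x-1)*x*(x-2) + (x-1)*(x+2)*(x-2) + (x-1)*(x+2)*x, 4*y])

start = np.array([1.9, 1.5])  # arbitrary starting point
learning_rate = 0.02          # arbitrary, small enough for a steady descent here
n_steps = 50

# Each new point is the previous one moved against the gradient
path = [start]
for _ in range(n_steps):
    p = path[-1]
    path.append(p - learning_rate * grad_f(*p))
path = np.array(path)         # shape (n_steps + 1, 2)

# Tails are all points but the last; displacements join consecutive points
tails = path[:-1]
steps = path[1:] - path[:-1]

# Value of f along the path, to confirm that it goes downhill
values = f(path[:, 0], path[:, 1])

# With an Axes object ax, the path could then be drawn as
# ax.quiver(tails[:, 0], tails[:, 1], steps[:, 0], steps[:, 1],
#           angles='xy', scale_units='xy', scale=1)
```

From this starting point the path settles in the local minimum on the right-hand side of the surface; other starting points may end in the deeper minimum on the left.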

Happy coding!

Try It Yourself

You can run the code from this article in our Jupyter Notebook online.