Tensors: The Vocabulary of Neural Networks

In this article, we will introduce one of the core elements describing the mathematics of neural networks: tensors. 🧬

Although you typically won’t work with tensors directly (they usually operate under the hood), it is important to understand what’s going on behind the scenes. In addition, you may often want to examine tensors so that you can look directly at your data or at the arrays of weights and biases, so it’s important to be able to work with them.

💡 Note: This article assumes you are familiar with how neural networks work. To review those basics, see the article The Magic of Neural Networks: History and Concepts. It also assumes some familiarity with Python’s object-oriented programming.

Theoretically, we could use pure Python to implement neural networks.

  • We could use Python lists to represent data in the network;
  • We could use other lists representing weights and biases in the network; and
  • We could use nested for loops to perform the operations of multiplying the inputs by the connection weights.

There are a few issues with this approach, however: pure Python, and the list data type in particular, is slow for heavy numerical work. Also, code built from nested for loops would not be very readable.

Instead, neural network libraries such as PyTorch use tensors, whose operations run much faster than equivalent pure Python code. Also, as you will see, tensors allow much more readable descriptions of networks and their data.


ℹ️ Tensors are essentially arrays of values. Since neural networks are essentially arrays of neurons, tensors are a natural fit for describing them. They can be used for describing the data, describing the network connection weights, and other things.

A one-dimensional tensor is known as a vector. Here is an example:

Vectors can also be written horizontally. Here’s the same vector written horizontally:

Switching a vector from vertical to horizontal, or vice versa, is called transposing, and is sometimes needed depending on the math specifics. We will not go into detail on this in this article (see here for more).
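As a small preview of PyTorch (introduced properly later in this article), here is a minimal sketch of the vertical/horizontal distinction. It assumes we represent a vertical (column) vector as a 2D tensor of shape (n, 1) and a horizontal (row) vector as shape (1, n); the values are illustrative only:

```python
import torch

# A plain 1D tensor has no orientation: its shape is just (3,)
v = torch.tensor([1.0, 2.0, 3.0])

# To distinguish vertical from horizontal, store the vector as a 2D tensor
col = v.reshape(3, 1)  # vertical (column) vector, shape (3, 1)
row = col.T            # transposing gives a horizontal (row) vector, shape (1, 3)

print(v.shape, col.shape, row.shape)
# torch.Size([3]) torch.Size([3, 1]) torch.Size([1, 3])
```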

Vectors are typically used to represent data in the network. For example, each individual element in a vector can represent the input value for each individual input neuron in the network.

2D Tensor Matrix

A two-dimensional tensor is known as a matrix. Here’s an example:

For a fully connected network, where each neuron in one layer connects to every neuron in the next layer, a matrix is typically used to represent all the connection weights. If m neurons connect to n neurons, you need an n × m matrix (n rows, m columns) to describe all the connection weights.
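One way to check the n × m convention is a PyTorch fully connected layer, torch.nn.Linear, which stores its weights with one row per output neuron. A minimal sketch, using the two-to-three-neuron sizes from the example that follows (weights are randomly initialized, so only the shapes matter here):

```python
import torch.nn as nn

m, n = 2, 3  # m input neurons, n output neurons

# A fully connected layer from m neurons to n neurons
layer = nn.Linear(in_features=m, out_features=n)

# The weights form an n x m matrix: one row per output neuron
print(layer.weight.shape)  # torch.Size([3, 2])
print(layer.bias.shape)    # torch.Size([3]), one bias per output neuron
```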

Here’s an example of two neurons connected to three neurons. Here is the network, with connection weights included:

And here is the connection weights matrix:

Why We Use Tensors

Before we finish introducing tensors, let’s use what we’ve covered so far to see why tensors are so important for modeling neural networks.

Let’s introduce a two-element vector of data and run it through the network we just showed.

ℹ️ Info: Recall that neurons add together their weighted inputs, then run the result through an activation function.

In this example, we are ignoring the activation function to keep things simple for the demonstration.

Here is our data vector:

Here’s a diagram depicting the operation:

Let’s calculate the operation (the neuron computations) by hand:
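Assuming the input vector $\mathbf{x} = [7, 8]^T$ and the connection weights used in the PyTorch example later in this article (weights 1 and 4 into the first output neuron, 2 and 5 into the second, 3 and 6 into the third), each output neuron computes its weighted sum:

```latex
\begin{aligned}
y_1 &= 1 \cdot 7 + 4 \cdot 8 = 39 \\
y_2 &= 2 \cdot 7 + 5 \cdot 8 = 54 \\
y_3 &= 3 \cdot 7 + 6 \cdot 8 = 69
\end{aligned}
```

This gives the output vector $[39, 54, 69]^T$.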

The final result is a 3-element vector:

If you have learned about matrices in grade school and remember doing matrix multiplication, you may note that what we just calculated is identical to matrix multiplication:
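Written as a matrix product, and again assuming the weights and inputs from the PyTorch example later in this article, the calculation is:

```latex
\begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix}
\begin{bmatrix} 7 \\ 8 \end{bmatrix}
=
\begin{bmatrix} 1 \cdot 7 + 4 \cdot 8 \\ 2 \cdot 7 + 5 \cdot 8 \\ 3 \cdot 7 + 6 \cdot 8 \end{bmatrix}
=
\begin{bmatrix} 39 \\ 54 \\ 69 \end{bmatrix}
```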

ℹ️ Note: Recall that matrix multiplication involves multiplying each row of the first matrix element-wise by each column of the second matrix, then adding the products together.

This is why tensors are so important for neural networks: tensor math precisely describes neural network operation.

As an added benefit, the matrix multiplication equation above is a much more succinct description than nested for loops would be.
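To make the comparison concrete, here is a sketch of both styles, previewing PyTorch syntax that is explained later in this article. It assumes the same weights (1 to 6) and inputs (7 and 8) as the PyTorch example further down:

```python
import torch

W = torch.tensor([[1, 4], [2, 5], [3, 6]])  # 3 x 2 weight matrix
x = torch.tensor([7, 8])                     # 2-element input vector

# Pure-Python style: nested for loops over output and input neurons
y_loops = []
for i in range(W.shape[0]):          # each output neuron
    total = 0
    for j in range(W.shape[1]):      # each input neuron
        total += W[i, j].item() * x[j].item()
    y_loops.append(total)

# Tensor style: a single matrix-vector product
y_tensor = W @ x

print(y_loops)   # [39, 54, 69]
print(y_tensor)  # tensor([39, 54, 69])
```

Both compute the same weighted sums, but the tensor version states the whole operation in one line and runs in PyTorch’s optimized code rather than in Python loops.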

If we introduce the nomenclature of bold lower case for a vector and bold upper case for a matrix, then the operation of vector data running through a neural network weight matrix is described by this very compact equation:
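Using bold $\mathbf{x}$ for the input vector, $\mathbf{W}$ for the weight matrix, and $\mathbf{y}$ for the output vector (symbol names chosen here for illustration), the equation is:

```latex
\mathbf{y} = \mathbf{W}\mathbf{x}
```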

We will see later that matrix multiplication within PyTorch is a similarly compact code equation.

Higher Dimensional Tensors

A three-dimensional (3D) tensor is known simply as a tensor. In fact, the term tensor generically refers to an array of numbers with any number of dimensions; only one-dimensional and two-dimensional tensors have the special names “vector” and “matrix”, respectively.

You might think there is little need for three-dimensional and larger tensors, but they come up often.

A grayscale image is clearly a two-dimensional tensor, in other words, a matrix. But a color image is actually three two-dimensional arrays, one each for red, green, and blue color channels. So a color image is essentially a three-dimensional tensor.

In addition, we typically process data in mini-batches. So if we’re processing a mini-batch of color images, we have the three dimensions already noted, plus one more dimension that indexes the images within the mini-batch. So a mini-batch of color images can be represented by a four-dimensional tensor.
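A sketch of these shapes in PyTorch, with random values standing in for pixel intensities. The 32 × 32 image size and batch size of 16 are arbitrary choices for illustration. The channels-first layout (channels, height, width) is the one torchvision’s ToTensor() produces, and PyTorch places the batch dimension first:

```python
import torch

height, width = 32, 32  # example image size (illustrative only)
batch_size = 16         # example mini-batch size

gray_image = torch.rand(height, width)            # rank 2: a single matrix
color_image = torch.rand(3, height, width)        # rank 3: one matrix per R, G, B channel
batch = torch.rand(batch_size, 3, height, width)  # rank 4: a mini-batch of color images

print(gray_image.shape)   # torch.Size([32, 32])
print(color_image.shape)  # torch.Size([3, 32, 32])
print(batch.shape)        # torch.Size([16, 3, 32, 32])
```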

Tensors in Neural Network Libraries

One Python library that is well suited to working with arrays is NumPy. In fact, some tools implement neural networks on top of NumPy; one example is the scikit-learn machine learning library, which works with NumPy arrays and includes simple neural network models.

However, the PyTorch implementation of tensors is more powerful than NumPy arrays. PyTorch tensors are designed with neural networks in mind. PyTorch tensors have these advantages:

  1. PyTorch tensors have gradient calculation (autograd) built in.
  2. PyTorch tensors also support GPU calculations, substantially speeding up neural network calculations.
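Here is a minimal sketch of both features. The toy “loss” is only for illustration, and the GPU line falls back to the CPU when no CUDA device is available:

```python
import torch

# Gradient tracking: ask PyTorch to record operations on w
w = torch.tensor([2.0, 3.0], requires_grad=True)
x = torch.tensor([4.0, 5.0])

loss = (w * x).sum()   # loss = 2*4 + 3*5 = 23
loss.backward()        # computes d(loss)/dw and stores it in w.grad
print(w.grad)          # tensor([4., 5.]), since d(loss)/dw equals x

# GPU support: move a tensor to a GPU if one is available, else stay on the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
x_on_device = x.to(device)
print(x_on_device.device)
```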

That said, if you are used to working with NumPy, you should feel fairly at home with PyTorch tensors: although the commands to create PyTorch tensors are slightly different, they will feel familiar. For the rest of this article, we will focus exclusively on PyTorch tensors.
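If you already have data in NumPy, converting back and forth is straightforward. A minimal sketch using torch.from_numpy() and Tensor.numpy(), which share memory with the original array, and torch.tensor(), which makes a copy:

```python
import numpy as np
import torch

a = np.array([[1, 2], [3, 4]], dtype=np.int64)

t = torch.from_numpy(a)  # NumPy array -> tensor (shares memory with a)
b = t.numpy()            # tensor -> NumPy array (also shares memory)

a[0, 0] = 100            # changing the NumPy array...
print(t[0, 0])           # ...is visible through the tensor: tensor(100)

c = torch.tensor(a)      # torch.tensor() copies the data instead
a[0, 0] = 1
print(c[0, 0])           # the copy still holds tensor(100)
```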

Tensors in PyTorch: Creating Them, and Doing Math

OK, let’s finally do some coding!

First, make sure that you have PyTorch available, either by installing it on your own system or by using an online Jupyter notebook service.

🌍 Reference: See PyTorch’s website for instructions on how to install it on your own system.

See this Finxter article for a review of available online Jupyter notebook services:

🌍 Recommended Tutorial: Top 4 Jupyter Notebook Alternatives for Machine Learning

For this article, we will use the online Jupyter notebook service provided by Google called Colab. PyTorch is already installed in Colab; we simply have to import it as a module to use it:

import torch

There are a number of ways of creating tensors in PyTorch.

Typically, you would create tensors by loading data from datasets available through PyTorch, or by converting your own data into tensors.

For now, since we simply want to demonstrate the use of tensors we will use basic commands to create very simple tensors.

You can create a tensor from a list:

t_list = torch.tensor([[1, 2], [3, 4]])
t_list

tensor([[1, 2],
        [3, 4]])

Note that when we evaluate the tensor variable, the output is labeled “tensor”. This indicates a PyTorch tensor object: it behaves like a mathematical tensor, plus it has various features provided by PyTorch (such as support for gradient calculations and GPU processing).

You can create tensors filled with zeros, filled with ones, or filled with random numbers:

t_zeros = torch.zeros(2, 3)
t_zeros

tensor([[0., 0., 0.],
        [0., 0., 0.]])

t_ones = torch.ones(3, 2)
t_ones

tensor([[1., 1.],
        [1., 1.],
        [1., 1.]])

t_rand = torch.rand(3, 2, 4)
t_rand

tensor([[[0.9661, 0.3915, 0.0263, 0.2753],
         [0.7866, 0.0503, 0.3963, 0.1334]],

        [[0.4085, 0.1816, 0.2827, 0.3428],
         [0.9923, 0.4543, 0.0872, 0.0771]],

        [[0.2451, 0.6048, 0.8686, 0.8148],
         [0.7930, 0.4150, 0.6125, 0.3401]]])
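Because torch.rand() draws random numbers (uniformly from the interval [0, 1)), your values will differ from those above. Here is a minimal sketch of making them reproducible with torch.manual_seed(); the seed value 0 is arbitrary. It also checks each tensor’s data type (dtype), which explains why t_list prints integers (1, 2) while t_zeros and t_ones print floats (0., 1.):

```python
import torch

torch.manual_seed(0)             # fix the random seed so torch.rand is reproducible
t_rand_a = torch.rand(3, 2, 4)
torch.manual_seed(0)             # same seed again...
t_rand_b = torch.rand(3, 2, 4)   # ...gives the same "random" numbers
print(torch.equal(t_rand_a, t_rand_b))  # True

# torch.rand draws uniformly from [0, 1)
print(bool(t_rand_a.min() >= 0) and bool(t_rand_a.max() < 1))  # True

# Data types: a list of Python ints gives int64; zeros/ones/rand default to float32
print(torch.tensor([[1, 2], [3, 4]]).dtype)  # torch.int64
print(torch.zeros(2, 3).dtype)               # torch.float32
```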

An important attribute for understanding a tensor is the appropriately named shape attribute, which gives the size of each dimension:

t_rand.shape
# Output: torch.Size([3, 2, 4])

This shows that tensor “t_rand” is a three-dimensional tensor made up of three elements, each with two rows and four columns.

💡 Note: The number of dimensions of a tensor is referred to as its rank. A one-dimensional tensor, or vector, is a rank-1 tensor; a two-dimensional tensor, or matrix, is a rank-2 tensor; a three-dimensional tensor is a rank-3 tensor, and so on.
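In PyTorch, the rank in this sense (the number of dimensions, not the linear-algebra rank of a matrix) is available as the ndim attribute. A short sketch with illustrative tensors of rank 0 through 3:

```python
import torch

scalar = torch.tensor(3.5)         # rank 0
vector = torch.tensor([1.0, 2.0])  # rank 1
matrix = torch.ones(3, 2)          # rank 2
cube = torch.rand(3, 2, 4)         # rank 3

for name, t in [("scalar", scalar), ("vector", vector),
                ("matrix", matrix), ("cube", cube)]:
    # .ndim gives the rank (number of dimensions); .shape gives the size of each one
    print(name, t.ndim, tuple(t.shape))
# scalar 0 ()
# vector 1 (2,)
# matrix 2 (3, 2)
# cube 3 (3, 2, 4)
```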

Let’s do some math with tensors – let’s add two tensors together:
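Using the same two matrices as the PyTorch code that follows, element-wise addition works like this:

```latex
\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}
+
\begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix}
=
\begin{bmatrix} 1+5 & 2+6 \\ 3+7 & 4+8 \end{bmatrix}
=
\begin{bmatrix} 6 & 8 \\ 10 & 12 \end{bmatrix}
```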

Note the tensors are added together element-wise. Now here it is in PyTorch:

t_first = torch.tensor([[1, 2], [3, 4]])
t_second = torch.tensor([[5, 6], [7, 8]])
t_sum = t_first + t_second
t_sum

tensor([[ 6,  8],
        [10, 12]])

Let’s add a scalar, that is, a single number (or a rank-0 tensor!), to a tensor:

t_add3 = t_first + 3
t_add3

tensor([[4, 5],
        [6, 7]])

Note that the scalar is added to each element of the tensor. The same applies when multiplying a scalar by a tensor:

t_times3 = t_first * 3
t_times3

tensor([[ 3,  6],
        [ 9, 12]])

The same applies to raising a tensor to a power; that is, the power operation is applied element-wise:

t_squared = t_first ** 2
t_squared

tensor([[ 1,  4],
        [ 9, 16]])
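Adding a scalar to every element is a special case of what PyTorch calls broadcasting: a smaller tensor is automatically expanded to match a larger one. A common neural network use is adding one bias per neuron to every row of a mini-batch. A minimal sketch; the second row of weighted sums and the bias values are made up for illustration:

```python
import torch

# Weighted sums of three neurons for a mini-batch of 2 inputs (rows = samples)
weighted_sums = torch.tensor([[39.0, 54.0, 69.0],
                              [10.0, 20.0, 30.0]])
bias = torch.tensor([1.0, -1.0, 2.0])  # one bias per neuron (illustrative values)

# The 1D bias (shape (3,)) is broadcast across every row of the (2, 3) tensor
out = weighted_sums + bias
print(out)
# rows: [40, 53, 71] and [11, 19, 32]
```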

Recall that after summing weighted inputs, the neuron passes the result through an activation function. The same element-wise behavior applies here as well: when a vector is passed through an activation function, the function is applied to each element of the vector individually.
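For example, here is a sketch using two common activation functions built into PyTorch, torch.relu() and torch.sigmoid(), on an illustrative vector of weighted sums. Each output element depends only on the matching input element:

```python
import torch

z = torch.tensor([-2.0, 0.0, 3.0])  # example weighted sums for three neurons

relu_out = torch.relu(z)        # max(0, z) for each element
sigmoid_out = torch.sigmoid(z)  # 1 / (1 + exp(-z)) for each element

print(relu_out)     # tensor([0., 0., 3.])
print(sigmoid_out)  # same shape as z; the middle element is 0.5
```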

Earlier, we pointed out that matrix multiplication is an important part of neural network calculations.

There are two ways to do this in PyTorch: you can use the matmul function:

t_matmul1 = torch.matmul(t_first, t_second)
t_matmul1

tensor([[19, 22],
        [43, 50]])

Or you can use the matrix multiplication operator “@”:

t_matmul2 = t_first @ t_second
t_matmul2

tensor([[19, 22],
        [43, 50]])
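A common pitfall is confusing “*” with “@”. On tensors, “*” multiplies element-wise, just as “+” adds element-wise, while “@” performs matrix multiplication. A quick sketch with the same two matrices:

```python
import torch

t_first = torch.tensor([[1, 2], [3, 4]])
t_second = torch.tensor([[5, 6], [7, 8]])

elementwise = t_first * t_second  # values [[5, 12], [21, 32]]
matrix_prod = t_first @ t_second  # values [[19, 22], [43, 50]]

print(elementwise)
print(matrix_prod)
```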

Recall previously, we showed running an input signal through a neural network, where a vector of input signals was multiplied by a matrix of connection weights.

Here is that in PyTorch:

x = torch.tensor([[7], [8]])
x

tensor([[7],
        [8]])

W = torch.tensor([[1, 4], [2, 5], [3, 6]])
W

tensor([[1, 4],
        [2, 5],
        [3, 6]])

y = W @ x
y

tensor([[39],
        [54],
        [69]])
Note how compact and readable that is compared with nested for loops.

Other math can be done with tensors as well, but we have covered most situations that are relevant to neural networks. If you find you need to do additional math with your tensors, check PyTorch documentation or do a web search.

Indexing and Slicing Tensors

Slicing allows you to examine subsets of your data and better understand how the dataset is constructed. You will probably use it a lot.

Indexing and Slicing: PyTorch vs NumPy vs Python Lists

Indexing and slicing tensors work the same way as with NumPy arrays. Note that the syntax differs from Python lists: with Python lists, a separate pair of brackets is used for each level of nesting, whereas with PyTorch a single pair of brackets contains the indices for all dimensions, separated by commas.
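A side-by-side sketch of the two syntaxes, using a small illustrative nested list and the tensor built from it:

```python
import torch

nested_list = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
t = torch.tensor(nested_list)

# Python list: one pair of brackets per nesting level
print(nested_list[1][0][1])  # 6

# PyTorch tensor: one pair of brackets, indices separated by commas
print(t[1, 0, 1])            # tensor(6)

# Slicing across an outer dimension like this only works on the tensor;
# nested_list[:, 0, 1] would raise a TypeError
print(t[:, 0, 1])            # tensor([2, 6])
```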

Let’s find the item in tensor “t_rand” at the second element (matrix), first row, third column. First, here is “t_rand” again:

t_rand

tensor([[[0.9661, 0.3915, 0.0263, 0.2753],
         [0.7866, 0.0503, 0.3963, 0.1334]],

        [[0.4085, 0.1816, 0.2827, 0.3428],
         [0.9923, 0.4543, 0.0872, 0.0771]],

        [[0.2451, 0.6048, 0.8686, 0.8148],
         [0.7930, 0.4150, 0.6125, 0.3401]]])

And here is the item at the 2nd element, first row, and third column (don’t forget indexing starts at zero):

t_rand[1, 0, 2]
# Output: tensor(0.2827)

Let’s look at the slice made of the second element, first row, second and third columns (the end index 3 in the slice is excluded):

t_rand[1, 0, 1:3]
# Output: tensor([0.1816, 0.2827])

Let’s look at the entire third column of each matrix:

t_rand[:, :, 2]


tensor([[0.0263, 0.3963],
        [0.2827, 0.0872],
        [0.8686, 0.6125]])

ℹ️ Important Slicing Tip: In the above, we use the standard Python convention that a blank before a “:” means “start from the beginning”, and a blank after a “:” means “go all the way to the end”. So a “:” alone means “include everything from beginning to end”.

A likely use for slicing is to look at a full array (a matrix) within a set of arrays, for example, one image out of a set of images.

Let’s pretend our “t_rand” tensor is a list of images. We may wish to sample just a few “images” to get an idea of what they are like.

Let’s examine the first “image” in our tensor (“list of images”):

t_rand[0]

tensor([[0.9661, 0.3915, 0.0263, 0.2753],
        [0.7866, 0.0503, 0.3963, 0.1334]])

And here is the last array (“image”) in tensor “t_rand”:

t_rand[-1]

tensor([[0.2451, 0.6048, 0.8686, 0.8148],
        [0.7930, 0.4150, 0.6125, 0.3401]])
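To sample several “images” at once, you can slice the first dimension or pass a list of indices. A minimal sketch; it rebuilds a random stand-in for “t_rand” (with an arbitrary seed), so its values differ from those shown above:

```python
import torch

torch.manual_seed(0)
t_rand = torch.rand(3, 2, 4)  # stand-in for a "list of images"

first_two = t_rand[:2]        # slice: the first two "images"
picked = t_rand[[0, 2]]       # list of indices: the first and third "images"

print(first_two.shape)        # torch.Size([2, 2, 4])
print(picked.shape)           # torch.Size([2, 2, 4])
print(torch.equal(picked[1], t_rand[-1]))  # True: index -1 is the last "image"
```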

Using small tensors to demonstrate indexing can be instructive, but let’s see it in action for real by examining a real dataset of images.

Real Example

We won’t describe the following in detail, except to note that we are importing various libraries that allow us to download and work with a dataset. The last line creates a transform, conv_to_PIL, that converts tensors into PIL images:

import torch
from torch.utils.data import Dataset
from torchvision import datasets
from torchvision.transforms import ToTensor
import matplotlib.pyplot as plt

import torchvision.transforms as T

conv_to_PIL = T.ToPILImage()

The following downloads the Caltech 101 dataset, which is a collection of over 8000 images in 101 categories:

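caltech101_data = datasets.Caltech101(
    root="data",           # assumed download folder; matches the paths in the log below
    download=True,         # fetch and extract the archives if not already present
    transform=ToTensor(),  # convert each PIL image into a tensor
)

Extracting data/caltech101/101_ObjectCategories.tar.gz to data/caltech101
Extracting data/caltech101/Annotations.tar to data/caltech101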

This has created a dataset object which is a container for the data. These objects can be indexed like lists:

len(caltech101_data)
# 8677

type(caltech101_data[0])
# tuple

len(caltech101_data[0])
# 2

The above code shows that the dataset contains 8677 items. Looking at the first item of the set, we can see that the items are tuples of 2 elements each. Here are the types of the elements in the tuple:

type(caltech101_data[0][0])
# torch.Tensor

type(caltech101_data[0][1])
# int

The two items in the tuple are the image as a tensor, and an integer code corresponding to the image’s category.

Colab, like other Jupyter notebook environments, provides a convenient display() function for showing images. First, we use the conversion function we created earlier to convert a tensor to a PIL image; then we display it.

img = conv_to_PIL(caltech101_data[0][0])
display(img)

We can use indexing to sample and display a few other images from the set:

img = conv_to_PIL(caltech101_data[1234][0])
display(img)

img = conv_to_PIL(caltech101_data[4321][0])
display(img)
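The slicing techniques from earlier apply directly to these image tensors, which are laid out as (channels, height, width). A minimal sketch using a random stand-in tensor, a hypothetical 3 × 300 × 200 image, so it runs without the download. Real images in the dataset vary in size, so check .shape before cropping:

```python
import torch

# Hypothetical stand-in for one dataset image: (channels, height, width), values in [0, 1]
img_tensor = torch.rand(3, 300, 200)

red_channel = img_tensor[0]             # one channel: a 300 x 200 matrix
top_left = img_tensor[:, :100, :100]    # crop: all channels, first 100 rows and columns
every_other = img_tensor[:, ::2, ::2]   # downsample by keeping every second pixel

print(red_channel.shape)   # torch.Size([300, 200])
print(top_left.shape)      # torch.Size([3, 100, 100])
print(every_other.shape)   # torch.Size([3, 150, 100])
```

A cropped tensor such as top_left can be passed to conv_to_PIL and displayed in the same way as a full image.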


We have learned a number of things:

  1. What tensors are
  2. Why tensors are key mathematical objects for describing and implementing neural networks
  3. Creating tensors in PyTorch
  4. Doing math with tensors in PyTorch
  5. Doing indexing and slicing of tensors in PyTorch, especially to examine images in datasets

We hope you have found this article informative. We wish you happy coding!

The next article in the series is the following:

🌍 Recommended Tutorial: Using PyTorch to Build a Working Neural Network
