Aaron Glatzer, Author at Be on the Right Side of Change

Using PyTorch to Build a Working Neural Network

Aaron Glatzer — Fri, 18 Nov 2022 20:19:12 +0000

In this article, we will use PyTorch to build a working neural network. Specifically, this network will be trained to recognize handwritten numerical digits using the famous MNIST dataset.

The code in this article borrows heavily from the PyTorch tutorial “Learn the Basics”. We do this for several reasons.

First, that tutorial is pretty good at demonstrating the essentials for getting a working neural network.
Second, just like importing libraries, it’s good to not reinvent the wheel when you don’t have to.
Third, when building your own network, it is very helpful to start with something that is known to work, then modify it to your needs.

Knowledge Background

This article assumes the reader has some necessary background:

Familiarity with Python, and Python object-oriented programming.
Familiarity with how neural networks work. See the Finxter article “The Magic of Neural Networks: History and Concepts” to learn the basic ideas.
Familiarity with how neural networks learn. See the Finxter article “How Neural Networks Learn” to learn this subject.
Familiarity with tensors. See the Finxter article “Tensors: the Vocabulary of Neural Networks” to learn this subject.
Familiarity with Matplotlib. While this is not necessary to follow along, it is necessary if you want to be able to view image data yourself on your own datasets in the future (and you will want to be able to do this).

You can run PyTorch on your own machine, or you can run it on publicly available computer systems.

We will be running this exercise using Google Colab, which provides free access to powerful computing resources, including GPUs.

Recommended: Other options for publicly available computing are shown in the Finxter article “Top 4 Jupyter Notebook Alternatives for Machine Learning”.

Process Overview

This article will cover all the necessary steps to build and test a working neural network using the PyTorch library.

PyTorch provides a framework that makes building, training, and using neural networks easier. Also under the hood, it is written using the very fast C++ language, so that those neural networks can provide world-class performance while using the popular Python language as the interface to create those networks.

Neural networks and the PyTorch library are rich subjects. So while we will cover all the necessary steps, each step will just scratch the surface of its respective subject.

For example, we will get the image data from datasets built into the PyTorch library. However, the user will eventually want to use neural networks on their own data, so the users will need to learn how to build and work with their own datasets.

So for each of these steps, the user will want to learn more on each subject to become a proficient PyTorch user.

Nevertheless, by the end of this article, you will have built your own working neural network, so you can be sure you will know how to do it!

Further learning will enrich those abilities. Throughout the article, we will point out some of the other things you will eventually want to learn for each step.

Here are the steps we will be taking:

Import necessary libraries.
Acquire the data.
Review the data to understand it.
Create data loaders for loading the data into the network.
Design and create the neural network.
Specify the loss measure and the optimizer algorithm.
Specify the training and testing functions.
Train and test the network using the specified functions.

Step 1: Import Necessary Libraries

Before we do anything, we will want to set up our runtime to use the GPU (again, assuming here you are using Colab).

Click on “Runtime” in the top menu bar, and then choose “Change runtime type” from the dropdown. Then from the window that pops up choose “GPU” under “Hardware accelerator”, and then click “Save”.

Next, we will need to import a number of libraries:

We will import the torch library, making PyTorch available for use.
From the torch package we will import the nn module, which provides the building blocks for the neural network.
From the torchvision package we will import the datasets module, which will help provide the image datasets.
From the torch.utils.data module we will import the DataLoader class. Data loaders help load data into the network.
From the torchvision.transforms module we will import the ToTensor class. This converts the image data into tensors so that they are ready to be processed through the network.

Here is the code importing the needed modules:

import torch
from torch import nn
from torchvision import datasets
from torch.utils.data import DataLoader
from torchvision.transforms import ToTensor

Step 2: Acquire the Data

As mentioned before, in this exercise, we will be getting the MNIST data as available directly through PyTorch libraries. This is the quickest and easiest approach to getting the data.

If you wanted to get the original datasets they are available at:

http://yann.lecun.com/exdb/mnist/

Even though we will get the data through the PyTorch libraries, it can still be helpful to review this page, as it provides some useful information about the dataset. (However we will provide everything you need to understand this dataset in the article).

Note: Firefox has trouble accessing this page, for some reason requiring a login to access it. Either view it using another browser, or view it as recorded on the Internet Archive Wayback Machine.

There are multiple datasets available through the PyTorch dataset libraries. Here are PyTorch webpages linking to Image Datasets, Text Datasets, and Audio Datasets.

To get data from a PyTorch dataset we create an instance from the respective dataset class. Here is the format:

dataset_instance = DatasetClass(parameters)

This creates a dataset object, and downloads the data. The data is then available by working with the dataset object.

Here is the code to create our MNIST datasets:

# Download MNIST data, put it in pytorch dataset
mnist_data = datasets.MNIST(
    root='mnist_nn',
    train=True,
    download=True,
    transform=ToTensor()
)

mnist_test_data = datasets.MNIST(
    root='mnist_nn',
    train=False,
    download=True,
    transform=ToTensor()
)

You’ll use these parameters:

The root parameter specifies the directory where the downloaded data will be placed.
The train parameter determines whether training or testing data is downloaded.
The download=True parameter confirms the data should be downloaded if it hasn’t been already.
The transform parameter converts the data into tensors, in this case.

Which parameters are available varies from dataset to dataset, as does how the data is structured, so refer to the dataset web pages mentioned above to review the details of what is available and needed.

While this method of getting data is convenient and easy, remember that you will eventually want to work with your own data, so eventually, you will want to learn how to create your own datasets.
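As a rough illustration of what a custom dataset involves, here is a minimal sketch. It assumes the usual pattern of subclassing torch.utils.data.Dataset and defining __len__() and __getitem__(). The class name RandomDigitDataset and its random stand-in "images" are hypothetical and are not part of the MNIST code in this article.

```python
import torch
from torch.utils.data import Dataset


class RandomDigitDataset(Dataset):
    """Hypothetical stand-in dataset: random 'images' with random labels 0-9."""

    def __init__(self, num_items=20):
        generator = torch.Generator().manual_seed(0)
        # Fake grayscale images shaped like MNIST items: (channels, height, width)
        self.images = torch.rand(num_items, 1, 28, 28, generator=generator)
        self.labels = torch.randint(0, 10, (num_items,), generator=generator)

    def __len__(self):
        # Lets len(dataset) work
        return len(self.labels)

    def __getitem__(self, idx):
        # Lets dataset[idx] work; returns an (image, label) tuple like MNIST does
        return self.images[idx], int(self.labels[idx])


toy_data = RandomDigitDataset()
image, label = toy_data[0]
print(len(toy_data), image.shape, type(label))
```

With real data, __init__() would typically read file paths and labels from disk, and __getitem__() would load and transform one item at a time.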

Also, not all datasets contain images with uniform image size, so images may need to be cropped or stretched to fit the fixed number of input neurons.

Also, other transformations can be helpful as well.

For example, you can effectively expand your dataset by including subcrops from your original dataset as additional images to train on. So data transformations are something else you will want to learn, since you might use them at this stage in the process.

Step 3: Review the Dataset

Now that we have downloaded the data and created a dataset, let’s review the dataset to understand its contents and structure.

type(mnist_data)
# torchvision.datasets.mnist.MNIST

The type() function shows that our dataset is an object of the MNIST dataset class.

Conveniently, PyTorch datasets have been designed to be indexed like lists. Let’s take advantage of this and use the len() function to learn something about our datasets:

len(mnist_data)
# 60000

len(mnist_test_data)
# 10000

So our training dataset contains 60000 items, and our test dataset contains 10000 items, consistent with the number of images specified to be in each respective dataset.

Let’s use the type() and len() functions to examine the first item in the training dataset:

type(mnist_data[0])
# tuple

len(mnist_data[0])
# 2

So the items in the datasets are tuples containing 2 items.

Let’s use the type() function to learn about the first item in the tuple:

type(mnist_data[0][0])
# torch.Tensor

So the first item in the tuple is a tensor, likely some image data.

Let’s examine the shape attribute of the tensor to understand its shape:

mnist_data[0][0].shape
# torch.Size([1, 28, 28])

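This is consistent with the 28×28 pixel structure of the image data, plus one leading dimension of size 1. That leading dimension is the channel dimension: ToTensor() returns images in (channels, height, width) order, and a grayscale image has a single channel.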

Let’s examine the second item in the tuple:

type(mnist_data[0][1])
# int

mnist_data[0][1]
# 5

So the second item is the integer '5', apparently the label for an image of the digit '5'.

Let’s use Matplotlib to view the image:

import matplotlib.pyplot as plt
plt.imshow(mnist_data[0][0], cmap='gray')

Output:

TypeError                                 Traceback (most recent call last)
 in 
----> 1 plt.imshow(mnist_data[0][0], cmap='gray')

/usr/local/lib/python3.7/dist-packages/matplotlib/pyplot.py in imshow(X, cmap, norm, aspect, interpolation, alpha, vmin, vmax, origin, extent, shape, filternorm, filterrad, imlim, resample, url, data, **kwargs)
   2649         filternorm=filternorm, filterrad=filterrad, imlim=imlim,
   2650         resample=resample, url=url, **({"data": data} if data is not
-> 2651         None else {}), **kwargs)
   2652     sci(__ret)
   2653     return __ret

/usr/local/lib/python3.7/dist-packages/matplotlib/__init__.py in inner(ax, data, *args, **kwargs)
   1563     def inner(ax, *args, data=None, **kwargs):
   1564         if data is None:
-> 1565             return func(ax, *map(sanitize_sequence, args), **kwargs)
   1566 
   1567         bound = new_sig.bind(ax, *args, **kwargs)

/usr/local/lib/python3.7/dist-packages/matplotlib/cbook/deprecation.py in wrapper(*args, **kwargs)
    356                 f"%(removal)s.  If any parameter follows {name!r}, they "
    357                 f"should be pass as keyword, not positionally.")
--> 358         return func(*args, **kwargs)
    359 
    360     return wrapper

/usr/local/lib/python3.7/dist-packages/matplotlib/cbook/deprecation.py in wrapper(*args, **kwargs)
    356                 f"%(removal)s.  If any parameter follows {name!r}, they "
    357                 f"should be pass as keyword, not positionally.")
--> 358         return func(*args, **kwargs)
    359 
    360     return wrapper

/usr/local/lib/python3.7/dist-packages/matplotlib/axes/_axes.py in imshow(self, X, cmap, norm, aspect, interpolation, alpha, vmin, vmax, origin, extent, shape, filternorm, filterrad, imlim, resample, url, **kwargs)
   5624                               resample=resample, **kwargs)
   5625 
-> 5626         im.set_data(X)
   5627         im.set_alpha(alpha)
   5628         if im.get_clip_path() is None:

/usr/local/lib/python3.7/dist-packages/matplotlib/image.py in set_data(self, A)
    697                 or self._A.ndim == 3 and self._A.shape[-1] in [3, 4]):
    698             raise TypeError("Invalid shape {} for image data"
--> 699                             .format(self._A.shape))
    700 
    701         if self._A.ndim == 3:

TypeError: Invalid shape (1, 28, 28) for image data

Oops, that extra one-item dimension (the single grayscale channel) is causing us problems. We can use the squeeze() method on the tensor to remove any one-element dimensions, which returns a two-dimensional 28×28 tensor instead of the three-dimensional tensor we had before.

Let’s try again:

plt.imshow(mnist_data[0][0].squeeze(), cmap='gray')

Well, it’s a little sloppy, but that’s plausibly a number '5'. (This is reasonable to expect from a hand-written digit!).

So it looks like each item in the dataset is a tuple containing an image (in tensor format) and its corresponding label.
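To see what squeeze() does to the shape without downloading anything, here is a small sketch using an all-zeros stand-in tensor with the same shape as one MNIST item. The variable names are illustrative only.

```python
import torch

fake_image = torch.zeros(1, 28, 28)   # same shape as mnist_data[0][0]
squeezed = fake_image.squeeze()       # drops every dimension of size 1
restored = squeezed.unsqueeze(0)      # puts a size-1 dimension back at position 0

print(fake_image.shape)
print(squeezed.shape)
print(restored.shape)
```

Note that squeeze() with no argument removes all size-1 dimensions, so on a batch containing a single image it would remove the batch dimension as well.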

Let’s use Matplotlib to look at the first 10 images, and title each image with its corresponding label:

fig, axs = plt.subplots(2, 5, figsize=(8, 5))
for a_row in range(2):
  for a_col in range(5):
    img_no = a_row*5 + a_col
    img = mnist_data[img_no][0].squeeze()
    img_tgt = mnist_data[img_no][1]
    axs[a_row][a_col].imshow(img, cmap='gray')
    axs[a_row][a_col].set_xticks([])
    axs[a_row][a_col].set_yticks([])
    axs[a_row][a_col].set_title(img_tgt, fontsize=20)
plt.show()

So now we have a clear understanding of how our dataset is structured and what the data looks like. Much of this is explained in the dataset description page, but this kind of analysis is often very useful for getting a precise understanding of the dataset that might not be clear from the description.

Step 4: Create Dataloaders

Datasets make the data available for processing.

However, typically, we will want to process using randomized mini-batches from the dataset.

Data loaders make this easy. Data loaders are iterables, and you’ll see later that each step of iterating over a data loader returns one minibatch from the dataset that can be processed through the neural network. When shuffling is enabled, the order of the items is reshuffled on every pass through the dataset.

Let’s create some dataloader objects from our datasets:

batch_size = 100

mnist_train_dl = DataLoader(mnist_data,
                      batch_size=batch_size,
                      shuffle=True)

mnist_test_dl = DataLoader(mnist_test_data,
                          batch_size=batch_size,
                          shuffle=True)

So we have created two data loaders, one for the training dataset, and one for the test dataset.

The batch_size parameter specifies the number of image/label pairs in the minibatch that the dataloader will return for each iteration. The shuffle parameter determines whether or not the mini-batches are randomized.
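Here is a small sketch of what iterating over a data loader yields. It uses a TensorDataset of random stand-in data rather than MNIST, and the dataset size of 250 is chosen only to show what happens when the size is not a multiple of the batch size.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

images = torch.rand(250, 1, 28, 28)        # 250 fake images
labels = torch.randint(0, 10, (250,))      # 250 fake labels
toy_ds = TensorDataset(images, labels)
toy_dl = DataLoader(toy_ds, batch_size=100, shuffle=True)

batch_shapes = []
for X, y in toy_dl:
    # Each iteration returns one minibatch of images and the matching labels
    batch_shapes.append((tuple(X.shape), tuple(y.shape)))

print(len(toy_dl))      # number of minibatches per pass
print(batch_shapes)
```

The final minibatch holds the 50 leftover items. Since MNIST has 60000 training items and we use a batch size of 100, every minibatch in this article is full.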

Step 5: Design and Create the Neural Network

Check for GPU

We are about to design and create the neural network, but first, let’s check if a GPU is available.

One of the advantages PyTorch has as a neural network framework is that it supports the use of a GPU. A GPU performs many calculations in parallel, which can greatly speed up computation.

Depending on the problem, at least an order of magnitude faster processing can be achieved.

Use of a GPU with PyTorch is very easy. First, use the function torch.cuda.is_available() to test if a GPU is available and properly configured for use by PyTorch (PyTorch uses the CUDA framework for using the GPU).

If a GPU is available, we will send the model and the data tensors to the GPU for processing.

The following tests for availability of a GPU, then sets a variable device to either 'cpu' or 'cuda' depending on what is available.

device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using {device} device")
# Using cuda device

Create the Neural Network

Now let’s design and create the neural network. We do this by creating a class, which we have chosen to call NeuralNet, which is a subclass of the nn.Module class.

Here is the code to specify and then create our neural network:

class NeuralNet(nn.Module):
  def __init__(self):
    super().__init__()          # Required to properly initialize class, ensures inheritance of the parent __init__() method
    self.flat_f = nn.Flatten()  # Creates function to smartly flatten tensor
    self.neur_net = nn.Sequential(
        nn.Linear(28*28, 512),
        nn.ReLU(),
        nn.Linear(512, 256),
        nn.ReLU(),
        nn.Linear(256,10)
    )

  def forward(self, x):
    x = self.flat_f(x)
    logits = self.neur_net(x)
    return logits

model = NeuralNet().to(device)

There are a number of important details to review in this code.

First, our neural network definition class must have two methods included: an __init__() method, and a forward() method.

Classes in Python routinely include an __init__() method to initialize variables and other things in the object that is created. The class must also include a forward() method, which tells PyTorch how to process the data during the forward pass of the data.

Let’s go over each of these in more detail.

Creating the Model: __init__() Method

First, within the __init__() method note the super().__init__() command. When we create a subclass it inherits the parent class variables and methods.

However, when we write an __init__() method in the subclass, that overrides inheritance of the __init__() method from the parent class.

However, there are features in the parent class’ __init__() that our class needs to inherit. The super().__init__() command achieves this. In effect, it says “run the parent class __init__() within our child class”.

To make a long story short, this is necessary to properly initialize our child class, by including some things needed from the parent nn.Module class.

Next, note that we create a flattening layer by instantiating nn.Flatten. Even though our data is a 28×28 pixel two-dimensional image, the processing still works if we convert it into a one-dimensional vector, stacking row by row next to one another to form a 28×28 = 784 element vector (in fact making this change is a common choice).

Flattening achieves this. However, the standard torch.flatten() function (note the lower case 'f') by default flattens everything, turning a 100 image minibatch tensor of shape (100, 1, 28, 28) into a single vector of shape (78400,).

Instead, if we create a layer from nn.Flatten (note the upper case 'F'), its defaults (start_dim=1, end_dim=-1) keep the first (batch) dimension and merge all the remaining dimensions, resulting in a tensor of shape (100, 784), representing a list of 100 vectors of 784 elements.

Note: double-check to make sure your function is flattening properly. If not, the Flatten() function can include some parameters that specify which dimensions to flatten. See documentation for details.
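One quick way to double-check the flattening is to compare both approaches on a random stand-in minibatch, as in this sketch (the variable names are illustrative):

```python
import torch
from torch import nn

batch = torch.rand(100, 1, 28, 28)      # stand-in minibatch of 100 images

flat_all = torch.flatten(batch)         # flattens every dimension
flat_per_image = nn.Flatten()(batch)    # keeps dimension 0, flattens the rest

print(flat_all.shape)
print(flat_per_image.shape)

# Each row of the result is one image, laid out row by row
same_pixels = torch.equal(flat_per_image[0], batch[0].reshape(-1))
print(same_pixels)
```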

The last thing we do in the __init__() method is specify the neural network structure using the nn.Sequential() function.

Here we list the neural network layers in sequence from beginning to end.

First, we list an input layer of 28×28=784 neurons, connecting through linear (weights * input + bias) connections to 512 neurons. These 512 neurons then pass data through a non-linear ReLU activation function layer.

Those signals then go through another linear layer connecting 512 neurons to 256 neurons. These signals then go through another ReLU activation function layer. Finally, the signals go through a final linear layer connecting the 256 neurons to 10 final output neurons.

'ReLU' stands for 'Rectified Linear Unit'. It is one of many non-linear activation functions which can be chosen.

It is defined as:

f(x) = x, if x>=0
else f(x) = 0
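As a quick numerical check of this definition, the sketch below applies nn.ReLU to a handful of sample values. The sample values are arbitrary.

```python
import torch
from torch import nn

relu = nn.ReLU()
x = torch.tensor([-2.0, -0.5, 0.0, 0.5, 2.0])
y = relu(x)   # negative inputs become 0, non-negative inputs pass through

print(x)
print(y)
```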

Here is a graph of the ReLU function:

Creating the Model: forward() Method

The second required method for our class is the forward() method.

As mentioned the forward() method tells PyTorch how to process the data during the forward pass. Here we first flatten our tensor using the flatten function we defined previously under __init__().

Then we pass the tensor through the self.neur_net() function we defined previously using the nn.Sequential() function. Finally, the results are returned.

Important point: the programmer will normally NOT call the forward() method directly. Instead, the model object itself is called like a function, as in model(X), and PyTorch runs forward() as part of that call. PyTorch expects such a method, so it must be written, but subsequent code calls the model rather than forward().

Next, we create the neural network (here named 'model') by creating an instance of our NeuralNet() class. In addition, we move the model to the GPU (if available) by including the .to(device) method.
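The following sketch illustrates this with a tiny hypothetical module, TinyNet, which counts how often its forward() method runs. The counter exists only for demonstration and is not something a real model needs.

```python
import torch
from torch import nn


class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(4, 2)
        self.forward_calls = 0      # demonstration-only counter

    def forward(self, x):
        self.forward_calls += 1
        return self.layer(x)


tiny = TinyNet()
out = tiny(torch.rand(3, 4))        # calling the model runs forward() for us
print(out.shape, tiny.forward_calls)
```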

Finally, we can choose to print the model to examine the neural network object we have built:

print(model)

Output:

NeuralNet(
  (flat_f): Flatten(start_dim=1, end_dim=-1)
  (neur_net): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=10, bias=True)
  )
)

Step 6: Choose Loss Function and Optimizer

Next, we’ll need to specify our loss function and our optimizer algorithm.

Choosing Cross Entropy Loss

Recall the loss function measures how far the model’s guess is from the correct answer for a given input. Adjusting weights and biases to minimize loss is how neural networks learn (see the Finxter article “How Neural Networks Learn” for details).

There are multiple choices of loss functions available, and learning about these various functions is something you will want to do, because which loss choice is most suitable depends on the particular kind of problem you are solving.

In this case, we are sorting images into multiple categories.

One of the most suitable loss choices for this case is cross-entropy loss. Cross entropy is an idea taken from information theory: it measures the average number of bits needed to send a message when using a code optimized for an estimated distribution rather than for the true one.

This is beyond the scope of this exercise, but we can understand its usefulness to our situation if we examine the calculation involved:

L = -\sum_{i} t_i \log(p_i)

That is, for each category i multiply the true probability t_i by the log of the model’s estimated probability p_i, add them all up, and take the negative.

Of course, t is zero for each incorrect category, and 1 for the correct category.

Consequently, for any given image, just the correct category is selected to contribute to the loss calculation, and that loss is the negative of the log of the probability estimate.

Recall this is what the log() function looks like:

Since the network provides a probability estimate we are only interested in the interval (0,1]. Here is what the negative of the log() looks like over that interval:

So the loss is very large when the network gives a low probability estimate (near zero) for the correct category, and the loss is lowest (near zero) when the network gives a high probability estimate (near 1.0) for the correct category.
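To make this concrete, the sketch below computes the loss by hand for one made-up example with three categories and compares it to nn.CrossEntropyLoss. The scores and targets are arbitrary. The hand calculation first converts the raw scores to probabilities with softmax, which nn.CrossEntropyLoss does internally.

```python
import torch
from torch import nn

logits = torch.tensor([[2.0, 0.5, -1.0]])   # raw scores for 3 categories
target = torch.tensor([0])                  # correct category is index 0

# By hand: convert scores to probabilities, take -log of the correct one
probs = torch.softmax(logits, dim=1)
manual_loss = -torch.log(probs[0, target[0]])

# Using the library loss, which takes the raw scores directly
loss_fn = nn.CrossEntropyLoss()
library_loss = loss_fn(logits, target)

# Same scores, but pretend the correct category was the low-scoring one
wrong_loss = loss_fn(logits, torch.tensor([2]))

print(manual_loss.item(), library_loss.item(), wrong_loss.item())
```

The two calculations agree, and the loss is much larger when the correct category is one the network scored poorly.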

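Here is the code specifying cross entropy loss as the loss function. Note that nn.CrossEntropyLoss expects the network’s raw output scores (logits) and converts them to probabilities internally, which is why our network does not end with a softmax layer: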

loss_fn = nn.CrossEntropyLoss()

Choosing Optimizer Algorithm

We also need to choose the optimizer algorithm. This is the method used to minimize the loss through training. Multiple different optimizers may be chosen, and you will want to learn about the various optimizers available.

All are variations on gradient descent.

For example, some include decay of the learning rate; others include momentum, which can help carry the optimization past shallow local minima.

In our case, we will choose plain-old vanilla stochastic gradient descent. Here is the code specifying the optimizer and its learning rate:

learning_rate = 1e-3
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
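To show what a single step of plain SGD does, here is a small sketch on a made-up two-element parameter. It assumes the basic update rule of new value = old value minus learning rate times gradient, with no momentum or weight decay. The loss function and learning rate of 0.1 are chosen only to keep the arithmetic easy to follow.

```python
import torch

w = torch.tensor([1.0, -2.0], requires_grad=True)   # toy parameter
optimizer = torch.optim.SGD([w], lr=0.1)

loss = (w ** 2).sum()        # toy loss; its gradient is 2 * w
loss.backward()

# What plain SGD should produce: w - lr * gradient
expected = w.detach() - 0.1 * w.grad

optimizer.step()             # performs the update in place
print(w.detach(), expected)
```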

Step 7: Specify Training and Testing Functions

Now we define functions for training and testing the neural network.

Training Function

Here is the code specifying the training function:

def train_nn(dataloader, model, loss_fn, optimizer):
  size = len(dataloader.dataset)
  for batch, (X, y) in enumerate(dataloader):
    X, y = X.to(device), y.to(device)
        
    # For each image in batch X, compute prediction
    pred = model(X)
    # Compute average loss for the set of images in batch
    loss = loss_fn(pred, y)

    # Backpropagation
    optimizer.zero_grad()   # Zero gradients
    loss.backward()         # Computes gradients
    optimizer.step()        # Update weights, biases according to gradients, factored by learning rate

    if batch % 100 == 0:      # Report progress every 100 batches
      loss, current = loss.item(), batch * len(X)
      print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")

We pass into the function the dataloader, model, loss function, and optimizer objects.

The function then loops over minibatches from the dataloader.

For each loop, a minibatch of the input images X and the labels y is retrieved and then moved to the GPU (if available).

Then the neural network model calculates predictions from the input images X. These predictions and the correct labels y are used to calculate the loss (note this loss is a single number that is the average loss for the minibatch).

Once the loss is calculated, the function can adjust weights and biases (backpropagate) in three code steps.

First, gradient attributes are zeroed out using optimizer.zero_grad() (PyTorch defaults to accumulating gradient calculations, so they need to be zeroed out on each iteration of the loop, or else they’ll keep accumulating data).

Then the gradients are calculated using loss.backward(). Finally, weights and biases are updated according to the gradients using optimizer.step().
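The accumulation behavior can be seen in a small sketch with a single toy parameter. Here the gradient is cleared by zeroing the tensor's .grad attribute directly, as a stand-in for what optimizer.zero_grad() accomplishes in the training function.

```python
import torch

w = torch.tensor(3.0, requires_grad=True)

(w * 2).backward()               # gradient of 2*w with respect to w is 2
first = w.grad.item()

(w * 2).backward()               # without zeroing, the new gradient is added
accumulated = w.grad.item()

w.grad.zero_()                   # clear the stored gradient
(w * 2).backward()
after_zero = w.grad.item()

print(first, accumulated, after_zero)
```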

Finally, a small section is included to report progress every 100 batches. This prints out the current loss, and how many images of the total images have been completed.

Testing Function

Here is the code specifying the testing function:

def test_loop(dataloader, model, loss_fn):      # After each epoch, test training results (report categorizing accuracy, loss)
    size = len(dataloader.dataset)              # Number of image/label pairs in dataset
    num_batches = len(dataloader)
    test_loss, correct = 0, 0                   # Initialize variables tracking loss and accuracy during test loop

    with torch.no_grad():                       # Disable gradient tracking - reduces resource use and speeds up processing
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)

            pred = model(X)                     # Get predictions from the neural network based on input minibatch X
            test_loss += loss_fn(pred, y).item()  # Accumulate loss values during loop through dataset
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()    # Accumulate correct predictions during loop through dataset

    test_loss /= num_batches                    # Calculate average loss
    correct /= size                             # Calculate accuracy rate
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")   # Report test results

This function tests the accuracy of the network using the test data.

First, we pass in the testing data loader, the model, and the loss function (for testing loss). Then the function initializes several variables, especially test_loss and correct for accumulating test results during the test loop.

The function does the next few steps within a with torch.no_grad(): subsection.

Here is why: PyTorch stores calculations from the forward pass for later use during the backpropagation gradient calculations.

The torch.no_grad() context manager turns that off while in this with subsection, since there will be only a forward pass during the testing. This saves resources and speeds up processing. You will want to do the same thing once you have a trained network that is used for classifying in production.

After leaving the with subsection the calculation-storing feature automatically resumes.

Note: be aware that storing calculations is turned on because the weights and biases inside layers from the nn library (such as Linear) are created with requires_grad=True. Tensors you create yourself default to requires_grad=False.
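These defaults can be checked directly, as in the following sketch. The small Linear layer and the all-ones input are stand-ins chosen for illustration.

```python
import torch
from torch import nn

plain = torch.ones(3)            # an ordinary tensor
layer = nn.Linear(3, 2)          # its weight and bias are trainable parameters

out_tracked = layer(plain)       # normal forward pass: history is recorded
with torch.no_grad():
    out_untracked = layer(plain) # inside no_grad: no history is recorded

print(plain.requires_grad)
print(layer.weight.requires_grad)
print(out_tracked.requires_grad)
print(out_untracked.requires_grad)
```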

Then the function uses a for loop to iterate through the minibatches of the test dataloader. For each iteration, the neural network model computes predictions from the minibatch of images. The loss is calculated for the minibatch, which is then accumulated in test_loss.

Then the number of correct predictions for the minibatch is found as follows: first note that pred is a set of 10-element vectors, with each element a raw score (logit) for the corresponding digit. A larger score means the network considers that digit more likely, so the index of the largest score is the network’s prediction.

The .argmax(1) method returns the index of the largest estimate (the number 1 in the argmax() argument indicates which dimension to use for the operation). This list (tensor) of indices is compared to the list (tensor) of correct labels in y.

This results in a list (tensor) containing True where there is a match, and False otherwise. The .type(torch.float) method converts these into floating point 1’s and 0’s.

The sum() method adds all the elements together. Then finally, the .item() method converts the totaled one-element tensor into a raw number (scalar).

Finally, we have the total number of correct predictions for that batch, which is added to the correct variable that accumulates the total number of correct predictions as the for loop iterates through the dataloader.
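The chain of operations can be traced on a tiny made-up example with four predictions over three categories. The scores and labels below are invented for illustration.

```python
import torch

# Made-up scores: 4 items, 3 categories each
pred = torch.tensor([[0.1, 2.0, -1.0],
                     [1.5, 0.2, 0.3],
                     [0.0, 0.1, 3.0],
                     [0.9, 0.8, 0.7]])
y = torch.tensor([1, 0, 2, 1])            # correct labels

predicted_labels = pred.argmax(1)         # index of the largest score per row
matches = predicted_labels == y           # True where the prediction is right
correct = matches.type(torch.float).sum().item()
accuracy = correct / len(y)

print(predicted_labels.tolist())
print(matches.tolist())
print(correct, accuracy)
```

Three of the four predictions match their labels, so the accuracy for this toy batch is 0.75.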

Step 8: Train and Test the Network

Now that we have written the pieces we need, we can write a small main program loop to train and test the network. We specify how many epochs we wish to run, then we loop through those epochs, training and testing the network for each one.

Here is the code:

# The main program!

epochs = 5
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train_nn(mnist_train_dl, model, loss_fn, optimizer)
    test_loop(mnist_test_dl, model, loss_fn)
print("Done!")

Output:

Epoch 1
-------------------------------
loss: 2.102096  [    0/60000]
loss: 2.119211  [10000/60000]
loss: 2.068424  [20000/60000]
loss: 2.056982  [30000/60000]
loss: 2.028877  [40000/60000]
loss: 1.995214  [50000/60000]
Test Error: 
 Accuracy: 65.9%, Avg loss: 2.000194 

Epoch 2
-------------------------------
loss: 2.018245  [    0/60000]
loss: 1.996478  [10000/60000]
loss: 1.969913  [20000/60000]
loss: 1.999372  [30000/60000]
loss: 1.944238  [40000/60000]
loss: 1.863184  [50000/60000]
Test Error: 
 Accuracy: 67.8%, Avg loss: 1.866808 

Epoch 3
-------------------------------
loss: 1.921477  [    0/60000]
loss: 1.891367  [10000/60000]
loss: 1.840778  [20000/60000]
loss: 1.751534  [30000/60000]
loss: 1.718531  [40000/60000]
loss: 1.800236  [50000/60000]
Test Error: 
 Accuracy: 69.5%, Avg loss: 1.695623 

Epoch 4
-------------------------------
loss: 1.692079  [    0/60000]
loss: 1.752511  [10000/60000]
loss: 1.600570  [20000/60000]
loss: 1.582768  [30000/60000]
loss: 1.532521  [40000/60000]
loss: 1.569566  [50000/60000]
Test Error: 
 Accuracy: 71.9%, Avg loss: 1.498120 

Epoch 5
-------------------------------
loss: 1.507337  [    0/60000]
loss: 1.515740  [10000/60000]
loss: 1.437465  [20000/60000]
loss: 1.424620  [30000/60000]
loss: 1.409456  [40000/60000]
loss: 1.385026  [50000/60000]
Test Error: 
 Accuracy: 74.6%, Avg loss: 1.300192 

Done!

After just 5 epochs, the accuracy isn’t very good yet, but we can see that things are moving in the right direction.

Obviously, if we wanted to get good performance we would need to train for more epochs. Figuring out how much to train (being careful not to overfit!) is something a neural network engineer has to work out.

Reviewing the Big Picture

It may seem like we have gone over a lot, and we have, but if you step back and look at the big picture there isn’t a lot here.

It may seem like a lot because we have reviewed everything in detail to make sure we convey full understanding.

However, to gain some perspective, let’s show all the essential code, without all the extra description and explanation (note, we’re also skipping the code here used to review the dataset):

Import Necessary Libraries

import torch
from torch import nn
from torchvision import datasets
from torch.utils.data import DataLoader
from torchvision.transforms import ToTensor

Acquire the Data

# Download MNIST data, put it in pytorch dataset
mnist_data = datasets.MNIST(
    root='mnist_nn',
    train=True,
    download=True,
    transform=ToTensor()
)

mnist_test_data = datasets.MNIST(
    root='mnist_nn',
    train=False,
    download=True,
    transform=ToTensor()
)

Create Dataloaders

batch_size = 100
mnist_train_dl = DataLoader(mnist_data,
                      batch_size=batch_size,
                      shuffle=True)

mnist_test_dl = DataLoader(mnist_test_data,
                           batch_size=batch_size,
                           shuffle=True)

Check for GPU

device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using {device} device")
# Using cuda device

Design and Create the Neural Network

class NeuralNet(nn.Module):
  def __init__(self):
    super().__init__()          # Required to properly initialize class, ensures inheritance of the parent __init__() method
    self.flat_f = nn.Flatten()  # Creates function to smartly flatten tensor
    self.neur_net = nn.Sequential(
        nn.Linear(28*28, 512),
        nn.ReLU(),
        nn.Linear(512, 256),
        nn.ReLU(),
        nn.Linear(256,10)
    )

  def forward(self, x):
    x = self.flat_f(x)
    logits = self.neur_net(x)
    return logits

model = NeuralNet().to(device)

Choose Loss Function and Optimizer

loss_fn = nn.CrossEntropyLoss()
learning_rate = 1e-3
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

Specify Training and Testing Functions

def train_nn(dataloader, model, loss_fn, optimizer):
  size = len(dataloader.dataset)
  for batch, (X, y) in enumerate(dataloader):
    X, y = X.to(device), y.to(device)
        
    # For each image in batch X, compute prediction
    pred = model(X)
    # Compute average loss for the set of images in batch
    loss = loss_fn(pred, y)

    # Backpropagation
    optimizer.zero_grad()   # Zero gradients
    loss.backward()         # Computes gradients
    optimizer.step()        # Update weights, biases according to gradients, factored by learning rate

    if batch % 100 == 0:      # Report progress every 100 batches
      loss, current = loss.item(), batch * len(X)
      print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")

def test_loop(dataloader, model, loss_fn):      # After each epoch, test training results (report categorizing accuracy, loss)
    size = len(dataloader.dataset)              # Number of image/label pairs in dataset
    num_batches = len(dataloader)
    test_loss, correct = 0, 0                   # Initialize variables tracking loss and accuracy during test loop

    with torch.no_grad():                       # Disable gradient tracking - reduces resource use and speeds up processing
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)

            pred = model(X)                     # Get predictions from the neural network based on input minibatch X
            test_loss += loss_fn(pred, y).item()  # Accumulate loss values during loop through dataset
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()    # Accumulate correct predictions during loop through dataset

    test_loss /= num_batches                    # Calculate average loss
    correct /= size                             # Calculate accuracy rate
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")   # Report test results

Train and Test the Network

# The main program!

epochs = 5
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train_nn(mnist_train_dl, model, loss_fn, optimizer)
    test_loop(mnist_test_dl, model, loss_fn)
print("Done!")

Really, we have written just a few dozen lines of code, comparable in size to a program a hobbyist programmer might write.

Yet we’ve built a working neural network that converts handwritten digits to numbers a computer can work with. That’s pretty amazing!

Of course, this is all possible thanks to the efforts of the many engineers who wrote the many more lines of code within PyTorch. Thank you to all of you who have contributed to PyTorch!

This is another example of achieving great things by standing on the shoulders of giants!

Saving and Reloading the Network

We have built, trained, and tested a neural network, and that’s great. But really, the point of training a neural network is to put it to use. To support that, we need to be able to save and reload the network for later use.

Use the following code to save the weights and biases of your neural network (note: the common convention is to save these files with extension .pt or .pth):

torch.save(network_name.state_dict(), 'filename.pth')

Since we named our network model, we would save it as follows:

torch.save(model.state_dict(), 'model_weights.pth')

To reload, first create an instance of your neural network (make sure you have access to the class/neural network you originally specified). In our example:

user_model = NeuralNet().to(device)

Then load the new instance with your saved weights and biases:

user_model.load_state_dict(torch.load('model_weights.pth'))
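The save and reload steps can be checked end to end with a small round trip. This is a sketch under stated assumptions: `make_net` is a hypothetical stand-in for your own network class, and the file goes into a temporary directory. The `map_location` argument of `torch.load` is not used in the article; it is shown because it allows weights saved on a GPU to be loaded on a CPU-only machine.

```python
import os
import tempfile

import torch
from torch import nn

# Small stand-in network (hypothetical; use your own class in practice)
def make_net():
    return nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2))

trained = make_net()
path = os.path.join(tempfile.mkdtemp(), "demo_weights.pth")
torch.save(trained.state_dict(), path)        # save weights and biases only

restored = make_net()                          # fresh instance, random weights
result = restored.load_state_dict(torch.load(path, map_location="cpu"))
print(result)                                  # reports whether all keys matched

x = torch.rand(5, 4)
same_output = torch.equal(trained(x), restored(x))
print(same_output)                             # True: both networks now agree
```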

Some modules behave differently in training than they do in use; dropout and batch normalization layers are common examples.

Specifically, in training mode some of them apply regularization methods that resist the onset of overfitting.

These methods may include some randomness and can cause the network to give inconsistent results. To avoid this, make sure you are in evaluation mode and not training mode:

user_model.eval()

Output:

NeuralNet(
  (flat_f): Flatten(start_dim=1, end_dim=-1)
  (neur_net): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=256, bias=True)
    (3): ReLU()
    (4): Linear(in_features=256, out_features=10, bias=True)
  )
)

As you can see, eval() returns the model itself, so in a notebook this command also conveniently reports the neural network structure.
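To see why evaluation mode matters, here is a minimal sketch using a hypothetical network that contains a dropout layer (the NeuralNet class in this article has none, so for it the two modes give the same outputs). The layer sizes are arbitrary assumptions.

```python
import torch
from torch import nn

torch.manual_seed(0)
# Hypothetical network with dropout, for illustration only
net = nn.Sequential(nn.Linear(8, 8), nn.Dropout(p=0.5), nn.Linear(8, 2))
x = torch.rand(1, 8)

net.train()                       # training mode: dropout zeroes random units
train_outputs_differ = not torch.equal(net(x), net(x))
print(train_outputs_differ)       # usually True, since each pass drops different units

net.eval()                        # evaluation mode: dropout is switched off
with torch.no_grad():
    eval_outputs_match = torch.equal(net(x), net(x))
print(net.training, eval_outputs_match)
```

In evaluation mode the same input gives the same output on every pass.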

Let’s make sure our reloaded network works.

It would be best to test with some new handwritten digits, but for the sake of convenience let’s just test it with the first ten test images (especially since the network was not trained very heavily).

Let’s look at these first ten images in the test dataset:

import matplotlib.pyplot as plt

fig, axs = plt.subplots(2, 5, figsize=(8, 5))
for a_row in range(2):
  for a_col in range(5):
    img_no = a_row*5 + a_col
    img = mnist_test_data[img_no][0].squeeze()
    img_tgt = mnist_test_data[img_no][1]
    axs[a_row][a_col].imshow(img, cmap='gray')
    axs[a_row][a_col].set_xticks([])
    axs[a_row][a_col].set_yticks([])
    axs[a_row][a_col].set_title(img_tgt, fontsize=20)
plt.show()

Now let’s see if the network detects these images properly:

def eval_image(model, imgno):
  testimg = mnist_test_data[imgno][0]       # select image number 'imgno' from the test set
  testimg = testimg.to(device)              # move image data to the same device as the model
  with torch.no_grad():                     # no gradients needed when just using the network
    logits = model(testimg)                 # run image through network
  return logits.argmax().item()             # index of the largest logit is the predicted digit

for img_no in range(10):
  img_val = eval_image(user_model, img_no)  # test the reloaded network
  print(img_val)

Output:

The results are not perfect, but for an incompletely trained network that’s not bad! The few failures are plausible given the incomplete training. Our network works with the saved and reloaded weights and biases!

Conclusion

We hope you have found this article educational, and we hope it inspires you to go and build your own working neural networks using PyTorch!

The post Using PyTorch to Build a Working Neural Network appeared first on Be on the Right Side of Change.

Tensors: The Vocabulary of Neural Networks

Aaron Glatzer — Fri, 26 Aug 2022 13:20:25 +0000

In this article, we will introduce one of the core elements describing the mathematics of neural networks: tensors.

Although typically, you won’t work directly with tensors (usually they operate under the hood), it is important to understand what’s going on behind the scenes. In addition, you may often wish to examine tensors so that you can look directly at the data, or look at the arrays of weights and biases, so it’s important to be able to work with tensors.

Note: This article assumes you are familiar with how neural networks work. To review those basics, see the article The Magic of Neural Networks: History and Concepts. It also assumes you have some familiarity with Python’s object-oriented programming.

Theoretically, we could use pure Python to implement neural networks.

We could use Python lists to represent data in the network;
We could use other lists representing weights and biases in the network; and
We could use nested for loops to perform the operations of multiplying the inputs by the connection weights.

There are a few issues with this, however: Python, especially the list data type, performs rather slowly. Also, the code would not be very readable with nested for loops.

Instead, the libraries that implement neural networks in software packages such as PyTorch use tensors, and they run much more quickly than pure Python. Also, as you will see, tensors allow much more readable descriptions of networks and their data.

Tensors

Tensors are essentially arrays of values. Since neural networks are essentially arrays of neurons, tensors are a natural fit for describing them. They can be used for describing the data, describing the network connection weights, and other things.

A one-dimensional tensor is known as a vector. Here is an example:

Vectors can also be written horizontally. Here’s the same vector written horizontally:

Switching a vector from vertical to horizontal, or vice versa, is called transposing, and is sometimes needed depending on the math specifics. We will not go into detail on this in this article (see here for more).

Vectors are typically used to represent data in the network. For example, each individual element in a vector can represent the input value for each individual input neuron in the network.

2D Tensor Matrix

A two-dimensional tensor is known as a matrix. Here’s an example:

For a fully connected network, where each neuron in one layer connects to every neuron in the next layer, a matrix is typically used to represent all the connection weights. If there are m neurons connected to n neurons you would need an n x m matrix to describe all the connection weights.
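PyTorch follows the same n x m convention for its fully connected layers. As a quick sketch, a layer connecting m = 2 input neurons to n = 3 output neurons stores its weights as a 3 x 2 matrix (the sizes are chosen here only for illustration):

```python
import torch
from torch import nn

# m = 2 neurons feeding n = 3 neurons
layer = nn.Linear(in_features=2, out_features=3)

print(layer.weight.shape)  # torch.Size([3, 2]): n rows, m columns
print(layer.bias.shape)    # torch.Size([3]): one bias per output neuron
```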

Here’s an example of two neurons connected to three neurons. Here is the network, with connection weights included:

And here is the connection weights matrix:

Why We Use Tensors

Before we finish introducing tensors, let’s use what we’ve seen so far to see why they’re so important to use when modeling neural networks.

Let’s introduce a two-element vector of data and run it through the network we just showed.

Info: Recall neurons add together their weighted inputs, then run the result through an activation function.

In this example, we are ignoring the activation function to keep things simple for the demonstration.

Here is our data vector:

Here’s a diagram depicting the operation:

Let’s calculate the operation (the neuron computations) by hand:

The final result is a 3 element vector:

If you have learned about matrices in grade school and remember doing matrix multiplication, you may note that what we just calculated is identical to matrix multiplication:

Note: Recall matrix multiplication involves multiplying first matrix rows by second matrix columns element-wise, then adding elements together.
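As a worked instance, assume the weight matrix and data vector are the ones used in the PyTorch example later in this article (W with rows [1, 4], [2, 5], [3, 6], and x with elements 7 and 8). Each output element is one row of W multiplied element-wise by x and summed:

```latex
\mathbf{y} = \mathbf{W}\mathbf{x}
= \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix}
  \begin{bmatrix} 7 \\ 8 \end{bmatrix}
= \begin{bmatrix} 1\cdot 7 + 4\cdot 8 \\ 2\cdot 7 + 5\cdot 8 \\ 3\cdot 7 + 6\cdot 8 \end{bmatrix}
= \begin{bmatrix} 39 \\ 54 \\ 69 \end{bmatrix}
```

Each row of the result is exactly the weighted sum one neuron computes.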

This is why tensors are so important for neural networks: tensor math precisely describes neural network operation.

As an added benefit, the equation above showing matrix multiplication is a much more succinct description than nested for loops would be.

If we introduce the nomenclature of bold lower case for a vector and bold upper case for a matrix, then the operation of vector data running through a neural network weight matrix is described by this very compact equation:

y = Wx

Here x is the input vector, W is the weight matrix, and y is the output vector.

We will see later that matrix multiplication within PyTorch is a similarly compact code equation.

Higher Dimensional Tensors

A three-dimensional (3D) tensor is known simply as a tensor. As you can see, the term tensor generically refers to any dimensional array of numbers. It’s just one-dimensional and two-dimensional tensors that have the unique names “vector” and “matrix” respectively.

You might not think that there is a need for three-dimensional and larger tensors, but that’s not quite true.

A grayscale image is clearly a two-dimensional tensor, in other words, a matrix. But a color image is actually three two-dimensional arrays, one each for red, green, and blue color channels. So a color image is essentially a three-dimensional tensor.

In addition, typically we process data in mini-batches. So if we’re processing a mini-batch of color images we have the three-dimensional aspect already noted, plus one more dimension of the list of images in the mini-batch. So a mini-batch of color images can be represented by a four-dimensional tensor.
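A short sketch makes the dimension counts concrete. The image sizes and batch size below are arbitrary assumptions, and the channels-first ordering is the layout PyTorch image tensors conventionally use:

```python
import torch

gray_image = torch.rand(28, 28)          # height x width
color_image = torch.rand(3, 32, 32)      # channels x height x width
color_batch = torch.rand(16, 3, 32, 32)  # batch x channels x height x width

print(gray_image.ndim, color_image.ndim, color_batch.ndim)  # 2 3 4
```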

Tensors in Neural Network Libraries

One Python library that is well suited to working with arrays is NumPy. In fact, NumPy is used by some users for implementing neural networks. One example is the scikit-learn machine learning library which works with NumPy.

However, the PyTorch implementation of tensors is more powerful than NumPy arrays. PyTorch tensors are designed with neural networks in mind. PyTorch tensors have these advantages:

PyTorch tensors include gradient calculations integrated into them.
PyTorch tensors also support GPU calculations, substantially speeding up neural network calculations.

However, if you are used to working with NumPy, you should feel fairly at home with PyTorch tensors. Though the commands to create PyTorch tensors are slightly different, they will feel fairly familiar. For the rest of this article, we will focus exclusively on PyTorch tensors.
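The following sketch illustrates the points above: converting between NumPy arrays and PyTorch tensors, the built-in gradient calculation, and moving a tensor to a GPU when one is available. The variable names and values are chosen only for illustration.

```python
import numpy as np
import torch

# NumPy array -> PyTorch tensor -> NumPy array
arr = np.array([[1.0, 2.0], [3.0, 4.0]])
t = torch.from_numpy(arr)
back = t.numpy()

# Gradient tracking: for loss = w**2, d(loss)/dw = 2*w = 6 at w = 3
w = torch.tensor(3.0, requires_grad=True)
loss = w ** 2
loss.backward()
print(w.grad)  # tensor(6.)

# Use the GPU if present, otherwise stay on the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
t_on_device = t.to(device)
print(t_on_device.device)
```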

Tensors in PyTorch: Creating Them, and Doing Math

OK, let’s finally do some coding!

First, make sure that you have PyTorch available, either by installing on your system or by accessing it through online Jupyter notebook servers.

Reference: See PyTorch’s website for instructions on how to install it on your own system.

See this Finxter article for a review of available online Jupyter notebook services:

Recommended Tutorial: Top 4 Jupyter Notebook Alternatives for Machine Learning

For this article, we will use the online Jupyter notebook service provided by Google called Colab. PyTorch is already installed in Colab; we simply have to import it as a module to use it:

import torch

There are a number of ways of creating tensors in PyTorch.

Typically you would be creating tensors by importing data from data sets available through PyTorch, or by converting your own data into tensors.

For now, since we simply want to demonstrate the use of tensors we will use basic commands to create very simple tensors.

You can create a tensor from a list:

t_list = torch.tensor([[1,2], [3,4]])
t_list

Output:

tensor([[1, 2],
        [3, 4]])

Note that when we evaluate the tensor variable, the output is labeled to indicate it as a tensor. This means that it is a PyTorch tensor object, that is, an object within PyTorch that behaves just like a mathematical tensor and also has various features provided by PyTorch (such as support for gradient calculations and GPU processing).

You can create tensors filled with zeros, filled with ones, or filled with random numbers:

t_zeros = torch.zeros(2,3)
t_zeros

Output:

tensor([[0., 0., 0.],
        [0., 0., 0.]])

t_ones = torch.ones(3,2)
t_ones

Output:

tensor([[1., 1.],
        [1., 1.],
        [1., 1.]])

t_rand = torch.rand(3,2,4)
t_rand

Output:

tensor([[[0.9661, 0.3915, 0.0263, 0.2753],
         [0.7866, 0.0503, 0.3963, 0.1334]],

        [[0.4085, 0.1816, 0.2827, 0.3428],
         [0.9923, 0.4543, 0.0872, 0.0771]],

        [[0.2451, 0.6048, 0.8686, 0.8148],
         [0.7930, 0.4150, 0.6125, 0.3401]]])

An important attribute to be familiar with to understand the shape of a tensor is the appropriately named shape attribute:

t_rand.shape
# Output: torch.Size([3, 2, 4])

This shows you that tensor “t_rand” is a three-dimensional tensor composed of three elements of two rows by four columns.

Note: The number of dimensions of a tensor is referred to as its rank. A one-dimensional tensor, or vector, is a rank-1 tensor; a two-dimensional tensor, or matrix, is a rank-2 tensor; a three-dimensional tensor is a rank-3 tensor, and so on.
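In PyTorch the rank can be read from a tensor's `ndim` attribute (equivalently, the `dim()` method). A brief sketch with illustrative tensors:

```python
import torch

scalar = torch.tensor(5)            # rank 0
vector = torch.tensor([1, 2, 3])    # rank 1
matrix = torch.zeros(2, 3)          # rank 2
cube = torch.rand(3, 2, 4)          # rank 3

ranks = [t.ndim for t in (scalar, vector, matrix, cube)]
print(ranks)  # [0, 1, 2, 3]
```

The rank is also the length of the tensor's shape.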

Let’s do some math with tensors – let’s add two tensors together:

Note the tensors are added together element-wise. Now here it is in PyTorch:

t_first = torch.tensor([[1,2], [3,4]])
t_second = torch.tensor([[5,6],[7,8]])
t_sum = t_first + t_second
t_sum

Output:

tensor([[ 6,  8],
        [10, 12]])

Let’s add a scalar, that is, an independent number (or a rank-0 tensor!) to a tensor:

t_add3 = t_first + 3
t_add3

Output:

tensor([[4, 5],
        [6, 7]])

Note that the scalar is added to each element of the tensor. The same applies when multiplying a scalar by a tensor:

t_times3 = t_first * 3
t_times3

Output:

tensor([[ 3,  6],
        [ 9, 12]])

The same kind of thing applies to raising a tensor to a power, that is the power operation is applied element-wise:

t_squared = t_first ** 2
t_squared

Output:

tensor([[ 1,  4],
        [ 9, 16]])

Recall that after summing weighted inputs, the neuron processes the result through an activation function. Note that the same behavior applies here as well: when a vector is processed through an activation function, the operation is applied to the vector element-wise.
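As a small sketch of this element-wise behavior, here are two common activation functions applied to a vector of summed inputs (the input values are made up for illustration):

```python
import torch

summed = torch.tensor([-2.0, 0.0, 3.0])   # weighted sums for three neurons

relu_out = torch.relu(summed)             # max(0, value) for each element
sigmoid_out = torch.sigmoid(summed)       # 1 / (1 + exp(-value)) for each element

print(relu_out)      # tensor([0., 0., 3.])
print(sigmoid_out)   # middle element is 0.5, since sigmoid(0) = 0.5
```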

Earlier, we pointed out that matrix multiplication is an important part of neural network calculations.

There are two ways to do this in PyTorch: you can use the matmul function:

t_matmul1 = torch.matmul(t_first, t_second)
t_matmul1

Output:

tensor([[19, 22],
        [43, 50]])

Or you can use the matrix multiplication symbol “@”:

t_matmul2 = t_first @ t_second
t_matmul2

Output:

tensor([[19, 22],
        [43, 50]])

Recall previously, we showed running an input signal through a neural network, where a vector of input signals was multiplied by a matrix of connection weights.

Here is that in PyTorch:

x = torch.tensor([[7],[8]])
x

Output:

tensor([[7],
        [8]])

W = torch.tensor([[1,4], [2,5], [3,6]])
W

Output:

tensor([[1, 4],
        [2, 5],
        [3, 6]])

y = W @ x
y

Output:

tensor([[39],
        [54],
        [69]])

Note how compact and readable that is instead of doing nested for loops.
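For comparison, here is a sketch of the same calculation written with nested for loops in pure Python, using plain lists in place of tensors:

```python
W = [[1, 4], [2, 5], [3, 6]]   # connection weights, one row per output neuron
x = [7, 8]                     # input values

y = []
for row in W:                          # one pass per output neuron
    total = 0
    for weight, value in zip(row, x):  # multiply each input by its weight
        total += weight * value
    y.append(total)

print(y)  # [39, 54, 69]
```

The result matches, but the single line y = W @ x says the same thing far more directly.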

Other math can be done with tensors as well, but we have covered most situations that are relevant to neural networks. If you find you need to do additional math with your tensors, check PyTorch documentation or do a web search.

Indexing and Slicing Tensors

Slicing allows you to examine subsets of your data and better understand how the dataset is constructed. You may find you will use this a lot.

Indexing Slicing PyTorch vs NumPy vs Python Lists

Indexing and slicing tensors works the same way it does with NumPy arrays. Note that the syntax is different from Python lists. With Python lists, a separate pair of brackets is used for each level of nested lists. Instead, with PyTorch one pair of brackets contains all dimensions, separated by commas.
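A brief sketch of the difference, using a small made-up nested list and the tensor built from it:

```python
import torch

nested = [[[1, 2, 3], [4, 5, 6]],
          [[7, 8, 9], [10, 11, 12]]]
t = torch.tensor(nested)

from_list = nested[1][0][2]       # Python list: one pair of brackets per level
from_tensor = t[1, 0, 2].item()   # tensor: one pair of brackets, commas between dimensions
print(from_list, from_tensor)     # 9 9

# Slicing across all dimensions at once is only possible with the tensor
third_column = t[:, :, 2]
print(third_column.tolist())      # [[3, 6], [9, 12]]
```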

Let’s find the item in tensor “t_rand” that is 2nd element, first row, third column. First here is “t_rand” again:

t_rand

Output:

tensor([[[0.9661, 0.3915, 0.0263, 0.2753],
         [0.7866, 0.0503, 0.3963, 0.1334]],

        [[0.4085, 0.1816, 0.2827, 0.3428],
         [0.9923, 0.4543, 0.0872, 0.0771]],

        [[0.2451, 0.6048, 0.8686, 0.8148],
         [0.7930, 0.4150, 0.6125, 0.3401]]])

And here is the item at the 2nd element, first row, and third column (don’t forget indexing starts at zero):

t_rand[1, 0, 2]
# Output: tensor(0.2827)

Let’s look at the slice second element, first row, second through third columns:

t_rand[1, 0, 1:3]
# tensor([0.1816, 0.2827])

Let’s look at the entire 3rd column:

t_rand[:, :, 2]

Output:

tensor([[0.0263, 0.3963],
        [0.2827, 0.0872],
        [0.8686, 0.6125]])

Important Slicing Tip: In the above, we use the standard Python convention that a blank before a “:” means “start from the beginning”, and a blank after a “:” means “go all the way to the end”. So a “:” alone means “include everything from beginning to end”.

A likely use for slicing would be to look at a full array (i.e. a matrix) within a set of arrays, i.e. one image out of a set of images.

Let’s pretend our “t_rand” tensor is a list of images. We may wish to sample just a few “images” to get an idea of what they are like.

Let’s examine the first “image” in our tensor (“list of images”):

t_rand[0]

Output:

tensor([[0.9661, 0.3915, 0.0263, 0.2753],
        [0.7866, 0.0503, 0.3963, 0.1334]])

And here is the last array (“image”) in tensor “t_rand”:

t_rand[-1]

Output:

tensor([[0.2451, 0.6048, 0.8686, 0.8148],
        [0.7930, 0.4150, 0.6125, 0.3401]])

Using small tensors to demonstrate indexing can be instructive, but let’s see it in action for real. Let’s examine some real datasets with real images.

Real Example

We won’t describe the following in detail, except to note that we are importing various libraries that allow us to download and work with a dataset. The last line creates a function that converts tensors into PIL images:

import torch
from torch.utils.data import Dataset
from torchvision import datasets
from torchvision.transforms import ToTensor
import matplotlib.pyplot as plt

import torchvision.transforms as T

conv_to_PIL = T.ToPILImage()

The following downloads the Caltech 101 dataset, which is a collection of over 8000 images in 101 categories:

caltech101_data = datasets.Caltech101(
    root="data",
    download=True,
    transform=ToTensor()
)

Extracting data/caltech101/101_ObjectCategories.tar.gz to data/caltech101
Extracting data/caltech101/Annotations.tar to data/caltech101

This has created a dataset object which is a container for the data. These objects can be indexed like lists:

len(caltech101_data)
# 8677

type(caltech101_data[0])
# tuple

len(caltech101_data[0])
# 2

The above code shows the dataset contains 8677 items. Looking at the first item of the set we can see they are tuples of 2 items each. Here are the kinds of items in the tuples:

type(caltech101_data[0][0])
# torch.Tensor

type(caltech101_data[0][1])
# int

The two items in the tuple are the image as a tensor, and an integer code corresponding to the image’s category.

Colab has a convenient function display() which will display images. First, we use the conversion function we created earlier to convert our tensors to a PIL image, then we display the images.

img = conv_to_PIL(caltech101_data[0][0])
display(img)

We can use indexing to sample and display a few other images from the set:

img = conv_to_PIL(caltech101_data[1234][0])
display(img)

img = conv_to_PIL(caltech101_data[4321][0])
display(img)

Summary

We have learned a number of things:

What tensors are
Why tensors are key mathematical objects for describing and implementing neural networks
Creating tensors in PyTorch
Doing math with tensors in PyTorch
Doing indexing and slicing of tensors in PyTorch, especially to examine images in datasets

We hope you have found this article informative. We wish you happy coding!

The next article in the series is the following:

Recommended Tutorial: Using PyTorch to Build a Working Neural Network

Programmer Humor

It’s hard to train deep learning algorithms when most of the positive feedback they get is sarcastic. — from xkcd

The post Tensors: The Vocabulary of Neural Networks appeared first on Be on the Right Side of Change.

How Neural Networks Learn

Aaron Glatzer — Fri, 12 Aug 2022 06:52:25 +0000

Artificial neural networks have become a powerful tool providing many benefits in our modern world. They are used to filter out spam, to perform voice recognition, and are even being developed to drive cars, among many other things.

As remarkable as these tools are, they are readily within the grasp of almost anyone. If you have technical interest and have some experience with computer programming you can build your own neural networks.

But before you learn the hands-on details of building neural networks you should learn some of the fundamentals of how they work. This article will cover one of those fundamentals – how neural networks learn.

Note: This article includes some algebra and calculus. If you’re not comfortable with algebra, you should still be able to understand the content from the graphs and descriptions. The calculus is not done in any detail. Again you should still be able to follow along from the descriptions. You will not learn the details of how the calculations are done. Instead, you will gain an intuitive understanding of what is going on.

Before learning this, you should be familiar with the basics of how neural networks are structured and how they operate. The article “The Magic of Neural Networks: History and Concepts” covers these basics. Still, we offer the following brief refresher.

Basic Fundamentals: How Neural Networks Work

Figure 1 shows an artificial neuron.

Figure 1: artificial neuron

Signals from other neurons come in through multiple inputs, each multiplied by its corresponding weight. (Weights express the connection strengths between the neuron and each of its upstream neurons.)

A bias is input as well. (Bias expresses a neuron’s inherent activation, independent of its input from other neurons.) All these inputs add together, and the resulting total signal is then processed through the activation function. (A sigmoid function is shown here.)

Figure 2: neural network classifying an image (Dog photo by Garfield Besa)

Figure 2 shows a network of these neurons. Signals are introduced on the input side, and they progress through the network, passing through neurons and along their connections, getting processed by the calculations described above. How the signals are processed depends on the weights and biases among all the neurons.

The key takeaway is that it is the settings of the weights and biases that establish how the network as a whole computes. In other words, the learning and memory of the network are encoded by the weights and biases.

So how does one program these weights and biases?

They are set by training the network with samples and letting it learn by example. The details of how that is done are the subject of this article.

Overview of How Neural Networks Learn

As mentioned, a neural network’s learning and memory are encoded by the connection weights and biases of the neurons throughout the network.

These weights and biases are set by training the network on examples by following this six-step training procedure:

Provide a sample to the network.
Since the network is untrained, it will probably get the wrong answer.
Compute how far this answer is from the correct answer. This error is known as loss.
Calculate what changes in the weights and biases will make the loss smaller.
Make adjustments to those weights and biases as determined by those calculations.
Repeat this again and again with numerous samples until the network learns to answer the samples correctly.
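The six steps above can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not a real neural network: the "network" is a single weight w with output y = w * x, the samples are made up so that the correct answer is always 2 * x, and the learning rate is an arbitrary choice.

```python
# Hypothetical training set: (input, correct answer) pairs where answer = 2 * input
samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

w = 0.0                # the single, untrained weight
learning_rate = 0.05   # how big each adjustment is

for epoch in range(100):                   # step 6: repeat again and again
    for x, target in samples:              # step 1: provide a sample
        y = w * x                          # step 2: the network's (initially wrong) answer
        loss = (y - target) ** 2           # step 3: how far off the answer is
        grad = 2 * (y - target) * x        # step 4: how the loss changes as w changes
        w -= learning_rate * grad          # step 5: adjust w to make the loss smaller

print(round(w, 3))  # 2.0
```

After training, the weight has settled at the value that answers every sample correctly.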

Presenting Samples and Calculating Loss

Let’s review some of this in more detail while considering a use case.

Imagine we want to train a network to estimate crowd size.

To do this we must first train the network with a large set of images of crowds. For each image the number of people is counted. We then include labels indicating correct crowd size for each picture. This is known as a training set.

The pictures are submitted to the network, which then indicates its crowd estimate for each picture. Since the network is not trained, it surely gets the estimate wrong for each image.

For each image/label pair, the network calculates the loss for that sample.

Multiple possible choices can be used for calculating loss. One can choose any calculation that appropriately expresses how far the network’s answer is from the correct answer.

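An appropriate choice of loss for crowd-size estimation is the squared error:

L = (y - y_correct)^2

where:

L is the loss
y is the crowd size estimated by the network
y_correct is the correct crowd size given by the label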

Suppose we submit an image showing a crowd size of 500 people. Figure 3 shows how the error varies for crowd estimates around the true crowd size of 500 people.

Figure 3

If the network guesses 350 people the loss is 22500. If the network guesses 600 people the loss is 10000.

Clearly, the loss is minimized when the network guesses the correct crowd size of 500 people.

But recall we said it is the weights and biases in the network that encode its learning and memory, so it is the weights and biases that determine if the network gets the right answer. So we need to adjust the weights and biases so that the network gets closer to the correct answer for this image.

In other words, we need to change the weights and biases to minimize the loss. To do that, we need to figure out how the loss varies when we vary the weights and biases.

Minimizing Loss: Calculus and the Derivative

So how do we calculate how loss changes when we vary weights and biases?

This is where calculus comes in.

(Don’t worry if you don’t know calculus, we’ll show you everything you need to know, and we’ll keep it intuitive.)

Calculus is all about determining how one variable is affected by changes in another variable.

(Strictly speaking there’s more to calculus than that, but this idea is one of the core ideas of calculus.)

The loss L depends on network output y, but y depends on input, and on weights w and biases b. So there is a somewhat long and complicated chain of dependencies we have to go through to figure out how L varies when w and b vary.

However, for the sake of learning, let’s instead start by just examining how L varies when y varies, since this is simpler and will help develop an intuition for calculus.

How L depends on y is somewhat easy – we saw the equation for it earlier, and we saw the graph of that equation in Figure 3. We can tell by looking at the graph that if the network guesses 350 then we need to increase the output y in order to reduce the loss, and that if the network guesses 600 then we need to decrease the output y in order to reduce the loss.

But with neural networks, we never have the luxury of being able to examine the graph of the loss to figure it out.

We can, however, use calculus to get our answer. To do this, we do what is called taking the derivative.

Here is the derivative of the equation for the graph in Figure 3 (we will not explain how this is calculated; that is the domain of a calculus course):

dL/dy = 2(y - 500)

This is typically referred to as “taking the derivative of L with respect to y”. You can read that dL/dy as saying “this is how L changes when y changes”. Now let’s calculate how L changes when y changes at the point y = 350:

dL/dy = 2(350 - 500) = -300

So at y = 350, for every bit y increases, L decreases by 300. That implies that when we increase y the loss will decrease.

Now let’s calculate how L changes when y changes at the point y = 600:

dL/dy = 2(600 - 500) = 200

So at y = 600, for every bit y increases, L increases by 200. Since we want to decrease L, that means we need to decrease y.

These calculations match what we concluded from looking at the graph.

You can also read dL/dy as saying “this is the slope of the graph”.

This makes sense: at point y = 350 the slope of the graph is -300 (sloping down steeply), while at point y = 600 the slope of the graph is 200 (sloping up, not quite so steeply).

So by using calculus and taking the derivative, we can figure out which way to change y to reduce the loss L, even when we can’t see the graph to figure it out.
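The losses and slopes quoted above are consistent with a squared-error loss around the true crowd size of 500. The sketch below recomputes them; the function names are hypothetical, and the numerical slope is included only as a cross-check of the derivative formula.

```python
def squared_error(y, y_correct):
    """Loss: squared distance between the network's guess and the correct answer."""
    return (y - y_correct) ** 2

def d_loss_d_y(y, y_correct):
    """Derivative of the squared error with respect to the guess y."""
    return 2 * (y - y_correct)

def numerical_slope(y, y_correct, h=1e-3):
    """Estimate the slope by nudging y slightly in both directions."""
    return (squared_error(y + h, y_correct) - squared_error(y - h, y_correct)) / (2 * h)

for guess in (350, 600):
    print(guess, squared_error(guess, 500), d_loss_d_y(guess, 500))
# 350 22500 -300
# 600 10000 200
```

A negative slope means increasing the guess reduces the loss; a positive slope means decreasing it does.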

Recall, however, that we want to figure out how to change the weights and biases to reduce the loss L. Also recall there is a chain of dependencies, of L depending on y, which itself depends on w and b (for several layers worth of w and b!), and on input.

So a full description could result in some rather complicated equations and some difficult derivatives. For those curious about the math details, the method for figuring out derivatives when there are such dependencies is called the chain rule.

Fortunately, with modern neural network software, the computer takes care of calculating derivatives and keeping track of and resolving the chains of dependencies in the derivatives. Just understand that, even if we can’t see its graph:

there is some relationship between the loss L and the weights w and biases b (a “graph”)
there is some set of weights and biases where the loss L is at a minimum for a given input
we can use calculus to figure out how to adjust the weights and biases to minimize loss

The Loss Surface and Gradient Descent

Let’s consider a very simple case where there are just two weights, w1 and w2, and no biases. The graph of L as a function of w1 and w2 might look like Figure 4.

Figure 4: bowl-shaped error graph

In this example, with two independent weights, we end up with a bowl-shaped surface for the loss graph. In this case, the loss is minimized when w1 = 4 and w2 = 3. In the beginning, when the network is not yet trained, the weights (initially set to small random numbers) are almost certainly not at the correct values for the loss to be at a minimum.

We still figure out which direction to change the weights to reduce the loss by taking the derivative.

Only this time, since there are two independent variables, we take the derivative with respect to each independently.

Important: The result is, for any given point on the loss surface, a direction (a vector, or an arrow) pointing the way in which the loss increases the fastest (“uphill”). This is known as the gradient (instead of derivative). Since we want to reduce loss, we move in the opposite direction, the negative of the gradient.

The larger point is we are still using calculus to figure out which direction to change weights to reduce loss. Repeatedly doing this moves the weights closer to the values which make the network give the correct answer for a given input. This is known as gradient descent.
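
Gradient descent on a two-weight bowl can be sketched in a few lines of Python. The loss formula below is an assumption: a simple bowl whose minimum sits at w1 = 4 and w2 = 3, matching the example in the text. The starting weights, step size, and number of steps are arbitrary illustrative choices.

```python
def loss(w1, w2):
    # Assumed bowl-shaped loss surface with its minimum at w1 = 4, w2 = 3.
    return (w1 - 4) ** 2 + (w2 - 3) ** 2


def gradient(w1, w2):
    # Derivative of the loss with respect to each weight, taken independently.
    return 2 * (w1 - 4), 2 * (w2 - 3)


w1, w2 = 0.5, -0.5   # arbitrary starting weights, far from the minimum
step_size = 0.1      # how far to move against the gradient on each step

for step in range(100):
    g1, g2 = gradient(w1, w2)
    # Move in the direction opposite to the gradient ("downhill").
    w1 -= step_size * g1
    w2 -= step_size * g2

print("w1 =", round(w1, 4), "w2 =", round(w2, 4), "loss =", loss(w1, w2))
```

Each pass through the loop moves the weights a little closer to the bottom of the bowl.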

However, most neural networks have many more than two weights, often hundreds or thousands for any given layer.

But the same ideas still apply: if we have a layer consisting of 16 weighted connections, the loss is a 16-dimensional surface! You can’t visualize it but it still exists mathematically, and the same principles apply!

You can still calculate the gradient, that is the derivative with respect to all 16 w’s, and figure out which direction to change the w’s to minimize the loss.

So how much do we adjust the weights and biases?

Typically they are adjusted just a small amount. This is because large adjustments can cause problems.

Refer to the loss surface shown in Figure 4. If too large a step is made, you could jump right across the loss surface bowl, even going so far as to make the loss worse!

The adjustment step size is known as the learning rate. Figuring out the best learning rate is one of the tricks to optimizing your network that a neural network engineer has to work out.
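
The effect of the learning rate can be seen with a one-weight version of the bowl. The loss formula, starting weight, and the two learning rates below are assumptions picked for illustration; they are not values recommended for real networks.

```python
def loss(w):
    # Assumed one-dimensional bowl with its minimum at w = 4.
    return (w - 4) ** 2


def run_gradient_descent(learning_rate, steps=50, start=0.0):
    w = start
    for _ in range(steps):
        grad = 2 * (w - 4)          # derivative of the loss with respect to w
        w -= learning_rate * grad   # step against the gradient
    return loss(w)


starting_loss = loss(0.0)
small_step_loss = run_gradient_descent(learning_rate=0.1)
large_step_loss = run_gradient_descent(learning_rate=1.1)

print("starting loss:       ", starting_loss)
print("learning rate of 0.1:", small_step_loss)
print("learning rate of 1.1:", large_step_loss)
```

With the smaller rate the loss shrinks toward zero. With the larger rate every step jumps across the bowl and lands farther up the opposite side, so the loss grows instead of shrinking.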

Backpropagation

Ultimately all of the weights and biases throughout the network have to be adjusted to minimize loss. This is done by starting from the loss and working back layer by layer to the beginning of the network, a process called backpropagation.

It has to be done this way because you can’t figure out how the first layer’s weights and biases affect loss until you know how the second layer’s weights and biases affect loss; you can’t tell how the second layer’s weights and biases affect loss until you know how the third layer’s weights and biases affect loss, and so on.

So calculations and adjustments are done starting with the last layer, then working back to the second to the last layer, and so on back to the first layer.

So that’s the core algorithm of training a neural network:

Present example image.
Calculate the loss.
Adjust the network weights and biases through backpropagation and gradient descent, making adjustments layer by layer.
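
The backward, layer-by-layer calculation can be sketched on a deliberately tiny network: one input, one neuron in each of two layers, and a squared-error loss. Everything here is a simplifying assumption made for illustration (the network shape, the absence of an activation function, the parameter values, and the learning rate). Real libraries perform these calculations automatically.

```python
def forward(params, x):
    w1, b1, w2, b2 = params
    h = w1 * x + b1   # output of the first layer
    y = w2 * h + b2   # output of the second (last) layer
    return h, y


def loss_fn(params, x, target):
    _, y = forward(params, x)
    return (y - target) ** 2


def backward(params, x, target):
    w1, b1, w2, b2 = params
    h, y = forward(params, x)
    # Start at the loss and work backward using the chain rule.
    dL_dy = 2 * (y - target)   # how the loss changes with the network output
    dL_dw2 = dL_dy * h         # last layer's weight
    dL_db2 = dL_dy             # last layer's bias
    dL_dh = dL_dy * w2         # pass the signal back to the first layer
    dL_dw1 = dL_dh * x         # first layer's weight (needs dL_dh first)
    dL_db1 = dL_dh             # first layer's bias
    return [dL_dw1, dL_db1, dL_dw2, dL_db2]


params = [0.5, 0.1, -0.3, 0.2]   # w1, b1, w2, b2 (arbitrary values)
x, target = 2.0, 1.0
learning_rate = 0.01

loss_before = loss_fn(params, x, target)
grads = backward(params, x, target)
# One gradient descent step: nudge every parameter against its gradient.
new_params = [p - learning_rate * g for p, g in zip(params, grads)]
loss_after = loss_fn(new_params, x, target)

print("loss before:", loss_before)
print("loss after: ", loss_after)
```

Note that the first layer's gradients cannot be computed until `dL_dh` is known, and that value comes from the last layer. This is why the calculation runs from back to front.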

Batch Size

However, recall that the objective of the training is to adjust the weights and biases for all of the images, not just one.

So how does one train the network, one image at a time, or using the entire set of all training images? Either choice is a possibility.

Ultimately the loss we want to minimize is the loss for the entire set of training samples, so a natural choice might be to run all samples through the network before making adjustments to the weights and biases. This is known as batch processing.

However, performing so many calculations before making adjustments can be very demanding on computer resources and can slow the training process down.

How about adjusting weights and biases for each individual training sample? Optimum weights and biases will be different for each training sample, and this variation can introduce large randomness into the gradient descent. This is known as stochastic gradient descent.

To better understand the importance of this, refer to the hypothetical loss curve in Figure 5:

Figure 5: local and global minimum

Notice that there is more than one minimum: there is a local minimum at point B, which is not quite the lowest loss, and a global minimum at point A that is truly the minimum where the loss is lowest.

It is truly possible (even likely) to get loss curves like this, with multiple local minima, and it’s also possible for the network to get stuck in one of these local minima.

The randomness of single sample training can help knock the network out of a local minimum if it gets stuck in one, so there is some benefit to stochastic gradient descent.

However, the randomness can be so extreme that it can actually knock the network out of the true global minimum if it happens to reach it before a training cycle ends. This can slow the training as the network has to work back down to minimize the loss again.

So in practice, it turns out the best approach is to use minibatches. These are batches of perhaps a few hundred samples that are run through the network, and then adjustments are made.

The network runs through minibatch after minibatch until the entire set of training samples has been processed. This has enough randomness to it that it has the same benefit as stochastic gradient descent of pushing the network out of local minima, but not so much randomness that the loss is likely to get much worse.

Running through the entire set of training samples once is called an epoch.

Typically networks must run through many epochs to become fully trained. Also, the ordering and grouping of training samples within and between batches is randomized from epoch to epoch. This is to avoid overfitting.

Overfitting is when the network performs successfully on the training samples, but fails on samples it has not seen before. This is like a person memorizing a set of samples, rather than generalizing characteristics from those samples so that they can be successful on new samples.

After training, the network is tested on a test set. This is a set of samples the network has not seen before. This allows one to assess how well the trained network performs. It checks to see how effective the network is on unknown samples, and checks to make sure overfitting has not occurred.

How Neural Networks Learn

So that is the full process of how neural networks learn:

Train the network by presenting it minibatches of samples from the training set.
The training algorithm calculates the loss for the minibatch.
The algorithm calculates the gradient of the loss.
The network adjusts weights and biases according to the gradient calculations, through the process of backpropagation and gradient descent.
Running this sequence through all training samples is called an epoch.
This is then repeated for multiple epochs, until the network is successfully trained on the training set.
Finally the network is tested on a test set to make sure it works successfully and does not suffer from overfitting.
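
The overall loop (minibatches, epochs, shuffling, and a final check on a test set) can be sketched in plain Python. To keep it short, a model with a single weight and a single bias stands in for a full network, and the data is synthetic. The data-generating rule, batch size, learning rate, and epoch count are all assumptions chosen for illustration.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

# Synthetic samples (hypothetical): y is roughly 2 * x + 1, plus a little noise.
samples = []
for _ in range(200):
    x = rng.uniform(-1.0, 1.0)
    y = 2.0 * x + 1.0 + rng.gauss(0.0, 0.05)
    samples.append((x, y))

train_set, test_set = samples[:160], samples[160:]


def mean_squared_error(w, b, data):
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


w, b = 0.0, 0.0
learning_rate = 0.1
batch_size = 16
num_epochs = 50

for epoch in range(num_epochs):
    rng.shuffle(train_set)  # new ordering and grouping every epoch
    for start in range(0, len(train_set), batch_size):
        batch = train_set[start:start + batch_size]
        # Gradient of the minibatch loss with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / len(batch)
        grad_b = sum(2 * (w * x + b - y) for x, y in batch) / len(batch)
        # Adjust after each minibatch, not after each sample or each epoch.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b

train_loss = mean_squared_error(w, b, train_set)
test_loss = mean_squared_error(w, b, test_set)  # samples never used in training
print("w =", round(w, 3), "b =", round(b, 3))
print("train loss =", round(train_loss, 5), "test loss =", round(test_loss, 5))
```

A test loss close to the training loss suggests the model has generalized rather than memorized its training samples.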

We hope you have found this lesson on how neural networks learn informative.

We wish you happy coding!


The Magic of Neural Networks: History and Concepts

Aaron Glatzer — Wed, 27 Jul 2022 12:22:07 +0000

Artificial neural networks have become a powerful tool providing many benefits in our modern world. They are used to filter out spam, perform voice recognition, play games, and drive cars, among many other things.

As remarkable as these tools are, they are readily within the grasp of almost anyone. If you have technical interest and have some experience with computer programming, you can build your own neural networks. Knowledge of some basic algebra and some programming experience is all you need to get started.

And don’t be afraid to read through this article. Don’t worry if you don’t know the algebra – we have tried to make the text understandable by anyone.

What You’ll Learn: In this article, we will go over the fundamentals of how neural networks are built and how they work. When done, you won’t yet know how to build them yourself, but you’ll understand the fundamentals of how they work, which will help you when you get to building your own.

But first we will briefly review a little about real neurons and how this has inspired the development of artificial neural networks.

You can find part 2 of this series in the following tutorial on the Finxter blog:

Part 2: How Do Neural Networks Learn?

A Little History and Inspiration

Throughout the history of artificial neural networks, their development has been influenced by research and understanding of how real neurons operate. Let’s briefly review a simplified understanding of real neurons to provide some inspiration for how artificial neurons might be designed.

Figure 1: A biological neuron with synapses. Designed by brgfx / Freepik, labeled by author

Figure 1 shows a schematic drawing of a real neuron.

The neuron consists of a collection of dendrites, the soma, which is the cell body, and an axon.

Signals come in through the dendrites. The signals are added together within the soma. If the collection of signals is strong enough the neuron will be triggered to send a spike signal down the axon, thereby sending a signal on to other neurons.

Figure 2 shows real neurons connected together in a network.

Figure 2: Neuroscience vector created by rawpixel.com – www.freepik.com

What kind of signals might these neurons convey?

Even though neurons send a spike signal, other research has shown that with more stimulation of the neuron, the spike signals occur more frequently.

This may suggest it’s actually the frequency of spiking (an analog value) rather than the spike (a digital value) that may be the important signal that neurons convey.

What kind of signal might the neuron finally output?

One can imagine that

with very faint stimulation, a neuron may not output much;
with modest stimulation, a neuron will output more, perhaps in a linear fashion; and
with much more stimulation, the neuron may saturate and not be able to output any more.

This could result in a sigmoid-shaped output, as in Figure 3.

Figure 3: Low, mid, and high stimulation output

How might these neurons and networks encode and learn their information?

In 1949 Donald O. Hebb proposed a model for how neuron function might contribute to learning in his book “The Organization of Behavior”. He proposed that neural connections are strengthened through use and that this may be the foundation of learning within the brain.

This is sometimes described as “neurons that fire together wire together”, and this is known as Hebbian learning.

The implication here is that through learning and use, some neural connections become stronger than others and that it is the pattern of connection strengths that encodes learning and memory.

Understand that further research has shown real neurons to be more complicated than the simple description here.

However, this description does reflect some properties of real neurons, and it turns out even these relatively simple models can exhibit some remarkable behavior.

Artificial Neurons and Networks

Now that we’ve reviewed some simple properties of real neurons and neural networks, let’s use this simplified understanding of real neurons as inspiration for our design of artificial neurons and networks.

Figure 4 shows our artificial neuron.

Figure 4: Artificial neuron

Like with dendrites in real neurons, signals come in from other neurons through multiple inputs.

Artificial Neural Network Weights

The strengths of those connections are expressed by weights (w1, w2, etc.) shown on each input. Incoming signals are multiplied by the weights so that larger weights result in stronger signals from that connection, reflecting that stronger connection. All of those signals are added up in the node of the neuron.

Artificial Neural Network Bias

For each neuron, there is also one other signal, not connected to any other neuron, which is added in. This is called the bias. This constant signal determines if that neuron is already enhanced or suppressed on its own, in addition to signals provided by the inputs.

Artificial Neural Network Activation Function

Finally, that total input is passed through a function known as the activation function. This function determines how the neuron responds to the activation by its inputs.

There are multiple different functions that are used as activation functions. We have already justified choosing a sigmoid-shaped function, and sigmoid-shaped functions are a common choice.

Though there are multiple other activation functions considered for use, sigmoid-shaped functions are the easiest to motivate from a biological standpoint.

So to reiterate, here is how we describe the signal processing done by each neuron:

Multiply each input by its weight and add them all up.
Add the bias.
Process the total through the activation function.

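And here is how we describe it mathematically:

$z = w_1 x_1 + w_2 x_2 + \dots + w_n x_n + b$ (add up all the weighted inputs, plus bias)

$\text{output} = f(z)$ (process through the activation function)

where: $x_1, \dots, x_n$ are the inputs, $w_1, \dots, w_n$ are the weights on those inputs, $b$ is the bias, $z$ is the total input, and $f$ is the activation function.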

There we have it – this is our artificial neuron. It’s really quite a simple object: add together weighted inputs, add the bias, and pass that through an activation function for the final output.
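
The three steps above translate almost directly into code. This is a minimal sketch, assuming the logistic function as the sigmoid-shaped activation (one common choice among several); the input values, weights, and bias are made up for illustration.

```python
import math


def sigmoid(z):
    # Logistic function: a smooth, sigmoid-shaped curve between 0 and 1.
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(inputs, weights, bias):
    # Multiply each input by its weight, add them all up, then add the bias.
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    # Process the total through the activation function.
    return sigmoid(total)


output = neuron_output(inputs=[1.0, 0.5, -1.0], weights=[0.4, -0.2, 0.1], bias=0.1)
print("neuron output:", round(output, 4))
```

Larger weights make the corresponding input count for more, and the bias shifts the neuron toward being more or less active regardless of its inputs.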

This simple object was first introduced by Warren McCulloch and Walter Pitts in their 1943 (!) paper “A logical calculus of the ideas immanent in nervous activity”, except their activation function was a step function instead of the smooth sigmoid-shaped function we discussed before.

Researcher Frank Rosenblatt called this object the perceptron in his 1962 book “Principles of Neurodynamics”.

Figure 5: Multiple layers of neurons (deep neural network) classifying an image. Dog photo by Garfield Besa

Figure 5 shows a network of these simple elements integrated together in a multi-layer artificial neural network. This multi-layer network is sometimes called a multi-layer perceptron (abbreviated MLP).

Signals come in through the input side, say for example a picture of a cat or a dog. The signals then pass through the network, getting processed by neural calculations along the way. Then on the output side the network provides an output indicating whether the image was a cat or a dog.

How Neural Networks Are Programmed

Historically, much of the field of artificial intelligence used more traditional computer programming methods, writing explicit code in programming languages to model intelligence.

These efforts did achieve some compelling results, such as computers that could play competitive checkers or chess.

However, very simple things that even a young child can do eluded computers, things like being able to recognize what object is in a picture. In fact, this seemingly simple task is actually quite difficult to program. Let’s briefly explore this.

Suppose we want to program the computer to be able to recognize handwritten numerical digits.

In fact, this is a common exercise for beginning neural network students; building a network to solve this problem is considered the neural network version of the “Hello world!” program, and there’s a database called the MNIST handwritten digit database for doing just this very problem.

Figure 6 shows a sample of some of the digits from this database.

Figure 6: Hand-writing recognition data set.

Let’s think about how one might program a computer to do this. Just looking at the numbers you recognize them instantly, but how might you write a program to do that?

Look at the various numbers “seven” for example.

Perhaps one could describe it as one horizontal line at the top, and one slanted vertical line below. Would you specify the coordinates where the lines should be? Probably not – what if the number was written off to the side or in a corner of the image? Could there also be a limit to how long or how short these lines are?

As you can see, the number of rules for identifying a number “seven” grows quickly and the rules get complicated.

But what about a more sophisticated problem? What about identifying whether a picture is of a cat or a dog?

Not only is there an enormous variety of cats and dogs to distinguish, just figuring out how to describe them so a computer could recognize them is a daunting challenge. Where would one even start?

Instead of writing code to solve this, neural networks are “programmed” in a fashion more like how humans learn – the network is trained with a set of examples, and the network learns from these examples.

More specifically, the network’s learning is encoded in the weights and biases of the network, and computer scientists have figured out algorithms that allow the network to automatically self-adjust its weights and biases.

The process is called backpropagation. This entails the network adjusting weights and biases to get closer to the correct answer, working from the output back to the input.

Therefore the programmer does not have to figure out how to encode the solution; the network itself figures it out.

Once the network is trained on a set of examples, new cases can be introduced to the network, and the network provides correct answers.

(Well, ideally, that is. That is part of the skill of being a neural network engineer: figuring out how best to build and train networks to get the desired performance.)

So what kind of programming does a neural network engineer do?

They write code that describes the structure of the network, such as how many layers, and how many neurons per layer.
They decide which activation function to choose.
They also write other code that specifies how to measure errors, known as loss, that the network makes.
They also make choices about what training data to use and how to adjust network learning.
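
To make those choices concrete, here is a small plain-Python sketch in which the engineer's decisions appear as ordinary values: the layer sizes, the activation function, and the loss function. All names and numbers are hypothetical, the weights are random and untrained, and a real project would normally use a library rather than hand-written loops.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def build_network(layer_sizes, rng):
    # One (weights, biases) pair per layer; each row of weights feeds one neuron.
    layers = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def forward(layers, inputs, activation):
    signal = inputs
    for weights, biases in layers:
        # Every neuron: weighted sum of the incoming signal, plus bias, then activation.
        signal = [
            activation(sum(w * x for w, x in zip(row, signal)) + b)
            for row, b in zip(weights, biases)
        ]
    return signal


def squared_error_loss(outputs, targets):
    # One simple way to measure how far the outputs are from the desired answer.
    return sum((o - t) ** 2 for o, t in zip(outputs, targets))


layer_sizes = [4, 3, 2]   # structure: 4 inputs, 3 hidden neurons, 2 outputs
network = build_network(layer_sizes, random.Random(0))
outputs = forward(network, [0.2, 0.7, 0.1, 0.9], activation=sigmoid)
loss = squared_error_loss(outputs, targets=[1.0, 0.0])

print("outputs:", [round(o, 3) for o in outputs])
print("loss:", round(loss, 3))
```

Training would then consist of adjusting the weights and biases to drive that loss down.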

So even though neural networks learn through example, the neural network engineer has much to do to make that happen.

The “Magic” of Neural Networks

It is astounding that a network of very simple objects could achieve the seeming human-like ability to learn by example and recognize pictures of objects.

It’s not obvious up front that a collection of such simple objects could achieve this, and there are so many other things they can do: they can locate objects within an image, they can detect words within a conversation, and they can help steer cars, among other things.

It is amazing such simple objects could achieve such sophisticated performance. It really truly does seem almost magical.

We hope you have found this article helpful in gaining a basic understanding of how neural networks work.

Even more, we hope this article has fired your imagination and inspired you to learn more about neural networks, even to the point of building them yourself! Go out there and learn how to build some networks!

We wish you happy coding!


Top 4 Jupyter Notebook Alternatives for Machine Learning

Aaron Glatzer — Mon, 25 Apr 2022 12:58:16 +0000

In this article, we review some of the online options for running Python using online (Jupyter) Notebooks.

The Python Landscape

There are a number of platforms available for running Python. Some of these include:

Install Python on your own machine.
Use Jupyter notebooks on your own machine.
Use a data science platform like Anaconda on your own machine to set up the above.
Use one of the numerous online Python shells or interpreters.
Use one of the numerous Jupyter-Notebook-like online services.

It’s this last option we will review in this article. This is a popular choice in the data science and machine learning fields.

Quick Overview of Online Options

Installing Python on your own machine is perhaps the best approach when writing software. But if you want access to Python online for use anywhere there are a number of available options.

There are a number of sites where you can use an online Python shell, such as www.python.org/shell for example.

There are also script-based implementations of Python online, such as https://www.online-python.com/.

But these free offerings are often limited in how much code you can run and how many resources you can use. They are great for learning Python but can be too limited to use for more ambitious needs.

If you want to run some more demanding processes online in data science or machine learning fields an online Jupyter Notebook service is an effective alternative.

Before we review some of those, let’s review the classic Jupyter Notebook.

A Quick Review of Jupyter Notebooks

When installing and using Python on your own machine you either issue commands in the shell which are executed immediately; or more commonly you write commands in a program file, and then call the interpreter to execute the commands in that file, as a script.

Jupyter Notebooks implement a sort of hybrid version of these two approaches. Jupyter Notebooks are active documents that help an analyst both analyze data and communicate that analysis effectively.

Here are their features and what they do:

Jupyter Notebooks are displayed in a web browser, an interface widely familiar and accessible to all.
They resemble math and science textbooks, where equations and graphs are mixed within explanatory text which describes the subject matter in question.
Most significantly the “equation” portions of Jupyter Notebooks consist of code that can be executed, so that the reader can actually run the code to duplicate the analysis. When the code is run the results (numbers or graphs) are displayed below the code.
In this way they resemble lab notebooks, with descriptive text mixed in among the executable code where the data analysis and experimenting are done.

Jupyter Notebooks are created and edited within a web browser.

When creating a notebook the creator enters content in fields called “cells”. These are simply fields that allow the two kinds of entry, either markdown text or code.

The code cells can be run by hand one at a time, potentially out of order if desired (sort of like the Python shell); or the entire document can be run, cells in order, in a typical script-like manner.
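
Under the hood, a notebook file (with the .ipynb extension) is stored as JSON, with one entry per cell. The sketch below builds a minimal two-cell notebook by hand using only the standard library. The field names follow version 4 of the notebook format as best understood here, so treat the exact layout as an assumption and check the format documentation before relying on it; the cell contents are made up.

```python
import json

# A minimal notebook: one markdown cell and one code cell (illustrative content).
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# My analysis\n", "Some explanatory text."],
        },
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": ["print(1 + 1)"],
        },
    ],
}

# Serialize to the text that would be saved in an .ipynb file, then read it back.
notebook_text = json.dumps(notebook, indent=1)
loaded = json.loads(notebook_text)

cell_types = [cell["cell_type"] for cell in loaded["cells"]]
print(cell_types)
```

Because the file is plain JSON, a notebook created in one service can generally be downloaded and opened in another.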

The online services we will review implement the same kind of Jupyter Notebook interface, but provide the service online.

Classic Jupyter Notebook on a home PC (i.e. not online), with one markdown cell, one code cell with results below it, and one empty cell below that.

Advantages to Online Jupyter Notebooks

There are a number of reasons one might choose to use an online Jupyter Notebook service:

You can run Python anywhere you have a computer and an online connection.
These platforms typically provide all the libraries (pandas, NumPy, scikit-learn, etc.) which are needed for data analysis and machine learning. Typically most other Python libraries are available as well.
Typically they provide systems with high-performing GPUs so that your data processing is fast and efficient. These often implement world-class computing capabilities. This is often essential for machine learning models to be effective and efficient. It is the server that provides the computing power, your own computer just needs to be able to display the webpage.
They take care of managing the computer system, so you don’t have to. You can be sure you have the computing resources and packages you need, and that they’ll work out of the box. You can focus on using the tools, rather than working on making sure you have a system up to the task. This can be one of the most beneficial aspects: with no effort you can have access to world-class computing resources.

Now that we understand Jupyter Notebooks, and we have seen the reasons one might choose to use an online platform, let’s review some of them to see what they offer.

Google Colab

Try it here: https://colab.research.google.com/

Google Colaboratory, or Colab for short, is Google’s implementation of online Jupyter Notebooks.

Features

Jupyter-like web interface.
Customizable keystrokes.
Google Colab documents are Jupyter Notebook files, so they can be downloaded and viewed in Classic Jupyter Notebook.
These files can be saved in Google Drive and GitHub. If in Google Drive they can be shared with others there.
Data science packages like pandas, etc. are supported with the import command.
Machine learning packages like scikit-learn, etc. are supported with the import command.
Several tutorial notebooks available for training in data science and machine learning.
Free use of GPU and TPU.
Unable to support Voila. (Voila combined with ipywidgets hides code cells so that notebooks can look like a normal GUI application.)

Tiers

Colab	Colab Pro	Colab Pro+
free	$9.99/month	$49.99/month
	Faster GPUs and TPUs	Priority access to faster GPUs and TPUs
	More memory	Significantly more memory
	Longer runtimes	Even longer runtimes
		Background execution after the browser is closed

The details here are admittedly vague. Google says they are not able to report specifics because they fluctuate, and that they need to maintain that flexibility to maintain their ability to provide free service.

See more details on their FAQ page https://research.google.com/colaboratory/faq.html#resource-limits.

Paperspace Gradient

Learn more: https://gradient.run/

Paperspace is a GPU accelerated cloud computing service. Their Gradient product is dedicated to machine learning.

Features

Jupyter-like web interface.
Can switch to full Jupyter Notebook mode within the browser.
Many available datasets to work with.
Notebooks publicly visible; private access with paid account.
Website storage of notebooks. However notebooks can also be downloaded to be run in Classic Jupyter Notebook on a PC.
Data science packages like pandas, etc. are supported with the import command.
Machine learning packages like scikit-learn, etc. are supported with the import command.
Multiple templates are available pre-configured with notebooks for Jupyter Notebook or various ML platforms.
Three “entry points”: (1) Notebooks; (2) Workflows, which help automate tasks in creating production-grade systems; (3) Deployments, which assist preparing for production.
Free use of GPUs.
Able to support Voila because of full Jupyter Notebook support when in the Classic Jupyter Notebook mode.

Tiers

Free	Pro	Growth
free	$8/month	$39/month
Public projects	Private projects	Private projects
5GB storage	15GB storage	50GB storage
Basic instances	Mid-range instances	High-end instances
	Faster free GPUs	Expert support

Kaggle


Learn more: https://www.kaggle.com/

Kaggle is arguably an online community or meeting space for data scientists and machine learning people.

As well as providing online notebooks, it includes a newsfeed, datasets, competitions, forums, and free data and machine learning courses, all accessible from a well-organized and intuitive dashboard.

Beyond the notebooks, you might want to join this site just because of all the resources it provides.

Features

Both Jupyter-like web interface and script-like (“normal” program files) interfaces available.
Notebooks can be downloaded, then opened in Jupyter Notebook elsewhere.
Many available datasets to work with.
Data science packages like pandas, etc. are supported with the import command.
Machine learning packages like scikit-learn, etc. are supported with the import command.
Multiple free courses on data science and machine learning.
Free use of GPU and TPU.
Voila probably not supported.

Tiers

All Kaggle functions are free to use.

JetBrains Datalore

Learn More: https://datalore.jetbrains.com/

JetBrains is the company that provides the PyCharm Python IDE. Datalore is their online implementation of Jupyter Notebooks.

Features

Both Jupyter-like web interface and script-like (“normal” program files) interfaces available. Other modes/features are available as well (see their website for details).
Notebooks can be downloaded, then opened in Jupyter Notebook elsewhere.
Data science packages like pandas, etc. are supported with the import command.
Machine learning packages like scikit-learn, etc. are supported with the import command.
Well-written and easy-to-use help documentation.
Free CPU use; GPU use with paid tier.
Voila is available as a package.

Tiers

Community	Professional
Free	$19.90/month
120 hours of computations on a basic CPU machine	Unlimited computations on a basic CPU machine
	120 hours of computations on a powerful CPU machine
	20 hours of computation on a GPU machine
10 GB of cloud storage + S3 bucket support	20 GB of cloud storage + S3 bucket support
Keep machine running for 6 hours after you’ve left notebook	Keep machine running for unlimited time

Conclusion

Online Jupyter Notebooks can be a valuable resource for Python computing anywhere, and ensure you have access to world-class resources for your computing.

To give you an idea of what is available we have reviewed a small sample of some of those resources.

However, this is just the tip of the iceberg of what is available. See this article for a much larger list of other available sites:

https://www.topbestalternatives.com/google-colab/

And this review is also only the tip of the iceberg of what these sites offer.

If this is something that interests you, definitely go to their sites to see what they offer; and since most have free options, try them out to see which you like best, and which best meets your Python, data science, or machine learning needs.

Also note this is a snapshot of offerings as of April 2022. This can be a fast-changing field, so examining the offerings yourself is highly encouraged to see what the latest changes are.

We wish you happy coding!


The Ultimate Guide to Installing Ghostscript

Aaron Glatzer — Sat, 02 Apr 2022 12:45:59 +0000

In this article we explore how to install Ghostscript on numerous different platforms and operating systems.

What is Ghostscript? Why install it?

What is Ghostscript, and why would we want to install it? To understand this we should first learn about Postscript.

Postscript

Postscript is a page description language geared towards desktop publishing documents.

If you want really professional-looking typesetting, layout, and graphics in your documents, desktop publishing software is what you use.

Postscript was first created at Adobe Systems starting in 1982. As a language, it is similar to Python in that documents contain human-readable and writable commands in the language that can be parsed by an interpreter to get something done.

In the case of Python, text files containing Python commands can be parsed by the Python interpreter to create any kind of program imaginable.

In the case of Postscript, files containing Postscript commands can be parsed by a Postscript interpreter to render professional-looking documents, either to the screen or to a printer.

In addition, the PDF format, which is based on the Postscript language and adds more functionality, is now one of the most commonly used document formats.

Ghostscript

Ghostscript is a free open-source interpreter to render Postscript and PDF documents.

One of the reasons you might want to install it is to use a program that requires it.

Even without a program that needs it, installing Ghostscript can be useful:

Ghostscript can be used to modify PDF documents, such as converting PDF to images, or extracting text, among other things.

Even better, since Ghostscript provides a language-binding API, Ghostscript functions can be implemented in other languages, allowing us to write our own programs for modifying PDF documents. Supported languages are C#, Java, and Python.
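
As a rough illustration of converting a PDF to images, here is a minimal sketch that drives the Ghostscript command-line program from Python with subprocess. Note that this is not the language-binding API mentioned above; it simply runs the gs executable. The function names, the output file pattern, and the default resolution are our own assumptions for illustration, and the executable is assumed to be called gs (on Windows it would typically be gswin64c).

```python
import shutil
import subprocess


def build_png_command(pdf_path, out_pattern="page-%03d.png", dpi=150, gs_exe="gs"):
    """Build a Ghostscript command that renders each PDF page to a PNG file.

    The helper name and defaults are hypothetical; the flags are standard
    Ghostscript command-line options.
    """
    return [
        gs_exe,
        "-dNOPAUSE",                    # do not pause between pages
        "-dBATCH",                      # exit when processing is finished
        "-dQUIET",                      # suppress routine messages
        "-sDEVICE=png16m",              # 24-bit color PNG output device
        f"-r{dpi}",                     # rendering resolution in dots per inch
        f"-sOutputFile={out_pattern}",  # %03d is replaced by the page number
        pdf_path,
    ]


def pdf_to_png(pdf_path, **kwargs):
    """Run Ghostscript on pdf_path; raises if the executable is not found."""
    cmd = build_png_command(pdf_path, **kwargs)
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"Ghostscript executable '{cmd[0]}' not found")
    subprocess.run(cmd, check=True)
```

Calling pdf_to_png('some_document.pdf') would then write one PNG file per page into the current directory, assuming Ghostscript is installed and on your PATH.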

Checking if Ghostscript is Already Installed

You may already have Ghostscript installed – your system may have come with it, or it may have been installed in support of a program you have installed. So save yourself some effort and check first.

Checking for Ghostscript on Windows

Press Windows+R to open the “Run” box.
In the “Run” box type “cmd”.
A command line window opens.
In the command line window type “GSWIN64 -h” if your system is 64 bit (most machines these days), or “GSWIN32 -h” if your system is 32 bit (older machines). If Ghostscript is installed you will see Ghostscript help information. If you see an error then Ghostscript is either not installed or not on your system PATH.
Type “exit” to close the command line window.

Checking for Ghostscript on Mac

In the Finder, open the /Applications/Utilities folder, then double-click Terminal.
In the terminal window type “gs -h”. If Ghostscript is installed you will see Ghostscript help information. If you see an error then Ghostscript is not installed.
In the Terminal app on your Mac, choose Terminal > Quit Terminal.

Checking for Ghostscript on Linux

Open a terminal window. How to do this varies depending on which distribution of Linux you are using.
In the terminal window type “gs -h”. If Ghostscript is installed you will see Ghostscript help information. If you see an error then Ghostscript is not installed.
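
If you would rather check from Python, the same test can be sketched with the standard library. This is only an illustration: the function name and the list of candidate executable names are our own choices (gs on Mac and Linux, and the console executables gswin64c and gswin32c on Windows). It only finds Ghostscript if the executable is on your PATH.

```python
import shutil

# Candidate executable names: Unix-style first, then the Windows console builds.
CANDIDATE_NAMES = ("gs", "gswin64c", "gswin32c")


def find_ghostscript(names=CANDIDATE_NAMES):
    """Return the full path of the first Ghostscript executable found on PATH, or None."""
    for name in names:
        path = shutil.which(name)
        if path:
            return path
    return None


location = find_ghostscript()
if location:
    print("Ghostscript found at:", location)
else:
    print("Ghostscript was not found on the PATH.")
```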

Installing Ghostscript on Windows

Go to the Ghostscript download page at https://www.ghostscript.com/releases/gsdnld.html
There are two license versions available: Affero GPL (AGPL), and commercial. Review the license information at https://artifex.com/licensing/. For casual use most users will choose AGPL.
Choose 64 bit or 32 bit depending on your system.
Download your choice by clicking on the chosen link.
The installer program will download.
The downloaded program will be gsxxxxw64.exe or gsxxxxw32.exe. The ‘xxxx’ will be numbers indicating the release version. The most current version as of this writing is 9.55.0, so the installer program would be gs9550w64.exe for the 64 bit version.
Double-click the downloaded installer program.
Follow the prompts to do the installation.

Installing Ghostscript on Unix

Use this for any UNIX-based machine, so this should work for Mac or Linux.

Most UNIX systems have much easier ways of installing Ghostscript, so you will almost certainly not need to do this.

However, if you have trouble with those easier approaches you might try this as a backup.

This method usually works, but sometimes it does not, and then you need to do some troubleshooting to figure out why (the configure file might not be configured properly for your system, for example).

Also note that you will need to make sure that compiling software for Linux or Mac is installed on your system, which is beyond the scope of this article. So choose this approach as a last resort.

Go to the Ghostscript download page and download the source code version. As of this writing this file is ghostscript-9.55.0.tar.gz
Move this file to some folder where you want to work.
Unarchive the downloaded file. Usually your system will be configured to do so by double-clicking the file. If not, you can unarchive using this command in the terminal: tar -xzf ghostscript-9.55.0.tar.gz. The file will unpack into sub-directories and files.
In the terminal go to the top unpacked sub-directory.
Run the configure file by typing ./configure in your terminal. This will review your system and get ready to compile the code.
Compile the code by typing make in your terminal.
Install the compiled code by typing this: sudo make install

Here are the commands for ease of copy and paste:

tar -xzf ghostscript-9.55.0.tar.gz
cd ghostscript-9.55.0
./configure
make
sudo make install

Installing Ghostscript on Mac

The easiest way to install Ghostscript on Mac is to use the Homebrew or MacPorts systems. These are package management systems that make the wide world of Unix open-source software available to the Mac.

In these systems, much of the configuring is done for you by others so that downloading and installing software is as easy as a single command, just like downloading an app for the Mac is as simple as clicking an icon in the Mac App Store.

What programs are available depends on what has been prepared by others for the system.

Fortunately, Ghostscript is available for these systems.

Setting up these systems is beyond the scope of this article. This page has a nice summary of those systems (and of the Fink system, another package management system). Follow their respective links to learn more about each system.

Install Ghostscript using Homebrew using the following command:

brew install ghostscript

Install Ghostscript using MacPorts using the following command:

sudo port install ghostscript

Installing Ghostscript on Ubuntu

It is often most intuitive to install software on Ubuntu using the GUI-based software application.

This accesses the repositories of extensive software available for Ubuntu.

However, it is often the fastest to do a command line install. Do so for Ghostscript as follows:

sudo apt install ghostscript

Installing Ghostscript on Other Debian-based Distributions

There are many distributions that, like Ubuntu, are based on Debian.

Many also have GUI applications for installing software, and often these can be used to install Ghostscript. But like Ubuntu, it is often the fastest to use the command line install.

The command is still the same:

sudo apt install ghostscript

Installing Ghostscript on CentOS 7, and Other Red Hat/Fedora-based Distributions

CentOS 7 is a free version of the Red Hat Linux distribution, without Red Hat branding or technical support from Red Hat.

Fedora is the “bleeding-edge” freely available distribution in the Red Hat family of distributions that serves as the development foundation for the more robust and stable Red Hat distribution.

Since these are all in the same distribution family, Ghostscript is most quickly installed on all of them by the same command. The many other distributions in this family can also use the same command.

The command is:

sudo yum install ghostscript

Installing Ghostscript for Anaconda

If you are a data scientist more comfortable with data analysis in Anaconda than you are comfortable with OS management, you can still make sure you have ghostscript through Anaconda.

Open the Anaconda command line interface and enter the following command to install Ghostscript:

conda install -c conda-forge ghostscript

Installing Ghostscript in Google Colab

Ghostscript can even be installed in Google Colab.

Cells in Colab are in-effect like the Python shell. Therefore users can use the exclamation mark to submit OS shell commands, then enter the command to install Ghostscript.

The OS behind Colab operates like Ubuntu, so the installation command mirrors that of Ubuntu. Therefore, to install Ghostscript enter the following command in a Colab cell:

!apt-get install -y ghostscript

Conclusion

Ghostscript is a free open-source interpreter that renders Postscript and PDF documents either to the screen or to a printer.

Ghostscript can also be used to process or modify such documents.

Even better, because Ghostscript includes a language-binding API, programmers can use it to write programs in other languages to modify PDF documents.

Supported languages are C#, Java, and Python.

As you can see, Ghostscript is available on many different platforms and operating systems. We have exhibited commands to install Ghostscript on many of these various platforms.

We hope you have found this helpful, and we wish you happy coding!


How to Compress PDF Files Using Python?

Aaron Glatzer — Tue, 22 Mar 2022 20:07:28 +0000

Problem Formulation

Suppose you have a PDF file, but it’s too large and you’d like to compress it (perhaps you want to reduce its size to allow for faster transfer over the internet, or perhaps to save storage space).

Even more challenging, suppose you have multiple PDF files you’d like to compress.

Multiple online options exist, but these typically allow only a limited number of files to be processed at a time. There is also the extra time involved in uploading the originals and then downloading the results. And you may not be comfortable sharing your files over the internet.

Fortunately, we can use Python to address all these concerns. But before we learn how to do this, let’s first learn a little bit about PDF files.

About Compressing PDF Files

According to Dov Isaacs, former Adobe Principal Scientist (see his discussion here) PDF documents are already substantially compressed.

The text and vector graphics portions of the documents are already internally zip-compressed, so there is little opportunity for improvement there.

Instead, any file compression improvements are achieved through compression of image portions of PDF documents, along with potential loss of image quality.

So compression might be achievable, but the user must choose between how much compression versus how much image quality loss is acceptable.

Setup

A programmer going by the handle Theeko74 has written a Python script called “pdf_compressor.py”. This script is a wrapper for ghostscript functions that do the actual work of compressing PDF files.

This script is offered under the MIT license and is free to use as the user wishes.

Hint: make sure you have ghostscript installed on your computer. To install ghostscript, follow this detailed guide and come back afterward.

Now download pdf_compressor.py from GitHub here.

URL: https://github.com/theeko74/pdfc/blob/master/pdf_compressor.py

Ultimately we will be writing a Python script to perform the compression.

So we create a directory to hold the script, and use our preferred editor or IDE to create it (this example uses Linux command line to make the directory, and uses vim as the editor to make script “bpdfc.py”; use your preferred choice for creating the directory and creating the script within it):

$ mkdir batchPDFcomp
$ cd batchPDFcomp
$ vim bpdfc.py

We won’t write out the script just yet – we’ll show some details for the script a little later in this article.

When we do write the script, within it we’ll import “pdf_compressor.py” as a module.

To prepare for this we should create a subdirectory below our Python script directory.

Also, we’ll need to copy pdf_compressor.py into that subdirectory, and we’ll need to create a file __init__.py within the same subdirectory (those are double underscores each side of ‘init’):

$ mkdir pdfc
$ cp ~/Downloads/pdf_compressor.py ~/batchPDFcomp/pdfc/
$ cd pdfc
$ vim __init__.py

What we have done here is created a local package pdfc containing a module pdf_compressor.py.

Note: The presence of file __init__.py indicates to Python that that directory is part of a package, and to look there for modules.

Now we are ready to write our script.

The PDF Compression Python Script

Here is our script:

from pdfc.pdf_compressor import compress
compress('Finxter_WorldsMostDensePythonCheatSheet.pdf', 'Finxter_WorldsMostDensePythonCheatSheet_compr.pdf', power=4)

As you can see it’s a very short script.

First we import the “compress” function from “pdf_compressor” module.

Then we call the “compress” function. The function takes as arguments: the input file path, the output file path, and a ‘power’ argument that sets compression as follows, from least compression to most (according to the documentation in the script):

Compression levels:

0: default
1: prepress
2: printer
3: ebook
4: screen
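
To make these levels more concrete: the level names match the presets of Ghostscript's -dPDFSETTINGS option. The following is a minimal sketch of how a wrapper might turn a power value into a Ghostscript command. It is our own illustration under that assumption, not the actual source of pdf_compressor.py, and the function name and the gs executable name are hypothetical choices.

```python
# Mapping from the 'power' argument to Ghostscript -dPDFSETTINGS presets.
PDF_SETTINGS = {
    0: "/default",
    1: "/prepress",
    2: "/printer",
    3: "/ebook",
    4: "/screen",
}


def build_compress_command(input_path, output_path, power=0, gs_exe="gs"):
    """Build (but do not run) a Ghostscript command that rewrites a PDF."""
    if power not in PDF_SETTINGS:
        raise ValueError(f"power must be one of {sorted(PDF_SETTINGS)}")
    return [
        gs_exe,
        "-sDEVICE=pdfwrite",                    # write the output as a PDF
        f"-dPDFSETTINGS={PDF_SETTINGS[power]}", # quality/size preset
        "-dNOPAUSE",
        "-dQUIET",
        "-dBATCH",
        f"-sOutputFile={output_path}",
        input_path,
    ]


print(build_compress_command("in.pdf", "out.pdf", power=4))
```

The resulting list could be passed to subprocess.run() to perform the compression, assuming Ghostscript is installed.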

Running the Script

Now we can run our script:

$  python bpdfc.py
Compress PDF...
Compression by 51%.
Final file size is 0.2MB
Done.
$

We have only compressed one PDF document in this example, but by modifying the script to loop through multiple PDF documents one can compress multiple files at once.

However, we leave that as an exercise for the reader!
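
For readers who want a starting point, here is one possible sketch of such a loop. The helper name compress_folder, the "_compr" suffix, and the default power are our own assumptions. The compression function is passed in as an argument so the loop does not depend on any particular module; in our setup it would be the compress function imported from pdfc.pdf_compressor.

```python
from pathlib import Path


def compress_folder(folder, compress_fn, power=3, suffix="_compr"):
    """Call compress_fn on every PDF in folder; return the output paths.

    compress_fn is expected to accept (input_path, output_path, power=...),
    like the compress function used in the script above.
    """
    folder = Path(folder)
    outputs = []
    for pdf in sorted(folder.glob("*.pdf")):
        if pdf.stem.endswith(suffix):
            continue  # skip files that look like earlier outputs
        out = pdf.with_name(f"{pdf.stem}{suffix}.pdf")
        compress_fn(str(pdf), str(out), power=power)
        outputs.append(out)
    return outputs
```

In bpdfc.py this might be used as compress_folder('.', compress, power=4) after the line "from pdfc.pdf_compressor import compress".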

We hope you have found this article useful. Thank you for reading, and we wish you happy coding!

Recommended Tutorial: How to Compress Images in Python


Mutable vs. Immutable Objects in Python

Aaron Glatzer — Tue, 22 Feb 2022 10:38:49 +0000

Overview:

Mutable objects are Python objects that can be changed.
Immutable objects are Python objects that cannot be changed.
The difference reflects how various types of objects are actually represented in computer memory.
Be aware of these differences to avoid surprising bugs in your programs.

Introduction

To be proficient a Python programmer must master a number of skills. Among those is an understanding of the notion of mutable vs immutable objects. This is an important subject, as without attention to it programmers can create unexpected and subtle bugs in their programs.

As described above, at its most basic mutable objects can be changed, and immutable objects cannot be changed. This is a simple description, but for a proper understanding, we need a little context. Let’s explore this in the context of the Python data types.

Mutable vs. Immutable Data Types

The first place a programmer is likely to encounter mutable vs. immutable objects is with the Python data types.

Here are the most common data types programmers initially encounter, and whether they are mutable or immutable (this is not a complete list; Python does have a few other data types):

Data type	Mutable or Immutable?
`int`	immutable
`float`	immutable
`str`	immutable
`list`	mutable
`tuple`	immutable
`dict`	mutable
`bool`	immutable

Let’s experiment with a few of these in the Python shell and observe their mutability/immutability.

First let’s experiment with the list, which should be mutable. We’ll start by creating a list:

>>> our_list1 = ['spam', 'eggs']

Now let’s try changing the list by assigning to one of its elements by index:

>>> our_list1[0] = 'toast'

Now let’s view our list and see if it has changed:

>>> our_list1
['toast', 'eggs']

Indeed, it has.

Now let’s experiment with integers, which should be immutable. We’ll start by assigning an integer to our variable:

>>> our_int1 = 3
>>> our_int1
3

Now let’s try changing it:

>>> our_int1 = 42
>>> our_int1
42

It changed. If you’ve already worked with Python this should not surprise you.

So in what sense is an integer immutable? What’s going on here? What do the Python language designers mean when they claim integers are immutable?

It turns out the two cases are actually different.

In the case of the list, the variable still contains the original list but the list was modified.
In the case of the integer, the original integer was completely removed and replaced with a new integer.

While this may seem intuitive in this example, it’s not always quite so clear as we’ll see later.

Many of us start out understanding variables as containers for data. The reality, where data is stored in memory, is a little more complicated.

The Python id() function will help us understand that.

Looking Under the Hood: the id() Function

The common understanding of variables as containers for data is not quite right. In reality variables contain references to where the data is stored, rather than the actual data itself.

Every object or data in Python has an identifier integer value, and the id() function will show us that identifier (id).

In fact, in the standard CPython implementation, that id is the (virtualized) memory location where that data is stored.

Let’s try our previous examples and use the id() function to see what is happening in memory.

Note: be aware that if you try this yourself your memory locations will be different.

>>> our_list1 = ['spam', 'eggs']
>>> id(our_list1)
139946630082696

So there’s a list at memory location 139946630082696.

Now let’s change the list by assigning to one of its elements by index:

>>> our_list1[0] = 'toast'
>>> our_list1
['toast', 'eggs']
>>> id(our_list1)
139946630082696

The memory location referenced by our_list1 is still 139946630082696. The same list is still there, it’s just been modified.

Now let’s repeat our integer experiment, again using the id() function to see what is happening in memory:

>>> our_int1 = 3
>>> our_int1
3
>>> id(our_int1)
9079072

So integer 3 is stored at memory location 9079072. Now let’s try to change it:

>>> our_int1 = 42
>>> our_int1
42
>>> id(our_int1)
9080320

So our_int1 has not overwritten the integer 3 at memory location 9079072 with the integer 42.

Instead it is referencing an entirely new memory location.

The contents of memory location 9079072 were not changed; our_int1 simply stopped referring to that location and now refers to location 9080320. The original object, the integer 3, still remains at location 9079072.

Depending on the specific type of object, if it is no longer used it will eventually be removed from memory entirely by Python’s garbage collection process. We won’t go into that level of detail in this article – thankfully Python takes care of this for us and we don’t need to worry about it.

We’ve learned lists can be modified. So here’s a little puzzle for you. Let’s try modifying our list variable in a different way:

>>> our_list1 = ['spam', 'eggs']
>>> id(our_list1)
139946630082696
>>> our_list1  = ['toast', 'eggs']
>>> our_list1
['toast', 'eggs']
>>> id(our_list1)

What do you think the id will be? Let’s see the answer:

>>> id(our_list1)
139946629319240

Whoa, a new id!

Python has not modified the original list, it has replaced it with a brand new one.

So lists can be modified, if something like assigning elements is done, but if instead a list is assigned to the variable, the old list is replaced with a new one.

Remember: What happens to a list, whether being modified or replaced, depends on what you do with it.

However if ever you are unsure what is happening, you can always use the id() function to figure it out.
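
The two cases can be put side by side in a short script. This is a small sketch with variable names of our own choosing; the actual id values will differ from run to run, so the script compares ids rather than printing expected numbers.

```python
menu = ['spam', 'eggs']
original_id = id(menu)

# Case 1: assigning to an element modifies the existing list in place.
menu[0] = 'toast'
same_after_item_assignment = (id(menu) == original_id)

# Case 2: assigning a new list to the variable rebinds the name.
old_menu = menu               # keep a reference to the original list
menu = ['toast', 'eggs']      # a brand new list object
same_after_rebinding = (id(menu) == id(old_menu))

print('Same object after element assignment?', same_after_item_assignment)
print('Same object after assigning a new list?', same_after_rebinding)
```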

Mutable vs. Immutable Objects

So we’ve explored mutability in Python for data types.

However, this notion applies to more than just data types – it applies to all objects in Python.

And as you may have heard, EVERYTHING in Python is an object!

The topic of objects, classes, and object-oriented programming is vast, and beyond the scope of this article. You can start with an introduction to Python object-orientation in this blog tutorial:

Introduction to Python Classes

Some objects are mutable, and some are immutable. One notable case is programmer-created classes and objects — these are in general mutable.

Modifying a “Copy” of a Mutable Object

What happens if we want to copy one variable to another so that we can modify the copy:

normal_wear = ['hat', 'coat']
rain_wear = normal_wear

Our rainy weather wear is the same as our normal wear, but we want to modify our rainy wear to add an umbrella. Before we do, let’s use id() to examine this more closely:

>>> id(normal_wear)
139946629319112
>>> id(rain_wear)
139946629319112

So the copy appears to actually be the same object as the original. Let’s try modifying the copy:

>>> rain_wear.append('umbrella')
>>> rain_wear
['hat', 'coat', 'umbrella']
>>> normal_wear
['hat', 'coat', 'umbrella']

So what we learned from id() is true, our “copy” is actually the same object as the original, and modifying the “copy” modifies the original. So watch out for this!

Python does provide a solution for this through the copy module. We won’t examine that here, but just be aware of this issue, and know that a solution is available.
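
For the curious, here is a brief sketch of what that solution can look like. The wardrobe dictionary is a made-up example. copy.copy() makes a shallow copy (a new outer object that still shares any nested objects), while copy.deepcopy() also copies the nested objects.

```python
import copy

normal_wear = ['hat', 'coat']
rain_wear = copy.copy(normal_wear)   # a new list with the same elements
rain_wear.append('umbrella')         # only the copy changes

print(normal_wear)   # the original list is untouched
print(rain_wear)

# With nested mutable objects, a shallow copy still shares the inner objects.
wardrobe = {'rain': ['hat', 'coat']}
shallow = copy.copy(wardrobe)
deep = copy.deepcopy(wardrobe)
wardrobe['rain'].append('umbrella')

print(shallow['rain'])   # shares the inner list, so it sees the change
print(deep['rain'])      # fully independent, so it does not
```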

Note: immutable objects behave almost the same. When an immutable value is copied to a second variable, both actually refer to the same object. The difference for the immutable case is that when the second variable is modified it gets a brand new object instead of modifying the original.
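
A short sketch of this note, using tuples and variable names of our own choosing:

```python
first = ('hat', 'coat')
second = first
shared_before = second is first      # both names refer to the same tuple

# "Modifying" the second variable builds a new tuple and rebinds the name.
second += ('umbrella',)
shared_after = second is first

print(shared_before, shared_after)
print(first)    # the original tuple is unchanged
print(second)
```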

Bug Risk, and Power: Mutable Objects in Functions

If you’re not careful, the problem we saw in the last section, modifying a “copy” of a variable, can happen when writing a function.

Suppose we had written a function to perform the change from the last section.

Let’s write a short program dressForRain.py which includes such a function:

def prepForRain(outdoor_wear):
    outdoor_wear.append('umbrella')
    rain_outdoor_wear = outdoor_wear
    return rain_outdoor_wear

normal_wear = ['hat', 'coat']
print('Here is our normal wear:', normal_wear)
rain_wear = prepForRain(normal_wear)
print('Here is our rain wear:', rain_wear)
print('What happened to our normal wear?:', normal_wear)

We know that the data is passed into the function, and the new processed value is returned to the main program.

We also know that the variable created within the function, the parameter outdoor_wear, is destroyed when the function is finished.

Ideally this isolates the internal operation of the function from the main program.

Let’s see the actual results from the program (A Linux implementation is shown. A Windows implementation will be the same, but with a different prompt):

$ python dressForRain.py
Here is our normal wear: ['hat', 'coat']
Here is our rain wear: ['hat', 'coat', 'umbrella']
What happened to our normal wear?: ['hat', 'coat', 'umbrella']

Since variables normal_wear and outdoor_wear both point to the same mutable object, normal_wear is modified when outdoor_wear is appended, which you might not have intended, resulting in a potential bug in your program.

Had these variables been pointing to an immutable object such as a tuple this would not have happened. Note, however, tuples do not support append, and a concatenation operation would have to be done instead.
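
As a sketch of how the surprise could be avoided, the function can build a new object instead of appending to the one it was given. The function names below are our own; the first works on lists and the second shows the concatenation a tuple would need.

```python
def prep_for_rain(outdoor_wear):
    """Return a new list; the caller's list is left untouched."""
    return outdoor_wear + ['umbrella']


def prep_for_rain_tuple(outdoor_wear):
    """Tuples have no append, so concatenate to build a new tuple."""
    return outdoor_wear + ('umbrella',)


normal_wear = ['hat', 'coat']
rain_wear = prep_for_rain(normal_wear)
print('Normal wear:', normal_wear)
print('Rain wear:', rain_wear)

normal_tuple = ('hat', 'coat')
rain_tuple = prep_for_rain_tuple(normal_tuple)
print('Normal tuple:', normal_tuple)
print('Rain tuple:', rain_tuple)
```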

Though we have shown some risk in using lists in a function, there is also power.

Functions can be used to modify lists directly, and since the original list is modified directly, no return statement would be needed to return a value back to the main program.
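
A minimal sketch of that in-place style, with a hypothetical function name:

```python
def add_umbrella(outdoor_wear):
    """Modify the caller's list in place; nothing is returned."""
    outdoor_wear.append('umbrella')


wear = ['hat', 'coat']
result = add_umbrella(wear)

print(wear)     # the original list now includes the umbrella
print(result)   # None, since the function has no return statement
```

When writing functions like this, it helps to make the in-place behavior obvious in the function name or docstring so that callers are not surprised.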

Tuple Mutable(?) ‘Gotcha’

Here is one last, perhaps surprising, behavior to note. We’ve mentioned that tuples are immutable.

Let’s explore this a little further with the following tuple:

>>> some_tuple = ('yadda', [1, 2])

Let’s try modifying this by adding 3 to the list it contains:

>>> some_tuple[1].append(3)

What do you think happens? Let’s see:

>>> some_tuple
('yadda', [1, 2, 3])

Did our tuple change? No it did not. It still contains the same list – it is the list within the tuple that has changed.

You can try the id() function on the list portion of the tuple to confirm it’s the same list.
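
Here is a small sketch of that check. It also shows that the tuple itself still refuses changes: trying to assign a different object to one of its positions raises a TypeError.

```python
some_tuple = ('yadda', [1, 2])
tuple_id_before = id(some_tuple)
inner_id_before = id(some_tuple[1])

some_tuple[1].append(3)   # modifies the list held by the tuple

tuple_same = (id(some_tuple) == tuple_id_before)
inner_same = (id(some_tuple[1]) == inner_id_before)
print('Same tuple?', tuple_same)
print('Same inner list?', inner_same)

# The tuple's own slots cannot be reassigned.
try:
    some_tuple[1] = [4, 5]
    reassignment_allowed = True
except TypeError:
    reassignment_allowed = False
print('Can we reassign a tuple slot?', reassignment_allowed)
```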

Why Bother with Mutable vs. Immutable?

This mutable/immutable situation may seem a bit complicated.

Why did the Python designers do this? Wouldn’t it have been simpler to make all objects mutable, or all objects immutable?

Both mutable and immutable properties have advantages and disadvantages, so it comes down to design preferences.

Advantage: One major performance advantage of immutable data types is that a potentially large number of variables can refer to a single immutable object without risking problems arising from aliasing. If the object were mutable, each variable would need its own copy to be safe from changes made through the others, which would incur much higher memory overhead.

These choices are affected by how objects are typically used, and these choices affect language and program performance. Language designers take these factors into account when making those choices.

Be aware that other languages address the mutable/immutable topic as well, but they do not all implement these properties in the same way.

We will not go into more detail on this in this article. Your appreciation of these choices will develop in the future as you gain more experience with programming.

Conclusion

We have noted that Python makes some of its objects mutable and some immutable.
We have explored what this means, and what some of the practical consequences of this are.
We have noted how this is a consequence of how objects are stored in memory.
We have introduced Python’s id() function as a way to better follow this memory use.

High-level programming languages are an ever-advancing effort to make programming easier, freeing programmers to produce great software without having to grapple with the minute details as the computer sees it.

Being aware of how mutable and immutable objects are handled in memory is one case where a bit more awareness of the details of the computer will reap rewards. Keep these details in mind and ensure your programs perform at their best.

