Artificial neural networks have become a powerful tool providing many benefits in our modern world. They are used to filter out spam, perform voice recognition, and drive cars, among many other things.
As remarkable as these tools are, they are readily within the grasp of almost anyone. Knowledge of some basic algebra and some programming experience is all you need to get started building your own neural networks.
And don’t be afraid to read through this article. Even if the algebra is unfamiliar, we have tried to make the text understandable to anyone.
🧠 What You’ll Learn: In this article, we will go over the fundamentals of how neural networks are built and how they work. When done, you won’t yet know how to build them yourself, but you’ll understand the fundamentals of how they work, which will help you when you get to building your own.
But first we will briefly review a little about real neurons and how this has inspired the development of artificial neural networks.
You can find part 2 of this series in the following tutorial on the Finxter blog:
🌍 Part 2: How Do Neural Networks Learn?
Throughout the history of artificial neural networks, their development has been influenced by research and understanding of how real neurons operate. Let’s briefly review a simplified understanding of real neurons to provide some inspiration for how artificial neurons might be designed.
Figure 1 shows a schematic drawing of a real neuron.
The neuron consists of a collection of dendrites, the soma, which is the cell body, and an axon.
Signals come in through the dendrites. The signals are added together within the soma. If the collection of signals is strong enough the neuron will be triggered to send a spike signal down the axon, thereby sending a signal on to other neurons.
Figure 2 shows real neurons connected together in a network.
What kind of signals might these neurons convey?
Although each spike is an all-or-nothing event, research has shown that the more a neuron is stimulated, the more frequently it spikes.
This suggests it may be the frequency of spiking (an analog value), rather than the individual spike (a digital value), that is the important signal neurons convey.
What kind of signal might the neuron finally output?
One can imagine that
- with very faint stimulation, a neuron may not output much;
- with modest stimulation, a neuron will output more, perhaps in a linear fashion; and
- with much more stimulation, the neuron may saturate and not be able to output anymore.
This could result in a sigmoid-shaped output, as in Figure 3.
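The saturating response described above is captured by the logistic (sigmoid) function, 1 / (1 + e⁻ˣ). A minimal sketch in Python:

```python
import math

def sigmoid(x):
    """Logistic function: maps any input to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))

# Faint stimulation -> little output; strong stimulation -> saturates near 1.
for stimulus in (-6, -2, 0, 2, 6):
    print(stimulus, round(sigmoid(stimulus), 3))
```

Note how the output barely changes at the extremes but rises steeply around zero, producing the S-shaped curve of Figure 3.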
How might these neurons and networks encode and learn their information?
In 1949 Donald O. Hebb proposed a model for how neuron function might contribute to learning in his book “The Organization of Behavior”. He proposed that neural connections are strengthened through use and that this may be the foundation of learning within the brain.
This is sometimes described as “neurons that fire together wire together”, and this is known as Hebbian learning.
💡 The implication here is that through learning and use, some neural connections become stronger than others and that it is the pattern of connection strengths that encodes learning and memory.
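As a purely illustrative aside, Hebb’s idea is often written as Δw = η·x·y: a connection strengthens in proportion to how often its input and output are active together. A hypothetical sketch, with a made-up learning rate:

```python
# Hypothetical sketch of Hebb's rule: delta_w = eta * x * y,
# i.e. a connection strengthens when input x and output y fire together.
eta = 0.1  # learning rate (illustrative value)

def hebbian_update(w, x, y):
    """Strengthen weight w in proportion to coincident activity x * y."""
    return w + eta * x * y

w = 0.5
# Input and output fire together repeatedly -> the connection grows.
for _ in range(5):
    w = hebbian_update(w, x=1.0, y=1.0)
print(round(w, 2))  # 1.0
```

The pattern of which weights have grown, and by how much, is what encodes the learning.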
Keep in mind that further research has shown real neurons to be more complicated than this simple description.
However, this description does reflect some properties of real neurons, and it turns out even these relatively simple models can exhibit some remarkable behavior.
Now that we’ve reviewed some simple properties of real neurons and neural networks, let’s use this simplified understanding of real neurons as inspiration for our design of artificial neurons and networks.
Figure 4 shows our artificial neuron.
Like with dendrites in real neurons, signals come in from other neurons through multiple inputs.
Artificial Neural Network Weights
The strengths of those connections are expressed by weights (w1, w2, etc.) shown on each input. Each incoming signal is multiplied by its weight, so a larger weight produces a stronger signal from that input, reflecting a stronger connection. All of those weighted signals are added up in the node of the neuron.
Artificial Neural Network Bias
For each neuron, there is also one other signal, not connected to any other neuron, which is added in: the bias. This constant signal determines whether the neuron is already enhanced or suppressed on its own, in addition to the signals provided by the inputs.
Artificial Neural Network Activation Function
Finally, that total input is passed through a function known as the activation function. This function determines how the neuron responds to the activation by its inputs.
There are multiple functions used as activation functions, and sigmoid-shaped functions are a common choice.
Though many other activation functions are in use, sigmoid-shaped functions are the easiest to motivate from a biological standpoint, as we did above.
So to reiterate, here is how we describe the signal processing done by each neuron:
- Multiply each input by its weight and add them all up.
- Add the bias.
- Process the total through the activation function.
And here is how we describe it mathematically:
z = w1·x1 + w2·x2 + … + wn·xn + b (add up all the weighted inputs, plus bias)
a = f(z) (process through the activation function)
There we have it – this is our artificial neuron. It’s really quite a simple object: add together weighted inputs, add the bias, and pass that through an activation function for the final output.
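The three steps above can be sketched in a few lines of Python; the input, weight, and bias values below are made up purely for illustration:

```python
import math

def sigmoid(z):
    """A common sigmoid-shaped activation function."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum of inputs, plus bias,
    passed through the activation function."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

# Made-up numbers purely for illustration:
output = neuron(inputs=[0.5, -1.0, 2.0], weights=[0.8, 0.2, 0.4], bias=0.1)
print(round(output, 3))  # prints 0.75
```

The weighted sum here is z = 0.8·0.5 + 0.2·(−1.0) + 0.4·2.0 + 0.1 = 1.1, and sigmoid(1.1) ≈ 0.75.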
This simple object was first introduced by Warren McCulloch and Walter Pitts in their 1943 (!) paper “A logical calculus of the ideas immanent in nervous activity”, except their activation function was a step function instead of the smooth sigmoid-shaped function we discussed before.
💡 Researcher Frank Rosenblatt called this object the perceptron in his 1962 book “Principles of Neurodynamics”.
Figure 5 shows a network of these simple elements integrated together in a multi-layer artificial neural network. This multi-layer network is sometimes called a multi-layer perceptron (abbreviated MLP).
Signals come in through the input side, say for example a picture of a cat or a dog. The signals then pass through the network, getting processed by neural calculations along the way. Then on the output side the network provides an output indicating whether the image was a cat or a dog.
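A forward pass through such a network can be sketched by applying the single-neuron calculation layer by layer. The layer sizes and weights below are toy values chosen only for illustration:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    """Compute one layer of neurons: each row of `weights` feeds one neuron."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

# Toy network: 2 inputs -> 3 hidden neurons -> 1 output, with made-up weights.
hidden = layer([0.5, 0.9],
               weights=[[0.2, -0.4], [0.7, 0.1], [-0.3, 0.8]],
               biases=[0.0, 0.1, -0.2])
output = layer(hidden, weights=[[1.0, -1.0, 0.5]], biases=[0.3])
print(output)  # one output value between 0 and 1
```

A real trained network works the same way, only with many more neurons and with weights learned from data rather than made up.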
Early efforts at artificial intelligence relied on explicitly programming computers with rules, and these efforts did achieve some compelling results, such as computers that could play competitive checkers or chess.
However, very simple things that even a young child can do eluded computers, things like being able to recognize what object is in a picture. In fact, this seemingly simple task is actually quite difficult to program. Let’s briefly explore this.
Suppose we want to program the computer to be able to recognize handwritten numerical digits.
💡 In fact, this is a common exercise for beginning neural network students; building a network to solve this problem is considered the neural network version of the “Hello world!” program, and there is a dataset, the MNIST handwritten digit database, devoted to just this problem.
Figure 6 shows a sample of some of the digits from this database.
Let’s think about how one might program a computer to do this. Just looking at the numbers you recognize them instantly, but how might you write a program to do that?
Look at the various numbers “seven” for example.
Perhaps one could describe it as one horizontal line at the top and one slanted, roughly vertical line below it. Would you specify the coordinates where the lines should be? Probably not – what if the number was written off to the side or in a corner of the image? Should there be a limit on how long or how short the lines can be?
As you can see, the number of rules for identifying a “seven” grows quickly and gets complicated.
But what about a more sophisticated problem? What about identifying whether a picture is of a cat or a dog?
Not only is there an enormous variety of cats and dogs to distinguish, just figuring out how to describe them so a computer could recognize them is a daunting challenge. Where would one even start?
💡 Instead of writing code to solve this, neural networks are “programmed” in a fashion more like how humans learn – the network is trained with a set of examples, and the network learns from these examples.
More specifically, the network’s learning is encoded in the weights and biases of the network, and computer scientists have figured out algorithms that allow the network to automatically self-adjust its weights and biases.
This process is called backpropagation: the network adjusts its weights and biases to get closer to the correct answer, working backward from the output to the input.
Therefore the programmer does not have to figure out how to encode the solution, the network itself figures it out.
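The details of that self-adjustment are beyond this article, but the core idea, nudging each weight in the direction that reduces the error, can be illustrated with a single neuron and one training example (all numbers here are illustrative):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# One training example for a single neuron: input x, desired target output.
x, target = 1.0, 1.0
w, b, lr = 0.0, 0.0, 1.0  # illustrative starting values and learning rate

for _ in range(100):
    y = sigmoid(w * x + b)      # forward pass: the neuron's current answer
    error = y - target          # how far off are we?
    grad = error * y * (1 - y)  # derivative of squared error w.r.t. the sum z
    w -= lr * grad * x          # nudge the weight toward a better answer
    b -= lr * grad              # nudge the bias too

print(round(sigmoid(w * x + b), 2))  # much closer to the target of 1.0
```

The neuron starts out answering 0.5 and, after repeated small adjustments, its output approaches the target. Backpropagation applies this same idea across every weight in a multi-layer network.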
Once the network is trained on a set of examples, new cases can be introduced to the network, and the network provides correct answers (ideally, that is; part of the skill of being a neural network engineer is figuring out how best to build and train networks to get the desired performance).
So what kind of programming does a neural network engineer do?
- They write code that describes the structure of the network, such as how many layers, and how many neurons per layer.
- They decide which activation function to choose.
- They also write other code that specifies how to measure errors, known as loss, that the network makes.
- They also make choices about what training data to use and how to adjust network learning.
So even though neural networks learn through example, the neural network engineer has much to do to make that happen.
It is astounding that a network of very simple objects can achieve the seemingly human-like ability to learn by example and recognize pictures of objects.
It’s not obvious up front that a collection of such simple objects could achieve this. And there are so many other things they can do: they can locate objects within an image, detect words within a conversation, and help steer cars, among other things.
It is amazing that such simple objects can achieve such sophisticated performance. It truly does seem almost magical.
We hope you have found this article helpful in gaining a basic understanding of how neural networks work.
Even more, we hope this article has fired your imagination and inspired you to learn more about neural networks, even to the point of building them yourself! Go out there and learn how to build some networks!
We wish you happy coding!