Moving Towards Real Artificial Intelligence: Part 2 – Dissecting the Three Main Components

Blog / Nicola Pastorello / January 18, 2017

This post is the second in the series Moving Towards Real Artificial Intelligence, which explores the HTM approach to creating artificially intelligent machines by mimicking the processes that govern human learning. The first post in this series introduced the idea of the brain as an inspiration for modelling computation.

In the first post of this series, I introduced the Hierarchical Temporal Memory (HTM) model, which is based on Jeff Hawkins’ theory of how the brain’s neocortex learns. In this post I will give some context for the application of HTMs by outlining the three main components of this model: hierarchical, temporal and memorial.

Hierarchical Component

While one may assume that neurons in the neocortex are randomly distributed, they actually possess a uniform hierarchical structure: 6 parallel “horizontal” layers, each exhibiting different types of neurons and connections.

The bottom layers, receiving input almost straight from the sensory organs, process low-level patterns and short-term sequences of signals.

The higher layers receive pre-processed signals from the lower ones, and therefore are linked with recognition and prediction of more abstract patterns.

Information flows both ways. Input signals travel from the bottom layers to the top ones, forming patterns of increasing abstraction and complexity, while predictions move the other way. For example, a sequence of notes in a melody will be matched with pre-known notes in the lower layers, with the melody they belong to in higher layers, and with the title of the song in the top layers.

Algorithmically, this is not dissimilar to the hierarchical structure of deep learning neural networks. In those networks, the early layers of perceptrons are closer to the input and learn to identify lower-level information, while the following layers learn higher-order patterns in the data. A clear difference, though, is the presence in HTM models of intra-layer connectivity: cells belonging to the same layer are linked together and are able to inhibit each other’s responses.
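To make the idea of cells in the same layer inhibiting each other a bit more concrete, here is a minimal Python sketch. It assumes a "k-winners-take-all" rule, where only the k most strongly driven cells stay active. This is a common simplification of lateral inhibition and not Numenta's actual implementation; the function name, the activation values and k are all made up for illustration.

```python
def lateral_inhibition(activations, k):
    """Keep the k most strongly driven cells; silence the rest of the layer.

    Assumption: a simple k-winners-take-all rule stands in for inhibition.
    """
    # Rank cell indices by activation, strongest first (ties broken by index).
    ranked = sorted(range(len(activations)), key=lambda i: (-activations[i], i))
    winners = set(ranked[:k])
    # Cells that lost the competition are inhibited (set to zero).
    return [a if i in winners else 0.0 for i, a in enumerate(activations)]


layer_input = [0.2, 0.9, 0.4, 0.7, 0.1]  # hypothetical cell activations
layer_output = lateral_inhibition(layer_input, k=2)
print(layer_output)  # [0.0, 0.9, 0.0, 0.7, 0.0]
```

The output of such a layer is sparse: only a few cells remain active for the next layer to read.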

Fig.1 – Left: Diagram of a 4-layer HTM model. Information is shared across different layers, both ways, and among neurons in the same layer. Bottom levels process lower-level information, higher levels receive more abstract patterns.
Right: Slice of the neocortex exposed using three different staining methods. The 6 different layers, and the different types of neurons, can be seen in the three different panels. Extracted from HTM Whitepaper, courtesy of Numenta 2011.

Temporal Component

In the last post in this series, I touched on how our learning is based on repeated temporal patterns. No information is processed in a static sense, and even visual inputs (i.e. from the optic nerves) are processed as a sequence of different images (no pictures, only videos). In the latter case, it’s well known that microsaccades (continuous involuntary small eye movements) cause the input to change constantly.

Since the HTM algorithm is designed to work similarly to the neocortex, it works quite efficiently for learning and prediction tasks where inputs are time-dependent (e.g. time series, audio). Currently, the NuPIC open-source community is actively working on ways to convert non-temporal, non-sequential inputs (e.g. for image classification) into processable patterns.

Neocortex neurons learn when to fire in response to previous activations of other neurons. In a simplistic setting, let’s assume that different inputs cause different neurons to fire [1]. If a series of inputs A->B->C->D->A->… is repeated, the link between the neurons mapped to inputs A and B will get stronger than, for example, the link between the neurons mapped to A and C. Therefore, at the next event A, the neuron associated with B will be ready to fire before its neighbours (i.e. its activation threshold will effectively be lower) and, if the event B actually occurs after A, will inhibit nearby neurons from firing. If, instead, a new unexpected signal occurs after A, the link between A and B will weaken.
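The simplistic setting above (one neuron per input, links that strengthen when a transition is observed and weaken when a prediction fails) can be sketched in a few lines of Python. This is a toy illustration only, not the HTM algorithm: the class name `TransitionMemory` and the `increment` and `decrement` values are hypothetical choices of mine.

```python
from collections import defaultdict


class TransitionMemory:
    """Toy first-order sequence memory: one 'neuron' per input symbol."""

    def __init__(self, increment=0.1, decrement=0.05):
        self.increment = increment  # hypothetical reinforcement step
        self.decrement = decrement  # hypothetical weakening step
        self.strength = defaultdict(float)  # (previous, current) -> link strength
        self.previous = None

    def predict(self, symbol):
        """Return the symbol most strongly linked to `symbol`, or None."""
        links = {cur: s for (prev, cur), s in self.strength.items()
                 if prev == symbol and s > 0.0}
        return max(links, key=links.get) if links else None

    def observe(self, symbol):
        """Update the links with a new input; return True if it was predicted."""
        predicted = None
        if self.previous is not None:
            predicted = self.predict(self.previous)
            # Strengthen the link for the transition that actually happened.
            key = (self.previous, symbol)
            self.strength[key] = min(1.0, self.strength[key] + self.increment)
            # Weaken the link behind a wrong prediction.
            if predicted is not None and predicted != symbol:
                wrong = (self.previous, predicted)
                self.strength[wrong] = max(0.0, self.strength[wrong] - self.decrement)
        self.previous = symbol
        return predicted == symbol


memory = TransitionMemory()
for symbol in "ABCD" * 5:  # repeat A->B->C->D->A->...
    memory.observe(symbol)

print(memory.predict("A"))  # B
print(memory.strength[("A", "B")] > memory.strength[("A", "C")])  # True

before = memory.strength[("A", "B")]
memory.observe("A")  # D -> A, as expected
memory.observe("X")  # unexpected input after A
print(memory.strength[("A", "B")] < before)  # True
```

After the repeated sequence, the A-to-B link dominates, and a single surprise after A is enough to weaken it slightly.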

Fig.2 – Sequence ascending from C real, courtesy of Hyacinth.

In the example of melody processing, our ability to recognise a song from just a few notes and anticipate the next notes in the sequence is due to a combination of this learnt temporal pattern and the hierarchical structure. Since the number of notes used in music is limited, the lower levels recognise short sequences of notes and send that information to the following layer. This sequence will be encoded by that layer in a higher-order sequence (e.g. a stanza) and passed to even higher layers, where the level of abstraction increases further.

Memorial Component

Fig.3 – Courtesy of Kristine Keller.

The final component is the memory of the machine, which is where the actual learning occurs. The learning process is based on storing the observed patterns at each level of the neocortex, replacing old patterns with new, and updating the links among different cells. These links are strengthened or weakened every time a pattern is observed, according to whether it was correctly predicted or not, in an infinite loop of prediction->observation->verification->new prediction.

Learning is then equivalent to (1) registering a previously unknown or unexpected sequence of inputs, (2) reinforcing an already known sequence, or both. It is therefore intrinsically dependent on the flexibility of the synapses that link neurons.

Young brains with high synaptic flexibility can therefore easily learn a large number of patterns, while their older counterparts already have a complex network in place, built from more years of sensory experiences. An example of this is the high efficiency in learning a new language shown by children in contrast to adults.

Going back to the song example, the layer that is able to match the melody with a known song will then pass that information back to lower layers, where it will be decoded into lower and lower level information, down to the expected following notes at the first layer.

If the expected note actually occurs, the connections between the neurons involved in the sequence will be strengthened at all layers. If, instead, a new, unexpected note occurs, the anomaly will be escalated to the upper layers until a known, higher-abstraction match is found. When a higher layer is able to match the new sequence (e.g. with a different song), it will push the prediction of the next notes back down again, in a continuous and iterative process.

If even the highest layer is not able to fit the new information onto a learnt sequence, an anomaly is detected (and the new melody will start being mapped to a new pattern).
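The escalation process in the song example can be sketched with two toy "layers": a lower one that checks each note against the current prediction, and a higher one that tries to match everything heard so far against known songs. This is only an illustration under strong assumptions: the songs in `KNOWN_SONGS` are invented, matching is done on exact prefixes, and the step where a new melody is learnt after an anomaly is left out.

```python
KNOWN_SONGS = {  # hypothetical melodies, written as note names
    "Song 1": ["C", "D", "E", "C"],
    "Song 2": ["C", "D", "G", "E"],
}


def matching_songs(heard):
    """Higher layer: songs whose opening notes equal the notes heard so far."""
    return [title for title, notes in KNOWN_SONGS.items()
            if notes[:len(heard)] == heard]


def listen(notes):
    """Process notes one by one and log what happens at each step."""
    heard, current, events = [], None, []
    for note in notes:
        # Top-down prediction: the next note of the currently matched song.
        expected = None
        if current is not None and len(heard) < len(KNOWN_SONGS[current]):
            expected = KNOWN_SONGS[current][len(heard)]
        heard.append(note)
        if note == expected:
            events.append("predicted")  # lower layer handles it alone
            continue
        # Unexpected note: escalate to the higher layer for a new match.
        matches = matching_songs(heard)
        if matches:
            current = matches[0]
            events.append("matched:" + current)
        else:
            current = None
            events.append("anomaly")  # not even the top layer recognises it
    return events


print(listen(["C", "D", "G", "E"]))
# ['matched:Song 1', 'predicted', 'matched:Song 2', 'predicted']
print(listen(["C", "D", "A"]))
# ['matched:Song 1', 'predicted', 'anomaly']
```

In the first run the unexpected G is resolved one level up by switching to a different song, while in the second run no known song fits and the note is flagged as an anomaly.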

In this process, all the synaptic links that are not used by this sequence mapping will get weaker with time.
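This gradual weakening of unused links can be pictured as a decay step applied after each observed sequence. The sketch below assumes a multiplicative decay with a made-up `rate`; the theory as described here does not prescribe this particular rule.

```python
def decay_unused(strengths, used_links, rate=0.1):
    """Return a copy of `strengths` where every link not in `used_links` has faded.

    Assumption: unused links shrink by a fixed fraction `rate` per step.
    """
    return {
        link: value if link in used_links else value * (1.0 - rate)
        for link, value in strengths.items()
    }


links = {("A", "B"): 0.8, ("A", "C"): 0.4}  # hypothetical link strengths
links = decay_unused(links, used_links={("A", "B")}, rate=0.5)
print(links)  # {('A', 'B'): 0.8, ('A', 'C'): 0.2}
```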

A consequence of this is linked with how we manage our attention. If we are working and listening to (known) music in the background, our highest brain activity is focused on the work task and only the lowest levels process the music information. However, if a new, unexpected note is heard in a known song, this anomaly can escalate all the way up to the highest layers, and our focus will shift immediately to the music sequence.


The HTM theory is a promising approach to reverse-engineering human intelligence with the purpose of building real artificial intelligence. The theory makes a number of verifiable predictions (most of them are in the final pages of Jeff Hawkins’ book On Intelligence). The theory itself is still a work in progress and I strongly recommend that any interested reader explore Numenta’s online forum, where most ideas and implementations are discussed. Finally, it is worth adding that very recently, Hawkins and his collaborators published a couple of easy-to-read, peer-reviewed papers regarding the theory (“Continuous online sequence learning with an unsupervised neural network model” and “Why Neurons Have Thousands of Synapses, A Theory of Sequence Memory in Neocortex”).

1: This is not exactly how the HTM theory explains input processing, which is instead based on sparse distributed representations (SDRs). This will be the topic of a future blog post.

Header image courtesy of OSA Student Chapter at UCI Art in Science Contest.

Thanks to Maria Mitrevska, Rodney Pilgrim and Allan Jones for proofreading and providing suggestions.