Learning to remember more with less mental work

News / Simon Parker / December 18, 2019

We are what we can remember. Without memory, we are no better than bacteria, which can only react to chemical triggers in their environment, such as swimming toward nutrient-rich places. Current AI systems, despite all their jaw-dropping accuracy, mostly behave in the same reactive way: they will tell you that is a banana when they see an image of the fruit, but forget what came a minute before.

How can we dream of having a digital companion if it does not remember what we told it in the morning? A new paradigm of neural networks has emerged to address this memorization issue: Memory-augmented Neural Networks (MANNs).

Unlike other neural networks, which often rely on a short-term memory of recent stimuli, MANNs make use of an external memory and can therefore remember far longer. However, learning to read ultra-long sequences with limited memory is very challenging. Our recent paper, published at a top machine-learning conference (ICLR 2019), aims to strike a balance between memorization and forgetting by optimizing the memory operations in MANNs. The work was one of twenty-four oral papers chosen from five thousand submissions.

Being lazy can be more effective 

When reading a long text, we do not want to memorize every word, and neither does the computer. It is far more useful and efficient to commit to memory, once in a while, the gist of the last piece of text. Not only is this more meaningful, it also takes less time than memorizing everything word by word.

One may ask: what is the optimal interval between two memory-writing events? In the paper, it is theoretically proven that when the information in every element is equally important, the best interval is the length of the sequence divided by the size of the memory.
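As a concrete illustration of this uniform-writing result, the sketch below computes the time steps at which a model would write to memory. The function name and the exact step indices are illustrative choices, not the paper's implementation; it simply spaces the writes evenly, with interval = sequence length / memory size.

```python
def uniform_write_steps(seq_len, mem_size):
    """Time steps at which to write to memory when every input element
    is equally important: one write every seq_len / mem_size steps.
    Assumes mem_size evenly divides seq_len for simplicity."""
    interval = seq_len // mem_size
    # Write at the end of each interval: steps interval-1, 2*interval-1, ...
    return [t for t in range(interval - 1, seq_len, interval)]

# A 12-step sequence with 4 memory slots -> a write every 3 steps
print(uniform_write_steps(12, 4))  # [2, 5, 8, 11]
```

With 4 memory slots and 12 inputs, the model is "lazy" eleven steps out of twelve, yet every part of the sequence is summarized into exactly one slot.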

Paying attention to what will be remembered 

To extract the gist of the text read between two memory operations, our model maintains a temporary cache. It then identifies which parts of the cache should be kept in memory. Like a human, our model learns to attend to what is truly important in the long run.
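The caching-and-attention step can be sketched as follows. This is a generic dot-product attention summary, not the paper's exact architecture: the `query` vector standing in for what the network has learned to consider important is a hypothetical stand-in for the trained attention parameters.

```python
import numpy as np

def commit_cache_to_memory(cache, query):
    """Summarize a cache of hidden states into one vector to write
    to external memory.

    cache: (c, d) array of hidden states buffered since the last write.
    query: (d,) vector scoring each cached state's importance
           (in a trained model this would be learned).
    """
    scores = cache @ query                   # (c,) relevance of each state
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ cache                   # (d,) attention-weighted gist

rng = np.random.default_rng(0)
cache = rng.standard_normal((5, 8))   # 5 cached states of dimension 8
query = rng.standard_normal(8)
summary = commit_cache_to_memory(cache, query)
print(summary.shape)  # (8,)
```

Because the weights form a softmax, the committed vector is a convex combination of the cached states: states the query scores highly dominate the summary, while unimportant ones are effectively forgotten.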

Equipped with this new caching-and-attention technique, neural networks with external memory now outperform other approaches on sequence-modelling tasks such as predicting a sequence or reading a text and classifying its category.


Hung Le, Truyen Tran, and Svetha Venkatesh. Learning to remember more with less memorization. In 7th International Conference on Learning Representations (ICLR 2019), New Orleans, LA, USA, May 6-9, 2019. Read the paper. See the video of the presentation.

Written by Hung Le, Truyen Tran, and Svetha Venkatesh