Learning to add with LSTM

I know, Wolfram’s stuff ain’t the most popular when it comes to machine learning and AI. What pleases in their sandbox is the ease with which one can experiment with ideas without having to go very deep into various frameworks.

Their neural network API is based on MXNet and is straightforward to use. Below, for example, is a handful of lines of code through which addition is learned. Can’t beat the fun here.

First you need a bit of data of course. Just a matter of assembling a dictionary with simple additions and their result.
This rule takes two integers and returns a rule corresponding to the addition:

LSTMAdd_1.png

With this, one can compile a dataset consisting of all combinations from zero to hundred:

LSTMAdd_2.gif

Out[3]//InputForm=

{"30+90" -> 120, "97+15" -> 112, "34+29" -> 63, "91+26" -> 117}

A neural network does not understand strings so one has to convert things to a tensor. In this case a vector corresponding to the characters in the input string:

LSTMAdd_3.png

For instance the following addition is converted to a vector

LSTMAdd_4.png

LSTMAdd_5.png

This encoder will fail if applied to something it wasn’t made for:

LSTMAdd_6.png

LSTMAdd_7.png

LSTMAdd_8.png

Finally, the actual neural network is assembled using a couple of LSTM layers. You are free to take anything you like, composing neural networks are a bit of an art rather than a science. A LSTM layer has an internal memory across the input axis, which makes sense since the output of an addition is directly related to previous input (unlike, say, poems or music):

LSTMAdd_9.png

LSTMAdd_10.gif

LSTMAdd_11.png

LSTMAdd_12.gif

The training is also straightforward and we use 10% of the data for validation:

LSTMAdd_13.png

LSTMAdd_14.gif

Now you can give any addition with numbers below hundred:

LSTMAdd_15.png

LSTMAdd_16.png

Simple, powerful and enticing at the same time.

Created with the Wolfram Language