Neural Numbers

Mathematics
Why numbers can contain the world.

Neural networks have a remarkable capacity to accumulate knowledge and exhibit signs of intelligence. These days, companies building GPT-like models claim there is not enough public data left, and if you consider the amount of material on the internet (Wikipedia, YouTube and news websites, not the least), it’s quite remarkable that one can cram so much into a neural network. The combined history and knowledge of humankind stored as organized numbers: how can this be?

This condensation of information into numbers relates to a fact that few people acknowledge and that even scientists usually put aside: you can put the whole universe in a single number. At least, in theory.

There is a surprising theorem in set theory (Cantor’s result on the cardinality of Euclidean spaces) stating that all (Euclidean) spaces have the same cardinality. Simply said, there are as many points in one dimension as in two… as in any (finite) dimension you like. Take a cube as small as you like and it has as many points in it as the whole universe. This proven fact is a bit of a burden for fundamental physics, but that’s another topic. It is on its own quite remarkable, but the way it is proven is even more interesting. In essence, you can concatenate as much as you like into a single real number. The collected works of Shakespeare fit in a number, Wikipedia fits in a number, you name it. You only have to pick some numeric encoding (some tensorization if you wish) and concatenate things (in a numpy fashion). For example, take the Bible, replace every character with its ASCII code, put the codes in an array and join them into a single number (i.e. drop the separators).
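As a toy illustration of that encoding idea (a sketch of my own, not part of the theorem’s proof), here is how a piece of text can be folded into one integer and recovered again, assuming each character code is zero-padded so the digits can be split back unambiguously:

```python
# A minimal sketch of the encoding described above: map each character to a
# zero-padded code and glue the digits together into one (very large) integer.
# The fixed width keeps the mapping reversible.

def text_to_number(text: str) -> int:
    # Pad every Unicode code point to 7 digits (plain ASCII would fit in 3).
    return int("".join(f"{ord(ch):07d}" for ch in text))

def number_to_text(number: int, length: int) -> str:
    # Restore leading zeros that the integer representation drops, then split.
    digits = str(number).zfill(7 * length)
    return "".join(chr(int(digits[i:i + 7])) for i in range(0, len(digits), 7))

sample = "In the beginning..."
encoded = text_to_number(sample)
assert number_to_text(encoded, len(sample)) == sample
print(encoded)  # one integer holding the whole text
```

The decoder needs the original length only because a leading zero would otherwise be lost; any fixed-width padding scheme would do.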

Mathematically speaking, any transformer model could be downsized to a single number rather than billions of parameters. Computers, of course, can’t deal with arbitrarily large numbers, and floating-point arithmetic of arbitrary precision is impractical. Besides, having the information structured in neural layers also speeds up processing.
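To make the ‘one number instead of billions of parameters’ remark concrete, here is a hedged sketch (purely illustrative, not how any real model is stored): quantize each weight to a fixed number of digits and concatenate those digits into one Python integer, which has arbitrary precision. The names pack and unpack and the 6-digit quantization are assumptions of this sketch.

```python
import numpy as np

DIGITS = 6
SCALE = 10 ** DIGITS

def pack(weights: np.ndarray) -> int:
    # Quantize each weight in [-1, 1) to a fixed-point code in [0, SCALE).
    codes = np.floor((weights + 1.0) / 2.0 * SCALE).astype(np.int64)
    codes = np.clip(codes, 0, SCALE - 1)          # guard the upper edge
    # Concatenate the zero-padded digit strings into one big integer.
    return int("".join(f"{c:0{DIGITS}d}" for c in codes))

def unpack(number: int, count: int) -> np.ndarray:
    digits = str(number).zfill(DIGITS * count)
    codes = [int(digits[i:i + DIGITS]) for i in range(0, len(digits), DIGITS)]
    return np.array(codes, dtype=np.float64) / SCALE * 2.0 - 1.0

weights = np.random.uniform(-1.0, 1.0, size=1000)   # stand-in "model"
as_one_number = pack(weights)
restored = unpack(as_one_number, len(weights))

print(len(str(as_one_number)), "digits in a single number")
print("max quantization error:", np.max(np.abs(weights - restored)))
```

The single integer exists and round-trips fine; it just gets astronomically long, which is exactly the hardware objection in the paragraph above.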

In computer science, and at the root of AI, the same infinite-information idea is present in the Universal Approximation Theorem. This theorem states that a feedforward neural network with a single hidden layer containing a finite number of neurons can approximate any continuous function on a closed and bounded subset of $\mathbb{R}^n$ to any desired degree of accuracy, provided the activation function is non-constant, bounded, and continuous. The theorem highlights the theoretical power of neural networks, showing that even simple neural networks are capable of approximating complex functions. However, it’s important to note that while the theorem guarantees the existence of such an approximation, it doesn’t provide a practical way to determine the optimal architecture or the number of neurons required. Additionally, the theorem doesn’t address the challenges of training the network, such as finding the appropriate weights through optimization. What matters is that no matter how complex a smooth function is, you can approximate it to any degree of accuracy with a neural net. The ‘arbitrary complexity’ can be interpreted as ‘arbitrary amount of information’.
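A small numerical sketch of what the theorem promises (my own illustration, with arbitrary choices of target function, hidden width and activation): a single hidden layer of tanh units with random hidden weights, where only the output layer is fitted by least squares, already approximates a continuous function on a closed interval rather well.

```python
import numpy as np

# One hidden layer with a bounded, non-constant activation (tanh) fitted to a
# continuous target (sin) on a compact interval. Hidden weights are random and
# only the output layer is solved in closed form; this demonstrates capacity,
# not training.

rng = np.random.default_rng(0)
x = np.linspace(-np.pi, np.pi, 400).reshape(-1, 1)   # closed, bounded domain
y = np.sin(3 * x)                                     # continuous target

hidden = 200
W = rng.normal(scale=2.0, size=(1, hidden))           # random hidden weights
b = rng.uniform(-np.pi, np.pi, size=hidden)           # random biases
H = np.tanh(x @ W + b)                                # hidden activations

# Linear least squares for the output weights.
v, *_ = np.linalg.lstsq(H, y, rcond=None)
approx = H @ v

print("max absolute error:", np.max(np.abs(approx - y)))
```

Increasing the number of hidden units shrinks the error further, which is the theorem in miniature; it says nothing about how gradient-based training would find such weights.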

From a theoretical perspective, the capacity of neural networks is unbounded, provided you increase either the number of parameters or the precision of the weights. Even a single parameter can contain the whole internet. Handling it is more a hardware and technology issue.

The capacity of numbers is sometimes used in movies and books, but usually a particular number is picked out. The statement ‘the history of the world sits in the digits of Pi’ could be true but is, as far as I know, not proven. Special numbers like Pi or the square root of 2 are irrational, but that does not by itself mean their digits contain every possible sequence. The cardinality theorem holds for arbitrarily small open intervals, yet it says nothing about one fixed number: whether the digits of Pi contain every finite pattern (which would follow if Pi were normal, itself unproven) remains an open question.