A lot of papers mention that recurrent neural networks (RNNs) are Turing-complete: that they can simulate a Turing machine. This belief is extremely pervasive; a paper1 from DeepMind states, without qualification:
Moreover, it is known that RNNs are Turing-Complete (Siegelmann and Sontag, 1995), and therefore have the capacity to simulate arbitrary procedures…
The referenced paper, titled On the Computational Power of Neural Nets,2 is responsible for a slew of misunderstandings. It provides a proof that, under certain conditions, a recurrent neural network can simulate a Turing machine. The trick is to treat the decimal digits of the RNN’s state vector like the Turing machine’s tape, in order to store arbitrary and unlimited data. Of course, everyone uses 32- or 64-bit floating point representations, which means that in practice an RNN can’t store very much data this way. The proof gets around this by assuming infinite decimal precision.
I mentioned in the Turing-equivalence article that infinite memory isn’t necessary for a Turing machine; only unbounded memory. So the problem here is not so much that they use infinite precision, but rather that this idea of being able to arbitarily increase floating-point precision is completely disjoint from reality. It has no connection to the way RNNs actually work.
This is not the same as the distinction between idealized Turing machines and real computers. Real computers can simulate a Turing machine as long as they don’t run out of memory, and there is a mechanism by which processes can obtain more memory. An RNN cannot even do that, because there is no mechanism to allocate more floating point precision beyond whatever 32 or 64 bits are available. Even if we were interested in computations that fit within that limit, it’s highly unlikely that gradient descent could find such a specialized construction for storing data.
Building on the Siegelmann & Sontag paper, many other works have analyzed all sorts of neural networks (for example, Transformers3) in the infinite-precision setting, which has now, unfortunately, become standard. The result: most papers on the computational power of neural networks study an unrelated, meaningless system.
Worse, some more recent works discuss how certain kinds of recurrent neural networks are actually more powerful than Turing machines (super-Turing computation).4 Be very, very suspicious of any super-Turing claims.5
Back to index
Graves, Alex, Greg Wayne, and Ivo Danihelka. “Neural turing machines.” arXiv preprint arXiv:1410.5401 (2014). ↩︎
Siegelmann, Hava T., and Eduardo D. Sontag. “On the computational power of neural nets.” Journal of computer and system sciences 50.1 (1995): 132-150. ↩︎
Bhattamishra, Satwik, Arkil Patel, and Navin Goyal. “On the computational power of transformers and its implications in sequence modeling.” arXiv preprint arXiv:2006.09286 (2020). ↩︎
Cabessa J, Siegelmann HT. The super-Turing computational power of plastic recurrent neural networks. Int J Neural Syst. 2014 Dec;24(8):1450029. ↩︎
Broersma, Hajo, Susan Stepney, and Göran Wendin. Computability and complexity of unconventional computing devices. Computational Matter. Springer, Cham, 2018. 185-229. ↩︎