Suppose there’s a binary string, and we want to find a program that generates it. Further assume that we possess a universal Turing machine that accepts programs and inputs in a predetermined binary representation. How do we find such a program? Naive search (evaluating all length-\(n\) programs for every \(n\)) is dangerous, since we cannot know whether any particular program halts; our procedure might get stuck forever evaluating a single program. A better approach would be to somehow “parametrize” the programs that halt, and only try those.

Even for a single fixed length, this is impossible. Suppose we had a program for the aforementioned universal Turing machine, \(\mathbf{E}(n)\), that enumerates all halting programs of length \(n\) and then halts itself. Pick any program \(\mathbf{P}\), and let its length be \(k\). The output of \(\mathbf{E}(k)\) either contains \(\mathbf{P}\) or it doesn’t, which decides whether \(\mathbf{P}\) halts. Since the halting problem is undecidable, \(\mathbf{E}\) cannot exist.
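The reduction can be written as a short pseudocode sketch. Here `E` is the hypothetical enumerator; since `E(k)` halts by assumption, the function below would decide the halting problem, which is the contradiction:

```
def halts(P):          # P: any program, encoded as a bit string
    k = len(P)
    return P in E(k)   # E(k) lists all halting programs of length k,
                       # and E itself halts, so this always answers
```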

If we just want to *eventually* output all halting programs, enumeration is possible with dovetailing. The strategy is best explained by the following image:

Each row is a program, and each column is one step of each program’s execution. By executing along the red path, and returning the index of every program that terminates, we will eventually output every program that halts. However, after any finite number of timesteps, there will still be infinitely many programs that haven’t even begun running yet. Furthermore, nothing can be said about whether programs that are currently running will halt in the future—so the uncomputability of the halting problem is respected.
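The diagonal sweep is concrete enough to run. Below is a minimal sketch with toy “programs” modeled as Python generators that yield once per step and finish when they halt; the names `dovetail` and `make_program` are my own, not from the post. Stage \(t\) starts program \(t\) and then steps every program started so far by one step, which traces the same staircase path:

```python
def dovetail(programs, stages):
    """Run infinitely many programs in a dovetailed (diagonal) order.

    `programs(i)` returns a generator that yields once per step and
    finishes (StopIteration) when program i halts.  After `stages`
    diagonal sweeps, return the indices observed to halt so far.
    """
    running = {}   # index -> generator mid-execution
    halted = []
    for stage in range(stages):
        running[stage] = programs(stage)   # start a new program each stage
        for i in list(running):
            try:
                next(running[i])           # one step of program i
            except StopIteration:
                halted.append(i)           # program i has halted
                del running[i]
    return halted

# Toy programs: program i halts after i steps if i is even,
# and loops forever if i is odd.
def make_program(i):
    def gen():
        if i % 2 == 0:
            for _ in range(i):
                yield
        else:
            while True:
                yield
    return gen()

print(sorted(dovetail(make_program, 20)))  # → [0, 2, 4, 6, 8]
```

After 20 stages only the small even-indexed programs have been observed to halt; the odd ones are still running, and infinitely many programs haven’t started yet, just as the text describes.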

These two examples imply that any system that learns algorithms (as the brain probably does) cannot be perfect; it likely has to apply a runtime cutoff while searching for algorithms. Of course, very slow programs would be useless for responding quickly to an environment anyway. Shorter programs are probably preferable, since they compress the input data more; a program longer than the input data can simply encode the data verbatim, so it doesn’t really “explain” anything and can be ignored. Heuristics like these probably guide algorithm search in the brain.
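A cutoff-based search like this is easy to sketch. The toy below (my own illustration, not from the post) searches arithmetic expressions in `n` instead of Turing machine programs, shortest first, for one that reproduces a data sequence at \(n = 0, 1, 2, \ldots\); the length cap `max_len` plays the role of the runtime/complexity cutoff:

```python
from itertools import product

def search(data, tokens, max_len):
    """Find the shortest expression in `n` that reproduces `data`
    at n = 0, 1, 2, ...  Candidates longer than `max_len` tokens
    are never tried -- an imperfect but terminating search."""
    for length in range(1, max_len + 1):
        for cand in product(tokens, repeat=length):
            expr = "".join(cand)
            try:
                if all(eval(expr, {"n": n}) == v
                       for n, v in enumerate(data)):
                    return expr
            except Exception:
                continue   # skip syntactically invalid candidates
    return None            # no short "explanation" found

tokens = ["n", "+", "*", "1", "2"]
print(search([0, 1, 4, 9, 16], tokens, 3))  # → n*n
```

The three-token expression `n*n` is found, while a literal table of the five data points would be far longer than the data itself and so, per the heuristic above, wouldn’t count as an explanation.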

*If you liked this post, you can also follow me on Twitter.*