- under Blog
I read Russ Cox(Cox 2010)'s solution to the self-reproducing zip file problem today. I have to admit that, due to my lack of intelligence I guess, I don't understand the proof of the solution very well. Thus, I'll post here some clarification on the solution I have to myself, for those like me who get confused about the solution.
I'm going to jump directly into the final Lempel-Ziv program, and I won't talk about the actual encoding of the zip file, which makes the problem harder.
Problem rephrasing
Suppose we have the following language, where the minimal unit is one code:
- literal,
Lx - following by an arbitrary array of codes of length
x, prints the array into the output as-is. That is,L3 aa bb ccresults asaa bb ccin the output. - repeat,
Rx - prints the last
xcodes in the output into the output again. That is, if the output isaa bb cc ddat that moment,R3turns it intoaa bb cc dd bb cc dd. - No-op,
L0 - Notice that as a special case for repeat,
L0, is no-op. - (no term)
- Any other "illegal" codes, i.e. not in the literal array following
Lxand is notLxorRx, are considered comments. That is, they will not be printed to the output. And they can only appears at the beginning or at the end.
Then, given a p-code long prefix [P], and an s-code long suffix [S], can we find a quine prefixing with [P] and suffixing with [S]? that is, if we note the process of decoding as f() then can we find a string of codes [C] such that (++ means concatenation here)
f([P] ++ [C] ++ [S]) = [P] ++ [C] ++ [S]
Actions
We say that we have two array of codes, Code and Output, and, assume that eventually we have Code == Output, we say that the shorter one at a moment is "left behind" for the sub-string [X].
OutputCatchUp([X])- for an
x-code-long string[X], this action can be invoked when theOutputis left behind for[X], and as a result theOutputis left behind forRx+1. The actual encoding for this action isLx+1 [X] Lx+1 Rx+1, and the output is[X] Lx+1 [X] Lx+1. Swap()- This action can be invoked when the
Outputis left behind forR4, and as a result theCodeis left behind forL4, like a swap. This is a magic fixed stringL4 R4 L4 R4 L4 R4, and the output isR4 L4 R4 L4 R4 L4 R4 L4. CodeWait([X])- for an
x-code-long-string[X], this action can be invoked when theCodeis left behind forLx+1, and as a result theCodeis left behind for[X]. The actual encoding for this action isLx+1 Rx+1 [X] Rx+1, and the output isRx+1 [X] Rx+1 [X]. Note thatSwap()is a special case of this, where[X] = L4 R4 L4. OutputHold(X)- for a single code
X, this action can be invoked whenOutputis left behind forX, and as a resultOutputis left behind (further) forX L1 L1. The actual encoding for this action isL1 X L1 L1, and the output isX L1. CodeLift(n)- For a length
n, this action can be invoked whenCodeis left behind forL4, and as a result theCodeis left behind forLn. The actual encoding for this action isL4 R4 L0 L0 Ln R4 L0 L0, and the output isR4 L0 L0 Ln R4 L0 L0 Ln.
Properties
- When
Codeis left behind, the only thing it can do is to "catch up" by filling the diff sub-string compared withOutputuntil it is no longer left behind, i.e. it is the same asOutput(which means we find the solution) or theOutputis left behind. In other words, the set of actions can be performed at each momentCodeleft behind is fixed. The crucial core to the solution is to build this set into the one we want. OutputCatchUp([X])basically should not be called when[X]'s lengthxis 1, resulting in an hard-to-break loop.
Solution demystified
- To begin with, filling the prefix comment
[P],Outputis left behind for[P] - Apply
OutputCatchUp([P]), now theOutputis left behind forRp+1 - Apply
OutputHold(Rp+1), now theOutputis left behind forRp+1 L1 L1 - Apply
OutputCatchUp(Rp+1 L1 L1), now theOutputis left behind forR4 - Apply
Swap(), now theCodeis left behind forL4 - Apply
CodeLift(s+1), now theCodeis left behind forLs+1 - Apply
CodeWait([S]), now theCodeis left behind for[S] - Fill the suffix comment
[S]. We are done!
Notice that
- Before
Swap()all actions requireOutputto be left behind, and after that all actions requireCodeto be left behind. - The core idea before
Swap()is to buildSwap()'s input requirement, i.e lackingR4onOutputsomehow, and get rid of the interference of the initial prefix comment as soon as possible. - The core idea after
Swap()is to build the moment requiringLs+1onCode. Only then can[S]be printed.