PROBLEM LINK:Author: Sergey Nagin DIFFICULTY:MEDIUM PREREQUISITES:Modular arithmetic, bitwise operations PROBLEM:Two pseudorandom generators of binary sequences are described (LCG and Xorshift), where each generator is initialized with a $32$bit unsigned integer. Now, for each input string, you have to determine which generator generated it. The details and code implementation of the two generators can be read from the problem statement. QUICK EXPLANATION:In the LCG generator, only the $17$ least significant bits of the seed matter: it can be shown that the higher order bits won't affect the generated binary sequence. Thus, there are at most $2^{17} = 131072$ distinct binary sequences. Furthermore, the first $33$ bits of all these $2^{17}$ sequences are distinct. Therefore, one can simply test the first $33$ bits of the input string against each of these, and find a candidate seed that could generate the input string. If there is none, or if the seed doesn't actually generate the whole input string, output "Xorshift". Otherwise, output "LCG". EXPLANATION:In the first subtask, there are only $501$ seeds, and the length of the input string is at most $500$. Therefore, one can simply generate all $501$ sequences by the two generators. Afterwards, for each input string, we simply compare it against each of these sequences, and output the correct generator accordingly. One can roughly halve the running time by only comparing the input string to sequences from just one of the generators, say LCG. This is because there is guaranteed to be a unique answer, so if LCG doesn't generate it, then surely Xorshift generates it, so there's no need to check. For the second subtask, there are now $31314$ seeds, so it doesn't seem feasible to generate all the sequences up front. However, there is actually no need to generate the sequences up front. One can simply generate each sequence on the fly, as it is being compared to the input string. Now, this might seem slow, because $N$ can be up to $100000$. However, keep in mind that these are real pseudorandom generators that are actually being used in practice. Therefore, they produce reasonably good randomlike sequences, so there is a really really low probability that two random seeds will generate a common prefix of even $100$ bits! Therefore, you can expect to break out of the checking loop early when comparing the input string against the incorrect sequence, and this algorithm should pass the second subtask. However, for the last two subtasks, the seeds can now be up to $10^9$. This means that any technique involving comparing bit sequences will fail. This means that we cannot consider any more that the given generators are "black boxes" which generate an undecodable random bit sequence for every seed, so we need to somehow study the internals of these generators. LCGA linear congruential generator (LCG) has three parameters $(a,b,m)$, and it generates a sequence of integers such that every member $x_i$ can be computed from the previous element $x_{i1}$ as the following: $$x_i = (ax_{i1} + b) \bmod m$$ The sequence is kickstarted by a seed, which is the initial value $x_0$. The LCG given in the problem has the following parameters: $$(a,b,m) = (1103515245,12345,2^{32})$$ The modulus $2^{32}$ was selected because 32bit arithmetic are implicitly done modulo $2^{32}$, so this speeds up the computation a bit. However, the sequence $(x_i)_{i\ge 0}$ generates integers from $0$ to $2^{32}1$, so how does the LCG generate the binary sequence from it? To answer that, we need to look at the
and
Notice that $65536 = 2^{16}$ and $32768 = 2^{15}$, and But remember that we are working modulo $2^{32}$, which means that the $18$th LSB and above will not affect the lowerorder bits during addition and multiplication (this can be easily shown and is left for the reader to show). Therefore, we can safely ignore all but the $17$ LSBs of the seed, because the others will not affect the generated binary sequence in any way. But then, there are only $2^{17}$ possibilities for the last $17$ bits! Therefore, we have just shown that there are $2^{17}$ distinct binary sequences generated by the LCG. $2^{17}$ is just $131072$, so we can now easily compare the input string against each of these. By the same argument above, we can expect that we will break out of the checking loop early if we are comparing with the incorrect sequence, so this should be fast enough for the remaining subtasks! Some optimizationsWe will discuss a few optimizations to the above approach. First, we are assuming that we will break out of the checking loop early, but how early do we break in the worst case? By playing around with the $2^{17}$ sequences, one can deduce that the prefixes of these sequences become unique after just $33$ bits (for $32$ bits, two pairs of seeds generate the same $32$bit prefix: $(64042, 80441)$ and $(14905, 129578)$)! This means that we now guarantee that in the above approaches we will really break out of the checking loop early. In fact, we can do even better: once we find a sequence that matches the first $33$ bits of the input string, we don't need to check the remaining sequences any more, because we know that this is the only sequence with that $33$bit prefix! Therefore, once we find this candidate sequence, we can just compare the input string against it, and stop. On the other hand, if none of the sequences match the first $33$ bits of the input string, then we know that LCG won't generate it at all, so we immediately output "Xorshift". A further optimization is to precompute the first $33$ bits of all $2^{17}$ LCG sequences, and then compress each into a $64$bit integer. Then we do the same for the input string, and instead of comparing $33$ bits, we can simply do one comparison between $64$bit integers. This should reduce the running time by a bit! Finally, another possible optimization is to use a map whose keys are the first $33$ bits, and the values are the seeds of the sequence. This way, one can compute the candidate LCG sequence in $O(1)$ time by a single lookup, so each input string can now be answered without checking its first $33$ bits against all the others! Time Complexity:$O(N)$, with a precomputation that takes $33\cdot 2^{17}$ steps. AUTHOR'S AND TESTER'S SOLUTIONS:
This question is marked "community wiki".
asked 16 Mar '15, 01:46

Is there a simple solution for a case when higher bit is used for binary sequence generation (e.g. 30 instead of 16)? I missed the 17 bits observation during contest and wrote nasty brute force for all 2^32 possible seeds :/ answered 16 Mar '15, 17:54

another hacky way of solving this question would be to observe that there is a 50 chars input test, which clearly states that 50 chars are sufficiently enough to decide which generator it belongs to... as with all LCG's(by reading some wikipedia) one can expect that there is a pattern to this one since we are always retrieving the 17th bit. so precompute the sequence of large length(say 200000, since n <= 200000) and check if the first 50 chars of the testcase are in the precomputed LCG. (this is ok since t <= 30) answered 16 Mar '15, 21:46
