SEAPROAR - Editorial

PROBLEM LINK:

Practice
Contest

Author: Sergey Nagin
Tester: Hiroto Sekido
Editorialist: Kevin Atienza

DIFFICULTY:

MEDIUM

PREREQUISITES:

Modular arithmetic, bitwise operations

PROBLEM:

Two pseudorandom generators of binary sequences are described (LCG and Xorshift), where each generator is initialized with a 32-bit unsigned integer. Now, for each input string, you have to determine which generator generated it.

The details and code implementation of the two generators can be read from the problem statement.

QUICK EXPLANATION:

In the LCG generator, only the 17 least significant bits of the seed matter: it can be shown that the higher order bits won’t affect the generated binary sequence. Thus, there are at most 2^{17} = 131072 distinct binary sequences.

Furthermore, the first 33 bits of all these 2^{17} sequences are distinct. Therefore, one can simply test the first 33 bits of the input string against each of these, and find a candidate seed that could generate the input string. If there is none, or if the seed doesn’t actually generate the whole input string, output “Xorshift”. Otherwise, output “LCG”.

EXPLANATION:

In the first subtask, there are only 501 seeds, and the length of the input string is at most 500. Therefore, one can simply generate, for each of the 501 seeds, the sequences produced by both generators. Afterwards, each input string is compared against these sequences, and the correct generator is output accordingly.

One can roughly halve the running time by only comparing the input string to sequences from just one of the generators, say LCG. This is because there is guaranteed to be a unique answer, so if LCG doesn’t generate it, then surely Xorshift generates it, so there’s no need to check.

For the second subtask, there are now 31314 seeds, so generating all the sequences up front is no longer feasible. However, there is no need to: one can generate each sequence on the fly while comparing it to the input string. This might seem slow, because N can be up to 100000, but keep in mind that these are real pseudorandom generators that are actually used in practice. They produce reasonably good random-like sequences, so the probability that two distinct seeds generate a common prefix of even 100 bits is vanishingly small. Therefore, you can expect to break out of the checking loop early when comparing the input string against an incorrect sequence, and this algorithm should pass the second subtask.
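The on-the-fly comparison with early break can be sketched as follows (a Python sketch, not the author's code; only the LCG side is shown, since the editorial relies on the fact that a string not generated by any LCG seed must come from Xorshift):

```python
# Parameters of the problem's LCG
A, B, MOD = 1103515245, 12345, 1 << 32

def lcg_matches(seed, s):
    """Return True iff the LCG seeded with `seed` generates the bit string s."""
    x = seed
    for ch in s:
        x = (A * x + B) % MOD
        if (x >> 16) & 1 != int(ch):   # output bit = 17th LSB of the state
            return False               # mismatch: break out early
    return True

def classify(s, max_seed):
    # Exactly one generator produces s, so a failed LCG search means Xorshift.
    if any(lcg_matches(seed, s) for seed in range(max_seed + 1)):
        return "LCG"
    return "Xorshift"
```

In practice almost every call to `lcg_matches` with a wrong seed returns within a handful of iterations, which is what makes this brute force viable.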

However, for the last two subtasks, the seeds can be up to 10^9, so any technique that compares the input against the sequence of every seed is too slow. We can no longer treat the given generators as “black boxes” that emit an opaque random bit sequence for every seed; we need to study their internals.

LCG

A linear congruential generator (LCG) has three parameters (a,b,m), and it generates a sequence of integers in which every member x_i can be computed from the previous element x_{i-1} as follows:

x_i = (ax_{i-1} + b) \bmod m

The sequence is kickstarted by a seed, which is the initial value x_0. The LCG given in the problem has the following parameters:

(a,b,m) = (1103515245,12345,2^{32})

The modulus 2^{32} was chosen because unsigned 32-bit arithmetic is implicitly performed modulo 2^{32}, which speeds up the computation a bit.
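As a quick illustration, here is one LCG step written out in Python (an illustrative choice; in C the reduction is free because unsigned 32-bit arithmetic wraps, while Python's unbounded integers need it done explicitly):

```python
A, B = 1103515245, 12345
MASK = (1 << 32) - 1   # x & MASK == x % 2**32 for non-negative x

def lcg_next(x):
    """One LCG step: (A*x + B) mod 2**32, the wrap a C unsigned int gives for free."""
    return (A * x + B) & MASK
```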

However, the sequence (x_i)_{i\ge 0} generates integers from 0 to 2^{32}-1, so how does the LCG generate the binary sequence from it? To answer that, we need to look at the nextInteger1 and generator1 functions. The relevant lines are the following:

return (X / 65536) % 32768;

and

A[i] = nextInteger1() % 2; 

Notice that 65536 = 2^{16} and 32768 = 2^{15}, and / means integer division, so the first line simply means “discard the 16 least significant bits”. However, the second line shows that we only need the least significant bit (LSB) of this result. Therefore, combining the two, we see that the binary sequence generated is simply the sequence of the 17th LSB of the sequence (x_i)_{i\ge 1}.
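A small sanity check (in Python, as an illustration) that the problem's two-step extraction is exactly the 17th least significant bit:

```python
def output_bit(X):
    """What the problem's code computes: ((X / 65536) % 32768) % 2."""
    return ((X // 65536) % 32768) % 2

def bit17(X):
    """The 17th least significant bit of X."""
    return (X >> 16) & 1
```

The `% 32768` keeps bits 16 through 30 of X, but the final `% 2` discards all of them except bit 16, so the two functions always agree.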

But remember that we are working modulo 2^{32}, which means that the 18th LSB and above will never affect the lower-order bits during addition and multiplication (carries only propagate toward higher bits, so this is easy to show). Therefore, we can safely ignore all but the 17 LSBs of the seed, because the other bits do not affect the generated binary sequence in any way. But then there are only 2^{17} possibilities for those 17 bits! We have just shown that the LCG generates at most 2^{17} = 131072 distinct binary sequences, so we can now compare the input string against each of these. By the same argument as above, we can expect to break out of the checking loop early when comparing against an incorrect sequence, so this should be fast enough for the remaining subtasks!
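The key claim — only the 17 low bits of the seed matter — can be checked empirically (a Python sketch; adding any multiple of 2^{17} to the seed must leave the output bits unchanged):

```python
A, B, MOD = 1103515245, 12345, 1 << 32

def lcg_seq(seed, n):
    """First n output bits (the 17th LSB of each successive state)."""
    x, bits = seed, []
    for _ in range(n):
        x = (A * x + B) % MOD
        bits.append((x >> 16) & 1)
    return bits
```

Since A is odd, two seeds differing by a multiple of 2^{17} produce states that differ by a multiple of 2^{17} forever, so bits 0 through 16 of every state, and hence every output bit, coincide.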

Some optimizations

We will discuss a few optimizations to the above approach.

First, we assumed that we break out of the checking loop early, but how early is that in the worst case? By experimenting with the 2^{17} sequences, one finds that their prefixes become pairwise distinct after just 33 bits (at 32 bits, exactly two pairs of seeds still share a prefix: (64042, 80441) and (14905, 129578)). This guarantees that, in the approaches above, a comparison against a wrong sequence fails within the first 33 bits.
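This uniqueness claim can be verified by brute force over all effective seeds (a Python sketch; it packs each prefix into an integer and counts distinct values):

```python
A, B, MOD = 1103515245, 12345, 1 << 32

def prefix(seed, n):
    """First n output bits of the LCG, packed into one integer."""
    x, p = seed, 0
    for _ in range(n):
        x = (A * x + B) % MOD
        p = (p << 1) | ((x >> 16) & 1)
    return p

# One 33-bit prefix per effective (17-bit) seed; the 32-bit prefixes are
# obtained by dropping the last generated bit (p >> 1).
prefixes = [prefix(seed, 33) for seed in range(1 << 17)]
```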

In fact, we can do even better: once we find a sequence that matches the first 33 bits of the input string, we don’t need to check the remaining sequences any more, because we know that this is the only sequence with that 33-bit prefix! Therefore, once we find this candidate sequence, we can just compare the input string against it, and stop. On the other hand, if none of the sequences match the first 33 bits of the input string, then we know that LCG won’t generate it at all, so we immediately output “Xorshift”.

A further optimization is to precompute the first 33 bits of all 2^{17} LCG sequences, and then compress each into a 64-bit integer. Then we do the same for the input string, and instead of comparing 33 bits, we can simply do one comparison between 64-bit integers. This should reduce the running time by a bit!

Finally, another possible optimization is to use a map whose keys are the 33-bit prefixes and whose values are the corresponding seeds. This way, the candidate LCG seed can be found in O(1) time by a single lookup, so each input string can be answered without checking its prefix against all 2^{17} candidates!
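Putting the pieces together, the full solution can be sketched like this (in Python; it assumes, as the editorial's check requires, that the input string is at least 33 bits long):

```python
A, B, MOD = 1103515245, 12345, 1 << 32

def lcg_bits(seed, n):
    """First n output bits of the LCG for this seed."""
    x, bits = seed, []
    for _ in range(n):
        x = (A * x + B) % MOD
        bits.append((x >> 16) & 1)
    return bits

def pack(bits):
    """Pack a list of bits into a single integer key."""
    p = 0
    for b in bits:
        p = (p << 1) | b
    return p

# Precomputation: unique 33-bit prefix -> seed, over every effective 17-bit seed.
TABLE = {pack(lcg_bits(seed, 33)): seed for seed in range(1 << 17)}

def classify(s):
    """s: a '0'/'1' string, assumed to be at least 33 characters long."""
    bits = [int(c) for c in s]
    seed = TABLE.get(pack(bits[:33]))
    if seed is not None and lcg_bits(seed, len(bits)) == bits:
        return "LCG"
    return "Xorshift"
```

Each query now costs one dictionary lookup plus a single O(N) verification pass against the unique candidate seed.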

Time Complexity:

O(N), with a precomputation that takes 33\cdot 2^{17} steps.

AUTHOR’S AND TESTER’S SOLUTIONS:

setter
tester

Is there a simple solution for the case when a higher bit is used for binary sequence generation (e.g. bit 30 instead of 16)? I missed the 17-bits observation during the contest and wrote a nasty brute force over all 2^{32} possible seeds :confused:

Another hacky way of solving this problem is to observe that there is a 50-character input test, which implies that 50 characters are sufficient to decide which generator a string belongs to…

As with all LCGs (from reading some Wikipedia), one can expect a pattern here, since we always retrieve the 17th bit. So precompute a sequence of large length (say 200000, since n <= 200000) and check whether the first 50 characters of the test case occur in the precomputed LCG sequence. (This is OK since t <= 30.)