HOB - Editorial

editorial
hob
jan13
medium
pigeonhole

#1

PROBLEM LINK:

Practice
Contest

Author: Vinayak Garg
Tester: Anton Lunyov
Editorialist: Anton Lunyov

DIFFICULTY:

MEDIUM

PREREQUISITES:

Pigeonhole principle

PROBLEM:

It is best described in the problem statement. See the next section for details of how to get rid of some cumbersome stuff in the problem statement and reveal the actual non-trivial problem we have here.

EXPLANATION:

  1. Time stamp. To restore the actual time stamp of the visit we should have variable day that equals to the day of the visit assuming that days are numbered from zero. Also we should save the previous visit day minute. Then for the current visit with parameters H and M we calculate its day minute as 60 * H + M and compare it with the previous visit day minute. If the current visit day minute is strictly smaller, then the current visit happens next day and we increase day by 1, otherwise the day does not change. Now the time stamp of the current visit is simply 24 * 60 * day + 60 * H + M which we denote as cur.

  2. Total inconvenience. To model inconvenience caused to all previous guests we can declare variable inconv that is initially set to zero and after analyzing each guest it is increased by the inconvenience caused to this guest. To avoid integer overflow you should apply mod R after each increasing.

  3. RNG value. The simplest way to calculate RNG value is to use Horner algorithm performing all operations modulo R and then add inconv in the end (also modulo R). See tester’s solution and function hash() there as a reference.

  4. Freeing time. To process each guest we need to know for each room when it will be free. Hence denote by time[r] the time when the room r will be free. So initially time[r] = 0 for all r from 0 to R-1, inclusive. The room r should be considered as free for the current visit if and only if time[r] ≤ cur.

  5. Waiting guest. If for the current guest we did not find the room, then waiting time for him is minimum of time[r] − cur over all rooms r we are visited during the lift boy’s scheme.

  6. Settled guest. If we find the free room, say room r, (so we have time[r] ≤ cur), then we update time[r] to be cur + 60 * G, since the guest will occupy this room for G hours (60 * G minutes) starting from the time of his visit, which is cur.

  7. Lift boy’s scheme. So now the only non-trivial part is to perform the lift boy’s scheme. Stating it using modulo operation we get the following problem. Let A[k] = k for k = 0, 1, 2, and A[k] = A[k-1] + A[k-2] + A[k-3] for k = 3, 4, 5, …. Set sumA[k] = A[0] + A[1] + … + A[k] for k = 0, 1, 2, 3, … (we also set sumA[-1] = 0 for convenience). Then the infinite sequence of rooms we should process in the lift boy’s scheme is r[k] = (r0 + sumA[k]) mod R, where k = 0, 1, 2, 3, … and r0 is the initial room assigned to the guest.

  8. Modeling is too slow. It is important to realize that without some clever preprocess we can’t perform the lift boy’s scheme fast enough. Namely, when R = 483, the guest was assigned to the room 0 and the only free room in the hotel is 481 then we need to perform 13333 steps by the lift boy’s scheme in order to find this room for the guest. Probably Chef will fire the poor lift boy after he will notice this :slight_smile:

  9. Avoid repetitions. So we should do some analysis. Clearly we could meet some rooms several times during the lift boy’s scheme. So actually the only essential thing, we should know for each room r, is the smallest index k such that r[k] = r. But some rooms could be not present in this sequence (the smallest R when this happens is R = 8). So for each room r we should know the value firstVisit[r], which is either -1 if the room does not belong to the sequence {r[k]} or the smallest index k for which r[k] = r. Having this information for each room we now can easily process each guest in time. Namely, let’s iterate over all different rooms that belongs to the sequence {r[k]} in order they were visited in the lift boy’s scheme. For each room we compare time[r] and cur and do analysis and actions as in paragraphs 5 and 6, where inconvenience caused to the settled guest coincide with firstVisit[r] if it was settled to the room r. See how this step is implemented in tester’s solution to clear your doubts.

  10. Only r0 = 0 is essential. Indeed, if (ki, ri), i = 1, 2, …, L, is the sequence of rooms that belongs to the sequence r[k] with r0 = 0, paired with their corresponding indexes k, then (ki, (r0 + ri) mod R), i = 1, 2, …, L, is the same sequence but for arbitrary r0. So we are in position to find all different values that belong to the sequence W = {sumA[k] mod R : k = 0, 1, 2, 3, …}.

  11. Periodicity. The key observation on the way to the correct (I mean proved ;)) solution is to notice that the sequence W is periodic, that is, there exists such n > 0 that sumA[k+n] mod R = sumA[k] mod R for all k ≥ 0. The number n is called the period. Below we will prove that such n exists for each R and does not exceeds R4. But probably due to some randomness in the behavior of the function f() constructed in the proof, birthday paradox takes place and the period grows like O(R2) (at least for values of R up to 500). Namely, we have only 8 values of R up to 500 for which this period exceed 200,000:
    R = 345,   n = 222859,
    R = 411,   n = 245791,
    R = 426,   n = 265876,
    R = 445,   n = 248341,
    R = 449,   n = 202051,
    R = 467,   n = 218557,
    R = 483,   n = 345072,
    R = 497,   n = 245424,

    with maximal value of n / R2 equals to 1.87 for R = 345 and maximal value of n equals to 345,072 for R = 483.
    Now it is clear that we can consider only first n steps in the lift boy’s scheme in order to meet all rooms that belong to the sequence W.

  12. The final step. As we will see in the proof the value of n is period if we have
    sumA[n-1] mod R = 0, A[n] mod R = 0, A[n+1] mod R = 1 mod R, A[n+2] mod R = 2 mod R
    (it is important to apply mod R to 1 and 2 to handle corner cases R = 1 and R = 2 properly).
    Hence for the given R we loop over k calculating A[k] and sumA[k] modulo R by their definition and check this periodicity condition in order to know when to break the loop (as we know that we will not meet any new room after this). Also for r = sumA[k] mod R we check whether it was visited before and if no then assign firstVisit[r] = k and add pair (k, r) to the list of ordered pairs which we need later to analyze each guest. See tester’s solution as a direct implementation of this approach. In author’s solution only array firstVisit[r] is calculated which is called probes[r] there and then for each guest we iterate over all rooms in order 0, 1, 2, …, R-1 and seek for the free room with the minimal probes[r] value.

  13. Lazy calculations. Calculation of the list (ki, ri) for each test from scratch is too slow since we could have test having T = 1000 and R = 483 for each test. Hence after calculation of this sequence for some R we should save it and next time we meet this value R simply take the saved value. Refer to the tester’s solution. Now we have a complete and fast enough solution for this problem.

PROOF OF THE PERIODICITY:

Here we prove that the sequence {sumA[k] mod R} is periodic and period does not exceed R4.
The proof will be based on the Pigeonhole principle.

Consider quadruples Q[k] = {sumA[k-1] mod R, A[k] mod R, A[k+1] mod R, A[k+2] mod R}.
We will prove that the sequence {Q[k]} is periodic. Then clearly sequence W also will be periodic with the same period. Note also that condition Q[n] = Q[0] is equivalent to the condition of periodicity used in the paragraph 12 above.

Since
sumA[k] = sumA[k-1] + A[k]
and
A[k+3] = A[k] + A[k+1] + A[k+2],
then
Q[k+1] = f(Q[k]),
where
f({a, b, c ,d}) = {(a + b) mod R, c, d, (b + c + d) mod R}.

Note that each Q[k] is a quadruple of the form {a, b, c, d} where 0 ≤ a, b, c, d < R. There are R4 such quadruples.
Hence by pigeonhole principle among quadruples Q[0], Q[1], …, Q[R4] there exist at least two equal ones.
Namely, let Q = Q[j]*, where 0 ≤ i < j ≤ R4.

Next note that
Q[k] = g(Q[k+1]),
where
g({a, b, c ,d}) = {(a − (d − c − b)) mod R, (d − c − b) mod R, b, c},
which follows from the equations
A[k] = A[k+3] − A[k+2] − A[k+1]
and
sumA[k-1] = sumA[k] − A[k] = sumA[k] − (A[k+3] − A[k+2] − A[k+1]).

Hence if i > 0 then Q = Q[j]* implies Q[i-1] = g(Q) = g(Q[j]) = Q[j-1]*.
Continue this we obtain that Q[i-k] = Q[j-k] for 0 ≤ k ≤ i
and hence setting n = j − i and putting k = i in this equation we get Q[n] = Q[0].

But then Q[n+1] = f(Q[n]) = f(Q[0]) = Q[1], Q[n+2] = f(Q[n+1]) = f(Q[1]) = Q[2], and so on.
So we get that Q[n+k] = Q[k] for all k ≥ 0.
Since 0 ≤ i < j ≤ R4 then n = j − i ≤ R4 and we are done.

ALTERNATIVE SOLUTIONS:

Instead of iterating over the whole period of the sequence sumA[k] mod R we could cheat and break from the loop earlier. The above example in the paragraph 8 with R = 483 actually contains the largest possible value of firstVisit[r] over all values of R and r. But the only fair way to notice this is, of course, to use periodicity reasoning. Now we could simply choose some constant larger than this value (for example 14000) and break the loop for R after 14000 iterations. This cheat allows to calculate array firstVisit[] every time without saving previously calculated results. See author’s solution as a reference to this cheat.

Another cheat that probably could allow to analyze values of R up to 5000 or even 10000 is to store in the program code all pairs (k, r) with say k > 10000 over all R. This should be done using some exhaustive search over all R that could take several minutes (or even hours for large limit on R) using periodicity reasoning. Refer to the ACRush solution where he uses 2000 as the lower bound on k.

ALTERNATIVE PROOF OF THE PERIODICITY:

We can use the following formula
sumA[k] = sumA[k-1] + sumA[k-2] + sumA[k-3] + 2
to prove that period does not exceed R3.
This formula could be easily proved by induction.

Then in the proof above we should consider the triples
P[k] = {sumA[k] mod R, sumA[k+1] mod R, sumA[k+2] mod R}
and use formulas
P[k+1] = f(P[k]), where f({a, b, c}) = {b, c, (a + b + c + 2) mod R},
and
P[k] = g(P[k+1]), where g({a, b, c}) = {(c − b − a − 2) mod R, a, b}.

AUTHOR’S AND TESTER’S SOLUTIONS:

Author’s solution can be found here.
Tester’s solution can be found here.

RELATED PROBLEMS:

Will be provided soon. All readers are welcomed to provide related problems in comments.


#2

I think the upper bound on period of Q(k) = {sumA[k] mod R} can be proved to be R^3 by using this recurrence: Q(k) = Q(k-1) + Q(k-2) + Q(k-3) + 2. Only 3 past states in this recurrence and a constant in addition, so there exists only R^3 possible residue triplets. Correct me if i am wrong.


#3

As I saw the Author and Tester’s solution, I found that solutions of complexity Nr (per testcase) should be accepted.
Please have a look on my solution which also have complexity N
(r+log®) (which is nearly equal to N*r) per testcase.
But It was always TLE :frowning:


#4

“Total inconvenience. To model inconvenience caused to all previous guests we can declare variable inconv that is initially set to zero and after analyzing each guest it is increased by the inconvenience caused to this guest. To avoid integer overflow you should apply mod R after each increasing.”. Where in the problem statement it is written that we need to take mod R of inconvenience ? Am i missing something ?


#5

This Solution of mine also gives a TLE however i look to be following the same approach can anyone please tell where can i save mor time and optimize the code http://www.codechef.com/viewsolution/1729405


#6

Yes, you are right.
I’ve noticed during the contest that several contestants use this formula.
Maybe I will add corresponding explanation later.


#7

you are excessively using % operator. remove that, especially while querying.


#8

The time limit is a bit strict so not any O(N * R) solution could pass. You don’t need long long type at all in the solution. You could do all calculations modulo R. Long long may slow down your program 2-3 times and no allow to pass the TL.

BTW, problem setter wanted to set TL as 1 second having solution with running time near 0.6-0.7. I’ve hardly persuaded him to set TL at least as 1.5 seconds :slight_smile:


#9

@anton: my solution is O(N * R) and it passed only when i removed the % from the calculation of next room to look for. basically he is using % operator O(N * R) times which is slowing down his solution and yes long long can be easily avoided. BTW, can you tell me the number of test files in this problem and how much time my AC code takes on the worst test file? :slight_smile:


#10

@sid_gup 21 test files. The max time is 1.27.


#11

You should figure it out by yourself. Note that RNG value is the sum of the hash value of the name and the total inconvenience and this sum is taken modulo R then to get the room. Hence it makes sense to store total inconvenience also mod R.

Actually if you will not use this the total inconvenience could overflow 32-bit integer so you should use 64-bit integer type instead. IMHO it is easier to realize that we should store inconvenience mod R than in 64-bit integer type :slight_smile:


#12

Thanks for clarifying it:) This point will come handy in future!!!


#13

You have (rng+i)%R 4 times in the main loop over rooms.
Modulo operation is very slow.

At first I simply store this value at some variable:
int rr=(rng+i)%R,
and use everywhere in the loop which reduces number of modulo operations from 4 to 1. But still get TLE:
http://www.codechef.com/viewsolution/1745003

Then I replace this by:
int rr = rng+i;
if(rr>=R) rr-=R;
to get rid of modulo operation completely and get AC:
http://www.codechef.com/viewsolution/1745019
With maximal time per test equal to 0.98 (out of 1.5)

Conclusion is: always try to avoid modulo operation.


#14

Thanks for telling me . I had been trying to figure the problem since the challenge . :slight_smile: