×

# CHEFPAR- Editorial

Setter- Misha Chorniy
Tester- Misha Chorniy
Editorialist- Abhishek Pandey

CHALLENGE

# PRE-REQUISITES:

Varying. Challenge problem is usually an application of your skills in competitive programming in general.

# PROBLEM:

Given an array $A[]$ of randomly generated sequences, we have to add some integer $D_i$ (need not be distinct) to each array element $A_i$, where $0\le D_i\le K$. Our goal is to maximize $\frac{1}{M}\sum_{i=1}^{M} B_i$ where $B_i=(A_1*A_2...*A_N)\%P_i$

# QUICK ANALYSIS:

It seemed that contestants faced problems in getting a good solution. We're concluding that, because, Some of the trivial solutions got too high points than what they should have got. Majority of the contestants simply printed back the input, or used $input+rand()\%K$ &etc.

# ANALYSIS:

The first thing I want to say is that, this editorial is incomplete. And it will remain so, until you guys dont contribute! Its impossible to have any hard and fast approach for the challenge problems, and for the editorial to be at its full potential, I want to request you guys to discuss your approach. Benefit in challenge problem is better gained by discussion, seeing flaws of your approach and analyzing other's strengths- and seeing what worked well, and to what degree. Hence, I would request the community to put forward their approach and intuition behind the question :).

As for editorial, I will try to discuss some of the approaches I saw, in both div1 and div2. I hope you people like it :)

1. Div2-

Not even 10 minutes passed from start of contest on the historical date of $6th$ $April,2018$ when Div2 had got the first few accepted solutions. I had a guess in mind, and I,curious as a doomed cat, decided to see what they did and confirm my intuition. And I dont know if it was fate, or destiny, or perhaps something else, but what I saw was an exact picture of what I had in my mind...

cin>> arr[i]; //Take array input. cout<< arr[i];//Print it back.

This approach got something $\approx 85-88$ points. It was $88.7$ when I checked last on $14th$.
Further solutions were also on similar lines. Eg-

cin>>arr[i]; cout<< arr[i]+rand()%k;

This one got $85.8$ points then. Sad luck for that guy :/

cin>>arr[i]; cout<< arr[i]+k;

This solution performed better, and got around$88-89$ points on average.

Some of the better solutions at div2 which got $\ge90$ involved-

1. Choose one of the primes. Lets call it $P$ Either largest, smallest, middle one or randomly any.
2. Make array ideal according to that prime, i.e. add $D_i$ so that $(A_1*A_2..*A_N)\%P$ is maximum.
3. Pray that this solution gets better points.

By roughly around 20-25 submissions, people experimented with what prime to take. Most of them settled on the median prime.

A good number of approaches used simulation and storing array and its result. Eg-

cin>>arr[i]; Store arr[i]+rand()%k;//Store in a vector etc. Compute score for the just stored array. Repeat above 250-400 times. Print the configuration with maximum score

Depending on luck and number of simulation, the above approach fetched $88-94.7$ points. I saw quite a few with $94.7$ points.

Some of the top approaches also use the concept of simulating till just about to time-out. The contestants chosed a distribution (random, or some other) which they simulated for $\approx3.8-3.95$ seconds where they sought to see which choice of $D_i$ is increasing score for a particular $A_i$. When about to time out, they aborted the process and printed the output they got.

2. Div1

The performance of Div1 was kind of similar to Div2 xD. One of the codes which got $91.1$ points at pretest was-

cin>>A[i]; cout<< A[i]+K/2;

Most of the approaches were common- like simulation till best answer, or take $rand()$ values $300-400$ times. Omitting the common approaches, the approaches of top solutions were quite distinct. (However, most of them are hard to decipher due to 300+ lines of code).

Some crucial optimizations were, however, seen. For example, lets say I got some values of $D_1,D_2...D_N$ and calculated the value of $Score=(A_1+D_1)*(A_2+D_2)*...*(A_N+D_N)\%P_i$. The top solutions preferred to change $D_i$ one by one, and re-calculate $Score$ in $O(Log(A_i+D_i))$ by using inverses as - $NewScore=({A_i+D_{old}})^{-1}*Score*(A_i+D_{new})\%P_i$. This method got $95+$ points on pretest.

Some of the good top codes deserve a mention here. These codes are what one can call crisp :)

1. Python Code by @gorre_morre
2. C++ by @anoxx_23
3. C++ by @mihaibunget

As usual, I want to invite everybody (yes, everybody, not just the top scorers) to discuss their approaches so that we can have an engaging and learning insights into various intuitions :)

# CHEF VIJJU'S CORNER:

1. The very first solution submitted by tester to test this problem was also simply printing back the input xD. After that as many as $16$ more submissions were made.

2.Lets discuss the setter's approach here. Lets take a random subset (serveral times) of array $P[]$ and multiply the primes in this subset. Lets call their multiple $MUL$. Now, we now that $MUL\%P_i=0$ where $P_i$ is a prime in the subset.Now, our aim is to get the value of $(A_1+D_1)*(A_2+D_2)....*(A_N+D_N)$ closer to (preferably exactly equal to) $MUL-x$. This is because, by applying $\%$ operation for various $P_i$, we will be left with $MUL\%P_i-x\%P_i=(-x)\%P_i$. We can go greedily and try to factorize $MUL-1,MUL-2,...,MUL-x$ (The exact value of $x$ can be experimented upon). The factorization will help us in manipulation of $D_i$ values.

One thing to check is that, $(A_1+K)*(A_2+K)..*(A_N+K)$ must have a value more than $MUL$ else the above scenario may not be possible. We can check this by using $log$ function, i.e. by changing expression $(A_1+K)*(A_2+K)..*(A_N+K)$
to-
$Log(A_1+K)+Log(A_2+K)+...+Log(A_N+K)$.

Similarly, $MUL$ can be expressed as $(Log(P_1) +Log(P_2)+....+Log(P_N))$

You can find good practice problem here

3.There is also a better random generator in C++ known as Mersenne Twister or mt19937. It is available from C++11 and later. Some of the advantages it has over rand() are that-

• mt19937 has much longer period than that of rand. This means that it will take its random sequence much longer to repeat itself again.
• It much better statistical behavior.
• Several different random number generator engines can be initiated simultaneously with different seed, compared with the single “global” seed srand() provides
This question is marked "community wiki".

15.5k12066
accept rate: 18%

19.8k350498541

# Problem description:

Given an integer $K$ and integers $A_i,\ldots, A_N$, and primes $P_1,\ldots, P_M$, find $D_1,\ldots,D_N$ satisfying $0\le D_i\le K$, such that $\frac{1}{M}\sum_{j=1}^M B_j$ is as large as possible, where $B_j=\prod_{i=1}^N (A_i+D_i) \mod P_j$.

We are given four different parameter sets. In each case $P_1,\ldots, P_M$ are consecutive primes with $P_1$ between $1.1\times10^9$ and $1.5\times10^9$, and $1\le A_i\le 10^9$ are independently random.

The four parameter types are (1) $N=100$, $M=10000$, $1\le K\le1000000$, (2) $N=1000$, $M=1000$, $1\le K\le1000000$, (3) $N=1000$, $M=1000$, $1\le K\le1000$, (4) $N=10000$, $M=100$, $1\le K\le1000000$.

# First attempt

In general this problem is a bit awkward because we want to fix an additive property of the $B_j$s (like being in the range $0.9P_j$ to $P_j-1$), but if we vary one of the $D_i$, it changes all of the $B_j$ multiplicatively, which more-or-less randomises $B_j$ additively. Changing a single $D_i$ makes a huge change to the product of $(A_1+D_1)\ldots(A_N+D_N)$, and reducing a huge number (approximately $31N$ bits) modulo $P_j$ (approx $31$ bits) is a "chaotic" operation.

So I believe that to first approximation this "machine" that we are given that generates $B_j$ behaves as a random number generator and we are being asked to choose a seed (the initial $A_i+D_i$) to maximise the sum of uniform random numbers. This means that "hill climbing" from one solution to another by small changes in the $D_i$ won't be any better than starting afresh with a new configuration of $D_i$s.

Looking at it like this allows you to write a simple program that would end up in about 5th position in the combined final Div1+Div2 ranklist. Example program here.

This first attempt just generates instances of $B_1,\ldots,B_M$ as fast as possible, and takes the best score it finds. You can do this moderately fast by fixing $D_2,\ldots,D_N$ and making $k=D_1$ range over $0,\ldots,K$. That way $B_j$ is of the form $AA_{0,j}+k.AA_{1,j}$ where $AA_{i,j}=\prod_{i'=i+1}^N (A_{i'}+D_{i'}) \mod P_j$ (and $D_1=0$ in this product). If we invert the order of the inner two loops (over $1\le j\le M$ and $0\le k\le K$) then we don't need to do a multiplication or full modulo reduction. The inner loops look like (replacing $K$ with $K+1$):

val=[0]*K
for j in range(M):
p=P[j]
b=AA[0][j];c=AA[1][j]
for k in range(K):
val[k]+=b
b+=c
if b>=p: b-=p


This inner loop goes at about $160\times10^6$ loop iterations per second under pypy, and about $620\times10^6$ iterations per second under C++ (measured on my desktop, which seems to be similar to CodeChef's servers).

The factor of 5 that CodeChef allows Python over C++ seems to be slightly favour pypy here since the measured speed ratio is about 4, but this pypy speed is a bit fragile and depends on using integers in a simple way: the C++ version admits optimisations that wouldn't work well under pypy, so I think the factor of 5 is reasonably fair here. In particular, oleg_b's C++ program cleverly reorders the modular reduction in a way that avoids unpredictable branches and hops through the k loop in steps of about 4. This manages a hefty $2350\times10^6$ loop iterations per second.

Anyway I prefer writing in Python, so I'll stick with that here.

## Analysis

Let's rescale and pretend that we are adding up $M$ independent uniform real random numbers in the range -1 to 1. The normal approximation works pretty well here, even at the tail. The variance of $U[-1,1]$ is $1/3$ and in the allowed time (20s for Python) we can generate about $166\times10^6\times20/M$ instances, so to work out the largest sum we might manage we take the $M/(20.166\times10^6)$ point of the of tail of the normal distribution:

from scipy.stats import norm
for M in [10000,1000,100]: print M,-norm.ppf(M/(166e6*20))*(M/3.)**.5


which gives (unnormalised) scores of about 261, 91, and 31 for $M$ equal to 10000, 1000, 100. The rules of the contest meant that we had to divide these sums by $M$ to get the actual scores (rather than $\sqrt{M}$ which would equalise the variance), which means that type 4 ($M=100$) is much more important to optimise than type 1 ($M=10000$), because the possible relative changes (after dividing by $M$) are about 10 times greater for type 4.

Notice that 31 in the above scale has the interpretation as the number of "top" values of $B_j$. Scoring a sum of 31 is equivalent (on average) to fixing 31 of the $B_j$s equal to their maximum value of $P_j-1$ and leaving the other $B_j$s as random.

It is relevant that this number, 31, is quite big. If instead of this random brute force we try to do something clever that fixes some of the $B_j$, then it's no use just fixing, say, 15 of them if that's all we do, because that will only score 15 on this scale and so will lose out to the sampling as above. We need to be able to fix some of the $B_j$ in such a way that it is fast to try lots of random variations, and that is harder to do (but possible - see below).

# Second attempt

If you imagine doing the inner loops in the other order, with the loop over primes in the inside, then it's clear the process is wasteful. After you add up some of the $B_j$s it might that the total isn't very high, and so it's unlikely that the total for all of the $M$ $B_j$s is to be exceptionally high, and we only care about exceptionally high totals. In such a situation it would pay to abandon the current instance midway and start a new one.

So the inner two loops should look something like this

for k in range(K):
v=0
for j in range(M):
if v < thr[j]: break
p=P[j];b=AA[0][j];c=AA[1][j]
v+=(b+k*c)%p


where thr[j] are some suitably-chosen thresholds. Unfortunately we more-or-less have to write the loops this way around ($K$ then $M$) to use this method and can't take that much advantage of the fact that $k$ increments by 1, which means we take an efficiency hit before we can realise the gains. But the gains are quite substantial, so we still win out. There is a question of how to measure the effectiveness this method. It's no longer possible just to count inner loop iterations per second, because they are not all equal. (Reaching large $j$ is better than small $j$.) It's also too random to measure how well it does in terms of its final answer - that way it would take hours to collect reliable statistics. One solution is to decide on a benchmark score, some quite rare level, but not too rare, and count how many times this is achieved. For the discussion here we can take the benchmark level to be $B=\frac{1}{2}MP(1+\left(\frac{2\log(10^6/M)}{3M}\right)^\frac{1}{2})$, derived from the normal approximation mentioned above, where $P$ is a typical prime in the sequence $P_1,\ldots,P_M$ (they are all very close).

The optimal threshold curve, thr[j], can be worked out from a standard Gittins Index calculation, but the formula thr[j]=$\frac{1}{2}(j-1-65/K)P+(1-(1-j/M)^\frac{1}{2})(B-\frac{1}{2}MP)$ seems to approximate this reasonably well. For type 4, the first attempt above achieves about 13 benchmarks per second (bm/s), and this second attempt with the above threshold curve achieves about 28 bm/s.

# Continued fraction method

The previous methods effectively improve the constant factors: they still require $\Theta(M)$ time to evaluate a configuration of $(A_i+D_i)$s because on average the product is evaluated modulo some definite proportion of the $M$ primes, say $M/10$ primes on average.

It's possible to reduce this to (amortised time) $O(\sqrt{M(1+M/K)})$. This sounds good, though actually it won't turn out to be enormously useful here because the parameters for this challenge are not so favourable for this method.

Go back to the loop order from the first method ($K$-loop on the inside). We'd like to calculate the $KM$ values $B_j+kC_j \mod P_j$ for $1\le j\le M$ and $0\le k\lt K$ as fast as possible.

For each $j$, pick the best rational approximation to $C_j/P_j$, say $c_j/p_j$, with denominator $p_j$ at most some fixed constant $m$ to be optimised. $c_j, p_j$ can be calculated effectively using a continued fraction. Then $B_j+kC_j\mod P_j$ is nearly periodic in $k$ with period $p_j$. More exactly, if $k_0+k_1.p_j$ ($0\le k_0< p_j$), then $$B_j+kC_j = B_j+(k_0+k_1p_j)C_j = B_j+k_0C_j+k_1(p_jC_j-P_jc_j) = B_j+k_0C_j+k_1\Delta_j \mod P_j,$$ where $\Delta_j=p_jC_j-P_jc_j$.

For each $j$, we would like to store some information that allows us to calculate $B_j+k_0C_j+k_1\Delta_j \mod P_j$ for all $k< K$, but do so in a way that only takes $O(p_j+K/p_j)$ time, rather than $O(K)$ time, and also do so in a way that can be summed over all $j$. This is possible because $\Delta_j/P_j$ is usually small, so the $k_0$ and $k_1$ pieces of $B_j+k_0C_j+k_1\Delta_j \mod P_j$, which interact via the $\mod P_j$, don't interact too much.

Fixing $k_0$ for the moment, the effect of the $\mod P_j$ term is to subtract off multiples of $P_j$ at intervals of $k_1$, and these intervals be large if $\Delta_j$ is small, which it usually is, so we can use a sparse array. Instead of looping over $k_1$, we record the $k_1$s which will require an extra multiple of $P_j$ to be subtracted. When we process this in the next phase, we will have to take cumulative sums to recover the relevant multiple of $P_j$. In symbols, $$(B_j+k_0C_j+k_1\Delta_j)\%P_j = (B_j+k_0C_j)\%P_j+k_1\Delta_j - P_j\sum_{n\ge1}I(k_1\ge (nP_j-(B_j+k_0C_j)\%P_j)/\Delta_j),$$ so we subtract $P_j$ from an adjustment array in positions $(nP_j-(B_j+k_0C_j)\%P_j)/\Delta_j$ for $n=1,2,\ldots$, until this position reaches $K/p_j$. This first phase code could look something like this (a slight variation of the above):

D=p*C-P*c
for k0 in range(p):
vl[p][k0]+=B
dl[p][k0]+=D
if D>0:
x=P-1-B
while 1:
q=(x//D)*p+k0
if q>=K: break
x+=P
elif D<0:
x=B
while 1:
q=(x//-D)*p+k0
if q>=K: break
x+=P
B+=C
if B>=P: B-=P


When this has been done for all $1\le j\le M$, we can recover the sum we want, val[k]=$\sum_{j=1}^M (B_j+kC_j)%P_j$ by adding over the $k_0-based$ arrays and accumulating over the adjustment arrays, like this:

for p in range(1,maxp):
for k0 in range(p):
v=vl[p][k0];D=dl[p][k0]
for k in range(k0,K,p):
val[k]+=v


There are about $K/(P_jp_j/|\Delta_j|)+1$ iterations in the inner loop of the first phase, so about $K|\Delta_j|/P_j+p_j$ steps altogether. Taking $C_j/P_j$ to be a random number in $[0,1)$, the average value of $|\Delta_j|/P_j=p_j|C_j/P_j-c_j/p_j|$ will be around $0.42/m$ and the average value of $p_j$ around $0.47m$, though the actual time constants depend on the implementation. So the first phase might take arout $M(K/m+m)$ or so steps. The second phase takes $K m$ steps, so the total time is around $M(K/m+m)+Km=MK/m+(M+K)m$. The optimal value of $m$ is $\sqrt{MK/(M+K)}$ and overall it takes about $\sqrt{MK(M+K)}$ steps, which is a big saving over $MK/10$ (something like "second attempt" running time) if $M$ and $K$ are both large. This is an example implementation.

The main benefit of this method is for type 1 ($M=10000$) instances, but slightly unfortunately for this method, the scoring system in this contest weights $\sum_j B_j$ by $1/M$ (as opposed to $1/\sqrt{M}$ which would equalise the variances), so there is less possible variation here. Also, this method doesn't work well in conjunction with the "second attempt" threshold-based speedups. Still, there is a large speedup benefit, even if this doesn't translate to a huge score gain: if $K$ is moderately large it manages about 400 bm/s for type 1 compared with about 42 bm/s from the "second attempt" above.

For type 4, it manages about 46 bm/s, which is better than the above, but we can do better as we'll see below.

# Is there a way to "invert the random number generator"?

Is it possible to make some of the $B_j$ equal to $P_j-1$ (the best value) or, for example, force them to be greater than $0.9P_j$ by somehow solving for $D_i$ in a way that doesn't require too big a search?

For example, it is suggested in the above editorial that you choose a set of $N$ (out of $M$) primes, say $P_1,\ldots,P_N$, and a smallish number $x$ and try to factorise $Z=P_1 P_2 \ldots P_N-x$ into $(A_1+D_1)\ldots(A_N+D_N)$. The trouble with this is that it requires (at a minimum) that the number $Z$ doesn't have any factors bigger than $1/N$ times the size of $Z$. The chance of this happening is governed by the Dickman function and is roughly $N^{-N}$. In our examples, $N$ is at least 100, so it's obviously out of the question to search for them one-by-one.

A variant of this would be to try to factorise $\lambda.P_1\ldots P_r-x$ for some $r$ smaller than $N$, and some suitable multiple $\lambda$, but even though such choices of $\lambda$ and $P_1,\ldots,P_r$ statistically should exist, it's still no use trying to factorise such a number head-on.

But there is something that can be done to give an edge over the direct searches mentioned above, at least in the type 4 case.

# The $N$-tree algorithm

We'd like to find a way of fixing some of the $B_j$s at good values (like $P_j-1$) while also having the freedom to vary the $D_i$ choice without affecting the fixed $B_j$. It's not enough to find a single example of $B_1=P_1-1, B_2=P_2-1$ if there were no way of varying the other $B_j$ other than rerunning the process that found the fixed $B_j$. So the aim here is to find a collection of possible solutions that all have favourable $B_j$s for some fixed set of $j$, and then to try each one to see which gives the highest total $B_j$ for the non-fixed $j$s.

Let's stick with type 4 ($M=100$, $N=10000$, $\max(K)=1000000$) because more can be done in this case.

Assume $K\ge1000$, which is usually true, and reduce $K$ to $(100P_1)^{1/4}$ (about 600). (The exact number you should use here depends on how fast the program is going to run.) Fix $D_{17}, D_{18}, \ldots, D_N$ to some random values up to $K$. Let $R=\prod_{i=17}^N(A_i+D_i)$. We are going to use the birthday method. Make a list of $(A_1+D_1)(A_2+D_2)\mod P_1$ for $0\le D_1,D_2\lt K$, and a second list of $-(R(A_3+D_3)(A_4+D_4))^{-1}\mod P_1$ for $0\le D_3,D_4\lt K$. The lists contain $K^2$ elements each, and statistically you expect about $K^4/P_1=100$ of them to agree modulo $P_1$. These agreements can be found in $O(K^2)$ or so time, and each such pair corresponds to the relationship $(A_1+D_1)(A_2+D_2)(A_3+D_3)(A_4+D_4)R=-1\mod P_1$. Do the same for $D_5, D_6, D_7, D_8$ except that instead of $-R$, just use $1$. So we end up with a hundred or so quadruples such that $(A_5+D_5)(A_6+D_6)(A_7+D_7)(A_8+D_8)=1\mod P_1$. Similarly we get can 100 quadruples $D_9, D_{10}, D_{11}, D_{12}$ such that $(A_9+D_9)(A_{10}+D_{10})(A_{11}+D_{11})(A_{12}+D_{12})=1\mod P_1$ and 100 quadruples such that $(A_{13}+D_{13})(A_{14}+D_{14})(A_{15}+D_{15})(A_{16}+D_{16})=1\mod P_1$.

Taken together, we have $100^4=10^8$ combinations of $D_1, \ldots, D_{16}$ such that $\prod_{i=1}^N(A_i+D_i)=-1\mod P_1$, and these can be checked quickly, in the manner of the "second attempt" above, using only $M$ modular multiplications to check each instance (you precalculate the product of each quadruple modulo $P_1$).

This has given us a "free" maximum (meaning one of the $B_j$s is guaranteed to take its maximum value), and so pushes out the distribution, resulting in about 43 bm/s for type 4, compared with the previous best of 28 bm/s. Of course it's not quite free because there is a setup time, but using the above parameters this is only about 5% of the total time, so there is plenty of time to find an extreme point of the remaining distribution.

This method can be pushed further, starting from $4^3=64$ lists instead of $4^2=16$, and using two stages of "birthday-refinement" instead of one, which ends up fixing $B_1=-1\mod P_1$ and $B_2=-1\mod P_2$. Unfortunately, the setup time, though it just about fits within the allotted time limit, doesn't allow enough search time for the second phase, so it's not worth fixing more than one $B_j$ using this method.

This was as far as I got in the competition, and this is my best answer which implements a single fixed $B_j$.

But after the competition ended, I got to thinking that it should be possible to do better... and of course it is. The trouble with the above method is that the lists of allowable $D_i$ need to get quite big to overcome the "cutdown" factor of $P_j$, and if you want to do several stages to fix several $B_j$, the setup costs will be prohibitive.

Fortunately for many $P_j$, $P_j-1$ will split into several factors so the multiplicative group, which is cyclic of order $P_j-1$, will admit a series of subgroups. This potentially allows us to "cutdown" by smaller factors at each stage in the tree-like refinement process, which means we need fewer items in the list of allowable values at each stage, so we have a chance of making the setup phase run in a reasonable time while fixing more $B_j$s.

For example, if we are fixing with respect to $P=1400000023$, then $P-1=2.3.3.41.263.7213$, so one stage might be to fix with respect to the factor 7213. This means we have two lists, $L_1$, $L_2$ of numbers modulo $P_1$ and a target product, $t$. We aim to make a single list of pairs, $(x_1,x_2)$, from $L_1\times L_2$ such that $x_1 x_2 t^{-1}$ is in the index 7213 subgroup of the multiplicative group. We can do this using the birthday method by working in the quotient group: make a list of $(t.x_1^{-1})^r$ and $x_2^r$ where $r=(P_1-1)/7213$. Elements in common from both lists must satisfy $(x_1 x_2)^r = t^r$ which is what we want. Next stage would be to combine pairs of lists where the $r^\mathrm{th}$ powers are constant over each list to make a new list whose $s^\mathrm{th}$ powers are constant, where $s$ divides $r$, e.g., $s=(P_1-1)/(7213.3.263)$. In other words we're fixing things in increasingly big quotient groups, until you end up with the whole group. Eventually at the last stage you end up with a few lists whose values are constant modulo a set of primes.

(It would be simpler to work additively, but it's probably going to be too expensive to take discrete logs modulo each $P_j$ we are fixing with respect to, which we'd need to do it for each $i$ and each choice of $D_i$, so we have to stay in the multiplicative group.)

A good configuration (for the time limits for this problem) is something like: at the $i^\mathrm{th}$ stage ($i=0,\ldots,11)$, you have $2^{13-i}$ lists each of size roughly $40.2^i$, except at $i=11$, the final stage, where there are four lists of size about 150. 150 is chosen so that $150^4\approx5\times10^8$ is a good number of candidates to test in the second phase. (That is simplifying slightly. If $K$ is small, then you need to build up the sizes first before cutting down.)

This is very similar to Wagner's algorithm which uses a tree of lists of binary vectors to choose one element from each top-level list so that the choices XOR to zero. (Though he uses equal size lists at all levels, which is not optimal as it doesn't balance the work.)

This is an example implementation. It tries to use this method for type 4, and for some values of $K$ in types 2 and 3.

The program is a bit fiddly because it has to choose a suitable set of primes to fix with respect to, put the factors in order to form a plan of action in such a way that it won't run out of list elements on the way, and do this robustly on-the-fly.

In the type 4 case it usually manages to fix $B_j$ for four primes. This isn't a lot, but it's enough to give it a big advantage over not doing this, because moving out a bit in the extreme tail of a normal distribution makes a big difference to the probability. In terms of the benchmarks per second mentioned above, the "unfixed" method gave about 28 bm/s, the single-fixed method about 42 bm/s, and this method about 200 bm/s. (It's possible the bm/s measure is slightly underestimating the benefit of this method, since the further out in the tail of the distribution you go, the bigger the improvement.)

2024
accept rate: 25%

1

Very, VERY DECENT!!

I already gave the karma reward, and I think the community will seriously acknowledge this effort of yours!!

(04 May '18, 13:56)
1

I really dont know how do i thank you....,thank you soo much!!!,very few professionals actually share their ideas in such a clear way..!!

(04 May '18, 14:02)
2

Thank you - I am glad you like it!

(06 May '18, 02:12)
1

Wow!! @alexthelemon, thanks for your explanation!

We will try change challenges in two ways: multiple tests for the stabilization of test data in the input now. In JUNE we'll try to use some kind of provisional tests.

(06 May '18, 03:34) 6★
 toggle preview community wiki:
Preview

By Email:

Markdown Basics

• *italic* or _italic_
• **bold** or __bold__
• image?![alt text](/path/img.jpg "title")
• numbered list: 1. Foo 2. Bar
• to add a line break simply add two spaces to where you would like the new line to be.
• basic HTML tags are also supported
• mathemetical formulas in Latex between \$ symbol

Question tags:

×858
×311
×172
×98
×49
×46
×2

question asked: 16 Apr '18, 16:28

question was seen: 1,387 times

last updated: 06 May '18, 03:34