### PROBLEM LINK:

**Author:** Devendra Aggarwal

**Tester:** Kevin Atienza

**Editorialist:** Kevin Atienza

### DIFFICULTY:

Medium

### PREREQUISITES:

sqrt decomposition, offline range sums

### PROBLEM:

You are given an array [A_1,\ldots,A_N], all initially zeros. You need to perform U updates to the array, where each update (a,b) asks you to increment the array at the indices ax+b for all integers x\ge 0 such that ax+b \le N. What is the resulting array?

### QUICK EXPLANATION:

Perform all updates where a > \sqrt{U} manually. Store all other updates (i.e. those with a \le \sqrt{U}).

Let’s fix an a \le \sqrt{U} and perform all updates with this a simultaneously. To do so:

- Compute [C_1,\ldots,C_N], where C_i is the number of updates (a,b) with b = i.
- Compute [D_1,\ldots,D_N] where D_i = C_i + D_{i-a} for i > a and D_i = C_i for i \le a.
- Finally, increment A_i with D_i for all i.
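
The steps above can be sketched in Python as follows. This is a hedged sketch, not the author's code: the function name `apply_updates` and the threshold variable `s` are our own, and updates are assumed to be 1-indexed pairs (a, b).

```python
import math

def apply_updates(n, updates):
    """Return A[1..n] (as a 0-indexed list of length n) after all updates.

    Each update (a, b) increments A[b], A[b+a], A[b+2a], ... up to n.
    Small strides (a <= s) are batched per stride via the D recurrence;
    large strides are applied directly, touching at most n/s cells each.
    """
    s = max(1, math.isqrt(len(updates)))  # threshold ~ sqrt(U)
    A = [0] * (n + 1)                     # A[0] unused (1-indexed)
    small = {}                            # stride a -> count array C

    for a, b in updates:
        if a > s:
            # Large stride: apply directly, at most n/a <= n/s steps.
            for i in range(b, n + 1, a):
                A[i] += 1
        else:
            C = small.setdefault(a, [0] * (n + 1))
            C[b] += 1

    for a, C in small.items():
        # D[i] = C[i] + D[i-a] for i > a, D[i] = C[i] for i <= a;
        # add D into A in a single O(n) pass.
        D = [0] * (n + 1)
        for i in range(1, n + 1):
            D[i] = C[i] + (D[i - a] if i - a >= 1 else 0)
            A[i] += D[i]
    return A[1:]
```

For instance, `apply_updates(11, [(1, 2), (1, 5), (1, 6), (1, 8), (1, 8), (1, 10)])` returns `[0, 1, 1, 1, 2, 3, 3, 5, 5, 6, 6]`.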

### EXPLANATION:

Obviously, brute force will not work: the worst-case cost of a single update is O(N), so the overall worst-case running time becomes O(UN).

The problem is a lot easier though if the updates all have a = 1. In that case, each update (1,b) increments the suffix A_b, A_{b+1}, \ldots, A_N, so we’re only dealing with *range increment updates*, which are famously easy to handle with a *segment tree*. In fact, we don’t need a segment tree at all! A much simpler algorithm exists for the *offline* version of the problem, where it is assumed that all updates are known beforehand:

- Compute [C_1,\ldots,C_N], where C_i is the number of updates (1,b) with b = i. This can be done with a linear scan among all updates (1,b).
- Compute [D_1,\ldots,D_N], where D_i is the amount to increment to A_i. D_i is easily seen to be C_1 + C_2 + C_3 + \cdots + C_i, so D_i can easily be calculated using the recurrence D_i = C_i + D_{i-1} for i \ge 1 with base case D_i = 0 for i < 1.
- Finally, increment A_i with D_i for all i.

For example, suppose N = 11 and we want to process the queries (1,2), (1,5), (1,6), (1,8), (1,8) and (1,10). We will have:

| i   | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
|-----|---|---|---|---|---|---|---|---|---|----|----|
| C_i | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 2 | 0 | 1  | 0  |
| D_i | 0 | 1 | 1 | 1 | 2 | 3 | 3 | 5 | 5 | 6  | 6  |

One can check that D_i is indeed the value to be added to A_i for all i, and that D_i = C_i + D_{i-1} for i \ge 1.
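
The arrays for this example can be reproduced with a short snippet (Python, purely illustrative):

```python
N = 11
queries = [(1, 2), (1, 5), (1, 6), (1, 8), (1, 8), (1, 10)]

# C[i] counts the queries (1, b) with b == i (1-indexed; C[0] unused).
C = [0] * (N + 1)
for a, b in queries:
    C[b] += 1

# D[i] = C[i] + D[i-1] is the prefix sum of C: the amount to add to A[i].
D = [0] * (N + 1)
for i in range(1, N + 1):
    D[i] = C[i] + D[i - 1]

print(C[1:])  # [0, 1, 0, 0, 1, 1, 0, 2, 0, 1, 0]
print(D[1:])  # [0, 1, 1, 1, 2, 3, 3, 5, 5, 6, 6]
```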

Now, what about the updates with a > 1? Well, we don’t know yet. But we can in fact use a similar algorithm as above for the case a = 2: there are two types of updates of the form (2,b), those with even b and those with odd b, and we can run the algorithm above separately on each group. In fact, we can combine the *even* and *odd* halves into a single unified algorithm: just replace the recurrence D_i = C_i + D_{i-1} with D_i = C_i + D_{i-2}. The running time is still O(N) overall.
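
As a quick sanity check that the stride-2 recurrence handles even and odd offsets at once, here is a small illustrative Python snippet (the specific updates are made up):

```python
N = 10
updates = [(2, 1), (2, 1), (2, 4)]  # two odd-offset updates, one even

# C[i] counts the updates (2, b) with b == i (1-indexed; C[0] unused).
C = [0] * (N + 1)
for a, b in updates:
    C[b] += 1

# Unified recurrence: D[i] = C[i] + D[i-2], with D[i] = C[i] for i <= 2.
D = [0] * (N + 1)
for i in range(1, N + 1):
    D[i] = C[i] + (D[i - 2] if i > 2 else 0)

print(D[1:])  # [2, 0, 2, 1, 2, 1, 2, 1, 2, 1]
```

The odd indices each receive 2 (from the two (2,1) updates) and the even indices from 4 onward receive 1, exactly as a direct simulation would give.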

Actually, we can perform a similar algorithm for every other a! Each particular a takes O(N) to process. So we now have an alternative algorithm: group the updates (a,b) according to a, then run the algorithm above for every a that occurs. Unfortunately, this algorithm is also slow: there can be up to N distinct values of a, and each one incurs a running time of O(N), giving an overall running time of O(U + N^2), which is still too slow.

But what if we only run the algorithm above for *some* values of a? Updates with the other values of a can simply be performed manually. Let’s say we run the algorithm above only for *small* a, i.e. those with a \le s for some (yet to be chosen) threshold s. We run the algorithm at most s times, costing O(Ns), and each manual update (with a > s) touches at most N/s indices, costing O(UN/s) in total. This gives an overall running time of O(UN/s + Ns). If we choose s as, say, \Theta(\sqrt{N}), the overall algorithm becomes O((U + N)\sqrt{N}), which is fast enough to pass the time limit!

Keep in mind, though, that the \Theta notation hides a constant factor, so in practice the exact choice of s is still up to you. The optimal value depends on your implementation, and you can find it by trial and error or ternary search. We suggest choosing s not too far from \sqrt{N}, though.

**Theoretically better parameter**

Notice that above we chose s as \Theta(\sqrt{N}) without really thinking about it that much; it was pure intuition. Here we’ll choose a better parameter and get a theoretically better running time.

The running time is O(UN/s + Ns), and we want to make it as small as possible. Since UN/s decreases and Ns increases as s increases, their sum is minimized (at least in terms of O notation) when the two terms are equal. Setting UN/s = Ns gives s^2 = U, so the better choice is in fact s = \Theta(\sqrt{U}), NOT \Theta(\sqrt{N}), and the running time we get is O(U + N \sqrt{U}). Note that this running time is better than O((U + N)\sqrt{N}) in almost all cases!

Note that we said O(U + N \sqrt{U}) and not just O(N \sqrt{U}), because it’s possible that U > N^2, in which case iterating through all updates dominates the running time.

This doesn’t really matter that much in terms of the contest, but it’s interesting nonetheless.

### Time Complexity:

O(U + N\sqrt{U}) or O((U + N)\sqrt{N})