PROBLEM LINK:
Practice
Contest: Division 1
Contest: Division 2
Setter: DOlaBMOon
Tester: Hussain
Editorialist: Taranpreet Singh
DIFFICULTY:
MediumHard
PREREQUISITES:
Fast Fourier Transformation for polynomial Multiplication and Square root Decomposition.
PROBLEM:
Given two array A and B of length N, Answer Q queries of the form.
 Given three integers x, y and l, print sum \sum^{l1}_0 A_{x+i}*B_{y+i}
SUPER QUICK EXPLANATION

Divide $B$ (or $A$, as per our choice) into blocks of length $L$. Calculate for every block, Reverse it and multiply with array $A$ (considering both $A$ and blocks as polynomial). This,way, we get dot product of $A[i\dots i+L1]$ and block. let this be stored in F[block number][Index in array A].

After doing this, answer queries, of length less than L using brute force, and remaining by jumping over blocks in $O(1)$ time, by adding for every block i, F*[position of pointer in $A$, if pointer in $B$ has reached start of block i]. Border blocks can be handled brutely.
EXPLANATION
Beware, long (and probably scary) editorial ahead. You have been warned.
First of all, Let us consider a problem Where we are given two arrays and we need to perform dot product.
I know you all are smart enough to do this in O(N) time, But let us say I am too weak to do it. I just know about polynomials.
I represent A and B each as polynomials, So, how can we find out dot product of A and B using some property of polynomials. We know that if C is product of A and B, then
C* = \sum A[j]*B[ij] \forall 0 \leq j \leq i
But this does not help us directly, Because we need to calculate \sum A[j]*B[j] \forall 1 \leq j \leq n, which Multiplies A* with B*, not B[nj+1]. Isn’t there anything we can do so that we can convert required sum to polynomial multiplication? Give it a try and if you still can’t figure it out, See Hidden box.
Click to view
We reverse one array. Suppose we reverse B, call it RB, and C = A*RB. This way, C[N+1] = \sum A**RB[Ni+1] = \sum A**B*
This way, If we can multiply polynomials efficiently, we can find dot product of two arrays in same time. We all know we can multiply polynomials efficiently using Fast Fourier Transformations in O(N*logN) time (I know I’m too slow than you guys, who did this in O(N)). I am not going into details of FFT, otherwise we would be leaving the territory of this problem, but you can read more about it from CLRS, here and here (Possibly the simplest explanation).
So, Returning to our original problem, how can we utilize above information? We can use Square root Decomposition.
The idea is to Divide one of the array (Suppose B) into blocks of length L, and for every block, we reverse the block, lets call it BL, and multiply it with A. If we call this product polynomial C, we can prove that C[i+L] = A[i\dots i+L1]* BL.
We see that after reversing, C[i+L] will constitute \sum A[i+j]*RB[L+1j] which is same as \sum A[i+j]*B[j] \forall 0 \leq j \leq L1. Let us store this results in an array, say PRE[Block Count][N].
I know above paragraph may sound incomprehensible to beginners with polynomials in CP, so read more about polynomials representations here as well as in CLRS book link mentioned above.
Let us assign PRE[block]* = C[i+L]. Don’t worry about Time complexity, which i have explained at the end.
Now, we can say that PRE*[j] represent dot product of Block i and A[j \dots j+L1].
If you have understood this, Give a try to answering queries, as we generally do using square root decomposition.
For queries where both endpoints of a query lies in same block, we calculate it in naive manner. Otherwise, Our query is of form, starting partial (or complete) block followed by a number of complete blocks and then a partial (or complete) ending block. For start and end block, we can calculate answer naively and using the PRE table, we can answer dot product queries for each block.
The only thing which I suspect, might confuse is, that for every block, we have N results stored in PRE array, which one to use.
We just need to observe that suppose we have a query x, and length. we see that every element in array A will have a corresponding element in B. we just need to find this corresponding element for starting position of every block. Give it a try (I know you are saying, why this guy doesn’t tell anything straight, but that’s how we all learn.) and as usual, refer our secret box if help needed.
Click to view
A[x] correspond to B[y] and we can conclude in similar manner, that A[x+i] correspond to B[y+i] \forall 0 \leq i \leq len1 So, assuming block start at position p, the corresponding element of B[p] is A[x+py]. See how??
Click to view
py give number of positions we moved forward in array B, so we add py to x, moving (py) steps forward from index x in array A.That’s all
So, this conclude our another long editorial. Oh wait, Time complexity analysis is still left.
TIME COMPLEXITY ANALYSIS
For Precomputation, we do FFT multiplication of arrays of size N and B for every block, giving O(C*((N+B)*log(N+B))) time if C means number of block and B is size of blocks.
For every query, We never make more that O(B) iterations for start and ending blocks and O(C) iterations for middle blocks, Thus, giving Time Complexity per query O(B+C) Since C = \lceil N/B \rceil. This become O(B+N/B).
Overall Complexity becomes O(N/B*((N+B)*log(N+B))+Q*(B+(N/B))).
Choice for B: Usually, choosing B around \sqrt N works, but in this particular problem, precomputation part involves FFT, which, apart from extra Log factor, includes a heavy constant, so, we might reduce Number of block, so reduce Number of FFTs performed. Setter uses 15*\sqrt N.
This way, Overall complexity Comes down to O(\sqrt N * (Q+N*logN)).
AUTHOR’S AND TESTER’S SOLUTIONS:
Setter’s solution
Tester’s solution
Feel free to Share your approach, If it differs. Suggestions are always welcomed.