FOURSQ - Editorial
<p><a href="https://www.codechef.com/problems/FOURSQ">Practice</a> <br>
<a href="https://www.codechef.com/JAN17/problems/FOURSQ">Contest</a> </p>
<p><strong>Author:</strong> <a href="https://www.codechef.com/users/sidhant007">Sidhant Bansal</a> <br>
<strong>Tester:</strong> <a href="https://www.codechef.com/users/iscsi">Istvan Nagy</a><br>
<strong>Editorialist:</strong> <a href="https://www.codechef.com/users/mgch">Misha Chorniy</a></p>
<h1>Difficulty:</h1>
<p>Medium</p>
<h1>Pre-Requisites:</h1>
<p>segment tree or related data structure</p>
<h1>Problem Statement</h1>
<p>You are given array $A$ consisting $N$ elements. All the elements in the input are integers. You must be able to make two queries, first one - change the value of the element, second one - is to
present the product of elements with indices from l to r by modulo P in the sum of the squares of four non-negative integers.
<h1>Explanation</h1>
<p>We need to make some precalculation, for each number between 0 and 1000000 know to calculate his decomposition in the sum of four squares. The key idea in this problem, if we can decompose number $A$ in the sum of four squares and $B$ in the sum of four squares, then $A*B$ <a href="https://www.alpertron.com.ar/4SQUARES.HTM">can be decomposed</a> in the sum of four squares. </p>
<p>$A*B$ = $(x_{1}^2+x_{2}^2+x_{3}^2+x_{4}^2)*(y_{1}^2+y_{2}^2+y_{3}^2+y_{4}^2)$ = </p>
<p>$(x_{1}*y_{1}+x_{2}*y_{2}+x_{3}*y_{3}+x_{4}*y_{4})^2 + $</p>
<p>$(x_{1}*y_{2}-x_{2}*y_{1}+x_{3}*y_{4}-x_{4}*y_{3})^2 + $</p>
<p>$(x_{1}*y_{3}-x_{2}*y_{4}-x_{3}*y_{1}+x_{4}*y_{2})^2 + $</p>
<p>$(x_{1}*y_{4}-x_{4}*y_{1}+x_{2}*y_{3}-x_{2}*y_{3})^2 $</p>
The second observation is <a href="https://en.wikipedia.org/wiki/Lagrange's_four-square_theorem">Lagrange's four-square theorem</a>, it's well-known fact in math. Every number can be represented in the sum of four squares, from here there are no $-1$ answers in the problem.
</p>
<h1>Subtask 1</h1>
<p>$P$ is less or equal than $1000$. Let's make precalculation in a next way, iterate with four cycles, where each number less than $sqrt(1000)<32$ </p>
<code>
for i = 0..32
for j = i..32
for k = j..32
for l = k..32
s[i*i+j*j+k*k+l*l]=(i,j,k,l) //t is the array of tuples from 4 numbers
</code>
<p>When we know this thing, numbers can be multiplied and values of the sum of 4 square squares can be recalculated. Of course, all operations must be processed by modulo $P$. After that queries can be simulated, the first one in $O(1)$ time, the second one in $O(N)$ time. Totally, it takes $O(Q*N+32^4)$ time.</p>
<h1>Subtask 2</h1>
<p> Precomputation with four for's as in Subtask 1 get TL for $P=10^6$. How to speed up this one, let's make next thing, find all numbers which can be represented as the sum of 3 square numbers.</p>
<code>
for i=0..1000
if i*i>1000000
break
for j=i..1000
if i*i+j*j>1000000
break
for k=j..1000
if i*i+j*j+k*k>1000000
break
t[i*i+j*j+k*k]=(i,j,k) //t is the array of tuples from 3 numbers
s[i*i+j*j+k*k]=(i,j,k,0)
</code>
<p>We will find almost $85$% of representions after this preprocessing. How to find the rest:
<code>
for i=0..1000000
if s[i] != null
continue
for j=1..1000
if i-j*j<0
i - j * j<0
break
if z[i-j*j](3)==0
s[i]=(z[i-j*j](0),z[i-j*j](1),z[i-j*j](2),j)
z [i-j * j][3]==0
s[i]=(z[i-j * j][0],z[i-j * j][1],z[i-j * j][2],j)
break
</code>
</p>
<p>Let's estimate total time of work of these two fragments of code, first fragment has three for's each of those to 1000, but $0<=i<=j<=k<=1000$, number of possible triplets (i, j, k) the same thing as $1000 \choose 3$. The second fragment works in time which equal number of numbers which can't be represented as a sum of 3 squares multiply by 1000 less than $2*10^5*1000$ = $2*10^8$, summary both fragments will process in time less than $5*10^8$. </p>
<p>Now we have all decompositions for numbers in the range between $0$ and $10^6$, but we don't know how to speed up processing of the 2nd query. A function of representation in a sum of 4 square is associative and multiplicative, we can use <a href = "https://www.quora.com/What-are-some-good-tutorials-on-segment-trees">segment tree</a> for solving this problem. In each node $v$ which assigned with interval $l..r$ storing decomposition of the product $A_{l}*A_{l+1}*...*A_{r}$ in the sum of 4 squares by modulo $P$. </p>
<p> Now both queries can be done with complexity $O(log N)$ </p>
<h1>Subtask 3</h1>
<p>The difference of a subtask 3 from a subtask 2 that $P$ can be up to $10^{12}$. It does problems because 64-bit integers type can store numbers up to $1.8 * 10^{19}$, how to cope with a multiplication of two numbers which can't be stored in the 64-bit integer type, there are several ways:</p>
<h3>First way: </h3>
<code>
mult(a, b, p)
if b == 0
return 0
if b % 2 == 0
a2=mult(a,b/2,p) // Finding a * (b / 2), a * b = a * (b / 2) + a * (b / 2)
return (a2+a2) mod p // (a+...+a)+(a+...+a)
return (a+mult(a,b-1, p)) mod p //a * b = a * (b - 1) + a
</code>
<p> </p>
<p>Complexity of this algorithm is O(log N), unfortunately, this algorithm can't pass TL</p>
<h3>Second way:</h3>
<p>In C++ exists types like __int128, __int128_t, Python and Java has libraries for working with Long Arithmetics, but all of it is very slow to get AC.
<h3>Third way:</h3>
Divide multipliers into two parts $X=A*2^{20}+B$, $0<=A, B<2^{20}$
<code>
mult(a, b, p)
a0=a % (2^20)
a1=a / (2^20)
b0=b % (2^20)
b1=b / (2^20)
//(a1*(2^20)+a0)*(b1*(2^20)+b0)=a1*b1*(2^40)+(a1*b0+b1*a0)*(2^20)+(a0*b0)
return ((a1*b1)%p*(2^20)%p*(2^20)+(a1*b0+a0*b1)%p*2^20+a0*b0)%p
</code>
If change % and / into bit operations, correct solution must get AC.
<h3>Fourth way:</h3>
To calcuate product in big real types(C++ version):
<code>
//a * b mod p = a * b - [a * b / p] * p
long long mul(long long a, long long b, long long p) {
long long k = (long long)((long double)a * b / p + 0.5);
return a * b - k * p;
}
</code>
<p>This optimization for multiplying numbers up to $10^{12}$ modulo $10^{12}$ is fastest.</p>
<p>The overall time complexity of this approach is $O(N log N)$ but with huge constant.</p>
<h1>Solution:</h1>
<p>Setter's solution can be found <a href="http://www.codechef.com/download/Solutions/JAN17/Setter/FOURSQ.cpp">here</a> <br>
Tester's solution can be found <a href="http://www.codechef.com/download/Solutions/JAN17/Tester/FOURSQ.cpp">here</a></p>
<p><strong>Please feel free to post comments if anything is not clear to you.</strong></p>