REALSET - Editorial

dec13
editorial
fft
hard
maths

#1

PROBLEM LINK:

Practice
Contest

Author: Iaroslav Tverdokhlib
Tester: Mahbub
Editorialist: Jingbo Shang

DIFFICULTY:

Hard

PREREQUISITES:

Circulant Matrix

PROBLEM:

Given a Circulant Matrix, determine whether it is full rank (i.e. non-zero determinant, or the null space is trivial).

EXPLANATION:

First of all, we need to know the matrix stated in the problem is a very special type of matrix called Circulant Matrix. There are a lot of properties about the determinant and rank of Circulant Matrix can be found in Wikipedia.

Let’s summarize the properties looks relevant to this problem:

alt text

where, f(x) = A0 + A1 * x + A2 * x^2 + … An * x^n.

To avoid the precision problems, we can choose some prime and one of its primitive root instead of the root of unity (it may have chances (really rare, but exists) to fail, but we can choose some different primes to check, e.g. 3 different primes). Also, it will better to try Number Theoretic Transform instead of double-valued FFT in the following steps.

One straight-forward way is to calculate the determinant directly (of course, using FFT or NTT, both setter and tester solved the problem in this way). But a lot of contestants failed due to the precision problem when checking whether a double value is non-zero.

Another way is the check whether “d = 0” i.e. gcd(x^n - 1, f(x)) is some constant. Here we take the later one to solve this problem. Another related knowledge is about Cyclotomic Polynomial.

alt text

With this property, we know that x^n-1 can be factorized into some cyclotomic polynomials Phid(x) for every divisor d of n. Actually, for all n between 1 and 30000, the number of divisors, denoted as D, is at most 96 (when n = 27720). Therefore, we can enumerate all possible divisors d and check the cyclotomic polynomial Phid(x) can divide f(x).

Then, the only thing we need to do is the polynomial remainder operation. If we adopt brute-force to do this, it will take O((N-d) * d) time. Directly applying this method (with some breaks) may get TLE. But there is a interesting view to combine this method with the double-valued FFT (like this method can work well when N is a prime). This solution accepted with a further check using this method.

##The fastest and easiest solution (explanation provided by Anton Lunyov)
Let’s understand how the fastest and shortest solution to this problem works.
As pointed above we need to check that our polynomial is divisible by Phid(x) for some divisor d of n. Instead of direct calculation of Phid(x) we can do the following trick. Let d has prime divisors p1, …, pk. Now the crucial observation is the following:

Some polynomial P(x) is divisible by Phid(x) if and only if P(x) * (xd/p1 - 1) * … * (xd/pk - 1) is divisible by xd - 1.

It follows from explicit formula for cyclotomic polynomial (see second formula there). Indeed the rational function
(xd/p1 - 1) * … * (xd/pk - 1) / xd - 1
equals to Qd(x) / Phid(x) for some polynomial Qd(x) that is coprime to Phid(x) which yields the result.

Now all is simple. Polynomial computations modulo xd - 1 are straightforward: if P(x) = a0 + a1 * x + … + am * xm then P(x) mod (xd - 1) = (a0 + ad + a2d + … ) + (a1 + ad+1 + a2d+1 + … ) * x + … + (ad-1 + a2d-1 + … ) * xd-1 (for clarity see lines 49-50 here). Multiplication by xk - 1 could be made by simple loop and requires only additions and subtractions. And it could be done modulo xd - 1 at the same time (see lines 53-62 here).

So we could easily find the remainder of polynomial division P(x) * (xd/p1 - 1) * … * (xd/pk - 1) by xd - 1. If it’s zero for some d then the answer is YES, otherwise NO.

The complexity of each step is O(n + d * nu(d)), where nu(d) is the number of prime divisors of d. Here we have n because we need first to adjust P(x) modulo xd - 1. Thus, overall complexity is (n * tau(n) + n * log n), where tau(n) is the number of divisors of n. Here n * log n is some estimate of the sum of d * nu(d) over divisors o n (I believe it is so :)). Hence we get super fast and simple solution that requires only additions and subtractions of polynomial coefficients.

Any different ideas and solutions are welcome.

AUTHOR’S AND TESTER’S SOLUTIONS:

Author’s solution can be found here.

Tester’s solution can be found here.


#2

I did the same thing.Find the fft of the n real inputs
ie c0,c1,c2,…,c(n-1) as mentioned in the circulant determinant link.
.Now check for each output

bool flag=0;
for(int i=0;i<N;i++) 
 {if(floor(abs(out*))==0)
  {printf("YES

");
flag=1;
break;}
}
if(!flag)
printf("NO
");
Here’s my code

http://www.codechef.com/viewsolution/3090877

Is my fft implementation incorrect?(Radix2 Cooley Tukey)


#3

I initially implemented FFT on real numbers (double, then long double), but I always had precision problems (actually, to be honest, since I was too lazy to implement an FFT algorithm working for any value of N, e.g. N being prime, I used a freely available implementation of Bluestein’s FFT algorithm). But I had to implement something very different in order to have my solution accepted. If you think of the values A(i) as coefficients of a polynomial (A(0) + A(1) * x + A(2) * x^2 + …), then the answer is YES only if this polynomial is divisible by a cyclotomic polynomial Phi_D(X) for some divisor D of N. My implementation of this idea is actually straight-forward: I computed all the cyclotomic polynomials for divisors of N and I checked divisibility with each of them. The important thing here is that cyclotomic polynomials have integer coefficients, so there is no need for using real numbers (thus, any possible precision errors are avoided).


#4

This was a frustrating problem. I think the main issue was I had a lot of precision issues, but it was frustrating because I wasn’t sure how to modify it to not use doubles (and some accepted solutions even used doubles). During the contest, I thought the fact that the solutions had to be integral made my algorithm wrong, so I wasn’t able to come up with an easy fix, which lead to me ultimately giving up on this problem. It was definitely a cool problem.

I just have some quick followup questions though. Does the fact that B has to be integral matter? How difficult is it to actually reconstruct a sequence B that works (if one exists)?


#5

I’d like to share my journey with this problem (apologies but it’s very long).

During day 1, I started by trying to solve some easy ones first, and eventually found this one with no accepted solutions yet. Since the contest had just begun, I tried not to think too much: the examples given were [1,1,1] and [2,-4,1,1], both of which involve the vector [1,1,1…] of all ones, so I thought the answer must somehow always involve such a vector. I tried to find a few more cases (such as [1,2,3,1,2,3] being annihilated by [1,0,0,-1,0,0]), and came up with my first (naive) solution. Here it is written in pseudo-python (link to my submission here):

def is_beautiful(a[0..n-1]):
	for every divisor d of n:
		w = ((array of length d))
		for i from 0 to d - 1:
			w* = a* + a[i + d] + a[i + 2*d] + ...
		if check(w):
			return True
	return False

# this is a subroutine
def check(a[0..n-1]):
    for every divisor d of n:
    	# check if sums of subarrays of distance d are all zero
        all_zero = True
        for i from 0 to d - 1:
            s = a* + a[i + d] + a[i + 2*d] + ...
            if s is not 0:
            	all_zero = False
            	break
    	if all_zero:
    	    return True
    for every proper divisor d of n:
    	# check if subarrays of distance d have all equal elements
		all_eq = True
		for i from d to n - 1:
			if v* != v[i - d]:
				all_eq = False
				break
    	if all_eq:
    	    return True
	return False

This is an O(n log2 n) algorithm. It somehow passed the initial dataset of the problem, even though it’s wrong (the first counter-example I discovered is [1,-1,1,0,0,0], which can be annihilated by [1,1,0,-1,-1,0]). Luckily the dataset was fixed immediately (and the solution then got WA).

After actually thinking about the problem, it became clear that we needed to determine the singularity of a certain special matrix, each row of which is a single right-rotation of the previous row. After a brief search, I discovered that such a matrix is actually called a circulant matrix, and the Wikipedia page conveniently provided a few ways to determine singularity.

It was clear that if the matrix A is nonsingular, then the only solution to Ax = 0 is the vector x = 0, and if A is singular, then Ax = 0 has a nonzero solution x whose elements are rational, and hence an integral solution (using a proper scaling factor). That there exists a nonzero x whose elements are rational can be confirmed by doing Gaussian elimination with total pivoting on Ax = 0 until we fail to find a pivot (this will happen because A is singular), in which case we can assign the remaining variables 1, and solve for the others. It is clear that all intermediate and final values are rational.

The first property I considered was the ‘Rank’. Wikipedia says that the rank is n - D, where D is the degree of gcd(a(x), xn-1) (and a(x) = a0 + a1·x + a2·x2 + … + an-1·xn-1). Now, we only needed to check whether D is zero or not, i.e., gcd(a(x), xn-1) is a constant or not. I actually found a theoretically good one to calculate polynomial gcd, running in O(n log2 n), based on the concept of “Half-gcd”. This relies on using FFT for polynomial multiplication AND division (see this link for a description on how to reduce polynomial division and modulo to multiplication). However, after playing around with it a bit and implementing it I discovered that there are far too many multiplications and divisions for this approach to pass the time limit (it started getting slow even at n around 10000), so I scratched this approach.

Next, I factorized xn-1 into prime polynomials, which are simply the cyclotomic polynomials Φn(x) (Φn(x) is the monic polynomial whose roots are the primitive nth roots of unity). xn-1 is simply the product of Φd(x) for every divisor d of n. Since the cyclotomic polynomials are prime, D is nonzero if and only if one of those Φd(x) divides a(x).

This means I only had to do polynomial modulo operation for every Φd(x). Among all n in [1,30000], 27720 has the largest number of divisors, at 96. This means that this solution runs in about 96·O(n log n) time. I was very excited with this solution because I already knew how to code all the parts, unlike the HGCD approach, so I spent some time coding it, but after submitting, the verdict was TLE. Clearly the constant is still too high, and this approach only works well if divisor_count(n) isn’t too large.

By the way, whenever I implement FFT, I always do it modulo a large prime p which has a primitive 2kth root of unity for a large enough k (say p=2013265921 where 440564289 is a primitive 227th root of unity). This allows me to use 440564289 as my root of unity instead of a complex irrational, and thus using integers entirely and avoiding precision errors.

Of course, this means that the coefficients will be reduced mod p. However, for this problem, I simply assume that if A(x) divides B(x) for some polynomials A and B in Zp[x], then A(x) also divides B(x) in Z[x]. This has a tiny chance of failing which I assumed would not happen on the judge’s data.

So I left the ‘Rank’ approach and considered the ‘determinant’ next. We needed to know if the determinant is zero or not. According to the Wikipedia page again, the determinant of a circulant matrix is the product of the evaluation of a(x) for all nth roots of unity. In other words, we needed to find the DFT of the sequence a0, a1, …, an-1, and check whether any of the values of the resulting transform is zero. The problem is that the normal FFT algorithm only works for power-of-2 sizes, so we have to find a way to calculate the DFT of an exact size.

An issue that came up is that we needed to have a primitive nth root of unity in our modulus, so p=2013265921 wouldn’t work most of the time. My solution is to simply write a list of large primes such that for every n in the range [1…30000], there is a prime p in the list where p ≡ 1 (mod n). To meet the 50000 bytes program limit I had to compress the list and decompress it at the start of execution.

The first thing I thought of is that the normal FFT algorithm can be extended. In FFT, we split into two parts of size n/2 and then combine in O(n) time, but it is possible to split into p parts, each of size n/p, for any prime p dividing n, and combine in time O(p·n). If we continue to do this for all prime factors of n (considering multiplicity), we will eventually be able to compute the DFT!

The only problem is that p can be large. If n is composite, then we can always find a p ≤ √n ≤ 173, and O(p·n) will be waitable. But if n is prime, then O(p·n) = O(n2) which is clearly too slow. So I had to think more of how to solve that part, but for now this approach works well if n doesn’t have large enough prime factors (I tried implementing this first and submitting, and the judge indeed gave the expected TLE).

At this point I took a dinner break, and while walking to the store I thought of a cool idea. Why not combine my two previous approaches? I figured that if n has a large prime factor, it cannot have that many divisors, so my first algorithm would work well! I became very excited with this idea and I tried implementing it immediately (after eating), having only needed to do a few copy-pasta. Sure enough, the solution didn’t get TLE! Unfortunately, it got WA.

Of course it turned out later that it only got WA because there were some issues with the judge data. After rejudging, this solution got accepted (link here). However I didn’t know that it was correct at the time, so I suspected that working modulo primes was bad after all.

At this point I felt that solving DFT problem for prime sizes was the best approach, so after a bit of research I found two algorithms, Rader’s algorithm and Bluestein’s algorithm. Both solutions are amazingly simple and clever, and work in O(n log n). Bluestein’s algorithm has a bonus that it works for any size, not just prime sizes, but for some random reason I decided to use Rader’s algorithm, perhaps because I already have code for finding a generator for a given prime p.

There were a few details to consider, but I was able to finally implement it (link here). However the judge still gave WA (because of judge data issues). At this point I concluded that there must be something wrong with the data, or the data is too clever that it somehow finds a bad input to my essentially random list of primes (out of desperation I even tried to submit a double-precision version of FFT, which of course got WA)! Since I felt the former to be more likely, I simply gave up and waited until perhaps things got resolved. Sure enough, I later found @mugurelionut being able to solve it, followed by @djdolls, with no WAs in between! Since I found this rather unlikely, I tried to submit my latest solution and sure enough it got AC. I felt a lot of relief afterwards, knowing that I can finally move on with my life!

It was indeed very frustrating to get many WAs due to the issues with the judge data, but I think there has been some positive side effect on me because I was able to learn a lot of new things. So thank you @KADR for proposing the problem, and I hope you handle the input/output files correctly next time!


#6

Just a note when multiplying two polynomials quickly.

Assume we want to multiply f(x) and g(x) exactly, using FFT. Using double-precision and using e2πi/n as the primitive nth root of unity is too risky because precision errors are very likely to come up.

An alternative is to choose a prime p with a modular 2kth root of unity for a large enough k (i.e. 2k > deg f + deg g) and just calculate f(x)·g(x) (mod p), but instead of using e2πi/n, we use a modular nth root of unity mod p. This avoids precision errors, but an obvious problem is that the coefficients of the product will be reduced mod p.

But let’s say we know that any coefficient of f(x) can have an absolute value of at most F, and any coefficient of g(x) can have an absolute value of at most G. Then we are sure that any coefficient of f(x)·g(x) can have an absolute value of at most F·G·(min(deg f, deg g)+1) (let’s call this quantity M).

So a strategy to calculate f(x)·g(x) exactly is:

  1. Choose a series of primes p1, p2, …, pl such that they all have a 2kth root of unity for a large enough k. Choose enough primes so that p1·p2·…·pl > 2M.
  2. Find f(x)·g(x) mod p1, f(x)·g(x) mod p2, …, until f(x)·g(x) mod pl.
  3. Use the Chinese remainder theorem to reconstruct f(x)·g(x) exactly. Note that you should choose the coefficient with the least absolute value, so you should choose the negative value if necessary, if it has a lower absolute value.

Note that since we have p1·p2·…·pl > 2M, any number has at most one representative modulo p1·p2·…·pl having absolute value ≤ M. Hence, the constructed coefficients are guaranteed to be correct.

If in case you need to calculate f(x)·g(x) modulo some prime P, but you don’t have a control over what P is, then most likely there wouldn’t be a 2kth root of unity mod P, so FFT mod P wouldn’t work. A viable strategy is to calculate f(x)·g(x) exactly first as above, and then reduce it mod P afterwards (I actually used this trick for REALSET, somewhere in this submission. MOD0 and MOD1 are the two primes I chose).


#7

My solution check whether polynomial is divided by Cyclotomic Polynomial order m where m is divisor of n. We just need to check that P(x) * Product[x^(m / d) - 1] mod (x ^ m - 1) == 0, where d is prime divisor of m.


#8

I’ve updated editorial with the detailed explanation of the fastest solution.
The first thing I’ve made after the contest was to understand this solution.
Now I see that I should be more lazy :slight_smile:
It is to better to think more before proceeding to complex algorithms like fast polynomial division.
I’ve spent 10 hours on Sunday to optimize it well enough.
Actually test data are weak and I could easily fail my solution by TLE :stuck_out_tongue:
At some point I even implemented half-GCD based O(n * log2 n) algorithm for calculation of GCD of two polynomials
(see http://cs.nyu.edu/~yap/book/alge/ftpSite/l2.ps.gz for an amazing explanation) but as kevinsogo I gave up with it as well.
The final optimization that allows me to pass is caching DFT values. Since we need to divide P(x) by several polynomials we will calculate its DFT several times. So I saved it and reuse afterwards.


#9

Maybe it is sucked by the precision (exceeding or some losses). The general idea looks fine. You can have a look at accepted solutions and find the differences. :slight_smile:


#10

@shanjingbo
Could you give me the failing test case and provide a detailed FFT implementation?
How to implement it to maximize precision.What about using Number Theoretic Transform? Read it somewhere that there won’t be precision error there since its modulo prime.
Could you provide a C++ implementation for that.


#11

The key point is to use discrete FFT, and using Chinese Remainder Theorem (CRT). I have no codes now. And I think providing cases is not a good idea. Sorry.


#12

I also implemented Bluestein’s algorithm, also using long double and also had trouble with precision (running time 19 seconds, WA). This problem was probably aimed at failing any common FFT.


#13

@jaskaran_1: I looked at your implementation and your algorithm is not correct. You can only use the Radix2 Cooley Tukey FFT algorithm when N is a power of 2. When N is odd you split the array into two slices of unequal sizes. You probably make them equal by adding an extra zero coefficient in the smaller slice, but that is incorrect and the end result is not the FFT for the initial array of N values. There are only a few algorithms which work for any value of N (N being prime is usually a tough case): Bluestein’s algorithm is one of them.


#14

if the polynomial is divisible by a cyclotomic polynomial Phi_D(X) for some divisor D of N.Could you explain that?


#15

@jaskaran_1: See Wikipedia for the definition and properties of cyclotomic polynomials: http://en.wikipedia.org/wiki/Cyclotomic_polynomial


#16

I don’t mean to be rude, but this is not what can be expected for an editorial for a HARD problem. There are many concepts to be explained, many difficulties around (how you come to fast discrete fourier transform, how you use the Bluestein to handle cases where N is not a power of 2, how you handle floating point numbers precision, how you handle overflow in recursion, and so on), and you only provide a wikipedia link… I could have done the same. Telling that we can read other contestants code is not a solution, it’s a lazy way of not answering the question. I’m kind of disappointed.


#17

@mugurelionut Won’t it work like mergesort.I mean there also the array isn’t exactly halved.Why isn’t it correct?
Could you give me an example where it fails?


#18

@mugurelionut how about using Number Theoretic Transform?


#19

Nope. If there’s a solution in complex numbers, there’s one in integers as well. Proof (a bit short, there’s not much space in comments :D):

  1. a solution in complex numbers is a pair of solutions in reals that means we can take just the real parts of b-s

  2. if there is a solution (in reals now), we can obtain it by running GEM on the above mentioned circulant matrix, but GEM can be performed in rationals only, which gives us a solution in rational numbers

  3. if vector B is a solution, kB is also a solution; choose k to be the product of all denominators, and kB is a solution in integers


#20

I think the editorial is in the process of being written, so you shouldn’t flame just because you’re impatient…

I mean, I hope it is so…