I did this in a slightly different way and passed in 2.32 seconds.
Note that for a number A \le 10^6 , there can be no more than one prime factor larger than 10^3. If there is a prime factor of any A_i which is larger than 10^3, I divided it by that factor. Now we are sure that all A_i 's consist of prime factors less than 10^3. There are only 168 such primes. For each of these, I calculate sum on path using LCA. Note that you need to calculate LCA only once and after that you need to calculate sum on path for every prime.
Okay, so we are done with all primes less than 10^3.
For primes less than 10^3, we will simply do Mo’s algorithm as in the editorial. But now, the add remove function is just O(1) as you just need to maintain count and not the complete factorization of the number.
So this ran in O(N\sqrt N + 168*N + NlogN) (with a good block size obviously!)
PS: It took me a huge number of submissions to get this AC. I was doing one mistake. Maybe some of you made the same mistake. I precomputed (L/BLOCK) for the query comparator and initialized only N indices of that array. I should have initialized 2*N of them. It took me about 70-80 submissions to figure this out as the verdict was always TLE. Affecting the ordering only affects the runtime, not the correctness of your code.
Great approach. No wonder you are in the fastest 30 submissions. Ig I was quite lucky to get it working under 20 submissions with inline add/del, optimized comparator, const block size, and most importantly limiting the mod operator (I use modular arithmetic template). My solution operates in 6.8sec.
Thanks a lot : )
Yeah actually I had never really implemented Mo before this. Even this is mostly the same as one I saw somewhere. But you see, this error is subtle, the reason being: I always thought that my code is slow (and never thought that sorting could cause Mo to get this slow) , when it was actually not. When I first submitted I was 100% sure of a WA or an AC. TLE on last case was wild. But then I thought I am overestimating performance maybe. And it took me a whole day to get AC and a few hours of the next day to understand that why this array was causing an issue and why writing l/BLOCK passed.
Notice that for second subtask, N \le1000. So you can precompute the answer for every (u, v) pair beforehand store them.
For this, we can use DFS once from every vertex, assuming it as u. Now in a single DFS, answer for every corresponding v can be found.
Answering every query thus becomes O(1).
For implementation details you can refer this