I did this in a slightly different way and passed in 2.32 seconds.

Note that for a number A \le 10^6 , there can be no more than one prime factor larger than 10^3. If there is a prime factor of any A_i which is larger than 10^3, I divided it by that factor. Now we are sure that all A_i 's consist of prime factors less than 10^3. There are only 168 such primes. For each of these, I calculate sum on path using LCA. Note that you need to calculate LCA only once and after that you need to calculate sum on path for every prime.

Okay, so we are done with all primes less than 10^3.

For primes less than 10^3, we will simply do Mo’s algorithm as in the editorial. But now, the add remove function is just O(1) as you just need to maintain count and not the complete factorization of the number.

So this ran in O(N\sqrt N + 168*N + NlogN) (with a good block size obviously!)

PS: It took me a huge number of submissions to get this AC. I was doing one mistake. Maybe some of you made the same mistake. I precomputed (L/BLOCK) for the query comparator and initialized only N indices of that array. I should have initialized 2*N of them. It took me about 70-80 submissions to figure this out as the verdict was always TLE. Affecting the ordering only affects the runtime, not the correctness of your code.

Great approach. No wonder you are in the fastest 30 submissions. Ig I was quite lucky to get it working under 20 submissions with inline add/del, optimized comparator, const block size, and most importantly limiting the mod operator (I use modular arithmetic template). My solution operates in 6.8sec.

Thanks a lot : )
Yeah actually I had never really implemented Mo before this. Even this is mostly the same as one I saw somewhere. But you see, this error is subtle, the reason being: I always thought that my code is slow (and never thought that sorting could cause Mo to get this slow) , when it was actually not. When I first submitted I was 100% sure of a WA or an AC. TLE on last case was wild. But then I thought I am overestimating performance maybe. And it took me a whole day to get AC and a few hours of the next day to understand that why this array was causing an issue and why writing l/BLOCK passed.

There was certainly some issue with the judge for this problem. The exact same code when submitted at different times gave running times as 4.75 and 6.58 during the contest. That’s a huge difference.

actually I always go through top 25 submissions just to learn from others code. and your’s runs in 2.32 so I rechecked the submissions and yours was just below it.

Basically the problem page has “Successfull Submissions” section (on right) where solutions are in increasing order of time taken. Each page has like 12 submissions.

Notice that for second subtask, N \le1000. So you can precompute the answer for every (u, v) pair beforehand store them.
For this, we can use DFS once from every vertex, assuming it as u. Now in a single DFS, answer for every corresponding v can be found.
Answering every query thus becomes O(1).
For implementation details you can refer this

I applied all the above things but still got tle on subtask 3. here is my code, can anybody tell me what’s the problem . Please take a look at it. https://www.codechef.com/viewsolution/31502577

Hey, Can anyone please tell me, Why i am getting TLE in even first subtask.
Since, I think the overall complexity will satisfy the constrains of subtask1.