A little clarification for PRIMEDST

This is regarding this problem and this editorial for it, and also this blog post on centroid decomposition.
I have a few questions regarding the above links.

  • In the quora post, N P \log N is around 10^9, so wouldn’t an \mathcal O (N P \log N) solution not pass the time limit since the constant factor will be quite big in this case?
  • In the editorial solution, suppose you have an almost linear chain given as input, where each node on the main chain has exactly one off-main neighbor which in turn has no other neighbors. Then you do not perform any more “shadow node” creation, and you perform the fft N/2 times, with polynomials of degree 1,2,3, \cdots N/2. This means that the complexity is \mathcal O(N^2 \log N) and not \mathcal O (N \log ^2 N).
  • My approach, which I’ve not coded out yet is to do a centroid decomposition first, then for each node have a “representative” polynomial which has coefficient of x^i as the number of nodes at distance i in its subtree and then perform necessary multiplications. Unfortunately even though its complexity seems to be \mathcal O (N \log ^2 N) it is very complicated to implement, especially since for nodes in the centroid decomposed tree having > 2 children, both the square of sum and sum of square of child polynomials is needed.
    Is there a more efficient centroid decomposition +/or fft solution than that in the quora post?

Can anyone clarify? Thanks.