PROBLEM LINKSDIFFICULTYHARD PREREQUISITESTree Data Structure, Suffix Array PROBLEMThere is a tree of N nodes numbered 1 through N. Let deg(i) be the number of nodes connected to node i. Count the number of different pairs (A, B) such that:
Let (P, Q) and (R, S) be two pairs. Suppose that the shortest path traversal from node P to node Q visits g cities u_{1}, u_{2}, ..., u_{g} (u_{1} = P; u_{g} = Q) and the shortest path traversal from node R to node S visits h cities v_{1}, v_{2}, ..., v_{h} (v_{1} = R; v_{h} = S). Then, (P, Q) and (R, S) will be considered different if and only if g ≠ h or there is at least one index i such that deg(u_{i}) ≠ deg(v_{i}). QUICK EXPLANATIONFor each node, consider the sequence of degress of the nodes that are visited on the shortest path traversal to node 1. Consider the sequences as strings over alphabet {1, 2, ..., N1}. Then we will have N strings. The problem can be reduced to finding the number of distinct prefixes from these N string. This can be solved efficiently using suffix array data structure. EXPLANATIONConsider each node. For each node i, let L_{i} be the number of nodes visited on the shortest path traversal from node i to node 1. Let u_{1}, u_{2}, ..., u_{Li} be the sequence of the visited nodes, where u_{1} = i and u_{Li} = 1. Let S_{i} = deg(u_{1}), deg(u_{2}), ..., deg(u_{Li}). Consider each S_{i} as a string over alphabet {1, 2, ..., N1}. Therefore, we will have N strings S_{1}, S_{2}, ..., S_{N}. Counting PrefixesWe can reduce the problem into finding the number of distinct nonempty prefixes from the N strings. This can be solved efficiently by first sorting the strings lexicographically. For example, suppose the input tree is a straight line of N=5 nodes, where node i and node i+1 are connected. Suppose the degrees of the nodes are deg(1) = 2, deg(2) = 5, deg(3) = 2, deg(4) = 5, deg(5) = 1. We will represent each string as sequence of digit characters without any spaces for convenience. Therefore,
S_{1} = 2 Note that the above example is actually invalid as we cannot have a tree with such configuration. It is manually created so that the algorithm is easier to explain. To solve the reduced problem, sort the strings lexicographically:
S_{5} = 15252 Now, we will start counting the number of different nonempty prefixes. We start with S_{5}. It has 5 prefixes: 1, 15, 152, 1525, and 15252. We continue to S_{1}. It has only 1 prefix: 2. We continue again to S_{3}. It has 3 prefixes. However, prefix 2 has been counted before. Therefore, we only find 2 new prefixes: 25 and 252. We continue once more to S_{2}. It has 2 prefixes: 5 and 52. None of them has been found before. Finally, we consider S_{4}. It has 4 prefixes. However, prefixes 5 and 52 have been counted before. Therefore, we only find 2 new prefixes: 525 and 5252. Therefore, the number of different prefixes of this configuration is 5 + 1 + 3 + 2 + 4= 15. We can conclude that the algorithm is as follows. // S[] = {S_{1}, S_{2}, ..., S_{N}} sort(S) res = S[1].length() for i = 2; i ≤ N; i++: res += S[i].length() res = lcp(S[i], S[i1]) where lcp(A, B) returns the length longest common prefix of A and B, i.e. the largest i such that the first i characters of A are equal to the first i characters of B. Sorting StringsThe unexplained part is how to sort the strings. Each string can have at most O(N) characters, so a naive sorting would require O(N^{2} lg N) time. We have to sort the strings with a method that is similar to the method used in suffix array. This method will use the fact that the nodes are arranged in a tree. First, we sort the strings based on their first 2^{0} = 1 character:
1: S_{5} = 15252 Note that equal strings must have equal ranks; for example, since S_{1} and S_{3} have the same first character, their ranks are equal. Then, we will sort the strings based on their first (at most) 2^{1} = 2 characters. We can use the previous result to do this. Note that each string is actually a suffix of the string 15252. Let T_{i} be the suffix that starts from the ith character. To compare two strings based on their first 2 characters, we can first compare the ranks of their first characters. If they are different, we have their comparison, else we compare the ranks of their second characters. The rank of the second character of a string can be retrieved in constant time. Suppose we want to know the rank of the second character of S_{4}. Since S_{4} = T_{2}, its second character is actually the first character of T_{5} = S_{3}. Therefore, the rank of the second character of S_{4} is the rank of S_{3} in the previous sorting. So, instead of sorting the strings we sort N pairs of two values: (the rank of the first character, the rank of the second character). If there is no second character, we can set the value to 0 so that it ranks higher before all characters. For the above strings, the pairs are:
((1, 3), 5) Note that we augment the pair to store the index of each string as well. So actually we sort N pairs of (pair(first character's rank, second character's rank), index). After the second sorting, the order of the strings becomes:
1: S_{5} = 15252 Next, we will sort the strings based on their first (at most) 2^{2} = 4 characters. For each string, we store a pair of (the rank of the first 2 characters, the rank of the next 2 characters) in a similar way. For example, the rank of the next 2 characters of S_{5} is actually the rank of S_{3} in the previous sorting.
1: S_{5} = 15252 Finally, sort the strings based on their first (at most) 2^{3} = 8 characters.
1: S_{5} = 15252 The complexity of one sorting reduces to only O(N lg N) since we need only O(1) time to compare two strings. There are O(lg N) steps, so this sorting part needs O(N lg^{2} N) time. Storing StringsAnother unexplained part is how to store the strings in the memory. Since there are at most O(N^{2}) characters, we cannot store them in the memory. Fortunately, we don't have to store all characters to implement the sorting method explained above. For each string, we only need to store the indices of the strings that are the suffixes starting with the 2^{p}th character for all 1 ≤ 2^{p} < N. Since the nodes are arranged in a tree, the jth character of S_{i} is the degree of the (j1)th parent of i. We can use this recurrence to precompute the jth parent of all nodes, where f[i][j] = the 2^{p}th parent of node i. In short, the 2^{p}th parent of node i is the 2^{p1}th parent of (the 2^{p1} parent of node i). We have to make the tree rooted at the node 1 and store the direct parent of each node in parent[] array. reset all values in f to 0 for i = 1; i ≤ N; i++: f[i][0] = parent[i] for j = 1; 2^{j} < N; j++: for i = 1; i ≤ N; i++: if f[i][j1] ≠ 0: f[i][j] = f[f[i][j1]][j1] Since there are only O(lg N) possible values for j, we only need O(N lg N) time and space for storing the strings. CodeHere is a pseudocode of the sorting method as described above. for i = 1; i ≤ N; i++: // we can safely set the ranks of the first character of each string // to be the degree of the corresponding node. rank[i][0] = deg(i) for j = 1; j < N; j++: for i = 1; i ≤ N; i++: a = rank[i][j1] if f[i][j1] ≠ 0: b = rank[f[i][j1]][j1] else: b = 0 data[i] = ((a, b), i) sort(data[1], data[2], ..., data[N]) rank[data[1].second] = 1 for i = 2; i ≤ N; i++: // if the current pair is equal to the previous pair, // the rank should be equal if i > 1 and data[i].first == data[i1].first: rank[data[i].second] = rank[data[i1].second] else: rank[data[i].second] = rank[data[i1].second] + 1 for i = 1; i ≤ N; i++: pos[rank[i][lg N]] = i The final order of the strings is stored in pos[] array. Longest Common PrefixFinally, the last unexplained part is how to compute the LCP of two strings in an efficient way. We can utilize the rank[][] table produced in the previous sorting algorithm. Suppose we want to compute the length of the LCP of S_{x} and S_{y}. We compute the largest p such that rank[x][p] == rank[y][p], i.e. the largest k such that the first 2^{p} characters of S_{x} are equal to the first 2^{p} characters of S_{y}. That means we have found a common prefix of length 2^{p}. The problem is then reduced to finding the length of the LCP of S_{f[x][p]}, S_{f[y][p]} and can be computed in a similar way. function lcp(S_{x}, S_{y}): res = 0 for k = lg N; k ≥ 0; k: if rank[x][k] == rank[y][k]: res += k x = f[x][k] y = f[y][k] return res ComplexityThe time complexity of the whole solution is O(N lg^{2}N). The space complexity is O(N lg N). SETTER'S SOLUTIONCan be found here. TESTER'S SOLUTIONCan be found here.
This question is marked "community wiki".
asked 11 Dec '12, 18:21

I tried using Ukkonen's suffix tree, but it required to much memory, so I took a slightly different approach that tries to factorize prefixes. The idea comes from making an indeterministic automaton to a deterministic one. First I convert the tree to a DAG and I create a fake 'root' node and connect it to all the nodes in the tree (this way, I can create substrings starting from any character). Next I travel the DAG and for each node, if more than one child has the same letter, I create a new node with that letter and connect it to all the childs of the nodes with the same letter. Once this is done, I have a DAG that each node, no two childs have the same letter. Since I don't have common prefixes I can run a DP on DAG to get the number of distinct paths. answered 11 Dec '12, 18:39

Virtually the same code from any accepted solution of http://www.codechef.com/APRIL12/problems/TSUBSTR will work here; that problem is identical (other than an extra part on top). answered 12 Dec '12, 01:54
Yes, you are right. I have noticed that during the contest, I have pointed that out in my AC code. Till the end of the contest I've thought that there is some linear (or fast nlogn) solution that uses the fact that the "letter" written in the vertices corresponds to it's number of neighbors.
(12 Dec '12, 02:40)
1
I know this is an old discussion, but I've just noticed now that the time complexity of the official solution is O(N * log^2(N)). That's only because at each step of the suffix array construction, an N*log(N) sorting algorithm is used. Instead, an O(N) sorting algorithm can be used there (just like in the standard suffix array construction algorithm), to get the overall time complexity to O(N * log(N)) (actually, that's what I did in my solution, but I guess the running time doesn't really show it).
(16 Mar '13, 04:27)

rank[data[1].second] = 1 this part is not clear bcz rank is a 2D array how it can access the rank with only 1 index and other part is
from the above pseudo code i think the rank array size is NN but it can be reduced with just only two columns to (N2) if i did this the how can i compute the LCP from the above pseudo code where the requirement of rank is N*[log2(N)+1]; can any explain it @admin and any other user who is intrested what is size of rank array i used and how i used answered 28 Jun '13, 09:18

Tester's program got Time Limit Exceed in the following case:
answered 27 Feb '17, 20:12
