×

# EXPTREE - Editorial

Author: Aleksandr Kulkov
Primary Tester: Misha Chorniy
Editorialist: Hussain Kara Fallah

Medium

# PREREQUISITES:

Combinatorics,Math,Number Theory

# PROBLEM:

Consider an ordered tree with N vertices. Your task is to calculate the expected value of the number of vertices having exactly one child in such tree assuming that it is uniformly chosen from the set of all ordered trees of size N.

Consider the answer to be a proper fraction P/Q, where gcd(P, Q) = 1. Then your task is to output two integers PQ-1 mod 109+7 and PQ-1 mod 109+9

# OBSERVATION 1:

The number of ordered trees consisting of n nodes is equal to the (n-1)th catalan number. Imagine the Depth-First-Search order of the nodes of the trees, entering each node corresponds to an open bracket ( , and finishing each node corresponds to a closed bracket ) . So if we represented the DFS order of a tree it will form a simple balanced bracket sequence.

Two different ordered trees must have different bracket representations. So we can say that the number of ordered trees consisting of n nodes is equal to the number of balanced bracket sequences consisting of (n-1) pairs of brackets. (n-1) because our tree is connected so we must fix the root (the first and the last bracket in the expression). If the subexpression (the expression excluding the first and last bracket) is balanced, that guarantees that we will never exit the root before the last bracket and our tree is connected.

It's well known that the number of balanced bracket sequences of n pairs of brackets is equal to the nth catalan number.

$C_n = \frac{(2n)!}{(n+1)!n!}$

Before reading the next observation you should be familiar with Linearity Of Expectation.

# OBSERVATION 2:

Let's consider all different (n-1) ordered trees, and think about applying this operation. Choosing an arbitrary node of a tree and linking it to a single child (nth node) and removing all edges between our chosen node and its children, then linking them to our new node. So our new node would serve as a single child of the chosen one.

A node having one child in a tree is equivalent to a pair of brackets, with a balanced expression between them and surrounded by a pair of brackets.

Observe that each node in a tree consisting of n nodes and having only one child can be formed by applying this operation to only one tree of (n-1) nodes. It's exactly the same as choosing an opened bracket and its corresponding closing bracket in a simple balanced bracket sequence framing the subexpression between by a pair of brackets (our new node).

Thus our nominator P would be

P = ( number of trees of n-1 nodes * (n-1) )

We multiplied by n-1 which denotes the number of ways to choose the vertex we decided to apply this operation on.

Q = ( number of trees of n nodes )

$answer = \frac{C_{n-2} * (n-1)}{C_{n-1}}$

According to the catalan number formula we wrote above, this can be reduced to:

$answer = \frac{n * (n-1)}{4n-6}$

This fraction can be reduced easily by calculating GCDs between the denominator and the nominator terms. We can calculate PQ-1 after calculating the modular inverse of Q considering each modulo.

# Note:

It's true that this solution is given without a formal proof. But the idea is quite simple and correct. Actually its correctness can be proved by intuition. In real life contests you won't have the internet to read formal proofs and papers or search for formulas for a sequence. This solution is not hard to get from scratch and worths trying.

# AUTHOR'S AND TESTER'S SOLUTIONS:

AUTHOR's solution: Will be found here
TESTER's solution: Will be found here
EDITORIALIST's solution: Will be found here

This question is marked "community wiki".

108932
accept rate: 0%

 1 After some hours of hard thinking, I came up with this explanation/proof for the calculation of p. Hope it is useful and to get feedback in case I'm mistaken. We want to count the number of vertices with one child in all trees of n + 1 vertices (n edges). Let's call this number p'. Imagine to lay out all the C_n trees and label each tree's vertices from 0 to n, top-to-bottom, left-to-right (the root is always 0). To find p', you could fix a tree, count the number of vertices with one child in this tree, and then sum over all the trees. This doesn't lead to an easy closed formula. Instead, fix a vertex v in [1..n] and count how many trees, among the C_n, are such that parent[v] has only one child v. Summing over all v's will give us p'. For each v, removing (parent[v], v) from the tree by the operation explained above, gives us a n-vertice tree. There are exactly C_{n-1} such trees and they must all appear in our original layout (of C_n trees). Furthermore, there is no double-counting: otherwise there would be v1, v2 in [1..n] from trees T1, T2 resp., such that removing (parent[v1], v1) and (parent[v2], v2) leads to the same tree, except for the labels. This means T1=T2 would appear twice in out original layout, which is impossible by construction. There are n vertices, each associated with C_{n-1} trees, therefore p' = n * C_{n-1}. For p, just let n = k - 1 and substitute. answered 22 Jul '17, 20:37 167●6 accept rate: 33%
 5 A good and comprehensive paper on the same http://www.cs.tau.ac.il/~nachumd/papers/Enumerations.pdf answered 17 Jul '17, 18:49 251●6 accept rate: 0%
 2 We don't need to cancel gcd, just taking $P$ and $Q$ modulo $10^9+7$ and $10^9+9$ is enough. answered 17 Jul '17, 15:57 1.1k●2●11 accept rate: 10% You mean $gcd()$ of any two numbers in this case will definitely $1$? (17 Jul '17, 16:09) True because while applying inverse condition is that a and m should be coprime ba-1(inverse)%m and if a is not a multiple of m multiples of a also cannot be multiples of m and hence they are coprime too. (17 Jul '17, 16:10) 1 @bansal1232 No gcd in this case will not at all be 1 (in all cases may be in some) but the thing is you don't need them to be 1 because 8/2%5=4 and 4/1%5=4. (17 Jul '17, 16:22) Sure you are right, but in description gcd(P,Q) = 1 so just being precise (18 Jul '17, 10:48)
 2 If one had found the following paper is that considered cheating? :) answered 17 Jul '17, 21:38 3★eugalt 282●7 accept rate: 4% 1 Under the rules of long contests, "it is alright to refer to tutorials, books and other materials, learn a concept and then apply the same to solve a problem during a contest." Hence it's fine to do research as long as you do it on your own and you don't actively discuss with other participants. (20 Jul '17, 19:41) hikarico5★ That's fair enough. Although in this particular case the paper starts with stating a few results, the formula required to solve the problem among them, and one would be able to use it without going through the rest of the paper to follow the derivation. (21 Jul '17, 03:35) eugalt3★ I know, it's kinda subjective. Of course, following the derivation is still best. But I don't think we can control the way the paper was written, and either way, there was a need to code/derive more concepts beyond the formula after all. And so, officially, reading a research paper is not cheating, but if you are still unsure, you can leave it to your conscience and ethics to decide. I hope that was clear :) (22 Jul '17, 21:17) hikarico5★
 1 Observe that each node in a tree consisting of n nodes and having only one child can be formed by applying this operation to only one tree of (n-1) nodes. Why is this true? answered 17 Jul '17, 17:14 21●2 accept rate: 0% I have added an extra small explanation (18 Jul '17, 09:28)
 1 "P = ( number of trees of n-1 nodes * (n-1) ) We multiplied by n-1 which denotes the number of ways to choose the vertex we decided to apply this operation on." I don't understand how the editorial got to this part. answered 17 Jul '17, 18:26 71●3 accept rate: 0% I have added an extra small explanation (18 Jul '17, 09:28)
 1 Total number of ordered trees(denominator) for n-node is (n-1)th catalan number. And the numerator is (n-2)th middle number in pascal triangle or central binomial co-efficient. Formula for central binomial co-efficient Now answer is n(n-1)/4n-6 (simplification). Now for dealing large numbers like 10^18 numerator will overflow so, I calculated it in two parts. Numerator1 = n, Numerator2=(n-1). Now final numerator =( Numerator1/gcd(num1,denominator) * Numerator2/gcd(num2,denomenator) ) Final denominator is min(denominator/gcd(num1,denominator) , denominator/gcd(num2,denomenator) ) Now comes inverse mod part. Important point: If x is smaller than y and we want to find y mod^-1 x then it is equal to (y%x) mod^-1 x. Now if I use Fermat's theorem everytime it timouts. So here is link which shows how to find inverse mod much faster in linear time Inverse Mod in linear time it is in Russian translate it. Now if I find inverse mod of all numbers again timout so I find inverse mod only till 10^3 (and then i use these result to find inverse mod of all number till 10^7 or 10^9)(kind of square root decompostion it can be easily understood from formula given in link to find inverse mod). My solution Almost 30 submissions to get full 100 :D answered 17 Jul '17, 19:13 4★rj25 230●5 accept rate: 0% you can find inverse using fermat's theorem in logN time. http://www.geeksforgeeks.org/multiplicative-inverse-under-modulo-m/ (17 Jul '17, 19:20) t test cases. so t*log 10^7 gives time out. O(n) compute (ony one time for n=2 to n=1000),store and then using it is more faster. (17 Jul '17, 19:24) rj254★ It didnt gave TLE to me when i did the same @rj25 . In fact, code passed well within the time limit. (17 Jul '17, 19:34) @vijju123 can you give link to your solution ?https://www.codechef.com/viewsolution/14488182 My solution using that approach only gave me 10 points. But @vijju I think my final approach is much faster than O(log n) one. (17 Jul '17, 19:40) rj254★ (17 Jul '17, 21:25) 1 Your solution is much simpler than mine. I have complicated everything by not taking n%m right from start. (17 Jul '17, 21:43) rj254★ showing 5 of 6 show all
 1 "Observe that each node in a tree consisting of n nodes and having only one child can be formed by applying this operation to only one tree of (n-1) nodes. It's exactly the same as choosing an opened bracket and its corresponding closing bracket in a simple balanced bracket sequence framing the subexpression between by a pair of brackets (our new node)." Would you mind adding in a pictorial representation to describe how the operation is done? answered 19 Jul '17, 00:44 21●2 accept rate: 0%
 0 I don't know why they gave that Proper fraction part. Even if you just solve it simply by taking Q^-1 and multiply it by P we get correct ans. answered 17 Jul '17, 18:17 255●5 accept rate: 11% 1 Sure you are right, but in problem's description gcd(P,Q) = 1 so just being precise (18 Jul '17, 10:48)
 0 i am not able to understand multiplicative inverse part How 6/5 is 400000004 and 200000003?? answered 17 Jul '17, 19:03 1 accept rate: 0% 5^-1 gives 400000003, multiply it by 6 and then take mod. You'll get the desired op. (17 Jul '17, 19:10) see this if you did not learn mod inverse (18 Jul '17, 11:08)
 0 Well getting a formal proof is something really hard, you will find yourself ending in reading papers. This solution was the easiest thing that came up to my mind. Anyway, this solution is kind of intuitive :) answered 18 Jul '17, 08:07 108●9●32 accept rate: 0%
 0 Forget formal proof. Right at the beginning of the paper there is formula for the number of nodes with d children. answered 18 Jul '17, 08:35 3★eugalt 282●7 accept rate: 4% I have added extra explanation, and also a note :) (18 Jul '17, 09:29)
 0 Can anybody please elaborate on the observation 2 part please -"Let's consider all different (n-1) ordered trees, and think about applying this operation. Choosing an arbitrary node of a tree and linking it to a single child (nth node) and removing all edges between our chosen node and its children, then linking them to our new node. So our new node would serve as a single child of the chosen one" How to perform this operation? answered 18 Jul '17, 21:33 1★sagar03 1 accept rate: 0% I tried to explain it as much as possible, If you don't understand at all why are we doing this you should check the link I have added right now (linearity of expectation) (18 Jul '17, 21:49) Okay,thanks! (18 Jul '17, 22:12) sagar031★
 0 What difference does it make if I write 2*(n)%MOD and (n+n)%MOD? answered 19 Jul '17, 00:55 1 accept rate: 0%
 0 Can someone explain in detail more about what ordered trees mean in this context? I have checked on various site - stackoverflow etc . but couldnt understand much.. Thanks in advance. answered 21 Jul '17, 20:58 2★chari407 448●2●7 accept rate: 0% http://cs.lmu.edu/~ray/notes/orderedtrees/ This should help. (21 Jul '17, 21:19) Thanks mate. Saw this but couldnt understand much. I dont seem to understand the concept behind the word ordering. (21 Jul '17, 23:05) chari4072★
 0 I have explained my solution here Though it is very long I have made an attempt to explain everything. answered 24 Jul '17, 13:10 4★adg1089 1 accept rate: 0%
 0 In the mentioned paper: The number of nodes of degree, d in all the trees in T, = the number of occurrences of d in all the sequences in I, = the number of occurrences of runs of exactly d(‘s (or equivalently d)‘s) in all the expressions in P(Parenthesis representation). How can the 2nd statement be justified, say suppose d=1, then according to 2nd statement, number of nodes of degree 1 would be all '(' brackets and that would n, which doesn't seem to be true. Am I missing something which is not so obvious? answered 30 Nov '17, 07:59 31●3 accept rate: 0%
 toggle preview community wiki:
Preview

By Email:

Markdown Basics

• *italic* or _italic_
• **bold** or __bold__
• image?![alt text](/path/img.jpg "title")
• numbered list: 1. Foo 2. Bar
• to add a line break simply add two spaces to where you would like the new line to be.
• basic HTML tags are also supported
• mathemetical formulas in Latex between \$ symbol

Question tags:

×15,679
×891
×877
×229
×22
×20

question asked: 16 Jul '17, 08:48

question was seen: 4,251 times

last updated: 30 Nov '17, 08:00