PROBLEM LINK:
Author: Ilya Malinovsky
Tester: Pushkar Mishra
Editorialist: Florin Chirica
DIFFICULTY
medium
PREREQUISITES
dynamic programming, graph theory
PROBLEM
You’re given a graph with vertices defined by 2 numbers (a, b), where a is the layer and b is the position in the layer. There is always an edge between (a, x) and (a+1, y), for every a, for each x and y. Also there are given some additional edges in the input. How many paths are from (1, 0) to (n+1, 0)?
QUICK EXPLANATION
Let’s start by a simple DP. One can notice the graph is a directed acyclic one, so one can apply this standard DP for counting number of paths: dp[a]** = in how many ways can we arrive into (a, b).
The trick is to speed it up by considering only states (a, b) that appear at least once in an additional edge. We sort all those pairs (a, b) in increasing order by layer and at equality, in increasing order of position in the layer. When you are at a layer a, the only information that really matters is in how many ways can you reach (a  1, x), for each x. With careful calculation, this can be computed.
EXPLANATION
No cycles
Let’s start by studying the properties of the given graph. One may notice the graph is a directed acyclic one (noted DAG). Why? A directed edge increases either the layer or the position in the current layer (because all edges are directed from left to right). Suppose we get a cycle ABCD. Layer of D will be greater than layer of A or position of D will be greater than position of A in the same layer. However, as we get a cycle, it means we also have an edge DA. This contradicts our previously stated fact, so no cycle can exist. Hence, the given graph is a DAG.
Count number of paths in a DAG
For now, let’s assume the limits are small enough to simply iterate over all vertexes. Then the problem is solved by dynamic programming. Let ways[layer][pos] = in how many ways can we reach vertex defined by (layer, pos).
We have two choices

Go by a “special” edge. In this way, ways[layer][pos] is sum of ways[edge0][edge1], such as edge2 = layer and edge3 = pos.

Don’t go from a special edge. In this way, we have to arrive from the previous layer. Since between (layer  1, x) and (layer, pos), there is always one edge, we can add to ways[layer][pos] sum of ways[layer  1][x], for each x between 0 and m1.
In the end, the answer is given in ways[n + 1][0]. This approach is way too slow, but hopefully we can optimize it.
Speeding it up
The trick to optimize our solution is to notice only pairs (x, y) such as at least one edge0 = x and edge1 = y or edge2 = x and edge3 = y are actually the ones needed to iterate. Suppose we consider a pair (x’, y’) such as it’s not in the set of pairs (x, y) mentioned above. Then, ways[x’][y’] is always equal to ways[x’  1][0] + ways[x’  1][1] + … + ways[x’  1][m  1].
Let’s have all unique pairs (x, y) sorted in increasing order by x and at equality, in increasing order of y. We can calculate ways[x][y] easily if we have sumLayer(x  1), which means ways[x  1][0] + ways[x  1][1] + … + ways[x  1][m  1]. The task is reduced to quickly calculate sumLayer(x). If we can do this, the whole task can be solved.
Let’s assume that x is the layer of the current vertex and x’ is the layer of previous vertex. Suppose sumLayer(x’) is calculated and now we want to calculate sumLayer(x). Once again, sumLayer(x) can be obtained in two ways

Use special edges
Then, we add to sumLayer(x) all values ways[a]** such as we have in input a special edge a b x y. This can be done in parallel with computing ways[x][y].

Do not use special edges
Then, I claim that sumLayer(x) += sumLayer(x’) * power(M, x  x’). From each of layers x’ + 1, x’ + 2, …, x we can choose exactly one vertex. For each layer, we can make m choices, so by the rule of product we can arrive from x’ to x in power(M, x  x’). Also, this can be done for each path which arrives in x’, namely sumLayer(x’). Function pow(A, B) can be calculated in O(log2(B)) using binary exponentiation.
There is one detail left  how to handle pairs like (a, b), where both a and b can have large values. For C++ programmers, we can use a container like map. Other approach might be to simply sort the array and refer to a pair as its index in sorted sequence. This way, no map is required at all, but the code becomes more complex.
Time Complexity
We iterate over all M edges. Each step, our most expensive operation is doing binary exponentiation and the exponent is up to N. So our complexity is O(M * \log(M)).