PROBLEM LINK:
Author: Sunny Aggarwal
Tester: Istvan Nagy
Editorialist: Xellos
DIFFICULTY:
Easymedium
PREREQUISITES:
Next greater element, prefix sums
PROBLEM:
You’re given arrays A and B of equal length N, which define a matrix C_{i,j}=A_iB_j+iB_j+jA_i+ij. Compute the sum of maxima of all K\times Ksubmatrices of C, modulo 10^9+7.
SHORT EXPLANATION:
A lot of small steps:

a small modification of the arrays A and B reduces the elements of C to C_{ij}=A_iB_j

the maximum in a submatrix can also be written as the product of maxima in the arrays

the answer can then be written as a product, too

we need to compute sums of maxima in specific subarrays; we can find that by computing the number of subarrays in which a given element is the maximum instead

we end up with the subproblem of “add linear functions to ranges in an array, find the values in this array afterwards”, which can be solved using differences of adjacent elements
EXPLANATION:
step 1
The problem statement contains a big formula (for some people): C_{i,j}=A_iB_j+iB_j+jA_i+ij. We can write it in a simpler way: C_{i,j}=(A_i+i)\cdot(B_j+j).
Apart from that, there’s no mention of the arrays A and B later on, so if we add i to each A_i and j to each B_j, we can solve the problem for C_{i,j}=A_iB_j.
step 2
The elements of A and B were positive; that didn’t change after the previous step. Under these constraints, the maximum element in a submatrix C_{l..r,t..b} is the product of maxima of subarrays A_{l..r} and B_{t..b}. This should be clear  all elements of the submatrix of C are computed as products of numbers from the subarrays of A and B and we just maximised that product by maximising both factors (note: of course, such an element really exists in that submatrix).
step 3
Each submatrix X is uniquely given by a subarray X_A of A and another subarray X_B of B, where X contains the pairwise products of all elements in X_A with all elements in X_B. Since X has to be a K\times K matrix, X_A and X_B have to contain K elements each. The formula for F can then be written as (omitting the modulo)
denoting the sum of maxima over all Kelement subarrays X_A of A by F_A and similarly F_B for B. We only need to compute both of them now.
step 4  computing one answer
The usual way to count the sum of maxima over some subarrays of A (we’ll focus just on that; the same algorithm is applied to B) is by trying all elements of A and counting the number of subarrays for which they’re the maximum elements. For that, we need to assign maximum elements to subarrays uniquely; one possible way is by taking the leftmost maximum. If there are P(i) subarrays for which A_i is the maximum, then the sum we’re looking for can be computed as \sum A_iP(i), because for each subarray, its maximum is counted in this sum exactly once.
We only need to figure out how to compute P(i) for all i. We can see that A_i is the leftmost maximum only in subarrays which don’t contain the closest element A_{l_i} \ge A_i to its left or the closest element A_{r_i} > A_i to its right. The latter is known as the Next Greater Element and can be found using a wellknown stack algorithm in O(N) time; with a small modification, that algorithm can also be used to compute l_i. If there’s no valid l_i, we’ll say that l_i=0 and if there’s no valid r_i, then we’ll say r_i=N+1.
We’re only interested in subarrays of fixed length K; P(i) is the number of such subarrays that start in the interval [l_i+1,i] and end in the interval [i,r_i1]. If a subarray starts at A_j, then it ends at A_{j+K1}, so it needs to satisfy the (sufficient) conditions l_i+1 \le j \le i and i \le j+K1 \le r_i1. Together, they give
so we have
Now, we know what to do to compute the answer for a single K: compute all l_i, r_i, P(i), then F_A (and F_B in the same way), multiply them modulo 10^9+7 (F_A and F_B fit into 64bit integers, so it’s necessary to apply the modulo before multiplying as well) and we get F.
Since each P(i) can be computed in O(1) time, the time complexity for one K is O(N) and in total, it’s O(N^2). That should be enough for the first subtask.
computing all answers
However, we want the answers (in other words, F_A and F_B, from which we can compute the answers easily) for all K at once. For that, we’ll look at the specific cases for P(i):

i > r_iK, l_i+1 \le iK+1: r_ii < K \le il_i, P(i)=r_ii

i \le r_iK, l_i+1 \le iK+1: 0 < K \le \mathrm{min}(il_i,r_ii), P(i)=K

i > r_iK, l_i+1 > iK+1: \mathrm{max}(il_i,r_ii) < K \le N, P(i)=r_iKl_i

i \le r_iK, l_i+1 > iK+1: il_i < K \le r_ii, P(i)=il_i
Each of the terms A_iP(i) corresponds to adding a constant or linear function to values of F_A for a specific range of K. This can be done using segment trees or Fenwick trees with range updates, but there’s a faster way using the fact that there we only want the values of F_A after all those operations.
Let’s take an array of differences between successive F_A: dif[k]=F_A(k+1)F_A(k). Adding a linear function c+kd to a range [a,b] of F_A means adding the constant d to a range [a,b) in this array and also adding c to dif[a1] and c to dif[b]. If we found the array dif, we can find F_A as its prefix sums.
The array dif can be found in the same way using its own array of differences dif_2[k]=dif[k+1]dif[k]. Adding a constant x to a range [a,b] in dif means adding x to dif_2[a1] and x to dif_2[b]; once again, the values of dif can be found as prefix sums of dif_2.
Since we have just O(N) linear functions to add and prefix sums can be computed in O(N), the whole algorithm runs in O(N) as well. Memory: O(N).