OWCAFILE - Editorial

vijju123 · February 29, 2020, 8:58pm

PROBLEM LINK:

Setter- Kasra Mazaheri
Tester- Encho Mishinev
Editorialist- Abhishek Pandey

DIFFICULTY:

HARD

PRE-REQUISITES:

Concept of Modular Exponentiation and Fermat’s Little Theorem, Applications of FFT (All Possible Sums), Basic Combinatorics (the link might be more than what you need), general concepts of modular arithmetic, like the Generator (Primitive Root) of the set of non-zero residues modulo a prime (it just sounds scary; it's a simple thing).

PROBLEM:

Given an array A with N elements, we build a brute-force generator that generates sequences of length K to crack a password. The generator produces all N^K sequences. The password tried for a sequence is the product of its K chosen elements modulo P. For each integer from 0 to P-1, we need to find how many of the generated sequences try it, modulo 998244353.
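For tiny inputs the required counts can be computed directly, which gives something to test a faster solution against. The sketch below is only an illustration and not the setter's method: the helper name `slow_counts` is hypothetical, elements are assumed non-negative, and it runs a plain DP over residues in O(K * P * N) time, far too slow for the real constraints.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

const std::int64_t MOD = 998244353;

// counts[r] = number of length-k sequences over `a` whose product is r (mod p),
// reduced modulo MOD. Hypothetical reference helper for small inputs only.
std::vector<std::int64_t> slow_counts(const std::vector<int>& a, int p, int k) {
    assert(p >= 1 && k >= 0);
    std::vector<std::int64_t> counts(p, 0);
    counts[1 % p] = 1;  // the empty product is 1
    for (int step = 0; step < k; ++step) {
        std::vector<std::int64_t> next(p, 0);
        for (int r = 0; r < p; ++r) {
            if (counts[r] == 0) continue;
            for (int x : a) {
                // Append element x to every sequence whose product is r.
                int to = static_cast<int>(1LL * r * (x % p) % p);
                next[to] = (next[to] + counts[r]) % MOD;
            }
        }
        counts = next;
    }
    return counts;
}
```

For A = {0, 1, 2}, P = 3 and K = 2 this gives 5 sequences with product 0 and 2 each with products 1 and 2, which adds up to N^K = 9.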

QUICK-EXPLANATION:

Key to AC - the All Possible Sums technique (FFT) can be used if we change multiplication to addition!

We need to handle the case of 0 separately. Let's use a frequency array F to store the frequency of elements. Let F[0] denote the frequency of 0. The answer for 0 is N^K-(N-F[0])^K.

Now we need to find the answer for all elements from 1 to P-1. For this, we first transform the problem. Let G be a primitive root modulo P. Then every number from 1 to P-1 can be written as some power of G.

Thus, transform the array A such that A_i is replaced by y_i, where G^{y_i}=A_i. The product A_{i_1}*A_{i_2}*...*A_{i_K} \% P now becomes G^{(y_{i_1}+y_{i_2}+...+y_{i_K}) \% (P-1)} \% P.

Given that any power y (0 \leq y \leq P-2) of G (i.e. G^y) corresponds to a unique element from 1 to P-1, the problem reduces to a modular version of the “All Possible Sums” problem: out of the given y_i (obtained from array A), in how many ways can we choose K of them to get a (modular) sum of r, for each r from 0 to P-2.

To solve above, we go by method of All Possible Sums. We make a polynomial P(x) such that:

The zeroth coefficient (we are using this term for convenience) corresponds to x^0. Similarly, the first coefficient corresponds to x^1. In general, the i'th coefficient corresponds to x^i.
The value of the i'th coefficient is the frequency of i in array A (after the transformation).

Now, since we choose K elements, our answer lies in the K'th power of this polynomial. To find it, we do the following:

K is large, so we cannot multiply the polynomial K times. Instead, we use the same Fast Exponentiation algorithm - just this time on polynomials instead of numbers! In other words, we calculate P(x), P(x)^2, P(x)^4… and so on and combine them to find P(x)^K using only O(\log K) polynomial multiplications.
The higher powers will have too many terms! But hey, we are interested in the modular sum, not the normal sum. Take any power, say x^i where i \geq P-1. The coefficient of x^i represents in how many ways we can get i by summing. But we are only interested in values from 1 to P-1 (we calculated the answer for 0 already), and hence in exponents from 0 to P-2. So instead of keeping a separate term and manipulating the result at the end, we add the coefficient of x^i to that of x^{i \% (P-1)} and discard x^i. Hence the number of terms is always \leq P-1.

The only thing left is the trivial manipulation of mapping each exponent y back to the value G^y \% P it represents, and printing the result.
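The final mapping step can be sketched as follows. This is an illustration under assumptions: the function name `exponents_to_residues` and its parameters are hypothetical, `ways[y]` is taken to be the already computed count for exponent sum y modulo P-1, and `g` is assumed to be a primitive root of `p`.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// ways[y] = number of sequences whose exponent sum is y modulo p - 1.
// Returns answer[v] for every residue v in [0, p-1]; answer[0] is the
// separately computed zero count. `g` is assumed to be a primitive root of p.
std::vector<std::int64_t> exponents_to_residues(const std::vector<std::int64_t>& ways,
                                                int p, int g, std::int64_t zero_count) {
    assert(p >= 2 && static_cast<int>(ways.size()) == p - 1);
    std::vector<std::int64_t> answer(p, 0);
    answer[0] = zero_count;
    long long value = 1;  // g^0
    for (int y = 0; y < p - 1; ++y) {
        answer[value] = ways[y];  // exponent y stands for the residue g^y mod p
        value = value * g % p;
    }
    return answer;
}
```

Walking through the powers of G in order avoids storing a separate inverse table: the y'th power visited is exactly the residue that exponent y represents.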

EXPLANATION:

The editorial will have following sections-

Transforming the problem from Product to Sum.
Bringing in Polynomial - Their interpretation, how they can help and the challenges.
Fast Exponentiation on above polynomial to get the answer

Mostly, the detailed explanation is aimed towards not-the-top-star-coders because the top coders already got everything they needed in the Quick Explanation Section. In case any doubt remains, they are requested to just skim through relevant sections.

There are some concept-checks in the editorial - their solutions are also in the bonus corner.

Transforming the problem from Product to Sum

This was the first step to solve this problem. And there are some things we need to realize.

For any problem related to product or divisions, the case of 0 has to be - almost always - either handled separately or seen with care.

So going by that norm, we will handle it separately. Number of times 0 is used as a password can be easily calculated as we only need count of sequences with at least 1 zero. This can be easily found as-

Count Sequences with at least one zero = Total sequences - Count of Sequences with no zeroes at all.

Let F[0] be the frequency of 0 in A. There are N-F[0] non-zero elements, so (N-F[0])^K sequences contain no zero at all. Hence, the answer for 0 becomes N^K-(N-F[0])^K. If you do not get how, it's given in the bonus corner.
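A minimal sketch of this count, assuming non-negative elements and counting every element divisible by P as a zero. The names `pow_mod` and `count_zero_products` are hypothetical helpers for illustration, not taken from the setter's code.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

const std::int64_t MOD = 998244353;

// Computes base^exp modulo MOD by repeated squaring.
std::int64_t pow_mod(std::int64_t base, std::int64_t exp) {
    assert(exp >= 0);
    std::int64_t result = 1;
    base %= MOD;
    if (base < 0) base += MOD;
    while (exp > 0) {
        if (exp & 1) result = result * base % MOD;
        base = base * base % MOD;
        exp >>= 1;
    }
    return result;
}

// Number of length-k sequences over `a` whose product is 0 modulo the prime p:
// all sequences minus those that avoid every zero element.
std::int64_t count_zero_products(const std::vector<int>& a, int p, std::int64_t k) {
    std::int64_t n = static_cast<std::int64_t>(a.size());
    std::int64_t zeros = 0;
    for (int x : a)
        if (x % p == 0) ++zeros;
    // Add MOD before the final reduction so the difference is never negative.
    return ((pow_mod(n, k) - pow_mod(n - zeros, k)) % MOD + MOD) % MOD;
}
```

For A = {0, 1, 2}, P = 3, K = 2 this gives 3^2 - 2^2 = 5.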

Concept Check - Why are we not using the nCk formula here?

Now, we need the solution for passwords from 1 to P-1.

Now, obviously the product of K terms is a difficult thing to handle. Not only does it overflow the standard data type range - even using Big Integer etc. is going to be very costly, especially for K \approx 10^9.

The most common trick we apply in these cases is to somehow change multiplication to addition. For different questions, it's done in different ways, like taking logs. For this question, taking real-valued logs is not suitable as we are doing modular arithmetic, hence we use the other concept - Generator or Primitive Root. (refer to pre-requisites and bonus corner for more)

Using the methods given in the pre-requisite links, we can find a Primitive Root/Generator G for the set. Note that when I say set, I mean the set of elements from 1 to P-1 under multiplication modulo P. Now, every non-zero element in A can be represented as some power of G. In other words, we replace A_i with y_i such that G^{y_i}=A_i.

Now, the product A_{i_1}*A_{i_2}*...*A_{i_K} \% P turns into a sum (of powers) of G. That is, A_{i_1}*A_{i_2}*...*A_{i_K} \% P becomes G^{(y_{i_1}+y_{i_2}+...+y_{i_K}) \% (P-1)} \% P.
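One straightforward way to obtain G and the exponents y_i is sketched below. It is a brute-force illustration with hypothetical helper names (`find_primitive_root`, `discrete_log_table`), assuming P is a small prime; it simply measures the multiplicative order of each candidate rather than using the faster factorization-based test from the pre-requisites.

```cpp
#include <cassert>
#include <vector>

// Returns the smallest primitive root of the prime p by brute force:
// g is a primitive root exactly when its multiplicative order is p - 1.
int find_primitive_root(int p) {
    assert(p >= 2);
    if (p == 2) return 1;
    for (int g = 2; g < p; ++g) {
        int order = 1;
        for (long long cur = g; cur != 1; cur = cur * g % p) ++order;
        if (order == p - 1) return g;
    }
    return -1;  // not reached when p is prime
}

// dlog[v] = y with g^y = v (mod p) for v in [1, p-1]; dlog[0] stays -1
// because 0 is not a power of g.
std::vector<int> discrete_log_table(int p, int g) {
    std::vector<int> dlog(p, -1);
    long long cur = 1;  // g^0
    for (int y = 0; y < p - 1; ++y) {
        dlog[cur] = y;
        cur = cur * g % p;
    }
    return dlog;
}
```

For P = 7 the smallest primitive root is 3, with 3^2 = 2 and 3^3 = 6 modulo 7. The product 2 * 6 = 12 is 5 modulo 7, and the exponents add up to 2 + 3 = 5, where 3^5 is indeed 5 modulo 7.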

Concept check - Why are the powers modulo P-1 instead of P?

Now, our problem transforms from “Finding how many times the product of K elements equals i, for all i in [1,P-1]” to “Finding in how many ways the sum (y_1+y_2+...+y_K) \% (P-1) yields c, for all c \in [0,P-2]”. (Note that the modulus of the power is P-1, and each power c from 0 to P-2 represents a distinct integer G^c \% P in the range [1,P-1].)

Bringing in the Polynomial - Their interpretation and how they can help.

Our problem is now reduced to finding in how many ways we can choose K elements from our transformed A such that their modular sum is c, for all 0 \leq c \leq P-2. This can be solved using the All Possible Sums algorithm.

The original problem says: given 2 arrays, find what the possible sums are and in how many ways each can be generated. Seems pretty similar! The only differences are these:

All Possible Sums (given in the FFT pre-requisite link) does it for 2 arrays, and assumes we want all possible sums if we choose one element from each.
It can be easily generalized. We know that all our elements are to be chosen from A. This is similar to having K copies of array A and having to choose one element from each. This is why you will see we go for the K'th power of the frequency polynomial.

To begin with solving the problem, we shall make a frequency polynomial. This is a polynomial where i'th coefficient (corresponding to power x^i) has value F[i] where F[i] is frequency of i in A. Obviously, our polynomial has 0-based indexing (zeroth coefficient corresponds to x^0). Note that this is done in the modified/transformed array A where A_i is replaced by y_i such that G^{y_i}=A_i. (Concept check - Why does this polynomial work? This also deals with how it helps and is at bonus corner)
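Building the frequency polynomial could look like the sketch below. The function name `frequency_polynomial` is hypothetical, elements are assumed non-negative, and `g` is assumed to be a primitive root of the prime `p`; elements divisible by `p` are skipped because the zero case is counted separately.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Returns freq of size p - 1, where freq[i] counts the elements of `a`
// that are congruent to g^i modulo p.
std::vector<std::int64_t> frequency_polynomial(const std::vector<int>& a, int p, int g) {
    assert(p >= 2);
    // dlog[v] = y with g^y = v (mod p); -1 marks "not a power of g".
    std::vector<int> dlog(p, -1);
    long long value = 1;
    for (int y = 0; y < p - 1; ++y) {
        dlog[value] = y;
        value = value * g % p;
    }
    std::vector<std::int64_t> freq(p - 1, 0);
    for (int x : a) {
        int r = x % p;
        if (r == 0) continue;  // zeros are handled separately
        assert(dlog[r] >= 0);  // fails if g is not a primitive root
        ++freq[dlog[r]];
    }
    return freq;
}
```

With P = 7, G = 3 and A = {1, 2, 2, 3, 0}, the exponents are 0, 2, 2 and 1 (the 0 is skipped), so the polynomial is 1 + x + 2x^2.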

As already justified above, our answer shall lie in the K'th power of this polynomial. But there are 2 challenges we face here:

K is very large. Multiplying the polynomial by itself K-1 times, even with FFT, takes far too long for K \approx 10^9.
The number of terms is going to kill us as well. The frequency polynomial has degree at most P-2, so its K'th power can have up to K(P-2)+1 terms; for K \approx 10^9 that is hopeless.

We will resolve these difficulties in the next section.

Overcoming Challenges using our old and friendly Fast Exponentiation Trick

We had 2 challenges in the previous section. One pertained to K being very high, which makes calculating the K'th power of the polynomial very time-consuming. The other pertained to the K'th power having so many terms that memory and time limits get exceeded.

You will see that both of these are actually handled by very basic fundamentals.

First, let's deal with the high number of terms. It is a legitimate question: how can we solve the problem with this method if the number of terms is of the order of 10^9 or more? The answer is simple - we are interested in the modular sum and not the normal sum! All Possible Sums solves for the normal sum, but the modular version needs far fewer terms. Strictly speaking, since the sums we are finding are powers of G, the modulus is P-1 rather than P, but you get the flow!

Now, what happens when terms increase? Terms increase because we got a new power of x, say x^i where i \geq (P-1). Note that, we are interested in finding how many ways we can select (y_1,y_2,...,y_k) and get a value c where 0 \leq c \leq P-2. So we know that ultimately the number of ways to get i (which is the coefficient of x^i) are to be added to number of ways to get i \% (P-1) to get the final answer (because we are interested in modular sum and not normal sum). So why not do that now?

All that I’m saying is, whenever we get a term x^i where i \geq P-1, we simply add its contribution to x^{i \% (P-1)} and discard the term x^i. This keeps number of terms at any time O(P) and the concept behind the polynomial stays intact!
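The folding rule can be shown with a deliberately naive O(m^2) multiplication, where m = P-1 is the number of kept terms. This is only an illustration with a hypothetical name (`cyclic_multiply`); a real solution multiplies with FFT/NTT first and folds afterwards. Coefficients are assumed to already lie in [0, 998244353).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

const std::int64_t MOD = 998244353;

// Multiplies two polynomials whose exponents live modulo m = a.size():
// the term x^(i+j) is folded onto x^((i+j) % m). Coefficients are reduced
// modulo MOD.
std::vector<std::int64_t> cyclic_multiply(const std::vector<std::int64_t>& a,
                                          const std::vector<std::int64_t>& b) {
    assert(!a.empty() && a.size() == b.size());
    std::size_t m = a.size();
    std::vector<std::int64_t> c(m, 0);
    for (std::size_t i = 0; i < m; ++i) {
        if (a[i] == 0) continue;
        for (std::size_t j = 0; j < m; ++j) {
            std::size_t idx = (i + j) % m;  // fold the exponent back into [0, m)
            c[idx] = (c[idx] + a[i] * b[j]) % MOD;
        }
    }
    return c;
}
```

For example with m = 3, (1 + 2x + 3x^2)^2 = 1 + 4x + 10x^2 + 12x^3 + 9x^4 folds to 13 + 13x + 10x^2; the coefficients still sum to 36 = 6^2, so no way of choosing is lost.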

Now all that we need to handle is finding the K'th power. This can be easily done using fast exponentiation trick - except this time on polynomials instead of numbers! Let P(x) be our frequency polynomial. We will calculate P(x), P(x)^2, P(x)^4, P(x)^8, … and so on to finally obtain P(x)^K using their combination. The multiplication happens as described earlier (to keep number of terms in check).
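Fast exponentiation on folded polynomials can be sketched as below. The names `cyclic_multiply` and `cyclic_power` are hypothetical, and the multiplication is the naive O(m^2) version for readability, so this shows the structure rather than meeting the time limit; swapping in an NTT-based multiply followed by folding gives the intended complexity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

const std::int64_t MOD = 998244353;

// Naive product with exponents folded modulo m = a.size().
std::vector<std::int64_t> cyclic_multiply(const std::vector<std::int64_t>& a,
                                          const std::vector<std::int64_t>& b) {
    assert(!a.empty() && a.size() == b.size());
    std::size_t m = a.size();
    std::vector<std::int64_t> c(m, 0);
    for (std::size_t i = 0; i < m; ++i)
        for (std::size_t j = 0; j < m; ++j) {
            std::size_t idx = (i + j) % m;
            c[idx] = (c[idx] + a[i] * b[j]) % MOD;
        }
    return c;
}

// Raises `base` to the k'th power with O(log k) folded multiplications.
// Coefficients of `base` are assumed to lie in [0, MOD).
std::vector<std::int64_t> cyclic_power(std::vector<std::int64_t> base, std::int64_t k) {
    assert(!base.empty() && k >= 0);
    std::vector<std::int64_t> result(base.size(), 0);
    result[0] = 1;  // the constant polynomial 1 is the multiplicative identity
    while (k > 0) {
        if (k & 1) result = cyclic_multiply(result, base);  // use this bit of k
        k >>= 1;
        if (k > 0) base = cyclic_multiply(base, base);      // next power of two
    }
    return result;
}
```

As a small check, (1 + x)^3 = 1 + 3x + 3x^2 + x^3 folds with m = 3 to 2 + 3x + 3x^2.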

So if you’ve understood till here, pat yourself on the back because you are now well equipped to solve this problem. You will find the setter’s code much easier to follow now - just know that he has used an algebra library (which you should as well) instead of writing the code from scratch. So just trust those function names and read the code with a pinch of abstraction.

SOLUTION

Setter

// In The Name Of The Queen
#include<bits/stdc++.h>
using namespace std;
const int LG = 18, N = 1 << LG, Mod = 998244353, LGK = 30;
inline int Power(int a, int b)
{
    int ret = 1;
    for (; b; b >>= 1, a = 1LL * a * a % Mod)
        if (b & 1) ret = 1LL * ret * a % Mod;
    return (ret);
}
 
// ================================================================
const int MOD = 998244353;
 
struct mod_int {
    int val;
 
    mod_int(long long v = 0) {
        if (v < 0) v = v % MOD + MOD;
        if (v >= MOD) v %= MOD;
        val = v;
    }
 
    static int mod_inv(int a, int m = MOD) {
        // https://en.wikipedia.org/wiki/Extended_Euclidean_algorithm#Example
        int g = m, r = a, x = 0, y = 1;
 
        while (r != 0) {
            int q = g / r;
            g %= r; swap(g, r);
            x -= q * y; swap(x, y);
        }
 
        return x < 0 ? x + m : x;
    }
 
    explicit operator int() const {
        return val;
    }
 
    mod_int& operator+=(const mod_int &other) {
        val += other.val;
        if (val >= MOD) val -= MOD;
        return *this;
    }
 
    mod_int& operator-=(const mod_int &other) {
        val -= other.val;
        if (val < 0) val += MOD;
        return *this;
    }
 
    static unsigned fast_mod(uint64_t x, unsigned m = MOD) {
#if !defined(_WIN32) || defined(_WIN64)
        return x % m;
#endif
        // Optimized mod for Codeforces 32-bit machines.
        // x must be less than 2^32 * m for this to work, so that x / m fits in a 32-bit integer.
        unsigned x_high = x >> 32, x_low = (unsigned) x;
        unsigned quot, rem;
        asm("divl %4\n"
            : "=a" (quot), "=d" (rem)
            : "d" (x_high), "a" (x_low), "r" (m));
        return rem;
    }
 
    mod_int& operator*=(const mod_int &other) {
        val = fast_mod((uint64_t) val * other.val);
        return *this;
    }
 
    mod_int& operator/=(const mod_int &other) {
        return *this *= other.inv();
    }
 
    friend mod_int operator+(const mod_int &a, const mod_int &b) { return mod_int(a) += b; }
    friend mod_int operator-(const mod_int &a, const mod_int &b) { return mod_int(a) -= b; }
    friend mod_int operator*(const mod_int &a, const mod_int &b) { return mod_int(a) *= b; }
    friend mod_int operator/(const mod_int &a, const mod_int &b) { return mod_int(a) /= b; }
 
    mod_int& operator++() {
        val = val == MOD - 1 ? 0 : val + 1;
        return *this;
    }
 
    mod_int& operator--() {
        val = val == 0 ? MOD - 1 : val - 1;
        return *this;
    }
 
    mod_int operator++(int) { mod_int before = *this; ++*this; return before; }
    mod_int operator--(int) { mod_int before = *this; --*this; return before; }
 
    mod_int operator-() const {
        return val == 0 ? 0 : MOD - val;
    }
 
    bool operator==(const mod_int &other) const { return val == other.val; }
    bool operator!=(const mod_int &other) const { return val != other.val; }
 
    mod_int inv() const {
        return mod_inv(val);
    }
 
    mod_int pow(long long p) const {
        assert(p >= 0);
        mod_int a = *this, result = 1;
 
        while (p > 0) {
            if (p & 1)
                result *= a;
 
            a *= a;
            p >>= 1;
        }
 
        return result;
    }
 
    friend ostream& operator<<(ostream &stream, const mod_int &m) {
        return stream << m.val;
    }
};
 
namespace NTT {
    vector<mod_int> roots = {0, 1};
    vector<int> bit_reverse;
    int max_size = -1;
    mod_int root;
 
    bool is_power_of_two(int n) {
        return (n & (n - 1)) == 0;
    }
 
    int round_up_power_two(int n) {
        while (n & (n - 1))
            n = (n | (n - 1)) + 1;
 
        return max(n, 1);
    }
 
    // Given n (a power of two), finds k such that n == 1 << k.
    int get_length(int n) {
        assert(is_power_of_two(n));
        return __builtin_ctz(n);
    }
 
    // Rearranges the indices to be sorted by lowest bit first, then second lowest, etc., rather than highest bit first.
    // This makes even-odd div-conquer much easier.
    void bit_reorder(int n, vector<mod_int> &values) {
        if ((int) bit_reverse.size() != n) {
            bit_reverse.assign(n, 0);
            int length = get_length(n);
 
            for (int i = 0; i < n; i++)
                bit_reverse[i] = (bit_reverse[i >> 1] >> 1) + ((i & 1) << (length - 1));
        }
 
        for (int i = 0; i < n; i++)
            if (i < bit_reverse[i])
                swap(values[i], values[bit_reverse[i]]);
    }
 
    void find_root() {
        max_size = 1 << __builtin_ctz(MOD - 1);
        root = 2;
 
        // Find a max_size-th primitive root of MOD.
        while (!(root.pow(max_size) == 1 && root.pow(max_size / 2) != 1))
            root++;
    }
 
    void prepare_roots(int n) {
        if (max_size < 0)
            find_root();
 
        assert(n <= max_size);
 
        if ((int) roots.size() >= n)
            return;
 
        int length = get_length(roots.size());
        roots.resize(n);
 
        // The roots array is set up such that for a given power of two n >= 2, roots[n / 2] through roots[n - 1] are
        // the first half of the n-th primitive roots of MOD.
        while (1 << length < n) {
            // z is a 2^(length + 1)-th primitive root of MOD.
            mod_int z = root.pow(max_size >> (length + 1));
 
            for (int i = 1 << (length - 1); i < 1 << length; i++) {
                roots[2 * i] = roots[i];
                roots[2 * i + 1] = roots[i] * z;
            }
 
            length++;
        }
    }
 
    void fft_iterative(int N, vector<mod_int> &values) {
        assert(is_power_of_two(N));
        prepare_roots(N);
        bit_reorder(N, values);
 
        for (int n = 1; n < N; n *= 2)
            for (int start = 0; start < N; start += 2 * n)
                for (int i = 0; i < n; i++) {
                    mod_int even = values[start + i];
                    mod_int odd = values[start + n + i] * roots[n + i];
                    values[start + n + i] = even - odd;
                    values[start + i] = even + odd;
                }
    }
 
    const int FFT_CUTOFF = 150;
 
    vector<mod_int> mod_multiply(vector<mod_int> left, vector<mod_int> right) {
        int n = left.size();
        int m = right.size();
 
        // Brute force when either n or m is small enough.
        if (min(n, m) < FFT_CUTOFF) {
            const uint64_t ULL_BOUND = numeric_limits<uint64_t>::max() - (uint64_t) MOD * MOD;
            vector<uint64_t> result(n + m - 1, 0);
 
            for (int i = 0; i < n; i++)
                for (int j = 0; j < m; j++) {
                    result[i + j] += (uint64_t) ((int) left[i]) * ((int) right[j]);
 
                    if (result[i + j] > ULL_BOUND)
                        result[i + j] %= MOD;
                }
 
            for (uint64_t &x : result)
                if (x >= MOD)
                    x %= MOD;
 
            return vector<mod_int>(result.begin(), result.end());
        }
 
        int N = round_up_power_two(n + m - 1);
        left.resize(N);
        right.resize(N);
 
        bool equal = left == right;
        fft_iterative(N, left);
 
        if (equal)
            right = left;
        else
            fft_iterative(N, right);
 
        mod_int inv_N = mod_int(N).inv();
 
        for (int i = 0; i < N; i++)
            left[i] *= right[i] * inv_N;
 
        reverse(left.begin() + 1, left.end());
        fft_iterative(N, left);
        left.resize(n + m - 1);
        return left;
    }
 
    vector<mod_int> mod_power(const vector<mod_int> &v, int exponent) {
        assert(exponent >= 0);
        vector<mod_int> result = {1};
 
        if (exponent == 0)
            return result;
 
        for (int k = 31 - __builtin_clz(exponent); k >= 0; k--) {
            result = mod_multiply(result, result);
 
            if (exponent >> k & 1)
                result = mod_multiply(result, v);
        }
 
        return result;
    }
 
    vector<mod_int> mod_multiply_all(const vector<vector<mod_int>> &polynomials) {
        if (polynomials.empty())
            return {1};
 
        struct compare_size {
            bool operator()(const vector<mod_int> &x, const vector<mod_int> &y) {
                return x.size() > y.size();
            }
        };
 
        priority_queue<vector<mod_int>, vector<vector<mod_int>>, compare_size> pq(polynomials.begin(), polynomials.end());
 
        while (pq.size() > 1) {
            vector<mod_int> a = pq.top(); pq.pop();
            vector<mod_int> b = pq.top(); pq.pop();
            pq.push(mod_multiply(a, b));
        }
 
        return pq.top();
    }
}
// ================================================================
 
int n, k, P, A[N], Res[N], MP[N];
int main()
{
    scanf("%d%d%d", &n, &P, &k);
    for (int i = 0; i < n; i ++)
        scanf("%d", &A[i]);
    sort(A, A + n);
    reverse(A, A + n);
    int Zero = 0;
    while (n && !A[n - 1])
        Zero ++, n --;
    Res[0] = (Power(n + Zero, k) - Power(n, k) + Mod) % Mod;
 
    if (P == 2)
        return !printf("%d %d\n", Res[0], Power(n, k));
 
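    // Find a primitive root G of P: try odd candidates until one has order P - 1.
    int G = 3;
    while(P > 3)
    {
        int c = 1; // multiplicative order of G modulo P
        for (int a = G; a != 1; a = a * G % P)
            c ++;
        if (c == P - 1)
            break;
        G += 2;
    }
    if (P == 3)
        G = 2;
 
    // MP[a] = y such that G^y = a (mod P); MP[1] stays 0 from global initialization.
    int c = 1;
    for (int a = G; a != 1; a = a * G % P)
        MP[a] = c, c = (c + 1) % P;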
 
    int lenp = 1;
    while (lenp < P * 2 - 3)
        lenp <<= 1;
    mod_int rlenp = Power(lenp, Mod - 2);
 
    vector < mod_int > F(lenp, 0), R(lenp, 0);
    for (int i = 0; i < n; i ++)
        F[MP[A[i]]] ++;
    R[0] = 1;
 
    // Binary exponentiation over the bits of k: F holds the folded P(x)^(2^i), R accumulates P(x)^k.
    for (int i = 0; i <= LGK; i ++, k >>= 1)
    {
        NTT::fft_iterative(lenp, F);
        if (k & 1)
        {
            NTT::fft_iterative(lenp, R);
            for (int j = 0; j < lenp; j ++)
                R[j] *= F[j] * rlenp;
            reverse(R.begin() + 1, R.end());
            NTT::fft_iterative(lenp, R);
            for (int j = 0; j < P - 1; j ++)
                R[j] += R[j + P - 1];
            for (int j = P - 1; j < lenp; j ++)
                R[j] = 0;
        }
        if (i == LGK)
            break;
        for (int j = 0; j < lenp; j ++)
            F[j] *= F[j] * rlenp;
        reverse(F.begin() + 1, F.end());
        NTT::fft_iterative(lenp, F);
        for (int j = 0; j < P - 1; j ++)
            F[j] += F[j + P - 1];
        for (int j = P - 1; j < lenp; j ++)
            F[j] = 0;
    }
 
    for (int i = 0, a = 1; i < P - 1; i ++, a = a * G % P)
        Res[a] = R[i].val;
    for (int i = 0; i < P; i ++)
        printf("%d ", Res[i]);
    return 0;
}

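Time Complexity=O(P \log P \log K) (because the frequency polynomial is of size P, not N)
Space Complexity=O(P \log K) if every squared power P(x)^{2^i} is stored; the setter's code above keeps only the current power and the running result, which needs O(P).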