[Tutorial] Disjoint Sparse Table


Hi! Since I couldn’t find a tutorial on this data structure and I decided to make one myself. I don’t know if this data structure has a name so I am just using the name I saw it being referred to on codeforces i.e. Disjoint Sparse Table or DST.

So before diving into its structure and working let’s talk what it supports and when can it be used

Suppose you are given an array A of size N and you are asked to perform Q queries.
Each query asks you to compute some function F over subarray [L, R] i.e. F(A_L, A_{L+1}, \cdots, A_R )

Now with a sparse table, you can answer these queries in \mathcal{O}(\ln N) with \Theta(N \ln N) preprocessing.

Let N' = 2^{\lceil \ln N \rceil}. Then with disjoint sparse table we can answer the queries in \Theta (1) with \Theta(N' \ln N') preprocessing.


  • Here ln means base 2.
  • Array A is immutable
  • Function F is
  • If F is
    then normal sparse table can also
    answer queries in \Theta(1)

As you can see you won’t need to use this DS in most problems. As problems with immutable data and with tight time constraints are very rare.


Its structure is same as a segment tree. Each node stores information related to a range of indexes [L, R).

visual representation of DST

Pardon my sub-par MS paint skills.
**Note: It should be depth not height in the image**

Now consider a node corresponding to [L, R) and the middle index M = \frac{L+R}{2}.
This node stores precomputed values of

F(A[i \ldots M]) \mid i \in [L, M] \text{ and}
F(A[M+1 \ldots i]) \mid i \in [M+1, R)

So size of node is equal to R-L. If its size>1 then it has two children corresponding to [L, M) and [M, R).


Now to build a node corresponding to [L, R). We first compute all the required precomputed values in \Theta(R-L) and recursively build its child nodes.

Let the time complexity to build a tree with root of size X be T(X) then

T(X) = 2 T(^X/_2) + \Theta(X)
\Rightarrow T(X) = \Theta(X \ln X)

So Time complexity to build a DST is equal to \Theta(N \ln N).


We can answer Queries recursively similar to how we do in segment tree. Let us say we are in node [L, R) and we need to answer query corresponding to [X, Y]; then

if X \le M \le Y then we an answer in \Theta(1) provided we can merge answers of [X, M] and [M+1, Y] in \Theta(1).

else the range [X, Y] is completely in one of the child node ranges and we move the query downwards to that node.

So Time Complexity for querying would be \mathcal{O}(\ln N). Note that worst case happen in cases like X=Y= \frac {N}{2} - 1.

Improving Time Complexity

Let’s increase the size of array A to N' = 2^{\lceil \ln N \rceil} i.e. till its size becomes a power of 2. Note that this wouldn’t change the answer to any query.Let x =\lceil \ln N \rceil.

Now consider the j^{th} node (0-based indexing from left) at depth i. It’s size = \frac{N'}{2^i}. It would correspond to \left[ \frac{N'}{2^i}\times j, \frac{N'}{2^i}\times (j+1)\right). Its M = j \times \frac{N'}{2^{i}} + \frac{N'}{2^{i+1}}.

Consider M’s binary representation. It would be of the form

\underbrace{m_1m_2m_3m_4m_5 \ldots m_k} _ {\text{binary representation of j}}1\underbrace{00000 \ldots0 0000} _ {x-i-1 \text{ zeroes}}

Now lets say we are asked to answer a query [L, R]. Removing L=R trivial case we can say that L < R and binary representation of L and R would be of the form:

\begin{matrix} L& = b_0b_1b_3 \ldots b_{k_1}& 0 & l_1 & l_2 & \ldots & l_{k_2} \\ R& = b_0b_1b_3 \ldots b_{k_1}& 1 & r_1 & r_2 & \ldots & r_{k_2} \\ \hline M& = b_0b_1b_3 \ldots b_{k_1}& 1 & 0 & 0 & \ldots & 0 \end{matrix}

Note that k_2 = \text{word size } - \text{number of leading zeroes of }L \oplus R - 1.
It can be seen that L < M \le R. Furthermore it can be verified that a node at depth = x - k_2 - 1 has M as its middle index. Now it remains to verify that the bounds of that node say [X, Y] fully contain the interval [L, R].

One can see that binary representation of X and Y would be as follows

\begin{matrix} M& = b_0b_1b_3 \ldots b_{k_1}& 1 & 0 & 0 & \ldots & 0 \\ X& = b_0b_1b_3 \ldots b_{k_1}& 0 & 0 & 0 & \ldots & 0 \\ Y& = b_0b_1b_3 \ldots b_{k_1}& 1 & 1 & 1 & \ldots & 1 \end{matrix}

Thus X \le L < R \le Y.

Considering that the microprocessor has BSR(Bit Scan Reverse) operation we can determine the node, that can answer the query in \Theta(1), in \Theta(1).

So now each query can be answered in \Theta(1).


We can see that each level sum of sizes of nodes is equal to N. So We can simply use a 2 dimensional array to store DST. I will explain with an example:

Solution to range product query:

Suppose we are given an array an array A and a number P and we are asked to answer Q queries:
given L and R, 0 \le L \le R < N, find \prod\limits_{i=L}^R A_i \pmod P.

Global variables and constants:

#define MAXN 1000000
#define MAXPOWN 1048576		// 2^(ceil(log_2(MAXN)))
#define MAXLEV 21			// ceil(log_2(MAXN)) + 1
int n, P, Q;
int table[MAXLEV][MAXPOWN];
int maxlev, size;

Then some initialisations:

size = n;
maxlev = __builtin_clz(n) ^ 31;		// floor(log_2(n))
if( (1<<maxlev) != n)
    size = 1<<++maxlev;

Then the build function(the preprocessing part):

void build(int level=0,int l=0, int r=size)
  int m = (l+r)/2;
  table[level][m] = A[m]%P;
  for(int i=m-1;i>=l;i--)
    table[level][i] = (long long)table[level][i+1] * A[i] % P;
  if(m+1 < r) {
    table[level][m+1] = A[m+1]%P;
    for(int i=m+2;i<r;i++)
      table[level][i] = (long long)table[level][i-1] * A[i] % P;
  if(l + 1 != r)		// r - l > 1
    build(level+1, l, m);
    build(level+1, m, r);

We call build() to initialize the DST.

Now to answer query [X, Y] we use

int query(int x, int y)
  if(x == y)
    return A[x]%P;
  int k2 = __builtin_clz(x^y) ^ 31;
  int lev = maxlev - 1 - k2;
  int ans = table[lev][x];
  if(y & ((1<<k2) - 1))	// y % (1<<k2)
    ans = (long long)ans *  table[lev][y] % P;
  return ans;


  • I assume that size of int is 32 bits
  • __builtin_clz() is an inbuilt function in gcc compiler(not in C++ standard) which returns the count of leading zeroes(hence the name)
  • 31 - num = 31 ^ num. This this true for any number of the form 2^x-1.

You can try solving SEGPROG with this DS.



Nice tutorial! The explanation of how it works was clear :slight_smile:
The structure can also be built bottom-up, easier to code that way. Here’s what I came up with.

int n = 1; while(n < N) n <<= 1;
int range, half;
for(int h = 1; (range = 1 <> 1;
    for(int i = half; i < n; i += range) {
        t[h][i - 1] = A[i - 1];
        for(int j = i - 2; j >= i - half; j--)
            t[h][j] = t[h][j + 1] * A[j] % P;
        t[h][i] = A[i];
        for(int j = i + 1; j < i + half; j++)
            t[h][j] = t[h][j - 1] * A[j] % P;

Note that h is height, makes more sense to use height than depth when building this way. And for queries

if(L == R) result = A[L];
else {
    int h = number_of_bits_in_int - __builtin_clz(L ^ R);
    result = t[h][L] * t[h][R] % P;

Additionaly, if a BSR operation is not available there are some alternate ways get calculate the position of highest set bit fast using bitwise operations. You can find a few such ways here.


i didn’t get how query get answered in o(1) please help

Anyone know how to fix \underbrace latex here?

What needs fixing? Looks good to me!

Very well written tutorial! Great work

@meooow I replaced it with an image. lets see if it works here \underbrace{m_1m_2m_3m_4m_5 \ldots m_k}_{\text{binary representation of j}}1\underbrace{00000 \ldots0 0000}_{x-i-1 \text{ zeroes}}

@adhyyan1252 Thank you!

Your implementation is much cleaner! Storing prefixes till M-1 helps us avoid if-condition in query. In using height instead of depth we no longer need to store x.

Also here h = \lfloor \ln {(L \oplus R)} \rfloor + 1. Which can also be computed in O(\ln(\ln N)) ~5 operations with precomputation/look up table.

Oh I didn’t notice it was an image. I messed around with the formatting and apparently if you have more than one underbrace on the same line it shows up incorrectly like that. But you can avoid it, just put spaces before and after the underscore like this:
\underbrace{abc} _ {def} ghi \underbrace{jkl} _ {mno} becomes \underbrace{abc} _ {def} ghi \underbrace{jkl} _ {mno}


@meooow, there is a inbuilt function to compute highest bit in O(1). It is __lg(i). Basically it returns \lfloor{log_2 n}\rfloor, which would also be the highest bit in number. Think about this.

@nilesh3105 yes, the one of the methods in the bithacks link does exactly that :smiley:

@likecs I didn’t know that, thanks! Although according to the source here, int __lg(int __n) is implemented as
sizeof(int) * __CHAR_BIT__ - 1 - __builtin_clz(__n)
which is really the same thing, but with one extra multiplication.

Thanks. Fixed.

@likecs __builtin prefixed functions translate directly to the appropriate assembly instruction(s). Provided the processor has them. __lg() function is a wrapper function.

Maybe when i get time i will analyze assembler output for different implementations.

For any L and R of a query we can compute M, X, Y in \Theta(1)(Using the binary representation given in the post).
These 3 numbers are sufficient to recognize the node that will answer the query in \Theta(1)(using the precalculated prefixes and suffixes).
Assuming that in your implementation you can access that node in \Theta(1) we can answer the query in \Theta(1)

sorry to disturb you again but can you please explain me how that representation came ? and still i didn’t understand how that query is get answered through that single node (we can get prefix or suffix but how can we get the product in a range in that node? is it not the same as original question we are trying to solve !)

i am reading this from morning and also consulted my friends but didn’t able to understand it completely, please help !

We can answer Queries recursively
similar to how we do in segment tree.
Let us say we are in node [L,R) and we
need to answer query corresponding to
[X,Y] then if X≤M≤Y then we an
answer in Θ(1) provided we can merge
answers of [X,M] and [M+1,Y] in Θ(1)

The notation can be confusing because X, Y and L, R are swapped in this quote.
In the original notation [L, R] = merge([L,M], [M+1,R])

sizeof(int) * __CHAR_BIT__ will be computed by the compiler, so it won’t count as runtime multiplication