 Author: Aleksandr Abelyan
Editorialist: Jelmer Firet

MEDIUM

# PREREQUISITES:

Bitwise operations
Setter’s solution: Fast Fourier Transform

# PROBLEM:

You are given N vertical lines and M horizontal lines. You may add one more horizontal line. Give the maximum number of squares you can form with different areas?

# QUICK EXPLANATION:

Use bitsets to speed up the following \mathcal{O}(n^2) solution. Determine which distances between pairs of horizontal and vertical lines already exist. For each possible new line (temporarily) update the distances between horizontal pairs. Then count how many k are both the distance between two horizontal lines and the distance between two vertical lines.

# EXPLANATION:

Before I start my explanation, I want to mention that I start explaining the idea behind the algorithm. The code given in this part is used to support the idea, but would give a TLE because it is too slow. In the section after that I will discuss how the code can be sped up using bitsets.

## Idea of the algorithm

Note that a square is a rectangle with equal side lengths. Furthermore the area of a square is entirely defined by the side length. So the problem could have been rephrased as finding the maximum possible number of squares with different side lengths. A square with side length k is formed by the lines only if there are a pair of vertical lines k units apart; and a pair of horizontal lines k units apart.

Because we are not allowed to add vertical lines, all possible square sizes are already determined by the given vertical lines. Therefore it will be handy to figure out for each k if there are two vertical lines that are k units apart. We will also determine for which k there are two horizontal lines k units apart. We can find these distances between pairs of lines with the following bit of code:

// vertical[i] is true if there is a line x=i, reason for boolean array comes later
vector<bool> vertical;
//verDiff[k] is going to be true if there is a pair of vertical lines k apart
vector<bool> verDiff(width+1);
for (int i = 0; i <= width; i++) if (vertical[i]) { // lines x=i
for (int j = i; j <= width; j++) if (vertical[j]) { // lines x=j
verDiff[j-i] = true;
}
}
// do the same for horizontal lines


Now we have two boolean arrays verDiff and horDiff that hold whether the vertical or horizontal lines would allow a square of size k to exist. If we want to count the number of different sized squares we loop trough both arrays and count how often verDiff[k] = horDiff[k] = true. But we can still add a line to increase the number of squares.

A quick modification that allows us to calculate the answer is to consider every possible line y=c, temporarily add it and calculate the number of squares we end up with. However this would become way too slow, because now we have three nested loops (over x_1, x_2 and c). To get back to quadratic complexity we notice that a lot of things don’t need to be recalculated. For example verDiff doesn’t depend on the newly added line. Furthermore at most M of the values in horDiff change when you add the new line, so we can just determine which these are.

So the algorithm we have consists of the following steps

1. Calculate verDiff for the vertical lines
2. Calculate horDiff for the original horizontal lines
3. Consider each horizontal line y=c
1. Determine which items in horDiff will additionally be set to true (named newHorDiff)
2. Determine the number of squares by looping trough verDiff, horDiff and newHorDiff

This is done in the following snippet:

// calculate verDiff and horDiff
int maxSquare = 0;
for (int c = 0; c <= height; c++) { // consider y=c
vector<bool> newHorDiff; // the distances from the new line to the others
for (int i = 0; i <= height; i++) if (horizontal[i]) { // lines y=i
newHorDiff[abs(i-c)] = true;
}
int numSquare = 0;
for (int i = 0; i <= height; i++) {
if (verDiff[i] and (horDiff[i] or newHorDiff[i]) ) {
numSquare++;
}
}
maxSquare = max(maxSquare, numSquare);
}
// print maxSquare-1 to ignore the 0-area square


## Making it faster

Now that we have got the idea of the algorithm, there is still an issue: it’s too slow. In my explanatory snippets there are nested for loops, each of which can be executed up to 10^5 times. This means the algorithm is \mathcal{O}(n^2), where we take n the largest of W,H,N,M. Using the estimate of 10^9 operations per second, it seems that this algorithm is not feasible. However, we can do more in a single operation. Let’s introduce bitwise operations.

Bitwise operations are like the familiar boolean operations, but they work on 64 bits at a time (maybe 32 based on OS and hardware). There are also two additional bitwise operations: left-shift and right-shift. You can read up on these on Wikipedia if you are not familiar with them.

To make use of bitwise operations we will store the boolean arrays in a bitset instead. bitset is a datastructure from the C++ standard that allow us to use bitwise operations on more than 64 bits. Other programming languages might have similar structures predefined. Now we just need to transform the snippets to use bitsets.

For the first snippet we are going to replace the inner for loop with bitwise operations. First we can replace the assignment of true by an or operation with the condition of if(vertical[x2]), so we will get verDiff[x2-x1] |= vertical[x2]. Next we will increase both indices to get verDiff[x2] |= vertical[x2+x1]. And finally turn the for-loop into a bitwise operations using, where you replace the +x1 in the index with a left-shift. We then turned the snippet into the following code:

// vertical[i] is true if there is a line x=i, reason for boolean array comes later
bitset<mx> vertical;
//verDiff[k] is going to be true if there is a pair of vertical lines k apart
bitset<mx> verDiff;
for (int i = 0; i <= width; i++) if (vertical[i]) {
verDiff |= (vertical >> i)
}
// repeat the same for the horizontal lines


For the second segment I need to introduce a new bitset. This will be revHorizontal, which is the reverse of horizontal. So revHorizontal[i] is set if there is a horizontal line y=\text{height}-i. We need this new bitset because it also allows us to update distances to lines below the new line quickly. horizontal and revHorizontal will be shifted by y and \text{height}-y respectively, so that after shifting horizontal[k] is set if there is a line at y=c+k, and similarly revHorizontal[k] will be set if there is a line at y=c-k.
Next we need to calculate for how many squares we can form. We can just translate the boolean operations of the snippet into bitwise operations to get a bitset where bit k is set if there is a square with side-length k. The C++ bitset contains a function count that counts how many bits are set, so the number of squares. The second snippet transforms into the following snippet:

// calculate verDiff and horDiff
int maxSquare = 0;
for (int c = 0; c <= height; c++) {
bitset<mx> newHorDiff;
newHorDiff |= (horizontal >> c);
newHorDiff |= (revHorizontal >> (height-c));
int numSquare = (verDiff & (horDiff | newHorDiff)).count();
maxSquare = max(maxSquare, numSquare);
}
// print maxSquare-1 to ignore the 0-area square.


Now that all inner for loops have been replaced by bitwise operations the algorithm is fast enough to pass the time limit. Intuitively by using bitsets we turned an algorithm that used \mathcal{O}(n^2) operations into an algorithm that uses \mathcal{O}(\frac{n^2}{64}) operations. (Though note that because of the definition of big O the algorithm using bitsets is still \mathcal{O}(n^2))

You can look at my full code below which has a lot more comments.

# ALTERNATE SOLUTIONS:

The setter (Aleksandr) used Fast Fourier Transfrom (FFT) to determine the already existing distances between pairs of horizontal or vertical lines. This is a method that uses some complex (literally) mathematics to determine for multiply frequencies how strong an input array “resonates” with that frequency. In terms of this problem, that would happen if there is a pair of lines that frequency apart. 3blue1brown has a good video that explains what a Fourier Transform is. In practice I have not yet encountered a problem for which a FFT is required for the solution. If any of you do, please tell me about it in the comments.

The tester (Radoslav) used the same approach as me. However he constructed the revHorizontal on the fly instead of beforehand.

# SOLUTIONS:

Setter's Solution
#include <bits/stdc++.h>
using namespace std;

typedef long long ll;
typedef unsigned long long ull;
typedef long double ld;

//#pragma GCC target("avx2")
//#pragma GCC optimization("unroll-loops")
//#pragma GCC optimize("O2")
//#pragma GCC optimize("O3")

#define FOR(i,a) for (int i=0;i<(a);++i)
#define FORD(i,a) for (int i=(a)-1;i>=0;i--)
#define FORT(i,a,b) for (int i=(a);i<=(b);++i)
#define FORTD(i,b,a) for (int i=(b);i>=(a);--i)
#define trav(i,v) for (auto i : v)
#define all(v) v.begin(),v.end()
#define fr first
#define sc second
#define mpr(a,b) make_pair(a,b)
#define pir pair<int,int>
#define make_unique(v) v.erase(unique(all(v)),v.end())
#define fastio ios_base::sync_with_stdio(false); cin.tie(0); //cout.tie(0);
#define y1 EsiHancagorcRepa
const int N=1e5+4;
const ll MOD=1e9+7;
const ll MX=1e16;

using cd = complex<ld>;
using vcd = vector<complex<ld> >;

const ld PI = acos((ld)-1);

int rev[3 * N];
cd roots[3 * N];

void fft(vcd& a)
{
int n = a.size();
for (int i = 0; i < n; ++i)
if (i < rev[i])
swap(a[i], a[rev[i]]);
for (int len = 1, step = n / 2; len < n; len <<= 1, step >>= 1)
for (int st = 0; st < n; st += len + len)
for (int i = 0, root = 0; i < len; ++i, root += step)
{
cd u = a[st + i], v = roots[root] * a[st + len + i];
a[st + i] = u + v;
a[st + i + len] = u - v;
}
}
void inv(vcd& a)
{
fft(a);
reverse(a.begin() + 1, a.end());
for (cd& x : a)
x /= a.size();
}
void conv(vcd& a, vcd& b)
{
int n = 1, h = 0;
while (n < a.size() + b.size())
n *= 2, ++h;
while (a.size() < n) a.push_back(0);
while (b.size() < n) b.push_back(0);
rev = 0;
for (int i = 1, high = -1; i < n; ++i) {
high += !(i & (i - 1));
rev[i] = rev[i ^ (1 << high)] | (1 << (h - high - 1));
}
ld sector = 2. * PI / n;
for (int i = 0; i < n; ++i) {
roots[i] = cd(cos(sector * i), sin(sector * i));
}
fft(a); fft(b);
for (int i = 0; i < n; ++i)
a[i] *= b[i];
inv(a);
}
bitset<N> tv,tn,unused;
int x[N],y[N];

int main(){
srng;
fastio;
int w,h,n,m;
cin>>w>>h>>n>>m;
vcd xx(w+1,0),yy(h+1,0),xxr(w+1,0),yyr(h+1,0);
FOR(i,n){
cin>>x[i];

xx[x[i]]=1;
xxr[w-x[i]]=1;

}

conv(xx,xxr);

FORT(i,w+1,xx.size()-1){

if ((int)(xx[i].real()+0.5)){

unused[i-w]=1;
}
}
FOR(i,m){
cin>>y[i];
tv[y[i]]=1;
yy[y[i]]=1;
yyr[h-y[i]]=1;
}
conv(yy,yyr);
int ans=0;
FORT(i,h+1,yy.size()-1){
if ((int)(yy[i].real()+0.5)){
if (unused[i-h])ans++;
unused[i-h]=0;
}
}
int pat=0;
FOR(i,h+1){

if (!tv){
pat=max(pat,int((unused&(tv|tn)).count()));

tv=tv>>1;
tn=tn<<1;
}
else{
tv=tv>>1;
tn=tn<<1;
tn=1;
}
}
cout<<ans+pat<<endl;
return 0;
}

Tester's Solution
#include <bits/stdc++.h>

using namespace std;
template<class T, class T1> int chkmax(T &x, const T1 &y) { return x < y ? x = y, 1 : 0; }
const int MAXN = (int)1e5 + 42;

int w, h, n, m;
int a[MAXN], b[MAXN];
bitset<MAXN> hor, ver, emp;

cin >> w >> h >> n >> m;
for(int i = 0; i < n; i++) {
cin >> a[i];
hor[a[i]] = 1;
}

for(int i = 0; i < m; i++) {
cin >> b[i];
ver[b[i]] = 1;
}
}

void solve() {
bitset<MAXN> dp1 = emp, dp2 = emp;

// Can be replaced with FFT.
for(int i = 0; i < n; i++) {
dp1 |= hor >> a[i];
}

// Can be replaced with FFT.
for(int i = 0; i < m; i++) {
dp2 |= ver >> b[i];
}

bitset<MAXN> available = dp1 & dp2;
bitset<MAXN> pref, suff = ver;
pref = ver;

int ans = available.count();
for(int i = 0; i <= h; i++) {
if(!ver[i]) {
chkmax(ans, (available | ((pref | suff) & dp1)).count());
}

pref <<= 1;
suff >>= 1;
pref = ver[i + 1];
}

// -1 cause len=0 isn't a square
cout << ans - 1 << endl;
}

int main() {
ios_base::sync_with_stdio(false);
cin.tie(nullptr);

solve();
return 0;
}

Editorialist's Solution
#include <iostream>
#include <vector>
#include <utility>
#include <tuple>
#include <algorithm>
#include <bitset>
using namespace std;

typedef vector<int> vi;
typedef vector<vi> vvi;

const int mx = 100001;

int main() {
int width, height, numVer, numHor;
cin >> width >> height >> numVer >> numHor;
// bitsets for vertical and horizontal lines
// vertical[i] is set if there is a line y=i
// horizontal[i] is set if there is a line x=i
// revHorizontal[i] is set if there is a line x=height-i
bitset<mx> vertical, horizontal, revHorizontal;

// read in x-coordinates of vertical lines and store in bitset
for (int i = 0; i < numVer; i++) {
int verLine;
cin >> verLine;
vertical.set(verLine);
}
// read in y-coordinates of horizontal lines and store in bitset
for (int i = 0; i < numHor; i++) {
int horLine;
cin >> horLine;
horizontal.set(horLine);
revHorizontal.set(height-horLine);
}
bitset<mx> verDiff, horDiff;

// iterate over each of the vertical lines (x=i)
for (int i = 0; i<= width; i++) if (vertical[i]) {
// and for each line (at x=j) to the right of it set bit j-i of verDiff
verDiff |= (vertical >> i);
}
// iterate over each of the horizontal lines (y=i)
for (int i = 0; i<= width; i++) if (horizontal[i]) {
// and for each line (at y=j) above it set bit j-i of horDiff
horDiff |= (horizontal >> i);
}

int maxSquare = 0;
// loop over all possible y-coordinates of the new line
for (int c = 0; c <= height; c++) {
// newHorDiff will store the newly formed distances,
bitset<mx> newHorDiff;
// update the distances between the new line and those above it
newHorDiff |= (horizontal >> c);
// update the distances between the new line and those below it
newHorDiff |= (revHorizontal >> (height-c));
// calculate how many squares can be created
// there is a square of a particular side-length if that side-length
// is the distance between two horizontal lines,
// and the distance between two vertical lines
int numSquare = (verDiff & (horDiff | newHorDiff)).count();
maxSquare = max(maxSquare, numSquare);
}

// -1 to ignore a 0-area square.
cout << maxSquare-1 << endl;

return 0;
}

42 Likes

Great contest

If any one wants explanation on how Bitset was used to solve this question in Hindi- LINK

11 Likes

NIce problem!

1 Like

This was a great problems.
After trying for 2 days i was able to solve it.

4 Likes

i solved this for 50 marks (3sec time) still i got TLE for 2 subtasks

#include
#include<bits/stdc++.h>
using namespace std;
#include
#define ll long long

int main(void) {

ios_base::sync_with_stdio(false);
cin.tie(NULL);

ll w,h,n,m,temp,final_max=0;
cin>>w>>h>>n>>m;
vectorall_x,all_y;

for(ll i=0;i<n;i++)
{
cin>>temp;
all_x.push_back(temp);
}

for(ll i=0;i<m;i++)
{
cin>>temp;
all_y.push_back(temp);
}

vectornbi;
for(ll j=0;j<=h;j++)
{
bool flag=false;
for(ll i=0;i<m;i++)
{
if(all_y[i]==j)
{
flag=true;
break;
}
}
if(flag!=true)
{
nbi.push_back(j);
}
}

sets,s2,s3;
for(ll i=0;i<n;i++)
{
for(ll j=i+1;j<n;j++)
{
s.insert(abs(all_x[i]-all_x[j]));
}
}

for(ll i=0;i<m;i++)
{
for(ll j=i+1;j<m;j++)
{
s3.insert(abs(all_y[i]-all_y[j]));
}
}
// ---------------------------------------------------------------COMPULSARY-----------------------------------------------------------------------

ll count=0,max1=0,pos=0;

ll f=nbi.size();

for(ll j=0;j<f;j++)
{
count=0;
sets1,s4;

for(ll i=0;i<m;i++)
{
s1=s;
s4=s3;
s1.insert(abs(all_y[i]-nbi[j]));
ll k1=s1.size(),m1=s.size();
if(k1==m1)
{
s4.insert(abs(all_y[i]-nbi[j]));
ll k2=s3.size(),m2=s4.size();
if(k2<m2)
{
count++;
}
}
}
if(count>=max1)
{
max1=count;
pos=nbi[j];
}
}

// _____________________________________________________________________________________
ll max2=0;

set :: iterator itr;

for (itr = s.begin(); itr != s.end(); ++itr)
{
set<ll>tv;
tv=s3;
tv.insert(*itr);
if(tv.size()==s3.size())
{
max2++;
}


}

cout<<max1+max2<<endl;

}

can you check

1 Like

Can you format your code or link your submission. I suggest you look into the “making it faster” section of my editorial

1 Like

I don’t understand How every unique difference is calculated in O(n). For example 8,32,1777,2345566.
Here there will be 3+2+1=6 unique differences right? i.e n*(n-1)/2
How can it be calculated in O(n) ??

Thanks for editorial !!

It’s not done in \mathcal{O}(n), but in \mathcal{O}(n^2). It is only fast enough because it is sped up using bitwise operations.

6 Likes

2 Likes

put your code in between 2 signs ` and format it or either paste code link, its not readable,

1 Like

Nice explanation!!

My bad I wrote a simple modified brute force logic O(n^2) with no bitsets and FFT, and yet my code turned out to be the fastest AC submission in the contest (Java DIV 1)
Were the test cases a bit weak?

There were only three 100pt JAVA submissions in div1, so being the fastest of those is not really surprising. I don’t know if testcases could have been constructed to prevent really optimized algorithms that didn’t use bitsets, because they run in almost the same time.

1 Like

ok thanks!
i had no idea that there were only 3 submissions btw

1 Like

Does bitset count() function have a time complexity of O(N) where N is the length of the bitset?

Asymptotic complexity is O(N) but you’d get a 32 times or 64 times speed up depending on the machine you are working on.

thanks ,
i thought i should make it to O(n) only which is not possible,
i don’t know using bitwise O(n2) algo can also be possible

1 Like

Even though bitset count function can be 64 times faster than O(N), the total number of operations per second in the worst case can be ((2 x 10^5)^2)/64 1.6 x 10^8. As the time limit is 3 sec, (1.6 x 10^8)/3 5.3 x 10^7 are the total number of operations per second for this logic. Is this a very tight bound for the compiler to accept? It seems so to me. Correct me if I’m wrong.