SELEDGE - Editorial

ratingoverflow · October 22, 2020, 9:27am

PROBLEM LINK:

Practice
Contest

DIFFICULTY:

Hard

Setter: Tianyi Qiu
Tester: Istvan Nagy
Editorialist: Tianyi Qiu

PREREQUISITES:

Min Cost Flow, Simple Probabilities, Depth-First Search

PROBLEM:

Given a undirected graph with n (weighted) vertices and m (weighted) edges, find \leq K edges to maximize the sum of weights of their endpoints (with replicate elements removed) minus their own weights.

QUICK EXPLANATION (Setter’s Approach):

First we can observe that the optimal set of edges form a union of stars. Knowing this, we can then use Min Cost Flow to find the optimal solution, but to do that we need first to know which vertices are the centers of the stars, which can be tackled with randomization.

EXPLANATION (Setter’s Approach):

Basic Idea

Consider the subgraph formed by the chosen edges. Each connected component in it should be a star (i.e. a tree with diameter<=2), because the middle edge in any path of length 3 is useless and we can repeatedly delete such edges until we get a starry sky.

We then use randomization. Randomly assign each vertex with either a “root label” or a “leaf label” (each with probability \frac 12), and from now on we shall only consider those stars with each leaf having a leaf label and the center vertex having a root label.

Now we can calculate the answer by a Min Cost Flow model, in which the flow goes from the source to the root vertex, and then to the leaf vertices, and finally to the sink. It turns out that a Successive Shortest Path algorithm using Bellman-Ford+Queue (also informally known as SPFA) have the best performance here. The number of shortest path calculations is K, and each instance among these costs O((n+m)\cdot K) because each vertex can enter the queue at most O(K) times. Therefore the complexity of one Min Cost Flow instance is O(K^2(n+m)).

Now analyse the success rate. It can be proved that, the solution hardest to get is a solution consisting entirely of 3-vertex stars, with success rate 2^{-\frac 32 K}. Thus, repeating the process \Theta(2^{\frac 32 K}) times (which is roughly 2.83^K) is enough.

Overall complexity is O(2.83^K\cdot K^2\cdot (n+m)).

Some Details

Why is that each vertex enters the queue O(K) times?

All edges in the Min Cost Flow model goes from the root side to the leaf side, and we may observe that any simple path that is long enough must contain a number of edges going from the leaf side to the root side, and the exact number is proportional to the length of the path. However, there are at most K such edges, as each of them corresponds to a chosen edge in the original graph.

Therefore, shortest path from source to any other vertex has length O(K), and therefore each vertex enter the queue O(K) times.

Why is that 3-vertex stars constitute the worst-case scenario and how to calculate its success rate?

First notice that, our Min Cost Flow model can solve the subproblem (i.e. the original problem with fixed leaf and root labels) perfectly. It can find the optimal solution to the subproblem.

Thus for any optimal solution S of the original problem, our probability of getting it is just the probability that S is also a legal solution in the subproblem. That means, for every connected component (which is a star) of S, its center vertex must have a root label, while its leaves must have leaf labels, with the exception of a 2-vertex star (in this case, either vertex can be the center). The probability of this is 2^{-s}\cdot\frac 12 (where s denotes the number of edges in the star) for s>1 and 2^{-s} for s=1. Finally, the probability of S being legal, is just the product of the aforementioned values for each star in S.

Notice that the sum of all s in S equals K, so the product of all 2^{-s} terms (for s=1 or >1) is just 2^{-K}, and we shall just focus on the number of extra \frac 12 terms. Turns out there’re at most \frac K2 such terms (the maxima is achieved when every star has s=2). Therefore the worst-case success probability is 2^{-\frac 32 \cdot K}.

Further Improvements

The previous solution can get 85pt. To get the remaining 15pt you need to implement at least one of the two following optimizations.

Optimization 1

We focus on the “all 3-stars” scenario and try to reduce the value 2.83.

Previously we assign the root and leaf label with \frac 12 - \frac 12 probability, and now we change it to \frac 13 - \frac 23. With some calculation we can find that the success rate rises to around 2.6^{-K} (the value 2.6 is actually \frac {3^{1.5}}2. Again, this maxima is achieved when every star has s=2).

Optimization 2

Previously in the Min Cost Flow model, we only allowed the flow to go from the root to the leaf. Now we permit both root-to-leaf and leaf-to-root directions. Thus for each star, the configuration of leaves having root labels and center vertex having leaf label, also becomes acceptable, and the value 2.83 falls to 2.

EXPLANATION (Tester’s Approach):

Sketch by Tester

I have a vertex set what I called ‘must used nodes’ what is basically the N(S) in the problem description

I have a recursive method:

calc:

1:find the best edge

removed the best edge → call calc with k-1

2: (best edge still removed )

We know if the best edge is not selected then the end point are still must used nodes (or should be in the N(S) in the current best solution)

let u, and v the two nodes of the best edge

we collect the available edges from u and v, sorted them by the current weight

let e_u the array of e_u1, e_u2… the edges which are the sorted edges from u, and the same for v are e_v: e_v1, e_v2 …

for uedge : e_u
- we also know the other endpoint of the uedge (which is not u) is in the N(S)
- remove uedge from the graph
- for vedge : e_v
  - we also know the other endpoint of the vedge (which is not v) is in the N(S)
  - remove vedge from the graph
  - call calc(k-2)
- we can remove the e_v nodes from the N(S)
- put back the e_v edges to the graph
we can remove the e_u nodes from the N(S)
put back the e_u edges to the graph
put back the best edge to the graph

also I made some simple condition when it has no sense to continue the search like:

when the N(S) > 2*k or when the fulfilled N(S) == 2*k we can remove all the edges what at least one of the nodes not in N(S), etc…

The complexity without cuts is about 19!! * (pick the best edge so maybe some logarithm there)

The 19!! means (semi factorial) 19*17*15*…*1.

SOLUTIONS:

Setter's Solution

#include <bits/stdc++.h>
using namespace std;
typedef long long LL;
typedef pair < int , int > pii;
#define mpr make_pair
#define FS first
#define SC second
#define PB push_back
template < typename T > void UMAX(T &a,T b){a=(a>b?a:b);}
template < typename T > void UMIN(T &a,T b){a=(a<b?a:b);}
LL readint(){
	char c=getchar();
	LL ret=0ll;
	bool neg=0;
	while(!(c>='0' && c<='9')){
		if(c=='-') neg=1;
		c=getchar();
	}
	while(c>='0' && c<='9'){
		ret=ret*10ll+(LL)(c-'0');
		c=getchar();
	}
	return neg?-ret:ret;
}
void putint(LL v){
	if(v<0){
		putchar('-');
		v=-v;
	}
	if(!v){
		putchar('0');
		return;
	}
	if(v>=10ll) putint(v/10ll);
	putchar('0'+(v%10ll));
}
const int iinf=(int)2.01e9;
int n,m,x[105],stt[105],res;
vector < pii > adj[105];
namespace flows{
	struct edge{
		int v,f,c,r;
		void set(int V,int F,int C,int R){
			v=V;f=F;c=C;r=R;
		}
	}T;
	vector < edge > adj[105];
	int N,dis[105],prevv[105],preve[105],Q[2005],ql,qr;
	bool inq[105];
	void init(){
		int i;
		for(i=0;i<N;++i) adj[i].clear();
	}
	void addedge(int u,int v,int f,int c){
		T.set(v,f,c,(int)adj[v].size());
		adj[u].push_back(T);
		T.set(u,0,-c,(int)adj[u].size()-1);
		adj[v].push_back(T);
	}
	void spfa(){
		ql=qr=0;
		Q[qr++]=0;
		fill(dis,dis+N,iinf);
		memset(inq,0,sizeof(bool)*N);
		dis[0]=0;
		inq[0]=1;
		while(qr>ql){
			int c=Q[ql++],i;
			inq[c]=0;
			for(i=0;i<(int)adj[c].size();++i){
				edge &E=adj[c][i];
				if(E.f>0 && dis[c]+E.c<dis[E.v]){
					dis[E.v]=dis[c]+E.c;
					prevv[E.v]=c;
					preve[E.v]=i;
					if(!inq[E.v]) Q[qr++]=E.v;
					inq[E.v]=1;
				}
			}
		}
	}
	int mincostflow(int K){
		int f=0,ret=0,i;
		while(1){
			spfa();
			if(dis[1]>=0){
				return ret;
			}
			ret+=dis[1];
			++f;
			if(!--K) return ret;
			for(i=1;i;i=prevv[i]){
				edge &E=adj[prevv[i]][preve[i]];
				--E.f;
				++adj[E.v][E.r].f;
			}
		}
	}
}// S:0  T:1
int id[105];
void run(int K){
	int i,j,k;
	flows::N=n+2;
	flows::init();
	int ptr=2;
	for(i=0;i<n;++i){
		if(stt[i]) id[i]=ptr++;
	}
	for(i=0;i<n;++i){
		if(!stt[i]) id[i]=ptr++;
	}
	for(i=0;i<n;++i){
		if(stt[i]){
			flows::addedge(0,id[i],1,-x[i]);
			flows::addedge(0,id[i],n,0);
			for(pii P:adj[i]){
				if(!stt[P.FS]){
					flows::addedge(id[i],id[P.FS],1,P.SC);
				}
			}
		}
		else{
			flows::addedge(id[i],1,1,-x[i]);
			flows::addedge(id[i],1,n,0);
		}
	}
	res=max(res,-flows::mincostflow(K));
}
mt19937 rng(19260817);
int solvecase(vector < int > w,vector < int > u,vector < int > v,vector < int > we,int K,int times=60000){
	int i,j,k;
	res=0;
	n=(int)w.size();
	m=(int)u.size();
	for(i=0;i<n;++i){
		x[i]=w[i];
		adj[i].clear();
	}
	for(i=0;i<m;++i){
		adj[u[i]].push_back(mpr(v[i],we[i]));
		adj[v[i]].push_back(mpr(u[i],we[i]));
	}
	int lastupd=-1;
	for(int t=0;t<times;++t){
		int bef=res;
		for(i=0;i<n;++i){
			stt[i]=(rng()%2==0);
		}
		run(K);
		if(res!=bef) lastupd=t;
	}
//	printf("lastupd %d\n",lastupd);
	return res;
}
int main(){
	int t=readint(),i;
	while(t--){
		int n=readint(),m=readint(),K=readint();
		vector < int > w(n),u(m),v(m),we(m);
		for(i=0;i<n;++i) w[i]=readint();
		for(i=0;i<m;++i) u[i]=readint()-1,v[i]=readint()-1,we[i]=readint();
		printf("%d\n",solvecase(w,u,v,we,K,20000));
	}
	return 0;
}

Tester's Solution

#include <bits/stdc++.h>

#define all(x) (x).begin(), (x).end()
#define forn(i, n) for (int i = 0; i < (int)(n); ++i)
#define fore(i, a, b) for (int i = (int)(a); i <= (int)(b); ++i)

using namespace std;
//SELEDGE

struct EDGE {
	uint16_t a;
	uint16_t b;
	uint32_t w;
	void read()
	{
		cin >> a >> b >> w;
		--a; --b;
		assert(a != b);
		if (a > b)
			swap(a, b);
	}
};

struct GRAPH {
	const uint32_t INF = 0xFFFF;
	int n;
	int m;
	int k;
	vector<uint32_t> wNodes;
	vector<EDGE> edges;
	vector<vector<uint32_t> > neighb;

	int remk;
	uint8_t mustUseNodeCounter;
	vector<bool> usedNode;
	vector<bool> mustUseNode;
	vector<bool> usedEdge;

	void init()
	{
		remk = k;
		sort(all(edges), [&](const EDGE& a, const EDGE& b) {
			uint32_t aw = wNodes[a.a] + wNodes[a.b] + b.w;
			uint32_t bw = wNodes[b.a] + wNodes[b.b] + a.w;
			return aw > bw;
			});
		neighb.resize(n);
		forn(i, edges.size())
		{
			const auto& e = edges[i];
			neighb[e.a].push_back(i);
			neighb[e.b].push_back(i);
		}
		mustUseNodeCounter = 0;
		usedNode.resize(n);
		mustUseNode.resize(n);
		usedEdge.resize(m);
	}

	void read()
	{
		cin >> n >> m >> k;
		wNodes.resize(n);
		for (auto& wn : wNodes)
			cin >> wn;
		edges.resize(m);
		for (auto& e : edges)
			e.read();
		vector<bool> removedEdges(m);
		forn(i, m) fore(j, i + 1, m - 1)
		{
			if (edges[i].a == edges[j].a && edges[i].b == edges[j].b)
			{
				if (edges[i].w <= edges[j].w)
					removedEdges[j] = true;
				else
					removedEdges[i] = true;
			}
		}
		vector<EDGE> filteredEdges;
		forn(i, m)
			if (!removedEdges[i])
				filteredEdges.push_back(edges[i]);
		edges.swap(filteredEdges);
		init();
	}

	uint32_t edgeWeight(uint32_t index) const
	{
		assert(usedEdge[index] == false);
		uint32_t we = edges[index].w;
		uint32_t wn = (usedNode[edges[index].a] ? 0 : wNodes[edges[index].a]) + 
			(usedNode[edges[index].b] ? 0 : wNodes[edges[index].b]);
		if (we > wn)
			return 0;
		return wn - we;
	}

	vector<uint32_t> getSortedEdgeIds(uint32_t index) const
	{
		vector<pair<uint32_t, uint32_t>> candE;
		for (auto ei : neighb[index])
		{
			if(usedEdge[ei])
				continue;
			uint32_t ew = edgeWeight(ei);
			if(ew == 0)
				continue;
			candE.emplace_back(make_pair(-ew, ei));
		}
		sort(all(candE));
		vector<uint32_t> res;
		for (auto ce : candE)
			res.push_back(ce.second);
		return res;
	}

	bool checkStop() const
	{
		if (remk == 0 || 2 * remk < mustUseNodeCounter)
			return true;
		return false;
	}

	uint32_t calc()
	{
		if (checkStop())
			return 0;
		
		uint32_t currSol = 0, currBest = 0;
		vector<uint32_t> currRemovedEdges, currMustUsedNodes;

		if (2 * remk == mustUseNodeCounter)
		{
			//remove all edge which not all node in the must use
			forn(i, m)
			{
				if(usedEdge[i])
					continue;
				const EDGE& e = edges[i];
				if (!mustUseNode[e.a] || !mustUseNode[e.b])
				{
					usedEdge[i] = true;
					currRemovedEdges.push_back(i);
				}
			}
		}
		else if (2 * remk == mustUseNodeCounter + 1)
		{
			//remove all edge which not all node in the must use
			forn(i, m)
			{
				if (usedEdge[i])
					continue;
				const EDGE& e = edges[i];
				if (!mustUseNode[e.a] && !mustUseNode[e.b])
				{
					usedEdge[i] = true;
					currRemovedEdges.push_back(i);
				}
			}
		}

		//find best Edge
		uint32_t bestIndex = INF, bestC = 0;
		forn(i, edges.size())
		{
			const auto& e = edges[i];
			if(usedEdge[i])
				continue;
			uint32_t currw = edgeWeight(i);
			if(currw == 0)
				continue;
			if (currw > bestC)
			{
				bestIndex = i;
				bestC = currw;
				if(!usedNode[e.a] && !usedNode[e.b])
					break;
			}
		}

		if (bestC == 0)
		{
			for (auto ei : currRemovedEdges)
				usedEdge[ei] = false;
			for (auto cn : currMustUsedNodes)
			{
				mustUseNode[cn] = false;
				mustUseNodeCounter--;
			}
			return 0;
		}
			
		const EDGE& bestEdge = edges[bestIndex];

		bool prevA = usedNode[bestEdge.a];
		bool prevB = usedNode[bestEdge.b];

		usedNode[bestEdge.a] = true;
		usedNode[bestEdge.b] = true;
		usedEdge[bestIndex] = true;

		bool prevBMA = mustUseNode[bestEdge.a];
		bool prevBMB = mustUseNode[bestEdge.b];

		mustUseNode[bestEdge.a] = false;
		mustUseNode[bestEdge.b] = false;

		if (prevBMA)
			--mustUseNodeCounter;
		if (prevBMB)
			--mustUseNodeCounter;

		--remk;
		currBest = bestC + calc();
		++remk;

		if (prevBMA)
			++mustUseNodeCounter;
		if (prevBMB)
			++mustUseNodeCounter;

		mustUseNode[bestEdge.a] = prevBMA;
		mustUseNode[bestEdge.b] = prevBMB;

		usedNode[bestEdge.a] = prevA;
		usedNode[bestEdge.b] = prevB;

		if (remk == 1)
		{
			for (auto ei : currRemovedEdges)
				usedEdge[ei] = false;
			for (auto cn : currMustUsedNodes)
			{
				mustUseNode[cn] = false;
				mustUseNodeCounter--;
			}

			usedEdge[bestIndex] = false;
			return currBest;
		}
			
		//go through on a edges till the mustUseCounter valid

		vector<uint32_t> aEdges = getSortedEdgeIds(bestEdge.a);
		vector<uint32_t> bEdges = getSortedEdgeIds(bestEdge.b);

		if(aEdges.size() > remk+1)
			aEdges.resize(remk+1);
		if (bEdges.size() > remk+1)
			bEdges.resize(remk+1);

		for (const auto& ae : aEdges)
		{
			if (checkStop())
				break;
			//add ae to used
			const auto& ea = edges[ae];
			const uint32_t aw = edgeWeight(ae);
			usedEdge[ae] = true;
			currRemovedEdges.push_back(ae);
			bool prevaA = usedNode[ea.a];
			bool prevaB = usedNode[ea.b];
			usedNode[ea.a] = true;
			usedNode[ea.b] = true;

			bool prevMA = mustUseNode[ea.a];
			bool prevMB = mustUseNode[ea.b];

			mustUseNode[ea.a] = false;
			mustUseNode[ea.b] = false;

			if (prevMA)
				--mustUseNodeCounter;
			if (prevMB)
				--mustUseNodeCounter;

			vector<uint32_t> bremovedEdge;
			vector<uint32_t> bMustUsed;
			for (const auto& be : bEdges)
			{
				if (checkStop())
					break;
				//add be to used
				const auto& eb = edges[be];
				const uint32_t bw = edgeWeight(be);
				if (aw + bw <= bestC)
					break;

				usedEdge[be] = true;
				bremovedEdge.push_back(be);
				bool prevbA = usedNode[eb.a];
				bool prevbB = usedNode[eb.b];
				usedNode[eb.a] = true;
				usedNode[eb.b] = true;

				bool prevMbA = mustUseNode[eb.a];
				bool prevMbB = mustUseNode[eb.b];

				mustUseNode[eb.a] = false;
				mustUseNode[eb.b] = false;

				if (prevMbA)
					--mustUseNodeCounter;
				if (prevMbB)
					--mustUseNodeCounter;

				//calc
				remk -= 2;

				uint32_t cv = calc() + aw + bw;
				currBest = max(currBest, cv);
				
				remk += 2;

				usedNode[eb.a] = prevbA;
				usedNode[eb.b] = prevbB;

				if (prevMbA)
					++mustUseNodeCounter;
				if (prevMbB)
					++mustUseNodeCounter;

				mustUseNode[eb.a] = prevMbA;
				mustUseNode[eb.b] = prevMbB;

				//add the other to the mustUsed
				if (!usedNode[eb.a] && !mustUseNode[eb.a])
				{
					mustUseNode[eb.a] = true;
					bMustUsed.push_back(eb.a);
					mustUseNodeCounter++;
				}
				if (!usedNode[eb.b] && !mustUseNode[eb.b])
				{
					mustUseNode[eb.b] = true;
					bMustUsed.push_back(eb.b);
					mustUseNodeCounter++;
				}
			}

			if (prevMA)
				++mustUseNodeCounter;
			if (prevMB)
				++mustUseNodeCounter;

			mustUseNode[ea.a] = prevMA;
			mustUseNode[ea.b] = prevMB;

			//restore b
			for (auto bi : bremovedEdge)
				usedEdge[bi] = false;
			for (auto bm : bMustUsed)
			{
				mustUseNode[bm] = false;
				mustUseNodeCounter--;
			}
			
			usedNode[ea.a] = prevaA;
			usedNode[ea.b] = prevaB;
			//add the other to the mustUsed
			if (!usedNode[ea.a] && !mustUseNode[ea.a])
			{
				mustUseNode[ea.a] = true;
				currMustUsedNodes.push_back(ea.a);
				mustUseNodeCounter++;
			}
			if (!usedNode[ea.b] && !mustUseNode[ea.b])
			{
				mustUseNode[ea.b] = true;
				currMustUsedNodes.push_back(ea.b);
				mustUseNodeCounter++;
			}
		}

		//restore

		for (auto ei : currRemovedEdges)
			usedEdge[ei] = false;
		for (auto cn : currMustUsedNodes)
		{
			mustUseNode[cn] = false;
			mustUseNodeCounter--;
		}

		usedEdge[bestIndex] = false;

		return currBest;
	}
};

int main(int argc, char** argv) 
{
	ios::sync_with_stdio(false);
	int T;
	cin >> T;
	forn(tc, T)
	{
		GRAPH g;
		g.read();
 		auto ans = g.calc();
 		cout << ans << endl;
	}
	return 0;
}

SOMETHING ELSE:

Participants had used an unanticipated approach to get AC in this problem. This approach utilizes weighted blossom algorithm, with varying complexity from O(m\sqrt{n}\cdot\mathrm{polylog}(n,\textit{maxweight})) to O(n^4).

Link to one of these submissions: CodeChef: Practical coding for everyone

@tmyklebu has a detailed explanation in the comments. Thanks, tmyklebu!

VIDEO EDITORIAL:

carre · November 16, 2020, 1:48pm

Although I think my solution pass just by luck, I’m happy to has been the first one to get AC on the hardest(?) problem

ratingoverflow · November 16, 2020, 2:43pm

I was so confident about having set enough traps in test data to attack any imperfect solution, until I saw your submission. : )

ratingoverflow · November 16, 2020, 2:49pm

BTW please tell me the complexity of the blossom solution if you know that, thanks I’m wondering if this approach is no worse than standard solution in all dimensions. This will be the case only if the complexity is O(Km) or the like.

carre · November 16, 2020, 3:41pm

The key was the small K. The time required to find the shortest path using Bellman-Ford is \mathcal{O}(n*m) and you need to find K of them so it’s \mathcal{O}(K*n*m).

tmyklebu · November 16, 2020, 9:32pm

There’s an O(|E| sqrt(|V|) polylog(|V|)) algorithm for max weighted nonbipartite matching if memory serves. My solution used an exponential algorithm instead because all polynomial algorithms I know for nonbipartite weighted matching are very tricky!

To reduce to weighted nonbipartite matching, notice that an edge uv can be used to cover only u, only v, or both u and v. Build a graph with 2|V| vertices and 3|E| edges, turning the edge uv into three edges, uu’ (weight w_u - w_{uv}), vv’ (weight w_v - w_{uv}), and uv (weight w_u + w_v - w_{uv}). Now find the max-weight matching in this graph with at most K edges used.

Given a max-weight perfect matching implementation, you can find the max-weight matching with at most K edges by Lagrangian relaxation and binary search, or by a simply construction adding |V|-2K new vertices each connected to the original |V| vertices. Using Lagrangian relaxation, you can get an overall complexity of O(|E| sqrt(|V|) polylog(|V|) log(max weight)). Using the construction, you get O(|V|^2.5 polylog(|V|)).