HAMTREE - Editorial

PROBLEM LINK:

Practice
Contest: Division 1
Contest: Division 2
Contest: Division 3
Contest: Division 4

Author: dragonado
Testers: mexomerf, rivalq
Editorialist: iceknight1093

DIFFICULTY:

TBD

PREREQUISITES:

DP on trees

PROBLEM:

Given a tree T on N vertices, find the minimum number of edges that need to be added to it so that the resulting graph contains a Hamiltonian path.

EXPLANATION:

Suppose we add some edges to T so that it has a Hamiltonian path.
Notice that the tree edges used in this Hamiltonian path form a set of vertex-disjoint paths in the original tree.

Let’s call a set of vertex-disjoint paths a path forest.

Of course, minimizing the number of extra edges we use is the same thing as maximizing the number of tree edges forming the complementary path forest; so it seems reasonable that we compute the maximum size of a path forest that can be obtained from the given tree.

Proof of equivalence

Let’s call a graph good if it contains a Hamiltonian path.
Our claim is that keeping the largest number of edges in a path forest is equivalent to minimizing the number of edges added to form a good graph.
Alternately, the minimum number of edges that need to be removed to obtain a path forest equals the minimum number of edges added to form a good graph; which is what we’ll prove.

Let P be one minimum (by size) set of edges that, upon removal, form a path forest.
Let H be one minimum (by size) set of edges that, upon addition to T, make it good.

Note that the path forest can be connected into a single long path (thus making the resulting graph good) using |P| additional edges (since removing |P| edges from a tree will create exactly |P|+1 connected components).
So, |H| \leq |P|.

Conversely, suppose we add the edges in H.
Consider one Hamiltonian path obtained by this addition.
Suppose there are k tree edges on this path. Then, |H| = N-1-k (because H wouldn’t be minimal otherwise).
Deleting the N-1-k tree edges that are not on this path gives us a path forest, so |S| \leq N-1-k = |H|.

This shows that |S| = |H|, and we’re done.

Now, notice that in a path forest, every vertex will have degree \leq 2. This allows us to compute the largest size of a path forest using dynamic programming.

Root the tree at vertex 1.
Then, let dp(u, d) denote the largest size of a path forest in the subtree of vertex u, such that u has degree \leq d in this forest.
Transitions are not too hard:

  • dp(u, 0) is simply the sum of dp(v, 2) across all children v of u.
  • dp(u, 1) requires us to connect u to one of its children.
    • Suppose we connect it to child c. Then, the size of the path forest is 1 + dp(c, 1) + \sum_{v\neq c} dp(v, 2), which also equals 1 + dp(c, 1) - dp(c, 2) + \sum_v dp(v, 2).
    • Maximizing this quantity is the same as maximizing dp(c, 1) - dp(c, 2), so take the maximum value across all c and we’re done.
  • dp(u, 2) requires us to connect u to two of its children.
    • Once again, it’s not hard to see that this requires us to take the two largest values of dp(c, 1) - dp(c, 2) across all c; and add them to dp(u, 0) + 2.
  • Finally because our states were defined to use degree \leq d, set dp(u, 1) = \max(dp(u, 0), dp(u, 1)) and dp(u, 2) = \max(dp(u, 2), dp(u, 1), dp(u, 0)).

TIME COMPLEXITY:

\mathcal{O}(N) per testcase.

CODE:

Setter's code (C++)
#pragma GCC optimize("O3,unroll-loops")
#pragma GCC target("avx2,bmi,bmi2,lzcnt,popcnt")

#include <bits/stdc++.h>
using namespace std;

//#include <ext/pb_ds/assoc_container.hpp> //required
//#include <ext/pb_ds/tree_policy.hpp> //required
// using namespace __gnu_pbds; //required
// template <typename T> using ordered_set =  tree<T, null_type, less<T>,
// rb_tree_tag, tree_order_statistics_node_update>;

// ordered_set <int> s;
// s.find_by_order(k); returns the (k+1)th smallest element
// s.order_of_key(k); returns the number of elements in s strictly less than k

#define pb push_back
#define mp(x, y) make_pair(x, y)
#define all(x) x.begin(), x.end()
#define allr(x) x.rbegin(), x.rend()
#define leftmost_bit(x) (63 - __builtin_clzll(x))
#define rightmost_bit(x) __builtin_ctzll(x) // count trailing zeros
#define set_bits(x) __builtin_popcountll(x)
#define pow2(i) (1LL << (i))
#define is_on(x, i) ((x)&pow2(i))      // state of the ith bit in x
#define set_on(x, i) ((x) | pow2(i))   // returns integer x with ith bit on
#define set_off(x, i) ((x) & ~pow2(i)) // returns integer x with ith bit off
#define fi first
#define se second

typedef long long int ll;
typedef long double ld;

const int MOD = 1e9 + 7; // 998244353;
const int MX = 2e5 + 5;
const int INF = 1e9; // not too close to LLONG_MAX
const ld PI = acos((ld)-1);
const ld EPS = 1e-8;
const int dx[4] = {1, 0, -1, 0},
          dy[4] = {0, 1, 0, -1}; // for every grid problem!!

// hash map and operator overload from
// https://www.youtube.com/watch?v=jkfA0Ts6YBA Custom hash map
struct custom_hash {
  static uint64_t splitmix64(uint64_t x) {
    x += 0x9e3779b97f4a7c15;
    x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9;
    x = (x ^ (x >> 27)) * 0x94d049bb133111eb;
    return x ^ (x >> 31);
  }

  size_t operator()(uint64_t x) const {
    static const uint64_t FIXED_RANDOM =
        chrono::steady_clock::now().time_since_epoch().count();
    return splitmix64(x + FIXED_RANDOM);
  }
};
template <typename T1, typename T2> // Key should be integer type
using safe_map = unordered_map<T1, T2, custom_hash>;

// Operator overloads
template <typename T1, typename T2> // cin >> pair<T1, T2>
istream &operator>>(istream &istream, pair<T1, T2> &p) {
  return (istream >> p.first >> p.second);
}
template <typename T1, typename T2> // cout << pair<T1, T2>
ostream &operator<<(ostream &ostream, const pair<T1, T2> &p) {
  return (ostream << p.first << " " << p.second);
}

template <typename T> // cin >> array<T, 2>
istream &operator>>(istream &istream, array<T, 2> &p) {
  return (istream >> p[0] >> p[1]);
}
template <typename T> // cout << array<T, 2>
ostream &operator<<(ostream &ostream, const array<T, 2> &p) {
  return (ostream << p[0] << " " << p[1]);
}

template <typename T> // cin >> vector<T>
istream &operator>>(istream &istream, vector<T> &v) {
  for (auto &it : v)
    cin >> it;
  return istream;
}
template <typename T> // cout << vector<T>
ostream &operator<<(ostream &ostream, const vector<T> &c) {
  for (auto &it : c)
    cout << it << " ";
  return ostream;
}
clock_t startTime;
mt19937 rng(chrono::steady_clock::now().time_since_epoch().count());
double getCurrentTime() {
  return (double)(clock() - startTime) / CLOCKS_PER_SEC;
}
string to_string(string s) { return '"' + s + '"'; }
string to_string(const char *s) { return to_string((string)s); }
string to_string(bool b) { return (b ? "true" : "false"); }
int getRandomNumber(int l, int r) {
  uniform_int_distribution<int> dist(l, r);
  return dist(rng);
}

// https://github.com/the-tourist/algo/blob/master/misc/debug.cpp
template <typename A, typename B> string to_string(pair<A, B> p) {
  return "(" + to_string(p.first) + ", " + to_string(p.second) + ")";
}
template <typename A> string to_string(A v) {
  bool first = true;
  string res = "{";
  for (const auto &x : v) {
    if (!first) {
      res += ", ";
    }
    first = false;
    res += to_string(x);
  }
  res += "}";
  return res;
}
void debug_out() { cerr << endl; }
template <typename Head, typename... Tail> void debug_out(Head H, Tail... T) {
  cerr << " " << to_string(H);
  debug_out(T...);
}

#ifdef LOCAL_DEBUG
#define debug(...) cerr << "[" << #__VA_ARGS__ << "]:", debug_out(__VA_ARGS__)
#else
#define debug(...) ;
#endif

#define int ll    // disable when you want to use atcoder library
#define endl '\n' // disable when dealing with interactive problems

typedef vector<int> vi;
typedef pair<int, int> pii;
typedef array<int, 2>
    edge; // for graphs, make it array<int,3> for weighted edges

// #include <atcoder/all>
// using namespace atcoder;

constexpr int MAXN = 2e5;
vector<vi> adj;
int root = 0;
vector<array<int, 2>> dp;
// dp[i][0] = minimum number of edges to remove in the subtree rooted at i such
// that the resulting graph is a forest of paths && the edge between i and its
// parent is removed

// dp[i][1] = minimum number of edges to remove in the subtree rooted at i such
// that the resulting graph is a forest of paths && the edge between i and its
// parent is not removed

// dp[root][1] = infinity
// Final answer = dp[root][0]
void dfs(int cur, int par) {
  // degree(cur) in the final graph is <= 2.
  // It is not optimal for degree to be 0. So degree(cur) = 0 or 1 or 2
  // in the state dp[cur][1], cur can have atmost one child.
  // in the state dp[cur][0], cur can have atmost two children.

  vi vec;
  int sum = 0;
  for (int ne : adj[cur]) {
    if (ne == par)
      continue;

    dfs(ne, cur);
    sum += dp[ne][0];
    vec.pb(dp[ne][0] - dp[ne][1]);
  }
  int d = vec.size();

  if (d == 0) { // leaf
    dp[cur][0] = dp[cur][1] = 0;
    return;
  }

  sort(all(vec));
  reverse(all(vec));

  dp[cur][0] = dp[cur][1] = d - 1 + sum - vec[0];

  if (vec.size() > 1)
    dp[cur][0] = min(dp[cur][0], d - 2 + sum - vec[0] - vec[1]);
}

void solve() {
  // code starts from here
  int N;
  cin >> N;
  adj.clear();
  adj.resize(N);
  dp.assign(N, {INF, INF});

  for (int u, v, i = 0; i < N - 1; i++) {
    cin >> u >> v;
    u--;
    v--;
    adj[u].pb(v);
    adj[v].pb(u);
  }

  dfs(0, 0);
  cout << dp[0][0] << endl;
}

signed main() {
  ios_base::sync_with_stdio(false);
  cin.tie(NULL);
  // startTime = clock();

  int T = 1;
  cin >> T;

  for (int _t = 1; _t <= T; _t++) {
    solve();
  }

  // cerr << getCurrentTime() << endl;
  return 0;
}
Tester's code (C++)
// Jai Shree Ram  
  
#include<bits/stdc++.h>
using namespace std;

#define rep(i,a,n)     for(int i=a;i<n;i++)
#define ll             long long
#define int            long long
#define pb             push_back
#define all(v)         v.begin(),v.end()
#define endl           "\n"
#define x              first
#define y              second
#define gcd(a,b)       __gcd(a,b)
#define mem1(a)        memset(a,-1,sizeof(a))
#define mem0(a)        memset(a,0,sizeof(a))
#define sz(a)          (int)a.size()
#define pii            pair<int,int>
#define hell           1000000007
#define elasped_time   1.0 * clock() / CLOCKS_PER_SEC



template<typename T1,typename T2>istream& operator>>(istream& in,pair<T1,T2> &a){in>>a.x>>a.y;return in;}
template<typename T1,typename T2>ostream& operator<<(ostream& out,pair<T1,T2> a){out<<a.x<<" "<<a.y;return out;}
template<typename T,typename T1>T maxs(T &a,T1 b){if(b>a)a=b;return a;}
template<typename T,typename T1>T mins(T &a,T1 b){if(b<a)a=b;return a;}

// -------------------- Input Checker Start --------------------
 
long long readInt(long long l, long long r, char endd)
{
    long long x = 0;
    int cnt = 0, fi = -1;
    bool is_neg = false;
    while(true)
    {
        char g = getchar();
        if(g == '-')
        {
            assert(fi == -1);
            is_neg = true;
            continue;
        }
        if('0' <= g && g <= '9')
        {
            x *= 10;
            x += g - '0';
            if(cnt == 0)
                fi = g - '0';
            cnt++;
            assert(fi != 0 || cnt == 1);
            assert(fi != 0 || is_neg == false);
            assert(!(cnt > 19 || (cnt == 19 && fi > 1)));
        }
        else if(g == endd)
        {
            if(is_neg)
                x = -x;
            if(!(l <= x && x <= r))
            {
                cerr << l << ' ' << r << ' ' << x << '\n';
                assert(false);
            }
            return x;
        }
        else
        {
            assert(false);
        }
    }
}
 
string readString(int l, int r, char endd)
{
    string ret = "";
    int cnt = 0;
    while(true)
    {
        char g = getchar();
        assert(g != -1);
        if(g == endd)
            break;
        cnt++;
        ret += g;
    }
    assert(l <= cnt && cnt <= r);
    return ret;
}
 
long long readIntSp(long long l, long long r) { return readInt(l, r, ' '); }
long long readIntLn(long long l, long long r) { return readInt(l, r, '\n'); }
string readStringLn(int l, int r) { return readString(l, r, '\n'); }
string readStringSp(int l, int r) { return readString(l, r, ' '); }
void readEOF() { assert(getchar() == EOF); }
 
vector<int> readVectorInt(int n, long long l, long long r)
{
    vector<int> a(n);
    for(int i = 0; i < n - 1; i++)
        a[i] = readIntSp(l, r);
    a[n - 1] = readIntLn(l, r);
    return a;
}
 
// -------------------- Input Checker End --------------------

const int maxn = 2e5 + 5;
int p[maxn];
int sz[maxn];
void clear(int n){
    rep(i,0,n + 1)p[i]=i,sz[i]=1;
}
int root(int x){
   while(x!=p[x]){
       p[x]=p[p[x]];
       x=p[x];
   }
   return x;  
}
void merge(int x,int y){
    int p1=root(x);
    int p2=root(y);
    if(p1==p2)return;
    if(sz[p1]>=sz[p2]){
        p[p2]=p1;
        sz[p1]+=sz[p2];
    }
    else{
        p[p1]=p2;
        sz[p2]+=sz[p1];
    }
}
int solve(){
 		int n = readIntLn(2, 2e5);
 		static int sum_n = 0;
 		sum_n += n;
 		assert(sum_n <= 2e5);
 		clear(n);
 		vector<vector<int>> g(n);
 		for(int i = 2; i <= n; i++){
 			int u = readIntSp(1,n);
 			int v = readIntLn(1,n);
 			assert(root(u) != root(v));
 			merge(u,v);
 			u--;
 			v--;
 			g[u].push_back(v);
 			g[v].push_back(u);
 		}

 		vector<vector<int>> dp(n, vector<int> (2));
 		function<void(int,int)> dfs = [&](int u,int p){
 			//0 -> available
 			//1 -> not available
 			if(p != -1 and sz(g[u]) == 1){
 				dp[u][0] = 0;
 				dp[u][1] = 1e9;
 				return;
 			}
 			vector<pii> vec;
 			int s = 0;
 			for(auto i: g[u]){
 				if(i != p){
 					dfs(i,u);
 					s += min(dp[i][0],dp[i][1]);
 					vec.push_back({dp[i][0],dp[i][1]});
 				}
 			}
 			dp[u][0] = s + sz(vec);
 			dp[u][1] = 1e9;
 			// s - min(aval,not_aval)_1 - min(aval, not-aval)_2 + aval_1 + aval_2 + sz(vec) - 2
 			int mn1 = 1e9, mn2 = 1e9;
 			for(auto [aval, not_aval]: vec){
 				int tmp = s - min(aval, not_aval) + aval + sz(vec) - 1;
 				mins(dp[u][0], tmp);
 				if(aval - min(aval,not_aval) <= mn1){
 					mn2 = mn1;
 					mn1 = aval - min(aval, not_aval);
 				}else if(aval - min(aval, not_aval) <= mn2){
 					mn2 = aval - min(aval, not_aval);
 				}
 			}
 			dp[u][1] = s + mn1 + mn2 + sz(vec) - 2;
 		};
 		dfs(0,-1);
 		cout << min(dp[0][0],dp[0][1]) << endl;
 		
 return 0;	
}
signed main(){
    ios_base::sync_with_stdio(0);cin.tie(0);cout.tie(0);
    //freopen("input.txt", "r", stdin);
    //freopen("output.txt", "w", stdout);
    #ifdef SIEVE
    sieve();
    #endif
    #ifdef NCR
    init();
    #endif
    int t = readIntLn(1,1e4);
    while(t--){
        solve();
    }
    return 0;
}
Editorialist's code (C++)
#include "bits/stdc++.h"
// #pragma GCC optimize("O3,unroll-loops")
// #pragma GCC target("avx2,bmi,bmi2,lzcnt,popcnt")
using namespace std;
using ll = long long int;
mt19937_64 rng(chrono::high_resolution_clock::now().time_since_epoch().count());

int main()
{
	ios::sync_with_stdio(false); cin.tie(0);

	int t; cin >> t;
	while (t--) {
		int n; cin >> n;
		vector<vector<int>> adj(n);
		for (int i = 0; i < n-1; ++i) {
			int u, v; cin >> u >> v;
			adj[--u].push_back(--v);
			adj[v].push_back(u);
		}
		vector<array<int, 3>> dp(n);
		const int inf = 1e9;
		auto dfs = [&] (const auto &self, int u, int p) -> void {
			dp[u] = {0, 0, 0};
			int best1 = inf, best2 = inf;
			for (int v : adj[u]) {
				if (v == p) continue;
				self(self, v, u);

				int val = dp[v][2];
				dp[u][0] += val;
				if (dp[v][1] - val < best1) {
					best2 = best1;
					best1 = dp[v][1] - val;
				}
				else best2 = min(best2, dp[v][1] - val);
			}
			if (best1 == inf) dp[u] = {1, 1, 1};
			else {
				dp[u][1] = dp[u][0] + best1;
				if (best2 == inf) dp[u][2] = dp[u][1];
				else dp[u][2] = dp[u][0] + best1 + best2 - 1;
			}
			++dp[u][0];
			dp[u][2] = min({dp[u][0], dp[u][1], dp[u][2]});
			dp[u][1] = min(dp[u][0], dp[u][1]);
		};
		dfs(dfs, 0, 0);
		cout << dp[0][2]-1 << '\n';
	}
}
2 Likes

Let’s call a set of vertex-disjoint paths a path forest .

FYI, it is more commonly referred to as a (vertex-disjoint) path cover, at least for the directed graphs.

Now, notice that in a path forest, every vertex will have degree \leq 2.

While it is undoubtedly true, a simpler way to say this is “every vertex is either an endpoint of some path or not”, merging the cases d = 0 and d = 1 into one.

Oh, I wasn’t aware that path covers allowed for empty paths, which is why I chose to use the word forest instead.
In retrospect, it makes sense - you do want every graph to have a path cover.

1 Like

There is a greedy way also. According to the observations made in the editorial we need to find the maximum number of edges for which degree of each vertex is less than or equal to 2.

For a node v lets say it has c children and tc be the number of children with degree 2. As mentioned in the editorial we won’t have degree>2 in the subtree.

  • If c > 2 We have to delete at least c-2 edges or else the degree of v would be greater than 2. We will greedily delete those edges which are connected with children of degree 2. Because if we connected them then the degree of child will become 3.

  • Else c<=2 we don’t have to delete edges to children. But we will have to delete the edges if tc>0

  • degree[v] = min(2, c-tc) and total number of edges in this subtree is the summation of all edges in children + degree[v]

Here is my code Solution

Do I understand correctly that this originates from the observation that dp(u, 2) \le dp(u, 1) \le dp(u, 2) + 1? I thought about it but coded a straightforward dp instead because I was running out of time.