# MIN_UGLY - Editorial

Author: aryan12
Tester: udhav2003
Editorialist: iceknight1093

2714

# PREREQUISITES:

Computing distances in a tree

# PROBLEM:

You’re given a tree with N vertices. Answer Q queries on it:

• Given a subset S of vertices of the tree, define the ugliness of a vertex x of the tree to be \max(\text{dist}(x, y)) across all y \in S.
Find the minimum ugliness across all vertices.

# EXPLANATION:

Let’s try to solve a simpler version of the problem first: what if the subset S consisted of all N vertices?

Then, we want to find a vertex that isn’t too far away from all the vertices of the tree.
In particular, if there’s a path of length d in the tree, then any vertex is at a distance of at least \left\lceil \frac{d}{2} \right\rceil away from one of the endpoints of this path.

So, let D be the length of the longest path in the tree (i.e, its diameter).
Then, the answer is at least \left\lceil \frac{D}{2} \right\rceil.
It’s not hard to see that the answer is indeed exactly this: choose the midpoint of the diameter to attain it (if there are two midpoints, choose either one).

This idea can be extended to an arbitrary subset S of vertices.

If there’s a path of length d between two vertices of S, once again we see that the answer is at least \left\lceil \frac{d}{2} \right\rceil.
Yet again, if D is the longest path between two vertices of S, the answer will be exactly \left\lceil \frac{D}{2} \right\rceil by choosing the midpoint of this path.

So, our problem turns into the following:
Given a subset S of vertices of the tree, find the longest path whose endpoints lie in S.

To compute this quickly, recall one of the methods to compute the diameter of a tree:

• Fix an arbitrary vertex x.
• Find the furthest vertex from x; let it be d_1.
• Find the further vertex from d_1; let it be d_2.
• d_1 and d_2 are then endpoints of a diameter of the tree.

This algorithm adapts itself nicely to our version:

• Fix an arbitrary vertex x \in S.
• Find the farthest vertex from x that lies in S; say d_1.
• Find the farthest vertex from d_1 that lies in S; say d_2.
• d_1 and d_2 are the endpoints of one longest path, so D equals the distance between them.

The problem now is performing the second and third step quickly.
Of course, we can run a full BFS/DFS each time to compute distances to all vertices; however that would take \mathcal{O}(N) time and definitely won’t be fast enough to answer multiple queries.

Instead, note that with a bit of precomputation, we can calculate the distances between any two nodes of a tree in \mathcal{O}(\log N) or \mathcal{O}(1) as follows:

• Root the tree at some node, say 1. Then, using a BFS/DFS calculate d_u — the distance of node u from 1 — for every node u.
• Then, we have \text{dist}(x, y) = d_x + d_y - d_l, where l = LCA(x, y) is the lowest common ancestor of x and y.
l can be found in \mathcal{O}(\log N) time using binary lifting, for example.

This allows us to transform our algorithm to \mathcal{O}(|S|\log N) per query from \mathcal{O}(N), since in each step we only compute distances from a vertex of S to all the other vertices of S.

Since the sum of |S| across all queries is bounded by 4\cdot 10^5, this is good enough.

# TIME COMPLEXITY

\mathcal{O}((N+Q)\log N) per test case.

# CODE:

Author's code (C++)
#include <bits/stdc++.h>
using namespace std;
#define int long long

const int INF = 1e18, N = 3e5 + 5;
vector<int> g[N];
int dp[N];
int depth[N];

void dfs(int node, int par, int dep)
{
depth[node] = dep;
dp[node] = par;
for(int to: g[node])
{
if(to == par)
{
continue;
}
dfs(to, node, dep + 1);
}
}

int dist(int x, int y)
{
if(depth[x] > depth[y])
{
swap(x, y);
}
int diff = depth[y] - depth[x];
int ans = 0;
for(int i = 19; i >= 0; i--)
{
if((1 << i) & diff)
{
y = dp[i][y];
ans += (1 << i);
}
}
if(x == y)
{
return ans;
}
for(int i = 19; i >= 0; i--)
{
if(dp[i][x] != dp[i][y])
{
x = dp[i][x];
y = dp[i][y];
ans += 2 * (1 << i);
}
}
return ans + 2;
}

void Solve()
{
int n, q;
cin >> n >> q;
for(int i = 1; i <= n; i++)
{
g[i].clear();
}
for(int i = 1; i < n; i++)
{
int u, v;
cin >> u >> v;
g[u].push_back(v);
g[v].push_back(u);
}
dfs(1, -1, 0);
for(int i = 1; i < 20; i++)
{
for(int j = 1; j <= n; j++)
{
dp[i][j] = (dp[i - 1][j] == -1) ? dp[i - 1][j] : dp[i - 1][dp[i - 1][j]];
}
}
for(int i = 1; i <= q; i++)
{
// cout << "query number: " << i << "\n";
int k;
cin >> k;
if(k == 1)
{
int u;
cin >> u;
cout << "0\n";
continue;
}
int u, v;
cin >> u >> v;
int max_dist = dist(u, v);
for(int j = 3; j <= k; j++)
{
int x;
cin >> x;
int diam1 = dist(x, u);
int diam2 = dist(x, v);
if(diam1 > max_dist && diam1 >= diam2)
{
v = x;
max_dist = diam1;
}
else if(diam2 > max_dist && diam2 >= diam1)
{
u = x;
max_dist = diam2;
}
}
cout << (max_dist + 1) / 2 << "\n";
}
}

int32_t main()
{
auto begin = std::chrono::high_resolution_clock::now();
ios_base::sync_with_stdio(0);
cin.tie(0);
int t = 1;
cin >> t;
for(int i = 1; i <= t; i++)
{
//cout << "Case #" << i << ": ";
Solve();
}
auto end = std::chrono::high_resolution_clock::now();
auto elapsed = std::chrono::duration_cast<std::chrono::nanoseconds>(end - begin);
cerr << "Time measured: " << elapsed.count() * 1e-9 << " seconds.\n";
return 0;
}

Tester's code (C++)
#pragma GCC optimisation("O3")
#pragma GCC target("avx,avx2,fma")
#pragma GCC optimize("Ofast,unroll-loops")
#include<bits/stdc++.h>
using namespace std;
typedef unsigned long long ull;
typedef long long ll;
#define NUM1 1000000007LL
#define all(a) a.begin(), a.end()
#define beg(a) a.begin(), a.begin()
#define sq(a) ((a)*(a))
#define NUM2 998244353LL
#define MOD NUM2
#define LMOD 1000000006LL
#define fi first
#define se second
typedef long double ld;
const ll MAX = 200010;
const ll MAX2 = MAX;
const ll large = 1e18;

void dfs(vector<ll> adj[], ll st, ll p, vector<ll>& prt, vector<ll>& dist, ll dval)
{
dist[st] = dval;
prt[st] = p;
if(x != p){
dfs(adj, x, st, prt, dist, dval + 1);
}
}
}

signed main(){
ios_base::sync_with_stdio(0);
cin.tie(0);
cout.tie(0);
ll t;
cin >> t;
while(t--){
ll n, q;
cin >> n >> q;
for(ll i = 0; i < n - 1; i++){
ll u, v;
cin >> u >> v;
u -= 1; v -= 1;
}
vector<ll> prt(n), dist(n);
dfs(adj, 0, 0, prt, dist, 0);
vector<vector<ll>> up(n, vector<ll>(20));
for(ll i = 0; i < n; i++) up[i] = prt[i];
for(ll j = 1; j < 20; j++){
for(ll i = 0; i < n; i++){
up[i][j] = up[up[i][j - 1]][j - 1];
}
}
auto jump = [&](ll v, ll val){
for(ll j = 19; j >= 0; j--){
if(val >= (1 << j)){
val -= (1 << j);
v = up[v][j];
}
}
assert(val == 0);
return v;
};
auto lca = [&](ll a, ll b){
if(dist[a] < dist[b]) swap(a, b);
a = jump(a, dist[a] - dist[b]);
if(a == b) return a;
for(ll j = 19; j >= 0; j--){
if(up[a][j] != up[b][j]){
a = up[a][j];
b = up[b][j];
}
}
return prt[a];
};
auto dis = [&](ll a, ll b){
return dist[a] + dist[b] - 2*dist[lca(a, b)];
};
while(q--){
// cerr << "ehre\n";
ll k;
cin >> k;
vector<ll> v(k);
for(auto& x: v){
cin >> x;
x--;
}
if(k == 1){
cout << 0 << '\n';
continue;
}
vector<ll> e(2);
e = v; e = v;
ll dval = dis(e, e);
for(ll i = 2; i < k; i++){
// cerr << i << '\n';
ll p0 = dis(v[i], e);
ll p1 = dis(v[i], e);
// cerr << p0 << ' ' << p1 << '\n';
if(p1 > p0){
if(p1 > dval){
dval = p1;
e = v[i];
}
}
else{
if(p0 > dval){
dval = p0;
e = v[i];
}
}
}
cout << (1 + dval)/2 << '\n';
}
}
return 0;
}

Editorialist's code (C++)
// #pragma GCC optimize("O3,unroll-loops")
// #pragma GCC target("avx2,bmi,bmi2,lzcnt,popcnt")
#include "bits/stdc++.h"
using namespace std;
using ll = long long int;
mt19937_64 rng(chrono::high_resolution_clock::now().time_since_epoch().count());

template<class T>
struct RMQ {
vector<vector<T>> jmp;
RMQ(const vector<T>& V) : jmp(1, V) {
for (int pw = 1, k = 1; pw * 2 <= (int)size(V); pw *= 2, ++k) {
jmp.emplace_back(size(V) - pw * 2 + 1);
for (int j = 0; j < (int)size(jmp[k]); ++j)
jmp[k][j] = min(jmp[k - 1][j], jmp[k - 1][j + pw]);
}
}
T query(int a, int b) {
assert(a < b); // or return inf if a == b
int dep = 31 - __builtin_clz(b - a);
return min(jmp[dep][a], jmp[dep][b - (1 << dep)]);
}
};

struct LCA {
int T = 0;
vector<int> time, out, dep, path, ret;
RMQ<int> rmq;

LCA(auto& C) : time(size(C)), out(size(C)), dep(size(C)), rmq((dfs(C,0,-1), ret)) {}
void dfs(auto& C, int v, int par) {
time[v] = T++;
for (int y : C[v]) if (y != par) {
path.push_back(v), ret.push_back(time[v]);
dep[y] = 1 + dep[v];
dfs(C, y, v);
}
out[v] = T;
}

int lca(int a, int b) {
if (a == b) return a;
tie(a, b) = minmax(time[a], time[b]);
return path[rmq.query(a, b)];
}
};

int main()
{
ios::sync_with_stdio(false); cin.tie(0);

int t; cin >> t;
while (t--) {
int n, q; cin >> n >> q;
for (int i = 0; i < n-1; ++i) {
int u, v; cin >> u >> v;
}

vector active(n, basic_string<array<int, 2>> ());
const int inf = 1e9;
vector<int> dist(n, inf);

auto farthest = [&] (int src) {
basic_string<int> used;
dist[src] = 0;
queue<int> q; q.push(src);
int far = src;
while (!q.empty()) {
int u = q.front(); q.pop();
used.push_back(u);
if (dist[u] > dist[far]) far = u;
for (auto [v, w] : active[u]) {
if (dist[v] <= w + dist[u]) continue;
dist[v] = w + dist[u];
q.push(v);
}
}
int mxd = dist[far];
for (auto x : used) dist[x] = inf;
return array{far, mxd};
};

auto solve = [&] (auto &vertices) {
// Build virtual tree of vertices
sort(begin(vertices), end(vertices), [&] (int u, int v) {return L.time[u] < L.time[v];});
int k = size(vertices);
for (int i = 0; i+1 < k; ++i) vertices.push_back(L.lca(vertices[i], vertices[i+1]));
sort(begin(vertices), end(vertices), [&] (int u, int v) {return L.time[u] < L.time[v];});
vertices.erase(unique(begin(vertices), end(vertices)), end(vertices));

stack<int> st;
for (int x : vertices) {
while (!st.empty()) {
int u = st.top();
if (L.out[u] >= L.out[x] and u != x) break;
st.pop();
}
if (!st.empty()) {
int u = st.top(); // u is the parent of x in this virtual tree
active[u].push_back({x, L.dep[x] - L.dep[u]});
active[x].push_back({u, L.dep[x] - L.dep[u]});
}
st.push(x);
}

int u = farthest(vertices);
auto [v, diam] = farthest(u);
cout << (diam + 1)/2 << '\n';

for (int x : vertices) active[x].clear();
};

while (q--) {
int k; cin >> k;
basic_string<int> vertices;
for (int i = 0; i < k; ++i) {
int x; cin >> x;
vertices.push_back(--x);
}
solve(vertices);
}
}
}

2 Likes

@ iceknight1093
I have also tried to solve the problem using binary lifting in contest but code gave tle
Can you give some hint why my code if failing

#include <bits/stdc++.h>
using namespace std;
#define int long long
const int inf = LLONG_MAX;
#define pii pair<int , int>
const int mod = 1000000007 ;
vector<int>depth ;
void precompute(int source , int parent){
up[source] = parent ;
for(int i = 1 ; i <= 20 ; i++){
if(up[source][i-1] == -1){
up[source][i] = - 1 ;
}else{
up[source][i] = up[up[source][i-1]][i-1];
}
}
if(it != parent){
depth[it] = depth[source] + 1 ;
precompute(it , source);
}
}
}
int query(int node , int k){
if(node == -1 || k == 0)
return node ;
for(int i = 20 ; i >= 0 ; i--){
if(k >= (1<<i)){
return query(up[node][i] , k - (1<<i));
}
}
return -1 ;
}
int lca(vector<int>&first){
int mn = INT_MAX ;
for(auto it : first)
mn = min(mn , depth[it]);
for(int i = 0 ; i < first.size() ; i++){
if(depth[first[i]] == mn)continue;
first[i] = query(first[i] , depth[first[i]] - mn);
}
bool x = false ;
int y = first;
for(auto it : first){
if(it != y)
x = true ;
}
if(!x)
return first;
for(int i = 20 ; i >= 0 ; i--){
vector<int>curr ;
for(int it : first){
curr.push_back(up[it][i]);
}
x = false ;
y = curr;
for(int it : curr){
if(y != it)
x = true;
}
if(x == true){
first = curr ;
}
}
return up[first];
}
void solve(){
int n , q ;
cin>>n>>q;
up.resize(n + 1 , vector<int>(22));  depth.resize(n + 1);
for(int i = 0 ; i < n - 1 ; i++){
int a , b ; cin>>a>>b;
}
precompute(1 , -1);
for(int i = 0 ; i < q ; i++){
int a ; cin>>a;
vector<int>first ;
for(int i = 0 ; i < a ; i++){
int x ; cin>>x; first.push_back(x);
}
vector<int>k = first ;
if(a == 1){
cout<<0<<endl;
}
else{
int lcaa = lca(first);
int res = 0 ;
for(auto it : k)
res = max(res , depth[it] - depth[lcaa]);

cout<<res<<endl;
}
}
}
void init_code(){
ios_base::sync_with_stdio(false);
cin.tie(NULL);
#ifndef ONLINE_JUDGE

freopen("input.txt", "r", stdin);

freopen("output.txt", "w", stdout);

#endif // ONLINE_JUDGE
}
signed main(){
init_code();
int t; cin>>t;
while(t--){
solve();
}
}


Can I someone tell me why my code is failing?
Approach →

1. Precomputing Binary Uplifting using DFS.
2. For Each query, identifying one node of longest diameter from given set
3. Calculating distance from above node to all nodes present in set, answer is (maximum distance / 2).

It is failing on 4/8 testcases Source for binary lifting: Lowest Common Ancestor - Binary Lifting (cp-algorithms.com)

#include <bits/stdc++.h>
using namespace std;

// @author: stash ¯\_(ツ)_/¯
#define FAST cin.tie(0), cout.tie(0),ios_base::sync_with_stdio(false);
#define int long long
#define debug(x) cerr << "[" <<#x << " = " << x << "]" << endl
#define debug2(x, y) cerr << "[" <<#x << " = " << x << "]" << ", [" <<#y << " = " << y << "]" << endl
#define all(x) (x).begin(),(x).end()
#define endl "\n"

int n,timer;
vector<vector<int>> up,tree;
vector<int> tin,tout,dis;

void dfs(int node,int par,int cost)
{
tin[node] = ++timer;
dis[node] = cost;
// debug2(node,par);
up[node] = par;
for(int i=1;i<32;i++)
{
up[node][i] = up[up[node][i-1]][i-1];
}
for(auto x:tree[node])
{
if(x!=par)
dfs(x,node,cost+1);
}

tout[node] = ++timer;
}

bool isAnce(int u,int v)
{
return (tin[u] <=tin[v] && tout[u]>=tout[v]);
}

int lca(int u,int v)
{
if(isAnce(u,v))
return u;
if(isAnce(v,u))
return v;
for(int i=32-1;i>=0;i--)
{
if(!isAnce(up[u][i],v))
{
u = up[u][i];
}
}
return up[u];
}

int32_t main()
{
FAST
int tt;cin >> tt;
while(tt--)
{
int temp ,q;
cin >> temp >> q;
n = temp;
timer = 0;

tree.clear();
tin.clear();
tout.clear();
dis.clear();

dis.resize(n);
tin.resize(n);
tout.resize(n);
tree.resize(n);

up.clear();
up.resize(n,vector<int>(32));

for(int i=1;i<=n-1;i++)
{
int x,y;
cin >> x >> y;
x--,y--;
tree[x].push_back(y);
tree[y].push_back(x);
}

dfs(1,1,1);

while(q--)
{
int size;
cin >> size;
vector<int> vec(size);
for(int i=0;i<size;i++)
{
cin >> vec[i];
vec[i]--;
}

if(size==1)
{
cout << 0 << endl;
continue;
}
int req_node = vec,maxi = 0;
for(int i=1;i<size;i++)
{
int ancestor = lca(vec,vec[i]);
int cost = dis[vec] + dis[vec[i]] - (2*dis[ancestor]);
if(cost > maxi)
{
cost = maxi;
req_node = vec[i];
}
}
// cout << req_node << " " << maxi << endl;

for(int i=0;i<size;i++)
{
int ancestor = lca(vec[i],req_node);
int cost = dis[req_node] + dis[vec[i]] - (2*dis[ancestor]);
maxi = max(cost,maxi);
// cout << cost << " " << maxi << endl;
}
cout << ceil(maxi/(double)2ll) << endl;
}

}
}



I am a bit lost.
Writing the solution in Rust (I was sure to use the same version, 1.56), I can simulate the largest inputs locally, and some edge case scenarios and all run extremely fast (ignoring if it is right or wrong atm).
When I submit it, I get RE (other). Fair, something is wrong, but I can’t find it.

If someone can give me a hint, I would love it.
(still learning rust, then, it might be a rookie mistake, good for me, I learn something valuable)

@robostac I saw that you solved it (got your input reader, sorry). Any hint?

use std::io::prelude::*;
use std::vec;

pub struct Scanner<B> {
buf_str: Vec<u8>,
buf_iter: std::str::SplitWhitespace<'static>,
}
pub fn new(reader: B) -> Self {
Self {
buf_str: Vec::new(),
buf_iter: "".split_whitespace(),
}
}
pub fn next_item<T: std::str::FromStr>(&mut self) -> T {
loop {
if let Some(token) = self.buf_iter.next() {
}
self.buf_str.clear();
self.buf_iter = unsafe {
let slice = std::str::from_utf8_unchecked(&self.buf_str);
std::mem::transmute(slice.split_whitespace())
}
}
}

pub fn next_split<T: std::str::FromStr>(&mut self) -> Vec<T> {
loop {
let mut v = Vec::new();
for token in self.buf_iter.by_ref() {
v.push(token.parse().ok().expect("Failed parse"));
}
if !v.is_empty() {
return v;
}

self.buf_str.clear();
self.buf_iter = unsafe {
let slice = std::str::from_utf8_unchecked(&self.buf_str);
std::mem::transmute(slice.split_whitespace())
}
}
}
}

#[derive(Debug, Clone)]
struct Node {
_id: usize,
level: i32,
parent: Option<usize>,
children: Vec<usize>,
distances: Vec<Option<usize>>,
ancestors: Vec<usize>,
}

impl Node {
fn new(id: usize, n_nodes: usize) -> Self {
let mut distances = vec![None; n_nodes];
distances[id] = Some(0);
Self {
_id: id,
level: 0,
parent: None,
children: Vec::new(),
distances,
ancestors: Vec::new(),
}
}
}

#[derive(Debug, Default)]
struct Graph {
nodes: Vec<Node>,
}

impl Graph {
fn new(n_nodes: usize) -> Self {
let nodes = (0..n_nodes)
.map(|x| Node::new(x, n_nodes))
.collect::<Vec<Node>>();
Self { nodes }
}

fn distance(&mut self, x: usize, y: usize) -> usize {
match self.nodes[x].distances[y] {
Some(d) => d,
None => {
let ancestors_x = &self.nodes[x].ancestors;
let ancestors_y = &self.nodes[y].ancestors;
let mut idx = 0;
while (idx < ancestors_x.len().min(ancestors_y.len()))
&& (ancestors_x[idx] == ancestors_y[idx])
{
idx += 1;
}
let root = ancestors_x[idx - 1];
let d = (self.nodes[x].level + self.nodes[y].level - 2 * self.nodes[root].level)
as usize;
self.nodes[x].distances[y] = Some(d);
self.nodes[y].distances[x] = Some(d);
d
}
}
}
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
let stdout = io::stdout();
let lock = stdout.lock();
let mut ow = io::BufWriter::new(lock);

let stdin = io::stdin();
let mut scan = Scanner::new(stdin.lock());

let n_tests: usize = scan.next_item();
for _ in 0..n_tests {
let n_nodes: usize = scan.next_item();
let n_queries: usize = scan.next_item();
let mut graph = Graph::new(n_nodes);
let mut input_graph: Vec<(usize, usize)> = Vec::new();
for _ in 0..n_nodes - 1 {
let inputs = scan.next_split::<usize>();
input_graph.push((inputs - 1, inputs - 1));
graph.nodes[inputs - 1].children.push(inputs - 1);
graph.nodes[inputs - 1].parent = Some(inputs - 1);
}
let mut max_level = 0;
let mut next_layer = vec!;
graph.nodes.level = 0;
graph.nodes.ancestors.push(0);

while !next_layer.is_empty() {
let mut children = Vec::new();
next_layer.iter().for_each(|x| {
let x_children = graph.nodes[*x].children.clone();
let parent_level = graph.nodes[*x].level;
let parent_ancestors = graph.nodes[*x].ancestors.clone();
children.extend(x_children.iter());
for y in x_children.iter() {
let child_node = &mut graph.nodes[*y];
child_node.level = parent_level + 1;
child_node.ancestors = parent_ancestors.clone();
child_node.ancestors.push(*y);
max_level = max_level.max(parent_level + 1);
}
});
children.dedup();
next_layer = children;
}

// println!("graph built");

for _ in 0..n_queries {
let set_size = scan.next_item::<usize>();
let mut query_set = Vec::new();
let mut x = 0;
for _ in 0..set_size {
let input = scan.next_item::<usize>() - 1;
query_set.push(input);
x = if graph.nodes[input].level > graph.nodes[x].level {
input
} else {
x
};
}
// println!("running query of size {}", set_size);
if set_size == 1 {
println!("{}", 0);
continue;
}
let (_y, d2) = query_set
.iter()
.map(|z| (*z, graph.distance(x, *z)))
.max_by_key(|(_, d)| *d)
.unwrap();
writeln!(ow, "{}", (d2 + 1) / 2)?;
}
}
Ok(())
}


It looks like you assume the inputs are directed edges (from parent to child). Flipping the direction of one them in the input should give your runtime error.

You also will run out of memory at some point (consider a tree that is a single path - storing the ancestors of every node is n^2 memory).

I’ve got no problem with you copying my input / outputs, but one thing to watch out for is the output either needs to all go to the BufWriter (via writeln) or directly to stdout (via println). The bufwriter will buffer the output until the end (or the buffer gets full) but any calls to println immediately get written (which is why I don’t use it as it can be slow if you are printing a lot of lines). This will cause the order to be wrong if you mix them.

Thanks!
The assumption of direction was a facepalm realization!!! Now I can deal with it (modified part below, it was pretty easy at least).

Still getting RE (other), I assume it’s a memory issue, if not, I’ll figure it out.

        let mut graph = Graph::new(n_nodes);
let mut edges: Vec<(usize, usize)> = Vec::new();
for _ in 0..n_nodes - 1 {
let inputs = scan.next_split::<usize>();
edges.push((inputs - 1, inputs - 1));
}
let mut max_level = 0;
let mut next_layer = vec!;
graph.nodes.level = 0;
graph.nodes.ancestors.push(0);

while !next_layer.is_empty() {
let mut children = Vec::new();
next_layer.iter().for_each(|x| {
let x_fold = edges
.iter()
.fold((Vec::new(), Vec::new()), |mut acc, (a, b)| {
if *a == *x {
acc.0.push(*b);
} else if *b == *x {
acc.0.push(*a);
} else {
acc.1.push((*a, *b));
}
acc
});
let x_children = &x_fold.0;
edges = x_fold.1;
let parent_level = graph.nodes[*x].level;
let parent_ancestors = graph.nodes[*x].ancestors.clone();
children.extend(x_children.iter());
for y in x_children.iter() {
let child_node = &mut graph.nodes[*y];
child_node.parent = Some(*x);
child_node.level = parent_level + 1;
child_node.ancestors = parent_ancestors.clone();
child_node.ancestors.push(*y);
max_level = max_level.max(parent_level + 1);
}
});
children.dedup();
next_layer = children;
}


I improved so much that, according to the profiler, 55% of my runtime in extreme scenarios (10e5 nodes, 10e5 queries) is used to read the input, 11% to prepare the graph, and only 8% to compute the output. I still don’t know if it is correct :).

Unfortunately, the test result is TLE. Since now I would need to tweak the reading part, it gets to aspects more complex than what I am prepared to deal with.

Thanks for the help @robostac , it allowed me to learn a lot. I got the code more than 100000x faster compared to my first submission.

Here is the code I consider good enough.

// use std::fs::File;
use std::io::prelude::*;
use std::vec;

// const INPUT_FILE: &str = "inputs10.txt";

pub struct Scanner<B> {
buf_str: Vec<u8>,
buf_iter: std::str::SplitWhitespace<'static>,
}
pub fn new(reader: B) -> Self {
Self {
buf_str: Vec::new(),
buf_iter: "".split_whitespace(),
}
}
pub fn next_item<T: std::str::FromStr>(&mut self) -> T {
loop {
if let Some(token) = self.buf_iter.next() {
}
self.buf_str.clear();
self.buf_iter = unsafe {
let slice = std::str::from_utf8_unchecked(&self.buf_str);
std::mem::transmute(slice.split_whitespace())
}
}
}

pub fn next_split<T: std::str::FromStr>(&mut self) -> Vec<T> {
loop {
let mut v = Vec::new();
for token in self.buf_iter.by_ref() {
v.push(token.parse().ok().expect("Failed parse"));
}
if !v.is_empty() {
return v;
}

self.buf_str.clear();
self.buf_iter = unsafe {
let slice = std::str::from_utf8_unchecked(&self.buf_str);
std::mem::transmute(slice.split_whitespace())
}
}
}
}

#[derive(Debug, Clone)]
struct Node {
level: i32,
parent: Option<usize>,
}

impl Node {
fn new() -> Self {
Self {
level: 0,
parent: None,
}
}
}

#[derive(Debug, Default)]
struct Graph {
nodes: Vec<Node>,
}

impl Graph {
fn new(n_nodes: usize) -> Self {
let nodes = (0..n_nodes).map(|_| Node::new()).collect::<Vec<Node>>();
Self { nodes }
}

fn build_graph(n_nodes: usize, edges: &mut Vec<(usize, usize)>) -> Self {
let mut next_layer = vec!;
let mut graph = Self::new(n_nodes);
graph.nodes.level = 0;

while !next_layer.is_empty() {
let mut children = Vec::new();
next_layer.iter().for_each(|x| {
graph._build_children(edges, x, &mut children);
});
children.dedup();
next_layer = children;
}
graph
}

fn _build_children(
&mut self,
edges: &mut Vec<(usize, usize)>,
x: &usize,
children: &mut Vec<usize>,
) {
let x_fold = edges
.iter()
.fold((Vec::new(), Vec::new()), |mut acc, (a, b)| {
if *a == *x {
acc.0.push(*b);
} else if *b == *x {
acc.0.push(*a);
} else {
acc.1.push((*a, *b));
}
acc
});
let x_children = &x_fold.0;
*edges = x_fold.1;
let parent_level = self.nodes[*x].level;
children.extend(x_children.iter());
for y in x_children.iter() {
let child_node = &mut self.nodes[*y];
child_node.parent = Some(*x);
child_node.level = parent_level + 1;
}
}

fn distance(&mut self, x: &usize, y: &usize, x_ancestors: &[usize]) -> (i32, usize) {
if x == y {
return (self.nodes[*x].level, 0);
}

let mut root = *y;
let mut root_level = self.nodes[root].level;
while let Some(parent) = self.nodes[root].parent {
if (x_ancestors[root_level as usize] == root) || (root == 0) {
break;
}

root = parent;
root_level = self.nodes[root].level;
}
(
root_level,
(self.nodes[*x].level + self.nodes[*y].level - 2 * self.nodes[root].level) as usize,
)
}
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
let stdout = io::stdout();
let lock = stdout.lock();
let mut ow = io::BufWriter::new(lock);

// let file_input = File::open(INPUT_FILE).unwrap();
// let mut scan = Scanner::new(BufReader::new(file_input));
let stdin = io::stdin();
let mut scan = Scanner::new(stdin.lock());

let n_tests: usize = scan.next_item();
for _ in 0..n_tests {
let n_nodes: usize = scan.next_item();
let n_queries: usize = scan.next_item();
let mut edges = read_edges(&mut scan, n_nodes);
let mut graph = Graph::build_graph(n_nodes, &mut edges);

// println!("graph built");

for _ in 0..n_queries {
let set_size = scan.next_item::<usize>();
let mut query_set = vec![0; set_size];
let mut x = 0;
for item in query_set.iter_mut() {
let input = scan.next_item::<usize>() - 1;
*item = input;
x = if graph.nodes[input].level > graph.nodes[x].level {
input
} else {
x
};
}
// println!("running query of size {}", set_size);
if set_size == 1 {
writeln!(ow, "{}", 0)?;
continue;
}
let mut ancestors_x = vec![0; (graph.nodes[x].level + 1) as usize];
ancestors_x[graph.nodes[x].level as usize] = x;
let mut parent = graph.nodes[x].parent;
while parent.is_some() {
ancestors_x[graph.nodes[parent.unwrap()].level as usize] = parent.unwrap();
parent = graph.nodes[parent.unwrap()].parent;
}

let mut d2 = 0;
let mut min_level = 0;
for y in query_set.iter() {
if graph.nodes[*y].level <= min_level {
continue;
}
let (root_level, d) = graph.distance(&x, y, &ancestors_x);
if d > d2 {
d2 = d;
min_level = root_level;
}
}
writeln!(ow, "{}", (d2 + 1) / 2)?;
}
}
Ok(())
}

let mut edges: Vec<(usize, usize)> = Vec::new();
for _ in 0..n_nodes - 1 {
let inputs = scan.next_split::<usize>();
edges.push((inputs - 1, inputs - 1));
}
edges
}



I don’t really know Rust syntax so I apologize if I’m wrong, but it seems like even though you aren’t explicitly storing them anymore, your code still iterates over all ancestors of a vertex when finding the distance between two nodes.

This is definitely too slow for this problem, since you could visit \mathcal{O}(N) vertices each time you call distance.
To improve this: notice that what you’re looking for is the lowest common ancestor (simply, LCA) of x and y. There are several ways to calculate this quickly; for example see the “binary lifting” link in the editorial (that website details a few other methods too).

I do not disagree with you, it definitely can be improved, and the contest is feasible.

As an experiment, I modified the code with a break in the query_set loop to stop immediately. Still, I get TLE. I tried several (at my skill level) optimizations, pre-allocate the vector, reusing memory, etc. I always get TLE. If the solution is for me to copy a code that I can’t understand, then there is no point in keeping trying :). I just moved on to other problems where I can still learn, eventually, I’ll be good enough to come back.

Ah, yeah so in this case the main issue is that your time complexity is too high (\mathcal{O}(N\cdot \sum K_i)), and so (as long as the testcases are good) no amount of memory/cache optimization or other ‘minor’ optimizations will help at all; the only hope is to change and improve the algorithm/logic as a whole.

The idea behind computing LCA is not too hard to understand (in my opinion); and certainly is widely-used. If you stick with competitive programming, I wager you’ll have to learn it sooner rather than later As an aside: a reasonable rule of thumb is that if your program performs approximately 10^8 operations, it’ll most likely run in within a second. Slightly higher might work too, if you’re lucky/operations are simple enough; however if you’re orders of magnitude off then there’s no point in even trying.
Here for example N \cdot \sum K_i is quite close to 10^{11}, so you see why it’s so slow.

Apologies, I was not clear.
What I mean is:

Just reading the input and not doing any processing gives me TLE.

With that, first, I need a faster IO method that gives me enough runtime to process. This is why I would have to dig deeper in memory management, swapping, reusing, etc.