CHEFJR - Editorial

challenge
editorial
feb16
greedy

#1

PROBLEM LINK:

Practice
Contest

Author: Dmytro Berezin
Tester: Istvan Nagy
Editorialist: Xellos

DIFFICULTY:

Challenge

PREREQUISITES:

Greedy, directed acyclic graphs

PROBLEM:

This is an approximation problem. You’re given time intervals and books with rating and reading length and (partial) reading order in the form of a DAG. Assign books to read to time intervals so that the sum of their ratings is maximised and no two books are read at the same time. There are also two types of books - “type 1” books have to be read in a continuous time interval, “type 2” books do not.

SHORT EXPLANATION:

Greedy. Try taking the books from the “best” one, where “best” for a book should reflect also books which have to be read after it.

EXPLANATION:

simple version

There’s a simplified, yet important version of this problem: there are no dependencies and all the books are of type 2. We’re trying to find a set of books with the maximum sum of ratings and with the sum of reading times \le to the sum of break times.

This is the basic knapsack problem and it has a well-known solution. We’re not interested in the exact solution, though - it’d be much faster to take books greedily by decreasing ratio of rating and reading time (number of pages); this would probably lead to a decent approximation of the optimal solution.

full problem - first attempts

In order to solve the whole problem, we need to deal with dependencies and type 1 books. The latter is easier, we can just check if there are any rest intervals in which we can fit the current book and if there are, put that book into the one with the shortest free (unassigned for reading) time remaining. That’s because this way, we’re more likely to have time left in the other rest intervals for larger books later on. Note that with dependencies, we need to look at the previously read books to know which rest intervals this book can be fit into. For a book of type 2, we can assign as much of it to be read in the first possible interval, as much of the remainder to the next interval etc.

The simplest way to deal with dependencies is choosing greedily (based on the rating/pages ratio) only from the currently available books. When we choose a book,we can check its children and if some of them don’t have any unread books which they’d be dependent on, make them available. Keeping the available books in a priority queue, the time for checking if there’s a reading interval that’s free is O(\mathrm{maxR}\cdot \log{N}) (we can group the rest intervals in multiset<>s for all possible lengths) and the whole solution has time complexity O(\mathrm{maxR}\cdot N\log{N}). It should score about 70 points.

cost function

The main loss of ignoring all non-available books is not knowing which books should be read in order to reach some high-rated books. We can fix this by replacing the rating/pages ratio by something better - for example, adding terms which correspond to books after it.

My approach was computing the “costs” of books in reverse topological order (if book A has to be read after book B, then the cost for B is computed after the cost for A). Each cost is computed by taking the rating/pages ratio and for each book B which has to be read after it, adding its cost (computed previously) divided by the number of books which have to be read after B, multiplied by a constant. The constant describes how important later books are in comparison to earlier ones, so it shouldn’t be too large; dividing by the indegree of B is done because if a book is at an end of a complicated dependency network, there’s less of a guarantee that taking the book before it may let us read it later (so it shouldn’t affect costs as much).

This way, the costs of books “propagate” through the costs of books read before them, generally decreasing in effect.

In general, we can use various types of functions to compute a general cost/value of each book.

what can be further improved?

The described algorithm is fairly fast; it can run maybe tens of times and still fit in the time limit. So why not run it multiple times, always with different parameters or something randomised, and take the best answer it finds? With this (specifically, I varied one parameter in the cost function), I got 85 points.

Still, I didn’t have much time for this problem, so there are a lot of things that could be done better. Maybe. Here are some untested ideas.

However we’re assigning books, there are always tradeoffs. If we assign a book of type 1 into a smaller, but later interval, we may lose a high-rated book that has to be read after it. If we assign books of type 2 to an early interval, we may not be able to read a valuable book of type 1 because we already spent the interval to which it could’ve gone. There are various ways to act and they usually depend on the future, which a greedy solution doesn’t know. (This may be improved by running the greedy twice and throwing away bad decisions on the 2nd try.)

The way to describe costs is kind of sloppy, too - the cost of a book may propagate to the costs of earlier books through various paths, so the intuitive picture of “how easily we may get to read it” is actually more complicated. There are definitely better cost functions, but that’s something to determine by trial and error.

Of course, there’s also the possibility of trying a different greedy method, for example choosing some books without (much) regard for dependencies and then finding out which books should be read before them. Or maybe a combination of multiple methods, it depends on everyone’s level of masochism.

Sample Solutions

Author
Tester