Monday, January 22, 2018

A mean median week

This week Codeforces hosted its Round 458, also known as CodeCraft-18 (problems, results, top 5 on the left, analysis). Syloviaely made a solid claim to first place by submitting the 3000-point problem F just 13 minutes after the contest started (was it a previously known problem, or just really fast solving?..), but SkyDec gathered a whopping 800 points from challenges on two different problems, 450 on B and 350 on C, and thus finished first with quite a margin. Congratulations!

This round also had a great side story — the top-rated contestant, Um_nik, decided to go for the hardest problem H, and that bet did not pay off as he spent too much time on it and, as I understand it, also discovered an incorrect output in one of the samples (which means an incorrect reference solution?..). The admins decided to make the round unrated for him because of the incorrect sample issue, but he asked them to reverse that decision (and thus make him lose rating) because he deemed the issue did not impact him significantly. Kudos to Um_nik for the admirable honesty!

Last week, I mentioned an AtCoder problem: you are given n<=2000 positive integers, each up to a<=2000. Consider all 2^n-1 nonempty subsets, and compute the sum of the numbers in each subset. Now sort those sums, and find the median sum — the 2^(n-1)-th in the sorted order. You need to output this median sum. Time limit is 2 seconds.

At first sight, the problem appears somewhat standard: we can run the knapsack dynamic programming that computes the number of ways to get each possible sum, and then go from lower sums to higher sums until we have accumulated 2^(n-1) sums in total.

This has two issues, though: first, the maximum sum is n*a, so the dynamic programming runs in O(n^2*a). Second, the number of ways to get each sum can be on the order of 2^n, so we need big integers, and the running time after accounting for that is more like O(n^3*a). Even O(n^2*a) is too slow.

Here comes the insight: notice that all sums can be split into pairs. If the sum of all numbers is s, then for each subset with sum x we'll have the complementary subset with sum s-x (let's also include the empty subset, so that the 2^n sums form exactly 2^(n-1) pairs). One of those sums is <=s/2, and the other is >=s/2, so the median is roughly equal to s/2. More precisely, the median will always be the smallest sum that is >=s/2.

This allows us to get rid of the big integers, as now we merely need to know which sums are possible, and don't care about the number of ways anymore. Moreover, we can use bitmasks to speed up this O(n^2*a) solution roughly 32-64 times, which makes it fast enough.
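
As a sanity check, here is a sketch of this approach in Python, using a big integer as the bitset (the function name is mine, not from any actual contest solution):

```python
def median_subset_sum(nums):
    # Knapsack over reachable sums, with a Python integer as a bitset:
    # bit x of `reachable` is set iff some subset sums to x.
    # This mirrors the 32-64x bitmask speedup described above.
    reachable = 1  # only the empty sum 0 at first
    for v in nums:
        reachable |= reachable << v
    s = sum(nums)
    # The median of the 2^n - 1 nonempty subset sums is the
    # smallest achievable sum that is >= s/2.
    x = (s + 1) // 2
    while not (reachable >> x) & 1:
        x += 1
    return x
```

For example, the nonempty subset sums of [1, 2, 3] are 1, 2, 3, 3, 4, 5, 6, and the median (fourth) is 3, which is indeed the smallest achievable sum that is at least s/2 = 3.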

Thanks for reading, and check back next week!

Monday, January 15, 2018

An Um_nik week

This week's contests shared one of the problemsetters — and the winner. On Monday, Codeforces hosted its Hello 2018 round (problems, results, top 5 on the left, analysis). With the hardest problem having too much piecewise polynomial integration, the competition focused on the remaining seven. Only Um_nik could get them all, and even with 24 minutes to spare — congratulations!

On Sunday, AtCoder Grand Contest 020 started the race to Japan (problems, results, top 5 on the left, my screencast, analysis). Once again the hardest problem by tourist was a tiny bit too hard (and I even solved it after the contest by integrating polynomials, probably inspired by the analysis of the previous contest), and once again Um_nik was way faster than everybody else on the remaining problems — big congratulations again!

I found problem C quite cute. You are given n<=2000 positive integers, each up to 2000. Consider all 2^n-1 nonempty subsets, and compute the sum of the numbers in each subset. Now sort those sums, and find the median sum — the 2^(n-1)-th in the sorted order. You need to output this median sum. Time limit is 2 seconds.

In my last post I've discussed what was arguably the hardest contest problem of 2017: you are given an undirected graph with at most 200000 vertices and edges. Your goal is to split its vertices into four disjoint groups A, B, C, D in such a way that:
  • The subgraph formed by groups A+B is connected and has at least 2 vertices.
  • The subgraph formed by groups C+D is connected and has at least 2 vertices.
  • Each vertex in A has an edge to each vertex in C.
  • There are no other edges between subgraphs A+B and C+D.
If there are many possible solutions, you need to output any one of them.

Thanks to great ideas and pointers by Swistakk, mnbvmar and lewin, we were able to put together an O(n*log n) solution — you can check it out in this comment thread.

Thanks for reading, and check back next week!

Monday, January 8, 2018

An Otherland week

The first week of 2018 was quite calm, with the most devoted taking stabs at the still-ongoing Prime New Year contest by snarknews. Only one problem remains unsolved there, and it looks like it might take some team effort to crack it. I think it's within the spirit of the contest to discuss its problems publicly (after all, many even have analysis posted online), so I propose to do just that. Please skip the following paragraphs if you don't want spoilers!

The problem goes like this: you are given an undirected graph with at most 200000 vertices and edges. Your goal is to split its vertices into four disjoint groups A, B, C, D in such a way that:
  • The subgraph formed by groups A+B is connected and has at least 2 vertices.
  • The subgraph formed by groups C+D is connected and has at least 2 vertices.
  • Each vertex in A has an edge to each vertex in C.
  • There are no other edges between subgraphs A+B and C+D.
If there are many possible solutions, you need to output any one of them.

I have been thinking about the problem for some time, but have only the following not-very-helpful observations:
  • If the graph has an articulation point, then we can take this point as set A, any of the components it separates as set B, and the rest as C+D. So we can assume the graph is vertex-biconnected.
  • The answer is not always "Yes": the smallest counterexample is the graph with 4 vertices and 5 edges. This example also shows that if a biconnected graph has a vertex of degree 2, then this vertex and both adjacent vertices must be in the same half of our splitting, since both ends of a splitting edge must have degree at least 3 (two splitting edges plus one inside the half).
  • There are also counterexamples without vertices of degree 2.
  • There might be some hashing trick involved. For example, suppose we assign a random number a_i to each vertex, and compute for each vertex the value sum(a_i-a_j), where the summation goes over all adjacent vertices j. Then if we add up those values within one half of the graph (A+B in the above terminology), almost everything will cancel out, leaving us with |C|*a_A-|A|*a_C, where a_A is the sum of a_i over the vertices in A (and a_C is defined similarly). But where to go from here?..
Please share your ideas!
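
A quick numeric sanity check of that last observation, on a toy graph I made up for illustration (A={0}, B={1}, C={2}, D={3}, with edge 0-1 inside A+B, edge 2-3 inside C+D, and edge 0-2 forming the full A-C bipartite part):

```python
import random

# Toy graph admitting a valid split: A={0}, B={1}, C={2}, D={3}.
edges = [(0, 1), (2, 3), (0, 2)]
A, B, C = {0}, {1}, {2}

a = [random.randrange(1, 10**9) for _ in range(4)]

# Per-vertex value: sum of (a_i - a_j) over the neighbors j of i.
d = [0] * 4
for i, j in edges:
    d[i] += a[i] - a[j]
    d[j] += a[j] - a[i]

# Summing d over A+B: edges inside A+B cancel, and the A-C edges
# leave exactly |C|*a_A - |A|*a_C.
lhs = sum(d[i] for i in A | B)
aA = sum(a[i] for i in A)
aC = sum(a[i] for i in C)
assert lhs == len(C) * aA - len(A) * aC
```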

In my last post, I held a vote for the best problem of 2017. And the winner is this AtCoder problem by maroonrk: you are given two rooted trees on the same set of 10^5 vertices. You need to assign integer weights to all vertices in such a way that for each subtree of each of the trees the sum of weights is either 1 or -1, or report that it's impossible. The solution is explained in this post. Congratulations to the problemsetter and to AtCoder, and thanks to all problemsetters of 2017!

Thanks for reading, and check back next week.

Sunday, December 31, 2017

Best problems of 2017

I have reviewed all problems I've highlighted this year in my blog, and also the problems sent by the readers in response to my question — thanks a lot for your input! I've distilled everything to a shortlist of five problems (in chronological order):

  • The Open Cup problem about interactively proving the pigeonhole principle, by tunyash, with solution in this post.
  • The AtCoder problem about moving two chips on a subset of a DAG, by sugim48, with solution in this post.
  • The IPSC problem about picking optimal sequences of red and black cards to get the first match, by misof, with solution in this post.
  • The AtCoder problem about assigning numbers to shared vertices of two trees so that all subtrees sum to +-1, by maroonrk, with solution in this post.
  • The Open Cup problem about interactively exploring a maze on Mars with a communication delay, by Gassa, discussed in this post.

What is the best problem of 2017?

As usual around New Year, snarknews is hosting the Prime New Year contest, which consists of problems given at top-level team competitions in 2017 but not solved by anybody there. If you like a real challenge and don't have much to do during the holidays, here's your link :)

Guten Rutsch!

A sparse week

This week had the now traditional last contest of the year: Good Bye 2017 at Codeforces (problems, results, top 5 on the left, analysis, my screencast). The hardest problem H, after removing some layers, ended up being about finding a proper vertex coloring for a graph with n<=23 vertices with the smallest number of colors. My first thought was: surely for such a classical problem the best approaches are well-known? However, it turned out that the algorithms I could remember were an O(3^n) subset dynamic programming and various heuristics to speed up backtracking, both of which seemed too slow. Thanks to this paper I have now learned how to do this in O(n*2^n), see point 4.2 "Inclusion-exclusion". Always nice to discover better approaches to classical problems!
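
For small n, the inclusion-exclusion approach can be sketched as follows (my own illustrative code, not taken from the paper): first count the independent subsets of every vertex subset with a simple bitmask DP, then use inclusion-exclusion to count coverings of all vertices by k independent sets.

```python
def chromatic_number(n, adj):
    # adj[v]: bitmask of the neighbors of v.
    # ind[S]: number of independent subsets of the vertex set S, via the
    # recurrence ind[S] = ind[S \ {v}] + ind[S \ {v} \ N(v)],
    # where v is any vertex of S (we take the lowest set bit).
    ind = [0] * (1 << n)
    ind[0] = 1
    for S in range(1, 1 << n):
        v = (S & -S).bit_length() - 1
        rest = S & (S - 1)  # S without v
        ind[S] = ind[rest] + ind[rest & ~adj[v]]
    for k in range(1, n + 1):
        # Inclusion-exclusion: the number of ways to cover all vertices by
        # k independent sets (repetitions allowed, order matters) equals
        # sum over S of (-1)^(n-|S|) * ind[S]^k; it is positive iff
        # k colors suffice.
        total = sum((-1) ** (n - bin(S).count("1")) * ind[S] ** k
                    for S in range(1 << n))
        if total > 0:
            return k
    return n
```

For a triangle this returns 3, and for a path on three vertices it returns 2, as expected.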

In my previous summary, I have mentioned a TopCoder problem: you are given n=2000000 tasks, to be done in m=2000 days, at most one task per day (so most of the tasks will end up not being done). For each task you know three numbers: the segment of days when it can be done (from l_i to r_i), and the profit c_i from doing it. You need to pick the task to be done on each day in such a way that the total profit is maximized.

This can be naturally viewed as a weighted maximum matching problem, which thanks to the transversal matroid structure can be solved by trying to add tasks to the matching in decreasing order of profit, looking for an augmenting path each time, and only resetting the "visited" state of each vertex once the matching is increased (this is called "Kuhn's algorithm" in the Russian-speaking community, but I couldn't find an authoritative link for it). Since the matching is only increased m times, we process each edge of the graph at most m times. The problem is that we have O(n*m) edges, so the total running time seems to be O(n*m^2). However, we can notice that for each task that we end up not adding to the matching we process its edges only once (since we will never visit it in subsequent iterations), so the number of edges that are actually revisited is O(m^2), and the running time is only O(n*m+m^3), which is of course still too slow.
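
Here is a minimal sketch of this matching loop in Python (illustrative names and tiny inputs only; at the actual constraints the quadratic edge set makes it far too slow, as noted above):

```python
def greedy_weighted_matching(tasks):
    # tasks: list of (profit, l, r); the task can be done on any day in [l, r].
    # Greedy over the transversal matroid: try tasks in decreasing order of
    # profit, searching for an augmenting path each time.
    tasks = sorted(tasks, reverse=True)
    match = {}  # day -> index of the task done on that day

    def augment(t, seen):
        _, l, r = tasks[t]
        for day in range(l, r + 1):
            if day in seen:
                continue
            seen.add(day)
            if day not in match or augment(match[day], seen):
                match[day] = t
                return True
        return False

    seen, total = set(), 0
    for t in range(len(tasks)):
        if augment(t, seen):
            total += tasks[t][0]
            seen = set()  # reset the "visited" state only when the matching grows
    return total
```

Keeping `seen` across failed attempts is safe because a failed search can only start succeeding after the matching changes.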

But now we can use another relatively standard trick: let's reduce the number of "active" edges from O(m^2) to O(m*log m) in the following manner: instead of a matching problem we'll now have a flow problem, and we'll add auxiliary vertices between the left and right parts. The vertices will correspond to segments. First, we'll pick the middle point b=m/2, and add vertices corresponding to segments [1,b], [2,b], ..., [b-1,b], [b,b], and infinite capacity edges in this manner: [1,b]->[2,b]->...->[b,b]; [1,b]->1; [2,b]->2; ..., [b,b]->b. The idea is that if we add an edge from some vertex in the left part to a segment [a,b], then the flow can then go to any vertex in the right part between a and b using those infinite edges, and only to those vertices. We handle the other half in a similar way: [b+1,b+1]<-[b+1,b+2]<-...<-[b+1,m-1]<-[b+1,m]. These two sets of segments already allow us to handle any task that can be done in days b and b+1 using only two edges: to [l_i,b] and to [b+1,r_i].

We can then apply the same procedure recursively to segments [1,b] and [b+1,m]: for example, we'll pick the middle point c=b/2, and build the auxiliary vertices [1,c]->[2,c]->...->[c,c] and [c+1,c+1]<-[c+1,c+2]<-...<-[c+1,b], and so on. Overall we'll have log m sets of m auxiliary vertices each (one per level of recursion), with two outgoing edges for each vertex, and every task can now be handled with at most two edges to auxiliary vertices instead of O(m) edges to the right part. This approach resembles the centroid decomposition of a tree applied to a chain; we could also have used the sparse table to achieve the same result.
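
The sparse-table flavor of this trick can be sketched as follows (my own illustrative code): auxiliary node (k, i) represents the days [i, i+2^k-1] and, in the flow network, would get infinite-capacity edges down to (k-1, i) and (k-1, i+2^(k-1)), with level-0 nodes wired to the actual days. Any task segment is then covered by just two such nodes; their overlap is harmless because each day vertex has capacity 1.

```python
def sparse_cover(l, r):
    # Cover the segment [l, r] with two (possibly overlapping) nodes
    # of length 2^k, where 2^k is the largest power of two <= r - l + 1.
    k = (r - l + 1).bit_length() - 1
    return [(k, l), (k, r - (1 << k) + 1)]

def days_of(node):
    # The set of days represented by auxiliary node (k, i).
    k, i = node
    return set(range(i, i + (1 << k)))
```

For example, sparse_cover(2, 6) returns the two length-4 nodes (2, 2) and (2, 3), whose union is exactly the days 2 through 6.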

Now our graph has O(m*log m) active edges, and additional O(n) edges that are visited only once, so the total running time is O(n+m^2*log m), which is fast enough.

That's it for the contests of 2017 — thanks for reading, and check back tomorrow (hopefully) for the best problems of 2017!

Saturday, December 30, 2017

A quadratic week

Last week had a relatively high density of contests, so much so that two of them collided: Codeforces Round 453 and TopCoder SRM 726 took place at the same time on Tuesday. The Codeforces round started 25 minutes earlier, so we'll begin with it (problems, results, top 5 on the left, analysis). dotorya chose a seemingly suboptimal order for the first four problems, as A and B required disproportionately much time relative to their value, but he was faster than the competition and thus still ended up in first place — well done!

TopCoder SRM 726 at roughly the same time attracted the second half of the community (problems, results, top 5 on the left, my screencast). The hard problem proved decisive: you are given 2000000 tasks, to be done in 2000 days, at most one task per day (so most of the tasks will end up not being done). For each task you know three numbers: the segment of days when it can be done (from l_i to r_i), and the profit c_i from doing it. You need to pick the task to be done on each day in such a way that the total profit is maximized. This is naturally reduced to a weighted maximum matching problem, but surely it will time out with 2 million vertices in the graph — or will it?

Codeforces ran another round, Round 454, on Saturday (problems, results, top 5 on the left, analysis). All problems were tractable this time for moejy0viiiiiv and Radewoosh, and in case of the former with 30 minutes to spare. Congratulations on the victory!

In my previous summary, I have mentioned an Open Cup problem: you are given an integer n with at most a million digits. You need to determine if it's a perfect square or not. You don't need to actually find the square root.

As is often the case in problems with a Yes/No answer, it suffices to find an algorithm that always finds one of the two answers correctly, and the other with a certain probability that is greater than zero. We can then apply it repeatedly, so that the probability of missing the correct answer decreases exponentially and becomes essentially zero.

Let's pick a prime number p. If n is a perfect square, then n mod p is always a quadratic residue. If not, then assuming p is picked randomly, the probability of n mod p being a quadratic residue is roughly 0.5, since roughly half of all nonzero numbers mod p are quadratic residues (this is merely handwaving; can anyone prove this probability estimate formally?).

So we can pick, say, 30 random prime numbers, and check whether n is a quadratic residue modulo each of them. If at least one check says no, then the overall answer is no; otherwise it's yes. Checking whether a number is a quadratic residue modulo p is done by checking whether n^((p-1)/2) = 1 (mod p).
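
The whole test can be sketched in Python like this (illustrative code with names of my choosing; note that n mod p = 0 has to be treated as a residue, since 0 is a square mod p):

```python
import random

def is_prime(p):
    # Simple trial division; fine for primes around 10^6.
    if p < 2:
        return False
    d = 2
    while d * d <= p:
        if p % d == 0:
            return False
        d += 1
    return True

def probably_square(n, trials=30):
    # Always answers True for perfect squares; for a non-square,
    # each random prime catches a non-residue with probability ~1/2,
    # so 30 trials leave a ~2^-30 chance of a wrong "yes".
    for _ in range(trials):
        p = random.randrange(10**6, 2 * 10**6)
        while not is_prime(p):
            p = random.randrange(10**6, 2 * 10**6)
        r = n % p
        # Euler's criterion: r != 0 is a quadratic residue mod p
        # iff r^((p-1)/2) = 1 (mod p).
        if r != 0 and pow(r, (p - 1) // 2, p) != 1:
            return False
    return True
```

For a million-digit n the expensive part of each trial is the single reduction n mod p, so the overall running time stays manageable.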

Thanks for reading, and check back for this week's summary!

A Newton week

The Dec 11 - Dec 17 week contained the last Open Cup round of the year: the Grand Prix of Peterhof (problems, results, top 5 on the left). The deciding problem H required the ability to quickly find the exponent of a formal power series, and the teams that were able to do so claimed the first two places — congratulations to the Moscow IPT team and to SPb Havka-papstvo!

I found the relatively easy problem J very nice. You are given an integer with at most a million digits. You need to determine if it's a perfect square or not. You don't need to actually find the square root. Can you see how to get this problem accepted on the 8th minute of the contest?

Thanks for reading, and check back soon!