Sunday, December 31, 2017

Best problems of 2017

I have reviewed all problems I've highlighted this year in my blog, and also the problems sent by the readers in response to my question — thanks a lot for your input! I've distilled everything to a shortlist of five problems (in chronological order):

  • The Open Cup problem about interactively proving the pigeonhole principle, by tunyash, with solution in this post.
  • The AtCoder problem about moving two chips on a subset of a DAG, by sugim48, with solution in this post.
  • The IPSC problem about picking optimal sequences of red and black cards to get the first match, by misof, with solution in this post.
  • The AtCoder problem about assigning numbers to shared vertices of two trees so that all subtrees sum to +-1, by maroonrk, with solution in this post.
  • The Open Cup problem about interactively exploring a maze on Mars with a communication delay, by Gassa, discussed in this post.

What is the best problem of 2017?

As usual around New Year, snarknews is hosting the Prime New Year contest, which consists of problems given at top-level team competitions in 2017 but not solved by anybody there. If you like a real challenge and don't have much to do during the holidays, here's your link :)

Guten Rutsch!

Saturday, December 30, 2017

A sparse week

This week had the now traditional last contest of the year: Good Bye 2017 at Codeforces (problems, results, top 5 on the left, analysis, my screencast). The hardest problem H, after removing some layers, ended up being about finding a proper vertex coloring for a graph with n<=23 vertices with the smallest number of colors. My first thought was: surely for such a classical problem the best approaches are well-known? However, it turned out that the algorithms I could remember were an O(3^n) subset dynamic programming and various heuristics to speed up backtracking, both of which seemed too slow. Thanks to this paper I have now learned how to do this in O(n*2^n), see point 4.2 "Inclusion-exclusion". Always nice to discover better approaches to classical problems!
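
To make the inclusion-exclusion concrete, here is a minimal sketch for a small hardcoded graph (a 5-cycle); the example graph and the choice of a prime modulus are my own, and since the counting is done modulo a prime there is a negligible chance of a false zero making the reported chromatic number too large.

#include <bits/stdc++.h>
using namespace std;

int main() {
    // Example graph: a 5-cycle, chromatic number 3. adj[v] is a bitmask of neighbours.
    int n = 5;
    vector<int> adj(n, 0);
    auto addEdge = [&](int u, int v) { adj[u] |= 1 << v; adj[v] |= 1 << u; };
    addEdge(0, 1); addEdge(1, 2); addEdge(2, 3); addEdge(3, 4); addEdge(4, 0);

    const long long MOD = 1000000007;  // the counts are astronomical, so count modulo a prime
    int full = 1 << n;
    // I[S] = number of independent sets (including the empty one) lying inside S.
    vector<long long> I(full);
    I[0] = 1;
    for (int S = 1; S < full; S++) {
        int v = __builtin_ctz(S);      // lowest vertex of S: either skip it, or take it and drop its neighbours
        I[S] = (I[S & ~(1 << v)] + I[S & ~((1 << v) | adj[v])]) % MOD;
    }
    // The graph is k-colorable iff sum over S of (-1)^(n-|S|) * I[S]^k is nonzero:
    // by inclusion-exclusion it counts ordered k-tuples of independent sets covering all vertices.
    vector<long long> pw(full, 1);
    for (int k = 1; k <= n; k++) {
        long long total = 0;
        for (int S = 0; S < full; S++) {
            pw[S] = pw[S] * I[S] % MOD;                  // maintain I[S]^k incrementally: O(2^n) per k
            if ((n - __builtin_popcount(S)) % 2 == 0) total = (total + pw[S]) % MOD;
            else total = (total - pw[S] + MOD) % MOD;
        }
        if (total != 0) { printf("chromatic number = %d\n", k); break; }
    }
    return 0;
}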

In my previous summary, I have mentioned a TopCoder problem: you are given n=2000000 tasks, to be done in m=2000 days, at most one task per day (so most of the tasks will end up not being done). For each task you know three numbers: the segment of days when it can be done (from l_i to r_i), and the profit c_i from doing it. You need to pick the task to be done in each day in such a way that the total profit is maximized.

This can be naturally viewed as a weighted maximum matching problem, which thanks to the transversal matroid can be solved by adding tasks to the matching in decreasing order of profit, looking for an augmenting path each time, and only resetting the "visited" state of each vertex once the matching is increased (this is called "Kuhn's algorithm" in the Russian-speaking community, but I couldn't find an authoritative link for it). Since the matching is only increased m times, we process each edge of the graph at most m times. The problem is that we have O(n*m) edges, so the total running time seems to be O(n*m^2). However, we can notice that for each task that we end up not adding to the matching we process its edges only once (since we will never visit it in subsequent iterations), so the number of actually revisited edges is O(m^2), and the running time is only O(n*m+m^3), which is of course still too slow.
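
Here is a minimal sketch of that greedy on a toy instance; the lazy reset of the visited array is the only difference from textbook Kuhn. The data and names are illustrative, and this plain version is exactly the one that is still too slow for the actual constraints.

#include <bits/stdc++.h>
using namespace std;

int main() {
    int n = 4, m = 3;                       // tasks (left part) and days (right part)
    vector<vector<int>> adj(n);             // adj[task] = days it can be done on
    vector<long long> profit(n);
    adj[0] = {0, 1}; profit[0] = 10;
    adj[1] = {0};    profit[1] = 7;
    adj[2] = {1, 2}; profit[2] = 5;
    adj[3] = {0, 2}; profit[3] = 3;

    vector<int> order(n);
    iota(order.begin(), order.end(), 0);
    sort(order.begin(), order.end(), [&](int a, int b) { return profit[a] > profit[b]; });

    vector<int> matchedTask(m, -1);         // day -> task currently assigned to it
    vector<bool> visited(m, false);

    function<bool(int)> tryKuhn = [&](int task) {
        for (int day : adj[task]) {
            if (visited[day]) continue;
            visited[day] = true;
            if (matchedTask[day] == -1 || tryKuhn(matchedTask[day])) {
                matchedTask[day] = task;
                return true;
            }
        }
        return false;
    };

    long long total = 0;
    for (int task : order) {                // decreasing order of profit
        if (tryKuhn(task)) {
            total += profit[task];
            fill(visited.begin(), visited.end(), false);   // reset only when the matching grows
        }
    }
    printf("maximum total profit = %lld\n", total);        // 22 for this toy instance
    return 0;
}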

But now we can use another relatively standard trick: let's reduce the number of "active" edges from O(m^2) to O(m*log m) in the following manner: instead of a matching problem we'll now have a flow problem, and we'll add auxiliary vertices between left and right parts. The vertices will correspond to segments. First, we'll pick the middle point b=m/2, and add vertices corresponding to segments [1,b], [2,b], ..., [b-1,b], [b,b], and infinite capacity edges in this manner: [1,b]->[2,b]->...->[b,b]; [1,b]->1; [2,b]->2; ..., [b,b]->b. The idea is that if we add an edge from some vertex in the left part to a segment [a,b], then the flow can then go to any vertex in the right part between a and b using those infinite edges, and only to those vertices. We handle the other half in a similar way: [b+1,b+1]<-[b+1,b+2]<-...<-[b+1,m-1]<-[b+1,m]. These two sets of segments already allow us to handle any task that can be done in days b and b+1 using only two edges: to [l_i,b] and to [b+1,r_i].

We can then apply the same procedure recursively to segments [1,b] and [b+1,m]: for example, we'll pick the middle point c=b/2, and build the auxiliary vertices [1,c]->[2,c]->...->[c,c] and [c+1,c+1]<-[c+1,c+2]<-...<-[c+1,b], and so on. Overall we'll have log m sets of m auxiliary vertices each (one per level of recursion), with two outgoing edges for each vertex, and every task can now be handled with at most two edges to auxiliary vertices instead of O(m) edges to the right part. This approach resembles the centroid decomposition of a tree when applied to a chain; we could also have used a sparse table to achieve the same result.
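
Here is a hedged sketch of building those auxiliary vertices. The addEdge list and the node numbering stand in for whatever flow or matching structure is actually used, and the source, sink and per-day capacities are omitted; only the recursive construction and the "attach a task with at most two edges" part are shown.

#include <bits/stdc++.h>
using namespace std;

const long long INF = 1e18;
int nodeCount = 0;
vector<array<long long, 3>> edges;                    // {from, to, capacity}
void addEdge(int u, int v, long long cap) { edges.push_back({(long long)u, (long long)v, cap}); }

int m;                                                // number of days
vector<int> dayNode;                                  // dayNode[i] = node of day i, 1-based

struct Seg { int lo, hi, b, leftChild, rightChild; vector<int> leftNode, rightNode; };
vector<Seg> segs;

// Builds auxiliary vertices for days lo..hi; leftNode[i-lo] represents segment [i..b]
// and rightNode[j-lo] represents segment [b+1..j], exactly as described above.
int build(int lo, int hi) {
    int id = segs.size();
    segs.push_back({});
    Seg s{lo, hi, (lo + hi) / 2, -1, -1, {}, {}};
    s.leftNode.assign(hi - lo + 1, -1);
    s.rightNode.assign(hi - lo + 1, -1);
    int prev = -1;
    for (int i = s.b; i >= lo; i--) {                 // [i..b] -> [i+1..b], [i..b] -> day i
        int v = nodeCount++;
        addEdge(v, dayNode[i], INF);
        if (prev != -1) addEdge(v, prev, INF);
        s.leftNode[i - lo] = v;
        prev = v;
    }
    prev = -1;
    for (int j = s.b + 1; j <= hi; j++) {             // [b+1..j] -> [b+1..j-1], [b+1..j] -> day j
        int v = nodeCount++;
        addEdge(v, dayNode[j], INF);
        if (prev != -1) addEdge(v, prev, INF);
        s.rightNode[j - lo] = v;
        prev = v;
    }
    if (lo < hi) { s.leftChild = build(lo, s.b); s.rightChild = build(s.b + 1, hi); }
    segs[id] = s;
    return id;
}

// Connects a task node to at most two auxiliary vertices covering exactly days l..r.
void attach(int segId, int taskNode, int l, int r) {
    const Seg& s = segs[segId];
    if (l == r) { addEdge(taskNode, dayNode[l], INF); return; }
    if (r <= s.b) { attach(s.leftChild, taskNode, l, r); return; }
    if (l > s.b) { attach(s.rightChild, taskNode, l, r); return; }
    addEdge(taskNode, s.leftNode[l - s.lo], INF);     // reaches days l..b
    addEdge(taskNode, s.rightNode[r - s.lo], INF);    // reaches days b+1..r
}

int main() {
    m = 8;
    dayNode.assign(m + 1, -1);
    for (int i = 1; i <= m; i++) dayNode[i] = nodeCount++;
    int root = build(1, m);
    int task = nodeCount++;                           // a task that can be done on days 2..7
    attach(root, task, 2, 7);
    printf("%d nodes, %d edges\n", nodeCount, (int)edges.size());
    return 0;
}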

Now our graph has O(m log m) active edges, and additional O(n) edges that are visited only once, so the total running time is O(n+m^2*log m), which is fast enough.

That's it for the contests of 2017 — thanks for reading, and check back tomorrow (hopefully) for the best problems of 2017!

A quadratic week

Last week had a relatively high density of contests, so much so that two of them collided: Codeforces Round 453 and TopCoder SRM 726 took place at the same time on Tuesday. The Codeforces round started 25 minutes earlier, so we'll begin with it (problems, results, top 5 on the left, analysis). dotorya had chosen a seemingly suboptimal order to solve the first four problems, as A and B required disproportionately much time relative to their value, but he was faster than the competition and thus still ended up in first place — well done!

TopCoder SRM 726 at roughly the same time attracted the second half of the community (problems, results, top 5 on the left, my screencast). The hard problem proved decisive: you are given 2000000 tasks, to be done in 2000 days, at most one task per day (so most of the tasks will end up not being done). For each task you know three numbers: the segment of days when it can be done (from l_i to r_i), and the profit c_i from doing it. You need to pick the task to be done in each day in such a way that the total profit is maximized. This is naturally reduced to a weighted maximum matching problem, but surely it will time out with 2 million vertices in the graph — or will it?

Codeforces ran another round, Round 454, on Saturday (problems, results, top 5 on the left, analysis). All problems were tractable this time for moejy0viiiiiv and Radewoosh, and in the case of the former with 30 minutes to spare. Congratulations on the victory!

In my previous summary, I have mentioned an Open Cup problem: you are given an integer n with at most a million digits. You need to determine if it's a perfect square or not. You don't need to actually find the square root.

As is often the case in problems with a Yes/No answer, it suffices to find an algorithm that always finds one of the two answers correctly, and the other with a certain probability that is greater than zero. We can then apply it repeatedly, so that the probability of missing the correct answer decays exponentially and becomes essentially zero.

Let's pick a prime number p. If n is a perfect square, then n mod p is always a quadratic residue. If not, then assuming p is picked randomly the probability of n mod p being a quadratic residue is roughly 0.5, since roughly half of all numbers mod p are quadratic residues (this is merely handwaving; can anyone prove this probability estimation formally?).

So we can pick, say, 30 random prime numbers, and check if n is a quadratic residue modulo each of them. If at least one says no, then the overall answer is no, otherwise it's yes. Checking if a number is a quadratic residue modulo p is done by checking if n^((p-1)/2)=1 (mod p).
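
Here is a hedged sketch of this randomized check: reduce the million-digit number modulo about 30 random primes with Horner's rule and apply Euler's criterion to each residue. The choice of prime range, the Miller-Rabin prime test and the handling of residues divisible by p are my own details, not necessarily the ones from the accepted solution.

#include <bits/stdc++.h>
using namespace std;
typedef unsigned long long ull;

ull mulmod(ull a, ull b, ull m) { return (__uint128_t)a * b % m; }
ull powmod(ull a, ull e, ull m) {
    ull r = 1 % m;
    for (a %= m; e; e >>= 1, a = mulmod(a, a, m)) if (e & 1) r = mulmod(r, a, m);
    return r;
}
bool isPrime(ull x) {                               // deterministic Miller-Rabin for 64-bit inputs
    if (x < 2) return false;
    for (ull p : {2ULL, 3ULL, 5ULL, 7ULL, 11ULL, 13ULL, 17ULL, 19ULL, 23ULL, 29ULL, 31ULL, 37ULL})
        if (x % p == 0) return x == p;
    ull d = x - 1; int s = 0;
    while (d % 2 == 0) d /= 2, s++;
    for (ull a : {2ULL, 3ULL, 5ULL, 7ULL, 11ULL, 13ULL, 17ULL, 19ULL, 23ULL, 29ULL, 31ULL, 37ULL}) {
        ull v = powmod(a, d, x);
        if (v == 1 || v == x - 1) continue;
        bool composite = true;
        for (int i = 0; i < s - 1; i++) { v = mulmod(v, v, x); if (v == x - 1) { composite = false; break; } }
        if (composite) return false;
    }
    return true;
}

int main() {
    string digits;
    cin >> digits;                                  // n as a decimal string, up to a million digits
    mt19937_64 rng(chrono::steady_clock::now().time_since_epoch().count());
    bool surelyNotSquare = false;
    for (int it = 0; it < 30 && !surelyNotSquare; it++) {
        ull p;
        do { p = rng() % 900000000ULL + 100000000ULL; } while (!isPrime(p));  // a random 9-digit prime
        ull r = 0;
        for (char c : digits) r = (r * 10 + (c - '0')) % p;                   // n mod p by Horner's rule
        if (r == 0) continue;                        // 0 is a square modulo p, so this prime tells us nothing
        if (powmod(r, (p - 1) / 2, p) != 1) surelyNotSquare = true;           // Euler's criterion
    }
    puts(surelyNotSquare ? "No" : "Yes");
    return 0;
}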

Thanks for reading, and check back for this week's summary!

A Newton week

The Dec 11 - Dec 17 week contained the last Open Cup round of the year: the Grand Prix of Peterhof (problems, results, top 5 on the left). The deciding problem H required the ability to find the exponent of a formal power series fast, and the teams that were able to do so claimed the first two places — congratulations to the Moscow IPT team and to SPb Havka-papstvo!

I found the relatively easy problem J very nice. You are given an integer with at most a million digits. You need to determine if it's a perfect square or not. You don't need to actually find the square root. Can you see how to get this problem accepted on the 8th minute of the contest?

Thanks for reading, and check back soon!

A correlation week

The Dec 4 - Dec 10 week was another of the calm ones, so let's use this post to discuss the interactive NEERC problem I mentioned in the previous one: you work with a device that takes an a as input and computes a^d mod n using the following pseudocode:

1   modPow(a, d, n) {
2     r = 1;
3     for (i = 0; i < 60; ++i) {
4       if ((d & (1 << i)) != 0) {
5         r = r * a % n;
6       }
7       a = a * a % n;
8     }
9   }

However, instead of receiving the result of the computation, you only receive the time it took the device to execute it, which is computed in the following way: only the multiplications on lines 5 and 7 take nonzero time, and multiplying x by y modulo n takes (bits(x)+1)*(bits(y)+1), where bits(x) is the length of the binary representation of x without leading zeroes.

You know the number n but not the number d, and your goal is to find d after sending at most 30000 requests to the device. You know that n and d were chosen in the following way: first, two prime numbers p and q with bits(p)=bits(q)=30 were picked independently and uniformly at random. Then n was computed as p*q. Then the number m=(p-1)*(q-1) was computed. Then d was picked uniformly at random between 1 and m-1 inclusive, such that it is coprime with m.

The official analysis has a pretty detailed description of the intended solution starting from page 8, so let me just recap it here briefly:
  • We're going to just send 30000 random queries.
  • The lowest bit of d is always 1; let's determine the second lowest.
  • If it's also 1, then we multiply a by a^2, otherwise we don't.
  • For queries where a and/or a^2 have fewer bits than usual, the overall modPow time will also be less than usual, but only on average, so we can't tell from a single query.
  • However, the correlation between the time to multiply a by a^2 and the overall modPow time over all 30000 queries can tell us this bit with extremely high certainty (a small simulation sketch follows this list).
  • We can then determine the next bit in the same manner, and so on.
  • This Wikipedia page also describes the same idea.
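
Below is a self-contained simulation sketch of this attack. The interactive judge is replaced by a local modPow with a hidden exponent, and the statistic is a variation of the above: for each candidate bit we correlate the bit length of the current repeated square with the part of the measured time not yet explained by the known low bits. The primes, the threshold and all names are illustrative rather than taken from the official solution.

#include <bits/stdc++.h>
using namespace std;
typedef unsigned long long ull;

ull n;                                              // the public modulus (about 60 bits)
int bitLen(ull x) { int b = 0; while (x) { b++; x >>= 1; } return b; }
ull mulmod(ull x, ull y) { return (ull)((__uint128_t)x * y % n); }
long long cost(ull x, ull y) { return (long long)(bitLen(x) + 1) * (bitLen(y) + 1); }

// Stand-in for the interactive judge: simulates the device with a hidden exponent.
ull SECRET_D;
long long query(ull a) {
    ull r = 1;
    long long t = 0;
    for (int i = 0; i < 60; i++) {
        if (SECRET_D >> i & 1) { t += cost(r, a); r = mulmod(r, a); }
        t += cost(a, a);
        a = mulmod(a, a);
    }
    return t;
}

int main() {
    ull p = 1000000007, q = 998244353;              // two 30-bit primes for the simulation
    n = p * q;
    mt19937_64 rng(42);
    SECRET_D = (rng() % ((p - 1) * (q - 1) - 1) + 1) | 1;   // some hidden odd exponent

    const int Q = 30000;                            // all queries are sent up front and reused for every bit
    vector<ull> a(Q), r(Q, 1);
    vector<long long> total(Q), known(Q, 0);
    for (int i = 0; i < Q; i++) { a[i] = rng() % n; total[i] = query(a[i]); }

    ull d = 1;                                      // the lowest bit of d is always 1
    for (int i = 0; i < Q; i++) { known[i] += cost(r[i], a[i]); r[i] = mulmod(r[i], a[i]); }
    for (int i = 0; i < Q; i++) { known[i] += cost(a[i], a[i]); a[i] = mulmod(a[i], a[i]); }

    for (int b = 1; b < 60; b++) {
        // Unexplained time = total minus everything already accounted for, minus this
        // step's squaring (which happens regardless of the bit). If bit b is set, it
        // still contains the multiplication r*a, whose cost depends on bitLen(a).
        vector<double> x(Q), y(Q);
        double mx = 0, my = 0;
        for (int i = 0; i < Q; i++) {
            x[i] = bitLen(a[i]);
            y[i] = total[i] - known[i] - cost(a[i], a[i]);
            mx += x[i]; my += y[i];
        }
        mx /= Q; my /= Q;
        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < Q; i++) {
            sxy += (x[i] - mx) * (y[i] - my);
            sxx += (x[i] - mx) * (x[i] - mx);
            syy += (y[i] - my) * (y[i] - my);
        }
        bool isSet = sxy / sqrt(max(sxx * syy, 1e-9)) > 0.03;   // illustrative threshold
        if (isSet) {
            d |= 1ULL << b;
            for (int i = 0; i < Q; i++) { known[i] += cost(r[i], a[i]); r[i] = mulmod(r[i], a[i]); }
        }
        for (int i = 0; i < Q; i++) { known[i] += cost(a[i], a[i]); a[i] = mulmod(a[i], a[i]); }
    }
    printf("recovered d %s\n", d == SECRET_D ? "correctly" : "incorrectly");
    return 0;
}
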
This solution is very easy to code and generalizes well to bigger constraints, but it might be hard to intuitively feel if 30000 queries gives enough certainty. Egor has suggested another approach which is a bit harder to code but easier to believe that it works (but we have not actually tried to implement it): instead of sending random queries, we will send queries such that one repeated square of them is very small modulo n, say, at most 10 bits. Such a query will give much more signal on whether this repeated square is used in the multiplications, since multiplying 10 by 60 bits, compared to multiplying 60 by 60 bits, is about 3000 time units faster, and the standard deviation of the modPow computation time if we just feed it random inputs is on the order of 1700, so the middle point is roughly one standard deviation away from each expectation. If we try, say, 49 such numbers for every bit, then we'll have seven standard deviations and will always guess correctly, so we need only 60*49, which is about 3000 queries, to find all bits.

Of course, finding a number that has few bits in a certain power modulo n is a hard task in itself, but here n is small enough that we can factorize it using Pollard's rho algorithm, and after that we just need to invert the said power modulo phi(n) to be able to find the root of this power of any number, including ones with few bits.

Thanks for reading, and check back for more!

A timing week

The Nov 27 - Dec 3 week started with TopCoder SRM 724 (problems, results, top 5 on the left). snuke, sugim48 and Swistakk had more coding phase points but this round was ultimately decided during the challenge phase, as the easy problem allowed for a lot of incorrect approaches.

On Saturday, Codeforces Round 449 gave ACM ICPC 2017-18 NEERC contestants the last chance to practice, although I'm not sure if many of them did (problems, results, top 5 on the left, analysis). In a somewhat unusual outcome, MrDindows has won the round with fewer problems than the second-place finisher, thanks to an extremely fast solution to the hardest problem — congratulations on the victory!

He managed to pull this off by squeezing a "naive" quadratic solution under the time limit in a problem with n=100000. This is no small feat, as demonstrated by the fact that just one other contestant was able to do it at all, even within the full two hours. Thankfully he has explained how he was able to achieve this, so that we all can learn the tricks and/or deepen our understanding of compilers!

On Sunday, the said ACM ICPC 2017-18 NEERC took place in St Petersburg (problems, results, top 5 on the left, analysis, broadcast in Russian). The competition was very tight, and Moscow IPT 1 team has earned the first place with 10 minutes to go, in no small part thanks to solving the tricky problem K with just two attempts, whereas the only two other teams to get it accepted required 10 and 14 attempts. Congratulations!

I have set problem H for this round, which was unfortunately the only one left unsolved. This is an interactive problem where you work with a device that takes an a as input and computes a^d mod n using the following pseudocode:

1   modPow(a, d, n) {
2     r = 1;
3     for (i = 0; i < 60; ++i) {
4       if ((d & (1 << i)) != 0) {
5         r = r * a % n;
6       }
7       a = a * a % n;
8     }
9   }

However, instead of receiving the result of the computation, you only receive the time it took the device to execute it, which is computed in the following way: only the multiplications on lines 5 and 7 take nonzero time, and multiplying x by y modulo n takes (bits(x)+1)*(bits(y)+1), where bits(x) is the length of the binary representation of x without leading zeroes.

You know the number n but not the number d, and your goal is to find d after sending at most 30000 requests to the device. You know that n and d were chosen in the following way: first, two prime numbers p and q with bits(p)=bits(q)=30 were picked independently and uniformly at random. Then n was computed as p*q. Then the number m=(p-1)*(q-1) was computed. Then d was picked uniformly at random between 1 and m-1 inclusive, such that it is coprime with m.

Surprisingly, just knowing the time the computation takes for a few requests is enough to extract the value of d. Can you see how to do it?

Thanks for reading, and check back later today!

Thursday, December 28, 2017

A directly opposite week

Code Festival 2017 onsite for university students and recent graduates run by AtCoder was the biggest event of the Nov 20 - Nov 26 week. It consisted of multiple rounds, but the main and rated one was the Final (problems, results, top 5 on the left, parallel round results, analysis). The problemset has provided a lot of variety, and the people at the top had quite different sets of problems solved. tourist was quite a bit faster than others — big congratulations on the victory!

The Open Cup continued its weekly cadence with the Grand Prix of Europe on Sunday (problems, results, top 5 on the left, CERC results on the same problems). Team japan02 had 10 problems solved even before the last hour, but could not make further progress and allowed ITMO 1 to steal the victory with two minutes to go — congratulations to both teams solving 10 problems, and to the Warsaw team who got 10 at the onsite round, also with a last-minute accepted submission!

In my previous summary, I have mentioned an interactive Open Cup problem happening on the faces of a cube with each face divided into n times n cells (n<=300). You are located on some cell on some face of the cube, facing one of the four cardinal directions, and your goal is located on some cell on some face of the cube, but you don't know where you are located and where the goal is located. All you can do is to issue commands left, right or forward, with the first two causing you to rotate in place by 90 degrees, and the last one causing you to move one cell in the direction you're facing, with moving through the sides of the cube handled in the natural way. For each move forward, the system tells you whether your new cell is closer, farther or at the same distance from the goal as the old one. The distance is defined as the Euclidean distance between the centers of the cells. You need to stop moving at the goal cell after using at most 100n commands.

The solution that we got accepted looked like this: first, we try to move closer to the goal whenever we can, until we reach a cell from which none of the four moves gives us a "closer" reply. This can be coded with a simple loop without the need to remember the coordinates or to even represent the sides of the cube, and will always converge in O(n) steps.

Then, we're either in the goal cell or in the cell directly opposite it on the opposite face. We don't know the size of the cube, but even without knowing it, in order to transition from one to the other we can walk forward while we're not seeing a "closer" answer, and then keep moving forward while we're seeing a "closer" answer. If we were in the opposite cell, we will reach the goal. If we were in the goal, then we'll either be in the opposite cell, or in the goal again after making a full loop. To distinguish the cases, we can compare the number of moves in the "not closer" state to the number of moves in the "closer" state. In case the former is bigger than the latter, we started our forward walk in the goal, and we will undo all those moves to go back to the goal. In case the latter is bigger, we're done.

It takes some case studying and convincing oneself that this approach works in various corner cases as well, but after that the actual implementation is just a few lines.
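
Here is a heavily hedged sketch of the first phase, the greedy descent. The command and reply strings ("forward", "right", "closer", "farther", "same") are assumptions about the protocol rather than the actual ones from the problem, and the second phase is omitted.

#include <bits/stdc++.h>
using namespace std;

string moveForward() {                     // issue one forward command and read the judge's reply
    cout << "forward" << endl;
    string reply;
    cin >> reply;                          // assumed replies: closer / farther / same
    return reply;
}
void turnRight() { cout << "right" << endl; }

int main() {
    // Phase 1: walk while the judge keeps saying "closer"; on any other reply,
    // step back and face a direction not yet tried from the current cell.
    int failed = 0;
    while (failed < 4) {
        if (moveForward() == "closer") {
            failed = 0;                    // progress: keep walking straight ahead
        } else {
            turnRight(); turnRight();      // turn around,
            moveForward();                 // step back (reply ignored),
            turnRight();                   // and face a direction we have not tried from this cell
            failed++;
        }
    }
    // We are now either at the goal or at the directly opposite cell; the
    // disambiguation walk described above (phase 2) is not shown in this sketch.
    return 0;
}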

Thanks for reading, and check back for more!

Wednesday, December 27, 2017

A test-driven week

The Nov 13 - Nov 19 week had one each of the usual suspects. First off, Codeforces Round 446 took place on Friday (problems, results, top 5 on the left, analysis). moejy0viiiiiv has earned his first place at the very last minute of the contest by solving D — very well done!

Then, TopCoder SRM 723 happened on Saturday (problems, results, top 5 on the left, analysis, my screencast). Only ksun48 was able to solve the hard problem, but it took him so long that his score was on par with scores people got for their medium problem, and the first place was decided during the challenge phase.

And finally, the Open Cup Grand Prix of Siberia wrapped up the week on Sunday (problems, results, top 5 on the left). Team Past Glory has continued their streak which is now up to 4 consecutive victories, this one by a mere 29 penalty minutes. Well done!

Problem 2 in this round is a good example of a problem that is not so hard to solve in some way, but is harder to solve in a way that is really easy to code without bugs. This is an interactive problem happening on the faces of a cube with each face divided into n times n cells (n<=300). You are located on some cell on some face of the cube, facing one of the four cardinal directions, and your goal is located on some cell on some face of the cube, but you don't know where you are located and where the goal is located. All you can do is to issue commands left, right or forward, with the first two causing you to rotate in place by 90 degrees, and the last one causing you to move one cell in the direction you're facing, with moving through the sides of the cube handled in the natural way. For each move forward, the system tells you whether your new cell is closer, farther or at the same distance from the goal as the old one. The distance is defined as the Euclidean distance between the centers of the cells. You need to stop moving at the goal cell after using at most 100n commands. Which approach is the easiest to implement here?

In my previous summary, I have mentioned a Codeforces problem: how many permutations of size n have the following property: if you go from left to right, and keep track of the maximum seen so far, there will be no time where it will not change for k consecutive steps, except if it's already equal to n. n and k are up to a million, and the answer needs to be returned modulo 10^9+7.

First, let's come up with a standard but slow solution. We'll do dynamic programming that finds the number of permutations of size m<=n such that the last element is the maximum and there's always an update to the maximum in every k consecutive steps when we go from left to right. In order to find this number, we iterate over the step p where the previous maximum appeared: it has to be between m-k and m-1. Then we need to multiply the answer for p by the number of ways to choose which m-p-1 numbers out of m-2 occupy the positions between p+1 and m-1, and in which order, which turns out to be simply (m-2)!/(p-1)!, so our final formula becomes:

dp[m] = sum(dp[p]*(m-2)!/(p-1)!, m-k<=p<=m-1)

This looks to be O(n^2), but we can now notice that (m-2)! can be moved outside the sum, and the remaining sum can be computed in O(1) if we store prefix sums of dp[p]/(p-1)!, bringing the overall running time to O(n).
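
Here is a hedged sketch of that O(n) computation. The dp recurrence is the one above; the final summation over the position where n itself appears is left implicit in the post, so that combination step is my own reading of it (for the sample n=3, k=2 below it prints 6, matching direct enumeration).

#include <bits/stdc++.h>
using namespace std;
typedef long long ll;
const ll MOD = 1000000007;

ll powmod(ll a, ll e) { ll r = 1; for (a %= MOD; e; e >>= 1, a = a * a % MOD) if (e & 1) r = r * a % MOD; return r; }

int main() {
    int n = 3, k = 2;                                  // sample input; expected answer 6
    vector<ll> fact(n + 1), inv_fact(n + 1);
    fact[0] = 1;
    for (int i = 1; i <= n; i++) fact[i] = fact[i - 1] * i % MOD;
    inv_fact[n] = powmod(fact[n], MOD - 2);
    for (int i = n; i >= 1; i--) inv_fact[i - 1] = inv_fact[i] * i % MOD;

    // dp[m] = number of permutations of size m whose last element is the maximum and
    // where the running maximum changes in every window of k consecutive steps;
    // pref[i] = sum of dp[p]/(p-1)! over p < i, so each dp[m] is computed in O(1).
    vector<ll> dp(n + 1, 0), pref(n + 2, 0);
    dp[1] = 1;
    pref[2] = (pref[1] + dp[1] * inv_fact[0]) % MOD;
    for (int m = 2; m <= n; m++) {
        int lo = max(1, m - k);
        ll s = (pref[m] - pref[lo] + MOD) % MOD;
        dp[m] = fact[m - 2] * s % MOD;
        pref[m + 1] = (pref[m] + dp[m] * inv_fact[m - 1]) % MOD;
    }
    // Combine over the position q of the value n: choose which q-1 of the other values
    // precede it (their order is counted by dp[q]) and arrange the remaining n-q values
    // after it freely, giving C(n-1,q-1)*(n-q)! = (n-1)!/(q-1)! for each q.
    ll answer = 0;
    for (int q = 1; q <= n; q++)
        answer = (answer + dp[q] * fact[n - 1] % MOD * inv_fact[q - 1]) % MOD;
    printf("%lld\n", answer);
    return 0;
}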

You can watch as I code the solution in the screencast to see my proposed way of implementing such solutions: start by implementing the quadratic approach, get it to pass the sample cases, then start implementing the optimizations one by one while keeping it passing the sample cases. One step that I did not take but should have is to also add to the samples a few random cases that are still small enough for the quadratic solution to handle, and make sure the outputs on them don't change while optimizing.

Thanks for reading, and check back for more!

Tuesday, December 26, 2017

An F5 week

The Nov 6 - Nov 12 week had two contests on Sunday. First off, it was time for the Open Cup Grand Prix of Ukraine (problems, results, top 5 on the left). Team Past Glory has won their third Grand Prix in a row — what an amazing achievement! Both Moscow IPT Cryptozoology and japan02 teams were quite close and had real chances for the first place as well — congratulations, too!

A bit later, Codeforces hosted its Round 445 (problems, results, top 5 on the left, analysis, my screencast). V--o_o--V was way too fast and thus got an easy first place — great job!

Somewhat surprisingly, problem C was the most difficult to solve (as opposed to implement) for me this round — you can probably find around 20 minutes of staring at its statement in the screencast during the moments I was trying to solve it on paper. It asked: how many permutations of size n have the following property: if you go from left to right, and keep track of the maximum seen so far, there will be no time where it will not change for k consecutive steps, except if it's already equal to n. n and k are up to a million, and the answer needs to be returned modulo 10^9+7. Is it easy for you?

Thanks for reading, and check back soon for the solution!

An empty week

There were no contests I'd like to mention during the Oct 30 - Nov 5 week, so let me use this summary for an audience question.

I'm planning to finish all summaries for 2017 during this week, and then put together a shortlist of best problems of 2017 to my taste, chosen from the ones I've highlighted in my blog, just like I did last year. However, the "to my taste" criterion is more important to me than "highlighted in my blog", so if you think you know a 2017 problem that I'd like a lot, but which did not get mentioned in the blog, most likely because I solved fewer contests and thus saw fewer problems, please mention it in comments!

I should also note that the New Year tradition of picking the best of the past year in competitive programming was pioneered and is still being pursued by Snarknews, so consider taking part in his (unfortunately, Russian-only, but maybe automatic translation is good enough these days?..) poll as well!

Saturday, December 23, 2017

A transversal week

The TopCoder Open onsite continued during the Oct 23 - Oct 29 week with Semifinal 2 on Monday (problems, results on the left, our commentary). Two contestants got the medium problem this time, tourist and ikatanic, and bcip was the third advancer thanks to fast time on the easy. Congratulations to all advancers!

The final round of the TopCoder Open took place just 24 hours later (problems, results on the left, our commentary). Once again just two contestants have managed to solve two problems, and xudyh was quite a lot faster than rng_58. Congratulations on the TopCoder Open victory!

Codeforces Round 443 took place on Thursday (problems, results, top 5 on the left, analysis). dotorya has dominated the proceedings, solving all problems in the end, but having enough points for a clear first place with just four of them. Amazing performance!

On the weekend, the Open Cup Eastern Grand Prix took place (problems, results, top 5 on the left). Five teams have solved all problems, and team Past Glory did it faster than everybody else. Well done!

In my previous summary, I have mentioned a Codeforces problem: you are given a bipartite graph with at most 200000 vertices in each part. Each vertex in the left part has exactly two edges to different vertices in the right part. Each vertex in the left part also has a weight. You need to find the weight of the maximum weight matching in this graph (but you don't need to output the matching itself).

First of all, we can find the maximum weight matching greedily, by trying to add the vertices in the left part in decreasing order of weight, since the sets of left-part vertices covered by matchings form a transversal matroid. How do we check if a vertex in the left part can be added to the matching without building the traditional augmenting path, which would be too slow in this problem? Since we don't need to build the matching itself, we will rely on Hall's theorem. Suppose we're adding a vertex a in the left part that is connected to vertices b and c in the right part. Consider the connected components of the graph formed by the right part plus the vertices in the left part which have been matched (whenever we can't match a vertex in the left part, we disregard it together with the adjacent edges). Each of those components has at least as many vertices in the right part as in the left part, since all vertices of the left part are matched. Moreover, if it has k vertices in the left part, then it has 2k edges, and since it also has at least k vertices in the right part, its total number of vertices is at least 2k. A connected graph with at least 2k vertices and 2k edges does not have that many possibilities: it either has exactly 2k vertices and one cycle, or it has 2k+1 vertices and is a tree.

We claim that we can add the vertex a to the matching if and only if one of the components corresponding to b and c is such a tree, so all we need to maintain in our solution is a set of connected components using a disjoint set union data structure, and the number of vertices in each part for each connected component.

If neither of the components corresponding to b and c is a tree, each of them has the same number of vertices in the left and right parts, and the union of those components together with vertex a has more vertices in the left part than in the right part, so Hall's theorem tells us it's impossible to add a to the matching. If the component corresponding to, say, b is a tree, then let's prove it's possible to rearrange the matching inside this component in such a way that we can add a to the matching. Suppose it's impossible; then Hall's theorem tells us that there exists a subset U of vertices of the left part of the component such that if V is the set of the right part vertices adjacent to U, then the size of V is equal to the size of U (it cannot be smaller, since all vertices of U are matched). If U has k vertices, then the graph formed by U and V has 2k vertices and 2k edges, which means that it has a cycle — a contradiction with the component being a tree.
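
Here is a hedged sketch of this greedy with a DSU over the right part, on toy data of my own. Each left vertex is treated as a weighted edge between its two right neighbours; it can be added exactly when the affected component(s) still have room, i.e. at least one of them is still a tree, which is the Hall's-theorem condition derived above.

#include <bits/stdc++.h>
using namespace std;

struct DSU {
    vector<int> parent;
    vector<bool> full;                   // full = the component already has as many edges as vertices
    DSU(int n) : parent(n), full(n, false) { iota(parent.begin(), parent.end(), 0); }
    int find(int v) { return parent[v] == v ? v : parent[v] = find(parent[v]); }
};

int main() {
    int m = 3;                           // size of the right part
    // left vertices as {weight, right neighbour 1, right neighbour 2}
    vector<array<long long, 3>> left = {{10, 0, 1}, {7, 0, 1}, {5, 1, 2}, {3, 0, 2}};

    sort(left.rbegin(), left.rend());    // decreasing weight
    DSU dsu(m);
    long long total = 0;
    for (auto& e : left) {
        long long w = e[0];
        int u = dsu.find((int)e[1]), v = dsu.find((int)e[2]);
        if (u == v) {
            if (dsu.full[u]) continue;   // the single component is already saturated: skip this vertex
            dsu.full[u] = true;          // this edge closes the component's unique cycle
        } else {
            if (dsu.full[u] && dsu.full[v]) continue;    // both saturated: Hall's condition fails
            dsu.parent[u] = v;                           // merge; the result is saturated iff either part was
            dsu.full[v] = dsu.full[u] || dsu.full[v];
        }
        total += w;
    }
    printf("maximum matching weight = %lld\n", total);  // 22 for this toy data
    return 0;
}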

Thanks for reading, and check back for more!

An extremely formal week

The Oct 16 - Oct 22 week started with Codeforces Round 441 on Monday (problems, results, top 5 on the left, analysis). fateice demonstrated impressive speed, solving 6 problems in just 50 minutes, a good 20 minutes ahead of everybody else. Congratulations on the victory!

The problem F in this round was quite nice: you are given a bipartite graph with at most 200000 vertices in each part. Each vertex in the left part has exactly two edges to different vertices in the right part. Each vertex in the left part also has a weight. You need to find the weight of the maximum weight matching in this graph (but you don't need to output the matching itself). Of course, the standard maximum matching algorithms would be quadratic or at least O(n*sqrt(n)) and thus too slow for this problem. How to solve it faster?

On Sunday, the onsite portion of TopCoder Open 2017 started with its Semifinal 1 (problems, results on the left). xudyh solved the medium problem in 25 minutes, while everybody else could not solve it at all — what a commanding performance! scott_wu and rng_58 also advanced to the finals thanks to fast times on the easy problem.

In my previous summary, I have mentioned an Open Cup problem about formal proofs (problem H here): you are given an undirected connected graph with at most 73 vertices and 492 edges, and a desired parity for each vertex (either odd, denoted by 1, or even, denoted by 0). The sum of all parities is 1 modulo 2. You need to prove that there doesn't exist a subset of edges of the graph with the desired parity of the number of edges adjacent to each vertex. It would be easy to prove, as it is well known that the number of vertices of odd degree in any graph is always even. However, you're only interested in proofs stated in a specific form.

Such a proof is a sequence of clauses. Each clause is an "or" of zero or more variables and/or their negations. Each variable corresponds to an edge in the graph, and is true if we take this edge, and false otherwise. Each clause must be built in one of the three ways:
  1. It can be an axiom, which in our case means that it must contain the variables corresponding to all edges adjacent to some vertex of the graph, with the number of negated variables having the parity that is opposite to the desired parity for this vertex. For example, consider a vertex of degree 5 and desired parity 0, with adjacent edges a, b, c, d, e. A valid axiom for this vertex would be for example !a|!b|c|d|!e, which would be false only if variables a,b,e are true, and variables c,d are false, but that would mean that the vertex has 3 adjacent edges which doesn't satisfy its parity requirement, so this axiom is always true.
  2. It can be a conclusion, in which we take two previous clauses of form A|x and B|!x, and make a new clause A|B. Since both original clauses are true the conclusion is true as well.
  3. It can be a transformation of a previous clause. We take a previous clause and a permutation of the set of all variables and their negations, and apply the permutation to the elements of the clause to get the new clause. The permutation can not be arbitrary: it must be an automorphism of the set of axioms (map every axiom to an axiom). Note that transformations are not a strictly necessary operation, as we could have constructed the resulting clause by applying the automorphism to the axioms used to build the original clause, and then repeating how it was built, but it allows us to dramatically reduce the number of clauses in a proof.
Your goal is to derive an empty clause in at most 1000 steps, which would mean a contradiction, and prove that the parities goal is impossible to achieve.

First, let's solve our problem for the case where our graph is a tree. We will apply a recursive procedure that takes a subtree, and builds a clause with just one variable corresponding to the edge going into that subtree, either negated or not. For a leaf, such clause is just an axiom. For an internal vertex, we can recursively build such clauses for edges going to its children, and then build an axiom from negations of those clauses, and choosing whether to negate the edge to the parent to obtain the required parity for the axiom. We can then eliminate all variables but one from this axiom using the conclusion, as we have negated single variables for all edges going to the children.

When we reach the root of the tree, we apply the same procedure, only there's no extra variable for the edge to the parent. The condition on the sum of all parities will guarantee that the above approach is still applicable, i.e. the axiom we need to cancel out all variables will be a valid axiom for the root (a formal proof of that fact would probably inductively find that the edge going from a subtree gives us a negated or non-negated clause based on the sum of parities in that subtree). And we will obtain an empty clause after eliminating all variables in the root, thus completing the required formal proof in less than 2n steps, where n is the number of vertices.

Now how to adapt this approach for the case where the graph is not a tree? Let's pick an arbitrary spanning tree in it, and apply the same idea. Whenever we need to create a new axiom for a vertex, let's keep all variables corresponding to the edges that are not in the spanning tree non-negated. This does not affect the validity of axioms since it depends only on the number of negated variables. In the end, instead of an empty clause we will get a clause that is an or of all edges that are not in the spanning tree, and does not have variables corresponding to the edges of the spanning tree.

This is where the third operation type, the transformation, comes into play. For each edge not in the spanning tree consider its fundamental cycle formed by this edge and the path between its ends in the spanning tree. Let's apply a transformation that negates all variables corresponding to this cycle. This is a valid transformation since for each vertex it negates 0 or 2 variables, which does not affect the parity, and thus maps axioms to axioms. In our clause it will negate exactly one variable, since we don't have variables corresponding to the edges of the spanning tree, and we can then use conclusion to remove that variable from our clause. We can then repeat this process for all edges not in the spanning tree and obtain the empty clause in 2(m-n+1) additional operations, or less than 2(m+1) overall, where m is the number of edges, which is good enough.

Thanks for reading, and check back soon!

A delayed week

TopCoder SRM 722 on Thursday was the first contest of the Oct 9 - Oct 15 week (problems, results, top 5 on the left, analysis). TopCoder has put together an easier problemset with a very tricky medium problem this time, which put a great emphasis on the challenge phase. tourist, uwi and gainullin.ildar have found enough challenges to overtake the others into the top three. Congratulations!

Codeforces Round 440 took place early on Sunday (problems, results, top 5 on the left, analysis). All top scorers solved 4 problems, albeit different ones, and mere minutes separated them. khadaev was 1 minute faster than Errichto, and thus earned the first place. Well done!

Overlapping with that round, the Open Cup Grand Prix of SPb took place just an hour later (problems, results, top 5 on the left). Uncharacteristically for our team we had reasonable penalty time and just needed to solve the last problem in an hour to win — but missed a small detail in the problem statement and could not get it accepted, which allowed team Past Glory to score another victory — congratulations!

In continuing the great traditions of St. Petersburg State University championships, this contest featured many interesting problems, of which I'd like to highlight the two hardest ones. Problem F was an interactive problem about exploring a random maze with a communication delay. You are given a maze in a 20x20 grid with walls between cells. The maze is randomly generated to form a tree using a Kruskal-like approach. You control a robot that starts in the bottom-left corner. You can command the robot to move in one of the four cardinal directions, and you would receive a reply telling you that the robot has moved in that direction (if there was no wall there), or that it stayed in its current cell (if there was a wall). You need to get the robot to the bottom-right corner in 10000 moves. The catch is that you don't receive the replies right away: the reply to the i-th command only arrives after you send the (i+100)-th command. We can't afford 100 moves per cell, so we have to somehow continue exploring without knowing where exactly we are. Can you see an effective approach?

I have explained and analyzed our approach in this comment thread.

Problem H discussed formal proofs in the spirit of the Dirichlet problem from another St. Petersburg State University Championship this spring. You are given an undirected connected graph with at most 73 vertices and 492 edges, and a desired parity for each vertex (either odd, denoted by 1, or even, denoted by 0). The sum of all parities is 1 modulo 2. You need to prove that there doesn't exist a subset of edges of the graph with the desired parity of the number of edges adjacent to each vertex. It would be easy to prove, as it is well known that the number of vertices of odd degree in any graph is always even. However, you're only interested in proofs stated in a specific form.

Such a proof is a sequence of clauses. Each clause is an "or" of zero or more variables and/or their negations. Each variable corresponds to an edge in the graph, and is true if we take this edge, and false otherwise. Each clause must be built in one of the three ways:
  1. It can be an axiom, which in our case means that it must contain the variables corresponding to all edges adjacent to some vertex of the graph, with the number of negated variables having the parity that is opposite to the desired parity for this vertex. For example, consider a vertex of degree 5 and desired parity 0, with adjacent edges a, b, c, d, e. A valid axiom for this vertex would be for example !a|!b|c|d|!e, which would be false only if variables a,b,e are true, and variables c,d are false, but that would mean that the vertex has 3 adjacent edges which doesn't satisfy its parity requirement, so this axiom is always true.
  2. It can be a conclusion, in which we take two previous clauses of form A|x and B|!x, and make a new clause A|B. Since both original clauses are true the conclusion is true as well.
  3. It can be a transformation of a previous clause. We take a previous clause and a permutation of the set of all variables and their negations, and apply the permutation to the elements of the clause to get the new clause. The permutation can not be arbitrary: it must be an automorphism of the set of axioms (map every axiom to an axiom). Note that transformations are not a strictly necessary operation, as we could have constructed the resulting clause by applying the automorphism to the axioms used to build the original clause, and then repeating how it was built, but it allows us to dramatically reduce the number of clauses in a proof.
Your goal is to derive an empty clause in at most 1000 steps, which would mean a contradiction, and prove that the parities goal is impossible to achieve. Can you do that?

In my previous summary, I have mentioned a Codeforces problem: two players are playing a real-time game on a tree. The first player has one token, and in one second can move it to an adjacent vertex of the tree. The second player has several tokens, and can move them with infinite speed whenever they want. However, as soon as a token of the second player meets the token of the first player, this second player's token is removed from the game. The goal of the first player is to remove all tokens of the second player as fast as possible. How long would it take in the worst case? You are given the tree and the starting locations of all tokens. There are at most 50 vertices in the tree, and at most 50 tokens.

We can notice that whenever we (the first player) are walking along an edge, only two numbers matter: how many tokens are there on each side of this edge in the tree. When we approach a vertex, the second player has to decide how to distribute the tokens on the side we're approaching between different subtrees of that vertex. We must then proceed into one of those subtrees, as going back along the edge we just arrived on just wastes two seconds. We must proceed into a subtree that has at least one token. Eventually, we will reach a leaf and remove all tokens which are there (at least one).

This makes our game acyclic, and we can solve it using dynamic programming where the state is the directed edge of the tree we're traversing, and two numbers: the number of second player tokens on each side of it. When we reach a vertex, in order to find how the second player will divide their tokens, we use an inner dynamic programming which computes the highest minimum time to finish the game if we placed a tokens into the first b subtrees of this vertex. This means we process one state in O(n^3), and we have O(n^2) states, for the total running time of O(n^5). However, we can notice that the inner dynamic programming actually computes the answers not just for one state, but for all states with the same total number of tokens of the second player left, and thus the running time becomes O(n^4) which is fast enough.

Thanks for reading, and check back soon for the next week's summary!

Sunday, December 17, 2017

A future week

The Oct 2 - Oct 8 week contained Codeforces Round 438 (problems, results, top 5 on the left, analysis). With problem G out of reach, the competition turned to the first six problems + challenges, and cubelover has excelled at both — congratulations on the win!

Problem E continued the streak of problems with nice and simple statements. Two players are playing a real-time game on a tree. The first player has one token, and in one second can move it to an adjacent vertex of the tree. The second player has several tokens, and can move them with infinite speed whenever they want. However, as soon as a token of the second player meets the token of the first player, this second player's token is removed from the game. The goal of the first player is to remove all tokens of the second player as fast as possible. How long would it take in the worst case? You are given the tree and the starting locations of all tokens. There are at most 50 vertices in the tree, and at most 50 tokens.

In my previous summary, I have mentioned a similarly simple sounding problem: you are given the price of a stock for each of n days. You start with 0 stocks, want to finish with 0 stocks, and do one of three things each day: buy 1 stock, sell 1 stock, or do nothing. How much money can you earn? n is up to 300000.

The solution proceeds as follows. A new day starts, with stock price a. We will then create two futures for this stock: we will remember that we can buy two units of stock at price a (the reason for creating two futures instead of one will become clear soon). Then we will sell one unit of stock at price a. Which one? Of course, we will take the remaining future with the lowest price, act on it (i.e. buy a unit of stock at that price), and then sell it at price a, realizing the profit. Note that this might be the future that we just created, in case all previous ones were more expensive, in which case the buy and sell cancel out.

Note that for each stock, we create one sale and at most two buys via the futures. We can treat the first buy as cancelling the sale, and the second buy as the actual buy — this is the reason we need two futures.

As an example, suppose the stock prices are 4, 3, 6, 5. Then our algorithm proceeds as follows:

  • Add two 4s to the futures priority queue, now it has [4, 4].
  • Remove the lowest future and pair it with a sale at 4, for a profit of 4-4=0. Our queue is now [4].
  • Add two 3s: [3, 3, 4].
  • Remove the lowest and pair it with 3: profit is 3-3=0, queue is [3, 4].
  • Add two 6s: [3, 4, 6, 6].
  • Remove the lowest and pair it with 6: profit is 6-3=3, queue is [4, 6, 6].
  • Add two 5s: [4, 5, 5, 6, 6].
  • Remove the lowest and pair it with 5: profit is 5-4=1, queue is [5, 5, 6, 6].
The total profit is 0+0+3+1=4, and what happened is that we bought at 4 and 3 and sold at 6 and 5. After careful consideration one can see that all valid sequences of stock sales can be expressed in our framework, and that greedily choosing the cheapest future every time leads to an optimal solution thanks to an exchange argument. Still, I find the fact that this greedy approach works very beautiful!
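
The whole algorithm fits in a few lines; here is a short sketch of it on the example above (the variable names are mine).

#include <bits/stdc++.h>
using namespace std;

int main() {
    vector<long long> price = {4, 3, 6, 5};          // the example from the post
    priority_queue<long long, vector<long long>, greater<long long>> futures;   // min-heap of buy prices
    long long profit = 0;
    for (long long a : price) {
        futures.push(a);
        futures.push(a);                             // two futures per day
        profit += a - futures.top();                 // sell one unit against the cheapest outstanding future
        futures.pop();
    }
    printf("%lld\n", profit);                        // prints 4: buy at 4 and 3, sell at 6 and 5
    return 0;
}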

Thanks for reading, and check back later!

A stock week

The last September week featured a relatively rare guest: a TopCoder SRM, number 721 (problems, results, top 5 on the left, analysis). eddy1021 scored the most points from the problems, but K.A.D.R has overtaken him thanks to his 100 challenge points. Congratulations on the win!

MemSQL Start[c]UP round 2 took place on Saturday (problems, results, top 5 on the left, onsite results, analysis). There was a wide variety of problems to solve, and tourist has made the right choice and claimed the first place with A+B+C+D+F. Well done!

Problem D had a deceptively simple statement which led to a lot of frustration as I was unable to come up with its solution for two hours :) You are given the price of a stock for each of n days. You start with 0 stocks, want to finish with 0 stocks, and do one of three things each day: buy 1 stock, sell 1 stock, or do nothing. How much money can you earn? n is up to 300000.

Finally, the third Open Cup stage, the Grand Prix of Eurasia, took place on Sunday (problems, results, top 5 on the left). This time the active Russian ICPC teams stepped forward, with Moscow SU and ITMO leading everybody else by a problem. Well done!

Thanks for reading, and check back later as we dive into October!

Monday, December 11, 2017

A global shift week

The Sep 18 - Sep 24 week featured the second Open Cup stage: the Grand Prix of Ural (problems, results, top 5 on the left). The Seoul National U 2 team continued to amaze, this time claiming the victory by 1 problem to Past Glory and by 2 problems to everybody else. Very well done!

Later on Sunday, Codeforces hosted a contest titled "Manthan, Codefest 17" (problems, results, top 5 on the left, analysis). anta has stumbled on the tricky problem D (less than a third of contestants who submitted got it accepted), but still came out clearly on top thanks to being the only contestant to submit all problems. Congratulations on the victory!

In my previous summary, I have mentioned a couple of problems. One was with a recursive background: there are many contestants taking part in a competition, and top c will be awarded with t-shirts. There are n available t-shirt sizes (n<=200000). Each contestant has indicated one of two things: either they need a t-shirt with some concrete size, or they need a t-shirt with one of the two concrete adjacent sizes. More precisely, for each size i we know the number a_i of contestants (up to 10^8) that need a t-shirt of size i, and the number b_i of contestants (also up to 10^8) that need either a t-shirt of size i or a t-shirt of size i+1. What is the minimal number of t-shirts we need to buy in order to be able to give t-shirts to top c contestants, no matter which ones end up in top c?

First, we can notice that we need to buy such a set of t-shirts that in the resulting (huge) graph there's a full matching for any set of c nodes in the left part. Hall's theorem gives us an equivalent formulation: we need to make sure that for any set of sizes for which we buy x t-shirts in total, with x less than c, the number of people who can only wear a t-shirt from this set does not exceed x.

Then, we can notice that we can only consider segments of sizes, and not arbitrary sets, in the above condition, as whenever several disjoint segments violate the constraint together, one of them alone already violates it.

Then, just like in many other problems with constraints on segments, we can buy the t-shirts greedily: going in increasing order of sizes, for each size, we will buy just enough t-shirts to make sure the constraint is not violated for all segments that end at this size. It doesn't make sense to buy more of this size as we can replace those extra t-shirts with ones of size one higher without making things worse.

Formally speaking, s_i = max(min(c, a_i), min(c, a_i + b_{i-1} + a_{i-1}) - s_{i-1}, min(c, a_i + b_{i-1} + a_{i-1} + b_{i-2} + a_{i-2}) - s_{i-2} - s_{i-1}, ...). As we move from i to i+1, the arguments to max() change in the following way: the subtracted part increases by s_i for all terms, the a+b part increases by a_{i+1} + b_i, and we get a new term. The latter causes some terms to jump over c, in which case the corresponding min() will forever stay equal to c, and these terms will be zero or negative in all subsequent steps, so we can forget about them.

This leads to the following approach: let's keep a set of terms for which the min() is still less than c, then we only need to be able to add a constant to the positive part of all terms, add a constant to the negative part of all terms, to find all terms where the positive part exceeds c, and to find the term with the largest difference between positive and negative parts. Such a set of operations can be most straightforwardly implemented by keeping two priority queues of those terms: one ordered by positive part and one ordered by difference, and implementing the addition of a constant by keeping an extra "global shift" variable.

This yields an O(n log n) solution which is fast enough, but it can also be made O(n) as described in the official analysis.
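
Here is a hedged sketch of the O(n log n) version with the two priority queues and global shifts; the toy data is my own (for it the sketch buys 5 t-shirts), and a term that saturates at c is used once more at the step where it saturates before being dropped.

#include <bits/stdc++.h>
using namespace std;
typedef long long ll;

int main() {
    // n sizes, a[i] people who need exactly size i, b[i] people who need size i or i+1
    // (b[n-1] is unused), and the top c contestants get t-shirts.
    int n = 3;
    ll c = 4;
    vector<ll> a = {1, 2, 1}, b = {3, 1, 0};

    vector<ll> posBase(n), negBase(n);       // per-term bases; actual parts are base + global shift
    vector<bool> alive(n, true);
    priority_queue<pair<ll,int>> byPos;      // max-heap by positive base: detects terms whose min() caps at c
    priority_queue<pair<ll,int>> byVal;      // max-heap by (positive - negative) base: the strongest live term
    ll addP = 0, addN = 0, bought = 0, prevS = 0;

    for (int i = 0; i < n; i++) {
        if (i > 0) { addP += b[i - 1] + a[i]; addN += prevS; }   // update all existing terms at once
        posBase[i] = a[i] - addP;                                // the new term starts at (a[i], 0)
        negBase[i] = -addN;
        byPos.push({posBase[i], i});
        byVal.push({posBase[i] - negBase[i], i});

        ll s = 0;                            // t-shirts to buy of size i
        // terms whose positive part reached c: take their capped value once, then drop them forever
        while (!byPos.empty() && byPos.top().first + addP >= c) {
            int j = byPos.top().second;
            byPos.pop();
            if (!alive[j]) continue;
            s = max(s, c - (negBase[j] + addN));
            alive[j] = false;
        }
        while (!byVal.empty() && !alive[byVal.top().second]) byVal.pop();   // lazy deletion
        if (!byVal.empty()) s = max(s, byVal.top().first + addP - addN);    // strongest uncapped term

        bought += s;
        prevS = s;
    }
    printf("%lld\n", bought);                // 5 for the sample data above
    return 0;
}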

Another problem I mentioned was: two players have a sequence of 50000 integers, and alternately take numbers from either end of the sequence. When all numbers have been taken, each player computes the bitwise xor of their numbers, and the one with a higher bitwise xor wins. Who will win?

I have described our solution in this comment, but I still feel a bit uneasy about it. It can probably be proven by induction, but it would seem that in such simple setting there should be a simple argument that proves it. I'm really curious to see such argument, please share if you know one!

Finally, I've just remembered an interesting bit about the TopCoder problem which I explained in the previous summary, about the attractions, mornings, evenings and constraints. During the contest, I was unable to come up with a max flow solution even though I felt like one was possible, and out of desperation started to implement backtracking that greedily does things that are obviously good to do, branches when we don't have an obvious next step, and cuts branches that are definitely worse than the current best answer. To my surprise, I've received a wrong answer instead of time limit exceeded on the system test. The reason was that I implemented the solution assuming I had taken the transitive closure of the set of constraints, but forgot to actually take the closure. After making this fix, the backtracking passed the system test in the practice room in 42ms :)

Thanks for reading, and check back later for more September stories!

A chain cut week

The Sep 11 - Sep 17 week had its contests on the weekend. On Saturday MemSQL Start[c]UP has returned to Codeforces after a three-year absence with Round 1 of its third edition (problems, results, top 5 on the left, my screencast, analysis). moejy0viiiiiv was 25 minutes faster than everybody else to solve all problems, and made sure to protect his lead with a few challenges. Congratulations on the win!

The round had a few nice problems, but let me highlight problem F: there are many contestants taking part in a competition, and top c will be awarded with t-shirts. There are n available t-shirt sizes (n<=200000). Each contestant has indicated one of two things: either they need a t-shirt with some concrete size, or they need a t-shirt with one of the two concrete adjacent sizes. More precisely, for each size i we know the number a_i of contestants (up to 10^8) that need a t-shirt of size i, and the number b_i of contestants (also up to 10^8) that need either a t-shirt of size i or a t-shirt of size i+1. What is the minimal number of t-shirts we need to buy in order to be able to give t-shirts to top c contestants, no matter which ones end up in top c?

Apart from being an interesting problem in general, it's a great example for this recent discussion about stories in problem statements. I think that here the t-shirt and contestant story actually makes the problem statement really easy to understand and think about, while a completely formal mathematical statement would be hard to parse.

On Sunday the new season of the Open Cup started with the Grand Prix of Romania (problems, results, top 5 on the left). Team Past Glory did not show any signs of slowing down and won convincingly by a problem — well done! The great performance and second place of the Seoul National U 2 team was a great surprise, and even more so given that this is a different team from the one that earned gold medals at the last ICPC World Finals for Seoul.

Problem L of this contest showed that we can still find new games with non-trivial solutions in a very simple setup: two players have a sequence of 50000 integers, and alternately take numbers from either end of the sequence. When all numbers have been taken, each player computes the bitwise xor of their numbers, and the one with a higher bitwise xor wins. Who will win?

And right after the end of the Grand Prix, Codeforces held its Round 434 (problems, results, top 5 on the left, analysis). The battle for the first place was quite tight. Congratulations to kmjp who came out on top by a mere 25 points!

In my previous summary, I have mentioned a hard TopCoder problem: you are given 50 attractions in a city. You will visit exactly one attraction per day. Each attraction can be visited either in the morning, costing a_i, or in the evening, costing b_i. Every time you visit an attraction in the morning after visiting another attraction the previous evening, you pay an extra cost c. Every time you switch the other way, from a morning visit to an evening visit, you pay an extra cost d. Additionally, you are given a directed graph of restrictions — a set of pairs of attractions that constrain the order of visits: in each pair, the first attraction must be visited before the second one. What is the minimal cost to perform all visits?

First of all, let's split our visits into groups: a group of morning visits, then a group of evening visits, then a group of morning visits, and so on (we need to consider two cases based on whether the very first visit is in the morning or in the evening). Since we only pay extra costs when we switch groups, the total cost depends on the number of groups and on the assignment of visits to groups, but not on the order of visits within each group.

Now let's try to build a network in which a cost of a cut would correspond to the total cost of the visits, so that we can solve our problem with the minimum cut algorithm. For each attraction, let us build a chain of vertices, with all chains having the source and sink as start and end respectively. A chain for one attraction looks like: s->v_1->v_2->...->v_n->t. Let's add infinite capacity edges going backwards in the chain, to guarantee that any finite cut separates some prefix of this chain from the rest. The size of this prefix will correspond to the number of the group that this attraction belongs to, and thus the capacity of the edges should be alternating a_i and b_i: if the vertex belongs to an even-numbered group, the corresponding edge will contribute a_i to the total capacity of the cut, and otherwise b_i, just as we need.

In order to incorporate a restriction "p must be visited before q" into our network, we will add infinite capacity edges from the vertices of the chain for p to the corresponding vertices of the chain for q. This guarantees that with any finite cut, the position we cut the chain for p will be the same or earlier than the position we cut the chain for q.

Finally, in order to incorporate the morning/evening switching costs, let us add an extra attraction that must be visited after all others, and set up the capacities on its chain not as alternating a_i and b_i, but rather as 0, c, c+d, 2c+d, 2c+2d, ...

The finite cuts in the resulting network correspond to valid ways of visiting the attractions, and the capacity of the cut is equal to the total visit cost, so now we just need to find the minimum cut. Also note that we have built our network using two relatively standard building blocks: infinite edges can give us constraints on the cut, and a chain of vertices with infinite backwards edges implements a choice from several alternatives with specific costs.

Thanks for reading, and check back soon!