Sunday, October 27, 2013

This week in competitive programming

This week wasn't too heavy with open programming competitions, with just TopCoder SRM 595 early on Friday, in the famous "5am" slot (in the Moscow time zone) that has traditionally separated dedicated programming contest participants from casual ones :)

Here are the problems, results; top 5 is on the left. I've managed to solve the hardest problem very quickly because it relied on one of my favorite methods - using linearity of expectation, or, in other words, the fact that the expected value of the sum of random values is equal to the sum of their expected values, even if the random values are dependent. I won't give more details for now so that you can try to discover the actual solution yourself. Here's the problem statement: you are given 50 points on the plane, with each point assigned a probability that it exists. The existence of each point is independent of the others. What is the expected area of the convex hull of existing points?

There were also some more ACM ICPC competitions this week, the most notable (at least to my knowledge) being the NEERC Northern Subregional contest, featuring the best teams from St Petersburg and surroundings. Here are its results; top 5 is on the left.

SPb SU 4 leads the field by just one problem, but such a small gap doesn't really do them justice - for example, after 3 hours they led 11 problems to 8. Looking at other recent competitions - for example, the Moscow Subregional or the two Open Cup stages I mentioned earlier - it becomes clear that they're probably the strongest Russian ACM ICPC team this year by far.

The Northern subregional not only confirmed that they are very strong, which we already knew, but also confirmed that they are participating in ACM ICPC this year - something that was not certain until the very last moment, as they considered skipping this year to get more practice and achieve better results next year (each person is allowed at most two ACM ICPC World Finals participations, and since they were at the Finals this summer, they have just one attempt left). The Northern subregional has also confirmed that Gennady Korotkevich (tourist), who is number one in all ratings right now, the current ACM ICPC World Champion, and who has beaten SPb SU 4 on many occasions, is indeed not taking part in ACM ICPC this year, leaving the way clear for SPb SU 4.

Of course, no offense meant to other Russian teams - but right now SPb SU 4 look to be the main contender for the world championship from Russia. I don't have a good picture for other countries, though - please share what other strong teams have been formed in comments! In particular, are the teams from the University of Tokyo and from National Taiwan University that got gold medals this summer competing again?

Sunday, October 20, 2013

This week in competitive programming

This week started with two competitions on Tuesday.

The first was Codeforces Round 207: problems, results. I didn't take part, so I don't have anything to share about the problems. Congratulations to Gennady, and it's nice to see Bruce in the top 5 again!

Then, there was TopCoder SRM 594: problems, results, my screencast. The problemset composition was somewhat classical for TopCoder: a 250 that involves a careful check of all possibilities, a 550 that requires a standard algorithm - in this case König's theorem - and a 950 that nobody solves, although it does look quite tractable. This type of problemset places a huge emphasis on the challenge phase, and Gennady has again come out on top with 4 successful challenges.

The 550 allowed two approaches - minimum vertex cover in a bipartite graph and minimum cut in a network. The two approaches are, of course, very similar in this case. The first goes like this: let's make a bipartite graph, with the left part containing the white cells, the right part containing the empty cells, and edges connecting adjacent cells from different parts. Consider a vertex cover in this graph - a set of vertices such that every edge has at least one of its ends in the set. It's not hard to see that vertex covers correspond exactly to the sets of cells that we can leave occupied at the end: if we put black stones on the empty cells from the vertex cover, then every white cell not from the vertex cover will have all adjacent cells occupied and will be removed. Since we need to maximize the number of empty cells in the end, we need to minimize the number of occupied cells in the end, and thus minimize this vertex cover.
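To make the first approach concrete, here's a minimal sketch in Python. The grid encoding ('W' for white stones, '.' for empty cells, anything else for black stones) is my assumption, not the actual input format. It computes the maximum matching with Kuhn's augmenting-path algorithm; by König's theorem, its size equals the size of the minimum vertex cover, so the number of empty cells at the end is the total number of white and empty cells minus the returned value.

```python
def min_vertex_cover_size(grid):
    n, m = len(grid), len(grid[0])
    whites = [(r, c) for r in range(n) for c in range(m) if grid[r][c] == 'W']
    empty_id = {(r, c): i for i, (r, c) in enumerate(
        (r, c) for r in range(n) for c in range(m) if grid[r][c] == '.')}
    # adj[i] lists the empty cells adjacent to the i-th white cell
    adj = [[empty_id[(r + dr, c + dc)]
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
            if (r + dr, c + dc) in empty_id]
           for r, c in whites]
    match = {}  # empty cell id -> index of the white cell matched to it

    def try_augment(v, seen):
        # Kuhn's algorithm: look for an augmenting path starting at white cell v
        for u in adj[v]:
            if u not in seen:
                seen.add(u)
                if u not in match or try_augment(match[u], seen):
                    match[u] = v
                    return True
        return False

    matching = sum(try_augment(v, set()) for v in range(len(whites)))
    return matching  # equals the minimum vertex cover size by König's theorem
```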

The second is: let's make a network consisting of a source S, a sink T, all white cells, and all empty cells. We draw edges from S to all white cells with capacity 1, from each white cell to its adjacent empty cells with infinite capacity, and from each empty cell to T with capacity 1. Why infinite capacity? It helps enforce the following property: for each finite cut A | B of this network, all neighbors of each white cell in A will also be in A - because otherwise there would be an infinite edge from A to B and the cut would be infinite. Given this property, each cut gets a natural meaning: the white cells in A are those white cells that we will capture, and the empty cells in A are where the black stones will need to go. Finally, the capacity of the cut is the sum of the number of white cells in B and the number of empty cells in A, which is exactly the number of cells that will end up occupied, so we need to find the minimum cut.
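And here's a sketch of the second approach under the same assumed grid encoding: we build the network described above and compute the minimum cut as a maximum flow with the Edmonds-Karp algorithm (a large constant stands in for the infinite capacities). Note how the infinite edges can never be the bottleneck of an augmenting path, which matches their role in the cut argument.

```python
from collections import deque

def min_cut(grid):
    n, m = len(grid), len(grid[0])
    S, T = 0, 1
    cell = lambda r, c: 2 + r * m + c   # one node per cell, after source and sink
    INF = 10 ** 9
    cap = [dict() for _ in range(2 + n * m)]  # cap[v][u] = residual capacity

    def add_edge(a, b, c):
        cap[a][b] = cap[a].get(b, 0) + c
        cap[b].setdefault(a, 0)  # reverse edge for the residual network

    for r in range(n):
        for c in range(m):
            if grid[r][c] == 'W':
                add_edge(S, cell(r, c), 1)
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < n and 0 <= cc < m and grid[rr][cc] == '.':
                        add_edge(cell(r, c), cell(rr, cc), INF)
            elif grid[r][c] == '.':
                add_edge(cell(r, c), T, 1)

    flow = 0
    while True:
        prev = {S: S}   # BFS for a shortest augmenting path
        queue = deque([S])
        while queue and T not in prev:
            v = queue.popleft()
            for u, c in cap[v].items():
                if c > 0 and u not in prev:
                    prev[u] = v
                    queue.append(u)
        if T not in prev:
            return flow  # the maximum flow equals the minimum cut capacity
        v = T            # every path starts with a unit-capacity edge, so push 1
        while v != S:
            cap[prev[v]][v] -= 1
            cap[v][prev[v]] += 1
            v = prev[v]
        flow += 1
```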

This idea - adding appropriate infinite edges to the network to make sure each finite cut possesses the properties we require - actually comes up pretty often in algorithmic competitions.

Apart from the open competitions, this week featured quite a few ACM ICPC rounds for eligible university students. The ACM ICPC season is in full swing now, with subregionals and regionals happening across the world. Most regionals will happen in October or November, with a few slipping to December. There were several rounds this week featuring teams that will definitely challenge for medals in the World Finals, for example the NEERC Saratov Subregional (results) and the NEERC Moscow Subregional (results). Only teams from the corresponding geographical area can participate in each of those subregionals, but other teams can and usually do take part online, and thus we get an early glimpse at the relative strength of the teams and can determine the favorites for the regionals and World Finals.

Addition problem

Let me remind you of a problem I mentioned last week: you are given three non-negative numbers up to 10^18: A, B and S. At each step, you can replace either A or B with A+B. Is it possible to make at least one of A or B equal to S?

The first step is to find the greatest common divisor of A and B: gcd(A, B) = G. If S is not divisible by G, then clearly we can't get it. Otherwise, let's divide everything by G - in other words, we've reduced the problem to the case where gcd(A, B) = 1.

Now, it's clear that each number we can obtain is equal to N*A+M*B for some non-negative N and M. But is every pair of non-negative N and M possible? No - for example, it's not hard to see that we can't obtain 2*A+2*B. We start with A and B, continue to A+B and B (A+B and A is similar), and from that point on the multiplier of B in every new number will be greater than the multiplier of A.

It helps to look at this problem as a geometric one. Let's plot N on one axis and M on the other axis. We start with vectors (0, 1) and (1, 0). At each step we add one of our vectors to the other. We can notice an invariant: the area of the triangle spanned by our vectors stays equal to 0.5! The area is one half of the cross product of the vectors, and the cross product doesn't change when we add one vector to the other. This picture is similar to what happens when studying continued fractions.

In other terms, if our numbers are N1*A+M1*B and N2*A+M2*B, then N1*M2-M1*N2 is always equal to 1. One consequence of this invariant is that N1 and M1 (or N2 and M2, for that matter) are relatively prime: gcd(N1, M1)=1. So we have limited the set of obtainable numbers to those where gcd(N, M)=1.

But it's not hard to see that the reverse also holds! Suppose gcd(N,M)=1. That means there exist numbers X and Y such that N*X+M*Y=1, and we can choose them so that X is non-negative and Y is non-positive. Then we can define N1=N, M1=M, N2=-Y, M2=X, and we obtain the above invariant: N1*M2-M1*N2=1. But then we can repeatedly subtract either (N1, M1) from (N2, M2) or vice versa, reconstructing the path from (1, 0) and (0, 1) from the end.

So the original problem boils down to: is there a way to pick non-negative N and M such that gcd(N, M)=1 and N*A+M*B=S? We can solve the latter Diophantine equation in the standard way using the extended Euclidean algorithm, obtaining N(T)=N0+B*T, M(T)=M0-A*T. The condition that N and M are non-negative gives us a range of possible values of T, and in case that range is empty, there's no solution. But in case it's non-empty, how do we check whether N and M are relatively prime for some T in that range?

And here's the key trick: I guessed that we can just check possible values of T until we find one that gives relatively prime N and M, and that we will either find such T quickly or run out of possible values of T quickly. Why? I didn't prove it during the contest, and the more I try to prove it now, the more it looks like it might be false :) Any ideas?
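Since the proof is missing, let me at least make the approach concrete. Here's a sketch in Python of the whole solution as described above; the cap on how many values of T we try is exactly the unproven guess, so treat it as a heuristic. For brevity it assumes A, B >= 1 - the corner cases with zeros need separate handling.

```python
from math import gcd

def extended_gcd(a, b):
    # returns (g, x, y) such that a*x + b*y == g == gcd(a, b)
    if b == 0:
        return a, 1, 0
    g, x, y = extended_gcd(b, a % b)
    return g, y, x - (a // b) * y

def reachable(a, b, s, tries=100):
    g = gcd(a, b)
    if s % g != 0:
        return False
    a, b, s = a // g, b // g, s // g   # now gcd(a, b) == 1
    _, x, y = extended_gcd(a, b)       # a*x + b*y == 1
    n0, m0 = x * s, y * s              # one solution of N*a + M*b == s
    t_lo = -(n0 // b)                  # smallest T with N(T) = n0 + b*T >= 0
    t_hi = m0 // a                     # largest T with M(T) = m0 - a*T >= 0
    for t in range(t_lo, min(t_hi, t_lo + tries - 1) + 1):
        if gcd(n0 + b * t, m0 - a * t) == 1:
            return True
    return False
```

For example, reachable(2, 3, 13) returns True (2, 3 -> 5, 3 -> 5, 8 -> 13, 8), while reachable(2, 3, 4) returns False.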

Sunday, October 13, 2013

This week in competitive programming

At the end of last week, the Fourteenth Open Cup started. The Open Cup format is similar to the tennis rankings or to the cross-country skiing/biathlon world cups: there are about two five-hour contests each month, and you get 100 points for the first place, 80 points for the second place, ..., 1 point for the 30th place. Then your scores from different contests are added up, and the team with the highest total score wins the cup. It started as a competition for Russian teams, but more and more teams take part each year, including veteran teams like mine.

Last Sunday, the first stage took place: results. You can see the top 5 to the left. The problemset was composed by the St. Petersburg State University coaches, and it had a few very nice problems. One surprising problem was: you are given N vectors in D-dimensional space. How many different 2-dimensional planes do they induce? Mathematically, this isn't really a problem - each pair of vectors defines a plane, and we should count the number of different planes among them. But how do we check whether two such 2-dimensional planes in D-dimensional space are the same plane? I'll describe my approach in a later post.
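While that post is pending, here's one standard way to do such a check - not necessarily the approach I'm teasing here: the planes spanned by vector pairs (u, v) and (p, q) coincide exactly when all four vectors together still have rank 2. A sketch in Python with exact Gaussian elimination over rationals, assuming each pair is linearly independent (i.e., actually spans a plane):

```python
from fractions import Fraction

def same_plane(u, v, p, q):
    # True iff span{u, v} == span{p, q}: the rank of the 4 x D matrix must be 2
    rows = [[Fraction(x) for x in vec] for vec in (u, v, p, q)]
    rank, col, d = 0, 0, len(rows[0])
    while rank < len(rows) and col < d:
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col]), None)
        if pivot is None:
            col += 1        # no pivot in this column, move on
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(len(rows)):
            if r != rank and rows[r][col]:
                f = rows[r][col] / rows[rank][col]
                rows[r] = [a - f * b for a, b in zip(rows[r], rows[rank])]
        rank += 1
        col += 1
    return rank == 2
```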

This Sunday (today), the second stage took place: results. Again, the top 5 is on the left. This time the problemset was taken from the South-Eastern European Regional Contest, the ACM ICPC regional for Ukraine, Romania and neighboring countries. The Open Cup teams performed much better than the teams participating in the regional itself - the winner of the regional, SobolevTeam from Kharkov, placed tenth in the Open Cup. The problemset was composed by people from many different universities in the region. Here's one problem I enjoyed: you are given three non-negative numbers up to 10^18: A, B and S. At each step, you can replace either A or B with A+B. Is it possible to make at least one of A or B equal to S? Again, I'll share my solution later - feel free to discuss the problem in comments.

After two stages, three teams are close at the top of overall standings: tourist and SPb SU 4 have 160 points each (60 + 100 for tourist, 80 + 80 for SU 4), then my team has 150 points (100 + 50).

Finally, later today Codeforces Round 206 took place. I didn't take part and haven't had a chance to read the problems yet, so I can only admire yet another commanding win by rng_58. Congratulations!

Thanks for reading, and see you next week!

Friday, October 11, 2013

A problem that requires being careful and determined

In my earlier post I told you about a problem that almost guaranteed me a win in the Russian Code Cup. Here's how I solved it.

Problem statement: you are given a rectangular grid of white and black cells, N rows of M columns each. Top-left and bottom-right corner cells are black. You start at the top-left cell. Every second you move to an adjacent cell (sharing a side). You're not allowed to stay in the same cell. If you move to a white cell, then you're instantly teleported to a random white cell that is picked uniformly and independently. Note that it's possible that you're teleported to the cell you're already in. Your goal is to reach the bottom-right corner as fast as possible. Of course, random teleportations mean you can't guarantee how long it will take, so you need to minimize the expected (average) time to reach the bottom-right corner.

The first step is standard for this kind of problem: we treat it as a dynamic programming problem. We have N*M states, one per cell of the grid, and we should calculate the expected time to reach the bottom-right corner starting from each of those cells. Let's denote this expected time for cell X as T_X. Suppose the possible moves from cell X lead to cells Y1, Y2, ... If all of those cells are black, then it's not hard to see that T_X = 1 + min(T_Y1, T_Y2, ...). This equation simply says: we will spend one second and end up in one of those cells - so it's obvious we should pick the cell with the smallest expected time. If at least one of those cells is white, then T_X = 1 + min(W, T_Y1 if Y1 is black, T_Y2 if Y2 is black, ...), where W is the average value of T_Z over all white cells Z. This equation says: we can either go to a black cell, in which case we know the expected time, or to a white cell, in which case the expected time is the average expected time over all cells we could be teleported to.

It might seem that we already have a working dynamic programming solution - but we don't. The equations we've written down have cycles. Adjacent cells depend on one another, and many cells, including some white cells, depend on all white cells. So we can't just compute the values of T.

Here's the problem-specific part: we will now simplify the equations so that it becomes clear how to find T. First, suppose we stand on some cell. It's not hard to see that there are only two feasible strategies: either we go to the bottom-right cell using only black cells, along the shortest such path, or we go to the closest white cell in order to teleport. Indeed, if we ever go to a white cell, then our state is completely reset - we can end up in any white cell. So there's no need to pick a particular white cell to jump to, and it's fastest to go to the nearest one.

Now, let's consider the value of T_Z for a cell Z. Let A_Z be the length of the shortest path over black cells from Z to the bottom-right corner, and B_Z be the length of the fastest way to get to a white cell starting from Z. Then T_Z = min(A_Z, B_Z + W). Now remember that W is the average value of T_Z over all white cells Z: W = sum(T_Z) / N_W, where N_W is the number of white cells. So W = sum(min(A_Z, B_Z + W)) / N_W. We have an equation in W, but we can't solve it immediately because it has 'min's in it.

Since A and B are integers, we can tell which part of each 'min' is smaller if we know the integer part of W: K <= W < K+1. Indeed, we can just compare A_Z with B_Z + K. If A_Z is smaller or equal, then it will be the value of the corresponding 'min'. If A_Z is greater, then the corresponding 'min' is simply B_Z + W. After we know what each 'min' turns out to be, we're left with a simple linear equation in W that we can easily solve. The only remaining thing is to check that K <= W < K+1. Of course, we also need to find the proper K. We can just check all values starting from 0, and exactly one will yield the answer. Alternatively, it's not hard to see that if the value of W that we obtain from the above equation ends up being K+1 or larger, then the value of K is too small, and if it's smaller than K, then K is too big - that allows us to do a binary search on K.
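Here's a sketch of that search in Python, assuming the lists A and B (the values A_Z and B_Z over all white cells Z) have already been computed with breadth-first search, with cells that have no path over black cells carrying a large "infinity" value in A:

```python
def find_w(A, B):
    nw = len(A)  # number of white cells
    k = 0
    while True:
        # knowing K <= W < K+1 resolves each 'min': A_Z <= B_Z + K means it is A_Z
        fixed = sum(a for a, b in zip(A, B) if a <= b + k)
        rest = [b for a, b in zip(A, B) if a > b + k]
        # the equation becomes W * nw = fixed + sum(rest) + len(rest) * W
        if len(rest) == nw:
            k += 1   # degenerate for this K (it would require fixed + sum(rest) = 0)
            continue
        w = (fixed + sum(rest)) / (nw - len(rest))
        if k <= w < k + 1:
            return w
        k += 1
```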

Having found W, we can solve the rest of the problem easily. Remember that the answer for each cell, black or white, is just T_Z = min(A_Z, B_Z + W). One further observation that doesn't help much in this problem: B_Z for white cells is either 1 or 2. If there's another white cell next to Z, we can just jump there, and if not, we can go to any neighboring black cell and back.

What do you feel when looking at the above solution? Well, for one, it's rather long. But at the same time, it didn't require any out-of-nowhere tricks or fancy algorithms (well, it did require breadth-first search for shortest paths over black cells). The only real requirement is the ability to carefully simplify the problem step by step until it becomes solvable. Note that we could have made many wrong assumptions that would lead us to an incorrect answer - the main difficulty is to be careful and determined enough to avoid that. Many programming contest problems are like that, and I think this problem is an excellent example that you can train on to improve this skill. I think this skill is also extremely useful in software engineering, and it helps me a lot in my daily work.

Do you have a favorite problem of this kind? Please answer in comments!

Sunday, October 6, 2013

This week in competitive programming

The past week didn't have any major competitions in the beginning, but it ended with one big contest on each of Friday, Saturday and Sunday.

On Friday, Codeforces Round 204 took place (problems, results). It featured five problems, each requiring a non-trivial insight/trick to be solved. Congratulations to rng_58 for solving them all and achieving a very convincing victory! I could only solve three, and was a bit demoralized by seeing a lot of people submit problem E, which I had no clue about. You can follow my lackluster performance here, although the video doesn't show the frustration:

Saturday was the day of TopCoder SRM 593 (problems, results). The problems were very technical and required careful implementation, in stark contrast to the previous day's mostly "a-ha" questions. Of course, this also resulted in a lot of challenge opportunities. I was lucky that incorrect solutions for the easy problem in my room remained unchallenged throughout the entire challenge phase, allowing me the full 15 minutes of challenging. Had there been somebody else in my room looking for the same bugs as I was, Egor would probably have topped the scoreboard. This is yet another example of the basic principle: always challenge the problem with the easiest solution first. Even if you think the solutions for difficult problems are more likely to be incorrect, the time it takes to understand just one of them is much more than the time it takes to check a lot of easy solutions. Since a successful challenge is +50 no matter what the problem is, the choice is clear. You can follow my contest here:

Finally, on Sunday, the Fourteenth Open Cup has started. The final results of today's contest are still not published, so it will have to go in the next week's summary.

I also want to share an interesting similarity between this week's problems. The Codeforces problem A was: you are given 2N numbers that you can round either down or up to the nearest integer, but exactly N numbers need to be rounded down, and N rounded up. What is the smallest possible difference between the sum of the original numbers and the sum of the rounded numbers?

Somehow, I couldn't solve this problem. My thinking went like this: we need to make 2N choices, N one way and N the other way, so that the sum of the rounded numbers is as close as possible to the original sum. That is a natural dynamic programming problem: let's calculate whether we can make A choices "the first way" among the first B numbers, yielding the sum C. But the number of possible combinations of A, B and C is way too big - what to do?

It turned out that I was missing one simple observation. Suppose there are no exact integers in the input, so we have a full set of 2N choices. Then no matter how we choose N numbers to round up and N numbers to round down, their sum will be the same! It's not hard to see that it will be the sum of all numbers rounded down, plus N. Exact integers add some trouble, but the problem remains easily solvable after this observation is made. In the above dynamic programming terms, we can now see that there will be just one feasible value of C for each (A, B) pair, and thus there will be many fewer states (not to mention that we don't need the dynamic programming at all now).
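In code, the observation reduces the problem to choosing how many non-integers to round up, since rounding an exact integer changes nothing. A sketch, assuming the values are non-negative and representable exactly (I use Fractions; the actual problem fixed the number of decimal places):

```python
from fractions import Fraction

def min_rounding_diff(xs):
    # xs is a list of 2N non-negative values, e.g. Fractions
    n2 = len(xs)
    n = n2 // 2
    fracs = [x - int(x) for x in xs]           # fractional parts
    frac_sum = sum(fracs)
    nonint = sum(1 for f in fracs if f != 0)   # integers round to themselves
    # k = number of non-integers rounded up; the difference is |frac_sum - k|,
    # and exact integers fill the rest of the quota of N numbers rounded up
    lo = max(0, n - (n2 - nonint))
    hi = min(nonint, n)
    return min(abs(frac_sum - k) for k in range(lo, hi + 1))
```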

Now, TopCoder's medium problem went like this: you are given N unknowns, where each unknown lies in a given range [Li, Ri]. You want to separate the unknowns into two parts in such a way that the maximum possible difference between the sums of the unknowns in each part is minimized. For example, if the first part contains the unknowns [2, 5] and [3, 4], and the second part contains just one unknown [5, 6], then the sought difference is 4: the sum of the first part can be any number between 5 and 9, and the sum of the second part is any number between 5 and 6. The worst case is when the first sum is 9 and the second sum is 5, for a difference of 4.

Again, it's easy to jump to dynamic programming quickly. Suppose we have processed the first K unknowns, and picked whether each unknown goes into the first part or into the second part. It's not hard to see that our state is the range of possible values for the difference between the first part and the second part. For example, suppose the three unknowns discussed above have been processed. Then the difference between the first part and the second part can be any number in range [-1, 4].

What does the dynamic programming transition look like? Suppose the current range for the difference is [P, Q], and we put the unknown [A, B] into the first part. Then the new range for the difference is [P+A, Q+B]. Had we put the unknown into the second part, the range would be [P-B, Q-A].

But we aren't done yet. There are way too many combinations of K, P and Q. In order to solve this problem, you need one more observation: the difference between Q and P is always equal to the sum of the lengths of the ranges of the unknowns! In other words, adding an unknown [A, B] to either part increases Q-P by exactly B-A. So the parameter P is redundant, and we can keep only K and Q in our dynamic programming state, which was good enough in the TopCoder problem.
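Here's a sketch of the resulting dynamic programming in Python, keeping just the set of reachable values of Q (the parameter K is implicit in the loop). That the range endpoints are small integers, so that this set stays small, matches my understanding of the TopCoder constraints but should be treated as an assumption:

```python
def min_max_difference(ranges):
    # ranges is a list of (L_i, R_i) integer pairs
    total_span = sum(r - l for l, r in ranges)
    reachable = {0}  # values of Q achievable after processing a prefix
    for l, r in ranges:
        # first part: Q increases by R; second part: Q decreases by L
        reachable = {q + r for q in reachable} | {q - l for q in reachable}
    # since P = Q - total_span, the worst-case difference is max(Q, total_span - Q)
    return min(max(q, total_span - q) for q in reachable)
```

For the example above, min_max_difference([(2, 5), (3, 4), (5, 6)]) returns 4.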

These two problems are examples of a somewhat general trick: if you come up with a dynamic programming solution that has too many states, it might well be that most states are unreachable and/or that some states can be merged into one, since they are handled in the same way anyway. However, as always with dynamic programming, there's no general way to come up with this reduction in size (or is there? one might try to implement the dynamic programming and see if there are many unreachable states on random inputs).

Thanks for reading, and see you next week!