Algorithms Weekly by Petr Mitrichev: April 2016

Monday, April 25, 2016

0.636 of a week

TopCoder SRM 689 on Saturday presented two very creative problems (problems, results, top 5 on the left). The medium problem required mostly mathematical thinking, while the hard problem involved both mathematics and algorithms. Scott Wu has figured out both, and didn’t forget to add 150 challenge points to stay in the first place – way to go!

The more algorithmic one went like this: you are given 2n points on the plane, no three points on the same line, and a matching between them – n segments connecting the points to each other in such a way that each point is the endpoint of exactly one segment. Your goal is to find another matching with an additional property: the segments must not intersect. That would be a classical problem, but here there’s an extra constraint that changes the problem completely: the total length of the segments in your matching must be at least 0.636 (zero point six three six) times the total length of the segments in the given matching. n is at most 100.

This problem is yet another example of the brilliant problems that the TopCoder admins and problem authors manage to come up with time and time again. Huge kudos to cgy4ever and all the TopCoder problem authors!

VK Cup 2016 Round 2 took place a day later (problems, results, top 5 on the left, parallel round without first problem results) . Team SPb SU Base have continued their super impressive recent form, both in personal and in team competitions – congratulations to -XraY- and ershov.stanislav!

Last week I’ve described a nice problem: you are given a 20x100000 table of zeroes and ones. You are allowed to invert (replace 0s with 1s and vice versa) rows and columns of this table. What is the smallest number of ones you can make the table contain?

The table is hugely asymmetrical: there are a lot of columns but just 20 rows – and we definitely need to exploit that. Since the order of inversions doesn’t change the result, let’s say we start by inverting the columns, and then handle the rows. Consider a column. Let’s say the ones in it form a binary number x. Let’s also say that the ones corresponding to the rows we’re going to invert form another binary number y. Then the ultimate number of ones in this column is the number of ones in x^y (here ^ stands for bitwise exclusive-or), let’s denote it ones(x^y). Note that we can also invert the column before inverting the rows, in which case we will get ones((~x)^y), which is actually equal to n-ones(x^y), where n is the number of rows, and ~ is the bitwise negation of a n-bit number. Our goal is to pick the y that minimizes the sum of min(ones(x^y), n-ones(x^y)) over all columns.

Note that the binary numbers are used here purely for the purposes of an efficient and easy-to-code implementation – fundamentally we’re just operating with sets of rows.

The key insight is: let’s use dynamic programming to count for each y and each number k from 0 to n the number of x’s such that ones(x^y)=k. Let’s call it a(y, k). It’s easy to see that knowing these numbers allows counting the above sum easily, and thus picking the value of y that minimizes it.

For k=0, this is easy – we just need to count the number of times each bit pattern appears in a column (we can just iterate over all 100000 columns and increment the corresponding value in an array). For k=1, it’s also somewhat easy: a(y, 1) is equal to the sum over all y’ that differ from y in exactly one bit of a(y’, 0). Continuing this idea to larger values of k is not as easy, though: we will count each way of flipping two bits twice, depending on the order in which we flip the bits, and even more worryingly we might flip the same bit twice, thus declaring that a number differs from itself in two bits. One can probably account for this via some clever combinatorics, but there’s a better way.

It’s actually quite logical: since we don’t want to overcount different orders of flipping the bits, let’s always flip the bits in a pre-determined order. In other words, we add another dimension to our dynamic programming: b(y, k, t) is the number of x’s (remember, those are the bit patterns of columns of the input matrix) such that ones(x^y)=k, with an additional restriction that x and y may only differ in the first t bits. Now we can easily see that b(y, k, t+1) = b(y, k, t) + b(y ^ (1 << t), k - 1, t): we either flip the t-th bit, or we don’t. And the number a(y, k) that we need is simply found as b(y, k, n). This solution needs O(2ⁿn²) very simple operations, which fits in time for n=20.

As shown on the picture on the left, this dynamic programming is similar in spirit to the discrete Fourier transformation. I’m not sure if that similarity has something meaningful behind it.

There was also an Open Cup round this Sunday, sharing the problems with an international (ex-USSR) onsite competition called Vekua Cup, but its results are still not available, so I will cover it next week.

I was preparing this post on a plane, so instead of the usual nature photo I only have this :) Thanks for reading, and check back next week!

Sunday, April 17, 2016

A greedy week

This week started with TopCoder Open 2016 Round 1B on Tuesday (problems, results, top 5 on the left). Just like in Round 1A, the top of the scoreboard featured many contestants who don't compete much anymore — but the first place went to Kankuro who's actually quite active in other competitions, just doesn't like doing TopCoder SRMs for some reason. Congratulations, Vlad!

TopCoder SRM 688 followed on Friday (problems, results, top 5 on the left, my screencast). The main story of this round was that of the 1000-pointer, which went like this: you are given a sequence of opening and closing parentheses. You want to color each parentheses either red or blue, in such a way that all red parentheses form a correct parentheses sequence, and all blue parentheses form a correct parentheses sequence. Out of all ways to do that, you want to pick the one that minimizes the total cost of the two resulting parentheses sequences. The cost of a parentheses sequence, in turn, is defined as the sum of costs of all its matching parentheses pairs. And finally, the cost of a matching parentheses pair in a parentheses sequence is defined as the square of the maximum level of nesting of parentheses present inside that pair (i.e. the cost of "()" is 1, the cost of outer parentheses pair in "(()())" is 4, and the total cost of "(()())" is 6).

Almost all accepted solutions for this problem, to the best of my knowledge, implement the following greedy algorithm: go from left to right, and color each parentheses (either red or blue) in such a way that the balance of the red sequence is always greater than or equal to the balance of the blue sequence, and they differ by at most 1. It's not hard to see that this rule determines the color of each parenthesis uniquely.

For example, consider input string "((())()(()))". Here's how the greedy solution works on this input:
Char Color Red String Red Balance Blue String Blue Balance
---------------------------------------------------------------
( red ( 1 0
( blue ( 1 ( 1
( red (( 2 ( 1
) red (() 1 ( 1
) blue (() 1 () 0
( blue (() 1 ()( 1
) blue (() 1 ()() 0
( blue (() 1 ()()( 1
( red (()( 2 ()()( 1
) red (()() 1 ()()( 1
) blue (()() 1 ()()() 0
) red (()()) 0 ()()() 0

Intuitively it seems like this algorithm splits the parentheses most evenly into two halves, and thus the cost of each parenthesis pair will be as small as possible, and thus the total cost will also be as small as possible. A formal proof, however, evades me. The problemsetters also didn't expect this greedy solution, and expected something much more complex. Do you see how to prove (or find a counterexample) to this solution?

Some further points:

This solution passes stress test using random sequences of length up to 16.
It seems that this solution also works with any other non-decreasing cost function (from nesting level to cost).
Minor details are important: for example, if you don't always maintain that it's the red sequence that has higher balance when the total balance is odd, and allow the blue sequence to have higher balance sometimes, then the solution isn't optimal anymore. For example, a somewhat logical implementation is to always put the next parenthesis to the red sequence if the current balances are equal, regardless of it being closing or opening. It fails, for example, the following testcase: "((())(()))". It splits it as "(())(())" + "()" for the total cost of 4+4+1+1+1=11, while "(()())" + "()()" is better at 4+1+1+1+1=8.

At the same time with the SRM, CROC Championship 2016 Final Round took place in Moscow (problems, results, top 5 on the left, online round results with 4/5 problems in common, partial analysis in Russian). The dynamic scoring system on an onsite competition with just 50 contestants meant that just the two most difficult problems really mattered. Gennady was able to claim the victory by solving one of those problems well before anybody else could — congratulations!

Problem C was the most interesting for me in this round, in particular because it had a very simple and beautiful statement: you are given a 20x100000 table of zeroes and ones. You are allowed to invert (replace 0s with 1s and vice versa) rows and columns of this table. What is the smallest number of ones you can make the table contain?

Google Code Jam 2016 continued with the Round 1A on Saturday (problems, results, top 5 on the left, analysis). nika has managed to solve all three problems in just 21 minutes — amazing! Congratulations to everybody who advanced to Round 2, and those who don't have two more chances in the coming rounds 1B and 1C.

Thanks for reading, and check back next week!

Monday, April 11, 2016

A binary week

Last week was reasonably calm, TopCoder SRM 687 being its highlight (problems, results, problem discussion). Once again, tourist didn't give others many chances, but matthew99a has managed to shine in a slightly different way — by getting into the top 5 for the third SRM in a row. Congratulations to both!

Now I want to come back to the problem I mentioned in my previous summary. However, since I've published it just a few minutes earlier, I'd like to mention once again that the problem is super interesting and that I encourage you to try solving it yourself before reading the solution!

The problem was to interactively guess a hidden number between 1 and n (which was up to 10⁹) spending at most log₂(n+1)+0.1 steps on average (notice the lack of ceil() around the log). One guess was asking "is the hidden number less than x?" for any value of x. The guessing was repeated 10000 times with possibly different (and possibly the same) hidden numbers to compute the average.

Suppose k=ceil(log₂n). Then the standard binary search requires k-1 steps for 2^k-n hidden numbers, and k steps for the remaining n-(2^k-n)=2n-2^k hidden numbers. On average, this fits under the log₂n+0.1 limit (which by itself doesn't seem to be an obvious mathematical statement, but it feels right and can be verified empirically, and that's probably how the +0.1 part was chosen by the problemsetter); however, the hidden numbers are not chosen randomly, so we might not get the average treatment.

On the other hand, there are many possible binary search trees with such structure (2^k-n leaves on level k-1, and 2n-2^kleaves on level k), That leads to the following solution framework: we need a family of such binary search trees such that every hidden number is equally likely to be on level k across the entire family (and thus equally likely to be on level k-1, too). Our strategy will then be to pick a tree from the family at random, and then do binary search according to it, giving us the required average number of steps irrespective of what the hidden numbers are!

How does one find such a family? Well, after formulating so clearly what we need to find, the actual construction is not that difficult. It's quite clear that the only restriction we have on the tree is that leaves on level k have to come in consecutive pairs; modulo that restriction, we can construct any possible binary search tree. More precisely, let t=n-2^k-1 be the number of those consecutive pairs of leaves on level k.

One tree in our family will have numbers (1, 2), (3, 4), ..., (2t-1, 2t) as level-k leaves. Another tree will have (3, 4), (5, 6), ..., (2t+1, 2t+2). We can continue like this, taking the leaf numbers modulo n until we make a full cycle, and it's easy to see that the resulting family has each leaf at level k the same number of times (the example for n=6 is on the left).

This approach requires n to be even, as otherwise at some point we'd need to group the first and last numbers into one consecutive pair, and we can't do that. Here's when the "+1" part in the problem statement comes into play: since we're allowed log₂(n+1)+0.1 guesses on average, and not log₂n+0.1, we can afford to just solve the problem for n+1 when n is odd.

Thanks for reading, and check back next week!

A doubles week

The March 28 - April 3 week was ignited by Monday's VK Cup 2016 Round 1 on Codeforces (problems, results, top 5 on the left, parallel round results, analysis). Tourist and subscriber left little doubt as to who's the favorite of the entire VK Cup, solving five problems in 1 hour and 6 minutes, while it took other teams at least 1 hour and 41 minutes to do the same!

Just a few hours later, in the early hours of Tuesday morning TopCoder SRM 686 allowed algorithmists from other time zones to enjoy some problem solving (problems, results, top 5 on the left, solutions discussion). jcvb has denied matthew99a his second consecutive win — congratulations to both on great performance!

On Sunday, the penultimate Open Cup 2015-16 round, the Grand Prix of Moscow, challenged teams with some really difficult, if a bit tedious, problems (results, top 5 on the left, analysis). Team SPb SU Base demolished the competition, solving half of their problems in the last hour!

My personal favorite in this round is problem A. Its submission stats: 296 submitted, 0 accepted. And the problem was just about implementing binary search on n choices in log₂(n) time :)

More precisely, one needed to interactively guess a hidden number between 1 and n (which was up to 10⁹) spending at most log₂(n+1)+0.1 steps on average. One guess was asking "is the hidden number less than x?" for any value of x. The guessing was repeated 10000 times with possibly different (and possibly the same) hidden numbers to compute the average.

Where's the catch? In the lack of rounding. Normal binary search requires ceil(log₂(n)) steps in the worst case, and we can't afford that much here. If the hidden number was also random, then we'd get the required number of steps on average — but it's not.

For example, suppose n=6. Normal binary search requires 2 or 3 steps: it starts by splitting the numbers into [1,2,3] and [4,5,6]. Suppose the hidden number is in [1,2,3]. Then it splits the numbers into, say, [1] and [2,3], and in case the hidden number is 1, we've found it in 2 steps, while for 2 and 3 we require 3 steps. We could've split the triple into [1,2] and [3] instead, solving 3 in 2 steps and 1 and 2 in 3 steps, so one might expect that randomizing this choice will make our solution work under our limit of log₂7+0.1=2.907... on average. But on closer examination, we always require 3 steps for the number 2 (and for 5)! So if we're always asked to find 2 or 5, then we will not fit under the required average number of steps.

In other words, for n=6 we have to split the numbers into 2+4 instead of 3+3 sometimes! And to add insult to injury, we also have to take our previous decisions into account as the binary search narrows down. For example, suppose n=5. Most probably we'd start with 2+3 or 3+2 split for the first request, for symmetry reasons with probability 50% for each decision. Note that all numbers are not equal now: if number 3 is hidden, then we'll always get the bigger segment as the answer for our query! So in order to get a good number steps on average when 3 is hidden, we have to prefer splitting the remaining numbers as [1,2] vs [3], as opposed to [1] vs [2,3]!

This is where our team got stuck at the actual contest :) Can you see how to progress further?

Thanks for reading, and check back soon for last week's summary and this problem's solution!

Saturday, April 9, 2016

A toroidal week

The March 21 - March 27 week was much more relaxed than previous ones.

TopCoder Open 2016, one of the most important tournaments of the year, kicked off on Saturday with Round 1A (problems, results, top 5 on the left). Top 250 contestants who have competed in at least one SRM in 2016 have been automatically qualified to Round 2, so Round 1s are often a chance for mostly retired contestants to come back in style. This time it was nika who combined very good problem solving with excellent challenging to get the first place — awesome job! Please tell your friends at MemSQL that they should come back, too :)

The first problem mentioned in the previous summary, about high-school geometry, has an editorial explaining my approach, so I won't go into more detail here. The second problem, though, is worth discussing.

You were given a 8x8 grid, with three points on this grid designated as "houses", and three as "wells". You need to pick a positive integer d<=1000, subdivide each square of the grid into d times d small squares, obtaining a 8d times 8d grid, and then draw nine non-intersecting paths on this grid: from each house to each well. Of course, the problem as stated is impossible to solve since K_3,3 is not planar; to make it solvable, the grid was toroidal: the left edge was identified with the right edge, and the top edge was identified with the bottom edge (picture on the left from the original problem statement).

This problem has two parts: topological and combinatorial. The topological part is concerned with laying out nine required curves on a toroidal square in such a way that they don't intersect; the combinatorial part is placing those paths on a (subdivided) grid.

The key insight for the topological part is that it's actually very easy, and there are a lot of ways to do it. Why? Because we can embed a much more complex graph into a torus: K₇! Because of this, we can adopt the following approach for the topological part: let's draw the paths one by one in random order, and in case we fail, just try again. This is likely to work because there are so many possible topological configurations.

How do we draw the paths combinatorially? Let's start by simply drawing shortest paths using breadth-first search. Of course, this can fail - at some point there will be no path on the grid between the two points that we want to connect. But what can be the reason for that? There are two cases:

1) There's no path even if we forget about the grid restriction - the graph that we have drawn doesn't allow to connect the points, meaning that we've failed topologically.

2) There's a path, but as soon as we try to embed it into the grid, it disappears.

Our goal now is to avoid the failures of the second type. As mentioned above, in case we get a failure of the first type, we'll just restart the process from scratch with a different random ordering of paths.

The key idea now is that we can subdivide the grid as we go, instead of doing this at the start. In other words, first we build paths on the original grid. As soon as we can't build some path, we can subdivide each square of the grid into a 2x2 subgrid. As a result, now there will be free grid points between every two existing paths, and thus it's very unlikely that we can't draw a path because of combinatorial restrictions.

More precisely, the only combinatorial restriction that can still exist after that is around the endpoints of the path. If two existing paths enter an endpoint at 90 degrees to each other, then even after subdividing the grid we can't insert a third path between them. But since we need to draw only three paths for each endpoint, we can fix this issue by restricting the second path to exit it at 180 degrees compared to the first path. Then we will always be able to eliminate combinatorial obstacles by subdividing the grid!

I'm pretty sure there are many other approaches possible in this problem. Please share yours!

And of course, check back soon for the following week's summary, and don't forget to participate in the Google Code Jam 2016 Qualification Round — there are still a few hours left!

A Helvetic week

The weekdays of the March 14 - March 20 week were quite calm, but that was more than compensated by the super packed weekend.

Before we dive further into that March week, though, let me remind you that Google Code Jam Qualification Round starts in a few minutes. Google Code Jam's problems are always among the most interesting (yeah, I'm biased since I help create them, too), so consider giving it a try! You can already register, and the problems will appear in a few minutes. This round lasts for 27 hours, but you don't need to be present all that time - a few hours at a convenient time should work just fine. Good luck!

The competitive programming extravaganza started Friday evening with the Elimination Round of the 2016 CROC Championship on Codeforces (problems, results, top 5 on the left, parallel round results). The deciding problem G, quite surprisingly, involved high-school-style geometry.

You were given a set of 100000 points, and needed to draw a line through some two of them. The line you draw must have the lowest imbalance out of all possibilities. The imbalance of a line is defined as the maximum over all points X on that line of |XP-XQ|, where P and Q are two additional given fixed points.

At the first sight, this looks like a very strange definition. However, as you progress through the layers of this problem, the solution ends up being quite exciting, so I encourage you to try :)

Early on Saturday, Codeforces ran another company round, called IndiaHacks 2016 (problems, results, top 5 on the left). Once again TooDifficult earned the most points; note that he did earn the most in the above CROC round too, he was just participating in the parallel unofficial track — congratulations!

A bit later on Saturday an onsite Swiss competition called Helvetic Coding Contest 2016 took place in Lausanne (results, top 5 on the left). To the best of my knowledge, the problems will be used for an online contest soon, so I will not spoil them.

Instead, I'd like to mention that it's really nice to see such "local" competitions spread all over the world, helping build algorithmic programming communities and thus enabling the enthusiasts to make new friends and discover new opportunities. Thanks a lot for the organizers from EPFL PolyProg for this exciting event!

TopCoder SRM 685 wrapped up the Saturday (problems, results, top 5 on the left). ACRush has continued his climb back to the top of the rankings, but it was matthew99a who did the best this time — congratulations to all winners!

And finally, Open Cup 2015-16 Grand Prix of Baltics wrapped up the week on Sunday (results, top 5 on the left). Without Warsaw Eagles participating, team Past Glory is unstoppable — congratulations on another victory!

Problem B was the most exciting in this round for me. You were given a 8x8 grid, with three points on this grid designated as "houses", and three as "wells". You need to pick a positive integer d<=1000, subdivide each square of the grid into d times d small squares, obtaining a 8d times 8d grid, and then draw nine non-intersecting paths on this grid: from each house to each well. Of course, the problem as stated is impossible to solve since K_3,3 is not planar; to make it solvable, the grid was toroidal: the left edge was identified with the right edge, and the top edge was identified with the bottom edge.

Every time a problem requires one to come up with some not necessarily optimal structure (for example in this problem the paths don't have to be the shortest), the added freedom usually means that the solution requires more creativity than usual, so I hope you will enjoy solving it! To test your solution, you can just generate random locations of houses and wells on the 8x8 grid many times.

I've mentioned two interesting problems in my last summary. The first one was quite unusual:you're given an array of N<=50 positive integers, summing up to M<=1000. For each of N! possible permutations of the array, we compute the value M!/(a[0]+a[1])/(a[0]+a[1]+a[2])/.../(a[0]+a[1]+...+a[N-1]) - dividing M! by the product of partial sums of the permuted array, except the first one. What is the sum of those values over all N! permutations?

First, we try to solve the problem where the first partial sum is not skipped, in other words when the expression above is also divided by a[0]. Instead of dividing the factorial by the partial sums, we'll simply remove the corresponding numbers a[0], a[0]+a[1], ..., a[0]+a[1]+...+a[N-1] from the factorial's product. The last removed number is M=a[0]+a[1]+...+a[N-1], the next one is M-a[N-1], and so on. Which combinatorial objects does such a product count?

Experienced contestants might recognize the pattern of almost-factorial with some numbers skipped. It arises when we try to count permutations by cycles. Suppose we have a permutation on numbers between 1 and M. Consider its cycle passing through 1. There are M-1 choices for the next element in the cycle, M-2 choices for the element after it, which corresponds to our product, and when the number M-a[N-1] is skipped, we say that the cycle goes back to number 1, and then we consider the cycle passing through the smallest untouched number, and so on until we build all cycles.

So our product counts permutations where the cycle passing through 1 has a[N-1] elements, the cycle passing through the smallest number untouched by the first cycle has a[N-2] elements, and so on. When we sum those products over all permutations of array a, we simply get the total number of permutations with the given cycle lengths (a[0], a[1], ..., a[N-1]), where cycles of the same length also have an order on them.

This is where the key simplification happened: after summing by all permutations of a, we actually have to count combinatorial objects that are easier to describe! To solve the problem, we must now adjust for the divisor a[0] being missing (hint: now instead of counting permutations, we need to sum the lengths of the cycle with the largest smallest number), and learn how to count the sum quickly (hint: dynamic programming with state "how many cycles we've already processed, what is the position of the number that is destined to become the largest smallest number in a cycle").

The second problem from the last summary was more standard, but not less beautiful: you're given two strings of 0s and 1s with at most 10000 characters in each. You're allowed to take any substring of a string which has an even amount of 1s, and reverse it. Can you obtain the second string from the first string?

Let's go from left to right, and find the first positions where the strings don't match: one of them has a 0 in that position, and the other has a 1. Let's find the next two ones in the string that contains 0 there, and reverse the substring starting with this 0 and ending with the second one. Now both strings have ones in this position, and we can continue going to the right.

When would this approach stop working? When there's less than two ones left in one of the strings (and thus in the other, too, if they have the same amount of ones). It's tempting to say that if the sole remaining ones in the two strings match, then we're done, otherwise there's no solution. But how to prove that?

During the actual competition, our team tried to prove this for some time, failed to do so, tried to come up with a counterexample, also failed to do so, and decided to take chances and submit this solution - and it was accepted!

After the competition ended, we were still very much interested to prove why this solution works, and finally found the invariant: if we look at positions of ones in the string, and make an alternating sum out of them (p₁-p₂+p₃-p₄+...), then the value of this alternating sum doesn't change when we flip a string with an even amount of ones in it! And it's quite obvious that this sum is different when all ones match except one.

Thanks for reading, and check back soon for more algorithms!

Saturday, April 2, 2016

An inverse combinatorics week

The March 7 - March 13 week was quite busy with competitions. First off, Codeforces Round 345 featured problems from a contest for high school students in Moscow (problems, results, top 5 on the left). Tourist has continued his string of Codeforces victories, despite taking a bit longer than second-placed TooDifficult to solve all problems.

Just a few hours later, TopCoder SRM 684 featured problems by Adam 'subscriber' (problems, results, top 5 on the left). The deciding hard problem presented a very unusual challenge: given a formal description for counting some unknown combinatorial objects, find out which combinatorial objects are being counted and then come up with a faster way to count them.

Here's the problem statement: you're given an array of N<=50 positive integers, summing up to M<=1000. For each of N! possible permutations of the array, we compute the value M!/(a[0]+a[1])/(a[0]+a[1]+a[2])/.../(a[0]+a[1]+...+a[N-1]) - dividing M! by the product of partial sums of the permuted array, except the first one. What is the sum of those values over all N! permutations?

Coming up with the combinatorial description is pretty hard here, so let me give you a vert small hint: what if the first partial sum wasn't missing - in other words, if the above formula was also divided by a[0]? Which objects does such formula count?

To wrap up the week, Open Cup Grand Prix of Tatarstan presented teams preparing for the ICPC World Finals in Thailand one more chance to test their skills (results, top 5 on the left). Team Past Glory has earned another convincing victory, while team Dreadnought from Shanghai JTU was the first among the teams coming to Thailand - congratulations all!

Problem K was a nice mathematical puzzle. You're given two strings of 0s and 1s with at most 10000 characters in each. You're allowed to take any substring of a string which has an even amount of 1s, and reverse it. Can you obtain the second string from the first string?

I'm sorry for yet another long pause, and hope to get back into the rhythm soon, so check back for new posts!