Sunday, July 23, 2017

A red-black week

Last week, Codeforces presented the problems from the VK Cup finals as two regular contests. First off, Codeforces Round 423 took place on Tuesday (problems, results, top 5 on the left, analysis). W4yneb0t has continued his string of excellent Codeforces performances with the best possible result - a victory, which has also catapulted him to second place in the overall rating list. Well done!

In between the Codeforces rounds, TopCoder held its SRM 718 very early on Thursday (problems, results, top 5 on the left, analysis). The round seems to have developed in a very exciting manner: three people submitted all three problems during the coding phase, then two of them lost points during the challenge phase, and the last remaining person with three problems failed the system test on the easiest problem! When the dust settled, snuke remained in first place thanks to his solution to the hard problem holding up. Congratulations on the second SRM victory!

Codeforces Round 424 rounded out the week's contests (problems, results, top 5 on the left, analysis). TakanashiRikka was the fastest to solve the first four problems, and (probably not a sheer coincidence :)) the only contestant who had enough time to correctly handle all cases in the hardest problem while also solving at least one other problem. Congratulations on the well-deserved first place!

Here's that tricky problem. You are given an undirected graph with up to 10^5 vertices and up to 10^5 edges. You need to assign non-negative integers not exceeding 10^6 to the vertices of the graph. The weight of each edge is then defined as the product of the numbers at its ends, while the weight of each vertex is equal to the square of its number. You need to find an assignment such that the total weight of all edges is greater than or equal to the total weight of all vertices, and at least one number is nonzero.
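
To make the objective concrete, here is a minimal checker for a candidate assignment. It assumes 0-based vertex indices and an edge list of pairs; the function name is hypothetical, and it only verifies an assignment rather than finding one.

```python
def is_valid_assignment(n, edges, values):
    """Check a candidate assignment against the conditions in the statement.

    n      -- number of vertices, indexed 0..n-1 (an assumed convention)
    edges  -- list of (u, v) pairs
    values -- list of n integers, each expected to lie in [0, 10**6]
    """
    if len(values) != n or any(not 0 <= x <= 10**6 for x in values):
        return False
    if all(x == 0 for x in values):
        return False  # at least one number must be nonzero
    edge_weight = sum(values[u] * values[v] for u, v in edges)
    vertex_weight = sum(x * x for x in values)
    return edge_weight >= vertex_weight
```

For example, a star whose center holds 2 and whose four leaves hold 1 has total edge weight 4 * 2 = 8 and total vertex weight 4 + 4 = 8, so it qualifies, while a single edge with 1 at both ends (1 versus 2) does not.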

In my previous summary, I mentioned a difficult IPSC problem: we start with a deck of 26 red and 26 black cards, and a number k (1<=k<=26). The first player takes any k cards from the deck and arranges them in any order they choose. The second player takes any k cards from the remaining deck and arranges them in any order they choose, but such that their sequence is different from the first player's sequence. The remaining 52-2k cards are shuffled and dealt one by one. As soon as the last k dealt cards exactly match one of the players' sequences, that player wins. If no match has happened by the time the cards run out, a coin is tossed, and each player wins with probability 50%. What is the probability of the first player winning, assuming both play optimally?
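
To pin down the rules, here is a minimal Monte Carlo sketch of one game with both sequences fixed. It assumes cards are written as 'r' and 'b' and that the dealt cards form a uniformly random permutation of what is left; the function name and the sampling parameters are illustrative only.

```python
import random

def simulate(first, second, trials=100_000, seed=0):
    """Estimate the first player's winning probability by random deals.

    first, second -- distinct strings of equal length k over {'r', 'b'}
    """
    k = len(first)
    assert len(second) == k and first != second
    reds = 26 - first.count('r') - second.count('r')
    blacks = 26 - first.count('b') - second.count('b')
    assert reds >= 0 and blacks >= 0, "players took more cards of a color than exist"
    deck = ['r'] * reds + ['b'] * blacks
    rng = random.Random(seed)
    wins = 0.0
    for _ in range(trials):
        rng.shuffle(deck)
        dealt = ''
        result = 0.5  # nobody matched: the coin toss gives each player half
        for card in deck:
            dealt += card
            tail = dealt[-k:]  # the last k dealt cards (shorter at the start)
            if tail == first:
                result = 1.0
                break
            if tail == second:
                result = 0.0
                break
        wins += result
    return wins / trials
```

With k=1 the first dealt card decides the game, so simulate('r', 'b') should come out close to 0.5.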

Solving this problem required almost all important programming contest skills: abstract mathematical reasoning, knowledge of standard algorithms, coming up with new ideas, good intuition about heuristics, and of course the programming skill itself.

We start off by noticing that the second player has a very simple way to achieve a 50% win rate: he can just choose the complement of the first player's sequence (replace red cards by black and vice versa), and then everything is completely symmetric.
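
The symmetry can be checked mechanically: with the complement reply, the shuffled deck always holds 26-k cards of each color, so swapping colors maps every deal won by one player to an equally likely deal won by the other. A minimal sketch with hypothetical helper names:

```python
def complement(s):
    """Swap red and black cards: 'r' <-> 'b'."""
    return s.translate(str.maketrans('rb', 'br'))

def remaining_deck(first, second, per_color=26):
    """(reds, blacks) left in the shuffled deck after both players pick."""
    taken = first + second
    return per_color - taken.count('r'), per_color - taken.count('b')
```

For instance, remaining_deck('rrb', complement('rrb')) leaves 23 cards of each color.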

How can the second player achieve more? He has two resources: first, he can choose a string that is more likely to appear in the sequence of the remaining cards. Second, he can choose a string that, when it appears together with the string of the first player, tends to appear earlier.

The strings that are more likely to appear are those that leave an equal proportion of reds and blacks (after taking out the string of the first player once and the string of the second player twice) and have no borders (proper prefixes that are equal to suffixes). This is because we can count the number of ways a given string can appear by multiplying the number of positions it can appear in by the number of ways to place the remaining characters once the matching part is fixed. The number of ways to place the remaining characters is maximized when the remaining characters have equal numbers of blacks and reds. This slightly overcounts the number of ways, because in some cases the string can appear more than once; the lack of borders minimizes the number of such occurrences.
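
The counting argument can be made concrete. In this minimal sketch (helper names are hypothetical), `has_border` uses the standard prefix function to detect a border, and `approx_arrangements` is the overcounting estimate from the paragraph above, assuming the shuffled deck holds `reds` red and `blacks` black cards.

```python
from math import comb

def has_border(s):
    """True if some proper prefix of s equals the suffix of the same length."""
    # Prefix function (KMP failure function); its last value is the longest border.
    pi = [0] * len(s)
    for i in range(1, len(s)):
        j = pi[i - 1]
        while j > 0 and s[i] != s[j]:
            j = pi[j - 1]
        if s[i] == s[j]:
            j += 1
        pi[i] = j
    return len(s) > 0 and pi[-1] > 0

def approx_arrangements(s, reds, blacks):
    """Positions for s times arrangements of the other cards (an overcount)."""
    n, k = reds + blacks, len(s)
    r_in, b_in = s.count('r'), s.count('b')
    if k > n or r_in > reds or b_in > blacks:
        return 0
    # The binomial is largest when the leftover reds and blacks are balanced.
    return (n - k + 1) * comb(n - k, reds - r_in)
```

For example, approx_arrangements('rb', 2, 2) gives 3 * 2 = 6, while only 5 of the 6 orderings of rrbb contain rb: the ordering rbrb contains it twice and is counted twice.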

The strings that tend to appear earlier when both appear are those whose suffix matches a prefix of the first player's string. At best, if the first player's string is s+c, where s is a string of length k-1 and c is a character, the second player should pick his string from 'r'+s and 'b'+s. In this case, as soon as there's a match of the first player's string not at the very first position, we have a more than 50% chance that our string matched one position earlier.
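
These counter-play candidates are easy to generate. A minimal sketch (the helper name is hypothetical): drop the last character of the first player's string, prepend each color, and skip a candidate that coincides with the first player's string itself, as happens for 'rrr'. For k from 3 to 8, every best reply in the small-k table later in the post has this form.

```python
def shifted_replies(first):
    """Second-player strings 'r'+s and 'b'+s, where first == s + c."""
    s = first[:-1]
    return [color + s for color in 'rb' if color + s != first]
```

For example, shifted_replies('rbr') returns ['rrb', 'brb'].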

Now we can already make a first attempt at a solution: let's try promising candidates for the first player's best move (it is probably among the strings with the most appearances); the second player should then choose either another string with lots of appearances, or a string that counters the first player's string in the manner described above. However, this is not enough to solve the problem - we will get a wrong answer.

As part of implementing the above solution, we also had to implement a function that computes the sought probability for a given pair of strings. It is not entirely trivial either, and can be done using dynamic programming where the state is the number of remaining red and black cards together with the state in the Aho-Corasick automaton of the two strings.
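
Here is a minimal sketch of such a DP, assuming 'r'/'b' strings of equal length. Instead of a textbook Aho-Corasick construction with failure links, it computes each transition by brute force as the longest suffix that is still a prefix of one of the two strings; this gives the same automaton states and transitions, just less efficiently, which is acceptable for k up to 26. The function name is hypothetical.

```python
from functools import lru_cache

def win_probability(first, second):
    """Exact probability that the first player wins for fixed strings."""
    k = len(first)
    assert len(second) == k and first != second
    reds = 26 - first.count('r') - second.count('r')
    blacks = 26 - first.count('b') - second.count('b')
    if reds < 0 or blacks < 0:
        raise ValueError("not enough cards of some color")

    # Automaton states: all prefixes of either string (the Aho-Corasick trie nodes).
    prefixes = {p[:i] for p in (first, second) for i in range(k + 1)}

    def step(state, card):
        # Longest suffix of state+card that is a prefix of some string.
        # The empty suffix (start == len(t)) always qualifies.
        t = state + card
        for start in range(len(t) + 1):
            if t[start:] in prefixes:
                return t[start:]

    @lru_cache(maxsize=None)
    def f(r, b, state):
        if state == first:
            return 1.0
        if state == second:
            return 0.0
        if r + b == 0:
            return 0.5  # coin toss
        total = 0.0
        if r:  # next card is red with probability r / (r + b)
            total += r / (r + b) * f(r - 1, b, step(state, 'r'))
        if b:
            total += b / (r + b) * f(r, b - 1, step(state, 'b'))
        return total

    return f(reds, blacks, '')
```

If this model of the game matches the author's, win_probability('rbr', 'rrb') should reproduce the k=3 entry of the table in the post.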

So, where do we go from there? Since we already have the function that computes the probability, we can now run it on all pairs of strings for small values of k and try to notice a pattern. We get something like this:

1 0.5 r b
2 0.5 rb br
3 0.3444170488792196 rbr rrb
4 0.35992624362382514 rrbb rrrb
5 0.3777939526283981 rrbrb brrbr
6 0.413011479190688 rbrbrr brbrbr
7 0.45319632265323256 rrbrrbb brrbrrb
8 0.4782049196004824 rrbbrrbb brrbbrrb
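
An experiment like this is a plain max-min over all pairs of strings. A minimal sketch, assuming a probability function prob(first, second) is passed in (for instance an exact DP along the lines described in the post); with roughly 4^k pairs it is only practical for small k. The function name is hypothetical.

```python
from itertools import product

def best_first_move(k, prob):
    """Return (value, first, reply): max over first of min over second.

    prob(first, second) -> probability that the first player wins (supplied by caller).
    """
    strings = [''.join(p) for p in product('rb', repeat=k)]
    best = None
    for first in strings:
        # The second player's best reply minimizes the first player's chance.
        worst = min((prob(first, second), second) for second in strings if second != first)
        if best is None or worst[0] > best[0]:
            best = (worst[0], first, worst[1])
    return best
```

With a constant prob of 0.5 and k=1 this returns (0.5, 'r', 'b'), the same shape as the first row of the table.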

No obvious pattern seems to appear. However, we can notice that for large values of k, more precisely when 3k>52, the answer is 0.5 simply because there are not enough remaining cards for either string to appear. So now we only need to examine the values of k between 9 and 17.
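
The cutoff is a one-line inequality: after both players take their k cards, 52-2k cards remain, and a string of length k can be dealt only if at least k cards remain.

```latex
52 - 2k < k \iff 3k > 52 \iff k \ge 18
```

So every k from 18 to 26 gives 0.5 directly, and with k up to 8 already covered by exhaustive search, only 9 <= k <= 17 is left.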

And here comes another key idea: we need to believe that by cutting enough branches early, our exhaustive search solution can run in a few minutes for all those values. At first, this seems improbable. For example, for k=16 we have 65536 candidates for each string, and four billion combinations in total, not to mention the Aho-Corasick on the inside. However, from our previous attempts at a solution we have some leads. More precisely, we know which strings of the second player are the most likely good answers for each string of the first player.

This allows us to get a good upper bound on the first player's score for each particular string reasonably quickly, which leads to the following optimization idea: let's run the search for all strings of the first player at the same time, and at each point we will take the "most promising" string - the one with the highest upper bound so far - and make one more step of the search for it, in other words try one more candidate for the second player's string, which may lower its upper bound. We continue this process until we arrive at the state where the most promising candidate does not have anything else to try, because we already ran through all possible second player strings for it - and this candidate then gives us the answer.
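
The best-first idea can be sketched with a priority queue keyed by the current upper bound. This is a minimal illustration, not the author's code: it assumes `replies(first)` lists every legal second-player string (ideally with the heuristic favorites first), that `prob` computes the first player's winning chance, and it starts every bound at 1.0. All names are hypothetical.

```python
import heapq

def best_first_minimax(firsts, replies, prob):
    """Max over first strings of min over replies, explored best-first.

    Returns (value, first, best_reply).
    """
    # Entries: (-upper_bound, index, first, best_reply_so_far, remaining_replies).
    # The unique index breaks ties so iterators are never compared.
    heap = [(-1.0, i, f, None, iter(replies(f))) for i, f in enumerate(firsts)]
    heapq.heapify(heap)
    while True:
        neg_bound, i, first, best_reply, rest = heapq.heappop(heap)
        second = next(rest, None)
        if second is None:
            # All replies tried: this bound is exact and no smaller than any
            # other candidate's upper bound, so it is the answer.
            return -neg_bound, first, best_reply
        p = prob(first, second)
        if best_reply is None or p < -neg_bound:
            neg_bound, best_reply = -p, second  # the bound can only go down
        heapq.heappush(heap, (neg_bound, i, first, best_reply, rest))
```

On a toy payoff table where 'a' scores 0.6 and 0.3 against replies 'x' and 'y', and 'b' scores 0.5 and 0.4, the search returns (0.4, 'b', 'y').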

This search runs relatively fast because for most first player strings, our heuristics will give us an upper bound that is lower than the ultimate answer very quickly, and we will stop considering those strings further. It is still quite slow for larger values of k, so we need a second optimization on top: we can skip the Aho-Corasick part in the simple case where there are simply not enough cards of some color for the second player's string to appear. With those two optimizations, we can finally get all the answers in a few minutes.
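
One possible reading of the second optimization, sketched with hypothetical helper names: when the shuffled deck has fewer cards of some color than the second player's string needs, that string can never be dealt, so the outcome depends only on whether the first player's string appears. The sketch assumes the probability of that event (`p_first_appears`) is available separately, for example from a single-string automaton DP cached per first-player string; the post does not say how this case was actually evaluated.

```python
def second_string_impossible(first, second, per_color=26):
    """True when the shuffled deck lacks enough of some color for `second`."""
    for color in 'rb':
        left = per_color - first.count(color) - second.count(color)
        if left < second.count(color):
            return True
    return False

def win_probability_fast_path(p_first_appears):
    # Only the first string can appear: it wins when the string shows up,
    # and otherwise wins the final coin toss half of the time.
    return 0.5 + 0.5 * p_first_appears
```

For example, with k=10, first = 'r'*10 and second = 'r'*9 + 'b' leave only 7 reds in the deck, so the second string can never be dealt.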

Thanks for reading, and check back soon for this week's summary!
