SEBICPU - Unofficial Editorial

narri · September 21, 2019, 3:03am

I only had a couple of days to write code for this problem (I solved my last regular problem very late in the week). However, it didn’t take long to figure out the following:

It is hard to manipulate only the higher bits of the register (because there is no sub-register to directly reference it). In particular, you can’t touch the top 32 bits of a register without either mutating potentially the entire number (either self-modification, or add/sub/mul/etc. with another register).
Registers can be changed into 0 or 1 in at most one operation (0 via self-xor is obvious, I leave 1 as an exercise for the reader)
There’s a lot of different operations to explore. Doing it manually is probably not the way to go in general; there isn’t a lot of time to come up with manual insights.
It is best to record the manipulations, then actually do them later when you have confirmed it as the course of action. This allows easy searches.

Base Case

First, I will discuss the base algorithm, which is setting a register to an arbitrary value:

Algorithm:
Lets assume you have a register that is any number (0xpppppppppppppppp) and you need to make 0xWWWWXXXXYYYYZZZZ

Set the highest remaining 16 bits of the goal in the lowest 16 bits of the register (0xppppppppppppWWWW)
Shift left by 16 bits if you haven’t reached the end yet. (0xppppppppWWWW0000)
Set the next set of bits, etc. (0xppppppppWWWWXXXX)

I chose to examine the manipulation of the lowest 16 bits because there were the most opportunities for manipulation (x,h,l registers). So how do we fill the low 16 bits? In order to shift left by 16 bits, of course you need a spare register with the value of 16 in it. My insight is that the spare register with the value of 16 is also useful in manipulating the lower 16 bits of my target register as well. 16 turns out to give a lot of useful operations, such as shifting left/right by 4 bits (via mul/div) or clearing all but the last 4 bits (via mod).

Therefore, I coded a BFS to take any value of the lower 16-bits and change it to a target value with the ability to use a spare read-only register with value 16 as well. This turns out to be fairly efficient (something on the order of 6-7 operations on average, with a worst-case of 8-9). However, the general case takes much too long to run (at least the way I wrote it), and I would likely have timed out.

The next insight I had was that other than the first iteration of the above algorithm, the lower sixteen bits will start at 0 because of the shift. Therefore, I precalculated the entire BFS search tree assuming a starting value of 0 (and a spare register with value 16). I also precalculated with a starting value of 1 (since it takes at most one operation to change a register to 1). This makes things very fast again. Thus our real algorithm is:

When setting the lower 16 bits, test first setting the bits to 0 or 1, then use the stored BFS search results. Take the path with the fewest commands.
Shift left by 16 bits.
Use the results of the BFS search starting from 0, etc.

We can also save some iterations if the upper 32 bits or upper 48 bits are already correct (by either not shifting, or shifting using the eyx register rather than the full ryx register).

Given that base case, we can build on that for the other types (because we usually have to set 1 or 2 registers at least).

Type 2
We set A using the base algorithm. Then B and C can be done by making them the appropriate 1 or 2 bit number, then XOR-ing A into B/C. For this one, I did a BFS on all 64-bit numbers with at most 2 bits set (and allowing a special starting state for starting numbers with 3 or more bits). This was done live and happened to be fast enough. I also added an optimization where after we have set A,B,C to the correct values for the tuple, we can test immediately set B and C back to the 1/2-bit values so that we have the possibility of using less move to shift some bits around. This seem to work extremely well (and contributed to major jumps on the leaderboard).

Type 3
This is very, very simple, as we can store D in a register, then set one of the registers using the base algorithm. No matter which one of the three are chosen, we can then derive the other two values in 2 operations each by adding/subtracting K as appropriate. Thus, we can simulate which one is best before actually performing the operations.

Type 5
This case is very similar to type 3, but instead we set two of the registers using the base algorithm, then deriving the third value immediately from a stored value K.

What happened to type 4?

Luckily, nobody else appears to have figured out type 4 during the contest either. The proper way to do this is, after the first triple, to simulate all 60^3 possible moves, then find which set of moves gives the 3 target values. This yields scores of <300.

Other ideas to try that I didn’t have the time for

You could try to come up with a fancy multiplication that would set the upper 32 bits at once by making the lower 32 bits something convenient, probably by squaring. I couldn’t figure a way that wasn’t just a clunkier version of my base case algorithm since you would still be left trying to set the lower 32 bits after the multiplication.
Reorder the goal triples into a best path. Especially for Type 3 (where we can work in groups of 8), we can gain some operations if we find commonalities in bits or even entire registers between triples by trying many permutations first.
More clever use of the 4th register. I was always setting a register to 16 and leaving it like that the entire time. Maybe other values can work better sometimes.
Make the general 16-bit to 16-bit DFS work faster. This would make the first 16 bits of the base case algorithm work optimally (instead of taking an extra step to make the bits 0 or 1 first).
Do a version of the type2 search that takes both targets into account simultaneously (because potentially you can use one target to aid in setting the other).