Some problems have such interesting names that they become hard to ignore. Pancake flipping is one of those problems; it is intuitive to reason about, yet likely not what you think about as you eat a stack of pancakes. Oddly enough, the first paper Bill Gates ever published was on this topic. The original problem was posed by Jacob E. Goodman under the alias Harry Dweighter ("harried waiter") in a 1975 issue of The American Mathematical Monthly.1
The chef in our place is sloppy, and when he prepares a stack of pancakes they come out all different sizes. Therefore, when I deliver them to a customer, on the way to the table I rearrange them (so that the smallest winds up on top, and so on, down to the largest at the bottom) by grabbing several from the top and flipping them over, repeating this (varying the number I flip) as many times as necessary. If there are n pancakes, what is the maximum number of flips (in terms of n) that I will ever have to use to rearrange them?
Introduction
So we may consider the problem of "pancake flipping" to be equivalent to finding the maximum number of prefix reversals (flips) required to sort any given permutation (pancake stack) of $n$ distinct elements.
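Before worrying about the maximum, note that sorting any particular stack is easy: a simple (and far from optimal) strategy repeatedly flips the largest out-of-place pancake to the top and then down into place, which already shows the answer is at most linear in $n$. Here is a minimal sketch of that strategy (naive_sort is my own name, not anything from the papers):

#include <algorithm>
#include <vector>

// Naive strategy (not optimal): for each position from the bottom up,
// flip the largest remaining pancake to the top, then flip it down into
// place. Uses at most 2(n-1) flips, a crude upper bound on the answer.
int naive_sort(std::vector<int> s)
{
    int flips = 0;
    for (int m = (int)s.size(); m > 1; --m) {
        // index of the largest pancake among the top m
        int idx = (int)(std::max_element(s.begin(), s.begin() + m) - s.begin());
        if (idx == m - 1)
            continue; // already in place
        if (idx != 0) {
            std::reverse(s.begin(), s.begin() + idx + 1); // bring it to the top
            ++flips;
        }
        std::reverse(s.begin(), s.begin() + m); // flip it down into place
        ++flips;
    }
    return flips;
}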
That sounds pretty interesting as a problem, and if you suspected that it keeps ticking boxes in the "HARD" class of problems, you'd be on the right track. The earliest work gave bounds on the maximum number of flips we'd ever have to perform to sort a stack of $n$ pancakes, which we will consider as the function $f(n)$.
These bounds were given by Gates and his TA Papadimitriou2 (the lower bound holding for infinitely many $n$):
$$\frac{17n}{16} \le f(n) \le \frac{5n+5}{3}$$
These have since been improved but aren't the focus of this note. Instead, I want to talk about some cool ways we can speed up the computation using dynamic programming.
So just how slow is it to compute $f(n)$? Well... let's examine the On-Line Encyclopedia of Integer Sequences: the sequence for $f(n)$ is OEIS A058986, and it has only been computed up to n=19. That seemed pretty crazy to me, but then I read the papers that calculated n=14, 15, and 16 and I understood: despite being published in the mid 2000s, it took the authors over a month of running their code in parallel to reach n=16.
A first attempt
So what's the simplest method for calculating $f(n)$? We can do a complete search (BFS). It may look something like this:
#include <cstdint>
#include <queue>
#include <unordered_map>
#include <utility>
#include <vector>

// Reverse the prefix s[0..n], i.e. flip the top n+1 pancakes.
std::vector<int> reverse(int n, std::vector<int> s)
{
    int l = 0, r = n;
    while (l < r) {
        std::swap(s[l], s[r]);
        l++, r--;
    }
    return s;
}

// The sorted stack [1, 2, ..., n].
std::vector<int> init(int n)
{
    std::vector<int> ret;
    for (int i = 1; i <= n; ++i)
        ret.push_back(i);
    return ret;
}

// Hash a permutation into a uint64_t (boost-style hash combining).
std::uint64_t hasher(std::vector<int> const& vec)
{
    std::uint64_t seed = vec.size();
    for (auto& i : vec) {
        seed ^= i + 0x9e3779b9 + (seed << 6) + (seed >> 2);
    }
    return seed;
}

// BFS over the pancake graph from the sorted stack; the depth of the
// deepest non-empty level is f(n).
int f(int n)
{
    std::unordered_map<std::uint64_t, int> distance;
    std::queue<std::vector<int>> q;
    q.push(init(n));
    distance[hasher(init(n))] = 0; // mark the start so we never revisit it
    int ret = 0, size;
    while ((size = q.size()) > 0) {
        for (int i = 0; i < size; ++i) {
            std::vector<int> next = q.front();
            q.pop();
            for (int j = 1; j < n; ++j) { // all flips of size 2..n
                std::vector<int> vec = reverse(j, next);
                std::uint64_t candidate = hasher(vec);
                if (distance.find(candidate) == distance.end()) {
                    distance[candidate] = ret + 1; // true BFS distance
                    q.push(vec);
                }
            }
        }
        ret++;
    }
    return --ret;
}
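As a quick sanity check, a small driver like the one below reproduces the first terms of A058986 (0, 1, 3, 4, 5, 7, 8 for n = 1..7); anything much past n = 10 will already take a very long time:

#include <cstdio>

int f(int n); // the BFS above

int main()
{
    // Expected output: the first terms of OEIS A058986.
    for (int n = 1; n <= 7; ++n)
        std::printf("f(%d) = %d\n", n, f(n));
    return 0;
}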
So, not too bad. One thing to notice here is the choice of hashing a vector into a std::uint64_t. This will come in handy later; the current hash function has the opportunity for collisions at larger n, but should be fine for now.
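An alternative that avoids collisions entirely for the sizes we care about: a permutation of n ≤ 16 elements fits exactly into a std::uint64_t at 4 bits per element. The parallel code later leans on the same trick (its helpers pack stacks into integers); the perm_to_int/int_to_perm below are my own illustrative versions, not those exact helpers.

#include <cstdint>
#include <vector>

// Pack a permutation of {1..n} (n <= 16) into a 64-bit integer,
// 4 bits per element, storing value-1 so each fits in a nibble.
std::uint64_t perm_to_int(std::vector<int> const& p)
{
    std::uint64_t x = 0;
    for (std::size_t i = 0; i < p.size(); ++i)
        x |= static_cast<std::uint64_t>(p[i] - 1) << (4 * i);
    return x;
}

// Unpack the first n elements back into a vector.
std::vector<int> int_to_perm(int n, std::uint64_t x)
{
    std::vector<int> p(n);
    for (int i = 0; i < n; ++i)
        p[i] = static_cast<int>((x >> (4 * i)) & 0xF) + 1;
    return p;
}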
So what's the complexity here? Well, it's not good. We are checking every possible reversal for each new permutation, and on top of that there are a total of $n!$ permutations. With $n-1$ flips per permutation and $O(n)$ work to perform and hash each flip, we're looking at something along the lines of:
$$O(n^2 \cdot n!)$$
[Figure: the pancake graph for n = 5]3
The above graph shows the symmetrical properties of the pancake graph. If we look at the pancake graphs for n = 1, 2, and 3, we can see the recursive structure: at each step, the previous graph is duplicated n times.
The speedup
Let's lay some groundwork and see if we can exploit this recursive structure to avoid a blind search (BFS).
Let $S_n$ be the set of all permutations of the $n$ symbols $1$ to $n$. Consider $n$ distinct pancakes, where the smallest pancake corresponds to $1$, ..., the second largest to $n-1$, and the largest to $n$. We can then define a stack of pancakes of size $n$ as a permutation $\sigma \in S_n$.
Let's define the minimum distance (number of flips) between some stack $\sigma$ and the sorted pancake stack to be $d(\sigma)$.
Now onto the last piece of setup we will need. Let's write the set of all stacks of size $n$ that require a minimum of $m$ flips to become equal to the sorted stack as $A_n(m) = \{\sigma \in S_n : d(\sigma) = m\}$.
Having set all that up, we can revisit our earlier definition of $f(n)$ and present it as
$$f(n) = \max_{\sigma \in S_n} d(\sigma) = \max\{\, m : A_n(m) \neq \emptyset \,\}.$$
Now comes the big leap. Let's consider some $A_n(m)$. Is it possible to construct the set of permutations of size $n$ that require $m$ flips using some union of the sets $A_{n-1}(\cdot)$ of size-$(n-1)$ stacks?
Well, let's go through an example. Say I have the stack of pancakes [1,3,2]; by inspection, this stack requires a minimum of 3 flips, hence it is in the set $A_3(3)$. Now let's make this a stack of size 4 by adding 4 to the end: [1,3,2,4]. What we notice is that, under any circumstance, a sorted suffix never needs to be flipped. By adding 4 to the end, we've placed the largest possible element at the bottom of the stack, so the distance must be the same as before.
Cool, can we extend this further? Well, sure: what if we also reverse the entire stack after we append $n$ to the end? Then we know the new distance must be equal to the previous distance + 1.
Lastly, after we've done this reversal, it follows that any prefix reversal (not of size $n$, because that would cancel out) must add an additional +1 to the distance.
From these observations, it's then possible to construct
$$A_{n-1}(m) \,\cup\, A_{n-1}(m-1) \,\cup\, A_{n-1}(m-2)$$
as the candidate set: applying the operations mentioned above to its members, we can find all possible permutations of distance $m$.
This finding saves us a lot of time, allowing us to build out diagonally from the source and construct each new set as a function of its ancestors.
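Here's a minimal sketch of the candidate generation on plain vectors (the function name is mine, and the distance bookkeeping is omitted): applied to a size-(n-1) stack at distance m, the three operations produce size-n candidates at distances m, m+1, and m+2 respectively.

#include <algorithm>
#include <vector>

// All size-n candidates derived from one size-(n-1) stack, mirroring
// the three observations above.
std::vector<std::vector<int>> candidates(std::vector<int> s)
{
    int n = (int)s.size() + 1;
    std::vector<std::vector<int>> out;

    s.push_back(n);                   // observation 1: sorted suffix is free
    out.push_back(s);

    std::reverse(s.begin(), s.end()); // observation 2: one extra flip
    out.push_back(s);

    for (int i = 2; i < n; ++i) {     // observation 3: flips of size 2..n-1
        std::vector<int> t = s;
        std::reverse(t.begin(), t.begin() + i);
        out.push_back(t);
    }
    return out;
}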
Parallel
In testing, this approach still takes a really long time for any n greater than 10. There is also a clear dependency between levels that prevents us from treating this as an "embarrassingly parallel" problem.
So, to distribute this work across an arbitrary number of nodes, perhaps we can take a "level-order" approach: partition the candidate space for each stack size n, distribute it among some number of nodes p, then gather the results and continue on to n+1.
The obvious disadvantage here is that we are doing all the combining on a single node; however, for practical purposes a merge-sort-style reduction doesn't shave off much time, even for the larger values of n tested.
Using MPI, the serial code executed on each node for each layer could be:
// Process one rank's share of a layer. The helper routines (append_len,
// flip_int, int_to_perm, distance), the vll/ll typedefs, NMAX and the
// UPPER_BOUNDS table are defined elsewhere; stacks are packed into ints.
vll manage_layer(vll stacks, int nstacks, int lb, int len,
                 int rank, int size) {
    int tmp_dist, max_dist = 0, p[NMAX + 2];
    ll cur, tmp, added = 0;
    unordered_map<ll, bool> seen;
    vll next_layer;
    for (int i = 0; i < nstacks; ++i) {
        if (i % size != rank) // round-robin partition across ranks
            continue;
        // Observation 1: append the largest element to the bottom.
        cur = append_len(stacks[i], len);
        if (!seen[cur]) {
            seen[cur] = true;
            // get dist and update max_dist
            int_to_perm(len, cur, p);
            tmp_dist = distance(len, p, UPPER_BOUNDS[len - 1], lb);
            max_dist = tmp_dist > max_dist ? tmp_dist : max_dist;
            // keep it only if it meets the lower bound
            if (tmp_dist >= lb)
                next_layer.push_back(cur), added++;
        }
        // Observation 2: flip the whole stack.
        cur = flip_int(len + 1, cur, len);
        if (!seen[cur]) {
            seen[cur] = true;
            // get dist and update max_dist
            int_to_perm(len + 1, cur, p);
            tmp_dist = distance(len + 1, p, UPPER_BOUNDS[len], lb);
            max_dist = tmp_dist > max_dist ? tmp_dist : max_dist;
            // keep it only if it meets the lower bound
            if (tmp_dist >= lb)
                next_layer.push_back(cur), added++;
        }
        // Observation 3: every proper prefix flip of the reversed stack.
        for (int k = 1; k <= len; ++k) {
            tmp = flip_int(len + 1, cur, k);
            if (!seen[tmp]) {
                seen[tmp] = true;
                // get dist and update max_dist
                int_to_perm(len + 1, tmp, p);
                tmp_dist = distance(len + 1, p, UPPER_BOUNDS[len], lb);
                max_dist = tmp_dist > max_dist ? tmp_dist : max_dist;
                // keep it only if it meets the lower bound
                if (tmp_dist >= lb)
                    next_layer.push_back(tmp), added++;
            }
        }
    }
    return next_layer;
}
The management code (master and workers) then distributes the work, computes, and barriers at each level, progressing until we reach the desired target.
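For completeness, that combine-and-continue loop could look something like the sketch below. The allgather_layer helper and run_levels driver are my own hypothetical scaffolding (and lower_bounds is an assumed per-size table, analogous to UPPER_BOUNDS); only the MPI calls themselves are standard.

#include <mpi.h>
#include <vector>

using ll = long long;
using vll = std::vector<ll>;

vll manage_layer(vll stacks, int nstacks, int lb, int len, int rank, int size);

// Gather every rank's partial next layer so all ranks hold the full layer.
vll allgather_layer(vll const& local)
{
    int size;
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    int mine = (int)local.size();
    std::vector<int> counts(size), displs(size);
    MPI_Allgather(&mine, 1, MPI_INT, counts.data(), 1, MPI_INT,
                  MPI_COMM_WORLD);
    int total = 0;
    for (int r = 0; r < size; ++r) {
        displs[r] = total;
        total += counts[r];
    }
    vll all(total);
    MPI_Allgatherv(local.data(), mine, MPI_LONG_LONG, all.data(),
                   counts.data(), displs.data(), MPI_LONG_LONG,
                   MPI_COMM_WORLD);
    return all;
}

// Level-order driver: every rank works on its share of the current layer,
// then all ranks combine and advance to the next stack size together.
vll run_levels(vll stacks, int seed_len, int target_len,
               int const* lower_bounds, int rank, int size)
{
    for (int len = seed_len; len < target_len; ++len) {
        vll local = manage_layer(stacks, (int)stacks.size(),
                                 lower_bounds[len], len, rank, size);
        stacks = allgather_layer(local); // the single-node-style combine
        MPI_Barrier(MPI_COMM_WORLD);     // everyone advances together
    }
    return stacks;
}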
Speed
The solution proposed achieves roughly linear speedup, but it is heavily inhibited by memory use for larger calculations (as it adopts a BFS-type approach).
For the serial code (on one processor), the time taken to compute n=13 is just over half an hour. On 128 processors (2.3 GHz Xeon E5) it took only 22 seconds, a speedup of roughly 80×.
The race for n=20
Unfortunately there hasn't been much research interest in pancake graphs in recent years. Computing n=20 is a mammoth task that realistically requires access to an HPC cluster. The approach listed above won't quite work above n=16 without flushing to disk, due to limits on main memory capacity.
These numbers don't usually increase by just one at each new discovery, either. The next person to tackle the problem will likely reach for n=21, or perhaps even n=22.
refs
1 D. J. Kleitman, Edvard Kramer, J. H. Conway, Stroughton Bell, and Harry Dweighter. 1975. Elementary Problems: E2564–E2569. The American Mathematical Monthly 82 (1975).
2 William H. Gates and Christos H. Papadimitriou. 1979. Bounds for sorting by prefix reversal. Discrete Mathematics 27 (1979), 47–57.
3 Syp, Wikipedia