(Replying to PARENT post)
(Like how linear search wins for small N, or insertion sort helps with small subarrays in introsort: more steps, but each step is also a lot cheaper on modern hardware due to fewer branch mispredictions, better cache locality, or something else that better fits the hardware's strengths.)
(Replying to PARENT post)
So, it's nice to see a detailed analysis of the structure like this! Perhaps if it becomes more popular, I will find more places to use it.
(Replying to PARENT post)
You don't need C++20. Even C++98 had std::bitset<>::count(), which has a nice conversion from unsigned long, and which, when compiled for SSE3 or better (e.g. Core2), uses a POPCNT instruction. It is pretty simple to produce various other results, including countl_zero, from a popcount, with just a couple of additional bitwise ops.
Modern compilers are happy to keep a bitset<32> in the same register as an unsigned long, so the conversion takes exactly zero cycles. POPCNT takes three cycles, and the extra bitwise ops another couple.
(Replying to PARENT post)
So in that case I would use this data structure even if it weren't faster. I can't count the number of times I have had to mess with inserting negative priorities into a min heap to create a max heap! We should just have one data structure that does both.
(though taking this idea to the logical extreme means we should just use Order Statistic Tree for everything since it not only gives you log(n) min/max, but also log(n) find kth and get rank of x)
(Replying to PARENT post)
Very useful when you need it!
(Replying to PARENT post)