(Replying to PARENT post)
It offers a modification to Knuth's algorithm D which avoids all hardware division by utilizing an approximate reciprocal. The 3-by-2 division is particularly interesting as it allows you to skip the quotient loop (D3). It's a nice little optimization that I've spent some time tinkering with in my own code. This technique is notably used in GMP and other arbitrary precision integer libraries.
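For anyone curious, the reciprocal trick in the simpler 2-by-1 case looks roughly like the sketch below (the 3-by-2 division extends the same idea to a two-limb divisor). This follows the 2-by-1 variant as given in Möller and Granlund's "Improved division by invariant integers"; function names are mine, and a plain hardware divide is left in the one-time precomputation for brevity (the real thing avoids that too):

  #include <stdint.h>

  /* v = floor((2^64 - 1) / d) - 2^32, for a normalized divisor d (top bit set).
   * Computed once per divisor; the per-limb division below then needs no
   * hardware divide at all. */
  static uint32_t reciprocal_32(uint32_t d)
  {
      return (uint32_t)(UINT64_MAX / d - ((uint64_t)1 << 32));
  }

  /* Divide the two-limb value (u1:u0) by normalized d, given v = reciprocal_32(d).
   * Requires u1 < d so the quotient fits in a single limb. */
  static uint32_t div_2by1(uint32_t *rem, uint32_t u1, uint32_t u0,
                           uint32_t d, uint32_t v)
  {
      uint64_t q  = (uint64_t)u1 * v + (((uint64_t)u1 << 32) | u0);
      uint32_t q1 = (uint32_t)(q >> 32) + 1;   /* candidate quotient */
      uint32_t q0 = (uint32_t)q;
      uint32_t r  = u0 - q1 * d;               /* candidate remainder, mod 2^32 */

      if (r > q0) {        /* adjust down (at most once) */
          q1--;
          r += d;
      }
      if (r >= d) {        /* rare second adjustment up */
          q1++;
          r -= d;
      }
      *rem = r;
      return q1;
  }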
(Replying to PARENT post)
In some ways the assembly implementation was easier as you aren't forever fighting with the compiler to get it to generate the correct instructions.
The 32 x 32 to 64-bit multiply is one example. There isn't a direct way of expressing that in C: you do a 64 x 64-bit multiply of 32-bit variables cast to 64 bits and hope the compiler gets the hint that it should be doing a 32 x 32 to 64-bit multiply, not a 64 x 64-bit multiply. Checking that, of course, involves reading the assembly output.
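For illustration, the cast-and-hope idiom is something like:

  #include <stdint.h>

  /* The usual C idiom: widen the 32-bit operands and hope the compiler
   * notices that a single widening multiply (x86 MUL r32, ARM UMULL, ...)
   * is enough, rather than a full 64 x 64 multiply. Modern GCC and Clang
   * generally do recognise this pattern. */
  static inline uint64_t mul_32x32_64(uint32_t a, uint32_t b)
  {
      return (uint64_t)a * b;
  }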
The other thing you'd like is addition with carry. It is possible to express this in C and have the compiler recognise what you are doing and generate the correct instructions, but it is a pain! I noted the other day that Zig has @addWithOverflow, which is a step in the right direction.
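For example, one common way of spelling a multi-limb add in plain C and hoping the compiler turns the carry chain into ADC instructions (GCC and Clang's __builtin_add_overflow is another route) is roughly:

  #include <stddef.h>
  #include <stdint.h>

  /* Multi-limb addition written so that a good compiler can recognise the
   * carry chain and emit add-with-carry instructions. */
  static void add_n(uint64_t *r, const uint64_t *a, const uint64_t *b, size_t n)
  {
      uint64_t carry = 0;
      for (size_t i = 0; i < n; i++) {
          uint64_t s = a[i] + carry;
          carry = (s < carry);        /* carry out of a[i] + carry */
          s += b[i];
          carry += (s < b[i]);        /* carry out of s + b[i] */
          r[i] = s;
      }
  }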
I can see why the author of the article found so many things to criticize in the algorithm studied, but I think they are mostly an artifact of the fact that C (and most high-level languages) is a poor choice for low-level arithmetic.
(Replying to PARENT post)
32-bit version:
  const unsigned b = 65536; // Number base (16 bits).
  unsigned un1, un0,        // Norm. dividend LSD's.
           vn1, vn0,        // Norm. divisor digits.
           q1, q0,          // Quotient digits.
           un32, un21, un10,// Dividend digit pairs.
           rhat;            // A remainder.
So a properly adapted 64-bit version should use uint64_t and a 32-bit number base. I did not attempt to follow the author's arguments for why he advocates 32-bit variables or why that deviation from the original would be correct. Using 64-bit variables does not seem to be a bottleneck at all here, so why bother.
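For concreteness, a sketch of how those declarations might look once adapted (keeping the original's variable names; this is illustrative, not the article author's code):

  #include <stdint.h>

  // Sketch of the corresponding declarations for a 128-by-64-bit version:
  // uint64_t variables throughout, with a 32-bit number base.
  const uint64_t b = 1ULL << 32;  // Number base (32 bits).
  uint64_t un1, un0,              // Norm. dividend LSD's.
           vn1, vn0,              // Norm. divisor digits.
           q1, q0,                // Quotient digits.
           un32, un21, un10,      // Dividend digit pairs.
           rhat;                  // A remainder.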
(Replying to PARENT post)
It is a very tricky algorithm to get right. There are many edge cases that only happen for ~2^-64 fractions of the input space, so they are hard to find even with fuzz-testing. The best strategy is to implement it for small limbs first and fuzz that well (a sketch of the idea below).
[1] https://github.com/0xProject/OpenZKP/blob/master/algebra/u25...
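For what it's worth, with limbs that small the 2-by-1 case can even be checked exhaustively rather than fuzzed. A rough sketch, where divide_2by1_8 stands in for whatever small-limb instantiation of the real routine is being tested (here it is just a plain shift-and-subtract division so the example is self-contained):

  #include <stdint.h>
  #include <stdio.h>

  /* Placeholder routine under test: restoring shift-and-subtract division of
   * the two-limb value (u1:u0) by d, requiring u1 < d. Substitute the real
   * small-limb instantiation of your algorithm here. */
  static uint8_t divide_2by1_8(uint8_t *rem, uint8_t u1, uint8_t u0, uint8_t d)
  {
      uint8_t q = 0, r = u1;
      for (int i = 7; i >= 0; i--) {
          unsigned t = ((unsigned)r << 1) | ((u0 >> i) & 1);
          if (t >= d) { t -= d; q |= (uint8_t)(1u << i); }
          r = (uint8_t)t;
      }
      *rem = r;
      return q;
  }

  int main(void)
  {
      /* With 8-bit limbs the whole 2-by-1 input space is ~2^24 cases, so it
       * can be checked exhaustively against native division. */
      for (unsigned d = 1; d < 256; d++)
          for (unsigned u1 = 0; u1 < d; u1++)
              for (unsigned u0 = 0; u0 < 256; u0++) {
                  uint8_t r;
                  uint8_t q = divide_2by1_8(&r, (uint8_t)u1, (uint8_t)u0, (uint8_t)d);
                  unsigned u = (u1 << 8) | u0;
                  if (q != u / d || r != u % d) {
                      printf("mismatch: u=%u d=%u q=%u r=%u\n", u, d, q, r);
                      return 1;
                  }
              }
      puts("all cases ok");
      return 0;
  }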
(Replying to PARENT post)
These algorithms are probably easier to implement in assembly than in a higher-level language.
(Replying to PARENT post)
> The format above illustrates the style in which all of the algorithms throughout this book will be presented. Each algorithm we consider has been given an identifying letter (E in the preceding example), and the steps of the algorithm are identified by this letter followed by a number (E1, E2, E3). The chapters are divided into numbered sections; within a section the algorithms are designated by letter only, but when algorithms are referred to in other sections, the appropriate section number is attached. For example, we are now in Section 1.1; within this section Euclid’s algorithm is called Algorithm E, while in later sections it is referred to as Algorithm 1.1E.
In the case of this particular "Algorithm D", the letter "D" was probably chosen because it's an algorithm for "Division of nonnegative integers" (it follows algorithms "A", "S" and "M" for you-can-guess-what); indeed the same volume also has another "Algorithm D (Factoring with sieves)" (=Algorithm 4.5.4D), and an "Algorithm D (Division of polynomials over a field)" (=Algorithm 4.6.1D). I think that, since within a section it is referred to simply as "Algorithm D" several times, this convention is not very obvious unless one has looked at either the second page of the first volume or the appendix. Or I guess within a certain field a particular section of TAOCP is implicit, so it's clear from context.