Why numbering should start at zero (1982)

(cs.utexas.edu)

👤feynma🕑9y🔼161🗨️216

(Replying to PARENT post)

Dijkstra's argument is obsolete. Iterators have largely replaced the explicit specification of integer ranges in loop control.

Also, the recent rise in programming with multidimensional data invites the question of why matrix integer indices should include zero, which Dijkstra's essay doesn't address.

Likewise, the increasing use of real variables in software invites asking whether an elegant syntax for specifying integer ranges applies equally well when used for real ranges, since <= operators make less sense when constraining floats. Also not addressed.

👤randcraw🕑9y🔼0🗨️0

(Replying to PARENT post)

Dijkstra is of course brilliant, and I recognize that much of his tone is tongue in cheek and irreverent, but sometimes I find his writing so nasty, pompous and arrogant that it becomes irritating: after a series of opinions about what he personally finds "nice" or "natural", he claims that there is a "most sensible" way of doing things and that everyone else is being ridiculous. This is one of his tamer offenses, but still unpleasant.

I agree on the numbering but many people don't. He doesn't mention how nicely it works with the modulo operator, which is I think the most compelling reason. But having L[2] be the third index in an array is horribly counterintuitive and leads to countless mistakes. CS students when they begin do not like it, it is not natural. It is beaten into them by scary red error messages and instructors over months and years until they finally accept it.
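A minimal sketch of the modulo point, in Python. The function names are made up for illustration and do not come from the essay or the comment. It wraps a position that has run past the end of a ring of n slots:

```python
def wrap_zero_based(position: int, n: int) -> int:
    """Wrap a position onto slots numbered 0..n-1."""
    return position % n


def wrap_one_based(position: int, n: int) -> int:
    """Wrap a position onto slots numbered 1..n.

    The position is shifted to zero-based, wrapped, then shifted back.
    """
    return (position - 1) % n + 1
```

With zero-based slots the modulo result is already a valid index. With one-based slots the same operation needs a shift on either side.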

👤wfo🕑9y🔼0🗨️0

(Replying to PARENT post)

I always thought the best argument, one that every assembly programmer knows, is that starting at i=0 allows i to be used as an offset from the base address of an array. I bet that's why K&R chose this for C (unlike Wirth for Pascal, which allowed any index range for arrays).
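A rough sketch of the address arithmetic behind this argument, in Python. The function and its parameters are hypothetical, and real compilers are free to fold the subtraction away:

```python
def element_address(base: int, index: int, size: int, first_index: int = 0) -> int:
    """Address of element `index` in an array whose first element has index `first_index`.

    With first_index == 0 the index is itself the offset, measured in elements.
    """
    return base + (index - first_index) * size
```

For example, `element_address(0x1000, 3, 4)` is `0x100C`, and with `first_index=1` the element at index 1 sits at the base address itself.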

👤waynecochran🕑9y🔼0🗨️0

(Replying to PARENT post)

Dijkstra's handwriting is quite charming. If you've never seen it, it's worth clicking the “EWD831” link at the top right to see the scan of the handwritten original.

👤mayoff🕑9y🔼0🗨️0

(Replying to PARENT post)

I answered a StackOverflow question a while ago about why date ranges in Postgres are [x, y), or in other words include all t where x <= t < y:

http://stackoverflow.com/questions/37953786/why-does-postgre...

It is interesting to see the same question applied to natural numbers, and Dijkstra landing on the same [x, y) recommendation. I'd say for natural numbers the argument is even stronger, since there is a least possible value.

👤pjungwir🕑9y🔼0🗨️0

(Replying to PARENT post)

0-based arrays are IMHO simply a consequence of C's array access operator given transparent semantics in terms of pointer arithmetic.

Try some string manipulation using the awk language, which has 1-based strings and arrays, and you'll see that programs and string expressions are generally shorter and more idiomatic.

Later languages building on awk such as Java and JavaScript (awk is almost a subset of, and the inspiration for, early JavaScript) have cargo-culted 0 into the language as the base offset. Though making commonly used languages behave the same in this respect has probably prevented some errors.

👤tannhaeuser🕑9y🔼0🗨️0

(Replying to PARENT post)

If you represent base-b coefficients as such: a = a[0] b^0 + a[1] b^1 + a[2] b^2 + ... , then the exponent agrees with the index of the coefficient, so that's nice...
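As a small Python sketch of this (the function name and the least-significant-digit-first ordering are assumptions made for the example):

```python
from typing import Sequence


def from_digits(digits: Sequence[int], base: int) -> int:
    """Evaluate digits[0]*base**0 + digits[1]*base**1 + ...

    With zero-based indexing, each digit's index is also its exponent.
    """
    return sum(d * base**i for i, d in enumerate(digits))
```

Here `from_digits([3, 2, 1], 10)` gives 123. With one-based indices the exponent would have to be written as `i - 1`.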

👤divbit🕑9y🔼0🗨️0

(Replying to PARENT post)

This makes a lot of sense in the context of programming languages, where counting usually starts "where you are", thus 0 moves forward.

But it doesn't really apply in the natural world, where 1 is a much better starting number for probably all contexts where you have to number or reference things in an order.

Maybe this clarification is redundant, I don't know. But when I read the headline, I assumed it was talking about everywhere, which is why I found it interesting and even clicked it in the first place.

👤_vya7🕑9y🔼0🗨️0

(Replying to PARENT post)

He argues that 1 < i … is bad because it doesn't really work if your lower bound is 0 (because then you'd write -1 < …). Funny thing is, in most programming languages, there is a biggest natural number, too. For example, in Rust, you have a problem, if you want to enumerate all possible byte values:

    for i in 0u8..256 { }  // <-- doesn't compile, because 256 is not a valid value for a u8

👤grimoald🕑9y🔼0🗨️0

(Replying to PARENT post)

Or we could just accept and implement the "pernicious three dots".

When you see `-1` in code, it is often a failure of the expressiveness of 0-indexed lists. For example, the last item in an array:

    arr[arr.length - 1]

Or a found item by index number:

    if (arr.indexOf("thing") > -1) {

This is hardly an elegant convention.

👤protonfish🕑9y🔼0🗨️0

(Replying to PARENT post)

A little bit about 0-based indexing on the practical side from Guido van Rossum: http://python-history.blogspot.hu/2013/10/why-python-uses-0-...

👤Walkman🕑9y🔼0🗨️0

(Replying to PARENT post)

As much as I respect Dijkstra, I never accepted this as something that makes sense. It is an ugly remnant from older times, when calculations were precious and it was considered smart to use as the index the number that you would add to the base address to calculate an element's address.

This is not a commonly held belief. I am in the minority, but this is what makes sense, and I always believed that computers should serve us, not the other way around.

👤desireco42🕑9y🔼0🗨️0

(Replying to PARENT post)

Absolutely not, quoting from here: https://github.com/amark/theory#notes

1. Naturally, the first element in a list cardinally corresponds to 1. Contrarily, even official documentation of JavaScript has explicit disclaimers that the "first element of an array is actually at index 0" - this is easily forgotten, especially by novices, and can lead to errors.

2. Mathematically, a closed interval is properly represented in code as for(i = 1; i <= items.length; i++), because it includes its endpoints. Offset notation instead is technically a left-closed right-open interval set, represented in code as for(i = 0; i < items.length; i++). This matters because code deals with integer intervals, because all elements have a fixed size - you can not access a fractional part of an element. Integer intervals are closed intervals, thus conclusively proving this importance.

3. Mathematically, matrix notation also starts with 1.

4. The last element in a list cardinally corresponds to the length of the list, thus allowing easy access with items.length rather than having frustrating (items.length - 1) arithmetic everywhere in your code.

5. Negative indices are symmetric with positive indices. Such that -1 and 1 respectively refer to the last and first element, and in the case where there is only one item in the list, it matches the same element. This convenience allows for simple left and right access that offset notation does not provide.

6. Non existence of an element can be represented by 0, which would conveniently code elegantly as if( !items.indexOf('z') ) return;. Rather, one must decide upon whether if( items.indexOf('z') == -1 ) return; is philosophically more meaningful than if( items.indexOf('z') < 0 ) return; with offset notation despite ignoring the asymmetry of the equation.

👤marknadal🕑9y🔼0🗨️0

(Replying to PARENT post)

Offset (distance, 0-based) and Ordinal (position, 1-based) are two different things, and the confusion disappears when your program/language properly treats different types.

https://en.wikipedia.org/wiki/Zero-based_numbering

👤gohrt🕑9y🔼0🗨️0

(Replying to PARENT post)

The cost of index off by one bugs must be many billions. Probably more than null pointer bugs.

👤guelo🕑9y🔼0🗨️0

(Replying to PARENT post)

Dijkstra mentions Mesa offering all four options. Mesa's interval notation was

   [0..5]   0,1,2,3,4,5

   (0..5)   1,2,3,4

   [0..5)   0,1,2,3,4

   (0..5]   1,2,3,4,5

This reflects the concept of closed and open intervals in mathematics. It might have been useful when porting FORTRAN programs to Mesa. In FORTRAN, arrays start from 1. Pascal required a range in the declaration. When C started consistently from 0, that was considered radical.

For programming, as Dijkstra mentions, a consistent start from zero seemed to be helpful.
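The four conventions listed above can be sketched in Python as follows. This only illustrates the arithmetic and does not model Mesa's actual syntax or semantics; the function name and flags are invented for the example:

```python
from typing import List


def interval(lo: int, hi: int, closed_lo: bool, closed_hi: bool) -> List[int]:
    """Integers between lo and hi, including each endpoint only if its flag is set."""
    start = lo if closed_lo else lo + 1
    stop = hi + 1 if closed_hi else hi  # range() excludes its stop value
    return list(range(start, stop))
```

Only the `[0..5)` form, `interval(0, 5, True, False)`, maps onto `range(0, 5)` with no adjustment at either end.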

👤Animats🕑9y🔼0🗨️0

(Replying to PARENT post)

I assumed that indexing started at zero because of binary representations. For example if you're using two bits to represent four states, the first one will be 00 and the last one will be 11, which is three.

👤ruraljuror🕑9y🔼0🗨️0

(Replying to PARENT post)

Some people prefer 0, and some people prefer 1, so it should be a per-module setting that anyone can change.

Let's call it $[, so a statement like '$[ = 17' causes arrays to start indexing at 17.

http://search.cpan.org/dist/perl-5.17.1/ext/arybase/arybase....

👤MichaelBurge🕑9y🔼0🗨️0

(Replying to PARENT post)

I like Ada's arbitrary array indices. If you want to loop through an array, you loop from Array'First to Array'Last.

👤JohnStrange🕑9y🔼0🗨️0

(Replying to PARENT post)

I agree with Dijkstra on using one-side open, one-side closed boundaries to denote a range of integers.

(C.S. way) 0 <= a < N

(Math way) 0 < b <= N

As he says, both ways have the property that "upper bound - lower bound = Number of elements", both can represent an empty set "lower bound = upper bound", and both make partitioning easy "(0 <= a < M)U(M <= a < N)".
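A quick Python check of those three properties for both conventions (helper names are hypothetical):

```python
from typing import List


def lower_closed(lo: int, hi: int) -> List[int]:
    """Integers a with lo <= a < hi."""
    return list(range(lo, hi))


def upper_closed(lo: int, hi: int) -> List[int]:
    """Integers b with lo < b <= hi."""
    return list(range(lo + 1, hi + 1))


for members in (lower_closed, upper_closed):
    # Number of elements equals upper bound minus lower bound.
    assert len(members(0, 7)) == 7 - 0
    # Equal bounds denote the empty set.
    assert members(3, 3) == []
    # Adjacent ranges sharing a bound partition the whole range.
    assert members(0, 3) + members(3, 7) == members(0, 7)
```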

However I disagree that including the lower bound way is preferable to the starting at 1, math way. Math is the older science and children are still taught to count in this way.

As many of the benefits of one way or the other are mostly just network effects (i.e. in math subscripts typically start at 1 or in C, ints and structs are pointed to by their lowest char), we should have used the notation of the older science.

👤_5ysi🕑9y🔼0🗨️0

(Replying to PARENT post)

How about we split the difference and start at 0.5? That sounds like a fair compromise.

👤paulmd🕑9y🔼0🗨️0

(Replying to PARENT post)

Is this whole comment train 'bike shedding'? Like tabs/spaces, brace indent levels and all the others, these arguments and explanations about why one way is better is like watching people run around a tree.

👤kruhft🕑9y🔼0🗨️0

(Replying to PARENT post)

Only a few days ago I was thinking that numbering should start with zero; then things would be easier.

[1-based index] Year 2016 -> Century 21 -> Millennium 3

New year on 2017-01-01 01:01:01

[0-based index] Year 2015 -> Century 20 -> Millennium 2

New year on 2016-00-00 00:00:00 :-)
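The year arithmetic can be sketched in Python like this. The function names are invented, and the zero-based calendar is of course hypothetical:

```python
def period_one_based(year: int, length: int) -> int:
    """Century (length=100) or millennium (length=1000) when years and periods both start at 1."""
    return (year - 1) // length + 1


def period_zero_based(year: int, length: int) -> int:
    """The same, when years and periods both start at 0: a single integer division."""
    return year // length
```

With one-based numbering, year 2016 falls in century 21 and millennium 3. With zero-based numbering, year 2015 falls in century 20 and millennium 2, which can be read straight off the leading digits.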

👤gungsukma🕑9y🔼0🗨️0

(Replying to PARENT post)

Dijkstra was wrong about Fortran at that time. The standard describing arrays with arbitrary bounds is dated 1978.

Why do people make the fuss about indexing and not about column-major arrays?

👤gnufx🕑9y🔼0🗨️0

(Replying to PARENT post)

There is a certain elegance to starting at zero, most easily appreciated when writing assembler. And I agree with Dijkstra that "Adhering to convention a) yields, when starting with subscript 1, the subscript range 1 ≤ i < N+1; starting with 0, however, gives the nicer range 0 ≤ i < N."

Such minimalism is elegant and efficient, to be sure, and I'm a big believer that beauty often shows the path to truth. So from a computer science standpoint, I wholly agree.

From a software engineering standpoint it's really fucking stupid. People generally learn to count on their fingers and they need to grasp the concept of counting and numbers as an abstraction before they can get with the concept of a zero.

Counting on your fingers is one of those very basic things like learning the alphabet (or its equivalent in other languages) or learning to tie shoelaces. Nobody I have ever met defaults to counting from zero, not even people who claim to do so - easily tested by buying them a drink at some later time and tricking them into counting something.

If you're a software engineer your job is to build something that works and is maintainable. This is not the same thing as doing computer science or mathematical computing, even though it may all be math from the computer's point of view - just as architects are not mathematicians despite their reliance on geometry. Array bound errors are omnipresent in software projects because counting everything from zero is directly at odds with how we count things in the real world, and when we're in a hurry or under pressure we default to learned behaviors. Even people who have been bilingual for years will burst into their default language or switch between two default languages when they're excited. Building your code to go from 1 to n+1 may not be quite as beautiful, but it is a hell of a lot easier for other people to read and (in my experience) you make fewer mistakes if you switch away from starting at zero.

Yes, this requires developing some new habits that feel really awkward at first. Since I've only ever programmed for myself rather than as part of a team, I haven't had to deal with the inevitable resistance to this. But it is worth making the switch. Just as a lab and a construction site are very different environments [1], so are the practices of computer science and commercial software development. It would also make a huge difference in programming education, where many prospective students are alienated by being asked to do something highly counterintuitive very early on (typically while trying to grasp the concept of looping), which leads to errors that they are likely to keep making forever because nobody defaults to counting from zero in any other context.

If you're a computer scientist, carry on as you were. If you're an engineer, then build for the needs and instincts of your end users - some of whom will eventually be programmers - and not for the computer gods. They're not going to be around to help when your code breaks down.

I'm not very optimistic about this plea (especially not in the USA where y'all won't even adopt the damn metric system), but please at least give it a try. Technology should adapt to the needs of the people that use it, rather than the other way around. If you're working in a high level language, then use high level concepts.

1. Remember the story about the three little pigs who all built their houses out of subatomic particles formed into atoms formed into molecules and whose houses were topologically identical but had different coefficients of structural stability? Me neither.

👤anigbrowl🕑9y🔼0🗨️0

(Replying to PARENT post)

> Different tasks call for different conventions

https://xkcd.com/163/

👤paulddraper🕑9y🔼0🗨️0

(Replying to PARENT post)

Sometimes the only way for people to take your ideas seriously is to get them to want to take them apart. 500 pages of proof, but to silence this grating, always arrogant, ####### at the next conference, so worth it.

Surely not the noble kind of science, but if animosity leads to thoroughly read papers and found errors, maybe the dark side is just willing to work harder.

👤Pica_soO🕑9y🔼0🗨️0

(Replying to PARENT post)

Are PL geeks still arguing about this?

👤johan_larson🕑9y🔼0🗨️0

(Replying to PARENT post)

No. Just no. Mathematically, it just doesn't make sense in so many use cases. But, hey, this is CS, math is an afterthought, right?

👤polarvortex🕑9y🔼0🗨️0