Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

One should always use a single kind of half-open range, i.e. with the start closed and the ending open.

The whole point here is to use a single kind of range, without exceptions, in order to avoid the errors caused by using the wrong type of range for the context.

For backwards iteration the right range is [-1,-1-n), i.e. none of those listed by you.

Like for any such range, the number of accessed elements is the difference of the range limits, i.e. n, which is how you check that you have written the correct limits. When the end of the range is less than the start, that means that the index must be decremented. In some programming languages specifying a range selects automatically incrementing or decrementing based on the relationship between the limits. In less clever languages, like C/C++, you have to select yourself between incrementing and decrementing (i.e. between "i=0;i<n;i++" and "i=-1;i>-1-n;i--").

It is easy to remember the backwards range, as it is obtained by the conversion rules: 0 => -1 (i.e. first element => last element) and n => -n (i.e. forwards => backwards).

To a negative index, the length of the array must be added, unless you use a programming language where that is done implicitly.

In the C language, instead of adding the length of the array, one can use a negative index into an array together with a pointer pointing to one element past the array, e.g. obtained as the address of the element indexed by the length of the array. Such a pointer is valid in C, even if accessing memory directly through it would generate an out-of-range error, like also taking the address of any element having an index greater than the length of the array. The validity of such a pointer is specified in the standard exactly for allowing the access of an array backwards, using negative indices.



I'm following the convention that lists the smaller value to the left. I would write [-1,-1-n) as (-1-n, -1], which is a shifted version of (-1, n-1].

The supposed advantage of 0-based indexing with half-open ranges is that the programmer wouldn't have to add ±1 to the loop bounds as often as they would with 1-based indexing. But backwards iteration is an example where that's not the case. The open range calls for a bound of n-1 or -1-n, whereas with closed ranges it would be just n.


[-1,-1-n) and (-1-n, -1] are not the same thing, even if they contain the same elements.

They are the same thing as sets, when the order does not matter, but the ranges used in iterations are not sets, but sequences, where the order matters (unless the iteration is not a sequential iteration "for", but a "parallel for", where all the elements are processed in an arbitrary order and concurrently, in which case there exist neither forward iterations nor backward iterations).

Therefore (-1-n, -1] is actually the same as [0, n), where the array is accessed forwards, not backwards (the former range is used with a pointer to the first element past the end of the array, while the latter range is used with a pointer to the first element of the array).

The advantage of half-open ranges is not that you would always avoid adding ±1 but that you avoid the off-by-one programming errors that are very frequent when using closed ranges.

However, if that is what you wish, you could easily avoid any addition or subtraction of 1, by using pointers instead of indices. With half-open ranges, you use 2 pointers for accessing an array, a pointer to the first element and a pointer to the first element after the array. The C standard ensures that both these pointers are valid for accessing the array.

Then with these 2 pointers you can iterate either forwards

  for (p=p1; p!=p2;) *p++;
or backwards

  for (p=p2; p!=p1;) *--p;
There is no difference between the 2 directions and the overhead is minimum.


What would you do if your array is so large that it requires an unsigned int64 index?


The current AMD64 specification only uses 48-bits of pointer space, coming from 40-bits. So we still have 16 bits remaining. I'm sure we can use 1 for a sign.


In C/C++ there are no true unsigned integers (true unsigned integers do overflow, generating errors in such cases or they generate carry/borrow on certain operations).

The so-called unsigned integers of C/C++ are in fact modular integers, i.e. where the arithmetic operations wrap around and where you can interpret any 64-bit number greater than 2^63-1 as you please, as either a positive integer or as a negative integer. For instance you can interpret 2^63 as either -2^63 or as +2^63.

So using 64-bit "unsigned" integers for indices does not create any problems if you are careful how you interpret them.

However, as another poster has already said, in all popular ISAs the addressable space is actually smaller than 2^64 and in x86-64 the addresses are interpreted as signed integers, not as unsigned integers, so your problem can never appear.

Some operating systems use this interpretation of the addresses as signed integers in order to reserve the negative addresses for themselves and use only positive addresses for the non-privileged programs.

The reason why the addressable space is smaller than afforded by 64 bits is that covering the complete space with page translation tables would require too many table levels. x86-64 has increased the number of page table levels to 4, in order to enable a 48-bit address space, while some recent Intel/AMD CPUs have increased the number of page table levels to 5, in order to increase the virtual address size to 57 bits.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: