VFP implementations of transcendental functions and decimal conversion
Jeffrey Lee (213) 6048 posts |
The rounding mode

Transcendental functions: the result must be correctly rounded, according to whatever mode the user specified. So as long as you can guarantee that, you can use whatever rounding mode you want internally.

Decimal conversion: I think the spec might say the conversions only need to be correct for the default rounding mode? You’d have to check the spec.

Invalid Operation, Division by Zero, Overflow, Underflow & Inexact exception bits

Depending on whether the goal is to avoid them or to generate them :-) (For exceptions that need to be generated, assume we’ll have a function which you can call in order to trigger them.) |
Steve Drain (222) 1620 posts |
I have been doing a lot of reading, and I certainly have a better handle on this floating point business than I did a week ago. ;-)

One thing I have had to do is set aside the scientist/mathematician’s view of real numbers, and of what we mean by accuracy and precision, and replace it with a computer’s view, in which there are only machine numbers, or more simply floats. We may want to use floats to represent reals, but the two cannot be treated the same or there will be trouble. Here are some observations that might be relevant.
I have read, or at least looked over, the papers on Dragon4 (Steele and White), Grisu3 (Loitsch) and also ‘dtoa’ (Gay). I have also read about the history of the algorithms and quite a bit of commentary. It is quite revealing how badly the conversions have been implemented by languages and compilers in the past, including C and printf. I get the impression that in the modern era this is generally done well, but it is certainly not perfect. How does our Shared C Library fare?

The challenge of implementing one of these is way beyond me, but a bit of lateral thinking offers a solution, at least pro tem. The FPE software contains conversion code – it is used to convert binary floats to packed decimal floats and vice versa. All RISC OS machines are provided with the FPE, so let us use it, even with VFP. Looking at the FPE source I cannot find a reference to the origin of the algorithm, but I would guess that it is ‘dtoa’ rather than the others. ‘dtoa’ itself does exact conversions, even if it is long-winded, and the FPE source has a sentence saying that conversions to and from double precision and packed decimal will be exact. This is not proof, but far better than nothing. I have already experimented with using this.
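As an aside, the “how does our Shared C Library fare?” question can be probed empirically with a fuzzing harness. A minimal sketch in Python (which exercises Python’s own converter rather than the SCL; the function names are mine) might look like this:

```python
import math
import random
import struct

def random_double() -> float:
    """Draw a finite double by reinterpreting random 64-bit patterns."""
    while True:
        bits = random.getrandbits(64)
        x = struct.unpack("<d", struct.pack("<Q", bits))[0]
        if math.isfinite(x):
            return x

def round_trips(x: float) -> bool:
    """Binary -> decimal (17 significant digits) -> binary must be exact."""
    return float("%.17g" % x) == x

# Exercise the converter on a spread of randomly generated doubles.
failures = sum(not round_trips(random_double()) for _ in range(10_000))
print(failures)  # 0 if the conversion routines are correctly rounded
```

The same loop pointed at a C program calling the SCL’s sprintf/strtod would give evidence (though not proof) either way.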
The IEEE spec is about ‘round trip’ conversions – binary > decimal > binary (BDB) and decimal > binary > decimal (DBD). It requires that these are exact, that is, that they always return the same machine number. Achieving this is not trivial, as the paragraphs above indicate. I came across a mathematical proof that concluded this about double precision floats: 17 significant decimal digits are always enough for an exact BDB round trip, while 15 decimal digits always survive a DBD round trip.
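Those thresholds are easy to demonstrate in Python, whose float is an IEEE double and whose conversions are correctly rounded (the helper below is illustrative, not from the thread):

```python
def survives_round_trip(x: float, digits: int) -> bool:
    """Does a decimal string with `digits` significant digits recover x exactly?"""
    return float(f"{x:.{digits - 1}e}") == x

x = 0.1 + 0.2                      # 0.30000000000000004
print(survives_round_trip(x, 17))  # True  - 17 digits always round-trip a double
print(survives_round_trip(x, 16))  # False - 16 digits can land on a neighbour (0.3)
```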
Here lies the origin of “15-17 digits”, I think.

On the question of VFPv4 and IEEE 754-2008, I see that it would have to provide the conversion of decimal strings itself to comply. Have I got that right? It would also have to provide a ‘remainder’ operation, which would be handy. ;-) |
Rick Murray (539) 13872 posts |
Yes, remember a jazz CD is not a jazz concert but a convenient adequate representation… |
Jeffrey Lee (213) 6048 posts |
Internally it uses the FPA packed decimal conversion instructions, both for float-to-decimal and decimal-to-float. So presumably it’s good enough, although as you note we don’t have any definitive proof.
It’s not that hard to implement a fully accurate reference algorithm – I implemented a routine in BASIC which would convert BASIC 5-byte floats to decimal form. See the “AtPercent” test in the BASIC sources: https://www.riscosopen.org/viewer/view/castle/RiscOS/Sources/Programmer/BASIC/Tests/Math/. It basically treats the fraction as an integer, converts it to a string, then repeatedly multiplies/divides by two in order to apply the binary exponent (performing the arithmetic on the decimal string). In BASIC this approach only really works for single-precision floats, since the larger exponent range of double precision could result in a number with more digits than a BASIC string can store. Writing an efficient algorithm is of course a fair bit harder.

To fix the inaccuracies I spotted in 5-byte BASIC (where it was doing a lot of base 10 operations on the base 2 float) I essentially adopted the same approach as my reference implementation, except using 16-digit BCD instead of a string. It’s not fully accurate, but it’s a lot better than the original was. https://www.riscosopen.org/viewer/view/castle/RiscOS/Sources/Programmer/BASIC/s/fp2#rev1.7
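The “treat the significand as an integer, then apply the binary exponent in decimal” idea can be sketched in Python, using a big integer in place of a digit string or BCD. Halving k times in decimal is the same as multiplying by 5**k and moving the decimal point k places left, which the sketch below (function name mine) exploits:

```python
def exact_decimal(x: float) -> str:
    """Exact decimal expansion of a finite float, in the spirit of the
    reference algorithm above: take the significand as an integer and
    apply the binary exponent using decimal arithmetic."""
    if x == 0.0:
        return "0"
    sign = "-" if x < 0 else ""
    num, den = abs(x).as_integer_ratio()  # exact: abs(x) = num / den, den = 2**k
    k = den.bit_length() - 1
    # Dividing by 2**k in decimal == multiplying by 5**k, then shifting
    # the decimal point k places to the left.
    digits = str(num * 5**k)
    if k == 0:
        return sign + digits
    digits = digits.rjust(k + 1, "0")
    return sign + digits[:-k] + "." + digits[-k:]

print(exact_decimal(0.1))  # 0.1000000000000000055511151231257827021181583404541015625
```

Python’s big integers sidestep the string-length problem mentioned above; in BASIC or BCD the same scaling has to be done digit by digit.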
The spec doesn’t dictate which operations must be supported by hardware – it’s down to the implementor to decide how much/little they want to support. So in ARM’s case, they’ve only dealt with the basic arithmetic operations in hardware. Everything else must be implemented in software (usually by the runtime library of whatever your programming language is) |
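As an example of one such software-supplied operation, the IEEE 754 remainder mentioned above rounds the quotient to nearest (ties to even), which differs from the truncating fmod that C-style libraries provide; Python exposes both:

```python
import math

# IEEE 754 remainder: quotient 5/3 rounds to 2 (nearest), so 5 - 2*3 = -1
print(math.remainder(5.0, 3.0))  # -1.0
# C-style fmod: quotient truncates to 1, so 5 - 1*3 = 2
print(math.fmod(5.0, 3.0))       # 2.0
```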
Steve Drain (222) 1620 posts |
Not so much lateral thinking then – or maybe ‘great minds think alike’. ;-) Does ObjAsm use the same?
But you are Jeffrey and I am Steve. |
Jeffrey Lee (213) 6048 posts |
I’d assume it uses its own algorithm, for a number of reasons:
|