VFP implementations of transcendental functions and decimal conversion

56 posts, 9 voices

Pages: 1 2 3

Jan 15, 2018 3:25pm Jeffrey Lee (213) 6046 posts	Converting BASIC64 to VFP Adding VFP support to Norcroft Making FPEmulator use VFP The above three topics all have one common theme: the current code is reliant on FPA/FPEmulator for non-trivial things like the transcendental functions and conversion of floats to/from decimal format. So it would be very helpful if we had “VFP” implementations of these operations. Requirements: “VFP” implementations of the required operations. Note that they don’t necessarily have to use VFP instructions for all the arithmetic, they just have to be VFP-compatible. E.g. pure ARM implementations would be fine. NEON would be fine as well, as long as there’s a fallback to ARM/VFP if NEON isn’t available. Source can be C or assembler, but is expected to have tight constraints placed on it (e.g. no reliance on the C standard library, no memory allocation) so that as few restrictions as possible are placed on how/where the resulting routines can be used. The lowest-common-denominator VFP system that must be supported is VFPv2-D16. If your routines are to be written in assembler, aim for them to be APCS compatible, so that they can be used directly by C code. Minimum required transcendental functions would be the set required by BASICVFP and the set required by the C standard library. Routines for conversion to/from decimal format(s) are also required. Doing this correctly and efficiently is difficult, so it’s probably best to take known-good implementations from elsewhere. Routines need to support double & single precision operation. A test suite/framework is also necessary, in order to prove the accuracy and IEEE compliance of the routines. E.g. compare the output against that of a known-good arbitrary-precision arithmetic library. Comparing against FPA alone isn’t acceptable since I don’t think we have any test suites which show how accurate FPA is. Source code licensing must be suitable for including the code in ROMs If these requirements can be met, we can add the routines to VFPSupport so that arbitrary software can make use of them. Chances are we could offer both a SWI interface and a direct function call interface. Then we can remove the last bits of FPA code from BASICVFP, and it’ll be one less task to worry about for updating Norcroft to use VFP.

Jan 15, 2018 5:30pm Steve Drain (222) 1620 posts	@Jeffrey I think you might remember my Float/SmartFP ¹ module, which has code to satisfy many of these things. I have not looked at it for quite a while now, and I know there are improvements to be made, but if it is suitable I will return to it. Implementation of trancendental functions for double-precision: I have done the maths and sorted out the algorithms – no trivial task. The precision is better than 17 decimal places for nearly all, but they cannot be called IEEE compliant. There is no single-precision or use of NEON. There are a couple of routines for conversion, like STR$ and VAL, but they could be more efficient, and you once pointed out where that might be done. They are based loosely on the routines in the BASIC module. There are also routines to convert 5-byte floats to and from 8-byte. Source is in plain BASIC assembler, but ought to be easily translated for ObjAsm. There are no lbraries or memory usage. I an happy for anyone to adapt it. It assumes VFPv2-D16 as far as I understand these things. ;-) Clearly the code is not APCS compatible as it stands. The trancendental functions replicate those of the FPE, so they should satisfy your minimum. I have simple test routines in BASIC and I have a Toolbox app to demonstrate, but it needs work. Comparison is made against the FPE, which was certified by Acorn to be IEEE compliant, and that should be adequate, I think. Although there is a SWI interface this imposes a significant overhead. Each routine can also be called as an index into a table. Please have another look at Float and let me know if you would like a later version under the name SmartFP. ¹ Neither name is registered, but Float is already allocated, I think. Edit: Providing single-precision should not be a problem, and the default could just be to use the double-precision with conversion at each end.

Jan 15, 2018 7:23pm Steve Fryatt (216) 2045 posts	Neither name is registered, but Float is already allocated, I think. Float is indeed already allocated… :-) …although I’m not sure it was when you initially announced the module in question!

Jan 15, 2018 8:23pm Jeffrey Lee (213) 6046 posts	I think you might remember my Float/SmartFP module Indeed. they cannot be called IEEE compliant In which case they’re not very useful :-( (IEEE compliance now added to OP!) There are a couple of routines for conversion, like STR$ and VAL … They are based loosely on the routines in the BASIC module But are they accurate? BASIC’s 5-byte float routines weren’t. For accurate implementations, you need to look at things like Dragon4, Errol, Grisu3, etc. http://www.ryanjuckett.com/programming/printing-floating-point-numbers/ https://github.com/marcandrysco/Errol I have simple test routines in BASIC Unless I’m looking in the wrong place (the ‘test’ folder), the only tests I see are performance tests. Nothing seems to be checking that results are correct (and each operation is only tested with one set of input values?). Comparison is made against the FPE, which was certified by Acorn to be IEEE compliant, and that should be adequate, I think. Let’s say the IEEE spec says that results need to be accurate to at least 15 digits. If the FPE is on the lower end of the spectrum (e.g. only designed to be accurate to 15 digits), but the new algorithm could easily be tuned to be accurate to 16 or 17 digits, then if you were to only perform your comparison against the FPE then you wouldn’t have any way of validating that extra accuracy. So that’s one reason why checking against an arbitrary-precision reference implementation is a good idea. Other reasons: We don’t know when the FPE was last checked, and neither do we have a test suite for it. So we don’t have a way of verifying that it still conforms, or that any future changes (like making it use VFP) have adversely affected things. The current version of the spec is IEEE 754-2008. VFPv4 aims to be compliant (that’s why the fused-multiply-and-accumulate instructions were introduced). To allow for a fully ’2008 compliant system to be produced, it would make sense for these support functions to be ’2008 compliant as well. The previous version of the spec was the 1985 version, which is the one Acorn would have used. (And the presence of VFMA in VFPv4 means you could even have two implementations of each function, one which uses VFMA for extra accuracy, and one which doesn’t) Please have another look at Float and let me know if you would like a later version under the name SmartFP. Float ticks many of the required boxes, but not the two biggest ones, which are accuracy/IEEE compliance, and a test suite to prove it. So if the SmartFP version includes both of those things then yes, I’d be interested in seeing it. Edit: Providing single-precision should not be a problem, and the default could just be to use the double-precision with conversion at each end. Yes, that would be an acceptable way of doing things.

Jan 15, 2018 11:00pm Steve Drain (222) 1620 posts	OK. I think a more professional approach than I can offer is needed. ;-) If my algorithms are useful please feel free to use them. IEEE compliance now added to OP! That is a steep condition. The polynomial methods I have used can probably not be made more precise with the resources I have available. It might need a another approach. I think that might be Cordic and I have a suspicion that may be used in the FPE. But are [conversion routines] accurate? Probably, but not provable by me. you need to look at things like Dragon4, Errol, Grisu3, etc. I will. Let’s say the IEEE spec says that results need to be accurate to at least 15 digits … I only have this to offer here. We are only looking for double-precision (64 bits). The FPE is designed to carry out all internal calculations to extended-precision (80 bits). Only the return value is as double. If the FPE is still doing its job I think that is adequate, except: We don’t know when the FPE was last checked, and neither do we have a test suite for it. That is a real problem of proof. The current version of the spec is IEEE 754-2008. VFPv4 aims to be compliant Does VFPv2 aim to be compliant? I vaguely recall that it is not, even if it has the required precision. that’s why the fused-multiply-and-accumulate instructions were introduced That is the problem with a polynomial method starting with double-precision values – the precision is quite soon reduced. As I said above, attaining the necessary final precision this way is probably not achievable. And the presence of VFMA in VFPv4 means you could even have two implementations of each function, one which uses VFMA for extra accuracy, and one which doesn’t But we cannot assume VFPv4, can we? You said VFPv2 in the OP. More thought needed. ;-)

Jan 15, 2018 11:32pm Jeffrey Lee (213) 6046 posts	The current version of the spec is IEEE 754-2008. VFPv4 aims to be compliant Does VFPv2 aim to be compliant? I vaguely recall that it is not, even if it has the required precision. It can’t be officially compliant, since one of the requirements for 2008 is the fused MAC. VFPv3 and earlier are only advertised as being compliant with the ‘85 version of the spec. (Maybe there are some bits where it aims above ’85 in order to reach ’08 levels of compliance, but without going back through the specs myself I’m not sure if there are any other situations where the features covered by the VFP hardware have changed) As I said above, attaining the necessary final precision this way is probably not achievable. Which is why ARM & NEON implementations are acceptable :-) (potentially NEON could be useful if you have lots of 64bit integer arithmetic to perform – not sure how practical it would be in practice though) But we cannot assume VFPv4, can we? You said VFPv2 in the OP. You can’t assume, but there’s no reason why you can’t test for features (or provide multiple implementations, and let someone else worry about the feature testing). (which reminds me, that I still need to extend VFPSupport to expose the extra VFPv4/NEONv2 feature registers via SWIs!)

Jan 16, 2018 12:54am Jeffrey Lee (213) 6046 posts	The polynomial methods I have used can probably not be made more precise with the resources I have available. It might need a another approach. I think that might be Cordic and I have a suspicion that may be used in the FPE. The FPE uses a polynomial approach. Exactly how accurate it is, I don’t know (doing everything at extended precision should help the accuracy of the single/double results, at least). https://www.riscosopen.org/forum/forums/2/topics/3457?page=8#posts-64437 (actual code is in this file)

Jan 16, 2018 9:09am Sprow (202) 1113 posts	Requirements An extra angle I’ve not seen mentioned is modifying the PCS used to require 8 byte stack alignment (or not). Like we have APCS-R and APCS-32 would that justify an APCS-VFP? Certainly inspecting the output of Keil MDK when working on Cortex-M family I’ve noticed they do, even if that means pushing a (integer) register you didn’t really need to just to keep things aligned. This allows double precision FP arguments to be pushed too which in turn allows them to be accessed in one gulp with VLDR/VSTR on 8 byte boundaries. So you might want to get in early and start aligning your function entries/exits ahead of that, so that if an exception occurs the VFP state can be pushed without having to worry about it being only 4 byte aligned. Thinking about compilers, there’s a bounty open right now which is only £2200 off target. The list price for a 1 year renewal for Keil MDK is £680 (versus £25), so think of it as only 4 renewals short! I think there’s a reasonable case to follow up that ARMv6/ARMv7 work with VFP support, else ROOL are snoozing on the job.

Jan 16, 2018 9:47am Jeffrey Lee (213) 6046 posts	An extra angle I’ve not seen mentioned is modifying the PCS used to require 8 byte stack alignment (or not). Like we have APCS-R and APCS-32 would that justify an APCS-VFP? True, modern APCS versions use 8 byte stack alignment. But I think that’s more for LDR[EX]D/STR[EX]D than VLDR/VSTR. LDREXD/STREXD require doubleword alignment, and LDRD/STRD will only be atomic if they’re doubleword aligned (and you have LPAE). I think VLDR/VSTR will see a performance boost with 8-byte alignment, but should still work fine with 4-byte (at least in AArch32). Likewise with NEON (it’ll work fine with 4-byte alignment, but if the compiler expects everything to be 8-byte aligned it can include alignment specifiers on the instructions to boost the speed).

Jan 16, 2018 11:20am Jeffrey Lee (213) 6046 posts	I was curious what newlib does for the transcendental functions (since newlib’s licensing is possibly permissive enough for inclusion in ROM), and it looks like they’ve had a couple of different attempts at it over the years. The first attempt was to use single/double precision Taylor series’s from Cody & Waite, essentially the same as the FPE. But that was deemed to be too inaccurate. The new version seems to be a mixed bag. Some look like they’re still based around the Taylor series (e.g. sin), others are different (e.g. log)

Jan 16, 2018 11:33am Steve Drain (222) 1620 posts	The FPE uses a polynomial approach. Yes, of course. I was remembering when I first glanced at the code, but forgot your later pointer. I only looked more carefully afterwards, and some time after I had worked things out using the other resources. About a year ago I acquired a copy of Cody and Waite ¹ but I have not studied it in conjunction with the FPA source yet. I am intrigued enough now to do that. ;-) It will be interesting to see how it compares. I have some grounding in Maths, but this has been a fascinating journey so far. ¹ Interestingly, my copy has the this stamp inside: “CRAY RESEARCH INC. NEW PRODUCT DEVELOPMENT LIBRARY 1-7-86”.

Jan 16, 2018 1:26pm Rick Murray (539) 13392 posts	but this has been a fascinating journey so far. :-) I read through the source linked above. Understood maybe every third word… Just, you know, idiot question here. Why do we need accuracy at, like, seventeen decimal places? If that, why not eighteen? Nineteen? When is it “enough”, as we’re only really making a digital approximation of a number not easily stored.

Jan 16, 2018 1:49pm Steve Drain (222) 1620 posts	Why do we need accuracy at, like, seventeen decimal places? I expect Jeffrey will have a more cogent answer, but it is really about standards. If computers and their languages are to share data then a standard is necessary. The standard that has been agreed is IEEE 754. I have found that the Wikipedia article on double-precision quite informative. For any particular application a lower precision may be acceptable, but having lots of different standards, as was once the case, would be confusing. Note that there is a distinction to be made between accuracy and precision. ;-)

Jan 16, 2018 2:23pm GavinWraith (26) 1531 posts	Why do we need accuracy at, like, seventeen decimal places? This is a good question. Computers can only handle a small finite number of numbers. Humans, on the other hand, have little difficulty cooking up all kinds of infinitary systems. Mapping these onto finitary systems cannot avoid being a Procrustean outrage. The IEEE formats, as Steve says, are about uniform standards. None of them support in full the rules of arithmetic, such as associativity of addition. There are many other possible formats, some of which do support associativity of addition (and multiplication), but they require unbounded memory allocation; i.e. the amount of memory required to store a number depends on the number. One huge virtue of the IEEE formats is that the amount of memory needed to store a number is independent of the number. If you really want the rules of arithmetic to hold on the nose then you need something like lazy streams of rational numbers. The only number systems that do work exactly on computers are modular arithmetics with small moduli, or algebras over small Galois fields. But these have few physical applications except cryptography.

Jan 16, 2018 3:31pm Jeffrey Lee (213) 6046 posts	Just, you know, idiot question here. Why do we need accuracy at, like, seventeen decimal places? If that, why not eighteen? Nineteen? When is it “enough”, as we’re only really making a digital approximation of a number not easily stored. You’ve just answered your own question – we’re making a digital approximation of a number not easily stored. The more accurate that approximation can be (without requiring extra storage space), the better it is for everyone concerned. decimal places For the most part, decimal places are the wrong thing to be looking at, since modern computers perform all their FP in binary. It’s only us humans who insist on using our outdated decimal counting system that care about decimal places ;-) Measuring/specifying the accuracy/error of floating point calculations is probably best done by calculating how many ULPs there are between the two values. When is it “enough” The basic rule is that calculations should be performed as follows: The input is zero-expanded out to infinite precision. E.g. a 4-bit binary floating point number 1.101, if expanded out to infinite precision, would be 1.10100000… The calculation is performed at infinite precision, producing an infinite-precision (and therefore precise) result The result is rounded to the destination width For simple arithmetic operations (addition/subtraction/multiplication/division/etc.) this is easy to do since there are well-understood rules for how many digits a result might contain (e.g. multiplying two 32 bit numbers won’t require more than 64 result bits). So the spec dictates that the results of these operations must be exact for any combination of inputs. For the transcendental functions the spec is a bit more relaxed, under the recognition that it could be quite hard to produce efficient hardware/software implementations which perform the calculations at “infinite” internal precision before rounding to the result to the destination width. Binary to decimal conversion faces a slightly different problem, because you often don’t specify ahead of time how many digits you want the output to contain. E.g. 2^64 can be precisely represented in a single-precision float. If you were to convert it to decimal, you could get any number of different results: 18446744073709551616 (full result, integer) 1.8446744073709551616E19 (full, scientific notation) 18446744100000000000 (minimal result, integer) 1.84467441E19 (minimal result, scientific notation) 184467441E11 (minimal result, trailing zeros trimmed) For a single-precision float, only the first eight or nine decimal digits are significant. So all of those numbers are correct, and if you were to type those numbers back into your program it should result in the same floating point value of 2^64. But to a human they look very different, potentially misrepresenting how accurate the computer really is. If the floating point number is exact, the first two ways of presenting the result would be the most sensible If the number is inexact, the last two would be the most sensible The third isn’t very useful under any situation, since it suggests that the trailing zeros are accurate (in fact it’s probably wrong to show the third at all, since it’s showing zeros past the point where rounding has been performed) Unfortunately floating point numbers don’t contain an “inexact” bit, so a standard number formatting routine can’t intelligently select between “full” and “minimal” representations, so I think most will err on the side of caution and produce a minimal result. “But how do I know how many digits my minimal result should contain?” I hear you ask – luckily that’s easy to calculate, and the ’08 version of the spec contains the formula.

Jan 16, 2018 6:21pm Rick Murray (539) 13392 posts	Thank you for your interesting replies, especially Jeffrey’s detail, but kudos also to the Procrustean¹ outrage. :-) I think it worth considering my weather forecaster ( http://heyrick.ddns.net/ ) where the recent history reads as follows: 00 18/01/16 19:00 F 1007.8 >C 08.1 66 002.07 000.00 N 6.1199999992 01 18/01/16 18:30 F 1008.0 >C 08.1 68 003.10 000.00 NW 5.4000000004 02 18/01/16 18:00 F 1008.3 >C 08.4 70 003.10 000.00 W 8.2800000012 03 18/01/16 17:30 F 1008.1 >C 08.9 68 003.10 000.00 NW 5.0399999991 04 18/01/16 17:00 F 1008.7 >C 10.1 63 003.10 000.00 NE 3.96 05 18/01/16 16:30 F 1007.9 >C 10.4 61 003.10 000.00 WNW 5.0399999991 06 18/01/16 16:00 F 1007.7 >C 10.5 63 003.10 000.00 WNW 4.3200000003 I ought to mess with @% to clip the format to two decimal places as those results are daft. I haven’t because I keep thinking one day I’ll rewrite it in C²… …point is, though, the computer (in this case, standard BASIC) clearly has difficulty representing numbers that seem quite simple to us. 4.32? 5.03 (or should that have been 5.04?)? Nope! Thanks again. ¹ Amazingly, I knew who Procrustes was. The ancient Greeks had a rather…pragmatic…manner of solving problems. :-) ² C won’t make the issue go away, but the printf format specifier %.02f can hide it.

Jan 16, 2018 7:03pm Rick Murray (539) 13392 posts	If anybody has the time/inclination: A floating point value is stored as a sign bit, followed by a number of exponent bits, and finally a larger number of fraction bits. A single precision number is described in the FPA datasheet (page 8 real, 12 in file) as: 31 30 . . . . 23 22 . . . . . . . . 0 Sign Exponent (msb) Fraction (lsb) If the number was: 0 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 How does one convert this to a readable value? Conversely, how does -1.234 get converted to the above format? Is the fraction “1234” and the exponent “3” (meaning push the dot 3 places to the left)? Is that how it works? It must be harder, as the first value is an exponent of 85 and a fraction of 2796202, so… um…?

Jan 16, 2018 8:06pm Jeffrey Lee (213) 6046 posts	If you start from the equation `number = mantissa * 10^exponent`, and ignore some special cases (like zero): The exponent is in base 2, and it’s biased. So rather than being `10^N`, for single-precision floats it’s `2^(N-127)` If your float format only had 4 fraction bits, ‘abcd’, then the corresponding mantissa would be ‘1.abcd’. I.e. the fraction bits really are the first 23 fractional bits of a number of the form ‘1.N’ The sign is 0 for positive, 1 for negative So the resulting equation is essentially `number = (-1)^sign * (1 + fraction/(2^23)) * 2^(exponent-127)` Wikipedia has a much wordier version. https://en.wikipedia.org/wiki/Single-precision_floating-point_format

Jan 16, 2018 10:54pm GavinWraith (26) 1531 posts	The dominance of the floating point format, even down to the CPU level,is a testament to the importance of real numbers in physics, and to the relative unimportance of integers and algebra. Maybe the rise of cryptography will lead to standard implementations of other number formats? Occasionally I had to teach engineers, among whom I discovered a curious misapprehension about numbers: they confused numbers with notations for numbers – thing with name. They were not happy with pi, for example, because they thought a number was a (necessarily truncated) decimal expression. So symbolic expressions, or the idea that a number could have a multiplicity of descriptions, were seen as a cheat. When I pointed out that 5 and 2+3 both described the same number they did not agree; 5 was a number, but 2 + 3 was an expression. It is computer people, not mathematicians, who make a distinction between literals and non-literals. Of course ‘pi’ consists of two letters, but its decimal description requires an infinity of digits. Usually the decimal description of a number is not very interesting. Indeed, single numbers are rarely interesting, compared to the expressions from which they are evaluated. A colleague of mine, Yoshindo Suzuki, once offered a crate of beer to the first student who could give a complete and accurate specification of two to the power of the square root of two. He was not asking for any digits of its decimal expansion – he wanted a definition of what it was from first principles (i.e in terms of positive whole numbers). He never had to pay up. He chose that number because `2^sqrt(2)` is a simple expression, but to define it precisely you need to know what real numbers are; something that most students think they know but usually do not. Did you ever come across a short story about a young man returning home, in the valleys of South Wales, from college, being asked by his Mam what he has to study? She gets very agitated when he describes the decimal point. But who owns it? , she asks. What right have they got to shift it right and left like that?

Jan 17, 2018 12:22am Rick Murray (539) 13392 posts	5 was a number, but 2 + 3 was an expression I will leave you (should have gone to bed an hour ago!) with something to think about… I agree. Not because I’m a programmer, but because it took a lot of thinking to understand that two plus three is five. Two is not five, three is not five, but the two of them are. They cease becoming a two and a three and become something completely unrelated, a five. Or to put this into context, the equation is b and c together become e. Or an apple and a pear becomes a banana. Once you’ve got your head around that, you’ll start to understand something that eluded far too many teachers ;-)

Jan 17, 2018 7:47am John Williams (567) 768 posts	Or an apple and a pear becomes a banana. Now I was following you up to there! Now had the result been a fruit pie or cocktail …

Jan 17, 2018 10:46am Steve Drain (222) 1620 posts	But are they accurate? [STR$ and VAL equivalents] For accurate implementations, you need to look at things like Dragon4, Errol, Grisu3, etc. I now appreciate the problems with the simplistic way I implemented these. They are useful, but not sufficient for the purposes you have outlined. Back to the drawing board. ;-)

Jan 17, 2018 11:15am Steve Drain (222) 1620 posts	I was curious what newlib does for the transcendental functions […] The first attempt was to use single/double precision Taylor series’s from Cody & Waite I feel confident in challenging the use of “Taylor series” here. They are polynomials, but use coefficients calculated for a particular approximation that reduce the number of terms. Taylor series themselves can rarely be used for many reasons. The new version seems to be a mixed bag. Some look like they’re still based around the Taylor series (e.g. sin), others are different (e.g. log) sin is a straightforward polynomial, log is a ratio of polynomials. I have had a closer look at the FPE source and ‘Cody and Waite’ and I recognise a few things. ;-) C&W is a ‘cookbook’ and it refers back to ‘Hart et al’, which is from where I drew my own algorithms. What the FPE source and C&W do, of course, is the professional bit of coding that is outside my experience. One problem that I noted above, and feel might be insurmountable, is that the FPE uses extended precision throughout in the algorithms, whether using the FPA or its internal code for the arithmetic functions. Using VFP we can only use double precision and I think there must be an inevitable loss of precision. I may need some tutoring about how VFPv4 can overcome this. ;-) I did get a hint that replicating extended precision using NEON integer operations might be the way. Edit: replaced ‘constants’ with ‘coefficients’

Jan 17, 2018 11:47am GavinWraith (26) 1531 posts	When I had a BBC B computer I bought an excellent book with a commented disassembly of its ROM. Its routines for transcendental functions used continued fraction approximations; in effect approximations by rational functions. These are much more efficient than Taylor series, which give approximations by polynomial functions. You use lookup tables for the coefficients. I always reckoned that they should have made it possible for the user to supply her own lookup tables if she should happen to need a function not provided in the ROM. The code is basically `for n = 1,N do x= 1/(a[n] + x*b[n]) end` .

Jan 17, 2018 12:35pm Steve Drain (222) 1620 posts	continued fraction approximations To expand slightly, and touch the edge of my understanding: sin is a straightforward polynomial That is a Chebyshev polynomial, derived from but not the same as a Taylor series. log is a ratio of polynomials That is a compact form of a [truncated] continued fraction. Either method might be used for any of the transcendental functions, depending on the range and precision required. They are each evaluated using a table of coefficients, which can be ‘plugged into’ the appropriate routine. The polynomials are efficiently calculated using only multiplication and addition using ’Horner’s method’. That is why the ‘fused multiply and accumulate’ (VFMA) operation in VFPv4 is very useful. Edit: add the word [truncated]