VFP Emulator - Would It Work?
Graeme (8815) 103 posts |
I’ve been having a go at making a VFP Emulator. So far, load and store are working, both single and multiple. There are working VMRS and VMSR that incorrectly allow full access but do work. Addition and subtraction are working to perfection when compared with real VFP, including flags.

The risks of creating this could be catastrophic and I wonder if there are any solutions or ideas on how it could work. Take the following code: …

Under VFP, the first instruction will work. The second instruction (VEOR) is a NEON instruction and will either reset D0 to zero if this processor supports NEON or will cause an undefined instruction error.

Under emulation, we have a problem. The first instruction will work. The second instruction will be completely ignored. It will not set D0 to zero but neither will it crash the code out. It will just plough on with D0 potentially at the wrong value. This is because NEON instructions (and a number of VFP) have the condition code set to binary 1111, the old ‘never execute’ condition. It will not jump to the undefined instruction vector which is used to emulate an instruction.

With this in mind… is it really possible to make a VFP Emulator that would work as expected? Is there a list of VFP instructions that can be created by the C compilers? It does appear all of VFPv2 and most of VFPv3 could be emulated. Most of VFPv4 and none of NEON can. |
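To make the condition-code problem concrete, here is a minimal Python sketch that sorts instruction words into "would reach an undefined-instruction emulator" and "sits in the old NV space". The two encodings are ones picked for illustration (they are not the code elided from the post), and the function names are hypothetical. The classification assumes, as the post describes, that an older core silently skips anything with condition field 1111.

```python
# Illustrative encodings chosen for this sketch (not the code elided in the post).
VADD_F64_D0_D1_D2 = 0xEE310B02  # VFP:  VADD.F64 D0, D1, D2 (condition = AL)
VEOR_D0_D0_D0 = 0xF3000110      # NEON: VEOR D0, D0, D0 (condition field = 1111)


def cond_field(word: int) -> int:
    """Return bits 31-28 of a 32-bit ARM instruction word."""
    return (word >> 28) & 0xF


def in_nv_space(word: int) -> bool:
    """True when the condition field is 1111, the old 'never execute' code."""
    return cond_field(word) == 0b1111


def is_vfp_coprocessor_form(word: int) -> bool:
    """True for a conditional coprocessor instruction aimed at CP10 or CP11."""
    if in_nv_space(word):
        return False
    major = (word >> 24) & 0xF   # bits 27-24 select the instruction class
    coproc = (word >> 8) & 0xF   # bits 11-8 hold the coprocessor number
    # 1110 = CDP/MCR/MRC class, 110x = LDC/STC (and MCRR/MRRC) class
    is_coproc_class = major == 0b1110 or (major & 0b1110) == 0b1100
    return is_coproc_class and coproc in (10, 11)


def emulator_can_see(word: int) -> bool:
    """Assumption: only conditional CP10/CP11 forms trap to the emulator."""
    return is_vfp_coprocessor_form(word)
```

With these definitions, `emulator_can_see(VADD_F64_D0_D1_D2)` is true and `emulator_can_see(VEOR_D0_D0_D0)` is false, which is the asymmetry described above.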
Rick Murray (539) 13751 posts |
Well, that’s a little inconvenient, isn’t it? I suppose a nasty way could be to scan for such instructions at load time and replace them with jumps into the emulator? |
Stuart Swales (8827) 1326 posts |
I think it’s likely that if VFP is being targeted, the CPU selection will be for ARMv7 (or 8) so that’s even more pain. Certainly that’s the route I have chosen with Fireworkz when it’s compiled against my VFP supporting library (using /softfp as Norcroft doesn’t yet do VFP).
You’d need to construct a set of trampoline-like objects somewhere when each application was loaded as the instructions can encode all sorts of register use, e.g. a replacement branch to some code in a DA that was a branch into the emulator followed by the registers that the replaced instruction was using (simplest to just dump the replaced instruction there). |
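A rough Python model of the load-time patching idea might look like the following. Everything here is an assumption made for illustration: the three-word trampoline layout (branch to the emulator, the replaced instruction, the return address), the function names, and the addresses in the usage note. The only firm part is the standard ARM `B` encoding, where the offset is counted in words from the instruction address plus 8.

```python
def encode_branch(from_addr: int, to_addr: int) -> int:
    """Encode an unconditional ARM B instruction located at from_addr."""
    offset = (to_addr - (from_addr + 8)) >> 2  # PC reads as instruction + 8
    if not -(1 << 23) <= offset < (1 << 23):
        raise ValueError("target out of range for a 24-bit branch offset")
    return 0xEA000000 | (offset & 0x00FFFFFF)


def patch_image(words, base, tramp_base, emulator_entry):
    """Replace condition-1111 words with branches to per-site trampolines.

    Returns (patched_words, trampoline_words). Each hypothetical trampoline
    is three words: B emulator_entry, the original instruction, and the
    address to resume at.
    """
    patched = list(words)
    trampolines = []
    for index, word in enumerate(words):
        if ((word >> 28) & 0xF) != 0xF:
            continue  # ordinary conditional instruction, leave it alone
        site = base + 4 * index
        slot = tramp_base + 4 * len(trampolines)
        patched[index] = encode_branch(site, slot)
        trampolines += [encode_branch(slot, emulator_entry), word, site + 4]
    return patched, trampolines
```

For example, `patch_image([0xEE310B02, 0xF3000110], 0x8000, 0x9000, 0xA000)` leaves the first word alone and replaces the second with a branch to 0x9000. Two caveats the sketch ignores: a plain scan cannot tell code from data, so a literal whose top nibble is F would be patched too, and the trampolines must lie within branch range (about ±32MB) of the code being patched.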
Chris Johns (8262) 242 posts |
Could it be made to work? Probably. Is it a good use of time? IMHO not. The only machines that would need it are (I think) IOMD and Iyonix, and really we need to draw a line in the sand and say “sorry, we aren’t going to support 20+ year old hardware anymore.” |
Steve Pampling (1551) 8126 posts |
The counterargument is that emulated IOMD is sitting on a platform that is probably as new as the Pi2 and quite likely newer. |
Rick Murray (539) 13751 posts |
Yes. But the even older chestnut…by whom? |
Steve Pampling (1551) 8126 posts |
I did say it was an old chestnut, probably a bit stale. |
Dave Higton (1515) 3479 posts |
Irrelevant. |
Chris Johns (8262) 242 posts |
So we are coding for a 29 year old system so we can emulate it on a 5 year old one?
The people who would otherwise be writing VFPEmulator? FPEmulator was from a time when hardly anyone had hardware FP. It was an option for some of the higher end systems, but even then hardly anyone had it. The A7000+ was, I believe, the only Acorn machine to ship with FP as standard. Writing code to do in emulated ARM what the host can do itself seems… perverse at best. RPCEmu have ruled out doing it there, so other than forking RPCEmu that route isn’t an option. It would be better to spend the effort in targeting QEMU and saying 5.30 is as far as we got for IOMD, Iyonix and maybe some of the others and move on. |
Peter Howkins (211) 235 posts |
I believe you’re referring to the stance put forward in this note to the RPCEmu mailing list some time ago: http://www.riscos.info/pipermail/rpcemu/2021-August/002985.html Starting about halfway down the mail you’ll see the arguments and technical discussions of implementing ARMv7 in RPCEmu. This was my conclusion then and remains so today: “The balance of ‘work’ versus ‘benefit’ will not be reached for me. However as this is an open project, anyone is welcome to work on any feature that they want to.” If someone wants to do it, they can, just it won’t be me, and I’d rather they went into it knowing, at least, some of the scope of the work involved. Practically, if you want an emulator for a more ‘modern’ RISC OS platform the start points are 1) GXemul, for emulating an Iyonix as modified by John-Mark Bell many years ago Both of these would need emulation deficiencies, with regards to the corresponding RISC OS, fixed, and ideally up-streamed. Again I have no interest in doing this work. |
Rick Murray (539) 13751 posts |
Slight difference in complexity between making a maths emulator… and a full system emulator!
Not a surprise given the maths chip alone was most of the cost of an entire RISC OS machine. September 1989 – AKA20, floating point expansion card £688.85 (inc. VAT). An entire A3000 was only £744.10, and a Master 128 £503.35.
It would be far better to update CLib and the DDE compiler to handle VFP, and let developers decide whether or not they wish to support older machines. After all, the ABC compiler can be set to FPA or VFP and it’s happy either way, so it’s not exactly impossible. |
Dave Higton (1515) 3479 posts |
I don’t want to hijack this thread. I just want to apprise this audience of a little thing I discovered recently. My FindIPP app doesn’t work on an emulated system because, as it uses mDNS, it needs to open a particular port on a particular multicast IP address. The host system has already got that address+port open, so FindIPP can’t use it. It’s never going to work unless the host’s mDNS daemon is disabled. All this does is show that there’s always going to be some case where emulation can not be perfect. Like I said, I don’t want to hijack this thread. |
James Peacock (318) 129 posts |
Played with GXemul a few years back to experiment with creating an OS. It made it really easy to get started as it had really simple to use emulated hardware. I even submitted a fix to some of the ARM MMU emulation, which was accepted by the maintainer but he indicated there wasn’t likely to be much further development. If I were targeting it with RISC OS, I’d probably look at creating a HAL for the simple devices, rather than trying to extend the emulator to emulate complex hardware. |
Graeme (8815) 103 posts |
I’ve stopped the project. |
Sprow (202) 1150 posts |
As you already found out there are several different versions of VFP, and even sub versions within them where some exotic instructions require a helping hand from VFPSupport. The solution there is pretty simple by looking at what is already done for disc based applications and their use of the ARM instruction set – you build them for the lowest common denominator ARMv3. If you know for sure which processor you’re running on you can then use the specific features of that, eg. an OMAP4 ROM can assume ARMv7 and so the compiler can use the Cortex-A9 features. So I’d suggest any VFP emulator only needs to support the lowest common denominator too. While I don’t have a definitive list the softfp functions are probably a good place to start as those are the primitives that expressions reduce down to: add, subtract, multiply, divide, convert to/from, and so on. It’s not a huge number of instructions, the list in the ARM ARM is a bit daunting but that’s just because ARM lumped in all the NEON ones beginning with ‘V’. I may have miscounted but I think there are only 28 actual FP instructions, and 5 of those (like VMOV immediate) aren’t in all of VFPv2/v3/v4 so could be ignored. To be clear, when I’m thinking about a VFP emulator it’s something which emulates VFP instructions on older ARMs, rather than trying to use VFP instructions to speed up the internals of FPEmulator. It’s also worth sparing a thought about the different number of FP registers (some have D0-D15, some D0-D31). Same principle again: for disc based apps if you want to provide a single !RunImage it’d have to limit itself to whatever the minimum number of FP registers is. Specific builds, such as !Draw in ROM, could use those extra registers of course, but a disc based !Draw wouldn’t.
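As an illustration of the register-count point, the sketch below pulls the three double-precision register numbers out of a VFP data-processing word and checks whether they stay within D0-D15. It assumes the three-register double-precision form (such as VADD.F64), where each register number is a 4-bit field plus one extension bit. The function names and the two sample encodings are chosen here for illustration.

```python
def double_regs(word: int):
    """Return (Dd, Dn, Dm) for a three-register double-precision VFP word.

    Each register number is a 4-bit field with a separate top bit:
    Dd = D:Vd (bit 22, bits 15-12), Dn = N:Vn (bit 7, bits 19-16),
    Dm = M:Vm (bit 5, bits 3-0).
    """
    d = (((word >> 22) & 1) << 4) | ((word >> 12) & 0xF)
    n = (((word >> 7) & 1) << 4) | ((word >> 16) & 0xF)
    m = (((word >> 5) & 1) << 4) | (word & 0xF)
    return d, n, m


def fits_d0_d15(word: int) -> bool:
    """True when every register used is available on a 16-register VFP."""
    return all(reg < 16 for reg in double_regs(word))
```

Here `double_regs(0xEE310B02)` gives `(0, 1, 2)` for VADD.F64 D0, D1, D2, while `0xEE710B02` (the same instruction with D16 as the destination) fails `fits_d0_d15`, so a lowest-common-denominator build would have to avoid it.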
Since NEON is not IEEE-754 compliant (“fast-n-loose”) I don’t think a compiler would ever intermix VFP and NEON in that way for applications that would ever be a client of a VFP emulator. GCC can vectorise some things, but you have to opt in to say you’re happy to accept the fast-n-loose method, so that can be eliminated by saying that’s not a default option in the same way you can use LDRH on a StrongARM but not on a Risc PC’s memory system…so don’t. An alternative view, as you already spotted NEON can’t be trapped as an undefined instruction because it sits in what was previously NV space, would be that the compiler could use Here are a few previous threads where the accuracy and provability of any implementation has cropped up, along with some of the pitfalls
One thing that might be worth a benchmark is whether, once in the emulator you then do the implementation with an FPA instruction and just let FPEmulator do the hard maths. VFPSupport makes use of the softfloat library, which is written in C. I like FPA (ultimately a hand tuned implementation in assembler in FPEmulator), I like softfloat in C, but which is faster? Fight! Ultimately it probably doesn’t matter as I don’t think the aim of the game here is faster-than-FPA performance, it’s to pivot the default position to one where all executables use VFP.
Oh. The earlier version of that post had a rosier outlook, and I’ve not yet come up with any technical reason why a VFP emulator couldn’t work and be a benefit if done accurately. |
Graeme (8815) 103 posts |
I was surprised at how few instructions VFPv2 actually has. That is why I thought it may be possible to emulate VFPv2. A place where I have got stuck is the context swapping. The actual VFPSupport module would need to be emulated including all the SWIs of which one is VFPSupport_ChangeContext. This is called from the Wimp but I haven’t got around to checking if this is called even if the VFPSupport module is not present. If it doesn’t call it, then context swapping the emulated VFP might not be possible. |
Rick Murray (539) 13751 posts |
You’re in luck…

        ; save VFP context, lazily if possible
        ;
        MOV     R0,#0
        MOV     R1,#VFPSupport_ChangeContext_Lazy+VFPSupport_ChangeContext_AppSpace
        SWI     XVFPSupport_ChangeContext
        MOVVS   R0,#0 ; Ignore error (probably means VFPSupport isn't loaded)

Doesn’t look like it is wrapped in any conditionals. So it’ll try and fail gracefully. It’s in Wimp03. |
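Since the Wimp makes the call regardless, an emulator would need some notion of a per-task register bank to swap on each call. The Python model below is only a sketch of that idea: the class and method names are hypothetical, the bank size assumes the D0-D15 minimum, and it does not model the lazy flag or the real SWI's register interface.

```python
class EmulatedVFPContext:
    """One task's emulated VFP state (hypothetical layout)."""

    def __init__(self):
        self.d = [0] * 16  # raw 64-bit patterns for D0-D15
        self.fpscr = 0     # status and control word


class ContextSwitcher:
    """Tracks which emulated context the instruction handlers operate on."""

    def __init__(self):
        self.active = None  # None models "no context selected"

    def change_context(self, new_context):
        """Make new_context active and hand back the previous one."""
        previous = self.active
        self.active = new_context
        return previous
```

Because the state lives in ordinary memory rather than in hardware registers, a switch in this model is just a pointer swap. The open question raised in the thread is whether every caller that needs to trigger the swap actually does so.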
Graeme (8815) 103 posts |
I’ve stopped due to it being in assembly. It was a challenge. Limited registers was a headache. I also had little clue as to how floating-point numbers actually worked before starting this and had to run real VFP against my code to see what results matched and what didn’t. I got addition and subtraction working for 64-bit numbers only. It might be better in C/C++. This weekend I have GCC compiling the module example without using shared libraries (which may not work under the undefined instruction vector). At some point I may give it another go but in a high level language. It was working alongside my ACE module which does a lot of the low-level things for you so C/C++ code could well be possible. Interfacing the high-level and low-level may result in me asking questions on here. |
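For anyone picking this up in a high level language, the two ideas in that post (how a 64-bit float is laid out, and checking an emulated result against a reference) can be sketched in a few lines of Python. The helper names are hypothetical, and the reference here is the host's own double addition, which is assumed to be IEEE 754 round-to-nearest. NaN results are best left out of the test cases, since payload bits can legitimately differ.

```python
import struct


def double_bits(value: float) -> int:
    """Return the raw 64-bit pattern of a double."""
    return struct.unpack("<Q", struct.pack("<d", value))[0]


def bits_to_double(bits: int) -> float:
    """Inverse of double_bits."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def fields(bits: int):
    """Split a 64-bit pattern into (sign, biased exponent, fraction)."""
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF       # 11 bits, bias 1023
    fraction = bits & ((1 << 52) - 1)     # 52 bits, implicit leading 1
    return sign, exponent, fraction


def mismatches(emulated_add, cases):
    """List the cases where emulated_add differs bit-for-bit from the host.

    emulated_add takes and returns raw 64-bit patterns, as an emulator
    working on register contents would.
    """
    bad = []
    for a, b in cases:
        expected = double_bits(a + b)
        got = emulated_add(double_bits(a), double_bits(b))
        if got != expected:
            bad.append((a, b, got, expected))
    return bad
```

For instance, `fields(double_bits(-2.5))` splits into sign 1, biased exponent 1024 and a fraction of `1 << 50`, matching -1.25 × 2^1. Comparing whole bit patterns rather than printed values catches rounding errors in the last place that a decimal printout would hide.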