RISC OS Porting to the new ARM based Apple MacMini
Paolo Fabio Zaino (28) 1802 posts |
@ Jeffrey Lee
Agreed, yet all the info/resources provided/known come from the internet, as there is no official documentation publicly available :) But thanks for the correction on the architecture detail: yup, AArch32 is optional on ARMv8.3, that is true and comes from ARM themselves, who do publish all the details. But again the CPU is the least of the issues here, given that basically everything else in that chip is proprietary or not detailed at all AFAIK, and so way more complex to handle than writing an AArch32 to AArch64 JIT translator. |
Paolo Fabio Zaino (28) 1802 posts |
And just in case someone may be wondering: there are already AArch32 to AArch64 binary translators available. One that is easy to get hold of is TANGO, which works on Linux and can translate AArch32 to AArch64 as well as providing an emulated Linux32 API. So it can do JIT translation: https://www.amanieusystems.com/
P.S. I am not implying or suggesting any path forward, also because in the case of the OS itself there are probably better approaches, like using virtualization for translation, or static translation, or even a hybrid approach, depending on a lot of aspects and/or goals. However, 32bit binary translation à la TANGO certainly helps in running old apps on a potential 64bit version of RISC OS. |
David J. Ruck (33) 1509 posts |
Apple moved all their iOS applications to AArch64 3 years ago and dropped support for AArch32 in iOS11. As they design their own cores rather than using an off-the-shelf ARM core, any new Apple chip would be extraordinarily unlikely to have AArch32 support. Dropping AArch32 reduces the number of processor states and eliminates decode circuitry, freeing up silicon and simplifying the core, allowing additional optimisations. |
Paolo Fabio Zaino (28) 1802 posts |
Yes, Apple (together with Marvell, Qualcomm and a few others) is a so-called “ARM Architecture licensee” (or Architectural for some), not an ARM Core (or Processor) licensee. So they are allowed to design their own microarchitecture. Also, reducing the supported ISAs helps with security as well as reducing design and production costs. No one is arguing about the presence or not of AArch32; however, for the sake of precision, without official documentation we can’t settle the issue for sure, although I agree that it’s very unlikely (unless Apple has other commercial plans for their chips) that the A12 has AArch32 implemented. However, given that there seems to be a lot of knowledge on what Apple is doing and is not doing (genuine questions):
- How does the GPU in the A12 work? Besides the internet-based resources that state the obvious (unified pipeline and the 7nm process), does anybody know exactly how it works? Any insight on how it communicates with the Video Processor and the display engine?
- Any insight on whether any dedicated ISA extension from Apple has been added?
- Does the CPU still use ARM “SIMD” (NEON), or does it use a custom Apple-designed one to improve things like Metal or playing videos?
- Any info on the Secure Enclave?
- Any insight on the Vortex uarch, besides the freely available data like the new DIV unit which improves performance per cycle on both int and fp numbers, 3 FP/vector pipelines, 6 int execution pipelines with two complex units, two load/store units and two branch ports?
- Any insight on the Tempest cores uarch, apart from the obvious and publicly available info, as they appear very similar (if not identical) to the original Apple 32bit Swift cores uarch (maybe they have been ported to 64bit), and the even more obvious 32K cache and shared 2MB cache with the Vortex, the 3-wide out-of-order architecture etc.? |
John Ballance (21) 85 posts |
FWIW I’m also asking this sort of Q.. and will be considering the possibilities. The Apple deal has a catch. It is £500 to get at one of these Mac minis, but they remain Apple property and can be recalled at any time. I’m also suspicious that they will not initially look happily on anyone who isn’t using one of these for Apple software. The port I’m currently working on does start 64 bit.. I could use all of u-boot, but I’m anxious to be clear just what is running outside of userspace, hence the approach. |
tymaja (278) 52 posts |
I came here to post this exact same post, also in jest. However when ARM-based MacBooks come out, it will be tempting to attempt a port. Specifically, imagine RISC OS running on an emulated 32-bit CPU, but with 64-bit support, and the 64-bit CPU is ‘real’. The OS ‘core’ is so quick that the emulation will be fast anyway, and people can start coding AArch64 in a comfortable environment! For self-modifying code, they had better not remove that, maybe you will need to jailbreak the laptop first. Native RISC OS will never be an option. I am looking forward to the ‘locks’ they put on their ARM based laptop. Imagine if it is ‘no OS downgrade, no apps unless we say so’ and all the usual Apple nonsense. I bet it will be! |
Paolo Fabio Zaino (28) 1802 posts |
@ tymaja
I am not trying to argue here (I think until we have factual information everything we say is fundamentally our own perception, which could definitely be close to reality, but only facts will confirm or deny that). With that said:
- Linux runs on the x86 MacBook Pro, and yes, it does have some issues, but the issues come from the custom chip Apple has designed for security (the so-called T2), which embeds both TPM and HW acceleration features, and Linux doesn’t play well with that yet (although progress is being made). For the record, that chip is also causing issues for macOS itself (but obviously fewer than for Linux).
- The ARM side can be pre-analysed starting from the xnu kernel on github.
- About ARM32 vs ARM64: the last update for xnu is from 2 years ago and (contrary to some assumptions) it does have the build infrastructure for ARM32 and it does compile for ARM.
- The A12 series was also released in 2018 (IIRC), so it’s possible that the xnu kernel on Apple’s github already contains some useful information.
- For those who are not aware of what I am talking about: XNU is the open source portion of macOS AND iOS, more commonly known as Darwin or Darwin-xnu. It’s the kernel of macOS and iOS FOR ALL Apple devices.
Before we continue the discussion, some insight for people who are less familiar with reading kernel sources and similar:
Support for ARM and ARM64 (Apple seems to be calling AArch32 ARM and AArch64 ARM64):
Apple ARM Kernel config:
Apple ARM64 Kernel config:
Mach for ARM32:
Other examples for the ARM architecture (USBSTACK vs usbstack64):
And more example on the source of ARM32 vs ARM64 (file bsd/arm/types.h):
And again:
Anyway, for more details please read the xnu source. And again, please don’t take this as me wanting everyone to believe A or believe B; I do not know what’s in the A12, nor does anyone else here unless they work at Apple, in which case please help me to understand :) Just my 0.5c |
Chris Mahoney (1684) 2127 posts |
Apple has already stated that you can turn off the ‘secure boot’ stuff and run ‘unsigned’ operating systems. Apps can be installed from anywhere. |
Paolo Fabio Zaino (28) 1802 posts |
@ tymaja
Never say never: 10-15 years ago people were saying things like this, and RISC OS is still here and having a new youth and a very exciting time (at least for me, but given that quite a few others have returned, maybe this is not just my perception).
Also, even in the case that for whatever reason it turns out to be impossible to convert RISC OS to AArch64 (I am not aware of any reason for that to be impossible, however, for the sake of the conversation, let’s assume so by hypothesis): research has made a lot of big steps, and today we talk of Nested Virtualisation with Binary Translation. I know this may sound a bit like gibberish, but it’s something many people have used already. For example, if at work you use VMware on a 64bit system to run a 32bit system (or vice-versa), you have probably used (without knowing it) HVX or BT32 binary translation for the VMM unit that was executing your VM.
Basically, when VMware cannot use the host system directly (i.e. when not in paravirtualization mode etc.), it has to instantiate a VMM (a complete virtual machine, so not just the virtual CPU) that will control the execution of your guest OS. The VMM decides how the guest OS code and the guest applications’ code are executed on the host. When such code is not directly executable, it gets translated by a binary translator (for example BT32), which will not only translate the original VM code but will also rewrite it so that it performs its operations on the VM and not on the host system.
Apologies if I went too deep or if my English is not good enough to explain it in a simple manner.
However, in the worst case, it is possible to run RISC OS as a guest OS of a binary translator. Performance can be ensured by two factors in this case:
1) Pre-translation of a BLOCK of code rather than opcode-by-opcode (which wouldn’t make much sense anyway). This can also happen at installation time, so before you even try to start up RISC OS.
2) The memory bandwidth offered by a full-blown desktop or workstation system with 64bit and way higher clock speeds than a SoC development board.
Now, even if this may not seem native (and yes, it is not fully native), it actually goes quite deep and is definitely NOT emulation (where the host is actually executing an interpreter for the guest). |
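A minimal sketch of the block-level idea in point 1, assuming a made-up toy guest ISA (the opcodes `movi`, `add`, `subi`, `bnez`, `halt` and all function names are hypothetical and have nothing to do with ARM, Tango or VMware). Each basic block is translated once into host code (here a generated Python function), cached by its start address, and reused every time the guest jumps back to it:

```python
# Toy guest program (hypothetical ISA): sums 5+4+3+2+1 into r0.
PROGRAM = [
    ("movi", "r0", 0),           # 0: accumulator = 0
    ("movi", "r1", 5),           # 1: counter = 5
    ("add", "r0", "r0", "r1"),   # 2: accumulator += counter
    ("subi", "r1", "r1", 1),     # 3: counter -= 1
    ("bnez", "r1", 2),           # 4: loop back to 2 while counter != 0
    ("halt",),                   # 5: stop
]


def find_block_end(program, pc):
    """Index one past the branch/halt that terminates the block starting at pc.

    Assumes every block eventually reaches a "bnez" or "halt".
    """
    while program[pc][0] not in ("bnez", "halt"):
        pc += 1
    return pc + 1


def translate_block(program, start):
    """Translate one whole basic block into a single host (Python) function.

    The generated function updates the register dict and returns the next
    guest address, or None when the guest halts.
    """
    end = find_block_end(program, start)
    lines = ["def block(regs):"]
    for insn in program[start:end]:
        op = insn[0]
        if op == "movi":
            lines.append(f"    regs[{insn[1]!r}] = {insn[2]}")
        elif op == "add":
            lines.append(f"    regs[{insn[1]!r}] = regs[{insn[2]!r}] + regs[{insn[3]!r}]")
        elif op == "subi":
            lines.append(f"    regs[{insn[1]!r}] = regs[{insn[2]!r}] - {insn[3]}")
        elif op == "bnez":
            lines.append(f"    return {insn[2]} if regs[{insn[1]!r}] != 0 else {end}")
        elif op == "halt":
            lines.append("    return None")
        else:
            raise ValueError(f"unknown opcode {op!r}")
    namespace = {}
    exec("\n".join(lines), namespace)  # "emit" the host code for this block
    return namespace["block"]


def run(program):
    """Execute the guest block by block, translating each block only once."""
    regs = {}
    cache = {}          # guest start address -> translated host function
    translations = 0
    pc = 0
    while pc is not None:
        block = cache.get(pc)
        if block is None:
            block = translate_block(program, pc)
            cache[pc] = block
            translations += 1
        pc = block(regs)
    return regs, translations


regs, translations = run(PROGRAM)
print(regs["r0"], translations)
```

With this program the loop body at address 2 runs four times from the cache but is translated only once; three blocks are translated in total and `r0` ends up as 15. A real translator would emit host machine code rather than Python, and doing the same work ahead of time for all reachable blocks corresponds to the "at installation time" variant.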
David Feugey (2125) 2696 posts |
IMHO, that’s not the same. x86 and x64 are very similar. ARM32 and ARM64 are completely different stuff. |
Rick Murray (539) 13462 posts |
Wow. That seems remarkably open for a company known for its control freakery.
Indeed. An x64 still starts up as a real mode 286… All of the 16 bit and 32 bit instructions are still present, and if the OS supports the mode switching, will happily run 32 bit software natively on a 64 bit OS.
Different to the point where it looks and feels like a completely different processor.
    mrs   x0, spsr_el3
    ubfx  x0, x0, #SPSR_EL_SHIFT, #SPSR_EL_WIDTH
    cmp   x0, #MODE_EL3
    b.ne  elx_panic
    ldr   x0, [sp], #0x10
    b     el3_panic
Taken from: https://github.com/ARM-software/arm-trusted-firmware/blob/master/common/aarch64/debug.S |
Jeffrey Lee (213) 6048 posts |
To me, that example looks very much like ARMv7 code.
    mrs   r0, SPSR_mon
    ubfx  r0, r0, #SPSR_M_SHIFT, #SPSR_M_WIDTH
    cmp   r0, #MODE_MON
    bne   elx_panic
    ldr   r0, [sp], #0x10
    b     el3_panic
For me, the big differences between AArch32 and AArch64 come from the architectural differences, not the instruction set:
|
Paolo Fabio Zaino (28) 1802 posts |
@ David Feugey
And @ Rick
Please have a look at Tango: it already does AArch32 to AArch64 binary translation and works well. You can download it and install it on Ubuntu Linux to test it. I re-link it here for your convenience: https://www.amanieusystems.com/
Tango also emulates the Linux32 API, so that the AArch32 application believes it’s being executed in its original environment.
Besides the fact that binary translation between AArch32 and AArch64 has already been achieved, let’s mention some other examples of binary translation that worked even between completely different architectures:
- Apple Rosetta 1 converted PowerPC binaries into x86 on the fly (now tell me that the PPC architecture is very similar to a 286… joking Rick, not trying to be rude)
- DEC used binary translation to run binaries of software developed for its VAX and MIPS architectures on its new Alpha AXP architecture. (For those not familiar with it, VAX was the most CISC and crazy architecture, where basically you could even run memory-to-memory operations, and yet binary translation converted it into a completely different RISC paradigm in which the majority of instructions are register-to-register.)
So can you help me to understand what you guys meant with your comments, please? I am not sure I understand. For example, what does starting up in 286 mode, or whatever the difference between AArch32 and AArch64 may be, have to do with the implementation of binary translation between AArch32 and AArch64?
Besides, Apple is now using binary translation to convert x86_64 to AArch64.
Also, the 286 memory model was completely different from the 32bit one, to the point that Microsoft had to build a VM to run old DOS code even on Windows 95, which was not fully 32bit. Between x86 and x86_64 there are also different instructions, for example SYSCALL, and Windows uses WOW64 to emulate the 32bit API.
Final note: even if RISC OS gets fully rewritten in native AArch64, we may still want to think about binary translation to run old applications.
Again, this is just my 0.5c. |
Chris (121) 470 posts |
Out of interest, how much harder/more work would it be to rewrite in C, rather than into another assembler dialect? Not disagreeing that emulation or binary translation would be good and necessary for both the OS and applications, just wondering, if you chose to go to the trouble of actually rewriting bits of the OS, whether going for a higher-level language might make more sense? |
Paolo Fabio Zaino (28) 1802 posts |
@ Chris I am definitely for C, not ASM. However such a choice (for the full OS) may lead to some significant changes; for example, rewriting it in C means RISC OS will have to become more similar to OSes like Linux. In very simplistic terms: where Kernel and CLib define the OS itself, given the dependency C can create on CLib. However I believe this is the direction already taken by ROOL and probably everyone else involved directly (at least I hope so!). Please correct me if I am wrong in my beliefs/hopes! |
Jeffrey Lee (213) 6048 posts |
I think rewriting in C (or some other higher-level language) is the most sensible option.
|
Rick Murray (539) 13462 posts |
Probably a bad example then. ;-)
I think the point we’re trying to make is that a 64 bit x86 processor still contains all the 32 bit junk inside, so, provided the mode is set correctly by the OS, it should be able to near enough natively run 32 bit apps. And, indeed, as it starts up looking like a 286, it is also capable of running 16 bit apps. Not a binary translation, actually running it native. ARM cores that have 32 bit and 64 bit present the selection per core. There’s overlap in the instruction set between, say, ARM32 and Thumb. But there’s no overlap between ARM32 and ARM64. And a processor that works only in 64 bit mode (as the new Apple one will undoubtedly) doesn’t have the capability to run 32 and 64 with some tweaks (like setting mode bits). The 64 bit world is the 64 bit world. End of discussion. [there’s a reason Intel processors are the size of Wagon Wheels and require epic heatsinks, and ARM SoCs are smaller than some of the support chips!]
Of course. We’re already doing this with Aemulor for 26 bit style software. An OS without software is a dead OS, so some sort of translation will be essential. I agree that a rewrite in C (or something that can be compiled with minimal assembler bits, such as the HAL) is the way forward. However RISC OS has traditionally relied upon a lot of calls that have an assembler API and pay exactly zero attention to any sort of calling convention. It’s not impossible, it’s just going to be fiddly. The main problem, as always, is that one doesn’t just go and rewrite an entire functional OS in C as a weekend project. We need to clone Jeffrey. Several times.
I thought the point of PowerPC was to try to make a processor that wasn’t as s**t as x86. I wonder why it wasn’t continued?
Which pretty much breaks every module that hands out pointers to itself for code, data, or error blocks… which is damn near every module. This… will be interesting. <grabs bucket of popcorn> |
David Feugey (2125) 2696 posts |
Yes, but all of this is just emulation. If your processor is ARM64 only: “Tango’s performance running translated code is usually within 15% of native execution speed.” I love the usually :)
x86 code works at native speed on an x64 processor.
It’s. Better. POWER and OpenPOWER. |
Paolo Fabio Zaino (28) 1802 posts |
On this I totally agree. This exact problem was the origin of my frustration with RISC OS back at the end of the 90s, when such decisions should have been taken, when most of us were still able to make some money (somehow) working on RISC OS and so able to justify the effort. Now it’ll have to be done slowly and during the weekends, I’m afraid :(
Ok, I understand now: you are talking about HW-aided virtualisation or para-virtualisation, where basically you do not modify the code, or modify only a very minimal amount or just a few calls etc. I am talking about binary translation like Rosetta, where you literally translate chunks of code from one format to another, again PowerPC binary into x86 binary, so there is no way the x86 can offer any support for that. Hopefully this is clearer now.
Agreed, that is not going to be simple work, but what could the alternative be?
LOL You are right on the principle, however the PPC still exists and the latest model is Power9, for which there is also a full Open Source motherboard available, nice system btw. Apple abandoned it at the time mostly because IBM’s god complex did not help with making the PPC competitive with Intel, and they really needed to get better performance and reduce their costs, and Intel offered both at the time. Now Apple needs better thermals and Intel has failed to deliver that for a very, very long time; at the same time they need to sell more Apple Silicon chips and they have managed to get closer to Intel’s performance with their Apple Silicon, so it’s again time for Apple to change. |
Timothy Baldwin (184) 242 posts |
False. Writeable and executable memory is fully supported by ARMv8, unless disabled by software setting SCTLR_ELx.WXN to 1, except that memory writeable in EL0 is not executable in EL1. See table D5-33 in the ARMv8 ARM. Also the Linux port of RISC OS runs on AArch64 Linux on Neoverse N1 complete with readable, writeable and executable memory.
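The rule stated above can be written down as a small decision function. This is only a simplified sketch of that one paragraph, not of the full permission tables in the ARMv8 ARM: the function and parameter names are made up for illustration, and the only architectural detail assumed beyond the post is that WXN is bit 19 of SCTLR_ELx.

```python
SCTLR_WXN_BIT = 19  # position of the WXN field in SCTLR_ELx


def wxn_enabled(sctlr: int) -> bool:
    """True if the WXN ("write implies execute-never") bit is set."""
    return ((sctlr >> SCTLR_WXN_BIT) & 1) == 1


def can_execute(sctlr: int, executing_el: int,
                writable_at_el0: bool, writable_at_el1: bool,
                execute_never: bool = False) -> bool:
    """Simplified model: may code at EL0 or EL1 execute from this region?"""
    if execute_never:
        return False
    # Memory that EL0 can write is never executable at EL1.
    if executing_el == 1 and writable_at_el0:
        return False
    # With WXN set, writable memory is treated as execute-never.
    writable = writable_at_el0 if executing_el == 0 else writable_at_el1
    if wxn_enabled(sctlr) and writable:
        return False
    return True


# An application page that is readable, writable and executable at EL0:
print(can_execute(sctlr=0, executing_el=0,
                  writable_at_el0=True, writable_at_el1=True))
print(can_execute(sctlr=1 << SCTLR_WXN_BIT, executing_el=0,
                  writable_at_el0=True, writable_at_el1=True))
```

The first call models the default case (WXN clear), where a read/write/execute application page is allowed; the second shows the same page becoming non-executable once software sets WXN.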
You are comparing processors of different performance. Fast ARM processors also consume a lot of power, for example 240 watts for the 96 core * 4 threads per core ThunderX3, or a reported 100 watts for 64 core AWS Graviton2 (about 2 seconds per RISC OS ROM build).
The OSLib API could be adopted as the RISC OS API, with automatic creation of wrappers in both directions. |
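To illustrate what "wrappers in both directions" could mean, here is a rough sketch that generates both from one call description. Everything in it is an assumption made for the example: the `SPEC` format, the call name `Example_Scale` and the helper names are hypothetical and are not OSLib's actual definition format or any real SWI.

```python
# Hypothetical description of one call: argument names mapped to registers.
SPEC = {
    "name": "Example_Scale",
    "inputs": {"value": 0, "factor": 1},   # value in R0, factor in R1
    "outputs": {"result": 0},              # result returned in R0
}


def make_named_wrapper(spec, raw_call):
    """Direction 1: named arguments -> register block -> register-based call."""
    def wrapper(**kwargs):
        regs = [0] * 10                    # R0-R9
        for name, reg in spec["inputs"].items():
            regs[reg] = kwargs[name]
        out = raw_call(regs)
        return {name: out[reg] for name, reg in spec["outputs"].items()}
    wrapper.__name__ = spec["name"].lower()
    return wrapper


def make_register_wrapper(spec, named_impl):
    """Direction 2: register block -> named arguments -> high-level function."""
    def raw(regs):
        kwargs = {name: regs[reg] for name, reg in spec["inputs"].items()}
        results = named_impl(**kwargs)
        out = list(regs)                   # registers not listed are preserved
        for name, reg in spec["outputs"].items():
            out[reg] = results[name]
        return out
    return raw


def scale(value, factor):
    """A high-level implementation, as it might be written in C."""
    return {"result": value * factor}


raw_scale = make_register_wrapper(SPEC, scale)      # callable via registers
example_scale = make_named_wrapper(SPEC, raw_scale) # callable via names
print(example_scale(value=6, factor=7))
```

The point of the sketch is only that a single machine-readable description is enough to generate both the register-to-function and the function-to-register glue, so neither side has to be written by hand.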
Paolo Fabio Zaino (28) 1802 posts |
@ David
You know VMware emulates the entire system, right? You also know that ARM32 is going away, right? So the only way to have a VM run at almost native speed is to use para-virtualization, which however requires the guest OS to be aware of the virtualized condition. Everything else requires different degrees of emulation.
Yup, JIT BT, not the best approach, which is why Apple Rosetta 2 translates binaries at installation time, not at runtime.
No, this is actually not true. It has been shown many times that 32bit x86 code on a 64bit x86 runs slower; the penalty is not dramatic, but it’s not native speed either, sorry. An application compiled as 32bit requires the OS API to be emulated in 32bit, which on Windows also means passing syscall parameters through the stack and not via registers, for example (but there are plenty of other examples). |
Jeffrey Lee (213) 6048 posts |
Memory can be writable or executable, not both at the same time (maybe you can work around that with multiply-mapped pages, but it’s still going to cause us to rethink how a lot of things are handled)
I stand corrected! I guess I misremembered what the restrictions were. |
Chris Mahoney (1684) 2127 posts |
Unfortunately I stand corrected. The original statement was that you’d be able to run old versions of macOS that are no longer signed, which I took to mean that it could run any unsigned OS. However, there is now a screenshot showing the wording ‘Allows any version of signed operating system software ever trusted by Apple to run’, i.e. it still needs to be signed, but that signature doesn’t need to be current. |
David Feugey (2125) 2696 posts |
Even in 10 years, I bet it’ll be easy to get a 32bit ARM SoC.
But I can install a 32bit OS on my 64 bit processor. Then I have native speed. The 64bit x64 processors ARE also true 32bit x86 processors. No emulation is needed. No virtualization. Fully native. If it’s slower, it’s just because of some optimisations. That’s true for most ARM processors too. The Broadcom BCM2711 is both an ARM 32 bit and 64 bit processor. Native for both. I’m not sure it’ll be the same for the Apple offer. And if it’s not the case, only a full emulation will help you, with abysmal performance. |
Rick Murray (539) 13462 posts |
Actually, it is native. What takes the time is something else translating all of the 32 bit APIs to 64 bit, and back again. That does imply a speed hit, but the instructions are executing natively and not via some sort of emulator or JIT.
Mmm, perhaps, but maybe not the sort of SoC that we are used to today. I can see there might be a need for a more powerful “Thumb” style SoC – allowing a more sophisticated style of memory management to allow for OS/app protection and such. But the things we’re used to like video hardware probably won’t be present. Indeed, these devices may stick with a 16 bit memory bus, or more likely come with Flash/SDRAM baked into the chip. You see, pretty much everything that ARM32 SoCs were good for… is now being answered by ARM64 SoCs. Indeed, some OSs have already ditched continuing 32 bit support, which means that it’ll be 64 bit going forward. Well, until there’s something bigger. ;-) I’m just trying to imagine what niche ARM32 would fit into that ARM64 wouldn’t be preferred.
Not quite. Native for both, but not native for both at the same time (on the same processor core).
Hmm, do we have practical examples of emulating a 32 bit environment on a 64 bit core? Something that’s a little better than Tango’s 15%? That actually seems kind of slow… just sayin’.
Of course. ;-)
And for comparison, is there a 96 core x64 processor? How does that rate? (size and power consumption)
Yup. I didn’t think Apple would want anything else to taint their shiny-shiny. God forbid somebody might find MacMini and Debian (for example) as being a pretty sweet combo that’s more productive than Apple’s OS. ;-) |