category: Specification

<div id="toc_heading"></div><div id="toc"></div>

h2. Goals

The release in 2011 of the ARMv8 architecture added a whole new instruction set called AArch64 with a wider 64 bit register bank, and renamed what was previously referred to as the ARM instruction set (which RISC OS uses) as AArch32. The AArch64 instructions are not binary compatible with the older ones.

Until very recently, the ARMv8 targets we would like RISC OS to support have fortunately still contained an implementation of AArch32 for backwards compatibility, allowing RISC OS to run while entirely ignoring the AArch64 aspects.

As at 2020 it was becoming clear that Arm intended to wind down ARMv7 and earlier (fewer than 50% of the "cores available to license":https://www.arm.com/products/silicon-ip-cpu would run that way) to focus on their ARMv8 offerings, many of which dropped AArch32 support entirely.

In April 2021 the ARMv9 architecture "was announced":https://www.arm.com/why-arm/architecture/cpu with AArch32 relegated to being a licence option, in much the same way that 26 bit mode became a rarely taken up option in ARMv4.

At the end of September 2023, Raspberry Pi announced their new Raspberry Pi 5, with a 64-bit quad-core Arm Cortex-A76 processor. While this still supports 32 bit instructions, crucially it does so only in 'user' mode. RISC OS makes heavy use of 'supervisor' mode, which this processor doesn't implement for 32 bit code. As such, there is no straightforward way to port RISC OS to the Pi 5.

This proposal identifies aspects of RISC OS that will require attention in order to migrate away from AArch32, to help make design decisions, and ultimately to find a route to implementing the changes needed to ensure there are chips to run on in future.

h2. Existing documentation

h3. Relevant specifications

"DDI0487":https://developer.arm.com/documentation/ddi0487/latest - The ARMv8 architecture reference manual (A profile)
"DDI0608":https://developer.arm.com/documentation/ddi0608/latest - The ARMv9 supplement to DDI0487
"IHI0055":https://github.com/ARM-software/abi-aa/releases - Procedure Call Standard for the Arm 64-bit Architecture

h3. Relevant forum threads

"No more big 32-bit cores for RISC OS from 2022":/forum/forums/5/topics/15704
"What would AArch64 BASIC look like?":/forum/forums/9/topics/15421
"ARMv8 support":/forum/forums/5/topics/3953

Secondary relevance: Could you run on an "Apple MacMini":/forum/forums/9/topics/15400 and "RISC OS on hypervisor":https://www.riscosopen.org/forum/forums/2/topics/15679

h2. Terminology

*AArch32* is the family of instructions with 32 bit wide integer registers addressing _up to_ 2 ^32^ bytes of memory, and also includes the 26 bit addressing mode.
*AArch64* is the family of instructions with 64 bit wide integer registers addressing _up to_ 2 ^64^ bytes of memory.

h2. Detail

h3. Possible approaches

Since RISC OS is coming to the 64 bit scene rather late, it has the advantage of being able to look at how other mainstream operating systems have approached the problem (though some solutions may not be practical on an Arm processor).

# -Do nothing-
-This approach simply ignores AArch64 altogether.- This approach has been ruled out as it is highly likely that no new chips which can run RISC OS will be released in the 5-10 year timeframe.
# -Leave the OS AArch32, invest time in emulation/virtualisation-
-This approach leaves the OS and applications unchanged, and instead looks to have it run as a guest inside some kind of emulator or similar. This technique is used, for example, to allow old industrial control applications to be run on Windows XP on a virtual machine (such as "VirtualBox":https://www.virtualbox.org/).
An _emulator_ aims to replicate the entire machine and all its peripherals, and in that regard it doesn't even need to be hosted on an Arm processor at all, while _virtualisation_ runs the code in a protected bubble but on the actual hardware with some kind of supervisor watching over it.- This approach has been ruled out because, although it is a quicker route to getting something running, the other large Open Source projects we may wish to make use of (such as web browsers or media codecs) have typically been recoded to assume a 64 bit system and their 32 bit forms have been discontinued, leading to extra effort for anything other than bespoke RISC OS software.
# Mixed mode with privileged mode AArch64 and user mode AArch32
Starting with ARMv8 it is possible for both instruction sets to be supported on a single chip, so to reduce the burden the core of RISC OS which runs in a privileged mode could be converted to 64 bit while the applications remain 32 bit. This stepping stone approach would buy some time for application authors to update to support the newer way of working. There are some technical issues with this concept: for example when switching between 32 bit and 64 bit mode the upper halves of the (wider) registers are undefined, which would mean the RISC OS kernel has to spend time in the SWI despatch interpreting what the registers mean and adjusting where appropriate. While the current ARMv9 family of processors does, on paper at least, offer AArch32 at EL0 as a licence option, it seems commercially unlikely that this will ever be realised in silicon (why would a licensee pay for an option which they don't need?). In summary this approach could provide a useful development stepping stone, but ultimately it's a dead end once the ARMv8 chips dry up.
# *Run a 64 bit OS with 64 bit apps and run 32 bit apps under emulation
This approach goes directly to a 64 bit system with matching applications. Once the problem areas have been identified it becomes easy to swap back to a 32 bit system by recompiling the source code, and any new applications are immediately able to take advantage of the extra memory too. Those applications where recompiling isn't practical would be run in an emulation solution. Since they are application level code there is no need for them to directly access hardware peripherals (which would be handled by the 64 bit operating system), which reduces the level of emulation needed. In other words, it doesn't need a full blown machine emulator such as RPCEmu; it's much more like the support Aemulor provides to run 26 bit applications in 32 bit mode today, except that not only would operating system calls need to be intercepted, the instructions would also need to be translated into their 64 bit equivalents.*
# Run a 64 bit OS with 64 bit apps (as above) but clamp the logical memory map to 32 bits
This approach has all the features of the previous solution, but avoids the need to comb through all the locations where the change of size of an address pointer impacts SWI and service calls. This would limit the scope of changes needed to applications because, although the instruction set has changed, all the moments where data is transferred in and out of memory could be left unchanged. The downside is that the maximum amount of memory that could be addressed would be 4GB minus any reserved for the operating system, which negates one of the key advantages of a 64 bit processor: access to large amounts of memory for hungry applications like web browsers.

In the remainder of this analysis we assume approach (4) is the primary solution, though to assist development elements of (3) and (5) may be used as temporary measures.
This is not unlike the approach taken when getting the first 32 bit versions of RISC OS to work - initially they kept the old memory map with its 28MB application limit, and once everything stabilised those limits were raised and more and more of the OS was moved higher up to free up the large application slot we enjoy today.

h3. Changes to SWIs

h4. Reduction in SWI number space

The opcode under AArch32 has space for a 24 bit immediate value which encodes the call number, for a total of 16M possible unique values, though in practice some of the number space is used to encode other information, such as bit 17 holding the error-returning 'X' flag.

The opcode under AArch64 has a reduced size 16 bit immediate value, for a total of 64k possible unique values, which is clearly insufficient to directly encode all of the currently circulated allocations. Despite the reduced number space, this does encompass the _high value_ OS SWI block from &00 to &1FF.

Since the advent of split I+D caches with the StrongARM it has not been convenient to dynamically generate supervisor call opcodes without incurring a cache flush penalty, and the strong recommendation has been to call <code>OS_CallASWI</code> or <code>OS_CallASWIR12</code> instead. This indirect method, where the call number is held in a register and passed for despatch via an OS SWI, could be used to allow existing SWI allocations to be retained regardless of the instruction set in use.

Programmers in C will be used to using the <code>_swix()</code> and <code>_swi()</code> functions, which are already indirect calling methods, and programmers in BASIC use <code>SYS</code>, which the interpreter can change to an indirect call as required.

h4. Register widths

Whereas currently in AArch32 R0-R15 are 32 bits wide, and therefore can address any location in the 4GB of logical address space, with AArch64 the general purpose register bank can be viewed either as X0-X30 (64 bits each) or W0-W30 (32 bits each).
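
As a rough illustration of the two register views, the sketch below models an X register as a <code>uint64_t</code> and its W view as the low 32 bits of the same value, and shows the two ways in which a 32 bit parameter can be widened to 64 bits. The function names are invented for this example and are not part of any RISC OS API.

```c
#include <stdint.h>

/* The W view of a register is the low 32 bits of its X view. */
uint32_t w_view(uint64_t x)
{
    return (uint32_t)x;
}

/* Zero extend: suitable for addresses, flags and unsigned counts. */
uint64_t widen_unsigned(uint32_t w)
{
    return (uint64_t)w;
}

/* Sign extend: suitable for signed quantities such as -1. */
int64_t widen_signed(uint32_t w)
{
    /* Written out by hand to avoid implementation-defined casts. */
    if (w & UINT32_C(0x80000000))
        return (int64_t)w - INT64_C(0x100000000);
    return (int64_t)w;
}
```

For example, <code>widen_unsigned(w_view(x))</code> discards whatever was in the upper half of <code>x</code> and keeps the low word as an address, whereas <code>widen_signed</code> would be the appropriate choice for a parameter where -1 has a special meaning.
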
The AArch64 program counter is not directly visible. Therefore, the AArch32 register bank could be viewed as a subset of the AArch64 register bank, in that parameters passed to a SWI executed as an AArch32 instruction could be losslessly passed to an AArch64 RISC OS kernel for handling by sign or zero extending as appropriate.

Note that in a mixed AArch32/AArch64 system it is not guaranteed that the narrower 32 bit registers (containing the SWI's parameters) are zero extended when entering the AArch64 exception handler. The <code>AArch64.CallSupervisor</code> pseudo code in the ARMv8 architecture reference manual is the clearest place to follow this: via <code>AArch64.TakeException</code> you get to <code>AArch64.MaybeZeroRegisterUppers</code>, where the loop to clear the upper halves is conditional on <code>ConstrainUnpredictableBool</code>.

h4. Survey of affected existing SWIs

Assuming a register widening solution is adopted, the only places where the size of a pointer is of concern are those where the pointer is passed in via a parameter block held in memory. This section surveys the core module SWIs for places where this technique is used in order to see how widespread a problem it might be.

[[List of AArch32 affected SWIs]]

h4. Extension modules where no SWIs surveyed are affected

The following list shows modules which have been checked but whose SWIs don't have any potential pointer issues. Modules which don't implement any SWIs, such as application modules, are not listed here.

<table><tr><td>

* AcornHTTP
* AcornSSL
* ADFS
* ATAPI
* BCMSupport
* BlendTable
* BootFX
* Buffer Manager
* CDFS, CDFSDriver
* ColourTrans
* CompressJPEG
* DDEUtils
* Debugger
* DeviceFS
* DHCP
* Dialler
* DOSFS
* DragASprite
* DrawFile
* FilerAction
* Filter Manager
* Font Manager
* Free
* Freeway
* FrontEnd
* FSLock
* GPIO
* Hourglass
* IIC
* Internet (Socket)
* InverseTable
* Joystick
* JPEG

</td><td>

* MakePSFont
* NetFS
* NetMonitor
* NetPrint
* NetTime
* NFS
* Parallel Device Driver
* PDriver
* PDumper
* Portable Manager
* RamFS
* RedrawManager
* ResourceFS
* RTC
* RTSupport
* ScreenBlanker
* ScreenFX
* ScreenModes
* SCSIFS, SCSIDriver
* SDFS
* ShareFS
* ShellCLI
* SMP
* Sound (Level 1), Sound (Level 2)
* Sound Control
* Squash
* SuperSample
* TaskWindow
* Toolbox
* URI
* VCHIQ
* VFPSupport
* ZLib

</td></tr></table>

h3. Changes to service calls

h4. Survey of affected existing service calls

Assuming a register widening solution is adopted as for SWIs, the only places where the size of a pointer is of concern are those where the pointer is passed in via a parameter block held in memory. This section surveys the core module service calls for places where this technique is used in order to see how widespread a problem it might be.

[[List of AArch32 affected service calls]]

h4. Service calls surveyed where none are affected

The following list shows service call ranges which have been checked but which don't have any potential pointer issues. Modules which don't implement any service calls are not listed here.

<table><tr><td>

* ADFS (&10800)
* SCSI (&20100)
* Wimp (&400C0)
* NetPrint (&40200)
* Toolbox (&44EC0)
* SDIODriver (&81040)
* IIC (&81100)
* Window (&82880)
* URL (&83E00)

</td></tr></table>

h3. Implications on other in-memory formats

The SWIs [[OS_File]] and [[OS_GBPB]] include some subreasons which deal with load and execution addresses. These are currently 32b quantities, albeit deprecated in use.
Various places store these as 32b quantities, for example in the extended attributes of a ZIP file, in file server messages, and in the directory entries of FileCore discs.

The SWI [[OS_FSControl 12]] (Add FS) and [[OS_FSControl 35]] (Add image FS) pass a pointer to a FileSwitch [[FS Information Block]] which includes 32b offsets to functions to implement a filing system. Provided modules are not expanded beyond their existing maximum size of 64MB these 32b offsets will suffice because they are relative to the module base address.

The SWI [[OS_SpriteOp]] doesn't make use of absolute addresses in memory. Provided sprites are not expanded beyond their existing maximum size of 2GB these 32b offsets will suffice because they are relative to the sprite area base.

The 4 word [[MessageTrans]] block is opaque to the caller, so while it may contain a pointer, its layout could be changed without impacting clients.

The SWI [[ResourceFS_RegisterFiles]] includes a block with a 32b offset to the next item to add in the chain, however that still allows blocks to be kept ±2GB apart.

Devices registered via the list in R1 to [[DeviceFS_Register]] include a 32b offset to the device name as the first word of the buffer. This limits the string to be within ±2GB of the block.

Toolbox Res files include 32b offsets (to the body, strings, etc), some of which are relocated into absolute addresses when the Res file is loaded.

Mbuf Manager works with mbctl and mbuf structures, which contain both 32b function pointers and linked list pointers.

h3. Execution formats

h4. AIF and DebugAIF

The AIF format has always included a flags word, with a small number of valid values allocated by Acorn or Arm for their use. Following dialogue with Arm there are still plenty of spare flag bits. Therefore, a flag bit can be allocated to denote the code was intended to be run on a 64 bit version of the OS and rejected on 32 bit versions.
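
To make the flags word concrete, here is a minimal sketch of the kind of check a 64 bit aware loader could apply. It assumes the flags word sits at offset +0x30 in a little-endian header, that existing images carry 0, 26 or 32 in its bottom byte, and that the value 64 marks a 64 bit image. The type and function names are invented for illustration, and this is not a description of what FileSwitch does today.

```c
#include <stddef.h>
#include <stdint.h>

#define AIF_FLAGS_OFFSET 0x30u   /* assumed position of the flags word */

typedef enum {
    AIF_EXISTING,       /* bottom byte is 0, 26 or 32: today's images */
    AIF_PROPOSED_64,    /* bottom byte is 64: assumed 64 bit marker   */
    AIF_UNRECOGNISED    /* anything else, or the header is too short  */
} aif_kind;

/* Classify an image from the bottom byte of its AIF flags word. */
aif_kind aif_classify(const uint8_t *header, size_t length)
{
    if (header == NULL || length < AIF_FLAGS_OFFSET + 4u)
        return AIF_UNRECOGNISED;

    /* The word is little-endian, so its bottom byte is stored first. */
    switch (header[AIF_FLAGS_OFFSET]) {
    case 0:
    case 26:
    case 32:
        return AIF_EXISTING;
    case 64:
        return AIF_PROPOSED_64;
    default:
        return AIF_UNRECOGNISED;
    }
}
```

A check of this kind only helps on an OS which performs it, which is why the behaviour of existing loaders matters.
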
However, up to now RISC OS doesn't actively reject unknown flags in the AIF header when loaded - the error "Application is not 32-bit compatible" is actually generated when an attempt is made to initialise the SharedCLibrary using the APCS-R calling convention, rather than by FileSwitch as might have been expected. Therefore changing the flags word at offset +0x30 in the AIF header wouldn't guarantee an error when attempting to run the image on RISC OS 5 or earlier incarnations, which expect a value of 32, 26 or 0 in the bottom byte of the flags, since the application may have been written to not use the SharedCLibrary at all.

Since application binaries are central to the purpose of an application, a new filetype could be allocated for a 64 bit executable, but unfortunately all the filetype numbers in the range &Fxx are already exhausted.

Instead, the AIF header will contain a minimal code sequence written using AArch32 instructions which throws an error. The flags word will contain 64 in its bottom byte, and the AIF header will be extended so that it is followed by a second header with extra fields and information relevant for a 64 bit environment. Compare with the "Portable Executable":https://wiki.osdev.org/PE format (PE) which prefixes the 64 bit executable with a simple MS-DOS program which just gives the error "This program cannot be run".

h4. Binaries with load/execution addresses

Plain binaries which are <code>*Run</code> and rely on the 32b load and execution addresses held in the RISC OS file attributes would either require a change to the FileCore logical format to support longer attributes, or be unsupported (the use of load/execution addresses has been deprecated for some time), or be limited to loading into the low 4GB of the memory map as at present.

h4. Utilities

Utilities run in User mode after being loaded into the RMA. The kernel currently runs utilities without type checks since in User mode any undefined instruction can only cause inconsequential damage.
An optional "32OK" signature [[File formats: Utility|appended to the image]] is used by Aemulor to suppress its emulation, so precedent exists for adding "64OK" to denote the change of instruction set. Heuristics to detect AArch32 opcodes are likely to lead to false matches because the AArch64 opcodes overlap with them.

h4. Relocatable modules

Since RISC OS 5 the module header has [[File formats: Relocatable Module|included a flags word]] to provide for future changes. Therefore, a flag bit can be allocated to denote the code was intended to be run on a 64 bit version of the OS and rejected on 32 bit versions. In addition, the first few words are expected to be a limited subset of AArch32 instructions (B, BL, MOV) to add confidence to the decision.

h4. Podule loader

Podule loaders have 4 AArch32 instructions and an optional "32OK" signature at the start. Since AArch64 instructions are also 32b in size, and the signature could be changed, this can be accommodated should there ever be an ARMv8+ machine with a podule bus.

h4. Squeezing

Modules and applications can currently be squeezed. The kernel includes a lightweight decompressor for "modules":https://gitlab.riscosopen.org/RiscOS/Tools/Sources/modsqz, and applications are typically "squeezed":https://gitlab.riscosopen.org/RiscOS/Tools/Sources/squeeze to reduce the time they take to load off disc.

Both of these compression algorithms are somewhat biased, as the tables used to pick out frequently occurring values are based on common AArch32 instructions. Given the increased performance of 64 bit systems and modern hardware, it is likely that the compression algorithm could be changed for something more common - for example ZLib or UNIX compress.

h3. Built in BBC BASIC

The BASIC interpreter could need to address memory above the 4GB boundary, for example through <code>DIM</code> or interacting with SWIs through the <code>SYS</code> keyword, but with its current 4 byte integer variables would not be able to do so.
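
The limitation can be illustrated in C, used here only as a neutral notation rather than as BASIC syntax. The sketch models a 4 byte integer variable as a <code>uint32_t</code>, and the function names are invented for this example: any address at or above 4GB loses its top 32 bits when stored, so two different addresses can become indistinguishable.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if the address survives being stored in a 4 byte integer variable. */
bool fits_in_4_bytes(uint64_t address)
{
    return address <= UINT32_MAX;
}

/* What remains of the address after such a store: the top 32 bits are lost. */
uint32_t store_in_4_bytes(uint64_t address)
{
    return (uint32_t)address;
}
```

With these definitions an address of &8000 fits, but &100008000 does not, and storing either one leaves the same 4 byte value behind.
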
Other dialects have already introduced 64 bit integers, so ARM BBC BASIC may be able to copy that syntax for declarations and indirection.

h3. Changes to integer type sizes in C

The integral types in C are unsigned/signed versions of char, short, int, long, long long, and pointers. C programmers are probably already familiar with the pitfalls of moving between systems where an int has only 16 bits rather than 32, so how big might an int be in a 64 bit environment?

When specific sized variables are required, using one of the types from "<stdint.h>":https://pubs.opengroup.org/onlinepubs/009695399/basedefs/stdint.h.html is highly recommended. For everything else there's a design decision to make about how to change the integral types, which has its own terminology:

* SILP64 - short/int/long/pointers become 64 bit (long long is already 64 bit), this is not widely used
* ILP64 - int/long/pointers become 64 bit (long long is already 64 bit), this is not widely used
* LLP64 - long long/pointers become 64 bit (int and long remain 32 bit)
* LP64 - long/pointers become 64 bit (int remains 32 bit, long long is already 64 bit)

Historically Windows defined lots of in-memory structures using the LONG type and "decided to pick LLP64":https://devblogs.microsoft.com/oldnewthing/20050131-00/?p=36563 as a means of minimising the impact on its API that would have resulted had all those LONG variables suddenly changed size, while most other operating systems chose LP64.

Given most of RISC OS's APIs use int to mean a 32 bit word, it would seem sensible to follow the LP64 data model, and this is also what Arm recommend in their AArch64 Procedure Call Standard.

In terms of change, this does mean that a careful inspection of source code will be needed for places where the size of a long or the size of a pointer was assumed to be 32 bits. In general this is only important where fixed sized structures exist in memory or where casting between pointers and integers has occurred.

h2. Implementation progress

table(bordered).
|_<. Phase |_=. Status |_=. Completion |_<. Latest updates |
|<. Conceptual design |=. In progress |=. 20% |<. 03-Oct-2023 Document updated (see history) |
|<. Mock ups/visualisation |=. - |=. - |<. - |
|<. Prototype coding |=. - |=. - |<. - |
|<. Final implementation |=. - |=. - |<. - |
|<. Testing/integration |=. - |=. - |<. - |

h2. Document history

v1.00 - 10-Apr-2021
* Outline added

v1.01 - 01-Aug-2021
* SWI and service call sections expanded

v1.02 - 18-Apr-2022
* Links to ARM ARM and most relevant forum threads added
* Notes on register widths and SWI immediate constants added
* Section headings for executable formats added (and moved podule loaders into it)

v1.03 - 19-May-2022
* Expanded the executable formats section

v1.04 - 11-Jun-2022
* Split off the long lists of SWIs and service calls into sub-pages to keep this document manageable

v1.05 - 23-Jul-2022
* Added consideration of 64 bit integer types in BBC BASIC

v1.06 - 24-Sep-2023
* Describe the 5 technical approaches considered, select a preferred approach
* Revise AIF header change due to lack of checking of reserved flags
* Add notes on how squeeze performs best on AArch32 due to its frequency tables

v1.07 - 30-Sep-2023
* Update the 'Goals' section to reflect the news of the Raspberry Pi 5 launch.

v1.08 - 03-Oct-2023
* Explain C language integer type size implications