category: Specification

<div id="toc_heading"></div><div id="toc"></div>

h2. Goals

The main goals are:

# To allow applications which make use of standard threading APIs (e.g. C11 threads via SharedCLibrary, pthreads via GCC's UnixLib) to run in a threaded manner across multiple CPU cores (or just one core for single-core machines), while requiring minimal RISC OS-specific changes to be made to the applications
# To build a stable foundation on which future changes to the OS can be made which will improve the performance of threaded programs (preferably without requiring new versions of those programs to be released)
# To support multi-core use across all RISC OS machines which support the ARMv7MP architecture extension (which is all ARMv7+ multi-core machines the OS currently runs on)
# To achieve the above while minimising breakage for existing programs

There will also be some degree of support for using threads from modules (in an SMP manner), but this is likely to be in a more restrictive environment than that available for applications (e.g. limited SWIs available).

Older multi-core systems/solutions (e.g. the ARMv6K-based ARM11 MPCore CPU) are highly unlikely to be supported.

Getting the multi-core support working reliably is likely to involve making several missteps along the way, so the changes which allow the extra cores to be utilised are likely to be merged into the OS only towards the end of the project. However, other improvements can be drip-fed into the main sources as and when they're finalised.

h2. Existing documentation

h3. Relevant specifications

* "C11 concurrency support library":https://en.cppreference.com/w/c/thread (threads.h, stdatomic.h)
* "ISO/IEC 9899:2011":https://www.iso.org/standard/57853.html (aka C11)
* "pthreads specification":https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/pthread.h.html
* "DDI0406C.d":https://developer.arm.com/documentation/ddi0406/cd/?lang=en - ARMv7-A & -R architecture reference manual

h3. Relevant forum threads

* "Thinking ahead: Supporting multicore CPUs":/forum/forums/5/topics/406 (mostly historic discussion)

h2. Terminology

*Primary core* - the CPU core which boots first

*Auxiliary cores* - the extra CPU cores whose potential the multi-core changes aim to unlock (also referred to as "aux cores")

h2. Detail

h3. Initial HAL changes

New HAL APIs are required to allow the OS to perform the following tasks:

* Identify how many CPU cores are available
* Identify the current CPU core
* Bring up the aux cores
* Control how interrupts are routed to each core (or determine any fixed interrupt routing)

To support the above, the following new HAL entry points have been (re)defined:

* [[HAL_IRQProperties]] (#53)
* [[HAL_IRQSetCores]] (#54)
* [[HAL_IRQGetCores]] (#55)
* [[HAL_CPUCount]] (#56)
* [[HAL_CPUNumber]] (#57)
* [[HAL_SMPStartup]] (#58)

See "Kernel/Docs/SMP/HAL":https://gitlab.riscosopen.org/RiscOS/Sources/Kernel/-/blob/master/Docs/SMP/HAL and "Kernel/Docs/SMP/IRQ":https://gitlab.riscosopen.org/RiscOS/Sources/Kernel/-/blob/master/Docs/SMP/IRQ for more in-depth information about these calls and any changes to the specification of existing HAL APIs (i.e. expected SMP safety of different HAL entry points).

Several HALs have already been updated to support the above specification revisions, with the code merged into the main OS sources (BCM2835, OMAP4, OMAP5, iMx6). Of the other HALs in the main OS sources, the Titanium and PineA64 HALs are yet to be updated.

h3. More powerful exception recovery

Mutexes, semaphores, spinlocks, and other locking mechanisms are critical to making components MP-safe. If code crashes while a lock is held, there needs to be some way of unlocking it (and potentially taking other recovery actions) in order to avoid a deadlock when something next tries to use it.
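To make the hazard concrete, the following is a minimal sketch in portable C11, not RISC OS kernel code. It assumes a spinlock built on @atomic_flag@, and uses @longjmp@ as a stand-in for the OS abandoning the current call chain after a crash. All names (@demo_lock@, @faulty_critical_section@, @demo_lock_usable_after_crash@) are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical spinlock, used only for this illustration. */
static atomic_flag demo_lock = ATOMIC_FLAG_INIT;

/* Stand-in for "the OS abandons the call chain after a crash". */
static jmp_buf recovery_point;

/* Try once to take the lock; returns true if it was acquired. */
static bool demo_try_lock(void)
{
    return !atomic_flag_test_and_set_explicit(&demo_lock, memory_order_acquire);
}

static void demo_unlock(void)
{
    atomic_flag_clear_explicit(&demo_lock, memory_order_release);
}

/* Takes the lock, then "crashes" before reaching the unlock. */
static void faulty_critical_section(void)
{
    if (!demo_try_lock())
        return;
    longjmp(recovery_point, 1);  /* simulated crash: control never returns here */
    demo_unlock();               /* never reached */
}

/* Runs the faulty code, optionally performs a recovery step, and reports
 * whether the lock can still be taken afterwards. */
bool demo_lock_usable_after_crash(bool run_recovery)
{
    atomic_flag_clear(&demo_lock);       /* start from a known unlocked state */

    if (setjmp(recovery_point) == 0) {
        faulty_critical_section();
    } else if (run_recovery) {
        demo_unlock();                   /* the recovery action releases the lock */
    }

    bool usable = demo_try_lock();
    if (usable)
        demo_unlock();
    return usable;
}
```

Calling @demo_lock_usable_after_crash(false)@ leaves the lock permanently held, so a real caller spinning on it would deadlock. Passing @true@ shows that a single recovery action run during unwinding is enough to make the lock usable again.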
"Stack-based exception handlers" have been proposed as a solution for this ("forum thread":/forum/forums/3/topics/17152, "merge request":https://gitlab.riscosopen.org/RiscOS/Sources/Kernel/-/merge_requests/60), which allow privileged-mode code to push special exception handler nodes onto the stack. Whenever the OS resets the stacks (e.g. after an unhandled data abort) it will invoke those handlers to allow them to perform any necessary actions such as unlocking spinlocks. More iteration & dogfooding is needed before the code is ready to be merged, including checking to see how feasible it is to allow handlers to halt unwinding half-way (e.g. to shield foreground threads from crashes in interrupt handlers, as per RISC OS Select) h3. MP-safe OS_ClaimProcessorVector The current OS_ClaimProcessorVector API has flaws that cause problems for both single-core and multi-core use. "A new version of the API":/forum/forums/3/topics/16458 has been proposed and "implemented":https://gitlab.riscosopen.org/jlee/Kernel/-/commits/NewCPV but not yet merged in to the main sources. "See this forum post":/forum/forums/3/topics/16458?page=6#posts-131225 for details of the new API. Software which is using the old API will only have its handlers called for aborts/exceptions which occur on the primary core. Software using the new API will be called for aborts/exceptions that occur on any core on which the OS is running. To finish off these changes, the API design feedback needs reviewing, and any appropriate implementation changes need to be made. To allow the implementation to be merged into the main sources (and the pending changes for other components which have been modified to use the new API), it may be necessary to make the use of spinlocks optional in order to avoid any negative performance impacts on single-core machines (and the current single-core version of the OS). h3. 
MP-safe FPEmulator and VFPSupport Floating point support is a fundamental part of the execution environment which lots of software relies upon. Therefore MP-safe versions of FPEmulator and VFPSupport are important stepping stones to allowing every-day code to run across multiple cores. Building ontop of the new OS_ClaimProcessorVector API, MP-safe versions of these modules have been produced, but until the new OS_ClaimProcessorVector API has been finalised (and preferably merged) these modules can't have their changes merged: * "FPEmulator":https://gitlab.riscosopen.org/jlee/FPASC/-/commits/SMP * "VFPSupport":https://gitlab.riscosopen.org/jlee/VFPSupport/-/commits/SMP h3. The SMP module The SMP module defines and implements the low-level threading APIs (threads, mutexes, condition variables, etc.) which the OS exposes to applications & modules. Currently it's also responsible for bring-up of the aux cores (and the kernel which they run), process management, and thread scheduling. The initial version of the module was produced in 2017, providing a minimal threading environment for the aux cores. All development since then has taken place in the SMPthread branch, with the changes collected "in this pending merge request":https://gitlab.riscosopen.org/RiscOS/Sources/Programmer/SMP/-/merge_requests/1. This new version is significantly improved, but still incomplete. Missing features still need implementing, along with lots of code tidying, testing, hardening & bug fixing. The module also hooks itself into the kernel/OS in a very unsatisfying way - ideally some of the code from the SMP module (e.g. thread-safe IRQ & SWI dispatch) should be merged into/implemented within the kernel. h3. C11 threads in the Shared C Library This can be broken down into several smaller chunks of work: h4. Atomics support Core code has been implemented and merged in as "CLib 6.14":https://gitlab.riscosopen.org/RiscOS/Sources/Lib/RISC_OSLib/-/tags/RISC_OSLib-6_14. 
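For orientation, here is a minimal sketch of the kind of standard C11 atomics code this work is meant to support. It uses only @stdatomic.h@ and sticks to the @_Atomic()@ type-specifier spelling. The function and variable names (@record_event@, @update_max@, @event_count@) are hypothetical, and nothing here is RISC OS-specific. Whether a particular CLib build implements these operations as library calls or as inline instructions is not assumed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared counter, declared with the _Atomic() type-specifier form. */
static _Atomic(unsigned int) event_count = 0;

/* Atomically increment the counter and return the new value. */
unsigned int record_event(void)
{
    return atomic_fetch_add_explicit(&event_count, 1u, memory_order_relaxed) + 1u;
}

/* Atomically raise *highest to candidate if candidate is larger.
 * Returns true if this call stored the new maximum. */
bool update_max(_Atomic(unsigned int) *highest, unsigned int candidate)
{
    unsigned int seen = atomic_load_explicit(highest, memory_order_relaxed);

    while (candidate > seen) {
        /* On failure the current value is written back into 'seen',
         * so the loop condition is re-evaluated against fresh data. */
        if (atomic_compare_exchange_weak_explicit(highest, &seen, candidate,
                                                  memory_order_relaxed,
                                                  memory_order_relaxed))
            return true;
    }
    return false;
}
```

The compare-and-exchange loop in @update_max@ is the usual way to build a read-modify-write operation that the standard does not provide directly. On ARMv7 it would typically map onto an exclusive load/store pair, but that mapping is a compiler and library detail rather than something the calling code relies upon.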
Compiler extensions are required to support both @_Atomic@ and @_Atomic()@ (currently only @_Atomic()@ is supported), and to ensure that doubleword types are doubleword aligned. The recent addition to the compiler of the ACLE @__ARM_ARCH@ predefined macro could allow @stdatomic.h@ to be improved to use inline assembler instead of library calls when targeting new CPUs.

h4. Threading support on multi-core systems

Work has started on implementing the C11 thread APIs as a wrapper around the SMP module threading SWIs, but is incomplete.

h4. Threading support on single-core systems

Not yet started. It has not yet been decided whether this will take the form of a callback-based thread scheduler implemented within CLib (no extra dependencies required), or a single-core version of the SMP module (which already implements a callback-based scheduler for running the scheduler on the primary core).

h4. Thread-specific storage support

Not yet started. On ARMv7+ this is best done by using the CP15 thread ID registers. Supporting the @_Thread_local@ storage-class specifier will require the compiler to be extended.

h4. Thread-safety of the Shared C Library

Library entry points need checking and making thread-safe where appropriate.

h3. pthreads in GCC's UnixLib

UnixLib already contains a pthreads implementation that runs on single-core machines. This will need modifying to detect the SMP module and use it for threading, and UnixLib as a whole will need reviewing to ensure it's MP-safe where appropriate.

h3. This is just the tip of the iceberg

(Many more things need listing)

h2. Implementation progress

table(bordered).
|_<. Phase |_=. Status |_=. Completion |_<. Latest updates |
|<. Conceptual design |=. In progress |=. 0% |<. 26-Dec-2021 Document created |
|<. Mock ups/visualisation |=. - |=. - |<. - |
|<. Prototype coding |=. - |=. - |<. - |
|<. Final implementation |=. - |=. - |<. - |
|<. Testing/integration |=. - |=. - |<. - |

h2. Document history

v1.00 - 26-Dec-2021

* Outline added

v1.01 - 19-Jan-2023

* Goals, Existing documentation, and part of the Detail section filled in.