Using C in assembler components

If you have a component (application, module) written in C then it's relatively easy to add assembler routines to the code and make use of them. But if you have an assembler component then adding C routines to it is a more complicated prospect. There's no de facto standard method for adding C to an assembler component - each situation tends to be a little bit different, and so the suitability of different approaches will vary as well. This page aims to be a mostly informal guide to the problems that you'll face and some of the ways to work around them.

Note that this guide is mainly concerned with using code compiled by the ROOL DDE/Norcroft - other compilers are likely to have their own requirements.

*(WORK IN PROGRESS)*

<div id="toc_heading"></div><div id="toc"></div>

h2. The problems

First, a summary of the key problems you'll face:

* No shared C library.
** You'll have to write any basic library functions (malloc, strcpy, printf, etc.) yourself.
** You may also have to implement some internal functions that the compiler generates references to, e.g. division.
* No automatic pointer fixup (the compiler needs extra help to cope with the "relocatable" property of relocatable modules)
* No automatic allocation of workspace for read-write/zero-init data
* Hard to get access to the workspace struct that assembler modules use
* Care needed when calling into C to ensure APCS environment is set up correctly

Now details about each problem and how to solve it.

h3. No shared C library

There's no easy solution to this one. If you only need a small amount of functionality, consider writing your own implementations of the code (or 'borrow' the CLib implementations if the license conditions of the component you're working on allow it). If you need to use large amounts of CLib functionality - or if the component isn't very large (e.g. a single-file assembler module) - consider just rewriting everything in C.
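As a rough idea of what "writing your own" involves, here's a minimal sketch of a few freestanding replacements plus a shift-and-subtract unsigned division. The @my_@ prefix is hypothetical and only there so the sketch doesn't clash with a hosted C library if you try it on a desktop compiler. The division routine just shows the algorithm; the helper the compiler actually references has its own name and register-level calling convention, which aren't covered here.

```c
#include <stddef.h>
#include <stdint.h>

/* Minimal stand-ins for a few CLib routines (hypothetical names). */

void *my_memset(void *dst, int value, size_t n)
{
    unsigned char *d = dst;
    while (n--)
        *d++ = (unsigned char)value;
    return dst;
}

void *my_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}

size_t my_strlen(const char *s)
{
    const char *p = s;
    while (*p)
        p++;
    return (size_t)(p - s);
}

/* Binary long division (shift-and-subtract) for CPUs without a divide
   instruction. divisor must be non-zero. Not the compiler's own helper. */
uint32_t udiv_sketch(uint32_t dividend, uint32_t divisor, uint32_t *remainder)
{
    uint32_t quotient = 0, rem = 0;
    int bit;

    for (bit = 31; bit >= 0; bit--) {
        /* Bring down the next dividend bit. rem is below 2^31 before
           the shift, so this cannot overflow. */
        rem = (rem << 1) | ((dividend >> bit) & 1u);
        if (rem >= divisor) {
            rem -= divisor;
            quotient |= (uint32_t)1 << bit;
        }
    }
    if (remainder)
        *remainder = rem;
    return quotient;
}
```

Byte-at-a-time loops keep things simple; word-at-a-time versions are faster but need care with alignment.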

If you need to use large amounts of CLib functionality, and rewriting the component in C isn't an option, you might be able to write your own stubs to allow for linking against CLib. However that's beyond the scope of this guide.

h3. No automatic pointer fixup

In a linked program (whether that program is an application or a module), any references to code or data are typically represented in the form of absolute addresses. For applications this is fine, because all applications are loaded and executed from a base address of &8000. But modules are by definition relocatable, and so some address relocation is required. There are two address relocation schemes employed by compiled C code:

h4. Initialisation-time pointer fixup

With the @-RMF@ or @-util@ command line flag, the linker will automatically generate a special __RelocCode function which will perform pointer fixup on any pointers contained within the image. For a standard C module this is called by the module header/stubs produced by CMHG, as part of the module's initialisation entry point. For assembler modules (or for Utility files, for which there are no standard stubs) you'll need to call __RelocCode manually.

Notes:

* In addition to the standard APCS temporary/callee-save registers, __RelocCode will also corrupt fp.
* Because ROM modules are statically linked to a fixed address, ordinarily no __RelocCode function is generated. However, there's no need to make the call conditional on your build config - if the linker spots a reference to __RelocCode in a build that doesn't need relocation, it will generate a dummy function that does nothing.
* __RelocCode isn't unique to C code - assembler-only programs can also benefit from the relocation ability (e.g. if the binary needs to contain absolute addresses of code/data)
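The effect of initialisation-time fixup can be pictured with a small model: some words in the image hold absolute addresses calculated for the link-time base, and fixup adds the difference between the actual base and the link-time base to each of them. This is a hypothetical sketch for illustration only. The real __RelocCode and its relocation data are generated by the linker, and their format isn't described here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of initialisation-time pointer fixup.
   image       - the loaded image, viewed as 32-bit words
   reloc_words - indices of the words that hold absolute addresses
   link_base   - base address the linker assumed
   load_base   - address the image was actually loaded at */
void relocate_image(uint32_t *image, const size_t *reloc_words, size_t count,
                    uint32_t link_base, uint32_t load_base)
{
    /* Unsigned arithmetic wraps, so this also works when load_base < link_base. */
    uint32_t delta = load_base - link_base;
    size_t i;

    for (i = 0; i < count; i++)
        image[reloc_words[i]] += delta;
}
```

Words that aren't listed (instructions, plain data) are left untouched, which is why the linker has to record exactly which words contain addresses.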

h4. Runtime pointer fixup of writable data

The compiler and linker organise writable data so that it's stored in one block of memory, similar to how assembler modules manage their writable workspace. In an assembler module, R12 is typically used as a pointer to the base of this workspace, and indexed addressing is used to access individual variables. For C, things are a bit different:

* APCS doesn't include support for using a specific register to refer to the base address of a program's workspace. So the compiler can't assume that a certain register always contains the module workspace address.
* Because the size of a program's workspace can only be determined at link time, indexed addressing with load/store instructions can't be used - by the time the program is linked, there's no guarantee that all the required variables will be within the 4KB immediate offset range supported by LDR/STR.

Norcroft uses the following strategy to solve these problems:

* As far as the linker is concerned, writable data for a module is handled exactly the same as for applications. I.e. it's placed in either the read-write or zero-init section of the executable, and is assigned an 'absolute' address at link time.
* In order to relocate that absolute address to point to the runtime location of the workspace, a relocation offset is stored in a reserved area of the stack. The stack is used for this purpose because APCS defines R10 as being the stack limit (SL) register - therefore SL is always going to be passed into every function, and it will never have its value changed (since the privileged-mode stacks are all fixed size and the OS doesn't support extending them).
** Note that code compiled with -zps1 (which disables stack limit checking/stack extension) will still treat SL as a special register, preventing it from being used for any other purpose. However, switching to an APCS variant which doesn't use SL (e.g. -APCS /noswst) will cause SL to be treated as a general-purpose register and will prevent runtime relocation offsets from being used (cc 5.71 will generate a suitable compiler error if /noswst code tries to use relocation offsets).

Standard modules actually use two relocation offsets: the library relocation offset (used for relocating CLib's internal workspace) and the module/client relocation offset (used for relocating your program's workspace). These are at SL-540 and SL-536 respectively (and SL itself is nominally set at 560 bytes above the actual end of the stack). But when calling C from assembler, there are three important things to realise:

# You only need one relocation offset (since you won't be using CLib)
# The relocation offset isn't hardcoded in the compiler - it's determined at link time from the value of a symbol which you can define in your program (_Mod$Reloc$Off)
# As long as you avoid calling any code which uses stack limit checking/stack extension, SL doesn't have to point to the stack (in practical terms, this just means making sure stack limit checking is disabled when compiling your code, and that you only link to libraries that have been compiled in the same way)
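To make the mechanism concrete, here's a hypothetical C model of what the compiled code does when it needs the address of a writable static variable: read the relocation offset from a fixed position relative to SL and add it to the variable's link-time address. @RELOC_SLOT@ stands in for the value the linker takes from _Mod$Reloc$Off. In reality this is a couple of instructions emitted inline by the compiler, not a function call.

```c
#include <stdint.h>

/* Byte offset from SL of the word holding the relocation offset
   (hypothetical stand-in for _Mod$Reloc$Off). 0 matches the
   C-in-assembler scheme where SL points straight at the offset;
   standard C modules keep their client relocation offset at SL-536. */
#define RELOC_SLOT 0

/* Model of an access to a writable static in -zM code:
   runtime address = link-time address + relocation offset read via SL. */
uintptr_t runtime_address(uintptr_t link_address, const unsigned char *sl)
{
    const intptr_t *slot = (const intptr_t *)(sl + RELOC_SLOT);
    return link_address + (uintptr_t)*slot;
}
```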

In practical terms, this means that you can use the following approach when calling C from assembler:

# During initialisation, calculate the relocation offset and store it as part of your workspace ("example":https://www.riscosopen.org/viewer/view/apache/RiscOS/Sources/Programmer/Debugger/s/Debugger?rev=4.33#l637)
# Export _Mod$Reloc$Off as being 0 ("example":https://www.riscosopen.org/viewer/view/apache/RiscOS/Sources/Programmer/Debugger/s/CGlue?rev=4.1#l21). There's little point using any other value (TODO - check if negative values work)
# Before calling C, set SL to be a pointer to your CRelocOffset value (i.e. @ADD SL,R12,#:INDEX:CRelocOffset@)
# Make sure your C code is compiled with -zps1 to prevent the stack limit checks from becoming confused about the unusual SL value

By storing the relocation offset in your workspace, you also have a handy way of getting at your module workspace if the C code calls back into the assembler: @SUB R12,SL,#:INDEX:CRelocOffset@. You could even take this one step further and store the relocation offset as the first entry in your workspace, and use R10 as your workspace pointer instead of R12.
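The same SL-to-workspace conversion can be written in C with @offsetof@. A minimal sketch, assuming a hypothetical workspace layout (in a real module it has to match the assembler's definition). Actually reading SL from C needs inline assembler or a small assembler helper, so here the SL value is simply passed in.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical workspace layout; must match the assembler definition. */
typedef struct workspace {
    uint32_t flags;
    int32_t  CRelocOffset;  /* relocation offset used by the C code */
    uint32_t buffer;        /* 32-bit address on the target */
} workspace_t;

/* C equivalent of ADD SL,R12,#:INDEX:CRelocOffset - the value SL must
   hold before calling into the C code. */
void *sl_for_workspace(workspace_t *ws)
{
    return &ws->CRelocOffset;
}

/* C equivalent of SUB R12,SL,#:INDEX:CRelocOffset - recover the
   workspace pointer from the SL value. */
workspace_t *workspace_from_sl(void *sl)
{
    return (workspace_t *)((char *)sl - offsetof(workspace_t, CRelocOffset));
}
```

If CRelocOffset is made the first member, as suggested above, the offset becomes 0 and SL is the workspace pointer itself.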

Note that even if your program marks all static-storage variables (i.e. 'static' variables and any variables defined outside of a function) as 'const' you'll probably find that the compiler has stored some of them in either the read-write or zero-init areas of the image and is attempting to use runtime relocation offsets to access them. This is because the -zM switch changes the way the compiler assigns variables to areas (non-module code will have all const data placed in the read-only data region), although the exact reason why this change occurs is unknown ("there's some related discussion here":https://www.riscosopen.org/forum/forums/4/topics/2299#posts-28675)

h3. No automatic allocation of workspace for read-write/zero-init data

Related to the above, if you're not using the standard CLib stubs then your module won't have any workspace allocated for its non-const C data. However it should be quite straightforward to do this yourself:

# Allocate a block of memory large enough for the read-write & zero-init sections (use the special Image$$RW$$Base, Image$$RW$$Limit, Image$$ZI$$Base, Image$$ZI$$Limit symbols to determine the locations/sizes)
# Copy in the initial read-write data from the module image
# Zero-initialise the zero-init area
# Set the relocation offset accordingly, i.e. to the address of the new block minus Image$$RW$$Base
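The steps above can be sketched in C as follows. This is a hypothetical sketch: @rw_base@, @zi_base@ and @zi_limit@ stand for the Image$$RW$$Base, Image$$ZI$$Base and Image$$ZI$$Limit linker symbols (passed in as parameters so the sketch stays portable), and it assumes the zero-init area directly follows the initialised read-write data. Claiming the block itself (e.g. from the RMA) is left to the caller.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch of setting up RW/ZI data without the CLib stubs.
   'block' must already be claimed and hold at least zi_limit - rw_base
   bytes. Assumes ZI directly follows the initialised RW data.
   Returns the value to store as the relocation offset. */
intptr_t setup_c_data(unsigned char *block,
                      const unsigned char *rw_base,
                      const unsigned char *zi_base,
                      const unsigned char *zi_limit)
{
    size_t init_size  = (size_t)(zi_base - rw_base);   /* initialised RW data */
    size_t total_size = (size_t)(zi_limit - rw_base);  /* RW + ZI */
    size_t i;

    /* Copy the initial read-write data out of the module image. */
    for (i = 0; i < init_size; i++)
        block[i] = rw_base[i];

    /* Zero the zero-init area, keeping its link-time distance from the
       RW base so that a single offset relocates both areas. */
    for (i = init_size; i < total_size; i++)
        block[i] = 0;

    /* Runtime address = link-time address + this offset. */
    return (intptr_t)((uintptr_t)block - (uintptr_t)rw_base);
}
```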

h3. Hard to get access to the workspace struct that assembler modules use

There are actually two problems here - the first is that you don't have a pointer to the assembler workspace, the second is that you don't know the layout of the workspace.

h4. Workspace pointer

For a module, if you're using the "point SL at assembler workspace" trick mentioned above then you could write a small C function which uses inline assembler to convert SL back to a workspace pointer. Or you could have a small assembler function which does the same.

Another alternative (mainly used in HAL code) is to use the __global_reg storage specifier to permanently bind a register to a workspace pointer. But this will reduce the number of registers which the compiler is able to use for general operations.

Lastly, you could ensure that any C functions which require assembler workspace access are explicitly given pointers to the workspace (or to bits of the workspace which they need) - although this could involve refactoring lots of code, it may make things easier when dealing with the second problem of not knowing the layout of the assembler workspace.

h4. Workspace layout

The RISC OS build system has a Hdr2H tool which is able to convert definitions found in assembler 'header' files to C equivalents. However it's mainly geared towards numeric constants, rather than struct layouts - after all, there is no formal method for declaring a struct with typed members using objasm.

So although you might be able to use Hdr2H to get some details of your assembler workspace into C, it's far from a complete solution. Existing "C in assembler" components where the C needs access to the assembler workspace seem to fall back to just having two copies of the workspace definition, one in C and one in assembler, and requiring developers to manually keep the definitions in sync. Rearranging the workspace so that all the members which C requires are at the start can help with this - the C workspace struct can then avoid defining the assembler-only members.
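If you do end up with two copies, the C copy can at least be made to fail loudly when it drifts. A minimal sketch with a hypothetical workspace: the expected assembler offsets are written next to the C struct and checked at compile time using the negative-array-size trick, which needs nothing beyond C89. This only protects the side whose offsets you've copied; if the assembler offsets can be brought into C as numeric constants, the checks can compare against those instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical assembler workspace, declared in objasm as:
 *                  ^       0, R12
 *   Flags          #       4
 *   CRelocOffset   #       4
 *   BufferPtr      #       4
 *   ...assembler-only members follow...
 * Only the leading, C-visible members are mirrored here.
 */
typedef struct {
    uint32_t Flags;
    int32_t  CRelocOffset;
    uint32_t BufferPtr;     /* 32-bit address on the target, not a C pointer */
} c_workspace_t;

/* Fails to compile (negative array size) if cond is false. */
#define WS_CHECK(name, cond) typedef char ws_check_##name[(cond) ? 1 : -1]

WS_CHECK(flags,  offsetof(c_workspace_t, Flags)        == 0);
WS_CHECK(reloc,  offsetof(c_workspace_t, CRelocOffset) == 4);
WS_CHECK(buffer, offsetof(c_workspace_t, BufferPtr)    == 8);
```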

A more reliable solution to this problem might be a H2Hdr tool which is able to parse (simple) C structure definitions and convert them to equivalent assembler definitions. Or, since objasm's macros are quite powerful, it might be possible to implement a macro-based solution (place all the structure and constant definitions in their own file, using a special syntax; use different macros in C and assembler to interpret that file and produce the appropriate definitions).

Another solution - albeit a very disruptive one for existing projects - would be to avoid using a fixed workspace definition altogether. All your workspace becomes exported C variables, and the assembler code imports those variables and applies the necessary relocation offsets. Objasm macros would ease the implementation of such a system, but it's probably a bit too nasty to be worth implementing.

h3. Care needed when calling into C to ensure APCS environment is set up correctly

Compared to some of the other problems listed here, this one's pretty straightforward to solve. But if you find that lots of places need to call lots of (different) bits of C then it might become a bit of a hassle.

Before calling a C function:

* Make sure function arguments are in the correct registers. For simple functions (bool/char/short/int/pointer arguments, bool/char/short/int/pointer result) the rules are pretty straightforward (see below). For more complex functions you should probably check the APCS spec.
** The first four arguments go in a1-a4 (R0-R3)
** Additional arguments are pushed onto the stack
** Bools should be 0 or 1
* It's customary, but perhaps not necessary, to set FP to 0. This will stop anything which attempts to unwind APCS stack frames from trying to unwind random garbage once it hits your assembler code.
* Make sure SL is set correctly (see the section about relocation offsets)

After calling a C function:

* Remember that a1-a4 (R0-R3) and ip (R12) will have been corrupted. A bool/char/short/int/pointer result will be returned in a1.
* Remember to discard any words of stack that were used for passing in extra parameters

If lots of places need to call into the same C function, it's probably worth creating a small assembler wrapper function which performs the above actions.
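As a sketch of the rules above, here's a hypothetical five-argument C function, with one possible objasm call sequence shown in the comments. The function name and register choices are assumptions. The sequence also assumes R14 (and anything else your own caller expects preserved) has already been saved, since @BL@ overwrites R14.

```c
/* Hypothetical C routine taking five arguments: the first four arrive in
   a1-a4 (R0-R3), the fifth is read from the caller's stack. The body is
   just a placeholder. */
int c_handler(int reason, int a, int b, int c, int extra)
{
    return reason + a + b + c + extra;
}

/* A matching objasm call sequence, assuming R12 holds the module
 * workspace pointer and the fifth argument is in R4:
 *
 *      STR     R4, [SP, #-4]!                  ; push fifth argument
 *      MOV     FP, #0                          ; stop APCS unwinding here
 *      ADD     SL, R12, #:INDEX:CRelocOffset   ; SL -> relocation offset
 *      BL      c_handler                       ; result in R0
 *      ADD     SP, SP, #4                      ; discard stacked argument
 *
 * R0-R3 and R12 are corrupted by the call, so reload R12 (e.g.
 * SUB R12, SL, #:INDEX:CRelocOffset) before using the workspace again.
 */
```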

h2. Using C in the HAL or kernel

Using C in the HAL or kernel is in some ways more straightforward than using it from modules. The HAL and kernel aren't relocatable, so a lot of the problems related to code/data relocation go away.

h3. Standard HAL setup

Don't specify -zM to the compiler. Instead, specify an APCS variant that lacks software stack checking (e.g. -APCS 3/32bit/nofp/noswst). To provide the C code with access to the HAL workspace, declare the HAL workspace as a struct, and in a common header use @__global_reg(6) halworkspace_t *sb;@ to declare that v6 (i.e. "SB in the HAL calling standards":https://www.riscosopen.org/wiki/documentation/show/HAL%20calling%20standards) is a pointer to the workspace struct.
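A minimal sketch of that setup, with a hypothetical workspace layout and a hypothetical entry point name. It assumes Norcroft predefines @__CC_NORCROFT@; the @#else@ branch exists only so the sketch can be tried on another compiler, where @sb@ becomes an ordinary global rather than a bound register.

```c
#include <stdint.h>

/* Hypothetical HAL workspace layout; the real one must match the
   assembler definition of the HAL workspace. */
typedef struct halworkspace {
    uint32_t timer_base;    /* e.g. a peripheral base address */
    uint32_t ticks_per_us;
} halworkspace_t;

#ifdef __CC_NORCROFT
/* v6 (R9) is SB in the HAL calling standards. */
__global_reg(6) halworkspace_t *sb;
#else
halworkspace_t *sb;         /* desktop stand-in for the bound register */
#endif

/* A HAL function written directly in C (hypothetical name). Because the
   HAL calling standards are APCS-based, assembler code can call it like
   any other HAL routine, with SB already set up. */
uint32_t HAL_ExampleTicksForMicroseconds(uint32_t us)
{
    return us * sb->ticks_per_us;
}
```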

Because the HAL calling standards have been based on APCS, this means that you can implement HAL functions directly in C - no assembler wrappers are required. It also means assembler HAL code can call into C HAL code (and vice-versa) without any special interfacing required. HAL C code can also call OS entry points without any special interfacing. However, you do need to make sure that if a piece of code (whether C or assembler) uses HAL workspace, it's never called from C code which doesn't have the __global_reg definition compiled in. Otherwise the calling function may have changed SB from its original value, since without the definition the compiler is free to use v6 as an ordinary register.

Because the HAL image isn't relocatable there's no need to call __RelocCode on startup.

Because we aren't specifying -zM, const data should be correctly placed in the const data section. However, read-write or zero-init data will still be problematic; the initial read-write data will still be output as part of the image, but there won't be any runtime code for relocating references to it, and the ROM image will be read-only in memory so there's no way the HAL can write to it. So you will need to make sure that all non-const data required by the C code is placed in the HAL workspace.

h3. Using C in the kernel

Currently there isn't any C code in the kernel, so there's nothing to use as a reference. However, it's possible to theorise how C code could be introduced.

Similar to the HAL, the code should be compiled with -APCS 3/32bit/nofp/noswst (and no -zM). Unlike the HAL or modules where the writable workspace can be at any location, the kernel generally uses workspace locations that are fixed at build time. This avoids the need for using __global_reg to reserve a register for pointing to the workspace. Instead, #defines could be used to cast (hardcoded) addresses to pointers to the various workspace structs.
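A sketch of the @#define@ approach. The address and layout are purely hypothetical and don't correspond to any real kernel workspace.

```c
#include <stdint.h>

/* Hypothetical fixed-address kernel workspace block. */
typedef struct kernel_ws {
    uint32_t flags;
    uint32_t counter;
} kernel_ws_t;

/* Hypothetical build-time address of the block. */
#define KERNEL_WS_ADDR  0xFAFF0000u

/* Every access goes through a constant address, so no register has to
   be reserved as a workspace pointer. */
#define KernelWS  ((kernel_ws_t *)(uintptr_t)KERNEL_WS_ADDR)

void kernel_count_event(void)
{
    KernelWS->counter++;
}
```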

Calling the HAL from C code would require a small piece of glue logic to set up SB correctly - the inline assembler syntax should be adequate for this.

h3. Using C in pre-MMU code

Because the C compiler cannot generate read-only position-independent code, extreme care will be needed if using C prior to MMU activation - none of the code/data will be at the location that the binary expects it to be at. Generally, anywhere where the compiler obtains the address of some code/data will suffer from problems - e.g. getting a pointer to some const data or getting a pointer to a function.

h2. C implementations of HAL devices

Although HAL functions can be implemented in C, implementing HAL devices in C (whether in the HAL or a module) is a bit trickier.

h3. In the HAL

HAL functions are called with SB set to the HAL workspace pointer, but HAL devices aren't. For assembler HAL devices this is usually tackled by storing a pointer to the workspace in the HAL device's structure (which will always be passed in to the device calls in R0/A1). For C HAL devices, if @__global_reg(6) halworkspace_t *sb;@ is used to bring in SB, you might be tempted to simply assign 'sb' to the workspace pointer at the start of the device function. This will correctly set up sb (i.e. R9) to point to the workspace, but the compiler won't restore the old value of R9 on exit. So care will be needed to save and restore the value of the register/variable for each of your device entry points.
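A minimal sketch of saving and restoring @sb@ in C around a device entry point. The device structure, member names and entry point are hypothetical (the standard HAL device header fields are omitted), and the @__CC_NORCROFT@ guard is the same assumption as before. Every return path has to restore @sb@, and it's worth checking the generated code to confirm the compiler does what you expect.

```c
#include <stdint.h>

typedef struct halworkspace {
    uint32_t activations;
} halworkspace_t;

#ifdef __CC_NORCROFT
__global_reg(6) halworkspace_t *sb;
#else
halworkspace_t *sb;         /* desktop stand-in for the bound register */
#endif

/* Hypothetical C HAL device; the standard HAL device fields would come
   first and are omitted here. */
typedef struct my_device {
    halworkspace_t *ws;     /* filled in when the device is registered */
} my_device_t;

/* Device entry point: the device pointer arrives in a1 (R0). */
int my_device_activate(my_device_t *dev)
{
    halworkspace_t *saved_sb = sb;   /* caller's R9 */
    int result;

    sb = dev->ws;                    /* point SB at the HAL workspace */
    sb->activations++;               /* ...work that needs the workspace... */
    result = 1;

    sb = saved_sb;                   /* compiler won't restore R9 for us */
    return result;
}
```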

For device entry points which only use a1-a4 for receiving arguments it should be possible to write a small objasm macro that generates a stub assembler function to wrap the C function. Entry points which require stacked arguments will require more work - the arguments may have to be copied to below the R9 & R14 values the stub pushes onto the stack.

Of course if you're not going to call any HAL code which requires sb, you could always forgo defining and setting the workspace pointer.

h3. In modules

If -zM is in use then you'll almost certainly need to use assembler stub wrappers for all the device entry points.

For "C in assembler" modules these would load the SL value from the device structure and set FP to 0 as described in previous sections.

For regular C modules, it may be tempting to use a CMHG veneer to handle setting up the relocation offsets on the stack (with an assembler stub to first get the module workspace pointer into R12) - but this will have the downside that (in C) all your device entry points will receive their arguments via _kernel_swi_regs rather than as standard function arguments. It will also prevent you from having any functions which use stacked arguments - since the CMHG veneers don't expose the original stack pointer. So instead, you're likely to want to use custom assembler veneers to set up the C relocation offsets. Similar to the case of C HAL devices in the HAL, it should be straightforward to use objasm macros to generate the required code for you, e.g. like "this":https://www.riscosopen.org/viewer/view/mixed/RiscOS/Sources/HWSupport/BCMSupport/s/asm?rev=1.1;hideattic=0#l67