RISC OS Open: Forum: ARM6x0 / 7x0 / 7500 erratum

Sep 28, 2017 4:00am

Jon Abbott (1421) 2608 posts

Does anyone have details of ARMv3 errata?

I’ve been testing the ADFFS JIT on a RiscPC and noticed some very odd behaviour on physical hardware compared to emulation. A list of errata might help track the issue down if it’s related to an erratum.

The issue is proving very hard to pin down, as it’s hard-locking the machine before I can get any reliable debug info on screen. Despite several days of testing, I’m not even sure whether the issue is in the JIT itself or in the translated code.

Sep 28, 2017 1:11pm

Jeffrey Lee (213) 6046 posts

I have a feeling there was one bug fixed with the StrongARM (the abort restart bug that prevented lazy task swapping from being stable?) that also applied to earlier architectures (except with ARMv3 RISC OS never tried to use lazy task swapping in the first place, due to the complexities of the late abort model). But I can’t find a source for this information.

I’ve now created a skeleton Notable CPU bugs wiki page, so if you do find anything useful please add it to there!

Sep 28, 2017 1:28pm

Rick Murray (539) 13425 posts

The page ought to list all pertinent CPU bugs, because otherwise tracking down errata means trawling through a bunch of disparate sources…

Sep 28, 2017 2:26pm

Jeffrey Lee (213) 6046 posts

That’s kind of what the purpose of the page is. List all the bugs that are pertinent to application developers, and provide links to documentation for other bugs. Attempting to list all the bugs would be foolish, because you’d just end up overloading people with information, and the page would be constantly going out of date as new errata documents are released.

However I am now wondering if the traditional “Warnings on the use of ARM assembler” title might have been a better fit, since there’s plenty of other stuff that’s worth mentioning as well (CLREX usage, when to use memory barriers, SWP vs. LDREX/STREX, etc.). Maybe it’s best that this page just sticks to bugs, and we can create a “Warnings…” page (or similar) which links to all of the other docs.

Sep 29, 2017 9:20am

Jon Abbott (1421) 2608 posts

I’ve now created a skeleton Notable CPU bugs wiki page, so if you do find anything useful please add it

“CPU Erratum” would have made more sense. I think I’ve posted most of the errata ADFFS fell foul of on here, with repros, so I’ll track them down and add them to that page at some point.

TLDR: Abort handling and 26/32bit neutral stub Modules

Back to my ARMv3 issue… Turning self-modifying code support off got Zarch “sort of” running under it, but loading a 26bit Module into the JIT RMA causes the OS to lock solid when you issue HELP COMMANDS at GOS – that’s very odd, as ADFFS doesn’t play any part in this; the OS is just stepping through the RMA Modules and displaying command help text.

The JIT’d code on 26bit ARMv3 is the same code that’s used on 26bit ARMv4, which works, so it’s not the JIT’d code, unless errata come into play, which I’d say is unlikely considering the Zarch test case. The only real differences between the JIT on ARMv3/ARMv4 are the TLB flush method and no cache flushing being required on ARMv3. To rule out the write buffer causing TLB issues, I changed the STR used for TLB changes to a SWP: no difference, so it’s unlikely to be TLB related. That would only affect self-modifying code anyhow, and wouldn’t cause the issues I’m seeing.

That leaves the Abort and IRQ handlers. The IRQ handler can probably be ruled out immediately, as it’s only dealing with IOMD IRQ redirection; it does some checks to see if a codelet was interrupted and preserves its state in case of re-entry, but little else.

The Abort handler is the most likely culprit, given how quickly things fail with self-modifying code support enabled. Remove the Page Zero protection and JIT Abort handler and Zarch runs normally, so there’s definitely an issue when handling Aborted writes, possibly related to Early/Late Abort handling, which differs between ARMv3 and ARMv4. I know Red Squirrel doesn’t match ARMv3 behaviour in this respect.

The 26/32bit neutral Module stub lock is a completely different issue though. I need to test loading a stub Module without the JIT to be certain they’re not triggering a bug elsewhere.

Sep 29, 2017 9:34am

Mike Freestone (2564) 129 posts

“CPU Erratum” would have made more sense. I think I’ve posted most of the errata ADFFS fell foul of on here, with repros, so I’ll track them down and add them to that page at some point.

Though if there are several, ‘CPU errata’ would make more sense.

Sep 29, 2017 10:03am

Jeffrey Lee (213) 6046 posts

The only real differences between the JIT on ARMv3/ARMv4 are the TLB flush method and no cache flushing being required on ARMv3.

One thing to be wary of is that since December last year RISC OS 5 has been using cacheable or bufferable page tables on all machines. So the correct procedure to modify a page table entry (on any machine) is:

Write new page table entry to memory
Drain write buffer
Invalidate relevant TLB entries

(plus whatever DSB/ISB sequences are relevant on ARMv6+; my memory is a bit fuzzy)

Sep 29, 2017 10:44am

Jon Abbott (1421) 2608 posts

RISC OS 5

I’m testing on 3.71 at the minute, but will be testing different CPU/OS combinations over the weekend.

the correct procedure to modify a page table entry

I don’t believe you can drain the write buffer on ARMv3, so a SWP instead of STR to the TLB, followed by a TLB purge should have the same effect?

Sep 29, 2017 11:10am

Jeffrey Lee (213) 6046 posts

I don’t believe you can drain the write buffer on ARMv3, so a SWP instead of STR to the TLB, followed by a purge should have the same effect?

Correct. (Currently the OS uses STR to write the entry, then does a SWP to a dummy word on the stack; changing it to do the page table write via SWP could be a useful optimisation)

Sep 29, 2017 11:14am

Jeffrey Lee (213) 6046 posts

And the above works because (in addition to bypassing the cache/write buffer) SWP first waits for the write buffer to drain before it performs its operation.

Sep 29, 2017 1:38pm

Jon Abbott (1421) 2608 posts

Although my problem is possibly related to Early/Late Abort handling, I coded and tested the Abort handler some 5 years ago to handle any variation of Abort model across single and multiple instructions. The only other thing the JIT relies on to handle self-modifying code and Page Zero writes is the FSR to detect page permission aborts; this is more likely to be where the problem is.

I need to get all the machines set up this weekend for the London Show, so I’ll test behaviour across all the ARMv3 CPUs I have to hand.

Sep 29, 2017 1:44pm

Jeffrey Lee (213) 6046 posts

The only other thing the JIT relies on to handle self-modifying code and Page Zero writes is the FSR to detect page access aborts, this is more likely to be where the problem is.

Possibly there’s an issue there with nested aborts. I know there have been a couple of problems there in RISC OS, where nested aborts were being triggered and corrupting the FAR/FSR before the parent abort handler had finished using it. It’s probably best to read the relevant registers once on entry to your handler and then never again.

Sep 29, 2017 8:00pm

Jon Abbott (1421) 2608 posts

I’ve just tested an ARM610, ARM700 and ARM710. The ARM610 works; the ARM700 and ARM710 both fail, so at least one erratum is affecting the JIT on the ARM7 macrocell.

Documentation on them would be useful at this point; I don’t fancy my chances of tracking them down any time soon. I have horrible memories of the 18 months it took me to find the errata affecting the JIT on the Iyonix!

Sep 29, 2017 9:03pm

Jeffrey Lee (213) 6046 posts

Jon: Might you be hitting this?

http://www.riscos.com/support/developers/prm/arm710warn.html#marker-296390

(although if it was “early versions” I would have hoped that it would have been fixed by the ARM710)

Sep 29, 2017 9:25pm

Jon Abbott (1421) 2608 posts

Might you be hitting this?

Unlikely, I’d say: application space will be both cacheable and bufferable. Useful to know, though.

Oct 2, 2017 7:53pm

Jon Abbott (1421) 2608 posts

loading a 26bit Module into the JIT RMA causes the OS to lock solid when you issue HELP COMMANDS at GOS

This turned out to be a known ARMv3 erratum, namely touching a banked register immediately after loading USER registers:

LDMIA R13, {R13, R14}^
MOV R0, R0                 ; NOP: don't touch a banked register straight after LDM ^
ADD R13, R13, #8

I’ve now tested Zarch on an ARM7500FE, which exhibits the same odd behaviour when run under the JIT, and I’ve confirmed the problem only occurs when Aborts are being raised (by writes to pages with code in them). The plot thickens, though: I dumped all registers into memory at every Abort, for a fixed number of Aborts, and then compared both main memory and the register dump against the same test run on an ARM610. They were consistent.

Based on that, I think it’s safe to assume there’s no USER register, flag or CPU mode corruption. I’m at a loss as to what to test next.

EDIT: I’ll add a hotkey to toggle write protection on code pages, so I can confirm it’s specifically related to Aborts being raised and not a knock-on effect of any having been raised. If toggling pages to read/write immediately fixes the problem, I know for certain it’s specific to Abort generation/exit state (I’ve already shown above that Aborted instructions are handled correctly). If it continues to behave oddly, then there’s been a knock-on effect, possibly on RISCOS.

The fact it only occurs on the ARM7 macrocell already implies it’s CPU related, but RISCOS was specifically patched for ARM7 in 3.60, so there’s a slim possibility it’s related to those changes.

Oct 4, 2017 10:35pm

Jon Abbott (1421) 2608 posts

I’ll add a hotkey to toggle write protection on code pages, so I can confirm it’s specifically related to Aborts being raised and not a knock-on effect of any having been raised

Toggling Aborts on causes instant issues and toggling them off immediately resolves them. This confirms there’s no lasting effect once an Abort has occurred, so I’ve now cast my focus over what’s happening after the Abort handler exits.

I’ve tested some other games, most of which work without issue, so it could be a combination of instructions around the Aborting STR/STM or a specific sequence of events that’s triggering the problem. I’ve also tried with the processor in both Early and Late Abort Mode, both of which exhibit the same issue.

Oct 6, 2017 10:21pm

Jon Abbott (1421) 2608 posts

I’ve finally tracked down the instruction sequence that’s failing. The final STR in the following sequence is triggering the Abort handler, which is failing to proxy the write as it believes the write is unnecessary:

STR R9,[R7],#4
STMIA R7!,{R0-R5,R8}
STR R7,[R14]

The sequence of events (SOE) is:

STR R9,[R7],#4 <— executed by CPU, R7=A000 prior to execution. After execution R7=A004
STMIA R7!,{R0-R5,R8} <— executed by CPU. After execution R7=A020
STR R7,[R14] <— triggers an Abort as page at R14 is read-only. Prior to execution, the value at [R14] is A000
Abort handler is entered
Abort handler decodes the instruction and loads the value at [R14]. [R14]=A020 according to the CPU, but is in fact still A000
As [R14] already appears to hold the value being written, it exits, resuming execution without writing anything to memory
LDR R0,[R14] <— [R14]=A000

The only way I can explain this behaviour is if the CPU has pre-staged the write generated by STR R7,[R14] in the write buffer, prior to generating the Abort and then drops it from the write buffer after the Abort is resumed.

I’ve found two workarounds:

Remove the “unnecessary write check” in the Abort handler and write the value regardless
Duplicate STR R7,[R14]

The Abort handler code that handles this is as follows:

R0 = Aborting instruction
R2 = Rd
R8 = Register dump
R10 = Address being written to that triggered the Abort

 TST     R0, #1 << 22                     ;word or byte?
 LDREQ   R1, [R8, R2, LSR #12 - 2]        ;word, load value being written
 #if Unaligned_Support == 1               ; |
   BICEQ   R10, R10, #%11                 ; |    ensure address is aligned
 #endif                                   ; |
 LDREQ   R12, [R10, #0]                   ; |+   load value being overwritten
 LDRNEB  R1, [R8, R2, LSR #12 - 2]        ;byte, load value being written
 LDRNEB  R12, [R10, #0]                   ; |+   load value being overwritten
 SUBS    R14, R12, R1                     ;has the value changed? R14=0 if not
 BEQ     JIT_AbortI_STR_memory_unchanged  ;NO, don't write to memory
 TST     R0, #1 << 22                     ;word or byte?
 STREQ   R1, [R10, #0]                    ;proxy the write
 STRNEB  R1, [R10, #0]
.JIT_AbortI_STR_memory_unchanged

Oct 6, 2017 11:33pm

Jeffrey Lee (213) 6046 posts

Interesting.

How are you handling the permissions for the page at R14? (user read-only, SVC read-write, cacheable+bufferable?) If it is bufferable, what happens if you make it non-bufferable?

What happens if you drain the write buffer on entry to your data abort handler?

What happens if less data is stored to R7 in the preceding instructions? e.g. replace the STR with an ADD R7,R7,#4. The write buffer is only 8 words, and in total the code is trying to store 9, so maybe the erratum is related to the CPU stalling until space is available for the final STR.

The ARM710 manual doesn’t actually say how MMU aborts interact with the write buffer, only that “If the MMU detects an access violation, it will do so before the external memory access takes place, and it will therefore inhibit the access”. So maybe there really is something sitting in the write buffer, which gets discarded later on (although if your abort handler has written a register dump, I would have expected that to be enough to flush anything else out of the buffer)

Oct 7, 2017 12:02am

Jeffrey Lee (213) 6046 posts

Some sleuthing: The ARM7 chips used in the RiscPC use an ARM710a core, and the ARM710a manual contains a warning in the “differences between ARM610 and ARM710a” section that:

Spurious addresses may be broadcast

In the case of an internally aborting access, a spurious address may be broadcast externally, but no access will be performed to this location. The memory system should ignore this address.

So it does seem pretty likely that the aborted write is held in the write buffer, but is flagged such that it won’t actually generate a write to memory (but the logic for peeking the write buffer is buggy and doesn’t ignore the NOP’d entry)

Meanwhile, the ARM7500 is based around the ARM710c, and although I can’t find any documentation for that core, the changelog for the ARM710a manual does say it was copied from the ARM710c manual. So yeah, good chance they both have the same write buffer.

Oct 7, 2017 7:03am

Jon Abbott (1421) 2608 posts

How are you handling the permissions for the page at R14?

It loads the L2PT entry and sets AP3..AP0 to %10; it doesn’t alter any other bits, so they’re left as RISCOS sets them for appspace.

What happens if you drain the write buffer on entry to your data abort handler?

See below

What happens if less data is stored to R7 in the preceding instructions? e.g. replace the STR with an ADD R7,R7,#4

Good question. I did place an ADD prior to the instructions to confirm R7 was correct after each instruction – at the time I assumed the first two instructions were also triggering Aborts and R7 wasn’t being updated by the Abort handler, however I soon realised they weren’t aborting when I watched the Abort count after each instruction. R7 was correct after each instruction, so I quickly ruled out registers being at fault.

The write buffer is only 8 words, and in total the code is trying to store 9, so maybe the erratum is related to the CPU stalling until space is available for the final STR.

Again, good question. I’ll confirm once I have a Repro outside of Zarch – testing inside a game isn’t ideal.

if your abort handler has written a register dump, I would have expected that to be enough to flush anything else out of the buffer

Indeed, in which case the likely culprit is the cache not the WB. It could be a combination of both, or even a discrepancy between the cache and WB.

I’m probably not going to spend a lot more time investigating this weekend, as I need to get all the machines ready and tested for the London Show. I needed to locate the cause so I knew for certain it wasn’t a bug in my code which might affect the JIT across all CPUs. As soon as I have all the show machines updated and soak testing, I’ll attempt to create a Repro to confirm whether the issue occurs with all Aborting STRs, or only with a particular sequence of instructions leading to an Aborting STR.

I did attempt to drain the write buffer (via SWP – is there another method?), both in the Abort handler and in the instruction prior to the Aborting STR. It lessened the fail rate, but didn’t cure it. Adding a lot of NOPs prior to the Aborting instruction also lessened the fail rate without curing it, as did placing an LDM R7,{R0} prior to it. Based on this, the complete SOE may be a lot longer than I’ve detailed above; for example, my STR/STM validation app does not cause the issue.

Oct 9, 2017 8:28pm

Jon Abbott (1421) 2608 posts

I’ve had a look at creating a Repro today, simply copying the instructions in question does not reproduce the issue, so there’s a much longer SOE involved.

Results from tests suggested:

Bufferability makes no difference
Cacheable off resolves the issue
Flushing the IDC in the Abort handler, prior to the “unnecessary write check” resolves the issue
Draining the write buffer prior to the “unnecessary write check” does not resolve the issue
The instruction count seen before the issue occurs is random
Adding additional writes prior to the failing STR does not resolve the issue
An LDR to the same cache line immediately prior to the failing STR does not resolve the issue
An LDR/STR pair to a different address in the same cache line immediately prior to the failing STR resolves the issue. This will however trigger another Abort so possibly not a valid test
Switching to the Aborting CPU mode and then back into ABT32 prior to the Abort handler’s “unnecessary write check” does not resolve the issue
Two SWPs to another word in the same cache line inside the Abort handler do not resolve the issue
Where the failing write occurs in the cache line makes no difference
Forcing the address to update by adding two SWPs to the address being checked resolves the issue:

 TST     R0, #1 << 22                     ;word or byte?
 #if Unaligned_Support == 1               ; |
   BICEQ   R10, R10, #%11                 ; |    align the address before the word SWPs
 #endif                                   ; |
 SWPEQ   R12, R12, [R10]                  ;force the cache line to update
 SWPEQ   R12, R12, [R10]                  ;second SWP puts the original value back
 SWPNEB  R12, R12, [R10]
 SWPNEB  R12, R12, [R10]
 LDREQ   R1, [R8, R2, LSR #12 - 2]        ;word, load value being written
 LDREQ   R12, [R10, #0]                   ; |+   load value being overwritten
 LDRNEB  R1, [R8, R2, LSR #12 - 2]        ;byte, load value being written
 LDRNEB  R12, [R10, #0]                   ; |+   load value being overwritten
 SUBS    R14, R12, R1                     ;has the value changed? R14=0 if not
 BEQ     JIT_AbortI_STR_memory_unchanged  ;NO, don't write to memory
 TST     R0, #1 << 22                     ;word or byte?
 STREQ   R1, [R10, #0]                    ;proxy the write
 STRNEB  R1, [R10, #0]
.JIT_AbortI_STR_memory_unchanged

Based on this, the cache is looking suspect following an STR that triggers a Page Access Abort. The CPU is sometimes performing the write to the cache line regardless of the page access permissions. What I can’t fathom is what triggers the write to be dropped from the cache, and whether that’s done by marking the cache line as invalid.

Oct 11, 2017 10:53pm

Jeffrey Lee (213) 6046 posts

I spotted that there’s a slightly different ARM710a doc here (DDI 0022D vs. DDI 0033D). And when looking at the “Differences between ARM610 and ARM710a” section, I spotted something that I missed earlier, hidden under the “enlarged cache” heading (even though it’s in both versions of the doc):

When an internal Abort occurs lines may be purged from Cache to remove invalid data.

Which sounds a lot like them saying “oh, by the way, you might need to flush the cache when handling data aborts to cacheable areas”.

Oct 11, 2017 11:13pm

Jeffrey Lee (213) 6046 posts

Also, what’s the CP15 id register value for the CPU(s) you’re seeing the problem on? The OS recognises a few different models of ARM7xx. Judging by the ID codes and the dates on various docs, the ARM710a is a newer model than the plain ARM710. So maybe it’s only some models where you need to flush the cache in the abort handler (since that sentence from the 710a doc could be interpreted as “older ARM710’s were buggy, this new one will flush the cache automatically” or “you must flush the cache”)

Oct 12, 2017 4:40am

Jon Abbott (1421) 2608 posts

When an internal Abort occurs lines may be purged from Cache to remove invalid data.

I did read that section before I posted, but as it’s light on detail, I assumed the CPU would purge lines updated by pipelined instructions, or undo a partially complete STM. The timing of the purge is certainly a factor here: it doesn’t occur until at least a few dozen instructions after the abort, and possibly isn’t even related to instruction counts or timings. I’ve certainly ruled out mode changes as the trigger.

The erratum in this case is that the CPU might ignore page access permissions and speculatively perform an STR to a read-only page, resulting in cache inconsistency. Although I have a workaround for that, I’d still like to figure out the two burning questions:

Under what condition does the CPU ignore page access permissions?
What triggers the purge of the invalid write from the cache?

I expect this issue is documented; ARM7 errata docs just aren’t publicly available.

Back to your original point, flushing the IDC at an Abort is probably the correct course of action; for performance however, it’s probably best to simply assume the cache line is inconsistent and act accordingly.

what’s the CP15 id register value for the CPU you’re seeing the problem on?

CPUs tested:

ARM700 – 41007003
ARM710 – 41007100
ARM7500FE – 41077100
