ARM6x0 / 7x0 / 7500 erratum
Pages: 1 2
Jon Abbott (1421) 2608 posts |
Does anyone have details of ARMv3 errata? I’ve been testing the ADFFS JIT on a RiscPC and noticed some very odd behaviour on physical hardware compared to emulation; a list of errata might help track the issue down if it’s related to an erratum. The issue is proving very hard to pin down as it’s hard-locking the machine before I can get any reliable debug info on screen. I’m not even sure if the issue is in the JIT itself or the translated code, despite several days of testing. |
Jeffrey Lee (213) 6046 posts |
I have a feeling there was one bug fixed with the StrongARM (the abort restart bug that prevented lazy task swapping from being stable?) that also applied to earlier architectures (except with ARMv3 RISC OS never tried to use lazy task swapping in the first place, due to the complexities of the late abort model). But I can’t find a source for this information. I’ve now created a skeleton Notable CPU bugs wiki page, so if you do find anything useful please add it there! |
Rick Murray (539) 13425 posts |
The page ought to list all pertinent CPU bugs, because otherwise tracking down errata would be a bunch of disparate sources… |
Jeffrey Lee (213) 6046 posts |
That’s kind of what the purpose of the page is. List all the bugs that are pertinent to application developers, and provide links to documentation for other bugs. Attempting to list all the bugs would be foolish, because you’d just end up overloading people with information, and the page would be constantly going out of date as new errata documents are released. However I am now wondering if the traditional “Warnings on the use of ARM assembler” title might have been a better fit, since there’s plenty of other stuff that’s worth mentioning as well (CLREX usage, when to use memory barriers, SWP vs. LDREX/STREX, etc.). Maybe it’s best that this page just sticks to bugs, and we can create a “Warnings…” page (or similar) which links to all of the other docs. |
Jon Abbott (1421) 2608 posts |
“CPU Erratum” would have made more sense. I think I’ve posted most of the errata ADFFS fell foul of on here, with repros, so will track them down and add them to that page at some point. TLDR: Abort handling and 26/32bit neutral stub Modules.

Back to my ARMv3 issue… Turning self-modifying code support off got Zarch “sort of” running under it, but loading a 26bit Module into the JIT RMA causes the OS to lock solid when you issue HELP COMMANDS at GOS. That’s very odd as ADFFS doesn’t play any part in this; the OS is just stepping through the RMA Modules and displaying command help text.

The JIT’d code on 26bit ARMv3 is the same code that’s used on 26bit ARMv4, which works, so it’s not the JIT’d code, unless errata come into play, which I’d say is unlikely considering the Zarch test case. The only real differences between the JIT on ARMv3/ARMv4 are the TLB flush method and no cache flushing being required on ARMv3.

To rule out the write buffer causing TLB issues, I changed the STR for TLB changes to a SWP. No difference, so it’s unlikely to be TLB related; this would only affect self-modifying code anyhow and wouldn’t cause the issues I’m seeing.

That leaves the Abort and IRQ handlers. The IRQ handler can probably be immediately ruled out as it’s only dealing with IOMD IRQ redirection; it does some checks to see if a codelet was interrupted and preserves its state in case of re-entry, but little else.

The Abort handler is the most likely culprit, given enabling self-modifying code support fails so quickly. Remove the Page Zero protection and JIT Abort handler and Zarch runs normally, so there’s definitely an issue when handling Aborted writes, possibly related to Early/Late Abort handling, which is different between ARMv3/v4, and I know Red Squirrel doesn’t match ARMv3 behaviour in this respect.

The 26/32bit neutral Module stub lock is a completely different issue though. I need to test loading a stub Module without the JIT to be certain they’re not triggering a bug elsewhere. |
Mike Freestone (2564) 129 posts |
Though if there are several, ‘CPU errata’ would make more sense. |
Jeffrey Lee (213) 6046 posts |
One thing to be wary of is that since December last year RISC OS 5 will be using cacheable or bufferable pagetables on all machines. So the correct procedure to modify a page table entry (on any machine) is:
(plus whatever DSB/ISB sequences are relevant on ARMv6+; my memory is a bit fuzzy) |
Jon Abbott (1421) 2608 posts |
I’m testing on 3.71 at the minute, but will be testing different CPU/OS combinations over the weekend.
I don’t believe you can drain the write buffer on ARMv3, so a SWP instead of STR to the TLB, followed by a TLB purge should have the same effect? |
Jeffrey Lee (213) 6046 posts |
Correct. (Currently the OS uses STR to write the entry, then does a SWP to a dummy word on the stack; changing it to do the page table write via SWP could be a useful optimisation) |
Jeffrey Lee (213) 6046 posts |
And the above works because (in addition to bypassing the cache/write buffer) SWP first waits for the write buffer to drain before it performs its operation. |
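The behaviour described above can be pictured with a toy model. This is only a sketch in Python, not ARM code: the class and method names are hypothetical, and the only hardware behaviour it assumes is what the posts above state, namely that a buffered STR may not have reached memory yet, while SWP waits for the buffer to drain and then accesses memory directly.

```python
from collections import deque


class ToyMemory:
    """Hypothetical model of RAM fronted by a write buffer (illustration only)."""

    def __init__(self):
        self.ram = {}                # address -> value actually in memory
        self.write_buffer = deque()  # pending (address, value) writes

    def str_buffered(self, addr, value):
        # A bufferable STR: the write is queued and may not be in RAM yet.
        self.write_buffer.append((addr, value))

    def drain(self):
        # Commit every pending write to RAM in order.
        while self.write_buffer:
            addr, value = self.write_buffer.popleft()
            self.ram[addr] = value

    def swp(self, addr, value):
        # SWP as described in the thread: drain first, then swap unbuffered.
        self.drain()
        old = self.ram.get(addr, 0)
        self.ram[addr] = value
        return old


mem = ToyMemory()
mem.str_buffered(0x100, 0xAA)    # page table write via STR (still pending)
pending = 0x100 not in mem.ram   # True: not yet visible in RAM
mem.swp(0x200, 0)                # dummy SWP forces the earlier write out
old = mem.swp(0x100, 0xBB)       # or write the entry with SWP directly
```

In this model both approaches leave memory up to date before a subsequent TLB purge; writing the entry with SWP simply saves the separate dummy access.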
Jon Abbott (1421) 2608 posts |
Although my problem is possibly related to Early/Late Abort handling, I coded and tested the Abort handler some 5 years ago to handle any variation of Abort model across single and multiple instructions. The only other thing the JIT relies on to handle self-modifying code and Page Zero writes is the FSR to detect page permission aborts; this is more likely to be where the problem is. I need to get all the machines set up this weekend for the London Show, so will test behaviour across all the ARMv3 CPUs I have to hand. |
Jeffrey Lee (213) 6046 posts |
Possibly there’s an issue there with nested aborts. I know there have been a couple of problems there in RISC OS, where nested aborts were being triggered and corrupting the FAR/FSR before the parent abort handler had finished using it. It’s probably best to read the relevant registers once on entry to your handler and then never again. |
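To illustrate the "read once on entry" advice, here is a small Python simulation rather than real handler code. Everything in it is hypothetical: the `FaultRegs` class stands in for the FSR/FAR, `nested_abort` stands in for a second abort taken while the first is still being handled, and the register values are arbitrary examples.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class FaultRegs:
    """Toy stand-in for the fault registers (hypothetical model)."""
    fsr: int = 0  # Fault Status Register
    far: int = 0  # Fault Address Register


def nested_abort(regs: FaultRegs) -> None:
    # A second abort overwrites both registers with the new fault's details.
    regs.fsr = 0x5
    regs.far = 0x0


def handler_rereads(regs: FaultRegs) -> Tuple[int, int]:
    # Fragile: the registers are read only after work that may itself abort.
    nested_abort(regs)
    return regs.fsr, regs.far


def handler_snapshots(regs: FaultRegs) -> Tuple[int, int]:
    # Safer: copy both registers once on entry and use only the copies.
    fsr, far = regs.fsr, regs.far
    nested_abort(regs)
    return fsr, far


seen_late = handler_rereads(FaultRegs(fsr=0xF, far=0x8000))
seen_early = handler_snapshots(FaultRegs(fsr=0xF, far=0x8000))
```

Here `seen_early` still describes the original fault, while `seen_late` describes the nested one, which is the kind of corruption mentioned above.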
Jon Abbott (1421) 2608 posts |
I’ve just tested an ARM610, ARM700 and ARM710. The ARM610 works, the ARM700 and ARM710 both fail, so there are errata affecting the JIT on the ARM7 macrocell. Documentation on them would be useful at this point; I don’t fancy my chances of tracking them down any time soon. I have horrible memories of the 18 months it took me to find the errata affecting the JIT on the Iyonix! |
Jeffrey Lee (213) 6046 posts |
Jon: Might you be hitting this? http://www.riscos.com/support/developers/prm/arm710warn.html#marker-296390 (although if it was “early versions” I would have hoped that it would have been fixed by the ARM710) |
Jon Abbott (1421) 2608 posts |
Unlikely I’d say, application space will be both cacheable and bufferable. Useful to know though. |
Jon Abbott (1421) 2608 posts |
This turned out to be a known ARMv3 erratum, namely touching a banked register after loading USER registers:
I’ve now tested Zarch on ARM7500FE, which exhibits the same odd behaviour when run under the JIT, and I’ve confirmed the problem only occurs when Aborts are being raised (by writes to pages with code in them).

The plot thickens though. I dumped all registers into memory at every Abort for a fixed number of Aborts and then compared both main memory and the register dump against the same test run on an ARM610: they were consistent. Based on that I think it’s safe to assume there’s no USER register, flag or CPU mode corruption. I’m at a loss as to what to test next.

EDIT: I’ll add a hotkey to toggle write protection on codepages, so I can confirm it’s specifically related to Aborts being raised and not a knock-on effect of any having been raised. If toggling pages to read/write immediately fixes the problem, I know for certain it’s specific to Abort generation/exit state (already proved Aborted instructions are handled correctly above). If it continues to behave oddly then there’s been a knock-on effect, possibly to RISCOS. The fact it only occurs on the ARM7 macrocell already implies it’s CPU related, but RISCOS was specifically patched for ARM7 in 3.60, so there’s a slim possibility it’s related to those changes. |
Jon Abbott (1421) 2608 posts |
Toggling Aborts on causes instant issues and toggling them off immediately resolves them. This confirms there’s no lasting effect once an Abort has occurred, so I’ve now cast my focus over what’s happening after the Abort handler exits. I’ve tested some other games, most of which work without issue, so it could be a combination of instructions around the Aborting STR/STM or a specific sequence of events that’s triggering the problem. I’ve also tried with the processor in both Early and Late Abort Mode, both of which exhibit the same issue. |
Jon Abbott (1421) 2608 posts |
I’ve finally tracked down the instruction sequence that’s failing. The final STR in the following sequence is triggering the Abort handler, which is failing to proxy the write as it believes the write is unnecessary:
SOE is:
The only way I can explain this behaviour is if the CPU has pre-staged the write generated by STR R7,[R14] in the write buffer, prior to generating the Abort and then drops it from the write buffer after the Abort is resumed. I’ve found two workarounds:
The Abort handler code that handles this is as follows: R0 = Aborting instruction
|
Jeffrey Lee (213) 6046 posts |
Interesting. How are you handling the permissions for the page at R14? (user read-only, SVC read-write, cacheable+bufferable?) If so, what happens if you make it non-bufferable?

What happens if you drain the write buffer on entry to your data abort handler?

What happens if less data is stored to R7 in the preceding instructions? e.g. replace the STR with an ADD R7,R7,#4. The write buffer is only 8 words, and in total the code is trying to store 9, so maybe the erratum is related to the CPU stalling until space is available for the final STR.

The ARM710 manual doesn’t actually say how MMU aborts interact with the write buffer, only that “If the MMU detects an access violation, it will do so before the external memory access takes place, and it will therefore inhibit the access”. So maybe there really is something sitting in the write buffer, which gets discarded later on (although if your abort handler has written a register dump, I would have expected that to be enough to flush anything else out of the buffer). |
Jeffrey Lee (213) 6046 posts |
Some sleuthing: The ARM7 chips used in the RiscPC use an ARM710a core, and the ARM710a manual contains a warning in the “differences between ARM610 and ARM710a” section that:

Spurious addresses may be broadcast

So it does seem pretty likely that the aborted write is held in the write buffer, but is flagged such that it won’t actually generate a write to memory (but the logic for peeking the write buffer is buggy and doesn’t ignore the NOP’d entry).

Meanwhile, the ARM7500 is based around the ARM710c, and although I can’t find any documentation for that core, the changelog for the ARM710a manual does say it was copied from the ARM710c manual. So yeah, good chance they both have the same write buffer. |
Jon Abbott (1421) 2608 posts |
It loads the L2PT entry and sets AP3..AP0 to %10; it doesn’t alter any other bits, so they remain as RISCOS sets them for appspace.
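As a rough illustration of that change, the sketch below assumes the ARMv3/ARMv4 small-page descriptor layout, in which the four 2-bit access permission fields AP0..AP3 occupy bits 4 to 11 (one field per 1KB subpage). It is Python arithmetic rather than the JIT’s actual ARM code, and the function name and example entry value are hypothetical.

```python
AP_SHIFT = 4               # AP0 starts at bit 4 (assumed small-page layout)
AP_FIELDS = 4              # AP0..AP3, one per 1KB subpage
AP_MASK = 0xFF << AP_SHIFT  # bits 4..11


def set_all_ap(entry: int, ap: int) -> int:
    """Return entry with AP0..AP3 all set to the 2-bit value ap.

    Every other bit (page base, C, B, descriptor type) is left untouched.
    """
    if not 0 <= ap <= 0b11:
        raise ValueError("ap must be a 2-bit value")
    fields = 0
    for i in range(AP_FIELDS):
        fields |= ap << (2 * i)  # replicate ap into each AP field
    return (entry & ~AP_MASK & 0xFFFFFFFF) | (fields << AP_SHIFT)


# Hypothetical entry: page base 0x12345000, AP fields all %11, C and B set
entry = 0x12345FFE
protected = set_all_ap(entry, 0b10)
print(hex(protected))  # 0x12345aae
```

On these MMUs an AP value of %10 normally gives privileged read/write with user read-only access, so a user-mode write to the page raises a permission abort while the cacheable and bufferable bits stay as they were.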
See below
Good question. I did place an ADD prior to the instructions to confirm R7 was correct after each instruction – at the time I assumed the first two instructions were also triggering Aborts and R7 wasn’t being updated by the Abort handler, however I soon realised they weren’t aborting when I watched the Abort count after each instruction. R7 was correct after each instruction, so I quickly ruled out registers being at fault.
Again, good question. I’ll confirm once I have a Repro outside of Zarch – testing inside a game isn’t ideal.
Indeed, in which case the likely culprit is the cache, not the WB. It could be a combination of both, or even a discrepancy between the cache and WB.

I’m probably not going to spend a lot more time investigating this weekend, as I need to get all the machines ready and tested for the London Show. I needed to locate the cause so I knew for certain it wasn’t a bug in my code which might affect the JIT across all CPUs. As soon as I have all the show machines updated and soak testing I’ll attempt to create a Repro to confirm if the issue occurs with all Aborting STR, or a particular sequence of instructions leading to an Aborting STR.

I did attempt to flush the write buffer (via SWP; is there another method?) both in the Abort handler and in the instruction prior to the aborting STR. It lessened the fail rate, but didn’t cure it. Adding a lot of NOPs prior to the Aborting instruction also lessened the fail rate, but didn’t cure it, as did placing an LDM R7,{R0} prior to it. Based on this, the complete SOE may be a lot longer than I’ve detailed above; for example, my STR/STM validation app does not cause the issue. |
Jon Abbott (1421) 2608 posts |
I’ve had a look at creating a Repro today. Simply copying the instructions in question does not reproduce the issue, so there’s a much longer SOE involved. Results from tests suggested:
Based on this, the cache is looking suspect following an STR that triggers a Page Access Abort. The CPU is sometimes performing the write to the cache line, ignoring the page access level. What I can’t fathom out is what’s triggering the write to get dropped from the cache, and whether it’s done by marking the cache line as invalid. |
Jeffrey Lee (213) 6046 posts |
I spotted that there’s a slightly different ARM710a doc here (DDI 0022D vs. DDI 0033D). And when looking at the “Differences between ARM610 and ARM710a” section, I spotted something that I missed earlier, hidden under the “enlarged cache” heading (even though it’s in both versions of the doc):
Which sounds a lot like them saying “oh, by the way, you might need to flush the cache when handling data aborts to cacheable areas”. |
Jeffrey Lee (213) 6046 posts |
Also, what’s the CP15 ID register value for the CPU(s) you’re seeing the problem on? The OS recognises a few different models of ARM7xx. Judging by the ID codes and the dates on various docs, the ARM710a is a newer model than the plain ARM710. So maybe it’s only some models where you need to flush the cache in the abort handler (since that sentence from the 710a doc could be interpreted as “older ARM710s were buggy, this new one will flush the cache automatically” or “you must flush the cache”) |
Jon Abbott (1421) 2608 posts |
I did read that section before I posted, but as it’s not specific in detail, I assumed the CPU would purge lines updated from pipelined instructions, or undo a partially complete STM. The timing of the purge is certainly a factor here, as it doesn’t occur until at least a few dozen instructions after the abort and possibly isn’t even related to instruction counts or timings; I’ve certainly ruled out mode changes as the trigger. The erratum in this case is that the CPU might ignore page access permission and speculatively perform an STR to a read-only page, resulting in cache inconsistency. Although I have a workaround for that, I’d still like to figure out the two burning questions:
I expect this issue is documented; ARM7 errata docs just aren’t publicly available. Back to your original point, flushing the IDC at an Abort is probably the correct course of action; for performance however, it’s probably best to simply assume the cache line is inconsistent and act accordingly.
CPUs tested:
|