Raspberry Pi 2
Rick Murray (539) 13751 posts |
Yay! ;-) |
||||
John Sandgrounder (1650) 574 posts |
I now have the latest build (as of 4pm on 5th Feb) running on a Raspberry Pi 2. Disappointingly, it is about 20% slower (running a simple loop test in BASIC) than when the same SD card is plugged into a Raspberry Pi 1. I have arm_freq set at 800 and I change nothing when moving the SD card from one board to the other. The loop times are: Raspberry Pi 1 Raspberry Pi 2 |
||||
Tank (53) 374 posts |
In case anybody wants a ROM with the GPIO module built in and working on a Pi 2, just email me (webmaster@tankstage.co.uk) and I’ll send you a copy. |
||||
Chris Hall (132) 3544 posts |
Yes it is disappointing – I found that the Pi model 2 at 900MHz was only perhaps 20% or 30% faster than the model 1 at 700MHz. Some of the hype said it was six times faster! All nonsense… |
||||
WPB (1391) 352 posts |
Tank, just wondering if there’s any mileage in Ben’s comment about the HAL exposing different details about the GPIO? |
||||
Tank (53) 374 posts |
WPB, if you mean what I think you mean .. I added the extra detection into the HAL myself, that’s why I built it as a ROM… |
||||
George T. Greenfield (154) 736 posts |
@ Chris: OTOH, if the ‘2’ takes the same amount of overclocking safely as its predecessor, 1.2GHz should be possible. It seems to me (from a very cursory look at the new board’s specs) that the main effect of the model 2 will be, paradoxically, to reduce RISC OS’s competitive advantage over Linux, inasmuch as the latter is much more happily catered-for! |
||||
Chris Hall (132) 3544 posts |
Yes, it looks that way! |
||||
John Sandgrounder (1650) 574 posts |
Does anybody have all of the CONFIG.TXT settings which give a performance increase when using a Raspberry Pi 2? Just changing arm_freq in the standard config.txt file has NO EFFECT on the speed of a simple loop in BASIC – on either a Raspberry Pi 2 or, indeed, on a Raspberry Pi 1. |
||||
Kuemmel (439) 384 posts |
I think it’s not surprising that the Pi 2 isn’t that much faster for RISC OS, as the good old software here uses hardly any of the new features. The performance statement from raspberrypi has a lot to do with the 4 cores and the NEON unit… at least we now have the latter on the Pi, hope more people will use it :-) There’s an interesting thread here with lots of different results on Linux. Is the NEON extension working yet on RISC OS on the Pi 2? If so, maybe someone could give my benchies a run (link). Three versions of the Mandelbrot: one integer, one VFP, one NEON… |
||||
Rick Murray (539) 13751 posts |
Why does this surprise you? As far as RISC OS can see, it is a model one clocking at 900MHz, so the results are entirely consistent.
Well, I can easily believe 5x faster if you imagine the speed increase to be 20%ish spread across four cores. Maybe the 6x is aided by enhancements in the later revision of the processor, we’ve moved beyond ARM11.
That’s a bit harsh. I bet the difference is outstanding on something capable of splitting the workload across all of the capabilities of the silicon. |
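The back-of-envelope figures in this reply can be checked with a few lines of Python. This is only a sketch of the arithmetic: the 700MHz and 900MHz clocks and the "20%ish" per-core gain come from the thread, the function names are illustrative, and perfect scaling across four cores is an assumption that real workloads will not reach.

```python
# Rough arithmetic only; function names are illustrative, not from any library.

def clock_ratio(new_mhz, old_mhz):
    """Speedup expected from clock speed alone (same work per clock)."""
    return new_mhz / old_mhz


def ideal_multicore_speedup(per_core_gain, cores):
    """Best-case speedup if a per-core gain scaled perfectly across all cores.

    Assumption: the workload splits evenly with no overhead.
    """
    return (1.0 + per_core_gain) * cores


if __name__ == "__main__":
    # A single-core OS sees roughly the clock difference: 900MHz vs 700MHz.
    print(f"clock only: {clock_ratio(900, 700):.2f}x")
    # 20% per core, spread across four cores.
    print(f"four cores: {ideal_multicore_speedup(0.20, 4):.1f}x")
```

On these assumptions a single core gains a little under 30% from the clock alone, and only software that uses all four cores gets near the headline figure.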
||||
Rick Murray (539) 13751 posts |
From the Pi forums:
That’ll be numerous factors – higher clock speed, more cores, more memory… It seems that there may also be things that run slower but I’m going to take that with a pinch of salt as it might be effects of, for instance, code optimised for the ARM11 underperforming on the Cortex processor? |
||||
John Sandgrounder (1650) 574 posts |
@Kuemmel I would not mind if it was just not that much faster. I am seeing it 20% slower in a simple BASIC loop. |
||||
Jeffrey Lee (213) 6048 posts |
No, VFP/NEON isn’t working yet. The Cortex-A7 (and A15) uses a different subarchitecture which VFPSupport doesn’t support yet (the VFP subarchitecture basically describes how the unit generates exceptions). Also, it’s important to realise that the VFP coprocessor that’s in use doesn’t support using VFP in vector mode. VFP scalar operations and NEON vectors will work, but VFP vector operations will have to be handled by VFPSupport, so they’ll end up being a lot slower than on previous machines. There’s some work I can do to improve that (e.g. use VFP scalar instructions instead of a full software emulation), but the performance will still be terrible compared to just using scalars yourself or using NEON.
“I would not mind if it was just not that much faster”
Regarding general performance: It’s possible the L2 cache isn’t being enabled. Plus the L2 cache is now in the CPU rather than the GPU, so we’ll no longer get free caching on screen memory. |
||||
John Sandgrounder (1650) 574 posts |
Anything I can do to check the L2 Cache? I wouldn’t know where to start! Other than the performance hit, it is all working well and is stable. As I said earlier, the arm_freq entry in config.txt seems to be being ignored no matter which Raspberry Pi is used (A, B, B+, 2). |
||||
Dave Higton (1515) 3479 posts |
My RPi2 arrived this morning. I hope to be able to show it working next Wednesday at SROUG. |
||||
Kuemmel (439) 384 posts |
@John: You could run my Memspeed test app (Link). If you can post the results of !MemspeedPi, we could probably judge from the speed of the memory transfer whether the 2nd level cache is on or off… I don’t know how big the 2nd level cache is, so if it’s for example 1 MByte, the results would decline around that size of transferred chunks. If the decline comes after about 32 KByte and the speed then stays the same, it might be off. EDIT: Found it… citing “Also, as the Cortex complex has its own 512KB L2 cache, we no longer use the 128KB system L2 – ARM traffic goes directly to SDRAM instead”… and regarding first level: “The BCM2835 has a 16K instruction, and a 16K data Level 1 caches to help with memory accesses, but the BCM2836 now has 32KB L1 data and instruction per core, and in addition, a 512K Level 2 cache used by all 4 cores, but that is for the ARM only and is no longer shared by the GPU…” and more here: “An exclusive use VideoCore L2 cache is retained on the 2836.” |
||||
David Pitt (102) 743 posts |
I will hijack this and post my results in the hope that it may help. Mostly things are faster on my Pi 2, but not everything. Neither of these Pis is overclocked.

Old Raspberry Pi B

Testing RAM->RAM Transfer with ARM Rx LDM/STM instructions
Size [KByte];Speed [MByte/s]
1;822
2;820
4;818
8;808
16;705
24;288
32;241
36;231
40;224
48;217
64;211
128;154
256;100
512;100
1024;88
1152;86
1280;85
1536;83
2048;81
2176;81
2304;81
2560;80
4096;78
8192;77
16384;82
32768;90

Testing RAM->VRAM Transfer with ARM Rx LDM/STM instructions
Size [KByte];Speed [MByte/s]
1;813
2;827
4;813
8;800
16;718
32;241
64;210
128;156
256;101
512;100
1024;100
1152;100
2048;100
2176;100
4096;100

New Raspberry Pi 2 B

Testing RAM->RAM Transfer with ARM Rx LDM/STM instructions
Size [KByte];Speed [MByte/s]
1;1833
2;1759
4;1474
8;1201
16;939
24;767
32;715
36;713
40;710
48;707
64;710
128;706
256;675
512;472
1024;375
1152;371
1280;369
1536;364
2048;359
2176;358
2304;358
2560;359
4096;357
8192;352
16384;352
32768;350

Testing RAM->VRAM Transfer with ARM Rx LDM/STM instructions
Size [KByte];Speed [MByte/s]
1;313
2;311
4;313
8;313
16;329
32;341
64;343
128;344
256;344
512;316
1024;234
1152;231
2048;220
2176;219
4096;218 |
||||
Kuemmel (439) 384 posts |
…looks like the 2nd level cache is there, as the decline is visible from 706/675 to 472 MByte/s at 512 KByte transfer size. …what puzzles me is that the 1st level cache speed declines so rapidly before the 32 KByte are reached… is that 32 KByte 1st level cache a unified cache of data and instructions!? I can’t find that behaviour on Cortex-A9 or A15… EDIT: According to the ARM Information Center, the D/I caches are separated as usual… hm, so no clue why… some config problem? …for the screen transfers you can see that the Pi 2 behaves almost equally over the size… whereas the old Pi had this huge benefit due to the direct connection to the GPU at low transfer size. |
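The reasoning above (look for the transfer size at which the speed falls off sharply) can be sketched in Python. This is a rough illustration, not part of the Memspeed app: the RAM->RAM figures are the ones posted earlier in the thread, while the function name `find_drops` and the 25% threshold are assumptions chosen for the example.

```python
# (size in KByte, speed in MByte/s) pairs, copied from the RAM->RAM results
# posted in this thread.
PI1_RAM_TO_RAM = [
    (1, 822), (2, 820), (4, 818), (8, 808), (16, 705), (24, 288),
    (32, 241), (36, 231), (40, 224), (48, 217), (64, 211), (128, 154),
    (256, 100), (512, 100), (1024, 88), (1152, 86), (1280, 85),
    (1536, 83), (2048, 81), (2176, 81), (2304, 81), (2560, 80),
    (4096, 78), (8192, 77), (16384, 82), (32768, 90),
]
PI2_RAM_TO_RAM = [
    (1, 1833), (2, 1759), (4, 1474), (8, 1201), (16, 939), (24, 767),
    (32, 715), (36, 713), (40, 710), (48, 707), (64, 710), (128, 706),
    (256, 675), (512, 472), (1024, 375), (1152, 371), (1280, 369),
    (1536, 364), (2048, 359), (2176, 358), (2304, 358), (2560, 359),
    (4096, 357), (8192, 352), (16384, 352), (32768, 350),
]


def find_drops(samples, threshold=0.25):
    """Return the sizes at which speed fell by more than `threshold`
    (as a fraction) compared with the previous, smaller size.

    The 25% default is an arbitrary cut-off picked for this illustration.
    """
    drops = []
    for (_, prev_speed), (size, speed) in zip(samples, samples[1:]):
        if prev_speed > 0 and (prev_speed - speed) / prev_speed > threshold:
            drops.append(size)
    return drops


if __name__ == "__main__":
    print("Pi 1 drops at (KByte):", find_drops(PI1_RAM_TO_RAM))
    print("Pi 2 drops at (KByte):", find_drops(PI2_RAM_TO_RAM))
```

With this threshold the Pi 2 data shows a single sharp step at 512 KByte, which fits a 512KB L2 cache being active. The gradual L1 decline is spread over several smaller steps, so it does not trip the cut-off.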
||||
Steffen Huber (91) 1945 posts |
If both had the same microarchitecture, it would indeed not be surprising. But we are talking ARM11 against Cortex-A7. All things equal (caches etc.), we should see a speedup of probably 100% in the 700MHz vs 900MHz situation. Cortex-A7 is a partial dual-issue design with a DMIPS/MHz rate similar to that of the Cortex-A8. |
||||
John Sandgrounder (1650) 574 posts |
Better late than never. Here are my results with arm_freq set at 700 and then 900 on a Pi 1 and a Pi 2
They do seem to confirm that the Pi 2 is mostly faster and that, on the Pi 2 at least, the arm_freq value in config.txt is being read and taking effect. So this leaves me with the question: what is going on in the BASIC interpreter, such that the Pi 2 interprets the loop slower than the Pi 1 and the arm_freq value seems to be ignored? |
||||
Dave Higton (1515) 3479 posts |
I’ve got it. After a bit of a rigmarole, it’s running. I wish I knew where to find a known good config.txt file… Why do I get “This screen mode is unsuitable for displaying the desktop”? The RPi2 is running the mode in question, but issuing “wimpmode x1280 y1024 c16m f52” causes the message. Configuration→Screen looks pretty thoroughly broken. It has been broken on ordinary Raspberry Pi for a long time now. Anyway – congratulations to all those involved in porting RISC OS to the RPi2! |
||||
Kuemmel (439) 384 posts |
@John: Could you post your simple BASIC loop? |
||||
Jeffrey Lee (213) 6048 posts |
Yeah, I’ve had a look through the Cortex-A7 TRM and can’t see any enable/disable control for the L2 cache. So presumably it just enables itself automatically whenever the L1 caches are enabled. FYI my Pi 2 has now arrived, hopefully I’ll find the time to get VFP/NEON working over the weekend. |
||||
Dave Higton (1515) 3479 posts |
I realised later that the monitor configuration file in PreDesk had the load monitor type line commented out, so my monitor wasn’t selected. Presumably the chosen definition was not available for the default monitor definition. It makes the error message look somewhat misleading. |