# ARM NEON in ObJAsm


Paolo Fabio Zaino (28) 863 posts |
I totally forgot about this thread lol, what the heck has happened here? XD @ Matthew As soon as I have a good benchmark running on RISC OS I’ll share the numbers I get on all my boards. @ Rick yes the original topic is about improving memory (copy, set and move) performance in C on RISC OS. |

Paolo Fabio Zaino (28) 863 posts |
@ Stuart Gavin’s link to Zahl is here: http://www.wra1th.plus.com/soft.html Scroll down. And great work Gavin! :) |

Paolo Fabio Zaino (28) 863 posts |
For the new topic about math operations using NEON: NEON benefits this type of operation only when a piece of software does a lot of similar computations that can be parallelised. In other words, if you just need a single sqrt of a single value, there will be little to no benefit in using a NEON-based implementation. One reason I am working on another project using NEON for math functions is to improve Genann performance: each neuron performs an FP calculation, on backfeed neural networks this can happen multiple times for the same neuron, a single layer may have many neurons, and a network is composed of multiple layers, so you can imagine how the calculations add up. There are many applications for NEON in AI: https://www.youtube.com/watch?v=lUmjnCdGtGE The video above is a presentation of a Tech Talk that should happen on the 21st of September 2021, and it’s specifically for ARM. I am implementing some of the techniques that will be explained in that Tech Talk on RISC OS, and hopefully I’ll have enough time to get to a demo-able state at some point. All of this does very little to help BBC BASIC type of uses and/or retro-coding type of approaches on RISC OS. So, maybe not of interest here, in which case sorry for the noise. |

GavinWraith (26) 1347 posts |
The big integer type in !Zahl is realized by the IMath library (copyright © 2002-2009 Michael J. Fromberger), an ISO C arbitrary precision integer and rational library that will tell you what you want to know about its representation of numbers. Nick Craig-Wood originally wrote his BigNum (relocatable) module in ARM assembler, offering SWI calls so that any kind of RISC OS application could use it. When he left RISC OS-land I offered to maintain it for him. I came across some bugs. For example, the Newton-Raphson algorithm for square root does not work in integer arithmetic when you start with a number one less than a power of two. After a while I found the ARM assembler code too difficult to maintain, and, finding C easier to work with, I produced !Zahl as a partial replacement. If I may mount my hobby horse at this point, there are indefinitely many ways of representing numbers in a computer, but it seems that most people are fixated on the standard floating point representations and are unaware of alternatives. Floating point numbers suit engineering very well (I am not being rude here), but they do have their own drawbacks. Their principal virtue is to break the bond between the value of a number and the number of bytes required to store its representation. Their principal vice is that the ordinary laws of arithmetic, e.g. that |
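
On the integer square root problem mentioned above, here is a minimal Python sketch of one well-known pitfall that fits the description. It is an assumption that this resembles the bug found in the module; the function names `naive_isqrt` and `isqrt_newton` are made up for illustration. With integer division, Newton's iteration x → (x + n/x)/2 never settles on a fixed point when n is one less than a perfect square (which includes 3, 15, 63, 255, ...), so a loop that waits for two equal successive values runs forever. Stopping as soon as the value stops decreasing avoids this.

```python
def naive_isqrt(n, max_steps=1000):
    """Newton iteration that stops only when two successive values agree.

    Returns None if no fixed point is reached within max_steps, which is
    what happens when n is one less than a perfect square.
    """
    if n < 2:
        return n
    x = n
    for _ in range(max_steps):
        y = (x + n // x) // 2
        if y == x:
            return x
        x = y
    return None  # the iteration is oscillating between two values


def isqrt_newton(n):
    """Floor of the square root of a non-negative integer n."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return 0
    # Start at a power of two that is at least sqrt(n).
    x = 1 << ((n.bit_length() + 1) // 2)
    while True:
        y = (x + n // x) // 2
        if y >= x:  # stopped decreasing, so x is the floor of sqrt(n)
            return x
        x = y
```

For n = 15 the naive loop visits 15, 8, 4, 3, 4, 3, ... and alternates between 3 and 4 from then on, whereas the second version stops at 3.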

Steve Drain (222) 1532 posts |
Three years ago there was a very extended topic in the Raspberry Pi forums about the suitability of various un-enhanced To be part of it, I set about finding a solution in ARM BASIC, using array operations with some modifications of my own invention. I was able to get my solution to finish on my mini.m in about a day and a half! Towards the end, the experts, using various compiled languages, were shaving fractions off substantially less than a second on some quite fast machines. I then found the Numbers module that Gavin was maintaining. Using that, still in BASIC, I was able to get a result in only a few minutes. While working out how to use Numbers I wrote a StrongHelp manual for it, so if anyone is interested I could post a link. There are a couple of other things I learnt. The first is that the algorithms used in Numbers have dated and there are significantly faster ones around now. The second is that there are C libraries that incorporate all the speed tricks for the Fibonacci task and can produce a result in a single line. ;-) What is needed is a job like the RexEx module, which takes a C library and incorporates it into a RISC OS module. That is probably Zahl, of course, but I have not looked at it yet.
Edit: it was Fibonacci number, not factorial – oops. |
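
For readers wondering what a "speed trick" for the Fibonacci task can look like, here is a hedged Python sketch of the fast-doubling identities F(2k) = F(k)·(2F(k+1) − F(k)) and F(2k+1) = F(k)² + F(k+1)². This is not the method used in the post above or in the Numbers module; the names `fib_pair` and `fib` are invented for the example, and Python's built-in big integers stand in for a big-number library.

```python
def fib_pair(n):
    """Return the pair (F(n), F(n+1)) using the fast-doubling identities."""
    if n == 0:
        return (0, 1)
    a, b = fib_pair(n >> 1)   # a = F(k), b = F(k+1), where k = n // 2
    c = a * (2 * b - a)       # F(2k)
    d = a * a + b * b         # F(2k+1)
    if n & 1:
        return (d, c + d)     # n = 2k+1: (F(2k+1), F(2k+2))
    return (c, d)             # n = 2k:   (F(2k),   F(2k+1))


def fib(n):
    """Return the n-th Fibonacci number, with F(0) = 0 and F(1) = 1."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return fib_pair(n)[0]
```

The recursion depth is only about log2(n), so the cost is dominated by a handful of large multiplications rather than n additions, which is why the speed of the underlying multiplication algorithm matters so much for this task.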

GavinWraith (26) 1347 posts |
I wrote an article The contradiction between time and space (i.e. storage) is an interesting theme in computer science. Lua makes this explicit. Functions ( |

Steve Pampling (1551) 7111 posts |
That set me thinking back to school days. Our schoolboy minds quickly realised we could use a tetrahedron of values in the same way for (x+y+z) |

GavinWraith (26) 1347 posts |
I had a long email correspondence with Richard Hallas, when he was editor of Foundation Risc User, about how people differ in their imaginations, particularly how they see numbers.
Good for your schoolboy minds. I think you were unusually quick. |

Steve Drain (222) 1532 posts |
Both the discussion and my reading around it introduced me to several of those. My own attempts used only a slightly enhanced Karatsuba method.
Now there’s a concept that was new to me and cropped up a lot in the fibonacci topic. Nothing like that for BASIC, although I |

Steve Drain (222) 1532 posts |
I now realise that I |

Steve Drain (222) 1532 posts |
That reminds me first of Mathematician’s Delight, but especially of the later Prelude to Mathematics, which introduced me to the idea of looking for such patterns. |

GavinWraith (26) 1347 posts |
Both excellent books. Alas, the IMath library, which Zahl uses, is in C and would be hard to integrate with BASIC. I felt a bit guilty that I was unable to keep this aspect of Craig-Wood’s BigNumber module alive. I think he went on to look at some other arithmetics that are possibly interesting for RISC OS. One was to choose the biggest prime, call it p (surprise?), smaller than 2^64. Its residues can be represented within 8 bytes. Modular arithmetic modulo p could be useful, and there are some clever ARM assembler tricks to make it fast. Another useful thing to consider is the Chinese Remainder Theorem. That tells you that arithmetic modulo a product of distinct primes can be done by working modulo each prime separately. Ideally you delegate a separate core to each prime. Something for the future, maybe? |
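
A small Python sketch of the Chinese Remainder Theorem idea described above, offered as an illustration only. The moduli are four Mersenne primes picked for the example (not the 64-bit prime mentioned in the post), and the names `MODULI`, `to_residues`, `crt` and `multiply_via_residues` are hypothetical. Each residue is handled independently, which is the part that could be handed to a separate core; here it is simply done in a loop. `pow(m, -1, p)` needs Python 3.8 or later.

```python
# Illustrative moduli: the Mersenne primes 2^13-1, 2^17-1, 2^19-1 and 2^31-1.
MODULI = [8191, 131071, 524287, 2147483647]


def to_residues(x, moduli=MODULI):
    """Represent x by its remainders modulo each prime."""
    return [x % p for p in moduli]


def crt(residues, moduli=MODULI):
    """Rebuild the unique x in [0, product of moduli) with the given residues."""
    x, m = 0, 1
    for r, p in zip(residues, moduli):
        # Choose t so that x + m*t is congruent to r modulo p.
        t = ((r - x) * pow(m, -1, p)) % p
        x += m * t
        m *= p
    return x


def multiply_via_residues(a, b, moduli=MODULI):
    """Multiply a and b by working modulo each prime separately.

    The result is only correct if a*b is smaller than the product of the moduli.
    """
    ra, rb = to_residues(a, moduli), to_residues(b, moduli)
    # Each of these small multiplications is independent of the others.
    products = [(x * y) % p for x, y, p in zip(ra, rb, moduli)]
    return crt(products, moduli)
```

For example, `multiply_via_residues(123456789, 987654321)` agrees with `123456789 * 987654321`, because that product is below the product of the four moduli.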
