RISC OS Open
Safeguarding the past, present and future of RISC OS for everyone
ROOL
Home | News | Downloads | Bugs | Bounties | Forum | Documents | Photos | Contact us
Account
Forums → Code review →

Functions with more than one entry

Subscribe to Functions with more than one entry 46 posts, 14 voices

Posts per page:

Pages: 1 2

 
Jan 30, 2022 6:34pm
Avatar Steve Drain (222) 1620 posts

Since BASIC has no “address of” operator for getting a pointer to a string …

True, but it is trivial to do that if you want to:

DIM P% 19
[OPT 2
._ADDRESS
LDR     r0,[r9],#8;lval of address%
LDR     r1,[r9],#8;lval of string$
LDR     r1,[r1];address of string$
STR     r1,[r0];address => address%
MOV     pc,lr
]
string$="Hello world!"+CHR$13
CALL_ADDRESS,string$,address%
PRINT $address%

Use LEFT$(string$) if you want the original without the added terminator, and address% is only valid if you do not alter string$.

 
Jan 30, 2022 7:42pm
Avatar Rick Murray (539) 13048 posts

It looks as though R3=%11 is applied, whatever. That is the ‘artefact’.

Oh, I noticed that (which is why it’s all zero, I tried different values) but since I was using StrongHelp and only doing it for the sake of doing it, I couldn’t be bothered to dig up the PRM to see if it agreed with SH (and thus not with reality).

At any rate, it certainly seems like Territory has impressed me with even more brokenness. ;)

 
Jan 30, 2022 7:50pm
Avatar Rick Murray (539) 13048 posts

It looks as though R3=%11 is applied, whatever.

Uh… Yeah…

https://gitlab.riscosopen.org/RiscOS/Sources/Internat/Territory/TerritoryModule/-/blob/master/s/Entries#L173

It does a quick bit of shifting to nuke the upper sixteen bits of R3, which is weird given that only bits 0-2 are defined as doing anything.

Then, get this, we OR in the two flags. Yup, ORR r3,r3,#Collate_IgnoreCase :OR: Collate_IgnoreAccent just to completely disregard what the user might have specified.

TheF…?

Edit: Reported

 
Jan 30, 2022 7:56pm
Avatar Steve Drain (222) 1620 posts

Thanks for that, Rick, it saves me the bother. ;-)

 
Jan 30, 2022 10:35pm
Avatar Sprow (202) 1090 posts

Then, get this, we OR in the two flags. Yup, ORR r3,r3,#Collate_IgnoreCase :OR: Collate_IgnoreAccent just to completely disregard what the user might have specified.

Take a moment to read the whole function before jumping to conclusions, as collation is a multi-pass affair. Scroll down as far down as line 378, for example, where the user’s flags are restored.

 
Jan 31, 2022 2:40pm
Avatar Steve Drain (222) 1620 posts

E pur si muove

I still cannot get Collate to work as I know I have before, just with a simple SYS call.

Perhaps it is me. ;-]

 
Jan 31, 2022 5:38pm
Avatar Rick Murray (539) 13048 posts

before jumping to conclusions

Okay. I won’t pretend to understand exactly what is going on inside. Instead, I’ll look at it from the outside.

Here’s a modified version of the above program (fixed the name buffer N% being about forty bytes too small ;) ).

 10 ON ERROR PRINT REPORT$+" at "+STR$(ERL) : END
 20 
 30 DIM code% 127
 40 DIM A$(10)
 50 DIM M% 63
 60 DIM N% 255
 70 
 80 FOR flags% = 0 TO 3
 90   REM Let's pick some names
100   A$(0) = "april"    : REM Normal   (note lower case)
110   A$(1) = "Emily"    : REM Normal
120   A$(2) = "Æronwen"  : REM Welsh, 'ae' lig
130   A$(3) = "Jessica"  : REM Normal
140   A$(4) = "Ásfríðr"  : REM Norse
150   A$(5) = "Clíodhna" : REM Irish
160   A$(6) = "aubrey"   : REM Germanic (note lower case)
170   A$(7) = "clemence" : REM French   (note lower case)
180   A$(8) = "Clotilde" : REM Germanic
190   A$(9) = "Angela"   : REM Normal
200 
210   REM Since BASIC has no "address of" operator for
220   REM getting a pointer to a string (not to mention
230   REM it'll be in a BASIC format), we'll poke each
240   REM into memory, and set up a list of pointers.
250   FOR l% = 0 TO 9
260     REM Assume we can fit them into 16 bytes ;)
270     $(N%+(l% * 16)) = A$(l%) + CHR$(0)
280     M%!(l%<<2) = N% + (l% * 16)
290   NEXT
300   M%!(10 << 2) = 0 : REM End marker
310 
320   REM Now we want some code to munge HeapSort to Territory_Collate
330   P% = code%
340   [ OPT 2
350     ; we can trash R0-R3
360     ; objects are in R0 and R1
370     ; we must return flag LT if R0<R1
380 
390     STR   R14, [R13, #-4]!  ; we'll be called from the SWI, so be paranoid
400 
410     MOV   R2, R1            ; string 2
420     MOV   R1, R0            ; string 1
430     MVN   R0, #0            ; -1 for current territory
440     MOV   R3, flags%        ; cardinals, ignore case, ignore accents
450     SWI   "Territory_Collate"
460 
470     ; Result is <0 if S1<S2
480     ;            0 if S1=S2
490     ;           >0 if S1>S2
500     ; So compare with zero to ensure LT or GE conditions are met
510     CMP   R0, #0
520 
530     ; Done
540     LDR   PC, [R13], #4
550   ]
560 
570   SYS "OS_HeapSort32", 10, M%, code%,,,,, 0
580 
590   REM Show what we have now
600   PRINT "After territory-collate sorting with flags = "+STR$(flags%)
610   l% = 0
620   WHILE (M%!l% <> 0 )
630     PRINT RIGHT$("00000000"+STR$~(M%!l%), 8)+" : ";
640     SYS "OS_Write0", M%!l%
650     PRINT
660     l% += 4
670   ENDWHILE
680   PRINT
690 NEXT
700 
710 END
720 

What we’re doing is running through the entire process four times, building the data tables with identical data to start with, and rebuilding the assembler code to pass flags (0, 1, 2, and finally 3). This corresponds to obey case/obey accents, ignore case/obey accents, obey case/ignore accents, and finally ignore both case and accents.

And here is the result.

*_names2
After territory-collate sorting with flags = 0
0000987C : Angela
000097EC : april
0000982C : Ásfríðr
0000984C : aubrey
0000980C : Æronwen
0000985C : clemence
0000983C : Clíodhna
0000986C : Clotilde
000097FC : Emily
0000981C : Jessica

After territory-collate sorting with flags = 1
0000987C : Angela
000097EC : april
0000982C : Ásfríðr
0000984C : aubrey
0000980C : Æronwen
0000985C : clemence
0000983C : Clíodhna
0000986C : Clotilde
000097FC : Emily
0000981C : Jessica

After territory-collate sorting with flags = 2
0000987C : Angela
000097EC : april
0000982C : Ásfríðr
0000984C : aubrey
0000980C : Æronwen
0000985C : clemence
0000983C : Clíodhna
0000986C : Clotilde
000097FC : Emily
0000981C : Jessica

After territory-collate sorting with flags = 3
0000987C : Angela
000097EC : april
0000982C : Ásfríðr
0000984C : aubrey
0000980C : Æronwen
0000985C : clemence
0000983C : Clíodhna
0000986C : Clotilde
000097FC : Emily
0000981C : Jessica
*

I built a version with OPT 3 to verify that it was passing different flags to Territory_Collate.

It was. So why is the result identical for each possible flags option?

Surely whether, or not, you’re paying attention to case and/or accents would cause the order to change when some names begin with a lower case letter, and one begins with an ‘Á’ (plus the ‘Æ’ too)?

 
Jan 31, 2022 5:53pm
Avatar André Timmermans (100) 581 posts

I am surprised that it works with territory number 0, UK is 1 and the other countries have a higher number.

 
Jan 31, 2022 6:01pm
Avatar Rick Murray (539) 13048 posts

MVN, not MOV. It’s setting -1 to mean "current Territory" which for most of us will probably be UK. ;-)

 
Jan 31, 2022 6:12pm
Avatar André Timmermans (100) 581 posts

Ah, I missed the MVN, I am too used to ObjAsm automatically transforming MOV of negatives into MVN instructions.

 
Feb 1, 2022 12:30pm
Avatar Martin Avison (27) 1375 posts

I have updated the bug report with some details of further confusion if flag bit 2 is used, which according to the StrongHelp manual is ‘interpret cardinals (5.22+)’, which should result in File9 < File10.

 
Feb 1, 2022 1:13pm
Avatar Rick Murray (539) 13048 posts

Thanks. As I was just looking at names, I didn’t do anything with bit 2.

But… What? That looks both backwards and the cutoff (0-2 and 3-7) doesn’t make sense.

Edit: only had a cursory look at the code (on break at work), but nothing looks like it is particularly wrong…other than being a big pile of assembler. ;)

 
Feb 1, 2022 2:40pm
Avatar Fred Graute (114) 608 posts

So why is the result identical for each possible flags option?

The code compares the strings ignoring case and accents. If this produces a match then, and only then, are the flags taken into consideration.

So comparing April with angela will always give the same result regardless of flags passed in. The result comes from the comparison of the first pair of unequal characters.

Comparing April with april will give different results depending on case being ignored (r0 == 0) or not (r0 <> 0).

It seems this SWI behaves differently from what many, including me, had inferred from its description.

 
Feb 1, 2022 4:57pm
Avatar Martin Avison (27) 1375 posts

I have just used a small program to test just Territory_Collate, and it seems to me that the case flag and cardinal flags are working ok. The accent flag is working, but I have no idea which should be high!

So I am now confused, and wondering what sequence the UK territory accented characters are really in – and are they treated as accented characters?

My test program is:

PROCcollate("abcdef","abcDef")
PROCcollate("accent","accënt")
PROCcollate("File10","File9")
END
:
DEF PROCcollate(s1$,s2$)
  PRINT 'TAB(7)"Flag" TAB(20)s1$;":";s2$
  FOR F% = 0 TO 7  
    SYS "Territory_Collate",-1,s1$,s2$,F% TO r0
    PRINT F% TAB(17)SGN(r0)  
  NEXT
ENDPROC

 
Feb 1, 2022 5:11pm
Avatar Rick Murray (539) 13048 posts

Sorry for the terseness. I had written a longer reply, but managed to stiff the machine. So, take two…

Comparing April with april will give different results depending on case being ignored (r0 == 0) or not (r0 <> 0).

Doesn’t seem to be the case. I replaced Emily with april (lower case), and it sorted the names in the order Angela, April, april for flag settings 0-3. Given that april was before April in the input data, one might have expected to see it sorted first if the case was not being considered.

It seems this SWI behaves differently from what many, including me, had inferred from its description.

Certainly. We’ve had to make logical assumptions given that the description is too damn short to be useful. Things like “if we are NOT ignoring case, then case will be considered”. Plus, as far as I can tell, the actual behaviour is non-obvious.

Furthermore, it’s a great shame that there is no “Western European” option to the Collate SWI, because the Betamax/German-S-thing will only be counted as an S in German. Brits get the fl and fi ligatures counted as ‘fl’ and ‘fi’. It doesn’t seem to recognise æ as ‘ae’ (but might in French?).
Hmm… Shouldn’t the ‘ü’ in German be considered equal to ‘ue’ or something?

This design flaw means it’s pretty much not possible to correctly sort a list of names where some are local and some are foreign. As my example shows, it’ll get it mostly correct, but not completely.
Hmm, par for the course with Territory, then. ;-)

As to hanging the machine… I made a whoopsie. Name #0 was “april” and name #9 was also “april” (I meant to make it capitalised and forgot). I’m not sure which SWI objected, but no error was raised. The machine just stiffed. Yay!

 
Feb 1, 2022 5:25pm
Avatar Grahame Parish (436) 460 posts

What does it do if there are two entries which are identical? Shuffle them both around forever?

 
Feb 1, 2022 5:40pm
Avatar Martin Avison (27) 1375 posts

I also made a whoopsie … It seems that flag bit 2 (interpret cardinals) works as intended. I have corrected my bug comment.

if there are two entries which are identical

Some sort techniques preserve the order of equal keys, but the majority do not, and it is random which ends up first. I cannot remember offhand what HeapSort does. It is random with my ArmSort, which uses flash and shell sort algorithms.

 
Feb 1, 2022 6:39pm
Avatar Rick Murray (539) 13048 posts

What does it do if there are two entries which are identical? Shuffle them both around forever?

Might be a bug in my code, but it hung (in an unrecoverable way as it will have been stuck somewhere in the OS)… so, yeah, probably beating itself across the head trying to work out which should come first. ;-)

 
Feb 2, 2022 10:57am
Avatar Martin Avison (27) 1375 posts

440 MOV R3, flags% ; cardinals, ignore case, ignore accent

I supect that line is a bug. As written flags% will be taken as a register number to load into r3, when I think it should be a literal #flags%

Not sure if that could cause a loop, however!

 
Feb 2, 2022 6:31pm
Avatar Rick Murray (539) 13048 posts

when I think it should be a literal #flags%

That’s a good point. I’ll fiddle with that next time I’m at the computer and see if egg suddenly splatters over my face. :-)

PS: Lesson number one billion, four hundred thousand and thirty eight for why one shouldn’t use assembler in the 21st century…

 
Feb 6, 2022 1:05pm
Avatar Fred Graute (114) 608 posts

Doesn’t seem to be the case. I replaced Emily with april (lower case), and it sorted the names in the order Angela, April, april for flag settings 0-3.

Apply the fix Martin suggested, and make sure april comes before April, and you will see a difference in the output depending on case-sensitivity.

The code I tested with was quite similar to Martin’s:


PROC_Collate("angela","april")
PROC_Collate("angela","April")

PROC_Collate("April","april")
PROC_Collate("april","April")

END

DEF PROC_Collate(s1$,s2$)
LOCAL f%,r0%
  PRINT "Comparing '" + s1$ + "' with '" + s2$ + "'"
  PRINT SPC7;"Flags";SPC4;"Result"

  FOR f% = 0 TO 3
    SYS "Territory_Collate",-1,s1$+CHR$(0),s2$+CHR$(0),f% TO r0%
    PRINT f%,r0%
  NEXT f%

  PRINT
ENDPROC

Pages: 1 2

Reply

To post replies, please first log in.

Forums → Code review →

Search forums

Social

Follow us on and

ROOL Store

Buy RISC OS Open merchandise here, including SD cards for Raspberry Pi and more.

Donate! Why?

Help ROOL make things happen – please consider donating!

RISC OS IPR

RISC OS is an Open Source operating system owned by RISC OS Developments Ltd and licensed primarily under the Apache 2.0 license.

Description

Developer peer review of proposed code alterations.

Voices

  • Steve Drain (222)
  • Rick Murray (539)
  • Sprow (202)
  • André Timmermans (100)
  • Martin Avison (27)
  • Fred Graute (114)
  • Grahame Parish (436)

Options

  • Forums
  • Login
Site design © RISC OS Open Limited 2018 except where indicated
The RISC OS Open Beast theme is based on Beast's default layout

Valid XHTML 1.0  |  Valid CSS

Powered by Beast © 2006 Josh Goebel and Rick Olson
This site runs on Rails

Hosted by Arachsys