| Date: | 16-Mar-2001 |
| Document number: | 2501,846/FS |
| Change number: | ECO 4428 |
| Master format: | HTML |
| Issue: | 3 |
| Last release: | 2 |
| Associated project: | 310 "STB-400" |
| Authors: | A.Hodgkinson |
| Status: | Confidential (secret) © Pace Micro Technology plc |
Modified for RISC OS Open Limited Wiki by ADH, 30-May-2007. Adapted for the Textile engine with layout-related changes only, except for the removal of redundant external links.
Software on the STB 400 often needs to find out whether a given uniform resource locator (URL [1], [2], [3]) meets certain comparison criteria, so that it can behave differently depending on the URL in use.
For example, a match may activate various extensions in the web browser depending on the page being fetched, or lead to the selection of a certain protocol module for video playback.
Check URL provides a central interface through which its clients may perform this task.
© Pace Micro Technology plc. All trademarks are acknowledged.
- Foreword
- Acknowledgements
- Contents
- Audience
- Overview
- Outstanding issues
- File formats
- Programmer interface
CheckURL_Check
CheckURL_ReadAreaID
CheckURL_ReadFile
CheckURL_AddArea
CheckURL_DeleteArea
- Check URL errors
- Dependencies
- Performance targets
- Development test strategy
- Acceptance criteria
- Appendix A: Example area ID cacheing code
- History
- References
This document is aimed at a technical audience intending to incorporate calls to Check URL in client software.
Check URL matches a given URL against one or more URL fragments and reports whether or not there is a match. The fragments can lead to matches based on complete host names or partial domains, ports, and paths. The fragments can be supplied in a central configuration file, which is read at module initialisation in RAM builds (or just before the first Check URL SWI call in ROM builds), or they can be given to the module at run-time.
Matches are searched for within an area, and only URL fragments within that area will be checked. This allows many different applications to use Check URL simultaneously. It also allows them to share their required fragments within one Check URL configuration file, though they may dynamically add and remove areas if they wish.
Client software decides what area titles to use. It is strongly recommended that an area title is formed from the software's allocated name, an underscore, and then any more specific title the client wishes to associate (for example, VideoControl_ProtocolModules). This helps avoid area namespace collisions.
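To illustrate the naming convention, the following C sketch builds an area title from a software name and a more specific title. It rejects any result that contains white space, because area titles cannot contain white space (see File formats). The function name make_area_name is hypothetical and is not part of Check URL's interface.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write "<software>_<title>" into 'out'.
   Returns 0 on success, or -1 if the result does not fit or contains
   white space (area titles cannot contain white space). */
static int make_area_name(char *out, size_t size,
                          const char *software, const char *title)
{
    int n = snprintf(out, size, "%s_%s", software, title);

    if (n < 0 || (size_t)n >= size) return -1;       /* truncated */
    if (strpbrk(out, " \t\r\n") != NULL) return -1;  /* white space */
    return 0;
}
```

For example, make_area_name(name, sizeof name, "VideoControl", "ProtocolModules") produces VideoControl_ProtocolModules.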
There are no outstanding issues at present.
Fragment and area descriptions can be provided in a central configuration file as well as dynamically. The configuration file is read from Choices:CheckURL at module initialisation in RAM builds, or just before the first Check URL SWI is invoked after initialisation in ROM builds. The file is plain text. Either LF or CR is taken as a line ending and blank lines are ignored, so DOS style text files (with CR LF line endings) will work fine. All other white space is treated as a field separator. The body of each area within the file consists of lines with two fields: the first describes a host, domain, or partial URL (without any fetch scheme specified) and the second gives a parameter string.
If a hash character is encountered at the start of a new line, the line is treated as a comment. Its contents are ignored, except that Check URL scans for the next LF or CR character marking the start of the next line. Otherwise, the data is treated as the host, domain or URL fragment:
- If the item starts with a dot, '.', then the right hand side of any domain will match (e.g. .co.uk will match pace.co.uk and this.that.co.uk, but not this.org.uk or this.co.uk.that).
- If the item does not start with a dot, then the whole domain must match (e.g. pace.co.uk will match pace.co.uk but not video.pace.co.uk).
- To match a specific port rather than any port, include it in the usual fashion with a colon after the domain or domain fragment (e.g. .pace.co.uk:5000).
- If the item contains a forward slash, '/', then the left hand side of the path must match as well as the domain (matched as described above). For example, .co.uk/this/ will match pace.co.uk/this/that.mpi as well as pace.co.uk/this/, but not pace.co.uk/that/. In addition, .co.uk/this will match video.co.uk/this/that.mpi as well as video.drome.co.uk/thistoo/, since the path comparison is a plain string prefix rather than a match on whole directory names.
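The matching rules above can be modelled in portable C. This is a minimal sketch and not the module's own code. The function fragment_matches is hypothetical, and the sketch makes three assumptions:
- The URL has already been split into host, port (NULL if none was given) and path (including its leading '/').
- A fragment with no host part, such as a bare '/', matches any host.
- A fragment such as .co.uk does not match the bare host co.uk, because the rules above do not say either way.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Illustrative model of the documented rules for ONE fragment; not the
   Check URL source. Assumes 'port' is NULL when the URL gives no explicit
   port and 'path' is NULL or starts with '/'. */
static bool fragment_matches(const char *fragment, const char *host,
                             const char *port, const char *path)
{
    /* Split the fragment into host part, optional ":port", optional path */
    const char *frag_path = strchr(fragment, '/');
    size_t head_len = frag_path ? (size_t)(frag_path - fragment)
                                : strlen(fragment);
    const char *colon = memchr(fragment, ':', head_len);
    size_t host_len = colon ? (size_t)(colon - fragment) : head_len;
    size_t url_host_len = strlen(host);

    /* Host: a leading '.' matches the right hand side, otherwise the whole
       host must match. An empty host part (e.g. "/") is assumed to match
       any host. All comparisons are case-sensitive. */
    if (host_len > 0)
    {
        if (fragment[0] == '.')
        {
            if (url_host_len < host_len ||
                strncmp(host + url_host_len - host_len, fragment, host_len) != 0)
                return false;
        }
        else if (url_host_len != host_len ||
                 strncmp(host, fragment, host_len) != 0)
        {
            return false;
        }
    }

    /* Port: only checked when the fragment names one */
    if (colon != NULL)
    {
        size_t port_len = head_len - host_len - 1;

        if (port == NULL || strlen(port) != port_len ||
            strncmp(port, colon + 1, port_len) != 0)
            return false;
    }

    /* Path: the fragment's path must match the left hand side of the URL's
       path as a plain string prefix */
    if (frag_path != NULL &&
        (path == NULL || strncmp(path, frag_path, strlen(frag_path)) != 0))
        return false;

    return true;
}
```

With this model, fragment_matches(".co.uk/this", "video.drome.co.uk", NULL, "/thistoo/") is true, which reproduces the last example in the list above.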
After any amount of white space not including CR or LF, Check URL expects to see a parameter string. This is any combination of characters terminated by CR or LF. When a client finds a match, this string (NUL terminated) is given back as part of the match details. Clients can therefore encode in the parameter string any information they need statically associated with the match fragment.
If no parameter string is seen, the data read so far is assumed to be an area name instead of a URL fragment. Consequently, area titles cannot contain white space. If you do not want to associate any parameter string with a URL, include a single non-white space character as a placeholder, such as a single hyphen.
The number of fragments, the length of those fragments and the length of their parameter strings are limited only by available memory, as is the length of an area title. The internal usage of area IDs limits the number of areas to 2^24 (16,777,216), memory permitting.
Within each area, matches are carried out from the bottom up, so put the most general matches first and the most specific matches last. For example, you may wish to match a path of /this/specific/path/ for one thing but match /this/ for everything else. In that case, put the more general /this/ rule first in the file.
All string matches, without exception, are case-sensitive.
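Below is a minimal sketch of the bottom-up search order, assuming that an area is held as an array of fragment and parameter pairs in file order. The area_entry type and both functions are hypothetical. The comparison is reduced to a plain, case-sensitive path prefix test, so the sketch illustrates only the search order and not the full matching rules.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory form of one area: fragment and parameter pairs,
   stored in the order they appeared in the configuration file */
typedef struct
{
    const char *fragment;
    const char *parameter;
} area_entry;

/* Simplified stand-in for the full rules: a plain, case-sensitive
   left hand (prefix) comparison of the URL path */
static int path_prefix_matches(const char *fragment, const char *path)
{
    return strncmp(path, fragment, strlen(fragment)) == 0;
}

/* Search from the last entry upwards and return the parameter string of
   the first match found, or NULL if no entry matches */
static const char *area_lookup(const area_entry *entries, size_t count,
                               const char *path)
{
    for (size_t i = count; i > 0; i--)
    {
        if (path_prefix_matches(entries[i - 1].fragment, path))
            return entries[i - 1].parameter;
    }
    return NULL;
}
```

Suppose the entries are /this/ followed by /this/specific/path/. A path under /this/specific/path/ then returns the specific parameter, while any other path under /this/ falls through to the general rule.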
If Check URL fails to open a file required by a SWI, error &818601 is generated. This error is not raised if the central configuration file cannot be opened at module initialisation. If the format of any file appears to be invalid, error &818602 is generated; this does include the central configuration file.
- Example
-
# Video Control protocol module selection.
VideoControl_ProtocolModules
jupiter.eng.acorn.co.uk 53580
.eng.acorn.co.uk/testvideos/ 53540
.eng.acorn.co.uk/testvideos/2/ 535C0
# JavaScript video control extension security.
NCFresco_JavaScript_VideoSecurity
webpool.isp.com -
users.isp.com -
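The line-by-line rules described above can be sketched as a small C parser. This is illustrative only. It assumes an in-memory copy of the file and uses fixed-size fields, whereas the real module is limited only by available memory. The config_entry type and parse_config function are hypothetical. The parser also makes three assumptions:
- Leading white space before a '#' still marks a comment.
- A fragment that appears before any area name makes the file invalid.
- A lone field is an area name; otherwise the first field is a fragment and the rest of the line is its parameter string.

```c
#include <stdio.h>
#include <string.h>

/* One fragment as read from a configuration file. The fixed sizes are a
   simplification; over-long fields are truncated. */
typedef struct
{
    char area[64];
    char fragment[128];
    char parameter[64];
} config_entry;

static int is_eol(char c)   { return c == '\n' || c == '\r'; }
static int is_space(char c) { return c == ' ' || c == '\t'; }

/* Parse configuration text held in memory into 'out'. Returns the number
   of fragments stored, or -1 if a fragment appears before any area name
   (treated here as an invalid file) or 'out' is full. */
static int parse_config(const char *text, config_entry *out, int max)
{
    char area[64] = "";
    int count = 0;

    while (*text != '\0')
    {
        const char *end = text, *start, *field_end, *param;

        while (*end != '\0' && !is_eol(*end)) end++;   /* find end of line */

        start = text;
        while (start < end && is_space(*start)) start++;

        /* Blank lines and '#' comment lines are skipped */
        if (start < end && *start != '#')
        {
            field_end = start;
            while (field_end < end && !is_space(*field_end)) field_end++;

            param = field_end;
            while (param < end && is_space(*param)) param++;

            if (param == end)
            {
                /* No parameter string, so this field is an area name */
                snprintf(area, sizeof area, "%.*s",
                         (int)(field_end - start), start);
            }
            else
            {
                if (area[0] == '\0' || count == max) return -1;

                strcpy(out[count].area, area);
                snprintf(out[count].fragment, sizeof out[count].fragment,
                         "%.*s", (int)(field_end - start), start);
                /* The parameter string runs to the end of the line */
                snprintf(out[count].parameter, sizeof out[count].parameter,
                         "%.*s", (int)(end - param), param);
                count++;
            }
        }

        text = end;
        while (is_eol(*text)) text++;   /* CR, LF, or both */
    }

    return count;
}
```

Applied to the example above, this sketch yields five fragments: three in VideoControl_ProtocolModules and two in NCFresco_JavaScript_VideoSecurity.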
CheckURL_Check
See if a URL matches any fragments.
- On entry
-
| R0 | = | Flags: |

| Bit(s) | Meaning |
| 0 | If clear, R1 is a pointer to a NUL-terminated area name string. If set, R1 is an area ID |
| 1 | If clear, R2 is a pointer to a NUL-terminated URL string. If set, R2 is a pointer to a URL descriptor block |
| 2-31 | Reserved (must be zero) |

| R1 | = | Pointer to a NUL-terminated area name string if R0:0 clear, else an area ID |
| R2 | = | Pointer to a NUL-terminated URL string if R0:1 clear, else pointer to a URL descriptor block |
- On exit
-
| R0 | = | Flags: |

| Bit(s) | Meaning |
| 0 | If set, there is a match (R1 is valid). If clear, there is no match and R1 is preserved |
| 1-31 | Reserved (must be zero) |

| R1 | = | Pointer to the NUL-terminated parameter string associated with the matched fragment if R0:0 on exit is set, else preserved |
All other registers preserved.
- Interrupts
- Interrupt state is undefined.
- Re-entrancy
- SWI is not re-entrant.
- Use
-
This SWI checks a given URL against the fragments listed in the given area. The area may be specified as a name string or as a numeric ID (see SWI CheckURL_ReadAreaID). Using an ID is faster, as the area name string does not have to be matched to find the fragments.
The URL may be given as a normal NUL-terminated string. Alternatively, if the client has already passed it through the URL Fetcher [4] module, the client can give a pointer to a descriptor block as filled in by SWI URL_ParseURL 1. Supplying the descriptor saves Check URL from having to claim memory and spend time splitting the URL up again.
If the area given is invalid or cannot be found, error &818600 is returned.
- Related SWIs
- CheckURL_ReadAreaID
- Related vectors
- None
CheckURL_ReadAreaID
Find out the quick reference ID for a given area name string, or vice versa.
- On entry
-
| R0 | = | Flags: |

| Bit(s) | Meaning |
| 0 | If clear, R1 on entry holds a pointer to a NUL-terminated area name string and on exit holds an area ID if the name is recognised. If set, R1 on entry holds an area ID and on exit points to a NUL-terminated area name string if the ID is recognised |
| 1-31 | Reserved (must be zero) |

| R1 | = | Pointer to a NUL-terminated area name string if R0:0 clear, else an area ID |
- On exit
-
| R0 | = | Flags: all bits currently reserved (must be zero) |
| R1 | = | An area ID if R0:0 on entry was clear, else a pointer to a NUL-terminated area name string |
All other registers preserved.
- Interrupts
- Interrupt state is undefined.
- Re-entrancy
- SWI is not re-entrant.
- Use
-
This call converts an area name string to an area ID, or an area ID back to a name string. In calls that accept either an ID or a string, giving an ID finds the associated fragments more quickly, because the ID references the fragments directly. Clients must treat area IDs as opaque values.
If the area given is invalid or cannot be found, error &818600 is returned.
In practice, an area ID is composed of two parts: a 24-bit array index, which gives rise to the limit of 2^24 areas, and an 8-bit counter. An internal array holds pointers to the area fragments along with an associated counter. When an area is deleted, its position in the array may be re-used later, but the counter is incremented. Consequently, if a client uses a stale area ID, the counter mismatch lets Check URL determine that the ID is no longer valid. The Window Manager uses a similar system for task handles. Clients can take an area ID, cache it and hold it indefinitely, but they then risk the ID becoming stale: if another client were to delete and recreate the area, its ID would change. To be completely robust, code should cache the ID and attempt to re-cache it once if the ID subsequently fails. Appendix A contains example code.
An area ID of zero is guaranteed to be invalid.
- Related SWIs
-
CheckURL_Check
CheckURL_AddArea
CheckURL_DeleteArea
- Related vectors
- None
CheckURL_ReadFile
Read a new configuration file.
- On entry
-
| R0 | = | Flags: all bits currently reserved (must be zero) |
| R1 | = | Pointer to NUL-terminated filename string |
- On exit
-
All registers preserved.
- Interrupts
- Interrupt state is undefined.
- Re-entrancy
- SWI is not re-entrant.
- Use
-
This SWI allows clients to give Check URL a new configuration file in the same format as the main file described in the File formats section. Usually a client will have called CheckURL_DeleteArea beforehand to remove any current areas, though this is not necessary. If the new file contains an area with the same name as one already present, its fragments are added to that area, and any duplicates are not stripped out. Items added will be checked before the items already in the area, with bottom-up matching within the file itself (as for the main configuration file). For example, suppose Check URL already holds an area from reading the following file:
VideoControl_ProtocolModules
/ 53580
.eng.acorn.co.uk/testvideos/ 53540
and a client attempts to load this file:
VideoControl_ProtocolModules
/ 53A00
/multicast 53540
the result is the same as loading the following in one go:
VideoControl_ProtocolModules
/ 53580
.eng.acorn.co.uk/testvideos/ 53540
/ 53A00
/multicast 53540
So, with bottom-up matching, the / entry for 53580 would never be matched, because the later / entry for 53A00 matches any URL first.
Adding new areas or adding new fragments to existing areas does not
alter the validity of area IDs.
- Related SWIs
-
CheckURL_AddArea
CheckURL_DeleteArea
- Related vectors
- None
CheckURL_AddArea
Add a new area, or add fragments to an existing area.
- On entry
-
| R0 | = | Flags: |

| Bit(s) | Meaning |
| 0 | If clear, R1 is a pointer to a NUL-terminated area name string. If set, R1 is an area ID |
| 1 | If clear, R2 is a pointer to a NUL-terminated set of CR or LF separated fragment and parameter pairs. If set, R2 is a pointer to a NUL-terminated filename string |
| 2-31 | Reserved (must be zero) |

| R1 | = | Pointer to a NUL-terminated area name string if R0:0 clear, else an area ID |
| R2 | = | Pointer to a NUL-terminated set of CR or LF separated fragment and parameter pairs if R0:1 clear, else pointer to a NUL-terminated filename string; zero if no fragments are to be added to the area at this time |
- On exit
-
| R1 | = | Area ID of the (possibly new) area in use |
All other registers preserved.
- Interrupts
- Interrupt state is undefined.
- Re-entrancy
- SWI is not re-entrant.
- Use
-
This SWI adds new areas or adds fragments to existing areas. If it is called with an unknown area ID or name, a new area is created. Otherwise, any fragments given are merged into the existing area exactly as described for SWI CheckURL_ReadFile.
Fragments may be supplied directly, as a single string holding the equivalent of the fragment definition part of a configuration file, or in a file containing the same data. In both cases the format is the same as for the main configuration file, except that it describes only one area and so must contain no area names. If the SWI is given a string holding match fragments directly but the format appears to be invalid, error &818603 is returned.
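When fragments are supplied directly, a client will typically assemble the R2 string at run-time. The sketch below does this with a hypothetical build_fragment_string function, which joins fragment and parameter pairs into one NUL-terminated, LF-separated string. The function refuses input that Check URL would misread, such as a fragment containing white space or a missing parameter. A missing parameter would make the line look like an area name and so trigger error &818603. The sketch does not call the SWI itself.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: build "fragment parameter\n..." for CheckURL_AddArea
   with R0 bit 1 clear. Returns a malloc'd string the caller must free, or
   NULL if memory runs out or any pair would be misread by Check URL. */
static char *build_fragment_string(const char *const fragments[],
                                   const char *const parameters[],
                                   size_t count)
{
    size_t length = 1; /* room for the terminating NUL */
    char *buffer, *p;

    for (size_t i = 0; i < count; i++)
    {
        const char *f = fragments[i], *q = parameters[i];

        /* Fragments cannot contain white space; parameters must be present,
           must not start with white space and must not span lines */
        if (f[0] == '\0' || strpbrk(f, " \t\r\n") != NULL ||
            q[0] == '\0' || q[0] == ' ' || q[0] == '\t' ||
            strpbrk(q, "\r\n") != NULL)
            return NULL;

        length += strlen(f) + 1 + strlen(q) + 1; /* pair, space and LF */
    }

    buffer = malloc(length);
    if (buffer == NULL) return NULL;

    p = buffer;
    for (size_t i = 0; i < count; i++)
    {
        size_t flen = strlen(fragments[i]), qlen = strlen(parameters[i]);

        memcpy(p, fragments[i], flen);
        p += flen;
        *p++ = ' ';             /* white space separates the two fields */
        memcpy(p, parameters[i], qlen);
        p += qlen;
        *p++ = '\n';            /* LF ends each pair */
    }
    *p = '\0';
    return buffer;
}
```

For instance, the pairs "/" with "53A00" and "/multicast" with "53540" produce the string "/ 53A00\n/multicast 53540\n".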
The return value in R1 allows clients creating new areas by name to avoid making a subsequent call to SWI CheckURL_ReadAreaID if they want to refer to the area by ID in future.
- Related SWIs
-
CheckURL_ReadFile
CheckURL_DeleteArea
- Related vectors
- None
CheckURL_DeleteArea
Remove one or all areas.
- On entry
-
| R0 | = | Flags: |

| Bit(s) | Meaning |
| 0 | If clear, delete the area specified in R1. If set, delete all areas |
| 1 | If clear, R1 is a pointer to a NUL-terminated area name string. If set, R1 is an area ID |
| 2-31 | Reserved (must be zero) |

| R1 | = | Ignored if R0:0 set. If R0:0 and R0:1 are clear, pointer to a NUL-terminated area name string. If R0:0 is clear and R0:1 set, an area ID |
- On exit
-
All registers preserved.
- Interrupts
- Interrupt state is undefined.
- Re-entrancy
- SWI is not re-entrant.
- Use
-
This call deletes an area based on its name or ID, or deletes all areas if R0:0 is set. Any area ID relating to a deleted area becomes invalid immediately.
If R0:0 is clear and the area given in R1 is invalid or cannot be found, error &818600 is returned.
- Related SWIs
-
CheckURL_ReadFile
CheckURL_AddArea
- Related vectors
- None
The Check URL module is allocated one error block at &818600:
| Error no. | Meaning |
| &818600 | Area not known: a client has passed an unknown area name string or ID to SWI CheckURL_Check, CheckURL_ReadAreaID, or CheckURL_DeleteArea. |
| &818601 | Cannot open configuration file: an attempt to open a configuration file has failed. This is only raised by SWIs that need to read a file. |
| &818602 | Invalid configuration file: raised whenever any file read by Check URL appears to have an invalid format. This includes finding an area name field in a file read by SWI CheckURL_AddArea. |
| &818603 | Invalid fragments: the URL fragments and parameters string given to SWI CheckURL_AddArea appears to have an invalid format (for example, a fragment may be missing a parameter). |
| &818604 | Check URL could not claim enough memory: memory was exhausted during an allocation performed by Check URL. |
For URL canonicalisation, Check URL calls the URL_ParseURL SWI, so URL Fetcher 0.43 or later must be present.
The final code size of the version described by this document should be under 24K. The memory claimed will depend on the number of areas, their fragments and parameters, and the lengths of the strings involved. Once all areas are deleted, no run-time memory remains claimed. Since area IDs work as indices into an array of pointers, the array itself may remain at its high water mark size while an area sits at the top of the array, even after all others are deleted. However, once that area is removed, the whole array will be freed.
The SWI interface will be tested in a debug build to ensure it performs as documented, with various combinations of valid and invalid configuration files or configuration file fragments.
The SWIs must perform as documented and the performance targets must be met.
Referring to an area by its area ID is significantly faster than referring to it by name. However, multiple clients may be using Check URL, and one of them could delete and recreate an area you are using, for example to ensure that the area contains a known set of fragments and no others. At that point the old ID becomes stale. Depending on the nature of the client you are writing, you may consider it legitimate to let a stale ID fault. Where possible, though, it is better to make a single attempt to re-read the ID and continue if this attempt succeeds. An example piece of C code might read as follows:
#include <stdlib.h>
#include <stdbool.h>
#include <swis.h>
/* This is exported via Check URL's !MkExport */
#include <CheckURL.h>
/* Area name to use */
#define AreaName "VideoControl_ProtocolModules"
/* For any URL, this can hold a more complete description than a */
/* plain string, and makes comparing two URLs reliably easier.   */
typedef struct url_description
{
  char * full;     /* Complete, canonicalised URL */
  char * protocol; /* Such as 'http' or 'mailto' */
  char * host;     /* E.g. 'www.acorn.com' */
  char * port;     /* For example '8080' */
  char * user;     /* E.g. 'ahodgkin' */
  char * password; /* E.g. 'NotMine' */
  char * account;  /* As in ftp://user:pass:account@host/ */
  char * path;     /* Speaks for itself */
  char * query;    /* CGI info - after a '?' in a URL */
  char * fragment; /* Anchor info - after a '#' in a URL */
}
url_description;
/**************************************************************/
/* url_match()                                                */
/*                                                            */
/* Match a given url_description in the area named by         */
/* 'AreaName' through the Check URL module. Caches the area    */
/* ID for speed and, via a recursive call, will attempt to     */
/* re-cache it once if this ID appears to have become invalid. */
/*                                                            */
/* Parameters: Pointer to the url_description to match;       */
/*                                                            */
/*             Pointer to a const char * to take a pointer to  */
/*             the match parameter (will be NULL on exit if    */
/*             the match fails);                               */
/*                                                            */
/*             true to support the stale ID recovery attempt, */
/*             else false.                                     */
/*                                                            */
/* Returns:    NULL on success, else a pointer to an error     */
/*             block.                                          */
/**************************************************************/
static _kernel_oserror * url_match(url_description * d,
                                   const char     ** param,
                                   bool              allow_fail)
{
  static unsigned int area_id = 0;
  _kernel_oserror   * e;
  unsigned int        match;
  if (param == NULL) return NULL;
  *param = NULL;
  /* Ensure we have an area ID */
  if (area_id == 0)
  {
    allow_fail = false; /* A freshly read ID must not be re-read again below */
    e = _swix(CheckURL_ReadAreaID,
              _INR(0,1) | _OUT(1),
              0,
              AreaName, /* See near top of file */
              &area_id);
    if (e != NULL) return e;
  }
  /* Try the match */
  e = _swix(CheckURL_Check,
            _INR(0,2) | _OUTR(0,1),
            CU_Check_OnEntry_GivenAreaID | CU_Check_OnEntry_GivenURLDescriptor,
            area_id,
            d,
            &match,
            param);
  if (e == NULL)
  {
    /* If no match, clear "param": R1 is preserved on exit when there */
    /* is no match, so it would otherwise hold the area ID.           */
    if ((match & CU_Check_OnExit_MatchFound) == 0) *param = NULL;
  }
  else if (e->errnum == cu_ERROR_AREA_NOT_KNOWN && allow_fail)
  {
    /* Since allow_fail is true, we are allowed to fail on an area ID */
    /* lookup, because we know IDs can become stale. In this case,    */
    /* forget the cached ID and try again, but only once.             */
    area_id = 0;
    e = url_match(d, param, false);
  }
  return e;
}
| Issue A | 06-Mar-2000 | First draft completed and checked (ADH) |
| Issue B | 09-Mar-2000 | Released following review (ADH) |
| Issue 1 | 21-Mar-2000 | Corrected a description of R1 that didn't match the description of flags in R0 and corrected description of the way in which fragments are added to existing areas (ADH) |
| Issue 1A | 22-May-2000 | Extended CheckURL_AddArea to return an area ID in R1. ROM builds have to defer loading of the central configuration file. Gave example code for cacheing an area ID. A few typing errors fixed (ADH) |
| Issue 1B | 05-Jul-2000 | Corrected example code which wasn't checking on-exit flags of the call to CheckURL_Check (ADH) |
| Issue 2 | 04-Aug-2000 | Released following review (AMR 5390, ADH) |
| Issue 3 | 16-Mar-2001 | Updated automatic section numbering to handle subsections. Used full subsection numbers for SWIs. Updated references section using links into the drawing office search engine and added validation footer. Overall, all changes were internal. ECO allocated for release (ECO 4428, ADH) |
The following may prove useful:
- [1] RFC 1630: Universal Resource Identifiers in WWW ('http://www.faqs.org/rfcs/rfc1630.html', April 1998). Official location: 'ftp://ftp.isi.edu/in-notes/rfc1630.txt'
- [2] RFC 1738: Uniform Resource Locators ('http://www.faqs.org/rfcs/rfc1738.html', April 1998). Official location: 'ftp://ftp.isi.edu/in-notes/rfc1738.txt'
- [3] RFC 1808: Relative Uniform Resource Locators ('http://www.faqs.org/rfcs/rfc1808.html', April 1998). Official location: 'ftp://ftp.isi.edu/in-notes/rfc1808.txt'
- [4] URL Fetcher API Specification (1215,220/FS issue 3, 12-Nov-1998, ECO 4131)