Document Status
Distribution: General Release
Title: Acorn URL fetcher API specification
Drawing Number: 1215,220/FS
Issue: 0.25
Author(s): Paul Wain
Carl Elkins
Stewart Brodie
Andrew Hodgkinson
Issue Date: 12/11/1998
Change Number: ECO 4131
Last Issue: 0.24 (04/08/1998)
Contents
========
Issue history
Overview
Outstanding issues
Client to URL module interface
Protocol module to URL module interface
URL module to protocol module interface
URL module service calls
URL module *-commands
URL errors
Performance targets
References
Glossary
Issue history
=============
0.16 19/10/1997 First formal version of specification based on
uncontrolled textual programmer's notes. (RCE)
0.16a 20/10/1997 Incorporated notes from ADH & SB. (RCE)
0.19 17/11/1997 Incorporated details of service calls. (SNB)
0.20 20/11/1997 Incorporated details of URL parsing SWI. (SNB)
0.21 11/06/1998 All other updates incorporated. (SNB)
0.22 22/06/1998 Comments after first review incorporated
Added details of proxy enumeration SWI. (SNB)
0.23 25/06/1998 Comments from interested parties incorporated.
(SNB)
0.24 04/08/1998 No longer live. ECO 4082. (SNB)
0.25 12/11/1998 Four digit years on all dates. Tidied up white
space. Removed smart quotes and n-dashes.
Added author details to history. Corrected
references on R0 exit words from URL_ParseURL
to URL_Status. Added details of bit 1 of
flags word in R0 to URL_ParseURL. Clarified a
few sentences here and there. ECO 4131. (ADH)
Overview
========
The URL (Universal Resource Locator) module is a general purpose module for
fetching data from various Internet services. This specification reflects
the behaviour of version 0.42 or later of the URL_Fetcher module. The purpose
of the module is to provide a uniform entry point into a set of "fetcher"
protocols (e.g. FTP, HTTP, Gopher, NNTP, etc.), without the need for a client
application to understand how that protocol works. This is done using a
number of generalised URL SWIs. The fetcher protocol modules (hereafter just
"protocol modules") with which the URL module communicates are called only by
the URL module itself. The entry points into the protocol modules have similar
names to the entry points into the URL module, but they are NOT the same.
The system structure is shown in figure 1 below.
                /----------------\
                |  Applications  |
                \----------------/
                        |
                        |
                        v
          /---------------------------\
          |        URL module         |
          \---------------------------/
             ^   |            ^   |
             |   |            |   |
             |   v            |   v
          /----------\    /----------\
          |   HTTP   |    |   FTP    |  . . . . .
          \----------/    \----------/
          Figure 1: URL Fetching system structure
Each client fetch occurs within the context of a 'session'. Each session is
identified by a different session identifier. Client session identifiers are
issued by the URL module upon request and remain valid until the client
informs the URL module to discard the session. Subsequently, session
identifiers may be re-issued by the URL module for new sessions. Only a
single object fetch can be performed in any one given session. Sessions
cannot be re-used by clients, even if a prior object fetch in that session
has completed.
The typical client usage of the system is:
* Obtain a session identifier (SWI URL_Register)
* Start fetching an object (SWI URL_GetURL)
* Repeatedly, whilst multi-tasking if in the desktop environment:
- Read blocks of data (SWI URL_ReadData)
- Process that data
* Discard session (SWI URL_Deregister)
If an application decides it requires a premature termination (e.g. the user
asked the application to quit whilst an object was being downloaded), then
the application calls SWI URL_Stop immediately and then discards the session
with SWI URL_Deregister. Typical clients, such as web browsers, will, most
likely, have several sessions active concurrently.
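The client sequence described above can be sketched as follows. This is an illustration only, written in Python for brevity. The swi() callable, its dictionary-of-registers calling convention and the fetch() helper are assumptions made for this example and are not part of the URL module interface; only the SWI numbers and status bits are taken from this specification.

```python
# SWI numbers and status bits as given in this specification.
URL_REGISTER, URL_GETURL, URL_READDATA, URL_DEREGISTER = (
    0x83E00, 0x83E01, 0x83E03, 0x83E06)
STATUS_ALL_DATA = 1 << 5   # R0 bit 5: all data received
STATUS_ABORTED = 1 << 6    # R0 bit 6: transfer aborted
METHOD_GET = 1
FLAG_R5_IS_LENGTH = 1 << 1  # URL_GetURL R0 bit 1: R5 holds length of R4 data


def fetch(swi, url, block_size=4096):
    """Fetch one object in one session.

    swi(number, entry_regs) -> exit_regs is a hypothetical veneer: both
    arguments map register numbers to values, and R2 for URL_ReadData is
    a bytearray that the callee fills in.
    """
    session = swi(URL_REGISTER, {0: 0})[1]
    chunks = []
    try:
        # No extra data is sent: an empty block of length 0.
        swi(URL_GETURL, {0: FLAG_R5_IS_LENGTH, 1: session, 2: METHOD_GET,
                         3: url, 4: b"", 5: 0})
        while True:
            buf = bytearray(block_size)
            out = swi(URL_READDATA, {0: 0, 1: session, 2: buf, 3: block_size})
            chunks.append(bytes(buf[:out[4]]))  # R4 = bytes transferred
            if out[0] & (STATUS_ALL_DATA | STATUS_ABORTED):
                break
            # A desktop client would return to its poll loop here.
    finally:
        # Sessions cannot be re-used, so always discard the session.
        swi(URL_DEREGISTER, {0: 0, 1: session})
    return b"".join(chunks)
```

The try/finally arrangement reflects the rule that the session must be discarded whether or not the fetch completed; a client wishing to terminate early would call SWI URL_Stop before deregistering.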
The URL module maintains its own private sessions with the protocol modules.
The session identifiers passed in many of the SWI interfaces to the protocol
modules are therefore the URL module's own, and are not those known to the
client application. Service calls are also provided to ease interaction
between the URL module and the fetchers, mainly to inform other modules of
the arrival or departure of a particular module.
Each protocol module accepts data and returns results as per the HTTP
protocol. Thus any extra client data associated with a request (passed
in R4 to SWI URL_GetURL) will take the format of a (possibly empty) set of
HTTP headers, an empty line and then the data; and each response will start
with an HTTP/1.0 or HTTP/1.1 Response-Line of the format: "HTTP/1.0 200 OK"
followed by various headers identifying the content-type of the retrieved
data, followed by an empty line, followed by the data itself.
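As an illustration of the response format described above, the following sketch splits a complete response into its response line, headers and data. It is not part of the interface: the function name split_response() is hypothetical, and the sketch assumes that lines are terminated with CR LF and that header names are unique, neither of which is stated by this specification.

```python
def split_response(raw: bytes):
    """Split an HTTP-style response into (version, code, reason, headers, body).

    Assumes CR LF line endings and a blank line between headers and data.
    """
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("no empty line found after the headers")
    lines = head.decode("latin-1").split("\r\n")
    parts = lines[0].split(" ", 2)
    if len(parts) < 2:
        raise ValueError("malformed response line")
    version, code = parts[0], int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, code, reason, headers, body
```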
Outstanding issues
==================
None.
Client to URL module interface
==============================
A typical client would be an application, such as a Web Browser. The
following SWI calls provide the interface for an application to control and
transfer data via the URL module.
SWI URL_Register (&83E00)
On entry:
R0: Flags:
Bits 31-0 reserved (0).
On exit:
R0: Reserved - currently zero.
R1: Session identifier.
All other registers preserved.
SWI is not re-entrant.
Interrupt status undefined.
This SWI initialises a client session with the URL module and provides the
client with a session identifier that can be used to monitor the status of
the URL module within that client's context. The session identifier is unique
for each client session that is registered with URL and is also used as an
identifier in subsequent interactions with the URL module.
Multiple registration by the same client application is permitted. This will
provide the client with multiple identifiers to the URL module. Calling this
SWI does not result in the calling of any protocol module SWIs.
The URL module imposes no limit on the number of concurrently registered
sessions, other than having the required memory available in which to store
details of the session.
SWI URL_GetURL (&83E01)
On entry:
R0: Flags:
Bit 0 => R6 is valid.
Bit 1 => R5 holds length of data in R4 specified buffer,
otherwise a single NUL terminated string in buffer.
Bits 31-2 reserved (0).
R1: Session identifier.
R2: Bits 7-0 => Method (8-bit value, held in bits 7-0).
This is protocol dependent. See table below for values.
Bits 15-8 => Method dependent.
Bits 31-16 reserved (0).
R3: URL - The document we are after including the protocol,
e.g. "http://www.acorn.co.uk/".
R4: Data block:
Data to send in addition to the URL. Validity is protocol
and method dependent.
R5: If R0:1 is set, length of data in R4 data block.
If R0:1 is clear, must be 2.
R6: 'User Agent':
Pointer to string to use as 'User Agent' identifier in
request header if R0:0 is set. A NULL pointer or NULL string
implies use default identifier - see below.
On exit:
R0: Protocol status (as defined for SWI URL_Status, below).
All other registers preserved.
SWI is not re-entrant.
Interrupt status undefined.
This SWI is used to instigate a transfer of data to or from (mainly from) a
resource server. When this SWI has been called, the URL module checks the
per-session and global proxy settings, looking for a match (see SWI
URL_SetProxy for details on setting proxies and proxy conflict resolution).
If no proxy is to be used, then URL looks for a protocol module which is
capable of handling the URL specified by R3. If a proxy setting was found,
then a pointer to the proxy URL is placed in R7, R0:31 is forced to value 1,
and URL looks for a protocol module which is capable of handling the
specified proxy URL. In both cases, if a suitable module cannot be located,
the URL module generates an error. If a protocol module capable of handling
the URL was found, then all client registers are passed onto the protocol
module via the Protocol_GetData SWI call with the exceptions stated above for
proxy handling. On exit, R0 will hold the status code returned by the
protocol module.
The extra data pointed to by R4 on entry is method and protocol specific.
For example, in HTTP, the data comprises HTTP headers and, if appropriate, an
entity body. Protocol modules should use this style wherever possible. Note
that these headers do not include lines such as an HTTP Request-Line (ie. the
"GET / HTTP/1.0" part). For example, when posting data to an HTTP URL as the
result of a form submission on a web page, the web browser would supply a
Content-Type header, Content-Length header, potentially some kind of encoding
header, a blank line and then the entity body.
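A minimal sketch of assembling such a data block for a form submission is given below. The helper name build_extra_data() and the choice of CR LF line endings are assumptions for illustration; the result would be passed in R4 with R0:1 set and its length in R5.

```python
def build_extra_data(body: bytes, content_type: str, extra_headers=None) -> bytes:
    """Build headers, a blank line and the entity body for URL_GetURL R4.

    No Request-Line is included; that is the protocol module's job.
    """
    headers = {"Content-Type": content_type,
               "Content-Length": str(len(body))}
    headers.update(extra_headers or {})
    head = "".join(f"{name}: {value}\r\n" for name, value in headers.items())
    return head.encode("latin-1") + b"\r\n" + body
```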
The User Agent string pointed to by R6, if R0:0 is set, is an indication to
the underlying protocol module of how the module should identify itself to
remote systems. This controls the User-Agent header for the HTTP protocol
module, for example. The protocol module is free to define its default
identifier as it pleases, however, following the format of the HTTP
User-Agent is recommended where possible and appropriate to the protocol.
Modules may choose to ignore or amend any User-Agent string. For example,
the AcornHTTP module will suffix the client's User-Agent with its own version
number, resulting in complete identifiers such as:
User-Agent: Acorn Browse/2.06 AcornHTTP/0.82
where the client only specified "Acorn Browse/2.06".
Table of method numbers
Method  FTP        HTTP and others  Comment
1       RETR/LIST  GET              ("Get this object" operation)
2       n/a        HEAD             ("Get entity headers" operation)
3       n/a        OPTIONS          ("Get server options" operation)
4       n/a        POST             ("HTTP POST" operation)
5       n/a        TRACE            ("HTTP TRACE" operation)
6       n/a        n/a              (Reserved to Acorn - do not use)
7       n/a        n/a              (Reserved to Acorn - do not use)
8       STOR       PUT              ("Store this object" operation)
9       MKD        n/a              ("Create directory" operation)
10      RMD        n/a              ("Remove directory" operation)
11      RNFR/RNTO  n/a              ("Rename object" operation)
12      DELE       DELETE           ("Delete object" operation)
13      STOU       n/a              ("Store object unique" operation)
Applications for new method codes should be made to Developer Support.
The range 128-254 is reserved for private non-distributed modules.
Method numbers 0 and 255 are reserved and must not be used.
The FTP methods listed above are fully implemented in version 0.28 of the
FTP Fetcher module. The HTTP methods listed above are fully implemented in
version 0.82 of the AcornHTTP module.
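The layout of the R2 method word can be illustrated with the sketch below. The helper name method_word() and the METHODS dictionary are hypothetical; the bit positions, method numbers and reserved values are those given in the table and notes above.

```python
# HTTP method numbers from the table of method numbers.
METHODS = {"GET": 1, "HEAD": 2, "OPTIONS": 3, "POST": 4,
           "TRACE": 5, "PUT": 8, "DELETE": 12}


def method_word(method: int, method_dependent: int = 0) -> int:
    """Build the R2 word for URL_GetURL.

    Bits 7-0 hold the method, bits 15-8 the method dependent value and
    bits 31-16 are left as zero.
    """
    if not 1 <= method <= 254:
        raise ValueError("method numbers 0 and 255 are reserved")
    if method in (6, 7):
        raise ValueError("methods 6 and 7 are reserved to Acorn")
    if not 0 <= method_dependent <= 0xFF:
        raise ValueError("method dependent value must fit in bits 15-8")
    return (method_dependent << 8) | method
```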
SWI URL_Status (&83E02)
On entry:
R0: Flags:
Bits 31-0 reserved (0).
R1: Session identifier.
On exit:
R0: Status word:
Bit 0 => Connected to server.
Bit 1 => Sent request.
Bit 2 => Sent data.
Bit 3 => Initial response received.
Bit 4 => Transfer in progress.
Bit 5 => All data received.
Bit 6 => Transfer aborted.
Bits 31-7 reserved (0).
R1: Preserved.
R2: Server response, as an "HTTP" response code (200, 401 etc.).
R3: Bytes read so far (total body data count).
R4: Total bytes to be transferred in whole transaction
if known (approximate value only), or -1 if unknown.
All other registers preserved.
SWI is not re-entrant.
Interrupt status undefined.
This SWI is used to monitor the transfer of data from a remote service. It
is protocol independent - the exit status bits are common to all services.
Clients must test this field bit-wise, since the value is cumulative.
Clients may not assume that the states returned in R0 will progress in any
particular combination or order. However, the likely progression during a
fetch for a resource being retrieved over a network (when the bits are
combined into a single decimal value) is: 0,1,3,7,15,31 and then R0:5 set
upon completion, and R0:6 set at any stage when an error has occurred.
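The bit-wise testing required above might look like the following sketch. The Status class and the helper names are assumptions for illustration; the bit assignments are those listed for R0 on exit.

```python
from enum import IntFlag


class Status(IntFlag):
    """Status word bits returned in R0 by URL_Status."""
    CONNECTED = 1 << 0
    SENT_REQUEST = 1 << 1
    SENT_DATA = 1 << 2
    INITIAL_RESPONSE = 1 << 3
    TRANSFER_IN_PROGRESS = 1 << 4
    ALL_DATA_RECEIVED = 1 << 5
    ABORTED = 1 << 6


def is_finished(r0: int) -> bool:
    """True once the fetch has completed or been aborted."""
    return bool(r0 & (Status.ALL_DATA_RECEIVED | Status.ABORTED))


def set_bits(r0: int) -> list:
    """Names of the status bits set in r0, lowest bit first."""
    return [flag.name for flag in Status if r0 & flag]
```

Testing individual bits, rather than comparing R0 against values such as 31, keeps the client correct even if a protocol module skips or combines states.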
Since each protocol module is returning its results according to the HTTP
protocol, R2 can be treated as an HTTP response code whatever the URL
being fetched. For example, the FileFetcher module will indicate file not
found errors by setting the response code to 404 (HTTP's Not Found error
code).
Note that in the case of, for example, an HTTP 403 (Forbidden) return, some
explanatory data may be received, too. If the amount of data to be received
is unknown, R4 will contain -1, however R3 will contain the number of bytes
received so far. The R4 value should be treated as approximate, since the
exact interpretation varies between protocols.
When this SWI is called, the URL module invokes the Protocol_Status SWI for
the protocol module concerned with the request.
SWI URL_ReadData (&83E03)
On entry:
R0: Flags:
Bits 31-0 reserved (0).
R1: Session identifier.
R2: Client buffer for received data.
R3: Size of buffer pointed to by R2.
On exit:
R0: Status word (see SWI URL_Status).
R2: Preserved. Contents of buffer modified.
R4: Number of bytes transferred to R2 buffer.
R5: Number of bytes still to be read to complete object
(if known) or -1 if unknown.
All other registers preserved.
SWI is not re-entrant.
Interrupt status undefined.
This SWI is used to read the data pending from a request, find out how much
data has been read on this call and how much more there is remaining to be
read for the request. R2 is a pointer to a buffer on entry (and R3 is the
size of the buffer), on exit the buffer contains the new data, R4 contains
the amount of data written to the buffer and R5 contains the amount of data
left to be read. If the amount of data left is unknown R5 will contain -1. R0
always returns the protocol status code. In the event of all the data being
read (R5 = 0 on exit), a call to URL_Stop is not required as this is
performed automatically when URL_Deregister is called for the client session.
Once all data has been read a call to URL_Status can return no meaningful
information, simply indicating that the transfer has completed.
The data returned will take the form of a complete HTTP compatible response.
Responses should use HTTP/1.0 if possible and avoid HTTP/1.1. For example,
AcornHTTP will downgrade any higher version responses to HTTP/1.0, having
taken care to remove any features applicable only to the higher version, such
as chunked transfer encodings.
When this SWI is called, the URL module invokes the Protocol_ReadData SWI for
the protocol module concerned with the request.
SWI URL_SetProxy (&83E04)
On entry:
R0: Flags:
Bits 31-0 reserved (0).
R1: Session identifier.
R2: Address of buffer containing a URL base.
R3: URL 'method' to proxy (address of URL fetch identifier
to be proxied).
R4: 0 => Proxy request.
1 => Don't proxy request.
All other values reserved.
On exit:
R0: Status word (see SWI URL_Status for details)
All other registers preserved.
SWI is not re-entrant.
Interrupt status undefined.
This call is used to set up a proxy server to use for a session with the URL
module. If R1 is zero then the proxy is considered global and is used for
all sessions. If R1 is a valid session identifier then the proxy server for
that session only is set. R2 is a pointer to a string containing the base URL
to pass the request on to when a proxy request is made. This is of the form
"http://www-cache.demon.co.uk:8080/" (note the trailing '/'). A common
error is to omit the port number. If the port number is not specified, then
the default port number is used. See discussion under URL_ProtocolRegister
regarding how the default port number is derived.
R3 is a pointer to a buffer containing the initial part of the URL to proxy -
the URL scheme (e.g. "http:", "ftp:"). This system has the advantage that
requests to certain hosts can be proxied and not others (e.g. by giving
"http://www.acorn.co.uk/" as the scheme). However, if R4 is 1, this indicates
that no matter how the proxy settings have been defined, requests to the base
URL should not be proxied in this case (R3 is undefined). When a URL_GetURL
request is received, the proxy settings are evaluated in the following order:
1 Client no-proxy
2 Client proxy
3 Global no-proxy
4 Global proxy
This is to ensure all client settings override global settings and thus
remain safe for the given client - ie. a client which sets up a proxy server
and then defaults all other URLs to no-proxy, can, no matter how the global
settings are changed, be sure of where requests will end up. If R2=0 on
entry, then all proxy settings for the specified session are cleared.
Calling this SWI does not result in any calls being made to protocol modules.
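The evaluation order above can be sketched as follows. The data structures and the function name choose_proxy() are hypothetical, and the sketch assumes that a setting matches when the requested URL begins with the string registered in R3; the exact matching rule used by the URL module is not stated here.

```python
def choose_proxy(url, client, global_):
    """Return the proxy URL to use for url, or None for a direct fetch.

    client and global_ are dictionaries of the form
    {"no_proxy": [prefix, ...], "proxy": [(prefix, proxy_url), ...]}.
    Order: client no-proxy, client proxy, global no-proxy, global proxy.
    """
    for scope in (client, global_):
        if any(url.startswith(prefix) for prefix in scope["no_proxy"]):
            return None
        for prefix, proxy_url in scope["proxy"]:
            if url.startswith(prefix):
                return proxy_url
    return None
```

Because the client's settings are examined first, a global change can never redirect a URL that the client has already covered.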
SWI URL_Stop (&83E05)
On entry:
R0: Flags:
Bits 31-0 reserved (0).
R1: Session identifier.
On exit:
R0: Status word (see URL_Status for details).
All other registers preserved.
SWI is not re-entrant.
Interrupt status undefined.
This call aborts a current request if there is one associated with the
session identifier. In the event of no request being associated with the
identifier, an error is generated. The purpose of this SWI call is to provide
the client with a way of enforcing the termination of a request. The client
need not call it just because all the data associated with the request has
finished being transferred, although it may do so if it so chooses. The
URL_Stop call will be made automatically by the URL module when the session
is deregistered by the client using SWI URL_Deregister.
When this SWI is called, the URL module invokes the Protocol_Stop SWI for the
protocol module concerned with the request.
SWI URL_Deregister (&83E06)
On entry:
R0: Flags:
Bits 31-0 reserved (0).
R1: Session identifier.
On exit:
R0: Status word (see SWI URL_Status for details).
All other registers preserved.
SWI is not re-entrant.
Interrupt status undefined.
This call deregisters the client session from the URL module, freeing up any
information the URL module may have kept about the client session (e.g. proxy
information). The session identifier ceases to be valid and becomes
available for re-issue on a subsequent call to SWI URL_Register.
When this SWI is called, the URL module invokes the Protocol_Stop SWI for the
protocol module concerned, if it has not already done so (e.g. during the
processing of URL_Stop).
SWI URL_ParseURL (&83E07)
On entry:
R0: Flags:
Bit 0 => If set, R5 contains number of words in data block,
else a default of 10 words is assumed.
Bit 1 => If set, character codes 0 to 32 and 127 in the URL
will be escaped (hex encoded, e.g. space becomes
'%20') - only available in URL 0.42 or later. URL
0.38 through to 0.41 inclusive always escape these
characters. Versions prior to 0.38 never do this.
Bits 31-2 reserved (0).
R1: Reason code:
0 => Return component buffer requirements.
1 => Return component data in specified buffers.
2 => Construct full URL from component buffers.
3 => 'Quick parse'.
R2: Pointer to base URL.
R3: Pointer to URL relative to base URL (or NULL if none).
R4: Pointer to data block of R5 words (unless R1=3, see below,
or R0:0 is unset, in which case R4 points to a buffer of
at least 10 words in length).
R5: If R0:0 set, size of R4 block in words.
If R3 is non-NULL, it is assumed to point to a partial URL which needs to be
resolved with respect to the base URL pointed to by R2. If R3 is NULL, then
R2 is assumed to point to a full URL.
On exit:
R0: Flags:
Bits 31-0 Reserved (0).
All other registers preserved.
SWI is not re-entrant.
Interrupt status undefined.
Data block at R4 is updated in line with entry reason code.
This SWI is used to parse URLs into their constituent parts, enabling clients
to extract the various fields from the URL in a reliable manner. The call is
also capable of resolving a relative URL to produce a fully-qualified URL,
and of reconstructing a full URL from a set of components.
The data block referred to above is either a block of integers which will
be updated to contain the size of the required buffer for each element, or a
block containing pointers to buffers for the actual data.
All strings are zero-terminated and all lengths include space for the zero
terminator.
The number of entries in the block is specified in R5 if R0:0 is set on
entry. If R0:0 is clear, then the default value of 10 is assumed. The
format of the data block is:
Offset Usage
+ 0 Fully canonicalised URL.
+ 4 URL protocol (e.g. "http", "ftp") forced to lower-case.
+ 8 Hostname (e.g. "www.acorn.com") forced to lower-case.
+ 12 Port (e.g. "80").
+ 16 Username - used for FTP authentication and mailto.
+ 20 Password - for FTP.
+ 24 Account - for FTP.
+ 28 Path (e.g. "pub/riscos/releases") [See note].
+ 32 Query - for HTTP, things after a query character.
+ 36 Fragment - for HTTP, things after a hash character.
It is anticipated that this SWI will be called twice: the first time to find
the lengths of the buffers, and the second to retrieve a copy of the data
into the buffers. The URLs pointed to by R2 and R3 (if used) need not be
fully-qualified. e.g. R2 may point to "www.acorn.com/browser/". The fully
canonicalised version of the URL at block+0 refers to a fully-qualified,
canonicalised version of it, which in this example would be
"http://www.acorn.com/browser/".
During canonicalisation, the port number will be elided if possible. See
the discussion under SWI URL_ProtocolRegister for details of how URL
discovers whether this is possible or not.
[Note] The path will not start with a '/' unless the URL being parsed
explicitly specified one - this is in keeping with the URL specification, so
for example, given the URL "http://www.acorn.com/browser/", then the path
component is "browser/", and not "/browser/"; the slash between the hostname
and path is a separator only, not a part of either component.
The entry reason codes are described below.
URL_ParseURL_ReturnLengths (R1 = 0)
When R1 is 0 on entry to the SWI, the data block is treated as a block of
unsigned 32-bit integers. The contents of the block are ignored on entry, but
on exit are filled in with the lengths of the individual components of the
URL. A value of zero is stored for a field which does not exist; non-zero
values include space for a zero-byte terminator.
URL_ParseURL_ReturnData (R1 = 1)
When R1 is 1 on entry to the SWI, the data block is treated as a block of
pointers to buffers to receive the components of the URL. Each of the
pointers in the data block must be either zero, indicating that the caller is
not interested in that field, or point to a buffer which is sufficiently long
to receive the field. The client can ensure this by having previously used
reason code 0 to determine the length required.
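The two-call pattern (reason code 0 followed by reason code 1) can be illustrated by the sketch below, which decodes a block of lengths and prepares the buffers for the second call. The names FIELDS, decode_lengths() and allocate_buffers() are hypothetical, and the block is assumed to hold little-endian 32-bit words as on ARM systems running RISC OS.

```python
import struct

# Data block entries in offset order (+0, +4, ... +36).
FIELDS = ("full", "scheme", "host", "port", "user",
          "password", "account", "path", "query", "fragment")


def decode_lengths(block: bytes) -> dict:
    """Map field names to lengths, omitting fields which do not exist."""
    count = len(block) // 4
    words = struct.unpack("<%dI" % count, block[:count * 4])
    return {name: n for name, n in zip(FIELDS, words) if n}


def allocate_buffers(lengths: dict) -> list:
    """One buffer per field for reason code 1, or None if not wanted."""
    return [bytearray(lengths[name]) if name in lengths else None
            for name in FIELDS]
```

A None entry corresponds to a zero pointer in the data block, indicating that the caller is not interested in that field.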
URL_ParseURL_ComposeFromComponents (R1 = 2)
When R1 is 2 on entry to the SWI, the data block is treated as containing
the broken down fields of a URL. Each of the pointers in the data block
must be either zero or point to a buffer containing the value of the
component, with the exception of the full URL field, which is a pointer
to a buffer to receive the fully canonicalised URL. This buffer is
filled in on exit.
URL_ParseURL_QuickResolve (R1 = 3)
When R1 is 3 on entry to the SWI, R4 points to a buffer for receiving the
fully resolved URL. R5 is the length of the buffer. On exit, the buffer is
filled in with the fully resolved URL obtained, and R5 is decreased by the
length of the URL (including terminating zero byte). Hence R5 will be
negative on exit if the buffer wasn't large enough. There is no fixed rule
for calculating the minimum buffer length required for the answer. To
guarantee that the buffer is large enough, it should be calculated as:
length(base URL) + length(relative URL) + 4
If R0:1 is set on entry, there is the potential for up to the entire URL
to be hex encoded. In this case, you would need to multiply the above
by three. URL 0.37 and earlier never hex encode URLs. Note that URL 0.38,
0.39, 0.40 and 0.41 will *always* do this; the control through R0:1 was
introduced in v0.42. Clients not knowing about this bit (therefore leaving
R0:1 unset) will find that 0.42 or later do not automatically escape URLs,
this being more sensible default behaviour on the whole.
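The buffer size calculation above can be expressed as the following sketch. The function names are hypothetical; the formula and the factor of three are those given in the text.

```python
def quick_resolve_buffer_size(base: str, relative: str = "",
                              escape: bool = False) -> int:
    """Size of buffer guaranteed to be large enough for reason code 3.

    escape should be True when R0:1 is to be set on entry, since every
    character might then expand to a three character hex escape.
    """
    size = len(base) + len(relative) + 4
    return size * 3 if escape else size


def buffer_was_large_enough(r5_on_exit: int) -> bool:
    """R5 is decreased by the URL length; negative means overflow."""
    return r5_on_exit >= 0
```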
Characters which are already hex encoded in URLs are left alone in all
versions of the URL module.
Clients are strongly recommended to use this reason code if they wish to
resolve a relative URL or canonicalise a URL and are only interested in the
fully resolved and canonicalised form of the URL, since it is significantly
faster than using reason code 0 and then reason code 1. To help reduce the
chances of wildly over-allocating buffer space, setting of R0:1
SWI URL_EnumerateSchemes (&83E08)
On entry:
R0: Flags:
Bits 31-0 reserved (0).
R1: Context (0 for first call).
On exit:
R0: Status flags (currently unused).
R1: Context for next call (-1 if finished).
R2: Pointer to read-only URL fetch scheme (if R1 is not -1).
R3: Pointer to read-only help string (if R1 is not -1).
R4: Protocol module SWI base (if R1 is not -1).
R5: Protocol module version (*100, if R1 is not -1).
All other registers preserved.
SWI is not re-entrant.
Interrupt status is undefined.
This call is used to discover which schemes are currently available to
the URL module. It may be used, for example, to determine whether or not
a client of the URL module may deal with a given URL (in combination with
SWI URL_ParseURL to extract the scheme) and if not, pass it to the Acorn
URI handler to see if anything else in the system can deal with it (see
the Acorn URI Handler Functional Specification, 1215,215/FS).
URL will not cope gracefully if the protocol module list is updated between
calls to this SWI (you may get duplicate modules or miss some out).
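An enumeration loop using the context value might be written as in the sketch below. The swi() callable and its dictionary-of-registers convention are assumptions for this example only; string pointers are represented as ordinary Python strings.

```python
URL_ENUMERATESCHEMES = 0x83E08


def enumerate_schemes(swi):
    """Collect all registered schemes.

    swi(number, entry_regs) -> exit_regs is a hypothetical veneer in
    which both arguments map register numbers to values.
    """
    schemes = []
    context = 0  # 0 for the first call
    while True:
        out = swi(URL_ENUMERATESCHEMES, {0: 0, 1: context})
        context = out[1]
        if context == -1:  # finished: R2-R5 are not valid
            break
        version = out[5]   # version number * 100
        schemes.append({
            "scheme": out[2],
            "help": out[3],
            "swi_base": out[4],
            "version": f"{version // 100}.{version % 100:02d}",
        })
    return schemes
```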
SWI URL_EnumerateProxies (&83E09)
On entry:
R0: Flags:
Bit 0 => If set, enumerate the no-proxy list.
Bits 31-1 reserved (0).
R1: Session identifier (or zero for global proxies/no-proxies).
R2: Context (0 for first call).
On exit:
R0: Status flags (currently unused).
R1: Preserved.
R2: Context for next call (-1 if finished).
R3: If R0:0 clear: Pointer to read-only URL to proxy (if R2 is
not -1).
If R0:0 set: Pointer to read-only URL to not proxy
(if R2 is not -1).
R4: If R0:0 clear: Pointer to read-only proxy URL information
(if R2 is not -1).
If R0:0 set: Corrupted, contains no useful information.
All other registers preserved.
SWI is not re-entrant.
Interrupt status is undefined.
This call is used to discover the URLs for which proxies are set, on a
per-session or global basis, or which URLs are not to be proxied. The
information pointed to by R3 and R4 where applicable is a copy of
that which was passed to SWI URL_SetProxy when the setting was made.
If R0:0 is set on entry, then R4 will be corrupted on exit and may not
contain a meaningful value.
URL will not cope gracefully if the proxy list is updated between calls
to this SWI (you may get duplicate entries or miss some out).
Protocol module to URL module interface
=======================================
This section defines the calls provided by the URL module to enable a fetcher
protocol module to interact with it.
SWI URL_ProtocolRegister (&83E20)
On entry:
R0: Flags:
Bit 0 => If set, R5 contains protocol flags word.
Bit 1 => If set, R6 contains the default port number.
Bits 31-2 reserved (0).
R1: Protocol module's SWI base.
R2: URL fetch scheme supported e.g. "http:" etc.
R3: Version number * 100 e.g. 116 => version 1.16
R4: Informational string. Up to 50 characters of
descriptive text, e.g. "Acorn HTTP fetcher".
R5: Protocol flags word, if R0:0 set. See below.
R6: Default port number, if R0:1 set. See below.
On exit:
R0: Flags:
Bits 31-0 reserved (0).
All other registers preserved.
SWI is not re-entrant.
Interrupt status is undefined.
This call is used by a protocol fetcher module to register its SWI base and
the type of URL that it accepts with the URL module. The SWIs that are
accessible from this SWI base are defined in the following section. If the
module cannot be registered (e.g. another module is already claiming that URL
base), then an error will be returned. R3 is an integer version number and R4
is a pointer to a string containing more information which will be displayed
by the *URLProtoShow command (or 0 if no descriptive text is provided).
Typically, it will be called during a protocol module's initialisation code
or on a callback set from the module's initialisation code. If the
protocol module is registered successfully, then URL will issue a service
call Service_URLProtocolModule_ProtocolModule to inform any interested
modules.
If R0:0 is set, then R5 contains a protocol flags word. This is used to
describe to URL how the resolver should treat URLs from this scheme. The
current bits defined are:
Bit Meaning when set
0 Path is *not* UNIX-like
1 No parsing should be performed on this scheme
2 Scheme allows "user@" to precede the hostname component
3 Hash (ASCII 35) allowed in hostname (e.g. for file: URLs)
4 No hostname component (e.g. mailto: URLs)
5 Remove *leading* ".." components in pathname.
Note that the meanings of set bits are such that zero is a reasonable value
to pass for unknown schemes. Note that if URL is requested to resolve URLs
using schemes unknown to it, it will assume a protocol flags word value of
zero. This may lead to inconsistent behaviour depending on whether the
protocol module is loaded or not.
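As an illustration, the entry registers for a registration could be assembled as in the sketch below. The ProtoFlags class and the register_regs() helper are hypothetical names; the bit meanings are those in the table above. R0:0 and R0:1 are set only when R5 and R6 respectively are supplied.

```python
from enum import IntFlag


class ProtoFlags(IntFlag):
    """Protocol flags word passed in R5 to URL_ProtocolRegister."""
    PATH_NOT_UNIX_LIKE = 1 << 0
    NO_PARSING = 1 << 1
    ALLOW_USER_AT = 1 << 2
    ALLOW_HASH_IN_HOST = 1 << 3
    NO_HOSTNAME = 1 << 4
    REMOVE_LEADING_DOTDOT = 1 << 5


def register_regs(swi_base, scheme, version, info,
                  proto_flags=None, default_port=None):
    """Return a dictionary of entry registers for URL_ProtocolRegister.

    version is the version number * 100 (e.g. 116 for version 1.16).
    """
    if len(info) > 50:
        raise ValueError("informational string is limited to 50 characters")
    regs = {0: 0, 1: swi_base, 2: scheme, 3: version, 4: info}
    if proto_flags is not None:
        regs[0] |= 1 << 0          # R5 contains protocol flags word
        regs[5] = int(proto_flags)
    if default_port is not None:
        regs[0] |= 1 << 1          # R6 contains default port number
        regs[6] = default_port
    return regs
```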
If R0:1 is set, then R6 contains the default port number for this scheme.
This is used by the URL resolving code to determine if explicitly
specified port numbers can be elided from the URL. For example, when
constructing the canonicalised form of "http://www.acorn.com:80/", the port
bit is dropped as it serves no useful purpose, leaving
"http://www.acorn.com/".
The URL module is primed with knowledge of the following protocols:
mailto:, telnet:, finger:, file:, filer_opendir:, filer_run:,
local:, gopher:, ftp:, http:, https:, whois:
It is not necessary for modules implementing those protocols to set
either flag bit and hence no need for them to set R5 or R6.
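The port elision described above can be sketched as follows. The function name and the default_ports table are assumptions for illustration; the URL module's own table of default ports is not listed in this specification.

```python
def canonical_host_part(scheme, host, port, default_ports):
    """Return "scheme://host[:port]/", dropping a redundant port.

    default_ports maps scheme names to default port numbers; a scheme
    missing from the table never has its port elided.
    """
    default = default_ports.get(scheme)
    if port is None or (default is not None and int(port) == default):
        return f"{scheme}://{host}/"
    return f"{scheme}://{host}:{port}/"
```

For example, with a default of 80 registered for http, "http://www.acorn.com:80/" becomes "http://www.acorn.com/", whereas an explicit port of 8080 is kept.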
SWI URL_ProtocolDeregister (&83E21)
On entry:
R0: Flags:
Bits 31-0 reserved (0).
R1: SWI base.
On exit:
R0: Flags:
Bits 31-0 reserved (0).
R1: Number of client sessions that were using this module.
All other registers preserved.
SWI is not re-entrant.
Interrupt status is undefined.
This call should be used by the protocol module to tell the URL module that
it is no longer available. The URL module will raise the appropriate
disconnect messages with its clients, and tell the protocol module the number
of clients that were affected.
Typically, it will be called during a protocol module's finalisation code.
If the protocol module is deregistered successfully, then URL will issue a
service call Service_URLProtocolModule_ProtocolModule to inform any
interested modules.
URL module to protocol module interface
=======================================
The protocol module SWI interface is only called by the URL module. URL
module clients should never call the ReadData/Status/GetData/Stop SWIs
directly. The protocol modules are required to supply a SWI interface. There
are currently 4 SWIs that need to be supported which run from SWI_base to
SWI_base+3. New SWIs common to all protocol modules will only be added at the
low-end of the SWI range. Protocol modules must generate standard SWI not
known error (error number &1E6) if they receive a call which they do not
understand, so that the URL module can determine that they do not support the
SWI. Note that there is no general requirement to use SWIs from offset 0
into a SWI chunk, although it makes sense to do this. Protocol modules which
support multiple protocols should ensure that they do not place their
internal "SWI bases" less than 16 SWIs apart to allow space for future
expansion. For example, AcornHTTP registers http: as &83F80 and https: as &83F90.
Protocol specific SWIs should be added at the top-end of the SWI chunk (ie
start at SWI_base+63 and work down) - the AcornHTTP module uses that range to
provide clients with access to its HTTP cookie management code, for example.
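The numbering rules above can be checked with a sketch such as the following. The helper names are hypothetical; the spacing of 16 and the top-down allocation from SWI_base+63 are taken from the text.

```python
MIN_BASE_SPACING = 16  # room for future common SWIs after each base


def bases_ok(bases) -> bool:
    """True if every pair of internal SWI bases is at least 16 apart."""
    ordered = sorted(bases)
    return all(b - a >= MIN_BASE_SPACING
               for a, b in zip(ordered, ordered[1:]))


def private_swi(swi_base: int, index: int) -> int:
    """SWI number of the index'th protocol specific SWI (0 = first).

    Protocol specific SWIs start at SWI_base+63 and work down.
    """
    return swi_base + 63 - index
```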
NOTE: the Session identifiers used by the URL module to talk to the protocol
modules are NOT the same identifiers used by clients to talk to the URL
module. They are NOT interchangeable.
SWI Protocol_GetData (SWI_base+0)
On entry:
R0: Flags:
Bits 30-0 => as specified by client in URL_GetURL.
Bit 31 => R7 is valid.
R1: Session identifier.
R2: Method (See table earlier in document).
R3: URL (including fetch scheme).
R4: Pointer to block of data in addition to URL.
R5: Protocol dependent.
R6: Protocol dependent.
R7: If R0:31 is set, proxy URL information. See below.
On exit:
R0: Protocol status word (see SWI URL_Status for details).
All other registers are protocol dependent.
SWI re-entrancy is protocol module dependent.
Interrupt status is protocol module dependent.
This call is used to start retrieving data. The protocol module should raise
any events for the client via the session identifier provided in R1. The URL
module calls this SWI in response to one of its clients calling SWI
URL_GetURL.
The proxy URL information specified in R7 (if R0:31 is set) gives the
location of the proxy to be used in the format of a URL. For example,
"http://www-cache.demon.co.uk:8080/". This information is supplied by the URL
module and not the client. The protocol module must note that on a proxied
request, the target URL indicated by R3 may not have the same fetch scheme.
For example, it might be an ftp: URL being proxied through an HTTP proxy
service.
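As an illustration of the R0 and R7 usage described above, the following C
sketch shows one way the flags word might be assembled and how a protocol
module might notice that the target and proxy schemes differ. All names here
are hypothetical, and the scheme comparison is deliberately simplified (it
is case sensitive, which a real implementation may not want).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GETDATA_R7_VALID (1u << 31) /* R0 bit 31: R7 holds a proxy URL */

/* The two registers affected by proxying (hypothetical holder type). */
typedef struct {
    uint32_t    r0_flags;
    const char *r7_proxy;
} getdata_proxy_regs;

/* Bits 30-0 come from the client's URL_GetURL flags; bit 31 is set only
 * when a proxy URL is being supplied in R7. */
getdata_proxy_regs build_getdata_flags(uint32_t client_flags,
                                       const char *proxy_url)
{
    getdata_proxy_regs regs;
    regs.r0_flags = client_flags & ~GETDATA_R7_VALID;
    regs.r7_proxy = NULL;
    if (proxy_url != NULL) {
        regs.r0_flags |= GETDATA_R7_VALID;
        regs.r7_proxy = proxy_url;
    }
    return regs;
}

/* Length of the fetch scheme including its colon, or 0 if there is none. */
size_t scheme_length(const char *url)
{
    const char *colon = strchr(url, ':');
    return (colon != NULL) ? (size_t)(colon - url) + 1 : 0;
}

/* Non-zero if both URLs start with the same fetch scheme. */
int same_scheme(const char *a, const char *b)
{
    size_t la = scheme_length(a);
    size_t lb = scheme_length(b);
    return la != 0 && la == lb && strncmp(a, b, la) == 0;
}
```

For an ftp: target fetched through "http://www-cache.demon.co.uk:8080/",
same_scheme() returns zero, which is the case the text warns about.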
SWI Protocol_Status (SWI_base+1)
On entry:
R0: Flags:
Bits 31-0 reserved (0).
R1: Session identifier.
On exit:
R0: Protocol status word (see SWI URL_Status for details).
R2: As URL_Status.
R3: As URL_Status.
R4: As URL_Status.
All other registers preserved.
SWI re-entrancy is protocol module dependent.
Interrupt status is protocol module dependent.
This SWI is used to monitor the transfer of data from the remote service. It
is protocol independent, with the exit status bits of R0 being common to all
fetcher services. R2 should contain the remote server's most recent response
code where possible; note that even in the case of, for example, an HTTP 403
(Forbidden) response, some explanatory data may be received, and thus R3 may
be non-zero. If the client is unknown to the protocol module then an error
should be returned. If the client's last request has finished, but the client
session has not yet been deregistered, then the protocol module should return
the status code as of the time that the request finished (ie bit 6 or 5 will
be set along with another combination if relevant).
The URL module calls this SWI in response to one of its clients calling SWI
URL_Status.
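The requirement to keep reporting the final status after a request has
finished could be met along the following lines. This is a sketch only: the
structure and function names are hypothetical, and the meanings of bits 5
and 6 are those given under SWI URL_Status, not redefined here.

```c
#include <stdint.h>

#define STATUS_BIT_5 (1u << 5) /* see SWI URL_Status for the meaning */
#define STATUS_BIT_6 (1u << 6) /* see SWI URL_Status for the meaning */

/* Hypothetical per-session record kept by a protocol module. */
typedef struct {
    uint32_t status;   /* status word to return in R0 */
    int      finished; /* non-zero once bit 5 or bit 6 has been seen */
} fetch_session;

/* Non-zero if the status word says the request has finished. */
int request_has_finished(uint32_t status)
{
    return (status & (STATUS_BIT_5 | STATUS_BIT_6)) != 0;
}

/* Record a new status word, but freeze it once the request has finished
 * so later Protocol_Status calls still see the final state. */
void session_update(fetch_session *s, uint32_t new_status)
{
    if (s->finished)
        return;
    s->status = new_status;
    s->finished = request_has_finished(new_status);
}
```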
SWI Protocol_ReadData (SWI_base+2)
On entry:
R0: Flags:
Bits 31-0 reserved (0).
R1: Session identifier.
R2: Address of client's data buffer.
R3: Size of client's data buffer.
On exit:
R0: Protocol status word (see SWI URL_Status for details).
R2: As URL_ReadData.
R3: As URL_ReadData.
R4: As URL_ReadData.
R5: As URL_ReadData.
All other registers preserved.
SWI re-entrancy is protocol module dependent.
Interrupt status is protocol module dependent.
This SWI is used to read the data pending from a request, find out how much
data has been read on this call and how much more there is remaining to be
read for the request. The register usage and description is the same as for
SWI URL_ReadData. The URL module calls this SWI in response to one of its
clients calling SWI URL_ReadData.
SWI Protocol_Stop (SWI_base+3)
On entry:
R0: Flags:
Bits 31-0 reserved (0).
R1: Session identifier.
On exit:
R0: Protocol status word (see SWI URL_Status for details).
All other registers preserved.
SWI re-entrancy is protocol module dependent.
Interrupt status is protocol module dependent.
This call aborts a current request if there is one associated with the
session identifier. The URL module calls this SWI in response to one of its
clients calling SWI URL_Deregister or SWI URL_Stop.
URL module service calls
========================
The URL fetcher system has been allocated a block of 256 service calls
(&83E00-&83EFF). Two are currently defined. The other 254 are reserved by
Acorn for future use.
Service_URLProtocolModule (&83E00)
This service call is issued by the URL module to communicate important
events to the protocol modules.
On entry:
R0: Reason code - reason for the service call (see below).
R1: &83E00 (Service_URLProtocolModule).
All other registers are reason code dependent.
On exit:
All registers must be preserved, unless claiming the service call.
In all the currently defined cases, the service call must not be
claimed. Protocol modules must ignore reason codes which they do
not understand.
Defined Reason Codes:
URLModuleStarted
R0: 0 URL module has initialised.
R1: &83E00 Service_URLProtocolModule.
R2: version Version number of URL module * 100.
Upon receiving this service call, protocol modules should re-register with
the new URL module by issuing SWI URL_ProtocolRegister as usual. They must
assume that any previous registration is no longer valid.
This service call must not be claimed.
URLModuleDying
R0: 1 URL module is dying.
R1: &83E00 Service_URLProtocolModule.
R2: version Version number of URL module * 100.
Upon receiving this service call, protocol modules should note that the URL
module has gone away and not attempt to talk to it any more until a future
Service_URLProtocolModule/URLModuleStarted service call arrives.
This service call must not be claimed.
All other reason codes are reserved to Acorn and must not be used.
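A protocol module's handling of these two reason codes might look like the
C sketch below. The state structure and function name are hypothetical, and
the registers are passed as plain arguments for illustration instead of
using a real module service call entry. The return value only tells the
caller whether it should now issue SWI URL_ProtocolRegister; the service
call itself is never claimed.

```c
#include <stdint.h>

#define SERVICE_URL_PROTOCOL_MODULE 0x83E00u

enum { URL_MODULE_STARTED = 0, URL_MODULE_DYING = 1 };

/* Hypothetical record of what the protocol module knows about URL. */
typedef struct {
    int      url_present; /* non-zero while the URL module is available */
    int      registered;  /* non-zero while our registration is valid */
    uint32_t url_version; /* URL module version number * 100 */
} protocol_state;

/* Returns non-zero if the module should (re-)register with URL. */
int handle_url_service(protocol_state *st, uint32_t r0, uint32_t r1,
                       uint32_t r2)
{
    if (r1 != SERVICE_URL_PROTOCOL_MODULE)
        return 0;

    switch (r0) {
    case URL_MODULE_STARTED:
        st->url_present = 1;
        st->registered = 0; /* any previous registration is invalid */
        st->url_version = r2;
        return 1;
    case URL_MODULE_DYING:
        st->url_present = 0; /* do not talk to URL until it restarts */
        st->registered = 0;
        return 0;
    default:
        return 0; /* unknown reason codes are ignored */
    }
}
```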
Service_URLProtocolModule_ProtocolModule (&83E01)
On entry:
R0: Reason code.
R1: &83E01 (Service_URLProtocolModule_ProtocolModule).
R2: URL fetch scheme (e.g. "http:", "ftp:").
R3: SWI base chunk of protocol module.
R4: Description of module as shown by *URLProtoShow.
On exit:
All registers must be preserved, unless claiming the service call.
In all the currently defined cases, the service call must not be
claimed. Protocol modules must ignore reason codes which they do
not understand.
Defined reason codes:
URLProtocolModuleStarted
R0: 0 Protocol module has just registered
URLProtocolModuleDying
R0: 1 Protocol module has just deregistered
All other reason codes are reserved.
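An interested module could use this service call to keep its own list of
available fetch schemes. The following C sketch shows the idea under stated
assumptions: the table type, its fixed sizes and the function names are all
hypothetical, and scheme names longer than the fixed buffer are simply
truncated.

```c
#include <stdint.h>
#include <string.h>

#define MAX_SCHEMES 8  /* arbitrary limit for this sketch */
#define SCHEME_LEN  16 /* arbitrary buffer size for this sketch */

enum { PROTOCOL_MODULE_STARTED = 0, PROTOCOL_MODULE_DYING = 1 };

typedef struct {
    int      used;
    char     scheme[SCHEME_LEN]; /* e.g. "http:" */
    uint32_t swi_base;
} scheme_entry;

typedef struct {
    scheme_entry entries[MAX_SCHEMES];
} scheme_table;

/* Index of the entry for a scheme, or -1 if it is not in the table. */
int scheme_find(const scheme_table *t, const char *scheme)
{
    int i;
    for (i = 0; i < MAX_SCHEMES; i++)
        if (t->entries[i].used && strcmp(t->entries[i].scheme, scheme) == 0)
            return i;
    return -1;
}

/* Apply one service call: r0 = reason code, r2 = scheme, r3 = SWI base. */
void scheme_service(scheme_table *t, uint32_t r0, const char *r2,
                    uint32_t r3)
{
    int i = scheme_find(t, r2);

    if (r0 == PROTOCOL_MODULE_STARTED) {
        if (i < 0) {
            for (i = 0; i < MAX_SCHEMES && t->entries[i].used; i++)
                ;
            if (i == MAX_SCHEMES)
                return; /* table full: drop the event in this sketch */
            t->entries[i].used = 1;
            strncpy(t->entries[i].scheme, r2, SCHEME_LEN - 1);
            t->entries[i].scheme[SCHEME_LEN - 1] = '\0';
        }
        t->entries[i].swi_base = r3;
    } else if (r0 == PROTOCOL_MODULE_DYING) {
        if (i >= 0)
            t->entries[i].used = 0;
    }
    /* Any other reason code is ignored. */
}
```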
URL module *-commands
=====================
The URL module provides a single *-command.
Syntax: *URLProtoShow
Parameters: None
Use: Display information on currently registered protocol modules.
Help text: *URLProtoShow shows all the current protocols known and their
SWI bases.
Example: *URLProtoShow
Base URL   SwiBase   Version   Comment
=============================================================================
---        0x83e00   038       URL © Acorn 1997-8 (Built: 07 May 1998)
gopher:    0x508c0   010       Gopher Fetcher © Acorn 1997-8 (Built: 17 Feb 1998)
ftp:       0x4bd00   028       FTP Fetcher © Acorn 1997-8 (Built: 19 Mar 1998)
file:      0x83f40   038       File Fetcher © Acorn 1997-8 (Built: 04 Jun 1998)
http:      0x83f80   082       Acorn HTTP © Acorn 1997-8 (Built: 07 May 1998)
Related SWIs: SWI URL_EnumerateSchemes
URL errors
==========
The URL module is allocated two ranges of error numbers, each 256 errors
long. The first 32 errors are reserved to the URL module and the rest
are reserved to Acorn protocol modules.
Module     Error range
URL        &80DE00 - &80DE1F
HTTP       &80DE20 - &80DE3F
MAILTO     &80DE40 - &80DE5F
File       &80DE60 - &80DE7F
FTP        &80DE80 - &80DE9F
Gopher     &80DEA0 - &80DEBF
WhoIs      &80DEC0 - &80DEDF
Finger     &80DEE0 - &80DEFF
WAIS       &81EF00 - &81EF1F
HTTPS      &81EF20 - &81EF3F
News       &81EF40 - &81EF5F
Error numbers &81EF60-&81EFFF are reserved for Acorn use only.
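Because each module owns a block of 32 consecutive error numbers, the owner
of an error can be found by simple arithmetic. The C sketch below shows
this; the function name is hypothetical, while the ranges and module names
are those listed in the table above.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the name of the module that owns an error number, "reserved"
 * for the unallocated part of the second range, or NULL if the number
 * lies outside both URL fetcher ranges. */
const char *url_error_owner(uint32_t errnum)
{
    static const char *const first_range[8] = {
        "URL", "HTTP", "MAILTO", "File", "FTP", "Gopher", "WhoIs", "Finger"
    };
    static const char *const second_range[3] = { "WAIS", "HTTPS", "News" };

    if (errnum >= 0x80DE00u && errnum <= 0x80DEFFu)
        return first_range[(errnum - 0x80DE00u) / 32u];
    if (errnum >= 0x81EF00u && errnum <= 0x81EF5Fu)
        return second_range[(errnum - 0x81EF00u) / 32u];
    if (errnum >= 0x81EF60u && errnum <= 0x81EFFFu)
        return "reserved";
    return NULL;
}
```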
URL Module Errors
Error Number   Meaning
&80DE00        Session ID not found. A client passed an unknown session ID
               in R1 to one of the URL module's SWIs.
&80DE01        URL ran out of memory.
&80DE02        No matching fetcher for the URL could be found.
&80DE03        SWI not found (URL Module). URL attempted to call a fetcher's
               SWI and received a SWI not known error.
&80DE04        Session already has had an object fetch performed in it. You
               cannot re-use this session.
&80DE05        No fetch in progress for this session ID. You have called
               URL_ReadData or URL_Status having already terminated the
               fetch.
&80DE06        SWI Method already exists. URL already knows of a module which
               provides this method for fetching - another cannot register.
&80DE07        No fetch in progress for this session ID. You have not called
               URL_GetURL before URL_Stop, URL_ReadData or URL_Status.
&80DE08        Message not found in Messages file.
&80DE09        (No longer used)
&80DE0A        Unable to parse URL.
Error numbers for protocol modules are not within the scope of this
specification.
Performance targets
===================
Final code size of the version described by this document should be about
25K. When fetches are active, more memory will be claimed from the RMA
to record details of the session. The amount claimed depends on the URL
being fetched plus the small overhead for the session information.
Temporary workspace is claimed from the RMA as required for URL resolution
equivalent to three times the total combined length of the base and relative
URLs involved.
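The temporary workspace figure quoted above amounts to the following
arithmetic, shown here as a small C sketch. The function name is
hypothetical, and the result is only the estimate stated in this section,
not a measurement of any particular implementation.

```c
#include <stddef.h>
#include <string.h>

/* Estimated temporary RMA workspace, in bytes, for resolving a relative
 * URL against a base URL: three times their combined length. */
size_t resolution_workspace(const char *base_url, const char *relative_url)
{
    return 3 * (strlen(base_url) + strlen(relative_url));
}
```

For example, resolving "c.html" (6 characters) against "http://a/b/"
(11 characters) gives an estimate of 3 * 17 = 51 bytes.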
Workspace is claimed from the RMA to store details of registered proxies.
All session-specific memory, including proxy information, is freed when the
session is terminated.
References
==========
The following RFC documents are of direct relevance to the URL module:
RFC 1738 - Uniform Resource Locators
RFC 1808 - Relative Uniform Resource Locators
RFC 2068 - HyperText Transfer Protocol specification version 1.1
Glossary
========
FTP
File Transfer Protocol - an application level protocol for the transfer of
files between a remote host computer and a local client, as defined by
RFC 959.
HTTP
HyperText Transfer Protocol - a protocol designed to transfer resources
("documents") from a remote server machine to a local client, as defined by
RFC 1945 (version 1.0) and RFC 2068 (version 1.1).
HTTPS
Secure HyperText Transfer Protocol - HTTP protocol over a communication
channel encrypted using SSL.
URL
Uniform Resource Locator, as defined by RFC 1738 - a subclass of URIs
(Uniform Resource Identifiers, defined in RFC 1630) which map onto network
access protocols. More commonly, the addresses of objects on the World Wide
Web.
NNTP
Network News Transfer Protocol, as defined by RFC 977.
Gopher
The Internet Gopher Protocol - a distributed document search and retrieval
protocol.
SSL
Secure Sockets Layer. A specification for encryption of communications
on networks.
WAIS
Wide Area Information Servers, as defined by RFC 1625.