Simple Offline USENET Packet Format (SOUP)
Version 1.2

Copyright (c) 1992-1993 Rhys Weatherley
rhys@cs.uq.oz.au

Last Update: 14 August 1993

DISTRIBUTION

Permission to use, copy, and distribute this material for any purpose and without fee is hereby granted, provided that the above copyright notice and this permission notice appear in all copies, and that the name of Rhys Weatherley not be used in advertising or publicity pertaining to this material without specific, prior written permission.

RHYS WEATHERLEY MAKES NO REPRESENTATIONS ABOUT THE ACCURACY OR SUITABILITY OF THIS MATERIAL FOR ANY PURPOSE. IT IS PROVIDED "AS IS", WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES.

NOTE: This document is NOT in the public domain. It is copyrighted. However, the free distribution of this document is unlimited. If you create a product which uses this packet format, it is suggested that you include an UNMODIFIED copy of this document to inform your users as to the packet format.

All queries about this format, or requests for the latest version should be directed to Rhys Weatherley at the above e-mail address.

INTRODUCTION

For many years, the FidoNet community has been using QWK and other formats to enable users to download their mail and conferences to be read while off-line. This not only saves phone charges and prevents tying up BBS lines for long periods of time; it also allows a user to use much more powerful tools on their own machine to process the downloaded "packets" than what can be made available in an on-line environment.

To date however, very little work has been done in the USENET and dial-in Unix community to facilitate the same user operations. Some attempts have been made to use QWK, but due to QWK's limitations and unsuitability for the USENET message formats, such efforts have not been very successful.

Within USENET, the tendency seems to be either "dial-in to some other machine and put up with it", or "set up your own USENET site".
The former keeps the user at the mercy of whatever user interfaces the admin of the other machine sees fit to install, and the latter requires far more computing knowledge than the average computer user is expected to have. Both of these can serve to lock out large portions of the computer-literate public from experiencing USENET. The latter option can also give rise to security problems in the form of forged USENET messages, which a more controlled dial-in system avoids.

The purpose of this document is to define a new packet format which is aware of the conventions used in the USENET community, forming a middle ground between dial-in user interfaces and full USENET connectivity. It is not limited to downloading USENET news however. The same format could be used to enable a Unix user to package up their Unix mailbox and download it for later perusal. The format is extensible to other kinds of news or conference systems, so it is feasible, although not yet defined, that QWK or FidoNet messages could be accommodated within the same packet as USENET messages.

REVISION HISTORY

    1.2    Add COMMANDS and ERRORS files. Renamed to "Simple Offline USENET Packet Format". A few extra fields and type codes for the AREAS and LIST files. Message area summaries.

    1.1    Add description of the LIST file. Everything else is identical to 1.0.

    1.0    Original version of the document.

Previously, this document was known as the "Helldiver Packet Format" (HDPF). A variant of HDPF, called the "Simple Local News Packet format" (SLNP) was created by Philippe Goujard (ppg@oasis.icl.co.uk). This document combines the features of both previous formats and the name was changed to make it less product-oriented.

TERMINOLOGY

Packet: a set of files, collected into a compressed archive.

Message packet: the primary kind of packet which contains messages for the user to read.

Reply packet: a special kind of packet which contains replies composed by the user, usually in response to the messages in a message packet.

Packet generator: a program which generates packets to be downloaded and read, and which processes uploaded reply packets.

Packet reader: a program which reads packets, usually by presenting the messages in a packet to the user, and which generates reply packets.

Packet processor: either a packet generator or a packet reader.

Generating host: the computer on which the packet generator executes.

Reading host: the computer on which the packet reader executes.

Download: the transfer of a packet from the generating host to the reading host. This transfer may take place in any fashion, although the most common method is through the use of a file transfer protocol such as Zmodem or Kermit.

Upload: the transfer of a packet from the reading host to the generating host.

Packet stream: a logical link between the generating and reading hosts over which downloads and uploads of packets take place.

Message area: a collection of messages which are related by a common topic or purpose. Examples of message areas include USENET newsgroups, Unix mailboxes, and FidoNet conferences.

Reply message area: a special kind of message area which contains replies being uploaded to a generating host.

Text file: an ASCII file consisting of lines terminated by linefeed characters (LF, 10 decimal). Some operating systems terminate lines in a text file by CRLF pairs: such files must be converted to LF-terminated lines for transmission in a packet.

ANATOMY OF A PACKET

A packet is a group of files, collected into a compressed archive. The standard compression technique defined by this document is ZIP. Other techniques such as ARJ, ZOO, ARC, LZH, etc. can also be used. It is also possible for Unix's tar.Z format to be used to transmit packets.
The minimum requirement is a method to collect a group of files into a single packet, and a method to expand the packet back into the original files. ZIP is specified to provide a common compression format for packet processors.

Each of the filenames in a packet should be stored in upper case on those systems where case matters (e.g. Unix). The following file specifications may appear in a packet:

    INFO        Optional textual information.
    LIST        List of message areas on the generating host.
    AREAS       Index of the message areas within the packet.
    REPLIES     Index of the reply message areas from the reading host.
    *.MSG       Text of the messages in a particular message area.
    *.IDX       Index information for messages in a message area.
    COMMANDS    Extra commands sent along with a packet.
    ERRORS      Errors that occurred during the execution of commands.

Other filenames may also appear in the packet, but are not defined by this specification, so they should be avoided by generating software, and ignored by receiving software.

The INFO file is an optional text file which may contain any kind of textual information from the generating system. Typically this file would only be present if there is some kind of urgent message that must be sent to the receiving user. Use of this file to store the name of the generating host and other such static information is possible, but discouraged to save space and transmission time. If such information is required, then the COMMANDS file can be used to transfer it.

The LIST file is an optional text file which contains a list of all message areas that are available on the generating host, together with the format of the messages. It is specified further in the section "LIST FILE".

The AREAS file is a text file which contains an index of the message areas present within the packet, specifying the name of the message area, the filename the messages may be found in, and the message format. This is specified further in the next section.

The REPLIES file is a text file which contains an index of the message areas present within the packet that contain replies from the user which should be mailed or posted on the generating host. In most cases, a packet will contain either an AREAS file or a REPLIES file, but both may be present. See the section "REPLIES FILE" below for more information.

The *.MSG files contain the text of the messages from a single message area. The actual format of this file depends on the type of message area specified in the AREAS file. See the section "MESSAGE FILES" below for more information.

The *.IDX files provide an index into the *.MSG files, usually specifying where each message starts and the contents of some of the common message header fields. These files are intended for use by reading software on the recipient's system to quickly display an overview of the messages present in a message area. See the section "INDEX FILES" below for more information.

The COMMANDS file is a text file which contains commands to be executed on the reading or generating hosts to change the behaviour of the hosts at each end of a packet stream. The ERRORS file contains textual error messages to report to a human at the host the packet is destined for. These two files are explained further in the section "SENDING COMMANDS BETWEEN SYSTEMS" below.

AREAS FILE

The AREAS file is a text file containing zero or more lines, each of which specifies a single message area, its encoding and the name of the message/index file pair in which the messages appear. In particular, each line has the following form:

    prefix<TAB>area name<TAB>encoding[<TAB>description[<TAB>number]]

where <TAB> denotes a single TAB character, "prefix" specifies the name of the message/index file pair, "area name" is the name of the message area, "encoding" specifies the formats of the message and index files and the type of message area, "description" is a descriptive name for the message area, and "number" is the number of messages in the message file. The last two fields are optional.
Additional fields may be added in a future version of this specification.

The message and index files corresponding to the message area have the names "prefix.MSG" and "prefix.IDX" respectively. If "prefix" contains alphabetic characters, they must be upper case.

The message area name may be any sequence of printable ASCII characters (space through tilde). Under USENET, this is typically a dotted name like "comp.lang.c". Other networks may include spaces or other unusual characters in the area names, so the receiving software must be aware of this fact, and act accordingly. Also, receiving software must deal gracefully with characters that have the high bit set, or names that contain control characters, since people in other countries that speak a language other than English may wish to use their country's native encoding for the message area name. The only hard rule is that the name may not contain TAB, CR or LF. Receiving software should treat the name as an indivisible string to be displayed to the user.

The encoding field consists of two or three ASCII characters (usually alphabetic). The first specifies the format of the message file, the second specifies the format of the index file, and the optional third specifies the kind of area (private or public). The following message file formats are currently defined (case is significant):

    u    USENET news articles
    m    Unix mailbox articles
    M    Mailbox articles in the MMDF format
    b    Binary 8-bit clean mail format
    B    Binary 8-bit clean news format
    i    Index file only

The individual message file encodings are explained further in the next section. The format 'i' indicates that no message file is present, and the index file should be used as a summary of the messages in the message area. This is explained further in the section "MESSAGE AREA SUMMARIES".

The following index file formats are currently defined (again, case is significant):

    n    No index file
    c    C-news overview database format
    C    Shorter C-news overview database format
    i    Offset/length pairs delineating the messages

These types are explained further in the section "INDEX FILES" below. See the section "MINIMAL CONFORMANCE" for information on the minimal number of message and index formats that should be supported by packet generators and packet readers.

The following kinds of message areas are currently defined (again, case is significant):

    m    The message area contains private mail
    n    The message area contains public messages, or news
    u    The message area kind is unknown (the default)

This third letter is optional. If it is not present or unknown, the kind of area depends on the message file type. Message types 'm', 'M', and 'b' default to kind 'm', and message types 'u', 'B' and 'i' default to kind 'n'. It is not recommended that the value 'u' for this third letter be used, although future versions of this specification may add additional letters, necessitating 'u' to be placed in the third letter if the kind is unknown. If the message area kind can be solely determined from the message file type, it is recommended that the third letter be omitted to save space and transmission time.

Further types may be defined in future versions of this specification. If the packet processor does not recognise a message file type, it should ignore the corresponding message and index files. If the packet processor does not recognise an index file type, it can either ignore the message file, or attempt to break down the message file into separate messages by some other means. If the packet processor does not recognise a message area kind, the kind should be treated as unknown. The user should be warned if a message area has been ignored.

The optional message area description in the AREAS file consists of any sequence of printable ASCII characters.
This may be used to insert a "readable" name for the message area. It may not contain TAB, CR or LF.

A message area may appear more than once in the AREAS file, each time with a different prefix, but this is discouraged. This could be used to split large message areas across more than one message file, but this is more conveniently handled by generating a separate packet containing the area continuation.

The following examples demonstrate the capabilities of the AREAS file:

    0000000    Email                   mn
    0000001    comp.lang.c             uc     C Programming Language Discussions    125
    0000002    news.future             Bc     Future of USENET                      38
    EMAIL      /usr/spool/mail/fred    unm    Private e-mail for fred
    U000001    comp.bbs.misc           MCn
    U000002    comp.bbs.waffle         ui

MESSAGE FILES

The format of the message file depends on the message file format specified in the AREAS file. This version of the specification defines three formats, which are in common use in the USENET and Unix community, and two additional binary formats which permit messages to be stored with no modification or assumptions about line lengths and byte values.

For each of these formats, lines are terminated with LF characters. Any CR characters in the messages should be considered as data characters, or ignored on receipt. In particular, MS-DOS systems should strip CR characters from text messages before writing them to a packet.

A 'u' (USENET) message file is a text file consisting of one or more messages prefixed with an rnews header. This header has the form "#! rnews n" where "n" is the number of bytes in the message that follows the header, excluding the line-feed character which terminates the header. If the number in the header is followed by white space and other characters, these other characters should be ignored, until the terminating LF character is encountered.
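
As an illustration only, the following Python sketch splits a 'u' message file into its messages by walking the rnews headers. The function name split_rnews is hypothetical and not part of this specification, and raising an error on a damaged file is an assumption about how a packet reader might behave.

```python
def split_rnews(data: bytes) -> list[bytes]:
    """Split the contents of a 'u' message file into individual messages."""
    prefix = b"#! rnews "
    messages = []
    pos = 0
    while pos < len(data):
        eol = data.find(b"\n", pos)
        if eol == -1:
            raise ValueError("unterminated rnews header")
        header = data[pos:eol]
        if not header.startswith(prefix):
            raise ValueError(f"expected rnews header at offset {pos}")
        # Only the first token is the byte count; anything after further
        # white space on the header line is ignored.
        fields = header[len(prefix):].split()
        if not fields:
            raise ValueError("rnews header has no byte count")
        size = int(fields[0])
        start = eol + 1  # the count excludes the LF ending the header
        if start + size > len(data):
            raise ValueError("message file is truncated")
        messages.append(data[start:start + size])
        pos = start + size
    return messages
```

Because the byte count alone delimits each message, a reader written this way needs no index file to process a 'u' message area.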

A note about the rnews header: although a terser separator could be used, the rnews header has the following advantages: (a) the messages can be extracted in the absence of index files, or where the index files have an unknown type, and (b) the message files can be imported into a USENET system as standard rnews batches. Thus, if the user wishes to set up a real USENET site, or simply use dedicated USENET software to read packets, they can use their existing packet provider as a convenient read-only newsfeed, with no extra burden placed on the system administrator of the generating system.

An 'm' (Unix mailbox) message file is a text file consisting of one or more messages. The first line of each message must start with the character sequence "From ". Any remaining lines in the message which start with "From " should have the character '>' prepended. Thus the "From " lines delimit the message file into separate messages.

An 'M' (MMDF mailbox) message file is a sequence of one or more messages, separated by at least 4 Control-A characters. The message file may optionally start and end with a sequence of such characters. If a sequence of 4 or more Control-A characters occurs in a message, it should be "adjusted" by the insertion of spaces to split the sequence. The use of Control-A characters within a message is discouraged.

The 'm' and 'M' formats were chosen for mail because of their common occurrence in the Unix community. The generating system may elect to instead convert a mailbox into the USENET format if it wishes, and set the area kind to 'm' to inform the packet reader that the message area contains private e-mail rather than news.

The 'b' (binary mail) and 'B' (binary news) formats are identical. The contents of each message must conform to RFC-822/1036 and may contain content information compatible with RFC-1341 (MIME).
The only difference between the messages of these formats and the preceding formats is that no assumption is made about line lengths, and any of the 256 values for a byte may be used in any position. Each message is preceded by a 4-byte value which indicates the length of the message in bytes, stored in big-endian order (i.e. high byte first, low byte last).

The difference between 'b' and 'B' is a semantic one: message files of type 'b' are expected to contain mail messages, and message files of type 'B' are expected to contain news messages. Thus, reader software can make a distinction between the two if it desires.

For most practical purposes, 'u', 'm' and 'M' should be sufficient. The binary 'b' and 'B' types should be used for articles that contain 8-bit binary data. It is possible to use type 'u' for binary data as well, but 'm' and 'M' cannot be used because the message contents may be modified. When MIME becomes more widespread, it is expected that binary messages containing programs, sound, pictures and video will become popular, necessitating these binary types.

Note that MIME messages can be stored in 'u', 'm' and 'M' message files, but any binary components should be encoded with quoted-printable or base64 (which is expected to be the most common usage of MIME in the near future). It is not required that 'b' or 'B' be used for MIME messages: only those containing raw unencoded binary data (as indicated by the Content-transfer-encoding header value "binary").

INDEX FILES

This specification defines four index file types, which provide varying degrees of support for packet readers.

Type 'n' indicates that no index file is present, and it is up to the packet reader to extract messages from the message file. It is useful where the generating system is providing a USENET newsfeed using packets, and the receiving system is not interested in the index information.
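
When the index type is 'n', a packet reader has to recover the message boundaries from the message file itself. For the 'b' and 'B' formats this means following the 4-byte big-endian length prefixes. The Python sketch below shows one possible approach; split_binary is a hypothetical name, and treating the length as an unsigned value is an assumption, since this document does not say whether it is signed.

```python
import struct


def split_binary(data: bytes) -> list[tuple[int, bytes]]:
    """Return (offset, message) pairs from a 'b' or 'B' message file.

    Each offset is the position of the first byte of the message, that is,
    the byte immediately after its 4-byte length prefix.
    """
    result = []
    pos = 0
    while pos < len(data):
        if pos + 4 > len(data):
            raise ValueError("truncated length prefix")
        # ">I" is a big-endian unsigned 32-bit integer (assumed unsigned).
        (size,) = struct.unpack_from(">I", data, pos)
        start = pos + 4
        if start + size > len(data):
            raise ValueError("message file is truncated")
        result.append((start, data[start:start + size]))
        pos = start + size
    return result
```

The message bytes are returned untouched, so 8-bit data and embedded LF or CR characters pass through without modification.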

A type 'c' index file is a text file (LF terminated lines), with one line per message that occurs in the message file. The lines in the index file should be in the same order as the corresponding messages. Each line has the following form:

    offset<TAB>subject<TAB>author<TAB>date<TAB>mesgid<TAB>
        refs<TAB>bytes<TAB>lines[<TAB>selector]

[Note: the line-wrapping here is for document-formatting purposes only. No line-wrapping occurs in the index files]. The fields have the following semantics:

offset
    Seek position in the message file of where the corresponding message starts. The first seek position is 0. For the 'u' format, this indicates the start of the line following the rnews header line. For the 'm' format, this indicates the start of the "From " line and for the 'M' format, this indicates the start of the article after the Control-A sequence. For the 'b' and 'B' formats, this indicates the first byte of the message after the 4-byte message length.

subject
    The "Subject:" line from the message.

author
    The "From:" line from the message.

date
    The "Date:" line from the message.

mesgid
    The "Message-Id:" line from the message.

refs
    The "References:" line from the message.

bytes
    The number of bytes in the message. If this field is zero, then it indicates that there is no corresponding message in the message file. This is used for summaries: see the section "MESSAGE AREA SUMMARIES" for more details.

lines
    The "Lines:" line from the message. Note that this field is pretty useless these days on USENET, but is still popular. It is meant to indicate the number of lines in the body of the message. Generating software may elect to re-generate this value if it is not present in the original message, but this is not required.

selector
    A string used for summaries to request that a message be sent in a future packet. See the section "MESSAGE AREA SUMMARIES" for more details. This string will usually be a number, but other values such as Message-IDs could be used.
    Packet readers should treat this string as an indivisible string to be sent in a "sendme" command in the COMMANDS file. A zero-length string indicates that there is no selector string.

If any of these fields contained TABs, newlines or other white space in the original articles, they should be converted into single spaces. All fields must be present, but some may be empty. The "bytes" field must not be empty, since it provides necessary information for packet readers. Each field must conform to the Internet RFC documents RFC-822 or RFC-1036.

Optionally, a header line may end with one or more extra TAB-separated fields for other RFC-compliant header fields, together with the header field names, e.g. "Supersedes: <1234@foovax>". These fields are not defined by this version of the specification, and are by arrangement between the generating host and the reading host only.

This format is compatible with the news overview (NOV) database format of C-news. The only differences are the substitution of an offset for the article number used by C-news, and the addition of the "selector" field. The C-news format was designed to assist threading newsreaders, so this packet format should provide similar assistance to threading packet readers.

The 'C' format is similar to 'c', except that the "mesgid" and "refs" fields are dropped. These fields can commonly be quite long and are mainly of use to packet readers which perform Message-ID based message threading. Packet readers which perform subject threading (i.e. sort on the subject line and then on the date and/or arrival order) do not require such information. The format of the header lines in this case is as follows:

    offset<TAB>subject<TAB>author<TAB>date<TAB>bytes<TAB>lines[<TAB>selector]

Further TAB-separated fields may be added in future versions of this specification.

The "author" field is slightly different to the 'c' format. Instead of an RFC-822 format address, it is just the author's name, extracted from the "From:" line of the message.
Most RFC-822 and RFC-1036 "From:" lines have one of the following forms:

    address
    address (name)
    name <address>

Names may sometimes be surrounded by double-quote characters, have embedded "(...)" sequences, or contain "useless" information after a comma (",") or slash ("/"). The main requirement is that the generating software produce some kind of (more or less) meaningful string for the name of the author which can be displayed to the user by a packet reader. See RFC-822 and RFC-1036 for more information on the syntax of the "From:" line in messages.

The 'i' index format is purely binary, using 8 bytes for each message in the corresponding message file. The first 4 bytes specify the offset into the message file of the message and the remaining 4 bytes specify the number of bytes in the message. Each 4-byte quantity is stored in big-endian order (high byte first). This format is supplied to provide a trade-off between transmission time and easy extraction of messages from a message file.

REPLIES FILE

One of the requirements for an off-line reading system is a mechanism for a user to upload replies or new messages to a generating system for mailing or posting. While it is possible to re-use the AREAS file for this purpose, keeping the download and upload sections separate will help prevent messages being fed back into a network erroneously.

The REPLIES file has a similar format to the AREAS file. Each line has the following form:

    prefix<TAB>reply kind<TAB>encoding

The "prefix" and "encoding" fields are as before. The "reply kind" field indicates the mechanism to use when transmitting the messages in the message file. The following values are currently defined:

    mail    Transmit an RFC-822 compliant personal mail message
    news    Transmit an RFC-1036 compliant USENET news posting

On a Unix system, transmission of mail and news is usually performed with the "sendmail" and "inews" programs respectively. Additional kinds may be specified in a future version of this specification for other message formats.
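
The REPLIES line layout above can be read with a few lines of code. The Python sketch below is an illustration under stated assumptions: the fields are taken to be separated by single TAB characters, the names Reply and parse_replies are hypothetical, and any fields beyond the third are ignored so that lines from a later version of this specification still parse.

```python
from typing import NamedTuple


class Reply(NamedTuple):
    prefix: str      # names the PREFIX.MSG / PREFIX.IDX file pair
    kind: str        # reply kind, e.g. "mail" or "news"
    msg_format: str  # first letter of the encoding field
    idx_format: str  # second letter of the encoding field


def parse_replies(text: str) -> list[Reply]:
    """Parse the contents of a REPLIES file into Reply records."""
    entries = []
    for line in text.split("\n"):
        if not line:
            continue  # skip the empty string after the final LF
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError(f"malformed REPLIES line: {line!r}")
        prefix, kind, encoding = fields[:3]  # later fields are ignored
        if len(encoding) < 2:
            raise ValueError(f"encoding too short: {encoding!r}")
        entries.append(Reply(prefix, kind, encoding[0], encoding[1]))
    return entries
```

A generating host could then dispatch on the kind field, handing "mail" entries to its mail transport and "news" entries to its news posting program, and reporting any unrecognised kind rather than guessing.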

Note: using the kinds "mail" and "news" for anything other than RFC-compliant messages is discouraged. In particular, FidoNet or QWK messages should use a different reply kind.

Messages of the same reply kind can be placed in the same message file, or in separate message files. Further TAB-separated fields may be added to the lines in the REPLIES file in a future version of this specification.

It is recommended that a message file type of 'b' or 'B' be used for sending replies to minimise the chance of message corruption. The recommended index file types for replies are 'i' and 'n'. The index types 'c' and 'C' are discouraged because they do not provide useful information for reply purposes.

The format of the messages in the message files should follow the relevant RFC standards, with the following restriction: any "From:", "Sender:", "Control:", "Approved:" or other similar "dangerous" header lines should be ignored by the system transmitting the replies to prevent forgeries from occurring. In particular, the "From:" header should be determined from the user's login name, or some other similar means, rather than from any data supplied in the user's message.

In most cases, mail messages will contain "To:", "Subject:", "Cc:", "Bcc:" and "Reply-To:" header lines, and news messages will contain "Newsgroups:", "Subject:", "Followup-To:", "Keywords:", "Summary:" and "Reply-To:" header lines. Other optional headers (especially MIME content headers) may also be present.

The automatic addition of a signature by the generating host which receives the reply packet is discouraged. Signatures should be added by the user's packet reading software instead, if desired.

A method for allowing replies from more than one person to be stored in the same packet was considered, but was rejected for security reasons.
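
The restriction on "dangerous" header lines might be applied as in the following Python sketch. It is a sketch only: the function name strip_dangerous_headers is hypothetical, the set of header names contains just the four listed above (a real system would extend it to cover the "other similar" headers), and the step of generating a trusted "From:" line from the user's login name is left out.

```python
# Header names taken from the text above; an assumed, incomplete set.
DANGEROUS = {"from", "sender", "control", "approved"}


def strip_dangerous_headers(message: bytes) -> bytes:
    """Drop user-supplied header lines that could be used for forgery."""
    head, sep, body = message.partition(b"\n\n")
    kept = []
    dropping = False
    for line in head.split(b"\n"):
        if line[:1] in (b" ", b"\t"):
            # Continuation line: shares the fate of the header it extends.
            if not dropping:
                kept.append(line)
            continue
        name = line.split(b":", 1)[0].strip().lower()
        dropping = name.decode("ascii", "replace") in DANGEROUS
        if not dropping:
            kept.append(line)
    # The body is passed through untouched, even if it contains "From".
    return b"\n".join(kept) + sep + body
```

Only the header section is examined, so a line in the body that happens to begin with one of these names is left alone.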

The following example demonstrates the capabilities of the REPLIES file:

    R001    mail    bn
    R002    mail    bi
    R003    news    Bn
    R004    news    Bi

LIST FILE

The LIST file may be used to send a list of available message areas to the receiving system. Its format is similar to the AREAS file, with the prefix field deleted. Each line has the following form:

    area name<TAB>encoding[<TAB>description]

where "area name" is the name of the message area, "encoding" is a 2, 3 or 4 letter message, index, area kind, and subscription code, and "description" is an optional message area description. Further optional fields may be added in a future version of this specification.

The message, index, and area kind codes are the same as for the AREAS file. The subscription code has one of the following values:

    y    The user is subscribed to the message area
    n    The user is not subscribed to the message area

If this field is not present, it defaults to 'n'. Note that the message areas in the LIST file should only be those that can be subscribed to or unsubscribed from using a request in the COMMANDS file. Private e-mail message areas will normally not appear in the list.

The following example demonstrates the capabilities of the LIST file:

    alt.flame          ucnn
    comp.bbs.misc      ucny
    comp.bbs.waffle    ucny
    comp.lang.c        ucnn    C Programming Language Discussions
    news.future        ucny    Future of USENET

SENDING COMMANDS BETWEEN SYSTEMS

The COMMANDS and ERRORS files contain information for changing the behaviour of each end of a packet stream, or for reporting errors in the execution of commands or the generation of packets. Each is a text file with LF-terminated lines.

The ERRORS file is the simplest: it consists of error messages from the program which generated the packet to report on the progress of previously executed commands. The format of these error messages is not defined, but they should be human readable so that packet readers may present the errors to the user for perusal.

The COMMANDS file consists of a sequence of commands, one per line, which modify the behaviour of the packet processor at the other end of the packet stream. Usually these commands are sent from the packet reader to the packet generator to change the subscribed message areas, send files, etc. The names of the commands are NOT case significant, but SHOULD be sent in lower case. Any commands that are not understood by a program should be ignored.

version n.m

    The command specifies the version of this specification that the packet conforms to. For this document the version is "1.2".

date dd mmm ccyy hh:mm:ss [zone]

    The date and time when the packet was created. To prevent confusion between different countries' date formats, the date MUST always appear as "dd mmm ccyy". For example, "25 Jul 1993". This date format can be converted to local conventions if desired. "hh:mm:ss" is a 24-hour clock time value. The "zone" field is the number of hours and minutes that the timezone is offset from Greenwich Mean Time as "+HHMM" or "-HHMM". For example, US Eastern Standard Time (EST) is "-0500", and Australian Eastern Standard Time is "+1000". If the zone is omitted, it defaults to "local time", however the zone should only be omitted if there is no way to determine it.

subscribe name

    This command requests the packet generating program to subscribe to a new message area. The area name may contain spaces, but not TABs. Additional fields may be added in a future version of this specification after a separating TAB. For now, ignore anything after a TAB. This command may generate an error message if the message area does not exist, or cannot be subscribed to.

unsubscribe name

    This command requests the packet generating program to unsubscribe from a message area. The same remarks about TABs and errors above also apply to this command.

catchup [name]

    This command requests the packet generating program to catch up on the nominated message area.
    That is, to mark all messages in the area as read and continue batching from the next message received. If the area name is not present, the packet generating program should catch up on all message areas.

list [always|never]

    This command requests the packet generating program to send a full list of all available message areas as a LIST file in the next packet. If the argument "always" is present, then the LIST file should be sent in every packet. The argument value "never" reverses this. For minimal compliance, "list always" should be treated as "list", and "list never" should be ignored.

hostname string

    This command specifies the name of the host or BBS the packet was generated on. It serves an informational role only. The string can be any sequence of printable ASCII characters.

software string

    This command specifies the name and version of the software which generated the packet. It serves an informational role only. The string can be any sequence of printable ASCII characters.

sendme<TAB>area<TAB>selector[<TAB>selector[...]]

    This command requests that the packet generator send a number of messages from the nominated message area. The "selector" arguments are taken from the "selector" fields in a 'c' or 'C' index file. Multiple "sendme" commands for the same message area may be present in a COMMANDS file. The maximum length for this command is 500 characters. Note that other commands use spaces to separate arguments, but this command uses TABs.

mail y
mail n

    This command changes whether or not private e-mail should be sent in generated packets.

deletemail y
deletemail n

    This command changes whether or not the user's private mailbox should be deleted after being batched into a packet.

mailindex x

    Set the preferred mail index format, where 'x' is one of the values 'n', 'c', 'C' or 'i'.

newsindex x

    Set the preferred news index format, where 'x' is one of the values 'n', 'c', 'C' or 'i'.
get filename [putname]

    Request that a file on the generating side be placed into a packet and sent to the packet reader. "putname" specifies the "filename" argument for the corresponding "put" command. If "putname" is not specified, the default is to use the base name of "filename". If directory paths are specified, the separator must be '/'. Note that security could be breached through the use of this command, so programs which support it should be very careful, preferably restricting requests to a particular directory tree.

put pktname filename

    This command is usually sent in response to a "get" command, although it can be sent on its own. "pktname" specifies the name of the file in the packet which contains the requested file's contents. The "filename" argument specifies the destination file to write the contents to. Note that security could be breached with this command, so the destination filename should be checked, or restricted to a particular directory tree. It is also recommended that the user be prompted for confirmation before writing the file. If directory paths are specified in "filename", the separator must be '/'. It is recommended that the extension "FIL" be used for files in a packet which contain data sent with this command. For example, "put 001.FIL abc.zip".

supported cmd ...

    This command is usually sent from a packet generator to inform a packet reader as to which commands are supported by the generating program. The argument is a space-separated list of command names. For example, "supported subscribe unsubscribe list", or "supported subscribe unsubscribe catchup list mail deletemail". It is recommended that at least "subscribe", "unsubscribe" and "list" (with no arguments) be supported. Packet generators are recommended to add a "supported" line to all packets generated to inform the packet reader which commands can be used.
    In the absence of a "supported" line, only "subscribe", "unsubscribe" and "list" should be assumed to be supported.

If more than one command is received for the same item (e.g. "subscribe", "unsubscribe", "list", "mail", ...), then the last command in the COMMANDS file takes precedence over any previous commands.

The following example demonstrates a typical COMMANDS file sent from a packet generator:

    version 1.2
    date 25 Jul 1993 12:34:38 +1000
    hostname frobozz.domain.com
    software Fubar 1.3
    supported subscribe unsubscribe catchup list sendme get
    put 001.FIL abc.zip
    put 002.FIL def.txt

The following example demonstrates a typical COMMANDS file sent from a packet reader:

    subscribe comp.lang.c
    subscribe comp.lang.misc
    unsubscribe alt.swedish.chef.bork.bork.bork
    list
    get xyzzy.zip
    get /usr/local/lib/fubar.txt frobozz.txt

MESSAGE AREA SUMMARIES

The preceding sections have described a number of features for supporting message area summaries. This section provides greater detail.

Since some message areas, notably USENET newsgroups, can get quite large, the user may want to download a summary of a message area instead of all of the messages, and then request that messages of interest be sent at some later time for reading. Usually the summary will list the messages' subjects, authors, and other similar "header information". Optionally, the user may request that the first few lines of the messages also be sent so that the user may peruse the beginning of the message and decide whether to retrieve the rest of the message.

This activity is supported in the following fashion in this packet format: summary information is sent in an index file of type 'c' or 'C', usually with no accompanying message file. Therefore, the message file format in the AREAS file will be set to 'i'.
Each line in the index file has its "bytes" field set to 0 to indicate that the message is not present in the message file, and the "selector" field is set to some string that can be used to request the message by way of a "sendme" command. Usually this selection string will be the message number of the message on the generating host, but other values such as Message-IDs are allowable.

If the first few lines of each message are also desired, the message file format is set to something other than 'i', and the "offset" and "bytes" fields in the index file may be used to extract the trimmed-down messages for perusal. The "selector" field is once again used to request that an entire message be sent at some later time, by way of a "sendme" command.

It is possible to create a message area which contains both ordinary messages and summary messages. If the "selector" field is not present, or is zero-length, then the message should be processed in the usual way. If the "selector" field is present and not zero-length, then it is a summary message, and the "bytes" field can be used to determine whether the first few lines of the message exist in the message file. This mixture can be useful where the user wishes to download all messages less than a certain length in full, and download the larger messages as summaries, so that the larger messages can be explicitly requested only if the user really wants them.

MINIMAL CONFORMANCE

This section describes the minimal amount of work that a packet processor must do to be compliant with this specification.

Packet generators should be able to generate message areas for the 'b' and 'u' message formats for private and public message areas respectively, and process replies for the 'b' and 'B' message formats.

For minimal conformance, index format 'n' must be supported, and if message area summaries are required, one of index formats 'c' or 'C' should be supported.
It is recommended that either 'c' or 'C' be supported in all packet generators, even when message summaries are not required. If message summaries are supported, the minimal requirement is to send an index file with the message file format set to 'i'.

Packet generators should support the "subscribe", "unsubscribe" and "list" commands, and also the "sendme" command if message area summaries are required.

Packet readers should be able to read all message and index formats, and generate replies for the 'b' and 'B' message formats. If message area summaries are not supported, all areas with message format 'i' should be flagged to the user as not understood.

Packet readers should also be able to display the INFO and LIST files if they are present in a packet, and be able to prompt the user for "subscribe" and "unsubscribe" requests to be sent to the packet generator.

FUTURE ENHANCEMENTS

The obvious enhancement that can be made is to support other message formats, especially FidoNet formats. Currently the message area file code 'q' is reserved for QWK-format messages. This will be defined in a future version of this specification if demand warrants.

Experimentation with other formats and auxiliary files is encouraged, but please contact the author first to prevent double-ups from occurring. The author may be contacted via e-mail at rhys@cs.uq.oz.au.
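
As an illustration of the rules given under MESSAGE AREA SUMMARIES, the following Python sketch shows one way a packet reader might tell ordinary messages from summary messages and then build "sendme" requests for the ones the user selects. The names classify_entry and build_sendme and the three labels are hypothetical. The sketch assumes the "selector" and "bytes" fields have already been extracted from a 'c' or 'C' index line, and that the 500-character limit excludes the line terminator.

```python
MAX_SENDME_LEN = 500  # maximum length given for a "sendme" command


def classify_entry(selector, nbytes):
    """Classify one index entry from its "selector" and "bytes" fields."""
    if not selector:
        # No selector (absent or zero-length): an ordinary message.
        return "ordinary"
    if nbytes == 0:
        # Summary only: nothing for it is present in the message file.
        return "summary"
    # Summary whose first few lines are present in the message file.
    return "summary-with-excerpt"


def build_sendme(area, selectors):
    """Return TAB-separated "sendme" lines for one message area.

    Selectors are spread over several commands when a single line would
    exceed MAX_SENDME_LEN; multiple "sendme" commands per area are allowed.
    A single selector too long to fit is still emitted on its own line.
    """
    prefix = "sendme\t" + area
    lines = []
    current = prefix
    pending = False  # True once current holds at least one selector
    for selector in selectors:
        candidate = current + "\t" + selector
        if pending and len(candidate) > MAX_SENDME_LEN:
            lines.append(current)
            current = prefix + "\t" + selector
        else:
            current = candidate
        pending = True
    if pending:
        lines.append(current)
    return lines
```

For example, build_sendme("comp.lang.c", ["1234", "1240"]) yields a single line with the area name and both selectors separated by TABs, ready to be written to a COMMANDS file.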