Internet Engineering Task Force                               P. Cordell
Internet Draft                                        Tech-Know-Ware Ltd
draft-cordell-lumas-05.txt
February 1, 2007                                 
Expires: August 1, 2007                              


                                 Lumas - 
                      Language for Universal Message
                       Abstraction and Specification 

STATUS OF THIS MEMO

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on August 1, 2007.

Copyright Notice

   Copyright (C) The IETF Trust (2007).



Abstract

   A number of methods and tools are available for defining the format
   of messages used for application protocols.  However, many of these
   methods and tools have been designed for purposes other than message
   definition, and have been adopted on the basis that they are


Cordell                Expires August 1, 2007              [Page 1]
Internet Draft                 Lumas                     February 2007


   available rather than being ideally suited to the task.  This often
   means that the methods make it difficult to get definitions correct,
   or result in unnecessary complexity and verbosity both in the
   definition and on the wire.

   Lumas - Language for Universal Message Abstraction and Specification
   - has been custom designed for the purpose of message definition.  It
   is thus easy to specify messages in a compact, extensible format that
   is readily machine manipulated to produce a compact encoding on the
   wire.

Table of Contents

   1. Introduction
   2. About Lumas
   3. Lumas and Other Message Definition Languages
   4. Terminology
   5. Example Lumas Message Definition and Message Encoding
   5.1      Principles of the Message Definition
   5.2      An Example Message Definition and How it is Encoded
   6. Formal Message Definition Syntax
   6.1      Lumas Keywords
   6.2      Lumas Parameters
   6.3      Simple Parameters
   6.4      The Simple Types
   6.5      Simple Type Definition
   6.6      The Pattern Constraint
   6.7      The Name
   6.8      Cardinality
   6.9      Tagging
   6.10     The Plugin Extension Mechanism
   6.11     Reference Parameters
   6.12     Compound Parameters
   6.13     Struct Parameters
   6.14     Union Parameters
   6.15     Combined Parameters
   6.16     Referenced Parameters
   6.17     External Extensions - Plug and Pluggable
   6.18     Module Definition and Directives
   6.19     The Top Level Definition
   6.20     Locating Lumas within a Specification
   7. On-the-Wire Representation
   7.1      Principles of the default On-the-Wire Encoding
   7.2      Formal On-the-Wire Representation
   7.3      Marking Message Boundaries
   7.4      Examples of Encoded Types
   8. Common ABNF Definitions
   9. Notes on Comments
   10.      Locating Lumas Modules
   11.      Mandatory to Understand
   12.      Security Considerations
   13.      Normative References
   14.      Informative References
   15.      Author's Address
   

1.    Introduction

   Lumas is a lightweight message definition language that is both
   flexible and highly extensible.  This document defines the Lumas
   message definition language, and the default text encoding method for
   messages defined in this way.

2.    About Lumas

   Lumas - Language for Universal Message Abstraction and Specification
   - is a simple message definition language that can be used to define
   the messages used by protocols.  In this context, a message is
   defined as a collection of data used to convey information between
   two or more machines (or processes).  Typically Lumas is used to
   define application layer messages (e.g. at the layer at which the
   likes of SMTP [SMTP] is defined), but there is no practical reason
   why Lumas should not be used at other layers.

   The design objectives of Lumas are simplicity, ease of use,
   efficiency, and extensibility.  

   Lumas provides a high-level method for defining messages and a
   default set of encoding rules for character based protocols.  The
   encoding rules describe how instances of messages that conform to the
   defined high-level definition are represented on the wire.  It is
   also possible to define alternative encoding rules that could be used
   to define representations of messages in binary form, or other
   character based forms; e.g. XML [XML] or JSON [JSON].  In general
   Lumas is not able to describe messages with arbitrary sequences of
   characters and bytes, any more than a C compiler is able to specify
   arbitrary sequences of assembler instructions.

   Lumas recognises that message definition is a small part of the
   overall development process and thus should not warrant a
   disproportionately large investment in learning the language.  Lumas
   uses the 80/20 principle to keep it simple.  Lumas is designed to
   readily allow the use of Lumas aware software tools to aid in the
   development process.  Lumas messages are text-encoded by default so
   that they are easy to read, and it is easy to create test messages
   for debugging.  Using Lumas in applications is designed to be simple
   and efficient.  Lumas addresses a number of different types of
   extensibility, including versioning, external extensions, and
   component based architectures.  

   This makes Lumas an ideal definition language to use where
   simplicity, efficiency, compactness and/or a high degree of
   extensibility is required, especially where the extensibility
   involves plugging external modules into the base syntax.

3.    Lumas and Other Message Definition Languages

   Over the years a number of message definition methods have been
   developed.  These include XDR [XDR], ASN.1 [ASN1], various flavours
   of IDL (such as OMG IDL [OMGIDL]), 'bit pictures,' various flavours
   of BNF (e.g. ABNF [ABNF]), and XML [XML].  It is therefore worthwhile
   considering how Lumas relates to these other message definition
   languages.  

   Lumas differs from XDR in that Lumas is primarily a language for
   defining text-encoded messages.  XDR is fixed to defining binary
   messages of very specific types.  

   ASN.1 is also primarily a language for defining binary messages,
   although recently there have been XML encoding rules defined.  ASN.1
   information object classes are difficult to understand and a
   deterrent to its use.  The complexity of some of the encoding rules,
   such as BER and PER, makes the method difficult to use without
   special tools.  ASN.1 has found uses in the IETF, notably in the
   areas of cryptography (CMS [CMS] etc) and SNMP [SNMP].  However, it
   is not much loved, and efforts such as SMING have been undertaken to
   replace its usage (although at the time of writing this effort seems
   to have stalled).

   The IDL languages such as OMG IDL have similarities with message
   definition languages, but are subtly different.  IDLs define a
   collection of objects, each of which describes a remote procedure
   call.  They also define a return value for the procedure call.  A
   protocol message set is typically a single object that can have a
   number of variants.  A protocol will typically send another message
   in response to a message rather than sending a return value.

   Perhaps for the reasons mentioned, the above methods have not
   received wide usage within the IETF.  The main workhorses for message
   definition in the IETF have been 'bit pictures,' various types of BNF
   and more recently XML.

   The term 'bit pictures' is used to refer to the pictures of bits and
   bytes that are used to capture the layout of parameters within a
   message, such as those used to define IP [IP], UDP [UDP] and TCP
   [TCP].  This is very low-level and really only suitable for
   protocols containing a few parameters which ideally have fixed
   positions.

   At a level higher than pure 'bit pictures' is the scheme used in TLS
   [TLS], but this again is specific to defining binary messages.
   Diameter [DIAMETER] presents another variation on this approach.

   A number of types of BNF have been defined over the years, most
   recently ABNF.  Until recently, the BNFs have been the main workhorse
   of IETF application level protocol definition.  ABNF is very
   low-level, and is much like programming in assembler when high-level
   languages would be more useful.  It is very difficult to get
   definitions correct, and issues such as ensuring extensibility have
   to be addressed not only for each message definition, but also for
   each parameter within the definition.  The implementation route from
   ABNF can also be long as there is typically not enough high level
   information in the specification for tools to extract the important
   elements.

   This leaves XML.  XML is a comprehensive and powerful way of defining
   messages.  It would be a long and unproductive exercise to list all
   the things that XML gets right.  Instead, the focus here is on the
   areas that a developer may wish to consider when choosing between
   Lumas and XML.

   The main differences between Lumas and XML are in the areas of
   simplicity and efficiency.  Whether these differences are significant
   will depend on the application.

   There are two parts of the XML route: XML itself, and the method used
   to define the XML messages.

   Some of the less significant issues to consider are to do with XML
   itself.  For example, it has long been recognised that the format of
   XML messages, with its start and end tags, is inefficient.  (It is
   the author's belief that the extra tagging also makes the messages
   harder to read, because the message is dominated by tags rather than
   the important part, which is the values.  Hence, what works well when
   there is a high ratio of PCDATA to tags, is detrimental when that
   ratio is significantly reduced.)  The separation of parameters into
   attributes and elements adds complexity, but adds no real value in a
   protocol, and is an artefact of markup use.  The provision for
   multiple character encodings (such as UTF-8, UTF-16BE, UTF-16LE,
   ISO-8859-1 etc) places demands on a parser as does the implementation
   of namespaces (where in a start tag the namespace is defined after
   the first use of the namespace), which requires double parsing or
   significant intermediate storage.  The task of converting a namespace
   prefix to a namespace is potentially an area involving significant
   lookup effort.  Once expanded, the effective tag is a long sequence
   of characters on which comparison operations are performed, the size
   of which potentially reduces efficiency.  User definable general
   entities and parameter entities are additional burdens that have
   little value for message definition, as is the white space handling
   which is a hangover from XML as a markup language.  While these are
   surmountable problems, the consideration for a developer has to be
   'why pay for it if I don't need it?'

   The second issue is how to define the XML messages.  Arguably the
   current favourite is W3C XML Schema, although there are other methods
   including RELAX NG [RELAX] and Schematron [STRON].  First of all, it
   has to be admitted that this is currently a controversial area and
   the existence of the latter two is largely due to concerns about the
   former.  The main concern with XML Schema is again complexity.  Maybe
   in the future one of the other methods will prevail.

   Keeping with XML Schema for now, firstly the language can be very
   difficult to learn.  The specification is some 350 pages long
   (ignoring XML itself, and XML namespaces etc), and uses a formal
   language that is very confusing to interpret.  In a number of areas
   there is even debate among the experts about what is intended.  The
   constructs can be confusing and apparently contradictory in a number
   of areas, such as the notion of complexType with simpleContent and so
   on.  While XML Schema is touted as being extensible, in practice for
   the unwary, there are a number of traps to fall into.  For example,
   incorporated attribute and element groups, especially those from
   different schemas, can easily result in name clashes when they are
   extended independently.  Enumerated strings cannot be extended
   without careful consideration.  Indeed, the Unique Particle
   Attribution Constraint makes defining an extensible schema messy and
   not something that happens by accident [XMLVER].  There is no support
   for capturing what has changed from one version of a schema to the
   next, other than doing a diff operation on two files.  This again
   makes it difficult for tools.  Other features also make it difficult
   for tools, such as the ability to use patterns to restrict the format
   of basic types such as floating point numbers. XML Schema has no
   concise way of specifying short tag names while at the same time
   specifying descriptive formal names.  For example, the most common
   XML-like syntax, HTML [HTML], has an abundance of short tags such as
   <a>, <p>, <b> etc.  This makes it easy for the expert to type, and it
   must be assumed that the approach has some merit otherwise it
   wouldn't have been done that way.  But XML Schema does not readily
   support this.  Verbosity is even more of an issue when it comes to
   XML Schema, in a number of cases requiring five or more lines of text
   when only one would do.  This means extra scrolling or page turning
   when editing and viewing, which makes a schema harder to write,
   harder to check, and harder for a third-party to understand.   

   Many of these problems are subjective.  Some can also be avoided by
   defining style guides and best practices for using XML Schema (for
   example [XMLBCP]).  Compression can be used to reduce the size of
   messages.  However, this really just addresses the complexity by
   adding more complexity.  Not only does this make it harder to learn,
   it is important to remember that where there is complexity, there is
   the potential for bugs.  And bugs not only affect the integrity of
   the code, but can also affect the security of the system on which it
   runs.  Complexity is also a barrier to implementation.  It could
   be argued that the Internet has been successful because of its use of
   simple protocols.  Using XML Schema would seem to be at odds with
   that principle.  

   By being designed to be simple, Lumas avoids these problems.

   In summary, currently the main tools used for message definition in
   the IETF are ABNF and XML Schema.  In many respects these represent
   two extremes, one simple and very low-level, and the other complex
   and high-level.  Lumas is a data point between these two extremes,
   giving much of the flexibility of XML with the ease of understanding
   and compactness of ABNF.  As such it is a useful extra tool that
   allows protocol developers to better tailor protocols to their needs.

   On another level, although message definition languages have been
   around for many years now, the relative paucity of options available,
   and the fact that XML is being trumpeted as a breakthrough in
   inter-platform communication suggests that in terms of evolution, the
   field is in its infancy.  It's easy to see why this might be.
   Message definition has not been seen as a core activity, and
   developers simply make do by borrowing what is already available in
   other fields, even if it is not an ideal fit for their
   requirements.  This would suggest that there is scope for much
   development, and it may transpire that XML turns out to be the
   FORTRAN or COBOL of the message definition world, and there is much
   more exciting stuff to come.  It is hoped that Lumas can play a part
   in that story.

4.    Terminology

   The key words "MUST", "MUST NOT", "SHOULD", "SHOULD NOT", and "MAY"
   in this document are to be interpreted as described in [KWORDS]. 

   For the purposes of this document, a "tag" is a fixed sequence of
   characters used on the wire as an identifier for the value or values
   that it is associated with.  Thus identified, the value can be
   interpreted and processed in the right way.

5.    Example Lumas Message Definition and Message Encoding

   As the Lumas message definition syntax is C-like it is felt that many
   will immediately understand the majority of a message definition.
   For this reason the basic principles of the definition language and a
   short example are presented before describing the format in detail.

5.1   Principles of the Message Definition

   Following the C language format, the basic format of a parameter
   definition is:

      type  name ;

   'Type' specifies things like integers, booleans, ASCII strings,
   Unicode strings and so on.    

   The 'name' is the name of the parameter.

   Thus a parameter definition might be:

      ascii rfc-name ;

   This says that 'rfc-name' is an ASCII string.  In addition, a
   parameter definition can express constraints on the type, constraints
   on the cardinality (how many instances of the type are valid in a
   message), and the tag to be used for the value on the wire.  (A tag
   is a fixed sequence of characters that is used to identify the value
   or values that it is associated with.)  For example, an integer may
   be limited to the values 0 to 255, and an ASCII string may be limited
   to a maximum size.  The fuller format of a parameter definition has
   the form:

      type <constraint>  name [cardinality]  tagging ;

   For example:

      int <1..30000>  referenced-rfcs  [0..255]  as  refers ;

   This defines an integer that can have values between 1 and 30000.
   The name of the parameter is 'referenced-rfcs', but is tagged
   on-the-wire using the character sequence 'refers'.  The parameter can
   consist of between 0 and 255 instances of the integer in a valid
   message.
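
   As an informal illustration, the parts of such a definition can be
   pulled apart with a short Python sketch.  The regular expression and
   the name 'parse_param' are assumptions made for this example only;
   they cover just the simplified form shown above and are not derived
   from the formal syntax in Section 6.

```python
import re

# Hypothetical pattern for the simplified form shown above; it is not
# derived from the formal ABNF in Section 6.
PARAM_RE = re.compile(
    r"""^\s*
    (?P<type>[A-Za-z][\w:.-]*)                   # type
    \s*(?:<\s*(?P<constraint>[^>]*?)\s*>)?       # optional constraint
    \s+(?P<name>[A-Za-z][\w-]*)                  # name
    \s*(?:\[\s*(?P<cardinality>[^\]]*?)\s*\])?   # optional cardinality
    \s*(?:as\s+(?P<tag>\S+?))?                   # optional explicit tag
    \s*;\s*$""",
    re.VERBOSE,
)


def parse_param(definition):
    """Split one parameter definition into its named parts.

    Parts that are absent from the definition are returned as None.
    """
    match = PARAM_RE.match(definition)
    if match is None:
        raise ValueError("not a recognised parameter definition")
    return match.groupdict()


example = "int <1..30000>  referenced-rfcs  [0..255]  as  refers ;"
parsed = parse_param(example)
```

   Applied to the example above, the sketch yields the type 'int', the
   constraint '1..30000', the name 'referenced-rfcs', the cardinality
   '0..255' and the tag 'refers'.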

   Two main types of compound parameter are possible, these being
   'struct' and 'union'.  Having much the same meaning as they have in
   C, a struct specifies a group of parameters, all of which may be used
   in a particular instance of the struct.  A union similarly specifies
   a group of parameters, but in this case only one of the parameters
   can be used in any one instance of the union.

   An example of a struct is:

      struct  rfc-info
      {
            ascii           rfc-name;
            int <1..30000>  referenced-rfcs[0..255]  as  refers ;
      };

   A third form of compound type called 'combi' is also available.  The
   name is short for 'combined' and the type allows a number of values
   to be concatenated together into what looks like a single value.
   Hence it can be used to define constructs like the character sequence
   'HTTP/1.0', while specifying that the '1' and the '0' are the major
   and minor version numbers.

5.2   An Example Message Definition and How it is Encoded

   The following is an example message definition that is intended to
   represent a very crude meeting controller:

      lumas module com.tech-know-ware.my-example;
      /*
      An example Lumas definition
      */
      import com.tech-know-ware.general as tkwg;

      struct  my-example
      {
            int <0..255>    participant-id  as  ?;
            Action          action  as  ?;
            struct          my-addition[0..1] 
                                   as new.tech-know-ware.com plugin
            {
                  bool      tkw-app-capable  as  ?;
            };
      };

      union  Action
      {
            Join            join;
            Message         message  as  msg;
            void            leave;
      };

      struct  Join
      {
            unicode<0..63>  name;
      };

      struct  Message
      {
            int <0..255>    to-participants[1..127]  as  to;
            unicode<1..255> message  as  msg;
            [               // Version 2 additions
            tkwg::Priority  priority;
            ]
            [               // Version 5 additions
            ascii<0..16>    font-name[0..1] as font;
            void            bold[0..1];
            void            italic[0..1];
            void            underlined[0..1] as ul;
            ]
      };

   The first construct (in this case the struct my-example) is the root
   of all messages for the protocol.  Each message identifies a
   participant using an integer in the range 0 to 255, called
   'participant-id'.  When encoded on the wire, this parameter will be
   untagged due to the 'as ?' specification.  

   On-the-wire, the default encoding generally encodes parameters in the
   form:

       tag = value

   where 'value' is a textual representation of the parameter's value.

   However, if a parameter is marked as untagged, then it is represented
   simply as:

       value

   Hence, if in a message an instance of participant-id is to have a
   value of, say, 12, then, due to being marked as untagged, it is
   encoded simply as:

       12

   Rather than the following, which would be the case if it was not
   marked as untagged:

       participant-id = 12
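
   The difference between the tagged and untagged forms can be sketched
   in Python as follows.  The function 'encode_param' and its arguments
   are hypothetical names chosen for illustration, and the sketch
   ignores the value formatting rules of the formal on-the-wire
   representation.

```python
def encode_param(name, value, tag=None, untagged=False):
    """Return 'tag = value', or just 'value' for an untagged parameter.

    'untagged' stands for the 'as ?' specification.  When no explicit
    tag is given, the parameter name is used as the tag.
    """
    if untagged:
        return str(value)
    return f"{tag or name} = {value}"


untagged_form = encode_param("participant-id", 12, untagged=True)
tagged_form = encode_param("participant-id", 12)
```

   Here 'untagged_form' is the bare value and 'tagged_form' is the
   value preceded by the parameter name and an equals sign.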

   In this example, each message then has an action, which is also
   untagged.  The type of the action parameter is not immediately
   specified, and instead references the 'Action' definition.  

   The Action definition is a union in which only one of the specified
   parameters may appear in an instance of the Action construct.  This
   effectively represents a fork in the semantics of any given message.
   In this case the options within Action can indicate that somebody has
   joined the meeting, left the meeting, or is sending a message to
   other participants.

   There is no explicit tag for the 'join' and 'leave' options, so these
   will be tagged on-the-wire by the parameters' names, 'join' and
   'leave' respectively.  Conversely, an explicit tag for the 'message'
   parameter is specified, and hence the message option will be tagged
   by 'msg' on-the-wire.

   The join parameter also has a referenced definition; the struct named
   Join.  For the purposes of this example, when a person joins a
   meeting, all the other participants are informed of their name.  The
   name member in the struct is a UTF-8 encoded Unicode string that has
   a minimum length of 0 characters and a maximum length of 63
   characters.  Hence an example of the join parameter encoded on the
   wire is:

       join = { name = "Alice" }

   Here, the braces delimit the extent of the members in the struct and
   the double quotes delimit the characters representing the name.
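
   A minimal Python sketch of this brace-delimited form is given below.
   The name 'encode_struct' is hypothetical, member values are assumed
   to be encoded already, and any escaping of characters within quoted
   strings is outside the scope of the sketch.

```python
def encode_struct(tag, members):
    """Encode a struct as 'tag = { name = value ... }'.

    'members' is a sequence of (tag, encoded value) pairs.
    """
    body = " ".join(f"{name} = {value}" for name, value in members)
    return f"{tag} = {{ {body} }}"


join_example = encode_struct("join", [("name", '"Alice"')])
```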

   The message option is also a referenced definition.  Conceptually, to
   send a message, the 'participant-id' is used to identify the sender,
   and the 'to-participants' field contains the participant ids of all
   the people to whom the message is being sent.  On-the-wire, the
   to-participants parameter will be tagged with 'to'.  Between 1 and
   127 (inclusive) instances of the to-participants parameter may appear
   in a message.  For efficiency, Lumas allows multiple occurrences of
   the same parameter to be represented as a comma separated list.
   Hence an example of the on-the-wire encoding of the to-participants
   parameter would be:

       to = 2, 5, 8, 58
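
   The comma separated form, together with the cardinality check
   implied by '[1..127]', can be sketched in Python as follows.  The
   function name 'encode_repeated' and its arguments are assumptions
   made for this example.

```python
def encode_repeated(tag, values, min_count, max_count):
    """Encode repeated instances of one parameter as a comma list.

    Raises ValueError if the number of instances falls outside the
    cardinality given in the message definition.
    """
    if not min_count <= len(values) <= max_count:
        raise ValueError("cardinality constraint violated")
    return f"{tag} = " + ", ".join(str(value) for value in values)


to_example = encode_repeated("to", [2, 5, 8, 58], 1, 127)
```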

   Also, the message itself is included.  The message will consist of
   Unicode characters and can be between 1 and 255 Unicode characters
   long.  On-the-wire, the message parameter will have the tag 'msg'.
   An example of the on-the-wire format is thus:

       msg = "Where are we going for dinner"

   The priority field within the message struct has been added in a
   later version of the protocol.  This is indicated by the square
   brackets in which the parameter is wrapped.  Similarly, font-name,
   and the associated parameters have, according to the comment, been
   added in version 5 of the protocol.  The type of the 'priority'
   parameter is defined in an external module that has the alias 'tkwg'.
   The 'import' directive at the beginning of the example indicates that
   the 'tkwg' alias corresponds to the module
   'com.tech-know-ware.general', and it is in this module that the
   definition of 'Priority' is located.  The definition indicates that
   'font-name' is an ASCII string.  The reader should already understand
   enough of the definition language to understand the meaning of the
   other fields.

   Returning to the 'my-example' root, a third-party has added an
   extension to the protocol in the form of the 'my-addition' parameter.
   It is identified as not being part of the base specification by the
   keyword 'plugin'.  On-the-wire, the additional parameter will be
   identified by the tag 'new.tech-know-ware.com' to differentiate it
   from additions that may be made by other third parties.

   In summary, the following are complete examples of the default
   on-the-wire representation of the example message definition:

      12  
      join = { name = "Alice" }  
      new.tech-know-ware.com  =  { True }

   and:

      12
      msg = { to = 2, 5, 8, 58  
            msg = "Where are we going for dinner"
            font = 'Arial' }  

   and:

      12  
      leave  

   Note that the placing of each parameter on a separate line is not
   significant.  Lumas is free form with respect to white space.  Hence,
   the first message above could equally be represented as:

       12 join={name="Alice"} new.tech-know-ware.com={True}
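
   The insignificance of white space can be illustrated with a small
   Python tokenizer.  The token classes used here are assumptions made
   for illustration and are not taken from the formal on-the-wire
   representation; white space inside a quoted string is kept as part
   of the string.

```python
import re

# Assumed token classes, for illustration only.
TOKEN_RE = re.compile(
    r"""\s*(?:
        (?P<string>"[^"]*"|'[^']*')   # quoted string
      | (?P<punct>[={},])             # structural characters
      | (?P<word>[^\s={},"']+)        # tags and unquoted values
    )""",
    re.VERBOSE,
)


def tokenize(text):
    """Split an encoded message into tokens, skipping white space."""
    text = text.rstrip()
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at offset {pos}")
        tokens.append(match.group(match.lastgroup))
        pos = match.end()
    return tokens


multi_line = (
    '12\n'
    'join = { name = "Alice" }\n'
    'new.tech-know-ware.com  =  { True }\n'
)
single_line = '12 join={name="Alice"} new.tech-know-ware.com={True}'
same_tokens = tokenize(multi_line) == tokenize(single_line)
```

   Both layouts produce the same token sequence, so 'same_tokens' is
   true.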

6.    Formal Message Definition Syntax

   The sections below describe the Lumas message definition syntax.  The
   'top-level' production is 'lumas-definition', which is defined in
   6.19, "The Top Level Definition".  The following sections define the
   components of the message definition language building up to the
   top-level production.

   The Lumas syntax is defined using ABNF [ABNF].  

6.1   Lumas Keywords

   Lumas keywords are case-sensitive.  Therefore "AS" cannot be used in
   place of "as".  As ABNF literal strings are case-insensitive, this
   section defines the Lumas keywords in a case-sensitive way.

      as-kw          = %x61.73                   ; as in lowercase
      ascii-kw       = %x61.73.63.69.69          ; ascii in lowercase
      b              = %x62 
      bool-kw        = %x62.6F.6F.6C             ; bool in lowercase
      bytes-kw       = %x62.79.74.65.73          ; bytes in lowercase
      combi-kw       = %x63.6F.6D.62.69          ; combi in lowercase
      const-kw       = %x63.6F.6E.73.74          ; const in lowercase
      d-upper        = %x44                      ; Uppercase D
      d              = %x64 
      date-kw        = %x64.61.74.65             ; date in lowercase
      double-kw      = %x64.6F.75.62.6C.65       ; double in lowercase
      embedded-kw    = %x65.6D.62.65.64.64.65.64 ; embedded in lowercase
      endmodule-kw   = %x65.6E.64.6D.6F.64.75.6C.65 
                                         ; endmodule in lowercase
      extends-kw     = %x65.78.74.65.6E.64.73    ; extends in lowercase
      f              = %x66 
      float-kw       = %x66.6C.6F.61.74          ; float in lowercase
      import-kw      = %x69.6D.70.6F.72.74       ; import in lowercase
      int-kw         = %x69.6E.74                ; int in lowercase
      into-kw        = %x69.6E.74.6F             ; into in lowercase
      ipv4-kw        = %x69.70.76.34             ; ipv4 in lowercase
      ipv6-kw        = %x69.70.76.36             ; ipv6 in lowercase
      lumas-kw       = %x6C.75.6D.61.73          ; lumas in lowercase
      module-kw      = %x6D.6F.64.75.6C.65       ; module in lowercase
      n              = %x6E 
      oid-kw         = %x6F.69.64                ; oid in lowercase
      plug-kw        = %x70.6C.75.67             ; plug in lowercase
      pluggable-kw   = %x70.6C.75.67.67.61.62.6C.65 
                                         ; pluggable in lowercase
      plugin-kw      = %x70.6C.75.67.69.6E       ; plugin in lowercase
      r              = %x72 
      s-upper        = %x53                      ; Uppercase S
      s              = %x73 
      single-kw      = %x73.69.6E.67.6C.65       ; single in lowercase
      struct-kw      = %x73.74.72.75.63.74       ; struct in lowercase
      t              = %x74 
      time-kw        = %x74.69.6D.65             ; time in lowercase
      unicode-kw     = %x75.6E.69.63.6F.64.65    ; unicode in lowercase
      union-kw       = %x75.6E.69.6F.6E          ; union in lowercase
      unquoted-ascii-kw = %x75.6E.71.75.6F.74.65.64.2D.61.73.63.69.69 
                                         ; unquoted-ascii in lowercase
      void-kw        = %x76.6F.69.64             ; void in lowercase
      w              = %x77 
      w-upper        = %x57                      ; Uppercase W
      x              = %x78 
      z              = %x7A 
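
   The hexadecimal terminals above can be checked mechanically.  The
   following Python sketch decodes a '%x' terminal into its characters
   and compares it case-sensitively with a candidate keyword.  The
   function names and the small 'KEYWORDS' table are assumptions made
   for this example; the table copies only a few of the rules above.

```python
def abnf_hex_to_text(terminal):
    """Decode an ABNF terminal such as '%x61.73' into its characters."""
    if not terminal.startswith("%x"):
        raise ValueError("expected a %x terminal")
    codes = terminal[2:].split(".")
    return "".join(chr(int(code, 16)) for code in codes)


# A few of the keyword rules, copied from the list above.
KEYWORDS = {
    "as-kw": "%x61.73",
    "struct-kw": "%x73.74.72.75.63.74",
    "union-kw": "%x75.6E.69.6F.6E",
    "unquoted-ascii-kw": "%x75.6E.71.75.6F.74.65.64.2D.61.73.63.69.69",
}


def is_keyword(rule, candidate):
    """Case-sensitive comparison, as Lumas keywords require."""
    return abnf_hex_to_text(KEYWORDS[rule]) == candidate


lower_matches = is_keyword("as-kw", "as")
upper_matches = is_keyword("as-kw", "AS")
```

   With these definitions 'lower_matches' is true and 'upper_matches'
   is false, reflecting that "AS" is not accepted in place of "as".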

6.2   Lumas Parameters

   The main building block of a Lumas message definition is the
   parameter.  There are three classes of parameter in Lumas: simple
   parameters, compound parameters and reference parameters, which are
   defined as: 

      lumas-parameter  =  simple-param / compound-param /
                          reference-param

   A simple parameter typically describes a simple value such as a
   string, integer or date.  It may represent, for example, a name, a
   temperature or a birthday.

   Compound parameters are collections of simple parameters and other
   compound parameters, similar to how Java and C++ classes group
   together simple variables and other classes.

   Reference parameters allow a parameter to be defined in terms of a
   type (either simple, compound or reference) that is defined elsewhere
   in the message definition.

6.3   Simple Parameters

   The ABNF definition of a simple parameter is:

      simple-param = simple-type WS name [ OWS cardinality ] 
                                         [ WS as-kw WS explicit-tag ]
                                         [ WS plugin-kw ] OWS ";" OWS

   where 'WS' represents white space, and 'OWS' represents optional
   white space.  ('WS' and 'OWS' are defined in Section 8 - 'Common ABNF
   Definitions'.  Generally, comments can be included wherever white
   space is allowed.)

   As can be seen, the main parts of the definition of a simple
   parameter are the simple type and the name.  Additional specification
   allows further control of the message contents.  These fields are
   discussed below.

6.4   The Simple Types

   Simple parameters have simple types such as integers, booleans etc.
   Each of Lumas' simple types is listed and described below.  How
   these simple types are specified in a message definition is described
   in the following section.

   The Lumas simple types are:

      void

         A parameter that has no value.  This is most useful in unions
         (wherein it converts a union into an enumerated type), and can
         also be used in a struct to represent boolean events wherein
         the absence of the parameter indicates false, and the presence
         of the parameter indicates true.  It is more useful than you
         might at first think!

      bool

         A Boolean value.  Can be true or false.

      int

         An integer value.

      float

         A floating point value.  The constraints of a float specify the
         float to be either in accordance with a single precision value
         or a double precision value as specified in IEEE 754 [IEEE754].
         The absence of a constraint indicates a single precision value.

      ipv4

         Represents an IPv4 address, but not the port.

      ipv6

         Represents an IPv6 address, but not the port.

      date

         Date according to the Gregorian calendar, with year, month and
         day of month.  Other calendar types may be constructed from
         primitive types if required.

      time

         Represents the time in hours, minutes and seconds using the 24
         hour clock notation.  By default the time MUST be adjusted to
         UTC, unless the time can be guaranteed to have only local
         significance.

      oid

         This is an ASN.1 style Object Identifier.  This is primarily
         included to enable identification of security protocols.

      ascii

         A string made up of ASCII characters, limited to the values 0
         to 127.

      unquoted-ascii

         An ascii string usually has quote marks around it.  This type
         does not have quotes around it.  Consequently it can not have
         any white space, or include any special characters (such as
         "=", ")", and "}") that would confuse the parser.

      unicode

         A string representing Unicode characters.

      const

         This type allows a constant value to be inserted into the
         encoded message.  It will typically be untagged.  One thing it
         might be used for is identifying the protocol of the message
         definition.  For example:

            const <HTTP>   protocol as ?;

      bytes

         An array of bytes.  Also useful for carriage of opaque data.

      embedded

         The value is an embedded Lumas message.  This allows layering
         of message definitions.

6.5   Simple Type Definition

   Lumas simple types are specified in a Lumas message as described in
   this section.  The 'simple-type' construct represents the type of the
   parameter.  It has the following form:

      simple-type = void-kw / bool-kw / integer-type / float-type / 
                    ipv4-kw / ipv6-kw / date-kw / time-kw / oid-kw / 
                    string-type / const-type / bytes-type /
                    embedded-type

   As can be seen, many of the types are specified using a single
   keyword.  Other types such as integers and strings allow the
   specification of additional constraints (such as the maximum value
   that an integer is allowed to have).  The definitions of these types
   are as follows:

      integer-type  =  int-kw OWS "<" OWS int-constraint OWS ">"

      float-type  =  float-kw OWS [ "<" OWS float-constraint OWS ">" ]

      string-type  =  ( ascii-kw / unquoted-ascii-kw / unicode-kw ) 
                      [ OWS "<" OWS string-constraint OWS ">" ]

      const-type = const-kw OWS "<" first-safe-char *( safe-char ) ">"
                       ; See the section 'Notes on Comments' below

      bytes-type = bytes-kw [ OWS "<" OWS length-constraint OWS ">" ]

      embedded-type = embedded-kw [OWS "<" OWS embed-constraint OWS ">"]

   The constraints for the numerical types are specified as follows:

      int-constraint = min-int-constraint OWS ".." OWS max-int-constraint 
                             [ OWS use-leading-zero-marker ]
      min-int-constraint  =  ["-"] pos-number
      max-int-constraint  =  ["-"] pos-number
      use-leading-zero-marker = z   ; lower case z

      float-constraint = single-kw / double-kw 

   The constraints for the string, const, bytes and embedded types are
   as follows:

      string-constraint = [ length-constraint ] [ OWS pattern-constraint ]
      embed-constraint = [ length-constraint ] 
                                        [ OWS embedded-module-constraint ]
      embedded-module-constraint = "(" OWS module-name OWS ")"

      length-constraint = 
                [ min-len-constraint OWS ".." OWS ] max-len-constraint
      min-len-constraint      =  pos-number
      max-len-constraint      =  pos-number  /  unlimited-length-token
      unlimited-length-token  =  "*"

   These constraints use the following definition:

      pos-number = 1*DIGIT         ; Decimal number
                   / "0"x 1*HEXDIG ; Hex number
                   / 1*DIGIT b     ; Specifies number of binary bits

   In the case of 'integer-type', the mandatory constraint specifies the
   minimum and maximum permissible values that the integer can take.  If
   the 'use-leading-zero-marker' character ('z') is included in the
   constraint, then where necessary the integer MUST be represented on
   the wire with leading zeros to make the value fixed width.  (This is
   primarily applicable to combined types.)

   The 'pos-number' construct used to specify the integer value
   constraint has a form that can specify the number of binary bits.
   The number of bits specified does not include any sign bits.  Hence
   an unsigned 32 bit number can be represented as 0..32b, whereas a
   signed 32 bit number can be represented as -31b..31b (although this
   will actually exclude the most negative value of a signed 32 bit
   number).
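
   The bit-count form of 'pos-number' can be illustrated with a short
   Python sketch.  It is not part of Lumas: the function names are
   hypothetical, and reading 'Nb' as the largest value representable in
   N binary bits is inferred from the 32 bit examples above.

```python
import re

# Hypothetical helpers; only the three 'pos-number' forms come from the text.
_POS_NUMBER = re.compile(
    r"(?:0x(?P<hex>[0-9A-Fa-f]+)|(?P<bits>[0-9]+)b|(?P<dec>[0-9]+))\Z")


def pos_number_value(text):
    """Return the magnitude denoted by a 'pos-number' token."""
    m = _POS_NUMBER.match(text)
    if m is None:
        raise ValueError("not a pos-number: %r" % text)
    if m.group("hex") is not None:
        return int(m.group("hex"), 16)
    if m.group("bits") is not None:
        # Assumption: 'Nb' means the largest value that fits in N bits.
        return (1 << int(m.group("bits"))) - 1
    return int(m.group("dec"))


def int_range(constraint):
    """Convert e.g. '-31b..31b' to (minimum, maximum, leading_zeros)."""
    low, high = [part.strip() for part in constraint.split("..")]
    leading_zeros = high.endswith("z")   # the use-leading-zero-marker
    if leading_zeros:
        high = high[:-1].strip()

    def signed(token):
        if token.startswith("-"):
            return -pos_number_value(token[1:])
        return pos_number_value(token)

    return signed(low), signed(high), leading_zeros
```

   Under these assumptions, int_range("-31b..31b") yields a minimum of
   -(2^31 - 1), which shows why the most negative value of a signed 32
   bit number is excluded.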

   A float is either a single precision IEEE 754 number or a double
   precision IEEE 754 number [IEEE754].  The absence of a constraint
   indicates single precision.  (Developers are advised that in a number
   of cases a binary IEEE 754 number can not be exactly represented in a
   text-based base 10 format.  Hence the decoder's binary representation
   of a floating-point number may differ from the encoder's binary
   representation of the number.  If such discrepancies are not
   acceptable, developers should use an alternative representation for
   floating-point numbers.)

   In the case of 'string-type', the optional constraint specifies the
   minimum and maximum number of characters that are allowed to be
   represented in a valid encoding and optionally a valid pattern of
   characters.  The minimum and maximum character constraint specifies
   the minimum and maximum number of characters at the application
   level, not the actual number of characters that are used to represent
   the application level characters on the wire.  The format of the
   pattern constraint is designed to simplify regular expression
   evaluation by preventing the need for the trial and error type
   processing of general regular expressions.  Thus, in accordance with
   Lumas' 80/20 principle, valid patterns MUST NOT require the regular
   expression evaluator to do backtracking.  The pattern constraint is
   described further in Section 6.6.

   In the case of 'bytes-type', the optional constraint specifies the
   minimum and maximum number of bytes that are allowed to be
   represented in a valid encoding.  The constraint specifies the
   minimum and maximum number of bytes at the application level, not the
   number of characters that are used to encode those bytes on the wire.

   The optional constraint in 'embedded-type' MAY specify the permitted
   length of the embedded message and/or the Lumas module name of the
   message that is to be embedded.  For example:

            embedded<(com.tech-know-ware.scp)>   embedded-scp;

   In the constraint syntax, a maximum value '*' means infinite or
   unbounded.

6.6   The Pattern Constraint

   The pattern-constraint has the following form:

      pattern-constraint = "/" sub-pattern *( "|" sub-pattern ) "/"
      sub-pattern = *pattern-element
      pattern-element = pattern-char [ quantifier ]
      pattern-char = %x20-29 / %x2C-2E / %x30-3E / %x40-5A
                         / %x5D-7A / %x7D-FF  ;not \/|[?*+{
                     / escaped-char / special-char / character-class
      escaped-char = "\\"     ; Matches \
                   / "\/"     ; Matches /
                   / "\|"     ; Matches |
                   / "\["     ; Matches [
                   / "\?"     ; Matches ?
                   / "\*"     ; Matches *
                   / "\+"     ; Matches +
                   / "\{"     ; Matches {
                   / "\."     ; Matches .
      special-char = "\" r    ; Matches the return character
                   / "\" n    ; Matches the new line character
                   / "\" t    ; Matches the tab character
                   / "\" f    ; Matches the form feed character
                   / "\" s    ; Matches white space [ \t\r\n\f]
                   / "\" d    ; Matches any digit [0-9]
                   / "\" w    ; Matches any word character [a-zA-Z_0-9]
                   / "\" s-upper ; \S Matches anything not matched by \s
                   / "\" d-upper ; \D Matches anything not matched by \d
                   / "\" w-upper ; \W Matches anything not matched by \w
                   / "."      ; Matches any character

      character-class = matching-character-class / inverse-character-class
      matching-character-class = "[" *(class-char / class-range) "]"
                   ; For a successful match, the character in the string 
                   ; being matched must be one of the characters 
                   ; specified in the matching-character-class.
      inverse-character-class = "[^" *(class-char / class-range) "]"
                   ; For a successful match, the character in the string 
                   ; being matched must NOT be one of the characters 
                   ; specified in the inverse-character-class.

      class-char = class-single-char / class-escaped-char 
                   / escaped-char / special-char 
      class-single-char = %x20-2C / %x2E-5B / %x5E-FF ; not - ] \
      class-escaped-char = 
                   "\-"       ; Matches -
                   / "\]"     ; Matches ]
                   ; /|[?*+{. need not be escaped within character-class
      class-range = first-range-char "-" last-range-char
                   ; The class-range matches all characters that have 
                   ; an ASCII value greater than or equal to that of 
                   ; first-range-char and less than or equal to 
                   ; last-range-char.
      first-range-char = class-single-char / class-escaped-char 
                   / escaped-char
      last-range-char = class-single-char / class-escaped-char 
                   / escaped-char

      quantifier = "?" / "*" / "+" 
                   / "{" quant-min-occurs [ "," [ quant-max-occurs ] ] "}"
                   ; The absence of a quantifier indicates once and only 
                   ; once
      quant-min-occurs = 1*DIGIT
      quant-max-occurs = 1*DIGIT

   The 'pattern-constraint' allows a number of 'sub-pattern's to be
   defined, any one of which may match the string value.  In each
   'sub-pattern' there are no grouping or alternation constructs.  This
   removes the need for backtracking and is suitable for 80% (or more)
   of applications.  

   The pattern matching uses a "greedy" match.  Each 'sub-pattern' can
   be viewed as a concatenation of 'pattern-element's.  

   Each 'pattern-element' is a 'pattern-char' and an optional
   'quantifier'.  The 'pattern-char' may actually match multiple
   characters.  The 'quantifier' indicates how many times the associated
   'pattern-char' may appear in a valid pattern.  If the 'quantifier' is
   '?', the 'pattern-char' may appear 0 or 1 times.  If the 'quantifier'
   is '*', the 'pattern-char' may appear 0 or more times.  If the
   'quantifier' is '+', the 'pattern-char' may appear 1 or more times.
   If the quantifier is of the form '{n,m}', the 'pattern-char' may
   appear a minimum of n times, and a maximum of m times.  If the
   quantifier is of the form '{n}', the 'pattern-char' must appear
   exactly n times.  If the quantifier is of the form '{n,}', the
   'pattern-char' may appear n or more times. 

   To ensure that a string is in a suitable form to represent the value,
   the application, subject to the quantifier of a pattern-element,
   MUST, starting with the first character, keep matching successive
   characters of the string with the first pattern-element until the
   match fails.  The application MUST then try to match the unmatched
   character of the string along with subsequent characters in the
   string with the next pattern-element, again taking into account the
   quantifier for that pattern-element.  If a pattern-element has a
   quantifier that allows zero matches, then if the unmatched character
   of the previous pattern-element does not match the current
   pattern-element, the application should attempt to match the
   unmatched character against the next pattern-element, and so on.  The
   process is repeated until the whole string is matched, or the
   application is unable to match the current string character with an
   appropriate pattern-element.  If the application is unable to match
   the current input character with an appropriate pattern-element, the
   whole sub-pattern match is deemed to have failed.  The application
   MUST NOT backtrack to a previous pattern-element in order to attempt
   to find a match.  This process is repeated for each of the
   sub-patterns until one of the sub-patterns matches the string, or all
   sub-patterns fail to match the string.  The message MUST NOT be
   encoded if none of the patterns matches the string.
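
   The matching procedure above can be sketched in Python as follows.
   This is an illustration only, not a normative algorithm.  It assumes
   that each sub-pattern has already been parsed into a list of
   (predicate, minimum, maximum) tuples, one per pattern-element; that
   representation and all of the names used are hypothetical.

```python
def match_sub_pattern(elements, text):
    """Greedy, non-backtracking match of one sub-pattern.

    'elements' holds one (predicate, min_count, max_count) tuple per
    pattern-element; a max_count of None means unbounded.
    """
    pos = 0
    for predicate, min_count, max_count in elements:
        count = 0
        # Consume as many characters as this element allows (greedy).
        while (pos < len(text)
               and (max_count is None or count < max_count)
               and predicate(text[pos])):
            pos += 1
            count += 1
        if count < min_count:
            return False      # earlier elements are never revisited
    return pos == len(text)   # the whole string must be consumed


def match_pattern(sub_patterns, text):
    """A pattern matches if any one of its sub-patterns matches."""
    return any(match_sub_pattern(e, text) for e in sub_patterns)


def is_digit(c):
    return "0" <= c <= "9"


def lit(ch):
    return lambda c: c == ch


# /\d{4}-\d{2}-\d{2}T\d+:\d+:\d+Z/ in the pre-parsed form.
DATE_TIME = [
    (is_digit, 4, 4), (lit("-"), 1, 1), (is_digit, 2, 2), (lit("-"), 1, 1),
    (is_digit, 2, 2), (lit("T"), 1, 1), (is_digit, 1, None), (lit(":"), 1, 1),
    (is_digit, 1, None), (lit(":"), 1, 1), (is_digit, 1, None), (lit("Z"), 1, 1),
]

# /\d*\d/ : the greedy \d* leaves nothing for the final \d to match.
GREEDY_TRAP = [(is_digit, 0, None), (is_digit, 1, 1)]
```

   With this sketch, match_pattern([DATE_TIME], "2003-03-03T12:45:32Z")
   succeeds, whereas match_pattern([GREEDY_TRAP], "123") fails even
   though a backtracking evaluator would accept it.  Pattern authors
   therefore need to avoid placing an element directly after a
   repeated element that can match the same characters.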

   Example patterns include /\d{4} \d{4} \d{4} \d{4}/ for a (UK) credit
   card number, or /\d{4}-\d{2}-\d{2}T\d+:\d+:\d+Z/ for a date & time
   matching the form 2003-03-03T12:45:32Z.  The pattern

      / ?\d+| ?\d+\.\d+| ?\d+\.\d+[eE][+\-]?\d+/

   matches a floating point number that can be represented as either an
   integer, a decimal without exponent, or full 'scientific' format.
   This pattern illustrates some of the impact of not allowing pattern
   groupings.

   For more information on regular expressions, see [PERL].

6.7   The Name

   Referring back to the simple-param definition, 'name' is the name of
   the parameter.  It has the format:

      name  =  ALPHA  *(  ALPHA / DIGIT  /  "-"  /  "_"  )

   If there is no explicitly defined tag, then, in the case of character
   based protocols, the name is also used as the parameter's tag
   on-the-wire.  In this case, the name MUST NOT exceed 63 characters in
   length.  See Section 6.9 for more on tagging.

6.8   Cardinality

   The cardinality of a parameter specifies how many times a particular
   parameter can appear in a message.  The format mirrors a C-like array
   specification, but uses UML style ranges rather than the single
   values used in C.  If the cardinality field is absent, then one and
   only one instance of the parameter must occur in a valid message.  

   The format of the cardinality specification is:

      cardinality = "[" ( cardinality-range / "?" / "*" / "+" ) "]"
                      ; [?] short hand for [0..1]
                      ; [*] short hand for [0..*]
                      ; [+] short hand for [1..*]
      cardinality-range = [ min-occurrences ".." ] max-occurrences
      min-occurrences  =  1*DIGIT
      max-occurrences  =  1*DIGIT / unbounded-token
      unbounded-token  =  "*"

   Once again, the '*' in max-occurrences represents infinite or
   unbounded.  If in the 'cardinality-range' only 'max-occurrences' is
   present and it has a numerical value, the containing struct MUST have
   exactly 'max-occurrences' instances of the parameter.  

   Example cardinalities are as follows:

      [0..1]      ; Zero or one time

      [?]         ; Short hand for zero or one time

      [0..*]      ; Zero or more times

      [*]         ; Same as above, zero or more times

      [1..*]      ; One or more times

      [+]         ; Same as above, one or more times

      [2..*]      ; Two or more times

      [5]         ; Exactly five times
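
   As an informal illustration, the cardinality forms above can be
   reduced to a (minimum, maximum) pair with a few lines of Python.
   The function name and the use of None for "unbounded" are
   assumptions of this sketch rather than anything defined by Lumas.

```python
import re

_SHORTHAND = {"?": (0, 1), "*": (0, None), "+": (1, None)}
_RANGE = re.compile(r"(?:([0-9]+)\.\.)?([0-9]+|\*)\Z")


def parse_cardinality(text):
    """Return (min_occurs, max_occurs); max_occurs of None is unbounded."""
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError("cardinality must be enclosed in square brackets")
    body = text[1:-1]
    if body in _SHORTHAND:
        return _SHORTHAND[body]
    m = _RANGE.match(body)
    if m is None:
        raise ValueError("bad cardinality: %r" % text)
    low, high = m.groups()
    maximum = None if high == "*" else int(high)
    if low is None:
        # A lone number such as [5] means exactly that many occurrences.
        return (maximum, maximum)
    return (int(low), maximum)
```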

6.9   Tagging

   A parameter can have a tag associated with it.  A tag is a fixed
   sequence of characters used on the wire to enable a parser to
   identify the value or values that it is associated with.  

   By default, the name of the parameter is used as the tag.  If the
   name of the parameter is used as the tag, the name MUST NOT exceed 63
   characters in length.  

   Alternatively an explicit tag can be specified.  It can be any
   sequence of characters that do not have special significance to the
   parser.  To facilitate buffer management, an explicit tag MUST NOT
   exceed 63 characters in length.  If the tag definition begins with a
   "?", the "?" is discarded.  Thus to specify that "?" should be used
   as the tag on-the-wire, 'explicit-tag' should be specified as "??".

      explicit-tag = [ "?" ] tag  ; tag defined in common definitions

   In certain constructs a parameter may also be untagged.  This is
   discussed in the relevant sections below.
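
   The tag selection rules can be summarised by the following Python
   sketch.  It is illustrative only: the function name is hypothetical,
   and the sketch assumes that the 63 character limit applies after any
   leading "?" has been discarded and that an explicit tag of just "?"
   leaves the parameter untagged, as in the 'as ?' examples.

```python
MAX_TAG_LENGTH = 63


def wire_tag(name, explicit_tag=None):
    """Return the tag used on the wire, or None if the parameter is untagged."""
    if explicit_tag is None:
        tag = name                  # default: the name doubles as the tag
    elif explicit_tag.startswith("?"):
        tag = explicit_tag[1:]      # the leading "?" is discarded
        if tag == "":
            return None             # 'as ?' leaves the parameter untagged
    else:
        tag = explicit_tag
    if len(tag) > MAX_TAG_LENGTH:
        raise ValueError("tag exceeds %d characters" % MAX_TAG_LENGTH)
    return tag
```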

6.10  The Plugin Extension Mechanism

   Marking a parameter as 'plugin' indicates to the developer and the
   tools that this parameter is (probably) not part of the original
   message definition.  For example, it might be a proprietary
   extension.  It also indicates that the parameter may not be present
   in all received messages.

   A parameter that is marked as 'plugin' MUST have an explicit-tag
   defined for it.  The explicit-tag MUST be constructed from a domain
   name [DOMAINS] owned by the entity defining the parameter, plus a
   sequence of characters that differentiate the explicit-tag from other
   explicit-tags defined by the defining entity.  The component parts of
   the explicit-tag are presented in the normal domain name order so
   that the most variable part of the string is at the beginning, thus
   improving parsing efficiency.

   An example explicit-tag for tech-know-ware.com might be:

      my-tag.tech-know-ware.com

6.11  Reference Parameters

   In a struct or union, it is also possible to reference types that are
   defined elsewhere.  The format of a 'reference-param' is:

      reference-param = reference-name WS name [ OWS cardinality ] 
                                         [ WS as-kw WS explicit-tag ]
                                         [ WS plugin-kw ] OWS ";" OWS
      reference-name = [ module-name "::" ] name

   Other forms of reference-parameter are defined in the sections below.

6.12  Compound Parameters

   The compound types are struct, union and combi.  For a struct,
   depending on the various parameters' cardinality specifications,
   any, all or none of the parameters that a struct groups together may
   appear in a valid encoding.  In the case of a union, only one of the
   parameters may be encoded in a valid instance.  The combi form is
   effectively a compact encoding of a struct, but is subject to a
   number of additional constraints, which are described below.

   The definition format of each of the compound parameters is similar
   to the simple parameters.  

   The 'compound-param' has the form:

      compound-param = struct-param / union-param / combined-param

6.13  Struct Parameters

   The definition of a 'struct-param' is:

      struct-param = struct-kw WS name [ OWS cardinality ] 
                                        [ WS as-kw WS explicit-tag ] 
                                        [ WS pluggable-kw ]
                                        [ WS plugin-kw ] 
                                WS "{" struct-body "}" OWS ";" OWS

   'Cardinality' and 'explicit-tag' have the same meaning as for the
   simple types.  The 'pluggable' keyword is defined in Section 6.17.

   The format of the 'struct-body' is:

      struct-body = *( untagged-lumas-parameter )
                    *( lumas-parameter ) 
                    *( struct-extension )

   The struct body starts with all the untagged parameters.  Untagged
   parameters may have a cardinality other than one.  Note that if an
   untagged parameter whose cardinality allows it to be absent is in
   fact absent from the encoded message, then all subsequent
   parameters, including tagged parameters, MUST also be absent.  Thus
   great care is recommended when defining a message syntax that allows
   for an untagged parameter to be absent.

   The tagged parameters follow the untagged parameters.  

   When the message definition is subsequently extended, an instance of
   the 'struct-extension' construct MUST be added to the end of the
   struct definition for each version in which the struct is extended.
   The 'struct-extension' construct wraps the added parameters within
   square brackets to indicate that they are added in a new version.
   This not only allows a developer to see what has been added in a new
   version, but also allows a parser to do the same.  This is important
   because a parser must always consider absence of the new parameters
   to be a valid encoding so that it can receive messages from entities
   that are using an earlier version of the protocol.  (To do this
   manually would dictate that all extension parameters would have to
   have a cardinality specification that included zero.  This would be
   tedious, potentially error prone, and loses some expressiveness.)
   During the extension process, all new parameters MUST be added onto
   the end of an existing construct, and the order of parameters MUST
   NOT be rearranged from one version to the next.  Note that
   'struct-extension' does not allow the specification of untagged
   parameters.

   All of the parameters in a struct body have a similar format to the
   parameters already defined, except that in some cases they may be
   untagged.  To make the ABNF definition accurate it is therefore
   necessary to repeat the above basic definitions with the appropriate
   tagging specifications.

   The definition of the untagged struct parameters is:

      untagged-lumas-parameter  =  untagged-simple-param  / 
                                      untagged-compound-param /
                                      untagged-reference-param

      untagged-simple-param = simple-type WS name [ OWS cardinality ] 
                                             WS as-kw WS "?" OWS ";" OWS

      untagged-compound-param = untagged-struct-param / 
                                     untagged-union-param /
                                     untagged-combined-param

      untagged-struct-param = 
                           struct-kw WS name [ OWS cardinality ] 
                                     WS as-kw WS "?"  
                                     [ WS pluggable-kw ]
                                     WS "{" struct-body "}" OWS ";" OWS

      untagged-union-param = union-kw WS name [ OWS cardinality ] 
                                     WS as-kw WS "?"
                                     [ WS pluggable-kw ]
                                     WS "{" union-body  "}" OWS ";" OWS

      untagged-combined-param = 
                              combi-kw WS name [ OWS cardinality ] 
                                     WS as-kw WS "?"
                                     WS "{" combined-body  "}" OWS ";" OWS

      untagged-reference-param = reference-name WS name [ OWS cardinality ] 
                                         OWS ";" OWS

   Note that the 'plugin' keyword is not applicable to untagged
   parameters.

   The tagged parameters have the basic parameter definition that was
   initially presented, i.e. lumas-parameter.

   The struct body extension fields have the format:

      struct-extension = "[" OWS 1*( lumas-parameter ) "]" OWS

6.14  Union Parameters

   A union parameter has the following definition:

      union-param = union-kw WS name [ OWS cardinality ] 
                                        [ WS as-kw WS explicit-tag ]
                                        [ WS pluggable-kw ]
                                        [ WS plugin-kw ]
                                WS "{" union-body "}" OWS ";" OWS

   'Cardinality' and 'explicit-tag' have the same meaning as for the
   simple types.  The 'pluggable' keyword is defined in Section 6.17.

   A union-body MAY have a single untagged integer parameter.  All other
   parameters MUST be tagged and have a cardinality of one and only one.
   Other than the cardinality constraints of a union, a union can be
   extended in the same way as a struct.

   The untagged integer parameter allows integers to be defined that
   have wild-carding options.  For example, a union might be defined as:

      union  select
      {
            int<0..65535>  numbered  as ?;
            void           any       as *;
      };
      

   Examples of the encoded form might be:

      select = 12

      select = *

   The parameters within a union are only allowed unary cardinality to
   avoid ambiguity in the on-the-wire encoding.  If multiple instances
   of a parameter must be included as an option in a union, it is
   necessary to wrap the parameters within a struct, using something
   similar to:

      struct X { X      x[1..*] as ?; };

   The definition of a union-body is as follows:

      union-body = [ integer-type WS name WS as-kw WS "?" OWS ";" OWS ]
                   *( singular-lumas-parameter ) 
                   *( union-extension )
      

   As mentioned previously, most of the parameters within a union are
   tagged and have a cardinality of one.  Their definition is:

      singular-lumas-parameter  =  singular-simple-param  / 
                                   singular-compound-param /
                                   singular-reference-param

      singular-simple-param = simple-type WS name 
                                        [ WS as-kw WS explicit-tag ] 
                                        [ WS plugin-kw ] OWS ";" OWS

      singular-compound-param = singular-struct-param / 
                                singular-union-param /
                                singular-combined-param

      singular-struct-param = struct-kw WS name [ WS as-kw WS explicit-tag ]
                                                [ WS pluggable-kw ]
                                                [ WS plugin-kw ] 
                                OWS "{" struct-body "}" OWS  ";" OWS

      singular-union-param = union-kw WS name [ WS as-kw WS explicit-tag ] 
                                              [ WS pluggable-kw ]
                                              [ WS plugin-kw ]
                                OWS "{" union-body "}" OWS ";" OWS

      singular-combined-param = combi-kw WS name 
                                             [ WS as-kw WS explicit-tag ] 
                                             [ WS plugin-kw ]
                                OWS "{" combined-body "}" OWS ";" OWS

      singular-reference-param = reference-name WS name 
                                         [ WS as-kw WS explicit-tag ]
                                         [ WS plugin-kw ] OWS ";" OWS

   The union extension operates in a similar fashion to that of a
   struct, but references singular-lumas-parameters.  Its definition is:

      union-extension = "[" OWS 1*( singular-lumas-parameter ) "]" OWS

6.15  Combined Parameters

   A combined parameter has the following definition:

      combined-param = combi-kw WS name [ OWS cardinality ] 
                                     [ WS as-kw WS explicit-tag ]
                                     [ WS plugin-kw ]
                                WS "{" combined-body "}" OWS ";" OWS

   The combined compound type provides a simple mechanism for defining
   new combined types similar to that used for date and time.  All the
   members of a combined type are encoded on the wire using their
   untagged form and concatenated together with no intervening white
   space.  The result of the encoding MUST meet all the constraints of
   an unquoted-ascii value.  In addition, the parameters that make up
   the combined type are subject to the following constraints:

      -     Each unquoted-ascii parameter that is part of a combined
            body MUST have a fixed number of characters,

      -     The first character of unquoted-ascii and const parameters
            MUST NOT be a digit,

      -     integer values MUST NOT be adjacent.

   The form of the combined body is:

      combined-body = *( combined-simple-type WS name ";" )

      combined-simple-type = integer-type / const-type / 
                       unquoted-ascii-kw OWS "<" 1*DIGIT ">"

   In many respects the combined type simply makes the encoded form look
   prettier, and anything that can be encoded with the combined type can
   also be represented with the struct type.  The combined type should
   also not be used for defining patterns of ASCII or Unicode
   characters.  Note also that a combined type is not pluggable and
   hence cannot be extended.  It is therefore recommended that the
   combined type be used sparingly.

   An example of a combined type is:

      combi protocol as ?
      {
          const <HTTP/> const1;
          int<0..99>    major-version;
          const <.>     const2;
          int<0..99>    minor-version;
      };

      Which might be encoded as: HTTP/1.1
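
   As an informal illustration (not part of Lumas), the following
   Python sketch concatenates the untagged members of a combined value
   and checks two of the constraints listed above.  The function name
   and the kind labels "int", "const" and "unquoted-ascii" are
   assumptions made for this example only.

```python
def encode_combined(members):
    """Concatenate (kind, value) members into one combined value.

    kind is "int", "const" or "unquoted-ascii"; these labels are
    hypothetical and exist only for this sketch.
    """
    previous_kind = None
    parts = []
    for kind, value in members:
        # Adjacent integers could not be told apart once concatenated.
        if kind == "int" and previous_kind == "int":
            raise ValueError("integer values MUST NOT be adjacent")
        text = str(value)
        # A leading digit would merge with a preceding integer.
        if kind != "int" and text[:1].isdigit():
            raise ValueError("const/unquoted-ascii MUST NOT start with a digit")
        parts.append(text)
        previous_kind = kind
    return "".join(parts)


protocol = [("const", "HTTP/"), ("int", 1), ("const", "."), ("int", 1)]
print(encode_combined(protocol))  # prints HTTP/1.1
```

   The fixed-width rule for unquoted-ascii members is not checked
   here, as it depends on the width declared in the definition.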

   Combined types also allow you to define numbers that contain decimal
   points.  An example of such is:

      union currency as ?
      {
            void dollars as US$;
            void pounds as GBP;
            void francs as FFr;
      };

      combi amount as ?
      {
          int<-31b..31b>   main-denomination;
          const <.>        const2;
          int<0..99z>      sub-denomination;
      };

      Which might be encoded as: US$ 100.05

6.16  Referenced Parameters

   It was mentioned previously that structs and unions can reference
   types that are defined elsewhere.  Referenced types do not have a
   cardinality specification, and do not specify an explicit tag.  This
   is because the cardinality and tagging of the type are defined in the
   item that does the referencing, rather than where the referenced type
   is defined.  (If a referenced type needs a cardinality other than
   one, it is recommended that the technique for giving a parameter
   within a union a non-unary cardinality be used.)  

   The definitions of the referenced types are:

      referenced-lumas-parameter  =  referenced-simple-param / 
                                   referenced-compound-param /
                                   referenced-reference-param

      referenced-simple-param = simple-type WS name OWS ";" OWS

      referenced-compound-param = referenced-struct-param / 
                                 referenced-union-param /
                                 referenced-combined-param

      referenced-struct-param = struct-kw WS name [ WS pluggable-kw ]
                                OWS "{" struct-body "}" OWS ";" OWS

      referenced-union-param = union-kw WS name [ WS pluggable-kw ]
                                OWS "{" union-body "}" OWS ";" OWS

      referenced-combined-param = combi-kw WS name
                                OWS "{" combined-body "}" OWS ";" OWS

      referenced-reference-param = reference-name WS name OWS ";" OWS

6.17  External Extensions - Plug and Pluggable

   A protocol may be extended via an external specification without
   directly modifying the original definition.  This may be to define a
   proprietary extension, or to define an external profile of the base
   protocol.  The specification for this type of extension is:

      external-extension = 
                       plug-kw WS
                           ( external-struct-extension / 
                             external-union-extension )
                       WS into-kw WS into-name
                             *( OWS COMMA OWS into-name ) OWS ";" OWS
      into-name = [ module-name "::" ] hierarchical-name
      hierarchical-name = *( name "." ) name

      external-struct-extension = 1*lumas-parameter
      external-union-extension = 1*singular-lumas-parameter
      

   This specifies a parameter that is to be plugged into an existing
   construct.  For example, if the following is defined:

      plug 
            ascii cookie as cookie.tech-know-ware.com;
      into my-example.my-addition;
      

   The resultant definition would be treated as if it were:

      struct  my-example
      {
            int <0..255>  participant-id  as  ?;
            Action        action  as  ?;
            struct        my-addition[0..1]
                                    as new.tech-know-ware.com plugin
            {
                  bool    tkw-app-capable  as  ?;
                  ascii   cookie as cookie.tech-know-ware.com plugin;
            };
      };

   The 'into-name' field indicates the name of the construct that the
   item is to be plugged into.  The optional 'module-name' part of the
   name specifies the name of the module that contains the parameter
   into which the extension is to be plugged.  The 'hierarchical-name'
   specifies the name of the parameter within the module that the
   extensions are to be plugged into.  The name is hierarchical because
   parameters can be locally defined within structs and unions.  The
   hierarchical name is made up of the names of the parameter's
   ancestors followed by the name of the parameter itself, joined
   together by the '.' character.  If the parameter to be extended is
   contained within another parameter, the first name is the name of the
   outer-most parameter that contains the parameter to be extended (i.e.
   one that is not contained within any other parameter), and the second
   name is the name of the next outer-most parameter that contains the
   parameter to be extended (if present), and so on until the parameter
   itself is named.  An illustration of the naming is shown in the
   example above.
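
   A minimal Python sketch of splitting an 'into-name' into its
   optional module part and its hierarchical parts is given below.  It
   assumes that individual names never contain '.' or '::'.  The
   function name is hypothetical, and the module-qualified name in the
   second call is made up for illustration.

```python
def split_into_name(into_name):
    """Split '[module::]outer.inner' into (module, [outer, inner])."""
    module, separator, path = into_name.partition("::")
    if not separator:
        # No module part was given; the whole text is the hierarchical name.
        module, path = None, into_name
    # Ancestors come first (outer-most first), the parameter itself last.
    return module, path.split(".")


print(split_into_name("my-example.my-addition"))
print(split_into_name("com.tech-know-ware.scp::my-example.my-addition"))
```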

   In a struct and union the 'pluggable' keyword is used to indicate
   that the construct is a location that the message designers have
   formally declared as extendible using the 'plug' mechanism.  Lumas
   compilers SHOULD emit warnings when extra material is plugged into
   locations that are not marked as pluggable, but MUST NOT consider it
   an error.  Combined types are not pluggable.

   If a party other than the original message designers uses the plug
   mechanism to define an extension, each added parameter MUST have an
   explicit-tag constructed according to the rules described in Section
   6.10.

6.18  Module Definition and Directives

   A single protocol may be defined in a number of message definition
   files.  This might be for the purpose of accessing predefined
   libraries, or specifying a definition that the current definition
   extends.  A message definition therefore begins with a set of
   optional directives expressing this information.  They have the form:

      lumas-directives =
            [ lumas-kw WS module-kw WS module-name OWS ";" OWS ]
            [ extends-kw WS module-name [ WS as-kw WS alias ] OWS ";" OWS ]
            *( import-kw WS module-name [ WS as-kw WS alias ] OWS ";" OWS )

      module-name = [ "+" ] name *( "." name )
      alias = name
      

   The 'module' directive specifies the name of the module.

   The 'extends' directive is used in a definition that contains an
   external extension.  The module-name in the extends specification
   indicates the message definition that is being extended.

   The 'import' statement indicates a library message definition that
   contains referenced types that are referenced within the message
   definition.  

   The 'module-name' is a hierarchical namespace that is based on the
   name of the protocol, combined with a domain name [DOMAINS] owned by
   the entity defining the protocol.  The parts of the module-name are
   combined together so that it looks like a regular domain name.  The
   order in which the domain levels are written is then reversed, so
   that the top-level domain becomes the first written domain, the
   second-level domain becomes the second written domain, and so on.
   For example, if a protocol called the Simple Conference Protocol
   (SCP) was defined by Tech-Know-Ware Ltd with a domain name of
   tech-know-ware.com, the module name might be:

      com.tech-know-ware.scp

   It is the responsibility of the entity owning the domain name to
   ensure that the module names it creates using its domain name are
   unique.
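
   The construction described above can be sketched in Python as
   follows.  The function name is an assumption, and the sketch takes
   the protocol name already in the lower-case form used in the
   example.

```python
def module_name(domain, protocol):
    """Build a module name from an owned domain and a protocol name."""
    # Combine so that it looks like a regular domain name ...
    labels = (protocol + "." + domain).split(".")
    # ... then reverse the order in which the levels are written.
    return ".".join(reversed(labels))


print(module_name("tech-know-ware.com", "scp"))  # prints com.tech-know-ware.scp
```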

   Lumas defines a number of pseudo top level domains for its own
   purposes.  These are currently as follows:

   +ietf A pseudo top level domain for the Internet Engineering Task
         Force.

   +iso  A pseudo top level domain for the International Organization
         for Standardization.  The sub-domains of this domain follow
         the structure of ISO defined Object Identifiers.  All spaces
         must be removed and numbers in brackets should be ignored when
         parsing this domain.  E.g. iso(1) member-body(2) us(840)
         rsadsi(113549) digestAlgorithm(2) 5 is represented as
         +iso(1).member-body(2).us(840).rsadsi(113549).digestAlgorithm(2).5
         and looked up as +iso.member-body.us.rsadsi.digestAlgorithm.5 .

   +itu  A pseudo top level domain for the International
         Telecommunication Union.  The sub-domains of this domain
         follow the structure of ITU defined Object Identifiers.
         Processing of such identifiers follows that defined for
         processing of ISO Object Identifiers.

   +lms  A pseudo top level domain for defining Lumas extensions and
         libraries. 

   +uuid A pseudo top level domain that uses Universally Unique
         Identifiers for identification.  An example is: 

            +uuid.4d36e96c-e325-11ce-bfc1-08002be10318

   National standards bodies such as ANSI and BSI are defined under
   their national top-level domain.
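
   The look-up rule given for the +iso and +itu domains (remove
   spaces, ignore the numbers in brackets) might be sketched in Python
   as shown below.  The function name is hypothetical.

```python
import re


def lookup_form(written):
    """Reduce a written +iso/+itu module name to its look-up form."""
    without_spaces = "".join(written.split())
    # Drop each bracketed number, e.g. "(840)"; bare arcs such as ".5" stay.
    return re.sub(r"\(\d+\)", "", without_spaces)


print(lookup_form(
    "+iso(1).member-body(2).us(840).rsadsi(113549).digestAlgorithm(2).5"))
```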

   The 'alias' part of the import and extends statements is used as an
   alias of the 'module-name', so that items within 'module-name' can be
   referenced in the abbreviated form of:

      alias::item

   For example, if a parameter definition called 'id' is contained in
   the module 'com.tech-know-ware.scp', and the following import
   statement is specified:

      import com.tech-know-ware.scp as tkwscp;

   Then 'id' can be referenced by:

      tkwscp::id
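
   As a rough sketch of how a tool might expand such an abbreviated
   reference, the Python below maps an alias back to its module name.
   The function name and the use of a dictionary to hold the aliases
   are assumptions of this example.

```python
def resolve(reference, aliases):
    """Expand 'alias::item' to 'module-name::item' when the alias is known."""
    prefix, separator, item = reference.partition("::")
    if separator and prefix in aliases:
        return aliases[prefix] + "::" + item
    # Unqualified, or qualified by something that is not a known alias.
    return reference


aliases = {"tkwscp": "com.tech-know-ware.scp"}
print(resolve("tkwscp::id", aliases))  # prints com.tech-know-ware.scp::id
```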

6.19  The Top Level Definition

   Finally, we are in a position to describe a complete Lumas message
   definition.  This is:

      lumas-definition  =  OWS lumas-directives
                              *external-extension
                              *referenced-lumas-parameter
                              [ OWS endmodule-kw OWS ";" ]
                              OWS

   The first parameter defined within the message definition is the root
   of the message definition tree, and is thus the outer-most construct
   of an encoded message.

   The end of a Lumas definition MAY be marked with the 'endmodule'
   keyword.  Marking the end of a module in this way allows multiple
   Lumas definitions to be included in a single file or document.  If
   the 'endmodule' keyword is not present, the definition ends at the
   end of the file or document.

6.20  Locating Lumas within a Specification

   It is not sufficient to use Lumas alone to define a protocol.
   Additional narrative is required to define the semantics of a
   protocol in addition to the syntax defined by Lumas.  Thus Lumas and
   narrative typically need to be combined in a single document.  The
   issue here is that at some point the Lumas must be extracted from the
   document to be useful.  If the Lumas is intermingled with the
   narrative, it can be manually removed using cut and paste; however,
   this is tedious and error-prone.  An alternative is to put all the
   Lumas in a separate section so that it can be easily extracted.
   However, this distances the Lumas specification from the narrative
   that explains it, which is undesirable.  A third option is to do
   both: interleave one copy of the Lumas with the narrative and
   provide a separate copy that can be used for compiling.  This
   approach makes it difficult to keep the two versions in step, and
   errors can easily creep in.

   Lumas compilers MUST implement a fourth option.  Before parsing a
   file, a compiler MUST first look for a line on which the first
   non-white space text is lumas*/ and on which only white space
   follows it.  If such a line is found, compilation starts at the
   following line.  Subsequent narrative is then enclosed in /* */
   comment marks.  If no such line is found, then compilation begins at
   the beginning of the file.

   For example, if any */ character sequences that follow this example
   are removed (which have been included to discuss how they are used
   and hence not properly matched), a Lumas compiler must be able to
   find and process the following Lumas syntax:

      lumas*/
      // The first 'official' line of Lumas
      struct top
      {
            not-much   not-much;
      };
      /*
      This is narrative.
      */
      int <0..1> not-much;
      /**

   For a fuller description of Lumas comments, see Section 9.
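
   The search for the start of the Lumas within a document could be
   sketched in Python as follows.  This is an illustration only; the
   function name is hypothetical and Python's own notion of white
   space is used when trimming each line.

```python
def lumas_source(text):
    """Return the part of a document that a compiler would parse."""
    lines = text.splitlines(keepends=True)
    for index, line in enumerate(lines):
        # The marker must be the only non-white space text on its line.
        if line.strip() == "lumas*/":
            return "".join(lines[index + 1:])
    # No marker line: compilation begins at the beginning of the file.
    return text


document = "Some narrative.\n   lumas*/  \nstruct top\n{\n};\n"
print(lumas_source(document))
```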

7.    On-the-Wire Representation

   This section describes the default character based on-the-wire
   encoding of Lumas messages.  Messages defined using the Lumas message
   definition language may be represented using other character encoding
   forms or even binary forms.

7.1   Principles of the default On-the-Wire Encoding

   The basic format of the default text based on-the-wire encoding is to
   use the format:

      tag  =  value

   The tag is a fixed sequence of characters that identifies the
   parameter with which a particular value (or values) is associated.
   For example, there may be multiple parameters that have integer
   values within a struct, that might specify, say, width and height.
   The tags are used to identify which integer value belongs to which
   parameter.

   If there are multiple instances of a parameter, then they may either
   be conveyed as multiple instances of the above construct, or as a
   comma separated list, as in:

      tag  =  value, value, value

   If a tag is explicitly specified in the message definition, then this
   is used on the wire.  If no tag is explicitly specified, then the
   name of the parameter is used as the tag.  

   Tagged items may appear in any order within a struct, and do not have
   to be in the same order as they are defined in the struct definition.

   It is also possible to specify that no tag should be used on the wire
   by specifying 'as ?'.  All untagged items MUST appear in a struct in
   the same order that they are defined in the message definition, and
   MUST appear before any tagged items within a struct definition.
   Untagged parameters that have greater than one instance MUST be
   constructed as a comma separated list.  Thus untagged values have the
   format:

      value

   or:

      value, value, value

   If an untagged parameter has a cardinality that allows it to be
   absent from an encoded message, then all subsequent parameters in the
   enclosing struct, including tagged parameters, MUST also be absent.
   Consequently, great care should be taken when defining a message
   definition that allows untagged parameters to be absent.

   For the examples quoted earlier, that is:

      ascii              rfc-name ;
      int <1..30000>     referenced-rfcs [0..255] as refers;

   The format on the wire would be something like (depending on the
   actual values in question):

      rfc-name = 'Lumas'  refers = 2234, 791, 2045
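
   The principles above can be illustrated with a small Python sketch
   that lays out already-encoded values as a struct body.  The
   function names and the use of None to stand for a parameter
   declared 'as ?' are assumptions of this example; void parameters
   are not covered.

```python
def encode_param(tag, values):
    """Return 'tag = v1, v2', or just 'v1, v2' for an untagged parameter."""
    joined = ", ".join(values)
    return joined if tag is None else f"{tag} = {joined}"


def encode_struct_body(params):
    """params: (tag-or-None, [encoded values]) pairs in definition order."""
    present = [p for p in params if p[1]]          # skip absent parameters
    untagged = [p for p in present if p[0] is None]
    tagged = [p for p in present if p[0] is not None]
    # Untagged items keep their order and precede all tagged items.
    return "  ".join(encode_param(t, v) for t, v in untagged + tagged)


print(encode_struct_body([
    ("rfc-name", ["'Lumas'"]),
    ("refers", ["2234", "791", "2045"]),
]))
```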

7.2   Formal On-the-Wire Representation

   The principal representation of a Lumas defined message on the wire
   is text based.  

   The top-level construct of a Lumas definition is a referenced type,
   which essentially has no tag associated with it.  (Indeed, the
   presence of such a tag would not convey any information.)  The
   top-level construct on the wire is therefore either a struct body, or
   a union body, as in:

      lumas-text-message  = (struct-body / union-body) OWS

   A struct body can contain untagged and tagged parameters.  All
   untagged parameters MUST appear before any tagged parameters.  The
   values of untagged parameters that have non-singular cardinality MUST
   be comma separated.  Tagged parameters that have non-singular
   cardinality may either have a tag followed by a comma separated list
   of values, have multiple instances of the "tag = value" form, or some
   combination of the two.  All parameters in a struct body are
   separated by white space, but white space is optional either before
   or after the struct body.  (This logical specification of where white
   space is used leads to an unfortunately complex ABNF definition for a
   struct body.) 

   The definition of a struct-body is therefore:

      struct-body = OWS (
                    struct-untagged-set
                    / struct-tagged-set
                    / (struct-untagged-set  WS  struct-tagged-set) )

      struct-untagged-param = value *( COMMA value )
      struct-untagged-set = struct-untagged-param *(WS struct-untagged-param)

      struct-tagged-param = tag              ; For a void parameter
                      / (tag EQUAL value *( COMMA  value ))
      struct-tagged-set = struct-tagged-param *(WS struct-tagged-param)

   Except for a single integer parameter that may be untagged, all items
   of a union body MUST be tagged.  Also, parameters must only have a
   cardinality of one in the encoding to avoid ambiguities in the
   encoded message.  Therefore a union body has the form:

      union-body =  OWS ( integer-value
                         / tag                   ; For a void parameter
                         / ( tag EQUAL value ) )

   The 'tag' production is defined in the common definitions section,
   Section 8.

   'value' has the following definition:

      value = simple-value / compound-value

      simple-value = bool-value / integer-value / float-value / 
                     ipv4-value / ipv6-value /   
                     date-value / time-value  / oid-value /
                     ascii-value / unquoted-ascii-value / unicode-value /
                     const-value / bytes-value / embedded-value

   Which in turn are defined as follows:

      bool-value = True-kw / False-kw / T / F

      integer-value = [ "-" ] 1*DIGIT

      float-value = float-number  
                    / NaN-kw       ; IEEE 754 Not a Number
                    / INF-kw       ; Positive infinity
                    / "-" INF-kw   ; Negative infinity
                    ; Note that "-0" is included in float-number
      float-number   = float-mantissa [ (e/E) float-exponent ]
      float-mantissa = ["-"] 1*DIGIT ["." 1*DIGIT]
      float-exponent = ["-"/"+"] 1*DIGIT

      True-kw        = %x54.72.75.65      ; 'True'
      False-kw       = %x46.61.6C.73.65   ; 'False'
      T              = %x54               ; 'T'
      F              = %x46               ; 'F'
      NaN-kw         = %x4E.61.4E         ; 'NaN'
      INF-kw         = %x49.4E.46         ; 'INF'
      E              = %x45               ; 'E'
      e              = %x65               ; 'e'

   The value encoding of a float is the base 10 representation of a base
   2 number.  There will typically be a degree of error introduced when
   the conversion is made.  Hence the float type should be looked upon
   as a convenient way to convey floating point information where bit
   level accuracy between the encoder's base 2 representation of the
   number and the decoder's base 2 representation of the number is not
   required.  If this is not acceptable, then implementers should seek
   other ways of presenting floating point numbers that do not suffer
   from this loss of accuracy.  

   The 'float-mantissa' part of the number is NOT restricted to the
   range 1.0 to 9.9.  
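
   For illustration, the 'float-value' production can be approximated
   by a Python regular expression.  This is a sketch transcribed from
   the ABNF above rather than a normative definition, and the names
   used are hypothetical.

```python
import re

# float-number, or one of the keywords NaN, INF and -INF.
FLOAT_VALUE = re.compile(r"-?\d+(\.\d+)?([eE][-+]?\d+)?|NaN|-?INF")


def is_float_value(text):
    """Report whether text matches the float-value production."""
    return FLOAT_VALUE.fullmatch(text) is not None


for sample in ["102.4519", "-0", "12.5E3", "-INF", ".5", "+1"]:
    print(sample, is_float_value(sample))
```

   Note that "12.5E3" is accepted because the mantissa is not
   restricted to the range 1.0 to 9.9, whereas ".5" and "+1" are
   rejected because the mantissa requires a leading digit and permits
   only a "-" sign.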

   An 'oid-value' is represented as:

      oid-value = 1*DIGIT *( "~" 1*DIGIT )    

   As can be seen, only the oid's numerical values are encoded.
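
   A short Python sketch of converting between a list of numerical
   arcs and the 'oid-value' form follows.  The function names are
   assumptions of this example.

```python
def encode_oid(arcs):
    """Join the numerical arcs of an OID with '~'."""
    return "~".join(str(arc) for arc in arcs)


def decode_oid(text):
    """Split an oid-value into its numerical arcs, rejecting other text."""
    parts = text.split("~")
    if not all(part.isascii() and part.isdigit() for part in parts):
        raise ValueError("not an oid-value: " + repr(text))
    return [int(part) for part in parts]


print(encode_oid([1, 2, 840, 113549, 2, 5]))  # prints 1~2~840~113549~2~5
```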

   The IP address values are:

      ipv4-value = 1*3DIGIT "." 1*3DIGIT "." 1*3DIGIT "." 1*3DIGIT

      ipv6-value = hexseq / hexseq "::" [ hexseq ] / "::" [ hexseq ]
      hexseq         =  hex4 *( ":" hex4)
      hex4           =  1*4HEXDIG

   Note that the IPv4 address within an IPv6 address format is not
   supported.

   Date and time parameters have fixed width to aid parsing.  As such
   the various fields have leading zeros if required.  (They adopt one
   of the ISO-8601 formats.)

   Dates are according to the Gregorian calendar.  Other calendar types
   may be constructed from other types if required.

   Unless the time can be guaranteed to have only local significance,
   the time MUST be converted to UTC prior to including it in a message.
   The time uses 24-hour clock notation.  The absence of the
   'time-seconds' field is interpreted as meaning seconds = 0.

      date-value = date-year "-" date-month "-" date-day-of-month
      date-year = 4DIGIT            ; e.g. 2002
      date-month = 2DIGIT           ; With leading zeros, 01 to 12
      date-day-of-month = 2DIGIT    ; With leading zeros, 01 to 31

      time-value = time-hours ":" time-minutes [ ":" time-seconds ]
      time-hours = 2DIGIT         ; With leading zeros, e.g. 00 to 23
      time-minutes = 2DIGIT       ; With leading zeros, e.g. 00 to 59
      time-seconds = 2DIGIT       ; With leading zeros, e.g. 00 to 59

      unquoted-ascii-value =  first-safe-char *( safe-char )
                       ; See the section 'Notes on Comments' below
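
   As an illustration of the 'date-value' and 'time-value' rules
   above, the following Python sketch converts a time to UTC and
   formats it with fixed-width fields.  It assumes a timezone-aware
   datetime is supplied; the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def encode_date_time(moment):
    """Return (date-value, time-value) for a timezone-aware datetime."""
    utc = moment.astimezone(timezone.utc)   # convert to UTC before encoding
    # strftime zero-pads month, day, hours, minutes and seconds.
    return utc.strftime("%Y-%m-%d"), utc.strftime("%H:%M:%S")


# 07:00 at five hours behind UTC is 12:00 UTC.
local = datetime(2002, 2, 28, 7, 0, 0, tzinfo=timezone(timedelta(hours=-5)))
print(encode_date_time(local))  # prints ('2002-02-28', '12:00:00')
```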

   The string types have the format:

      ascii-value = 
           "'" *( %x00-26 / %x28-5B / %x5D-7F / "\\" / "\'" ) "'"

      unicode-value = DQUOTE
                 *( %x00-21 / %x23-5B / %x5D-FF / "\\" / "\" DQUOTE ) 
                  DQUOTE
                             ; DQUOTE defined in [ABNF]

   For 'unicode-value', each Unicode character is represented on the
   wire using the UTF-8 transform [UTF8].

   The 'bytes-value' encodes binary data using the Base64 transform
   [BASE64], and is defined as:

      bytes-value = "[" OWSNC base64-line *( WSNC base64-line ) OWSNC "]"
      base64-line = 0*18( 4BASE64-CHAR ) 
                     ( 
                     ( 4BASE64-CHAR ) /
                     ( 3BASE64-CHAR "=" ) /
                     ( 2BASE64-CHAR "=" "=" )
                     )
      BASE64-CHAR = ALPHA / DIGIT / "+" / "/"

   The white space between base64-lines should include characters to
   move to a new line as specified in [BASE64].
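
   A Python sketch of producing a 'bytes-value' is shown below.  It
   relies on the observation that a base64-line holds at most 19
   groups of four characters, i.e. 76 characters.  The function name
   and the choice of CRLF between lines are assumptions of this
   example.

```python
import base64


def encode_bytes_value(data):
    """Encode binary data as '[ base64-line ... ]'."""
    if not data:
        raise ValueError("the ABNF requires at least one base64-line")
    text = base64.b64encode(data).decode("ascii")
    # 76 is a multiple of 4, so only the last line can carry '=' padding.
    lines = [text[i:i + 76] for i in range(0, len(text), 76)]
    return "[ " + "\r\n".join(lines) + " ]"


print(encode_bytes_value(b"Lumas"))
```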






      const-value = first-safe-char *( safe-char )
                       ; See the section 'Notes on Comments' below

      embedded-value = "(" *(%x00-FF) ")" 

   Any occurrence of '(' within an embedded message that is not part of
   a string must be matched by a corresponding ')'.

   Illustrating the recursiveness of the message format, we have:

      compound-value = struct-value / union-value / combined-value

      struct-value = "{" struct-body "}" 

      union-value = union-body

      combined-value = first-safe-char *( safe-char )

      EQUAL = OWS "=" OWS

7.3   Marking Message Boundaries

   Before a message is parsed it is necessary to know the boundary of
   the message.  There are many ways in which this can be done, and the
   method adopted should be specified in the protocol specification.
   However, in the absence of any other way, Lumas parsers should take
   the presence of an unmatched closing brace or closing parenthesis to
   be the end of message marker.  Hence, the definition of a message
   delimited in this way becomes:

      delimited-lumas-text-message = lumas-text-message ( "}" / ")" )
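
   A simplified Python sketch of locating such an end of message
   marker is given below.  It is only an approximation: it skips
   quoted strings, but it ignores comments, assumes that unquoted
   values contain no quote or bracket characters, and does not check
   that each closer is of the same kind as its opener.  The function
   name is hypothetical.

```python
def find_message_end(text):
    """Return the index of the first unmatched '}' or ')', or -1."""
    depth = 0
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in "'\"":
            # Skip a quoted string, stepping over backslash escapes.
            quote = ch
            i += 1
            while i < len(text) and text[i] != quote:
                i += 2 if text[i] == "\\" else 1
        elif ch in "{(":
            depth += 1
        elif ch in "})":
            if depth == 0:
                return i            # unmatched closer: end of message
            depth -= 1
        i += 1
    return -1


print(find_message_end("s = { x = ')' } )"))  # prints 16
```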

7.4   Examples of Encoded Types

   This section illustrates how the types look once they have been
   encoded according to the syntax above.  The tag of each item has the
   format 'my-XXXX'.  Except in the case of the 'void' example, the XXXX
   part indicates the type that is encoded to the right of the equals
   sign.

      my-void                // Tag only for a void parameter

      my-bool = True

      my-int = 5643

      my-float = 102.4519

      my-ipv4 = 192.0.2.1

      my-ipv6 = 2001:DB8::1

      my-date = 2002-02-28

      my-time = 12:00:00

      my-oid = 1~2~840~113549~2~5

      my-ascii = 'Lumas'

      my-unquoted-ascii = Lumas

      my-unicode = "Lumas"

      my-const = Lumas

      my-bytes = [ 01AF3C== ]

      my-embedded = ( my-other-int=5 single-closing-bracket-text=')' )

      my-struct = { 5434 All time=98787654654 }

      my-union = 5434

      my-union = Switch

      my-union = Volume = 11

8.    Common ABNF Definitions

   The following definitions are common to both the message definition
   syntax and the on-the-wire representation.

      tag = first-tag-safe-char 0*62( safe-char )
                         ; Tag MUST NOT exceed 63 characters in length

      first-tag-safe-char = %x21 / 
                  ; Not "
                  %x23-26 / 
                  ; Not ' ( )
                  %x2A-2B /
                  ; Not , -
                  %x2E-2F /
                  ; Not 0 1 2 3 4 5 6 7 8 9
                  %x3A-3C / 
                  ; Not =
                  %x3E-5A /
                  ; Not [
                  %x5C-7A /
                  ; Not {
                  %x7C /
                  ; Not }
                  %x7E-7F
                  ; Visible characters except = , " ' { } ( ) [ -
                  ; and digits (tags must not get confused with integers)

      first-safe-char = first-tag-safe-char / DIGIT / "-"

      safe-char = first-safe-char / DQUOTE / "'" / "{" / "(" / "["
                        ; Not = } ) ,

      WS = 1*( comment / SP / HTAB / CR / LF )  
                                  ; HTAB, CR, LF defined in [ABNF]
      OWS = [ WS ]                ; Optional white space

      WSNC = 1*( SP / HTAB / CR / LF )    ; Whitespace - no comment
      OWSNC = [ WSNC ]            ; Optional white space - no comment

      COMMA = OWS "," OWS

      ; See section 'Notes on Comments' below for more on comments
      comment = c-comment / cpp-comment / narrative-comment
      c-comment = "/*" <any except */> (nested-end / hard-end )
      nested-end = "*/"
      hard-end = "**/"
      cpp-comment = "//" *( HTAB / %x20-7F ) ( CR / LF )
      narrative-comment = "/**" <any except "lumas*/"> "lumas*/"
                ; A comment is treated as a single space during parsing

   ALPHA, DIGIT, HEXDIG and DQUOTE are defined in [ABNF].
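
   By way of illustration, the 'tag' production might be checked in
   Python as sketched below.  The character sets follow the summary
   comments in the ABNF, so that "(" and ")" are excluded as a first
   character; all names in the sketch are hypothetical.

```python
def chars(*ranges):
    """Build a set of characters from inclusive code point ranges."""
    return {chr(c) for low, high in ranges for c in range(low, high + 1)}


FIRST_TAG_SAFE = chars((0x21, 0x21), (0x23, 0x26), (0x2A, 0x2B),
                       (0x2E, 0x2F), (0x3A, 0x3C), (0x3E, 0x5A),
                       (0x5C, 0x7A), (0x7C, 0x7C), (0x7E, 0x7F))
FIRST_SAFE = FIRST_TAG_SAFE | set("0123456789-")
SAFE = FIRST_SAFE | set("\"'{([")


def is_tag(text):
    """Report whether text is a valid tag of at most 63 characters."""
    return (1 <= len(text) <= 63
            and text[0] in FIRST_TAG_SAFE
            and all(ch in SAFE for ch in text[1:]))


for sample in ["refers", "cookie.tech-know-ware.com", "5434", "a=b"]:
    print(sample, is_tag(sample))
```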

9.    Notes on Comments

   To aid development Lumas allows comments to appear in both a message
   definition and on the wire.

   On the wire, const and unquoted-ascii values MUST NOT begin with
   comment start markers ('//' and '/*').  However, if the values
   contain comment start marker characters, the characters MUST be
   interpreted as part of the value, and do not indicate the start of a
   comment.  

   For example, in the first of the examples below, the text
   "This-is-a-comment" MUST be treated as a comment, whereas in the
   second example the text "this-is-part-of-the-value" MUST be treated
   as part of the value.

      ascii-value = /*This-is-a-comment*/This-is-the-value

      ascii-value = and-//this-is-part-of-the-value

   In a message definition (but not on the wire) the ABNF c-comment
   production allows nesting of comments.  In a nested comment, each
   occurrence of the '/*' character sequence MUST be matched by a
   corresponding occurrence of the '*/' character sequence before the
   comment ends.  Alternatively, the end of the comment can be forced by
   the hard end of comment marker, defined as '**/', which overrides the
   nesting.  (This provision allows the commenting out of headers and
   footers in text only message definition documents.)

   To further support Lumas embedded in specification documents, Lumas
   supports a 'narrative-comment'.  These are comments whose content,
   such as C example code, may coincidentally contain Lumas end of
   comment markers.  The narrative comment begins with the symbol '/**',
   and ends with the symbol 'lumas*/'.

   A comment is treated as a single space for the purposes of parsing.

10.   Locating Lumas Modules

   It is not intended that applications should find Lumas modules
   'on-the-fly'.  It is expected that some human involvement will be
   required to locate and interpret a Lumas definition.  A Lumas
   definition does not therefore have any way of specifying the physical
   location from where a referenced definition can be acquired.
   Instead, the strategy is to exploit the fact that a module definition
   can begin with the text "lumas module" followed by the module name.
   By entering this text (e.g. "lumas module org.lumas.mine") into a web
   search engine (either one that covers the whole Internet, or is
   limited to a specific site) a user can locate a particular Lumas
   module.  Determining whether a Lumas module so located is authentic
   is beyond the scope of this document.

11.   Mandatory to Understand

   Many protocols require the capability to signal that certain
   extension parameters are mandatory to understand, and if they are not
   understood the message should be rejected in some way.  Lumas
   provides no in-built mechanism for this feature.  Instead
   implementers are recommended to use a feature similar to SIP's
   'Require' header [SIP] which presents a list of feature identifiers
   that must be understood.  Naturally, provision for this mechanism
   must be included in the first version of the protocol, as it is not
   possible to define such semantics at a later time.  An example of
   such a construct might be:

      union require [*] pluggable { };

   This could be populated using:

      plug
          void     my-feature;
      into require;
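
   As a sketch of how a receiver might act on such a list, the following
   Python fragment rejects a message when any listed feature identifier
   is not understood.  The set of supported features and the function
   names are hypothetical, and how the 'require' list is decoded from a
   message is outside the scope of the example.

```python
# Hypothetical set of feature identifiers this receiver understands.
SUPPORTED_FEATURES = frozenset({"my-feature"})


def unsupported_features(required, supported=SUPPORTED_FEATURES):
    """Return the required identifiers that are not understood.

    The result keeps the order in which the identifiers were listed.
    """
    return [name for name in required if name not in supported]


def accept_message(required, supported=SUPPORTED_FEATURES):
    """Accept only if every required feature is understood."""
    return not unsupported_features(required, supported)


ok = accept_message(["my-feature"])
rejected = unsupported_features(["my-feature", "other-feature"])
```

   Returning the list of identifiers that were not understood lets the
   rejection carry that list back to the sender, in the manner of SIP's
   'Unsupported' header.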

12.   Security Considerations

   Lumas itself does not raise any security issues, but the security
   requirements of a protocol must be borne in mind when writing a Lumas
   message definition.  Common advice is that it is difficult to add
   security to a protocol once it has been released, and hence security
   issues must be considered from the outset.  This matters for a Lumas
   message definition because security mechanisms may affect the format
   of messages.  This is particularly the case for integrity check
   values that are effectively appended to the end of the message once
   it is encoded.  It may therefore be appropriate to define both a main
   message definition and a wrapper message definition that provides
   cryptographic services for the main message definition.  For example,
   a message definition wrapper might look like:

      struct my-protocol-wrapper
      {
          embedded     main-definition as ?;
          bytes<1..64> signature as signed;
          oid          signature-algorithm as sig-alg;
      };
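
   The idea of computing an integrity check value over the encoded main
   message and carrying it in a wrapper can be sketched as below.  This
   is an assumption-laden Python illustration, not a Lumas encoding:
   HMAC-SHA-256 stands in for whatever algorithm 'signature-algorithm'
   would identify, the names are hypothetical, and the wire encoding of
   the wrapper itself is not shown.

```python
import hashlib
import hmac
from dataclasses import dataclass

MAX_SIGNATURE_LEN = 64  # mirrors the bytes<1..64> bound in the wrapper


@dataclass(frozen=True)
class Wrapped:
    main: bytes       # the already-encoded main message
    signature: bytes  # integrity check value carried after it


def wrap(encoded_main: bytes, key: bytes) -> Wrapped:
    """Compute the check value over the encoded main message."""
    icv = hmac.new(key, encoded_main, hashlib.sha256).digest()
    return Wrapped(main=encoded_main, signature=icv)


def verify(main: bytes, signature: bytes, key: bytes) -> bool:
    """Recompute the check value and compare in constant time."""
    if not 1 <= len(signature) <= MAX_SIGNATURE_LEN:
        return False
    expected = hmac.new(key, main, hashlib.sha256).digest()
    return hmac.compare_digest(expected, signature)


wrapped = wrap(b"encoded main message", b"shared key")
```

   Because the check value is computed after the main message has been
   encoded, the main definition does not need to know about it.  This is
   the reason for placing it in a separate wrapper.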

13.   Normative References

   [ABNF] D. Crocker & P. Overell, "Augmented BNF for Syntax
         Specifications: ABNF," Internet Engineering Task Force, RFC
         4234, October 2005.

   [BASE64] N. Freed & N. Borenstein, "Multipurpose Internet Mail
         Extensions (MIME) Part One: Format of Internet Message Bodies,"
         Internet Engineering Task Force, RFC 2045, November 1996.

   [DOMAINS] J. Postel, "Domain Name System Structure and Delegation,"
         Internet Engineering Task Force, RFC 1591, March 1994.

   [IEEE754] "IEEE Standard for Binary Floating-Point Arithmetic," IEEE
         754-1985, IEEE, 1985.

   [KWORDS] S. Bradner, "Key words for use in RFCs to Indicate
         Requirement Levels," RFC 2119, March 1997.

   [PERL] L. Wall, T. Christiansen, & J. Orwant, "Programming Perl",
         O'Reilly, ISBN 0-596-00027-8.

   [UTF8] F. Yergeau, "UTF-8, a transformation format of ISO 10646,"
         RFC 2279, January 1998.

14.   Informative References

   [ASN1] International Organization for Standardization, "Information
         Processing Systems - Open Systems Interconnection -
         Specification of Abstract Syntax Notation One (ASN.1)", ISO
         Standard 8824, December 1990.

   [CMS] R. Housley, "Cryptographic Message Syntax," RFC 2630, June
         1999.

   [DIAMETER] Pat R. Calhoun, John Loughney, Erik Guttman, Glen Zorn,
         Jari Arkko, "Diameter Base Protocol,"
         draft-ietf-aaa-diameter-xx, Work in Progress.

   [IP]  "Internet Protocol," RFC 791, September 1981.

   [JSON] "Introducing JSON," http://www.json.org/.

   [OMGIDL] "Common Object Request Broker Architecture: Core
         Specification," Object Management Group, December 2002.
         (Accessible via:
         http://www.omg.org/technology/documents/corba_spec_catalog.htm)

   [RELAX] OASIS Technical Committee: RELAX NG, "RELAX NG
         Specification", December 2001,
         <http://www.oasis-open.org/committees/relax-ng/spec-20011203.html>.

   [SCHEMA] Thompson, H., Beech, D., Maloney, M. and N. Mendelsohn, "XML
         Schema Part 1: Structures", W3C REC-xmlschema-1, May 2001,
         <http://www.w3.org/TR/xmlschema-1/>, and Biron, P. and A.
         Malhotra, "XML Schema Part 2: Datatypes", W3C REC-xmlschema-2,
         May 2001, <http://www.w3.org/TR/xmlschema-2/>.

   [SIP] J. Rosenberg et al., "SIP: Session Initiation Protocol,"
         Internet Engineering Task Force, RFC 3261, June 2002.

   [SMTP] Klensin, J. (Ed.), "Simple Mail Transfer Protocol", RFC 2821,
         April 2001.

   [SNMP] J. Case, M. Fedor, M. Schoffstall, J. Davin, "A Simple Network
         Management Protocol (SNMP)," RFC 1157, May 1990.

   [STRON] Jelliffe, R., "The Schematron", November 2001,
         <http://www.ascc.net/xml/schematron/>.

   [TCP] "Transmission Control Protocol," RFC 793, September 1981.

   [TLS] Dierks, T. and C. Allen, "The TLS Protocol Version 1.0", RFC
         2246, January 1999.

   [UDP] "User Datagram Protocol," RFC 768, August 1980.

   [XDR] R. Srinivasan, "XDR: External Data Representation Standard,"
         RFC 1832, August 1995.

   [XML] "Extensible Markup Language (XML) 1.0 (Second Edition)", W3C
         REC-xml, October 2000.

   [XMLBCP] S. Hollenbeck, M. Rose, and L. Masinter, "Guidelines for the
         Use of Extensible Markup Language (XML) within IETF Protocols,"
         RFC 3470, January 2003.

   [XMLVER] David Orchard, "Versioning XML Vocabularies," XML.com,
         December 03, 2003,
         http://www.xml.com/pub/a/2003/12/03/versioning.html

15.   Author's Address

   Pete Cordell
   Tech-Know-Ware Ltd
   P.O. Box 30
   Ipswich
   IP5 2WY
   UK
   pete@tech-know-ware.com
   http://www.tech-know-ware.com
   

Full Copyright Statement

   Copyright (C) The IETF Trust (2007).

   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
   OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.


Intellectual Property

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.


Acknowledgment

   Funding for the RFC Editor function is provided by the IETF
   Administrative Support Activity (IASA).