DOM Level 3 will provide an API for loading XML source documents into a DOM representation and for saving a DOM representation as an XML document.
Some environments, such as the Java platform or COM, have their own ways to persist objects to streams and to restore them. There is no direct relationship between these mechanisms and the DOM load/save mechanism. This specification defines how to serialize documents only to and from XML format.
Requirements that apply to both loading and saving documents.
Documents must be able to be parsed from and saved to the following sources:
Note that input and output streams also cover the in-memory case. One caution: a stream does not allow a base URI to be defined against which the relative URIs in the document are resolved.
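A minimal Python sketch of why the base URI matters: relative references are resolved against the document's location with `urllib.parse.urljoin`. When the document arrives through a bare stream there is no location to pass as the base. The function name and the example URIs are hypothetical.

```python
from typing import Optional
from urllib.parse import urljoin, urlparse


def resolve_reference(reference: str, base_uri: Optional[str]) -> str:
    """Resolve a URI reference found in a document against the document's base URI."""
    if urlparse(reference).scheme:
        return reference  # already absolute: no base URI needed
    if base_uri is None:
        # A document read from a bare stream has no base URI, so a relative
        # reference has nothing to be resolved against.
        raise ValueError(f"cannot resolve {reference!r}: no base URI")
    return urljoin(base_uri, reference)


# Hypothetical locations, for illustration only.
print(resolve_reference("chapter1.xml", "http://example.com/books/book.xml"))
```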
While creating a new document using the DOM API, a mechanism must be provided to specify that the new document uses a pre-existing Content Model and to cause that Content Model to be loaded.
Note that while DOM Level 2 creation can specify a Content Model when creating a document (public and system IDs for the external subset, and a string for the subset), DOM Level 2 implementations do not process the Content Model's content. For DOM Level 3, the Content Model's content must be read.
When processing a series of documents, all of which use the same Content Model, implementations should be able to reuse the already parsed and loaded Content Model rather than parsing it again for each new document.
This feature may not have an explicit DOM API associated with it, but it does require that nothing in this section, or the Content Model section, of this specification block it or make it difficult to implement.
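One way an implementation might satisfy the reuse requirement, sketched in Python under stated assumptions: parsed content models are cached by their (public ID, system ID) pair and shared by every document that names them. The `ContentModelCache` class and the loader callback are hypothetical; a real implementation might key on the fully resolved URI instead, and would need locking if documents are parsed concurrently.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical: whatever object an implementation builds from a DTD or schema.
ContentModel = object
Key = Tuple[Optional[str], Optional[str]]


class ContentModelCache:
    """Reuse parsed content models across documents (illustrative, not thread-safe)."""

    def __init__(self, load: Callable[[Optional[str], Optional[str]], ContentModel]):
        self._load = load  # expensive: reads and parses the content model
        self._models: Dict[Key, ContentModel] = {}
        self.load_count = 0

    def get(self, public_id: Optional[str], system_id: Optional[str]) -> ContentModel:
        key = (public_id, system_id)
        if key not in self._models:
            # Parse only on first use; later documents share the result.
            self._models[key] = self._load(public_id, system_id)
            self.load_count += 1
        return self._models[key]


cache = ContentModelCache(lambda public_id, system_id: {"source": system_id})
first = cache.get(None, "book.dtd")
second = cache.get(None, "book.dtd")  # served from the cache, not parsed again
```

Sharing works only if the cached models are treated as read-only by the documents that use them.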
Some means is required to allow applications to map public and system IDs to the correct document. This facility should provide sufficient capability to allow the implementation of catalogs, but providing catalogs themselves is not a requirement. In addition, XML Base needs to be addressed.
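A minimal catalog lookup in Python, assuming a simple in-memory table: a public identifier match is tried first, then a system identifier match, and an unmatched entity falls back to its own system ID. The preference order, the `Catalog` class and all identifiers are assumptions for illustration; real catalog formats define their own matching rules.

```python
from typing import Dict, Optional


class Catalog:
    """Map public and system identifiers to local document locations (sketch)."""

    def __init__(self, by_public: Dict[str, str], by_system: Dict[str, str]):
        self.by_public = by_public
        self.by_system = by_system

    def lookup(self, public_id: Optional[str], system_id: Optional[str]) -> Optional[str]:
        # Assumed order: a public ID match wins over a system ID match.
        if public_id is not None and public_id in self.by_public:
            return self.by_public[public_id]
        if system_id is not None and system_id in self.by_system:
            return self.by_system[system_id]
        # No entry: fall back to the entity's own system identifier.
        return system_id


catalog = Catalog(
    {"-//Example//DTD Book//EN": "file:///opt/dtd/book.dtd"},
    {"http://example.com/legacy.dtd": "file:///opt/dtd/legacy.dtd"},
)
```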
Loading a document can cause the generation of errors including:
Saving a document can cause the generation of errors including:
This section, as well as the DOM Level 3 Content Model section, should use a common error reporting mechanism. Well-formedness and validity checking are in the domain of the Content Model section, even though such errors are commonly generated in response to an application asking that a document be loaded.
The following requirements apply to loading documents.
Parsers may have properties or options that can be set by applications. Examples include:
A mechanism is required to set properties, to query the state of properties, and to query the set of properties supported by a particular DOM implementation.
The fundamental requirement is to write a DOM document as XML source. All information to be serialized should be available via the normal DOM API.
There are several options that can be defined when saving an XML document. Some of these are:
The following items are not committed to, but are under consideration. Public feedback on these items is especially requested.
Provide the ability for a thread that requested the loading of a document to continue execution without blocking while the document is being loaded. This would require some sort of notification or completion event when the loading process was done.
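A sketch of non-blocking loading in Python, using a thread pool and a completion callback as the notification mechanism. `load_async` and its callback signature are hypothetical; `minidom.parseString` stands in for whatever parser the implementation uses. The callback may run on the worker thread, so it should only hand results back rather than touch shared state unguarded.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Optional
from xml.dom import minidom


def load_async(
    executor: ThreadPoolExecutor,
    xml_text: str,
    on_done: Callable[[Optional[minidom.Document], Optional[BaseException]], None],
) -> Future:
    """Parse in the background and notify the caller when loading completes."""
    future = executor.submit(minidom.parseString, xml_text)

    def notify(f: Future) -> None:
        error = f.exception()  # the future is already done, so this does not block
        on_done(None if error else f.result(), error)

    future.add_done_callback(notify)
    return future  # the requesting thread continues immediately


done = threading.Event()
results = {}


def on_done(doc, error):
    results["root"] = doc.documentElement.tagName if doc else None
    results["error"] = error
    done.set()


with ThreadPoolExecutor(max_workers=1) as pool:
    load_async(pool, "<catalog><book/></catalog>", on_done)
    # ... the requesting thread could do other work here ...
    done.wait(timeout=5)
```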
Provide the ability to examine the partial DOM representation before it has been fully loaded.
In one form, a document may be loaded asynchronously while a DOM based application is accessing the document. In another form, the application may explicitly ask for the next incremental portion of a document to be loaded.
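Python's standard library offers one existing take on the second form: `xml.dom.pulldom` streams parse events and builds full DOM nodes only for the subtrees the application asks to expand. It is shown for comparison, not as the API this specification will define; `expand_entries` and the sample document are hypothetical.

```python
from typing import List
from xml.dom import pulldom


def expand_entries(xml_text: str, tag: str) -> List[str]:
    """Pull events one at a time and build DOM nodes only for matching elements."""
    expanded = []
    events = pulldom.parseString(xml_text)
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == tag:
            # Ask for the next incremental portion: this element and its subtree.
            events.expandNode(node)
            expanded.append(node.toxml())
    return expanded


entries = expand_entries("<log><entry>a</entry><skip/><entry>b</entry></log>", "entry")
```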
Provide the capability to write out only a part of a document. TreeWalkers, the Filters associated with TreeWalkers, or Ranges may be usable as a means of specifying the portion of the document to be written.
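One way to write only part of a document, sketched in Python with `xml.dom.minidom`: serialize a deep copy from which rejected elements have been pruned, in the spirit of a TreeWalker filter that rejects a node together with its subtree. The `reject` predicate is a hypothetical stand-in for such a filter. Working on a copy leaves the application's DOM untouched.

```python
from typing import Callable
from xml.dom import minidom
from xml.dom.minidom import Node


def write_filtered(node: Node, reject: Callable[[Node], bool]) -> str:
    """Serialize node, skipping every descendant element (and subtree) reject() flags."""
    copy = node.cloneNode(True)  # deep copy, so the original tree is not modified

    def prune(parent: Node) -> None:
        for child in list(parent.childNodes):  # copy the list: we remove while iterating
            if child.nodeType == Node.ELEMENT_NODE and reject(child):
                parent.removeChild(child)
            else:
                prune(child)

    prune(copy)
    return copy.toxml()


doc = minidom.parseString("<order><item/><note>internal</note><item/></order>")
partial = write_filtered(doc.documentElement, lambda e: e.tagName == "note")
```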
Document fragments, as specified by the XML Fragment specification, should be able to be loaded. This is useful to applications that only need to process some part of a large document. Because the DOM is typically implemented as an in-memory representation of a document, fully loading large documents can require large amounts of memory.
XPath should also be considered as a way to identify XML Document fragments to load.
Document fragments, as specified by the XML Fragment specification, should be able to be loaded into the context of an existing document at a point specified by a node position, or perhaps a range. This is a separate feature from simply loading document fragments as a new Node.
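A sketch of loading a fragment into an existing document at a node position, using `xml.dom.minidom`. It assumes the fragment uses no undeclared namespace prefixes or external entities, so it can be parsed inside a hypothetical `<wrapper>` element and its nodes copied across with `importNode`. A real implementation would instead parse in the context of the insertion point.

```python
from typing import Optional
from xml.dom import minidom
from xml.dom.minidom import Document, Node


def load_fragment_into(doc: Document, parent: Node, fragment: str,
                       before: Optional[Node] = None) -> None:
    """Parse an XML fragment and insert its nodes into doc under parent."""
    # Wrap the fragment so it parses as a well-formed document on its own.
    wrapper = minidom.parseString("<wrapper>" + fragment + "</wrapper>")
    for child in list(wrapper.documentElement.childNodes):
        # importNode copies the node into doc; insertBefore(x, None) appends.
        parent.insertBefore(doc.importNode(child, True), before)


doc = minidom.parseString("<list><item>1</item></list>")
load_fragment_into(doc, doc.documentElement, "<item>2</item><item>3</item>")
```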
DocumentBuilder (Sun) and DOMParser (Xerces).
Are SAXException.toString() and SAXException.getMessage() always the same? If not, we need to add another attribute.
DOMSystemException needs to be defined as part of the error handling module that is to be shared with CM. Common I/O type errors need to be defined for it, so that they can be reported in a uniform way. A way to embed errors or exceptions from the OS or language environment is needed, to provide full information to applications that want it.
This section defines an API for loading (parsing) XML source documents into a DOM representation and for saving (serializing) a DOM representation as an XML document.
The proposal for loading is influenced by Sun's JAXP API for XML Parsing in Java, http://java.sun.com/xml/download.html, and by SAX2, available at http://www.megginson.com/SAX/index.html.
Here is a list of the interfaces involved in loading and saving XML documents.
DOMImplementationLS -- A new DOMImplementation interface that provides the factory methods for creating the objects required for loading and saving.
DOMBuilder -- A parser interface.
DOMInputSource -- Encapsulates information about the source of the XML to be loaded.
DOMEntityResolver -- During loading, provides a way for applications to redirect references to external entities.
DOMBuilderFilter -- Provides the ability to examine and optionally remove Element nodes as they are being processed during the parsing of a document.
DOMWriter -- An interface for writing out or serializing DOM documents.
DOMImplementationLS contains the factory methods for creating objects implementing the DOMBuilder (parser) and DOMWriter interfaces.
interface DOMImplementationLS {
  DOMBuilder createDOMBuilder();
  DOMWriter  createDOMWriter();
};
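createDOMBuilder
Creates a new DOMBuilder. The newly constructed parser may then be configured by means of its setFeature() method, and used to parse documents by means of its parseURI() or parseDOMInputSource() method.
Return value: DOMBuilder, the newly created parser object.
createDOMWriter
Creates a new DOMWriter.
Return value: DOMWriter, the newly created writer object.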
A parser interface. DOMBuilder provides an API for parsing XML documents and building the corresponding DOM document tree. A DOMBuilder instance is obtained from the DOMImplementationLS interface by invoking its createDOMBuilder() method.
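For comparison only: Python's standard library contains an undocumented module, `xml.dom.xmlbuilder`, that implements an early draft of the DOM Level 3 load feature, and minidom's DOMImplementation inherits its `createDOMBuilder()` factory. Its signatures differ from the IDL in this section (its `createDOMBuilder()` takes a mode and a schema type, and parsing from an input source is spelled `parse()`), so treat this as an illustration of the factory pattern, not as a binding of this specification.

```python
import io
from xml.dom import minidom
from xml.dom.xmlbuilder import DOMInputSource

impl = minidom.getDOMImplementation()

# Obtain a parser from the implementation's factory method.
builder = impl.createDOMBuilder(impl.MODE_SYNCHRONOUS, None)

# Describe where the XML comes from: here, an in-memory byte stream.
source = DOMInputSource()
source.byteStream = io.BytesIO(b"<greeting>hello</greeting>")

doc = builder.parse(source)
print(doc.documentElement.tagName)
```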
DOMBuilders have a number of named properties that can be queried or set. Here is a list of properties that must be recognized by all implementations.
The validate-if-cm feature will alter the validation behavior when this feature is set true.
interface DOMBuilder {
  attribute DOMEntityResolver entityResolver;
  attribute DOMErrorHandler   errorHandler;
  attribute DOMBuilderFilter  filter;
  void     setFeature(in DOMString name, in boolean state)
                                  raises(DOMException);
  boolean  supportsFeature(in DOMString name);
  boolean  canSetFeature(in DOMString name, in boolean state);
  boolean  getFeature(in DOMString name)
                                  raises(DOMException);
  Document parseURI(in DOMString uri)
                                  raises(DOMException, DOMSystemException);
  Document parseDOMInputSource(in DOMInputSource is)
                                  raises(DOMException, DOMSystemException);
};
entityResolver of type DOMEntityResolver
If a DOMEntityResolver has been specified, each time a reference to an external entity is encountered the DOMBuilder will pass the public and system IDs to the entity resolver, which can then specify the actual source of the entity.
errorHandler of type DOMErrorHandler
If an error is encountered in the document being parsed, the DOMBuilder will call back to the errorHandler with the error information.
Note: The DOMErrorHandler interface is being developed separately, in conjunction with the design of the content model and validation module.
filter of type DOMBuilderFilter
When the application provides a filter, the parser will call out to the filter at the completion of the construction of each Element node. The filter implementation can choose to remove the element from the document being constructed or to terminate the parse early.
canSetFeature
Queries whether setting a feature to a specific value is supported. It is possible for a DOMBuilder to recognize a feature name but to be unable to set its value.
Parameters: name of type DOMString; state of type boolean.
Return value: true if the feature could be successfully set to the specified value, or false if the feature is not recognized or the requested value is not supported. The value of the feature itself is not changed.
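getFeature
Looks up the value of a feature.
Parameters: name of type DOMString.
Return value: the current state of the feature (true or false).
Exceptions: DOMException NOT_FOUND_ERR, raised when the DOMBuilder does not recognize the feature name.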
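parseDOMInputSource
Parses an XML document from a location identified by a DOMInputSource.
Parameters: is of type DOMInputSource, the DOMInputSource from which the source document is to be read.
Return value: the resulting Document.
Exceptions: DOMException, DOMSystemException.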
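parseURI
Parses an XML document from a location identified by a URI.
Parameters: uri of type DOMString, the location of the XML document to be read.
Return value: the resulting Document.
Exceptions: DOMException, DOMSystemException.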
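setFeature
Sets the state of a feature. It is possible for a DOMBuilder to recognize a feature name but to be unable to set its value.
Parameters: name of type DOMString; state of type boolean.
Exceptions: DOMException NOT_SUPPORTED_ERR, raised when the DOMBuilder recognizes the feature name but cannot set the requested value; DOMException NOT_FOUND_ERR, raised when the DOMBuilder does not recognize the feature name.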
supportsFeature
Queries whether the DOMBuilder recognizes a feature name. It is possible for a DOMBuilder to recognize a feature name but to be unable to set its value. For example, a non-validating parser would recognize the feature "validation", would report that its value was false, and would raise an exception if an attempt was made to enable validation by setting the feature to true.
Parameters: name of type DOMString.
Return value: true if the feature name is recognized by the DOMBuilder, false otherwise.
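The four feature methods fit together as shown in this Python sketch of a hypothetical feature table. Method names mirror the IDL, and the DOMException codes map onto Python's real `xml.dom.NotFoundErr` and `xml.dom.NotSupportedErr`. The example configures the non-validating parser described under supportsFeature.

```python
from typing import Dict, Set, Tuple
from xml.dom import NotFoundErr, NotSupportedErr


class FeatureTable:
    """Hypothetical backing store for the DOMBuilder feature methods."""

    def __init__(self, values: Dict[str, bool], settable: Set[Tuple[str, bool]]):
        self._values = dict(values)     # recognized features and their current state
        self._settable = set(settable)  # (name, state) pairs this parser can honor

    def supportsFeature(self, name: str) -> bool:
        return name in self._values

    def canSetFeature(self, name: str, state: bool) -> bool:
        # Pure query: never changes the feature's value.
        return (name, state) in self._settable

    def getFeature(self, name: str) -> bool:
        if name not in self._values:
            raise NotFoundErr(f"unknown feature: {name}")
        return self._values[name]

    def setFeature(self, name: str, state: bool) -> None:
        if name not in self._values:
            raise NotFoundErr(f"unknown feature: {name}")
        if (name, state) not in self._settable:
            raise NotSupportedErr(f"cannot set {name} to {state}")
        self._values[name] = state


# A non-validating parser: it recognizes "validation" but can only leave it false.
features = FeatureTable({"validation": False}, {("validation", False)})
```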
This interface represents a single input source for an XML entity.
This interface allows an application to encapsulate information about an input source in a single object, which may include a public identifier, a system identifier, a byte stream (possibly with a specified encoding), and/or a character stream.
The exact definitions of a byte stream and a character stream are binding dependent.
The application delivers this input source to the parser in one of two places: as the argument to the parseDOMInputSource method, or as the return value of the DOMEntityResolver.resolveEntity method.
The DOMBuilder will use the DOMInputSource object to determine how to read XML input. If there is a character stream available, the parser will read that stream directly; if not, the parser will use a byte stream, if available; if neither a character stream nor a byte stream is available, the parser will attempt to open a URI connection to the resource identified by the system identifier.
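The order in which a parser picks its input can be sketched in Python as follows. `InputSource` is a hypothetical mirror of the DOMInputSource attributes, and `open_uri` stands in for whatever URI access the implementation provides. The sketch reads from the streams but never reassigns any field of the source object.

```python
import io
from dataclasses import dataclass
from typing import BinaryIO, Callable, Optional, TextIO


@dataclass
class InputSource:
    """Hypothetical Python mirror of the DOMInputSource attributes."""
    byte_stream: Optional[BinaryIO] = None
    character_stream: Optional[TextIO] = None
    encoding: Optional[str] = None
    public_id: Optional[str] = None
    system_id: Optional[str] = None


def read_input(src: InputSource, open_uri: Callable[[str], str]) -> str:
    """Return the XML text, choosing a source in the order described above."""
    if src.character_stream is not None:
        return src.character_stream.read()        # 1. character stream
    if src.byte_stream is not None:
        data = src.byte_stream.read()             # 2. byte stream
        # Simplification: a real parser would detect the encoding from the
        # byte order mark or XML declaration when none is given.
        return data.decode(src.encoding or "utf-8")
    if src.system_id is not None:
        return open_uri(src.system_id)            # 3. open the system identifier
    raise ValueError("input source has no stream and no system identifier")


src = InputSource(byte_stream=io.BytesIO(b"<from-bytes/>"),
                  character_stream=io.StringIO("<from-chars/>"))
text = read_input(src, open_uri=lambda uri: "<from-uri/>")
```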
A DOMInputSource object belongs to the application: the parser shall never modify it in any way (it may modify a copy if necessary).
interface DOMInputSource {
  attribute DOMInputStream byteStream;
  attribute DOMReader      characterStream;
  attribute DOMString      encoding;
  attribute DOMString      publicId;
  attribute DOMString      systemId;
};
byteStream of type DOMInputStream
characterStream of type DOMReader
encoding of type DOMString
publicId of type DOMString
systemId of type DOMString
DOMEntityResolver
Provides a way for applications to redirect references to external entities.
Applications needing to implement customized handling for external entities must implement this interface and register their implementation by setting the entityResolver property of the DOMBuilder. The DOMBuilder will then allow the application to intercept any external entities (including the external DTD subset and external parameter entities) before including them.
Many DOM applications will not need to implement this interface, but it will be especially useful for applications that build XML documents from databases or other specialized input sources, or for applications that use URI types other than URLs.
DOMEntityResolver is based on the SAX2 EntityResolver interface, described at http://www.megginson.com/SAX/Java/javadoc/org/xml/sax/EntityResolver.html.
interface DOMEntityResolver {
  DOMInputSource resolveEntity(in DOMString publicId,
                               in DOMString systemId)
                                  raises(DOMSystemException);
};
resolveEntity
Allows the application to resolve external entities. The DOMBuilder will call this method before opening any external entity except the top-level document entity (including the external DTD subset, external entities referenced within the DTD, and external entities referenced within the document element); the application may request that the DOMBuilder resolve the entity itself, that it use an alternative URI, or that it use an entirely different input source. If the system identifier is a URL, the DOMBuilder must resolve it fully before reporting it to the application through this interface.
Note: See issue #4. An alternative would be to pass the URL out without resolving it, and to provide a base as an additional parameter. SAX resolves URLs first, and does not provide a base.
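Parameters: publicId of type DOMString; systemId of type DOMString.
Return value: a DOMInputSource describing the input source to be used for the entity.
Exceptions: DOMSystemException.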
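A sketch of an entity resolver in Python that serves selected entities from memory. It assumes SAX2's convention that returning no input source (here `None`) asks the parser to open the system identifier itself. The `InputSource` dataclass is a hypothetical, minimal mirror of DOMInputSource, and the entity table and URIs are made up. Keeping the system ID on the returned source preserves a base for any relative references inside the entity.

```python
import io
from dataclasses import dataclass
from typing import Dict, Optional, TextIO


@dataclass
class InputSource:
    """Minimal, hypothetical mirror of DOMInputSource."""
    character_stream: Optional[TextIO] = None
    public_id: Optional[str] = None
    system_id: Optional[str] = None


class InMemoryEntityResolver:
    """Serve selected external entities from memory instead of the network."""

    def __init__(self, entities: Dict[str, str]):
        self._entities = entities  # system ID -> replacement text

    def resolveEntity(self, public_id: Optional[str],
                      system_id: Optional[str]) -> Optional[InputSource]:
        text = self._entities.get(system_id or "")
        if text is None:
            return None  # assumption (as in SAX2): let the parser open system_id itself
        return InputSource(character_stream=io.StringIO(text),
                           public_id=public_id, system_id=system_id)


resolver = InMemoryEntityResolver(
    {"http://example.com/dtd/book.dtd": "<!ELEMENT book (#PCDATA)>"})
```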
DOMBuilderFilters provide applications the ability to examine Element nodes as they are being constructed during a parse. As each element is examined, it may be modified or removed, or the entire parse may be terminated early.
interface DOMBuilderFilter {
  boolean endElement(in Element element);
};
endElement
Parameters: element of type Element.
Return value: return true
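The callback order of endElement can be emulated in Python over an already-built minidom tree: each element is passed to the filter only after all of its children are complete, and the filter may remove it from its parent. This is a simulation, not a parser hook, and it assumes (the return-value description above is incomplete) that a false return means "terminate the parse". `run_end_element_filter` and `drop_debug` are hypothetical names.

```python
from typing import Callable
from xml.dom import minidom
from xml.dom.minidom import Element, Node


def run_end_element_filter(node: Node, end_element: Callable[[Element], bool]) -> bool:
    """Call end_element on each element after its children are complete.

    Returns False as soon as the filter asks to stop (assumed meaning of false).
    """
    for child in list(node.childNodes):  # copy: the filter may remove children
        if child.nodeType == Node.ELEMENT_NODE:
            if not run_end_element_filter(child, end_element):
                return False
    if node.nodeType == Node.ELEMENT_NODE:
        return end_element(node)
    return True


def drop_debug(element: Element) -> bool:
    if element.tagName == "debug":
        element.parentNode.removeChild(element)  # remove from the tree being built
    return True  # keep going


doc = minidom.parseString("<log><debug>x</debug><info>y</info></log>")
run_end_element_filter(doc, drop_debug)
```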
DOMWriter provides an API for serializing (writing) a DOM document out in the form of a source XML document. The XML data is written to an output stream, the type of which depends on the specific language bindings in use.
Three options are available for the general appearance of the formatted output: as-is, canonical, and reformatted.
DOMWriter accepts any node type for serialization. For nodes of type Document or Entity, well-formed XML will be created. The serialized output for these node types is either a Document or an External Entity, respectively, and is acceptable input for an XML parser. For all other types of nodes the serialized form is not specified, but should be something useful to a human for debugging or diagnostic purposes.
Note: rigorously designing an external (source) form for stand-alone node types that don't already have one defined by the XML Recommendation seems a bit much to take on here.
Within a Document or Entity being serialized, Nodes are processed as follows:
Entity nodes, when written directly by DOMWriter.writeNode(), output a Text Decl and the entity expansion. The resulting output will be valid as an external entity.
Entity reference nodes are serialized as an entity reference of the form "&entityName;" in the output. Child nodes (the expansion) of the entity reference are ignored.
Within the character data of a document (outside of markup), occurrences of '<' and '&' are replaced by the predefined entities &lt; and &amp;. The other predefined entities (&gt;, &apos;, etc.) are not used; these characters can be included directly. Any character that cannot be represented directly in the output character encoding is serialized as a numeric character reference.
Attribute values not containing quotes are serialized in quotes. Values containing quotes but no apostrophes are serialized in apostrophes (single quotes). Values containing both forms of quotes are serialized in quotes, with quotes within the value represented by the predefined entity &quot;. Any character that cannot be represented directly in the output character encoding is serialized as a numeric character reference.
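The delimiter choice for attribute values, sketched in Python. `quote_attribute` is a hypothetical name. The rules above do not mention '&' and '<', but XML 1.0 does not allow either unescaped in an attribute value, so the sketch escapes both as an added assumption.

```python
def quote_attribute(value: str, encoding: str) -> str:
    """Choose the delimiter per the rules above and escape the value (sketch)."""
    if '"' not in value:
        quote = '"'
    elif "'" not in value:
        quote = "'"
    else:
        quote = '"'  # both kinds present: use quotes and escape them
    out = []
    for ch in value:
        if ch == "&":
            out.append("&amp;")   # assumption: required for well-formed XML
        elif ch == "<":
            out.append("&lt;")    # assumption: required for well-formed XML
        elif ch == quote:
            out.append("&quot;")  # reachable only when both kinds of quote occur
        else:
            try:
                ch.encode(encoding)
                out.append(ch)
            except UnicodeEncodeError:
                out.append(f"&#x{ord(ch):X};")
    return quote + "".join(out) + quote
```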
Within markup, but outside of attributes, any occurrence of a character that cannot be represented in the output character encoding is reported as an error. An example would be serializing the element <LaCañada/> with encoding="US-ASCII".
Unicode Character Normalization. When requested by setting the normalizeCharacters option on DOMWriter, all data to be serialized, both markup and character data, is normalized according to the rules defined by Unicode Canonical Composition, Normalization Form C. The normalization process affects only the data as it is being written; it does not alter the DOM's view of the document after serialization has completed. The W3C character model and normalization are described at http://www.w3.org/TR/charmod/#TextNormalization. Unicode normalization forms are described at http://www.unicode.org/unicode/reports/tr15/.
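A Python sketch of normalizing only the written data, using the standard `unicodedata.normalize("NFC", ...)`. Normalizing the whole serialized string at once is a simplification; the point shown is that the DOM keeps the decomposed text while the output is composed.

```python
import unicodedata
from xml.dom import minidom

# 'e' followed by U+0301 COMBINING ACUTE ACCENT, i.e. a decomposed "é".
doc = minidom.parseString("<p>Cafe\u0301</p>")

# Normalize the serialized text only; the DOM itself is left unchanged.
serialized = unicodedata.normalize("NFC", doc.documentElement.toxml())
```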
Namespace checking and fixup during serialization is a user option. When the option is selected, the serialization process will verify that namespace declarations, namespace prefixes, and the namespace URIs associated with Elements and Attributes are consistent. If inconsistencies are found, the serialized form of the document will be altered to remove them. The exact form of the alterations is not defined and is implementation dependent.
Any changes made affect only the namespace prefixes and declarations appearing in the serialized data. The DOM's view of the document is not altered by the serialization operation and does not reflect any changes made to namespace declarations or prefixes in the serialized output.
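One possible checking pass, sketched in Python with `xml.dom.minidom`: walk the elements, track the prefix bindings in scope, and report each element whose prefix is not bound to its namespace URI, which is where a fixup would add a declaration in the output. It only inspects the DOM, in line with the rule that serialization does not alter it. Attributes are not checked, and `missing_ns_declarations` and the URNs are hypothetical.

```python
from typing import Dict, List, Optional, Tuple
from xml.dom import minidom
from xml.dom.minidom import Element, Node


def missing_ns_declarations(element: Element,
                            in_scope: Optional[Dict[str, str]] = None
                            ) -> List[Tuple[str, str, str]]:
    """List (tagName, prefix, namespaceURI) bindings a fixup would have to declare."""
    scope = dict(in_scope or {})
    for name, value in element.attributes.items():  # declarations on this element
        if name == "xmlns":
            scope[""] = value
        elif name.startswith("xmlns:"):
            scope[name[len("xmlns:"):]] = value
    missing = []
    if element.namespaceURI:
        prefix = element.prefix or ""
        if scope.get(prefix) != element.namespaceURI:
            missing.append((element.tagName, prefix, element.namespaceURI))
            scope[prefix] = element.namespaceURI  # the serializer would declare it here
    for child in element.childNodes:
        if child.nodeType == Node.ELEMENT_NODE:
            missing.extend(missing_ns_declarations(child, scope))
    return missing


impl = minidom.getDOMImplementation()
doc = impl.createDocument("urn:example:a", "a:root", None)
root = doc.documentElement
root.setAttribute("xmlns:a", "urn:example:a")                      # declared
root.appendChild(doc.createElementNS("urn:example:b", "b:child"))  # never declared
```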
DOMWriters have a number of named properties that can be queried or set. Here is a list of properties that must be recognized by all implementations.
interface DOMWriter {
  attribute DOMString          encoding;
  readonly attribute DOMString lastEncoding;
  attribute unsigned short     format;
  // Modified in DOM Level 3:
  attribute DOMString          newLine;
  void writeNode(in DOMOutputStream destination,
                 in Node node)
                                  raises(DOMSystemException);
};
encoding of type DOMString
format of type unsigned short
lastEncoding of type DOMString, readonly
newLine of type DOMString, modified in DOM Level 3
writeNode
Writes out the specified node as described above in the description of DOMWriter. Writing a Document or Entity node produces a serialized form that is well-formed XML. Writing other node types produces a fragment of text in a form that is not fully defined by this document, but that should be useful to a human for debugging or diagnostic purposes.
Parameters:
destination of type DOMOutputStream: the destination for the data to be written.
node of type Node: the Document or Entity node to be written. For other node types, something sensible should be written, but the exact serialized form is not specified.
Exceptions: DOMSystemException. This exception will be raised in response to any sort of IO or system error that occurs while writing to the destination. It may wrap an underlying system exception.
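A rough Python analogue of writeNode, built on minidom's real `writexml()` method, which writes to any object with a `write()` method. `write_node` and its parameters are hypothetical. minidom's escaping rules are its own and differ in detail from the rules in this section, and the strict `encode()` raises `UnicodeEncodeError` rather than emitting character references.

```python
import io
from typing import BinaryIO
from xml.dom import minidom
from xml.dom.minidom import Node


def write_node(destination: BinaryIO, node: Node,
               encoding: str = "UTF-8", new_line: str = "") -> str:
    """Serialize node to a byte stream; return the encoding actually used."""
    text = io.StringIO()
    if node.nodeType == Node.DOCUMENT_NODE:
        # Documents get an XML declaration naming the output encoding.
        node.writexml(text, newl=new_line, encoding=encoding)
    else:
        node.writexml(text, newl=new_line)
    # Strict encoding: an unrepresentable character raises UnicodeEncodeError.
    destination.write(text.getvalue().encode(encoding))
    return encoding  # what a lastEncoding attribute could report


out = io.BytesIO()
doc = minidom.parseString("<a><b/></a>")
used = write_node(out, doc.documentElement)
```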