This chapter describes the optional DOM Level 3 Content
Model (CM) feature. This module provides a
representation for XML content models, e.g., DTDs and XML Schemas,
together with operations on the content models, and how such
information within the content models could be applied to XML
documents used in both the document-editing and CM-editing worlds.
It also provides additional tests for well-formedness of XML
documents, including Namespace well-formedness. A DOM application
can use the hasFeature method of the DOMImplementation
interface to determine whether a
given DOM supports these capabilities or not. One feature string
for the CM-editing interfaces listed in this section is "CM-EDIT"
and another feature string for document-editing interfaces is
"CM-DOC".
This chapter interacts strongly with the Load and Save chapter, which is also under development in DOM Level 3. Not only will that code serialize/deserialize content models, but it may also wind up defining its well-formedness and validity checks in terms of what is defined in this chapter. In addition, the CM and Load/Save functional areas will share a common error-reporting mechanism allowing user-registered error callbacks. Note that this may not imply that the parser actually calls the DOM's validation code -- it may be able to achieve better performance via its own -- but the appearance to the user should probably be "as if" the DOM has been asked to validate the document, and parsers should probably be able to validate newly loaded documents in terms of a previously loaded DOM CM.
Finally, this chapter will have separate sections to address the needs of the document-editing and CM-editing worlds, along with a section that details overlapping areas such as validation. In this manner, the document-editing world's focus on editing documents and using the information in the CM is kept distinct from the CM-editing world's focus on defining and manipulating that information.
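The feature strings named at the start of this chapter can be probed with hasFeature. The following is a minimal sketch in Python, using the standard library's xml.dom.minidom purely as a stand-in DOM implementation. minidom does not implement the CM module, so both probes are expected to report false. The helper name supported_cm_features is hypothetical, and passing None as the version (meaning "any version") is an assumption, since the text gives no version for these two strings.

```python
from xml.dom.minidom import getDOMImplementation

# Feature strings named in this chapter. The chapter gives no version for
# them, so None ("any version") is passed below; that choice is an assumption.
CM_FEATURES = ("CM-EDIT", "CM-DOC")


def supported_cm_features(implementation):
    """Return the CM feature strings that the implementation reports."""
    return [name for name in CM_FEATURES
            if implementation.hasFeature(name, None)]


impl = getDOMImplementation()
print(supported_cm_features(impl))     # minidom reports neither CM feature
print(impl.hasFeature("Core", "2.0"))  # a feature minidom does report
```

An application would typically run such a probe once at start-up and disable guided editing or CM authoring when the corresponding feature is absent.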
In the October 9, 1997 DOM requirements document, the following appeared: "There will be a way to determine the presence of a DTD. There will be a way to add, remove, and change declarations in the underlying DTD (if available). There will be a way to test conformance of all or part of the given document against a DTD (if available)." In later discussions, the following was added, "There will be a way to query element/attribute (and maybe other) declarations in the underlying DTD (if available)," supplementing the primitive support for these in Level 1.
That work was deferred past Level 2, in the hope that XML Schemas would be addressed as well. It is anticipated that the lowest-common-denominator general APIs defined in this chapter can support DTDs, XML Schemas, and other XML content models down the road.
The kinds of information that a Content Model must make available are mostly self-evident from the definitions of Infoset, DTDs, and XML Schemas. Note, however, that some kinds of information on which the DOM already relies, e.g., default values for attributes, will finally be given a visible representation here.
The content model referenced in these use cases/requirements is an abstraction and does not refer solely to DTDs or XML Schemas.
For the CM-editing and document-editing worlds, the following use cases and requirements are common to both and could be labeled as the "Validation and Other Common Functionality" section:
Use Cases:
Requirements:
Specific to the CM-editing world, the following are use cases and requirements and could be labeled as the "CM-editing" section:
Use Cases:
Requirements:
Specific to the document-editing world, the following are use cases and requirements and could be labeled as the "Document-editing" section:
Use Cases:
Requirements:
General Issues:
QName, e.g., foo:bar, whereas the latter will report its namespace
and local name, e.g., {http://my.namespace}bar. We have added the
isNamespaceAware attribute to the generic CM object to help
applications determine which of these fields are important, but we
are still analyzing this challenge.
A list of the proposed Content Model data structures and functions follows, starting with the data structures and "CM-editing" methods.
CMModel is an abstract object that could map to a DTD, an XML
Schema, a database schema, etc. It is a generalized content model
object that has both an internal and an external subset. The
internal subset would always exist, even if empty, with the
external subset (if present) being represented by an "active"
CMExternalModel. Many CMExternalModels could exist, but only one
can be specified as "active"; it is also possible that none are
"active". The issue of multiple content models is misleading since,
in this architecture, only one CMModel exists, with an internal
subset that references the external subset. If the external subset
changes to another "active" CMExternalModel, the internal subset is
"fixed up." The CMModel also contains the factory methods needed to
create the various types of CMNodes, such as CMElementDeclaration,
CMAttributeDeclaration, etc.
interface CMModel : CMNode {
  readonly attribute boolean isNamespaceAware;
  attribute CMElementDeclaration rootElementDecl;
  DOMString getLocation();
  nsElement getCMNamespace();
  CMNamedNodeMap getCMNodes();
  boolean removeNode(in CMNode node);
  boolean insertBefore(in CMNode newNode, in CMNode refNode);
  boolean validate();
  CMElementDeclaration createCMElementDeclaration(inout DOMString namespaceURI,
                                                  in DOMString qualifiedElementName,
                                                  in int contentSpec)
                                                  raises(DOMException);
  CMAttributeDeclaration createCMAttributeDeclaration(inout DOMString namespaceURI,
                                                      in DOMString qualifiedName)
                                                      raises(DOMException);
  CMNotationDeclaration createCMNotationDeclaration(in DOMString name,
                                                    in DOMString systemIdentifier,
                                                    inout DOMString publicIdentifier)
                                                    raises(DOMException);
  CMEntityDeclaration createCMEntityDeclaration(in DOMString name)
                                                raises(DOMException);
  CMChildren createCMChildren(in unsigned long minOccurs,
                              in unsigned long maxOccurs,
                              inout unsigned short operator)
                              raises(DOMException);
};
isNamespaceAware of type boolean, readonly
  QNames.
rootElementDecl of type CMElementDeclaration
createCMAttributeDeclaration
  namespaceURI of type DOMString
  qualifiedName of type DOMString
  A new CMAttributeDeclaration object with
  INVALID_CHARACTER_ERR: Raised if the specified name contains an illegal character.
createCMChildren
  minOccurs of type unsigned long
  maxOccurs of type unsigned long
  operator of type unsigned short
  A new CMChildren object.
  INVALID_CHARACTER_ERR: Raised if the specified name contains an illegal character.
createCMElementDeclaration
  namespaceURI of type DOMString
  qualifiedElementName of type DOMString
  contentSpec of type int
  A new CMElementDeclaration object with
  INVALID_CHARACTER_ERR: Raised if the specified name contains an illegal character.
  DUPLICATE_NAME_ERR: Raised if an element declaration already exists with the same name for a given CMModel.
createCMEntityDeclaration
  name of type DOMString
  A new CMEntityDeclaration object with
  INVALID_CHARACTER_ERR: Raised if the specified name contains an illegal character.
createCMNotationDeclaration
  name of type DOMString
  systemIdentifier of type DOMString
  publicIdentifier of type DOMString
  A new CMNotationDeclaration object with
  INVALID_CHARACTER_ERR: Raised if the specified name contains an illegal character.
  DUPLICATE_NAME_ERR: Raised if a notation declaration already exists with the same name for a given CMModel.
getCMNamespace
  CMModel.
  Namespace of
getCMNodes
getLocation
  This method returns a DOMString defining the absolute location from which this document is retrieved including the document name.
insertBefore
removeNode
validate
  Is the CM valid?
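The two error conditions attached to the factory methods can be illustrated with a small sketch. This is not the draft's API: SketchCMModel, create_element_declaration, and DuplicateNameErr are hypothetical names, the name check is an ASCII-only simplification of the XML Name production, and keying duplicates on the qualified name alone is an assumption. Only xml.dom.InvalidCharacterErr is a real Python standard-library class.

```python
import re
from xml.dom import InvalidCharacterErr

# ASCII-only simplification of the XML Name production, optional prefix.
_NCNAME = r"[A-Za-z_][A-Za-z0-9._-]*"
_QNAME = re.compile(rf"{_NCNAME}(?::{_NCNAME})?\Z")


class DuplicateNameErr(Exception):
    """Hypothetical stand-in for the draft's DUPLICATE_NAME_ERR."""


class SketchCMModel:
    """Hypothetical model object that stores element declarations by name."""

    def __init__(self):
        self._elements = {}

    def create_element_declaration(self, namespace_uri, qualified_name,
                                   content_spec):
        # INVALID_CHARACTER_ERR: the name contains an illegal character.
        if not _QNAME.match(qualified_name):
            raise InvalidCharacterErr(f"illegal name: {qualified_name!r}")
        # DUPLICATE_NAME_ERR: a declaration with this name already exists.
        if qualified_name in self._elements:
            raise DuplicateNameErr(qualified_name)
        declaration = {
            "namespaceURI": namespace_uri,
            "nodeName": qualified_name,
            "contentSpec": content_spec,
        }
        self._elements[qualified_name] = declaration
        return declaration


model = SketchCMModel()
decl = model.create_element_declaration(None, "doc:para", 0)
```

Performing both checks inside the factory means a declaration that reaches the model is already known to have a usable, unique name.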
CMExternalModel is an abstract object that could map to a DTD, an
XML Schema, a database schema, etc. It's a generalized content
model object that is not bound to a particular XML document.
interface CMExternalModel : CMModel { };
CMNode is analogous to a Node in the Core DOM, e.g., an element
declaration. This can exist for both CMExternalModel and CMModel.
It should be able to handle constructs such as comments and
processing instructions.
Opaque.
interface CMNode {
  const unsigned short CM_ELEMENT_DECLARATION = 1;
  const unsigned short CM_ATTRIBUTE_DECLARATION = 2;
  const unsigned short CM_NOTATION_DECLARATION = 3;
  const unsigned short CM_ENTITY_DECLARATION = 4;
  const unsigned short CM_CHILDREN = 5;
  const unsigned short CM_MODEL = 6;
  const unsigned short CM_EXTERNALMODEL = 7;
  readonly attribute unsigned short cmNodeType;
  attribute CMModel ownerCMModel;
  attribute DOMString nodeName;
  attribute DOMString prefix;
  attribute DOMString localName;
  attribute DOMString namespaceURI;
  CMNode clone();
};
CM_ELEMENT_DECLARATION: CMElementDeclaration.
CM_ATTRIBUTE_DECLARATION: CMAttributeDeclaration.
CM_NOTATION_DECLARATION: CMNotationDeclaration.
CM_ENTITY_DECLARATION: CMEntityDeclaration.
CM_CHILDREN: CMChildren.
CM_MODEL: CMModel.
CM_EXTERNALMODEL: CMExternalModel.
cmNodeType of type unsigned short, readonly
localName of type DOMString
  qualified name of this CMNode.
namespaceURI of type DOMString
nodeName of type DOMString
  qualified name of this CMNode depending on the CMNode type.
ownerCMModel of type CMModel
  CMModel object associated with this CMNode. For a node of type CM_MODEL, this is null.
prefix of type DOMString
CMNodeList is the CM analogue to NodeList; the document order is
meaningful, as opposed to CMNamedNodeMap.
interface CMNodeList { };
CMNamedNodeMap is the CM analogue to NamedNodeMap. The order is not
meaningful.
interface CMNamedNodeMap { };
The only primitive datatype supported by a base DOM CM
implementation is the string type.
interface CMDataType {
  const short STRING_DATATYPE = 1;
  short getCMPrimitiveType();
};
STRING_DATATYPE: string data type as defined in XML Schema Datatypes.
getCMPrimitiveType
  code representing the primitive type of the attached data item.
This interface covers the primitive types supported by optional DOM CM implementations. A DOM application can use the hasFeature method of the DOMImplementation interface to determine whether this interface is supported or not. The feature string for all the interfaces listed in this section is "CMPTYPES" and the version is "3.0".
interface CMPrimitiveType : CMDataType {
  const short BOOLEAN_DATATYPE = 2;
  const short FLOAT_DATATYPE = 3;
  const short DOUBLE_DATATYPE = 4;
  const short DECIMAL_DATATYPE = 5;
  const short HEXBINARY_DATATYPE = 6;
  const short BASE64BINARY_DATATYPE = 7;
  const short ANYURI_DATATYPE = 8;
  const short QNAME_DATATYPE = 9;
  const short DURATION_DATATYPE = 10;
  const short DATETIME_DATATYPE = 11;
  const short DATE_DATATYPE = 12;
  const short TIME_DATATYPE = 13;
  const short YEARMONTH_DATATYPE = 14;
  const short YEAR_DATATYPE = 15;
  const short MONTHDAY_DATATYPE = 16;
  const short DAY_DATATYPE = 17;
  const short MONTH_DATATYPE = 18;
  const short NOTATION_DATATYPE = 19;
  attribute decimal lowValue;
  attribute decimal highValue;
};
BOOLEAN_DATATYPE: boolean data type as defined in XML Schema Datatypes.
FLOAT_DATATYPE: float data type as defined in XML Schema Datatypes.
DOUBLE_DATATYPE: double data type as defined in XML Schema Datatypes.
DECIMAL_DATATYPE: decimal data type as defined in XML Schema Datatypes.
HEXBINARY_DATATYPE: hexbinary data type as defined in XML Schema Datatypes.
BASE64BINARY_DATATYPE: base64binary data type as defined in XML Schema Datatypes.
ANYURI_DATATYPE: uri reference data type as defined in XML Schema Datatypes.
  Note: @@uriReference is no longer part of the XML Schema PR draft.
QNAME_DATATYPE: XML qualified name data type as defined in XML Schema Datatypes.
DURATION_DATATYPE: duration data type as defined in XML Schema Datatypes.
DATETIME_DATATYPE: datetime data type as defined in XML Schema Datatypes.
DATE_DATATYPE: date data type as defined in XML Schema Datatypes.
TIME_DATATYPE: time data type as defined in XML Schema Datatypes.
YEARMONTH_DATATYPE: yearmonth data type as defined in XML Schema Datatypes.
YEAR_DATATYPE: year data type as defined in XML Schema Datatypes.
MONTHDAY_DATATYPE: monthday data type as defined in XML Schema Datatypes.
DAY_DATATYPE: day data type as defined in XML Schema Datatypes.
MONTH_DATATYPE: month data type as defined in XML Schema Datatypes.
NOTATION_DATATYPE: NOTATION data type as defined in XML Schema Datatypes.
The element name along with the content specification in the
context of a CMNode.
interface CMElementDeclaration : CMNode {
  attribute CMDataType elementType;
  readonly attribute boolean isPCDataOnly;
  attribute DOMString tagName;
  int getContentType();
  CMChildren getCMChildren();
  CMNamedNodeMap getCMAttributes();
  CMNamedNodeMap getCMGrandChildren();
};
elementType of type CMDataType
isPCDataOnly of type boolean, readonly
tagName of type DOMString
getCMAttributes
  CMNamedNodeMap containing CMAttributeDeclarations for all the attributes that can appear on this type of element.
  Attributes list for this
getCMChildren
  Content model of element.
getCMGrandChildren
  CMNamedNodeMap containing CMElementDeclarations for all the Elements that can appear as children of this type of element. Note that which ones can actually appear, and in what order, is defined by the CMChildren.
  Children list for this
getContentType
  Content type constant.
The content model of a declared element.
interface CMChildren : CMNode {
  const unsigned long UNBOUNDED = MAX_LONG;
  const unsigned short NONE = 0;
  const unsigned short SEQUENCE = 1;
  const unsigned short CHOICE = 2;
  attribute unsigned short listOperator;
  attribute unsigned long minOccurs;
  attribute unsigned long maxOccurs;
  attribute CMNodeList subModels;
  CMNode removeCMNode(in unsigned long nodeIndex);
  int insertCMNode(in unsigned long nodeIndex, in CMNode newNode);
  int appendCMNode(in CMNode newNode);
};
  subModels. This is usually the case where the subModels contain a single element declaration.
listOperator of type unsigned short
  subModels. For example, if the list operator is CHOICE and the components in subModels are a, b and c then the content model for the element being declared is (a|b|c)
maxOccurs of type unsigned long
minOccurs of type unsigned long
subModels of type CMNodeList
  CMNodes in which the element can be defined.
appendCMNode
  subModels.
  newNode of type CMNode
  the length of the
insertCMNode
  nodeIndex of type unsigned long
  newNode of type CMNode
  The index value at which it is inserted. If the nodeIndex is outside the bound of the
removeCMNode
  nodeIndex of type unsigned long
  The node removed is returned as a result of this method call. The method returns
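The relationship between listOperator, subModels, and the occurrence bounds can be made concrete with a small sketch that renders a model in DTD syntax, reproducing the (a|b|c) example given for listOperator. The Children class, the render function, and the use of None for UNBOUNDED are hypothetical simplifications; only the operator values 0, 1, and 2 are taken from the IDL.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union

NONE, SEQUENCE, CHOICE = 0, 1, 2   # listOperator values from the IDL
UNBOUNDED = None                   # stand-in for CMChildren.UNBOUNDED


@dataclass
class Children:
    """Hypothetical, simplified counterpart of CMChildren."""
    list_operator: int
    sub_models: List[Union[str, "Children"]] = field(default_factory=list)
    min_occurs: int = 1
    max_occurs: Optional[int] = 1


# Occurrence ranges that DTD syntax can express, and their suffixes.
_SUFFIX = {(1, 1): "", (0, 1): "?", (0, UNBOUNDED): "*", (1, UNBOUNDED): "+"}


def render(model):
    """Render an element name or a Children tree as a DTD content model."""
    if isinstance(model, str):
        return model
    separator = "|" if model.list_operator == CHOICE else ","
    body = separator.join(render(sub) for sub in model.sub_models)
    try:
        suffix = _SUFFIX[(model.min_occurs, model.max_occurs)]
    except KeyError:
        raise ValueError("occurrence range has no DTD suffix") from None
    return f"({body}){suffix}"


print(render(Children(CHOICE, ["a", "b", "c"])))
```

Occurrence ranges outside the four that DTDs can express (for example 2 to 5) raise an error here, whereas a schema-oriented serializer could represent them directly.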
An attribute declaration in the context of a CMNode.
interface CMAttributeDeclaration : CMNode {
  const short NO_VALUE_CONSTRAINT = 0;
  const short DEFAULT_VALUE_CONSTRAINT = 1;
  const short FIXED_VALUE_CONSTRAINT = 2;
  attribute DOMString attrName;
  attribute CMDataType attrType;
  attribute DOMString attributeValue;
  attribute DOMString enumAttr;
  attribute CMNodeList ownerElement;
  attribute short constraintType;
};
attrName of type DOMString
attrType of type CMDataType
attributeValue of type DOMString
constraintType of type short
enumAttr of type DOMString
ownerElement of type CMNodeList
Models a general entity declaration in a content model.
interface CMEntityDeclaration : CMNode {
  const short INTERNAL_ENTITY = 1;
  const short EXTERNAL_ENTITY = 2;
  attribute short entityType;
  attribute DOMString entityName;
  attribute DOMString entityValue;
  attribute DOMString systemId;
  attribute DOMString publicId;
  attribute DOMString notationName;
};
entityName of type DOMString
entityType of type short
entityValue of type DOMString
  null.
notationName of type DOMString
  null.
publicId of type DOMString
  null.
systemId of type DOMString
  null.
This interface represents a notation declaration.
interface CMNotationDeclaration : CMNode {
  attribute DOMString notationName;
  attribute DOMString systemId;
  attribute DOMString publicId;
};
notationName of type DOMString
publicId of type DOMString
systemId of type DOMString
This section contains "Validation and Other" methods common to
both the document-editing and CM-editing worlds (includes Document,
DOMImplementation, and DOMErrorHandler methods).
The setErrorHandler method is defined on the Document interface.
interface Document {
  void setErrorHandler(in DOMErrorHandler handler);
};
setErrorHandler
  handler of type DOMErrorHandler
This interface extends the Document interface with additional
methods for both document and CM editing.
interface DocumentCM : Document {
  const short WF_CHECK = 1;
  const short NS_WF_CHECK = 2;
  const short PARTIAL_VALIDITY_CHECK = 3;
  const short STRICT_VALIDITY_CHECK = 4;
  attribute boolean continuousValidityChecking;
  attribute short wfValidityCheckLevel;
  int numCMs();
  CMModel getInternalCM();
  CMNodeList getCMs();
  CMModel getActiveCM();
  void addCM(in CMModel cm);
  void removeCM(in CMModel cm);
  boolean activateCM(in CMModel cm);
};
continuousValidityChecking of type boolean
wfValidityCheckLevel of type short
  isValid method.
activateCM
  CMModel active. Note that if a user wants to activate one CM to get default attribute values and then activate another to do validation, the user can do that; however, only one CM is active at a time. In the case where an attribute is declared in an internal subset and the corresponding ownerElement points to a CMElementDeclaration defined in an external subset, changing the active CM will cause the ownerElement to be re-computed. If the owner element is not defined in the newly active CM, the ownerElement will be an empty node list.
  cm of type CMModel
    CMModel points to a list of CMExternalModels; with this call, only the specified CM will be active.
  True if the
addCM
  CMModel with a document. Can be invoked multiple times to result in a list of CMExternalModels. Note, however, that only one internal CMModel is associated with the document, and that only one of the possible list of CMExternalModels is active at any one time.
  cm of type CMModel
getActiveCM
  CMExternalModel for a document.
getCMs
  CMNodes of type CM_EXTERNALMODEL associated with the document. This list arises when addCM() is invoked.
  A list of
getInternalCM
numCMs
  CMExternalModels associated with the document. Only one CMModel can be associated with the document, but it may point to a list of CMExternalModels.
  Non-negative number of external CM objects.
removeCM
  CMExternalModel. Can be invoked multiple times to remove a number of these in the list of CMExternalModels.
  cm of type CMModel
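The "many external models, at most one active" rule described for addCM, activateCM, and removeCM can be sketched as follows. SketchDocumentCM and its method names are hypothetical. Two behaviours shown here are assumptions rather than statements of the draft: activate_cm returns False for a model that was never added, and removing the active model leaves no model active.

```python
class SketchDocumentCM:
    """Hypothetical bookkeeping for the external models of one document."""

    def __init__(self):
        self._external = []   # list built up by add_cm()
        self._active = None   # at most one external model is active

    def add_cm(self, cm):
        if cm not in self._external:
            self._external.append(cm)

    def remove_cm(self, cm):
        if cm in self._external:
            self._external.remove(cm)
        if self._active is cm:
            self._active = None   # assumption: nothing stays active

    def activate_cm(self, cm):
        if cm not in self._external:
            return False          # assumption: unknown models are refused
        self._active = cm         # replaces any previously active model
        return True

    def get_active_cm(self):
        return self._active

    def num_cms(self):
        return len(self._external)


doc = SketchDocumentCM()
defaults_cm, validation_cm = object(), object()
doc.add_cm(defaults_cm)
doc.add_cm(validation_cm)
doc.activate_cm(defaults_cm)     # e.g. to pick up default attribute values
doc.activate_cm(validation_cm)   # then switch models for validation
```

The last two calls mirror the scenario in the activateCM description: one model is activated to obtain defaults and another is activated afterwards for validation, with only one active at any time.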
This interface extends the DOMImplementation interface with
additional methods.
interface DOMImplementationCM : DOMImplementation {
  CMModel createCM();
  CMExternalModel createExternalCM();
};
createCM
  A NULL return indicates failure.
createExternalCM
  A NULL return indicates failure.
This section contains "Document-editing" methods (includes Node,
Element, Text and Document methods).
This interface extends the Node interface with additional methods
for guided document editing.
interface NodeCM : Node {
  boolean canInsertBefore(in Node newChild, in Node refChild)
                          raises(DOMException);
  boolean canRemoveChild(in Node oldChild)
                         raises(DOMException);
  boolean canReplaceChild(in Node newChild, in Node oldChild)
                          raises(DOMException);
  boolean canAppendChild(in Node newChild)
                         raises(DOMException);
  boolean isValid()
                  raises(DOMException);
};
canAppendChild
  AppendChild.
  newChild of type Node
    Node to be appended.
  Success or failure.
  DOMException.
canInsertBefore
  Node::InsertBefore operation would make this document invalid with respect to the currently active CM. ISSUE: Describe "valid" when referring to partially completed documents.
  newChild of type Node
    Node to be inserted.
  refChild of type Node
    Node.
  A boolean that is true if the
  DOMException.
canRemoveChild
  RemoveChild.
  oldChild of type Node
    Node to be removed.
  Success or failure.
  DOMException.
canReplaceChild
  ReplaceChild.
  newChild of type Node
    Node.
  oldChild of type Node
    Node to be replaced.
  Success or failure.
  DOMException.
isValid
  True if the node is valid/well-formed in the current context and check level defined by wfValidityCheckLevel, false if not.
  NO_CM_AVAILABLE: Exception is raised if the DocumentCM related to this node does not have any activeCM and wfValidityCheckLevel is set to STRICT_VALIDITY_CHECK.
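One way to picture a canAppendChild-style query is to test whether the child sequence that would result from the append still matches the element's content model. The sketch below does this by translating a DTD-style children model into a regular expression. It is an illustration under stated assumptions, not the draft's algorithm: compile_children_model and can_append_child are hypothetical names, only element-name children are handled (no #PCDATA, EMPTY, or ANY), and the check is strict validity of the complete sequence.

```python
import re

_NAME = re.compile(r"[A-Za-z_][\w.-]*")


def compile_children_model(model):
    """Turn a DTD-style children model such as "(title,para+)" into a regex
    over a string of child names, each name followed by one space."""
    model = re.sub(r"\s+", "", model)
    pattern = _NAME.sub(
        lambda m: "(?:" + re.escape(m.group(0)) + " )", model)
    # "," means "followed by", which is plain concatenation in a regex.
    return re.compile(pattern.replace(",", ""))


def can_append_child(model, child_names, new_child):
    """True if appending new_child leaves the children strictly valid."""
    sequence = "".join(name + " " for name in [*child_names, new_child])
    return compile_children_model(model).fullmatch(sequence) is not None


print(can_append_child("(title,para+)", ["title"], "para"))
print(can_append_child("(title,para+)", [], "title"))
```

The second query is rejected even though appending title is the natural first step in building the element, because a lone title does not yet satisfy para+. This is the situation the ISSUE on "valid" for partially completed documents refers to.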
This interface extends the Element interface with additional
methods for guided document editing.
interface ElementCM : Element,NodeCM {
  int contentType();
  CMElementDeclaration getElementDeclaration()
                                             raises(DOMException);
  boolean canSetAttribute(in DOMString attrname, in DOMString attrval);
  boolean canSetAttributeNode(in Node node);
  boolean canSetAttributeNodeNS(in Node node);
  boolean canSetAttributeNS(in DOMString attrname,
                            in DOMString attrval,
                            in DOMString namespaceURI,
                            in DOMString localName);
  boolean canRemoveAttribute(in DOMString attrname);
  boolean canRemoveAttributeNS(in DOMString attrname,
                               inout DOMString namespaceURI);
  boolean canRemoveAttributeNode(in Node node);
};
canRemoveAttribute
  attrname of type DOMString
  true or false.
canRemoveAttributeNS
  attrname of type DOMString
  namespaceURI of type DOMString
  true or false.
canRemoveAttributeNode
  node of type Node
    Attr node to remove from the attribute list.
  true or false.
canSetAttribute
  attrname of type DOMString
  attrval of type DOMString
  true or false.
canSetAttributeNS
  setAttributeNS.
  attrname of type DOMString
  attrval of type DOMString
  namespaceURI of type DOMString
    namespaceURI of namespace.
  localName of type DOMString
    localName of namespace.
  Success or failure.
canSetAttributeNode
  node of type Node
    Node in which the attribute can possibly be set.
  Success or failure.
canSetAttributeNodeNS
  node of type Node
    Attr to be added to the attribute list.
  Success or failure.
contentType
  Constant for mixed, empty, any, etc.
getElementDeclaration
  CMElementDeclaration object
  If no DTD is present raises this exception
This interface extends the CharacterData interface with additional
methods for document editing.
interface CharacterDataCM : Text,NodeCM {
  boolean isWhitespaceOnly();
  boolean canSetData(in unsigned long offset, in DOMString arg)
                     raises(DOMException);
  boolean canAppendData(in DOMString arg)
                        raises(DOMException);
  boolean canReplaceData(in unsigned long offset,
                         in unsigned long count,
                         in DOMString arg)
                         raises(DOMException);
  boolean canInsertData(in unsigned long offset, in DOMString arg)
                        raises(DOMException);
  boolean canDeleteData(in unsigned long offset, in DOMString arg)
                        raises(DOMException);
};
canAppendData
  arg of type DOMString
  Success or failure.
  DOMException.
canDeleteData
  offset of type unsigned long
  arg of type DOMString
  Success or failure.
  DOMException.
canInsertData
  offset of type unsigned long
  arg of type DOMString
  Success or failure.
  DOMException.
canReplaceData
  offset of type unsigned long
  count of type unsigned long
  arg of type DOMString
  Success or failure.
  DOMException.
canSetData
  offset of type unsigned long
  arg of type DOMString
  Success or failure.
  DOMException.
isWhitespaceOnly
  True if content only whitespace; false for non-whitespace if it is a text node in element content.
This interface extends the DocumentType interface with additional
methods for document editing.
interface DocumentTypeCM : DocumentType,NodeCM {
  boolean isElementDefined(in DOMString elemTypeName);
  boolean isElementDefinedNS(in DOMString elemTypeName,
                             in DOMString namespaceURI,
                             in DOMString localName);
  boolean isAttributeDefined(in DOMString elemTypeName,
                             in DOMString attrName);
  boolean isAttributeDefinedNS(in DOMString elemTypeName,
                               in DOMString attrName,
                               in DOMString namespaceURI,
                               in DOMString localName);
  boolean isEntityDefined(in DOMString entName);
};
isAttributeDefined
  elemTypeName of type DOMString
  attrName of type DOMString
  Success or failure.
isAttributeDefinedNS
  elemTypeName of type DOMString
  attrName of type DOMString
  namespaceURI of type DOMString
    namespaceURI of namespace.
  localName of type DOMString
    localName of namespace.
  Success or failure.
isElementDefined
  elemTypeName of type DOMString
  Success or failure.
isElementDefinedNS
  elemTypeName of type DOMString
  namespaceURI of type DOMString
    namespaceURI of namespace.
  localName of type DOMString
    localName of namespace.
  Success or failure.
isEntityDefined
  entName of type DOMString
  Success or failure.
This interface extends Attr to provide guided editing of an XML
document.
interface AttributeCM : Attr,NodeCM {
  CMAttributeDeclaration getAttributeDeclaration();
  CMNotationDeclaration getNotation()
                                    raises(DOMException);
};
getAttributeDeclaration
  The attribute declaration corresponding to this attribute
getNotation
  Returns the notation declaration for this attribute if the type is of notation type, null otherwise.
  DOMException
This section contains DOM error handling interfaces.
Basic interface for DOM error handlers. If an application needs to
implement customized error handling for DOM such as CM or
Load/Save, it must implement this interface and then register an
instance using the setErrorHandler method. All errors and warnings
will then be reported through this interface. Application writers
can override the methods in a subclass to take user-specified
actions.
interface DOMErrorHandler {
  void warning(in DOMLocator where, in DOMString how, in DOMString why)
               raises(DOMSystemException);
  void fatalError(in DOMLocator where, in DOMString how, in DOMString why)
                  raises(DOMSystemException);
  void error(in DOMLocator where, in DOMString how, in DOMString why)
             raises(DOMSystemException);
};
error
  where of type DOMLocator
  how of type DOMString
  why of type DOMString
  A subclass of DOMException.
fatalError
  where of type DOMLocator
  how of type DOMString
  why of type DOMString
  A subclass of DOMException.
warning
  where of type DOMLocator
  how of type DOMString
  why of type DOMString
  A subclass of DOMException.
This interface provides document location information and is similar to a SAX locator object.
interface DOMLocator {
  int getColumnNumber();
  int getLineNumber();
  DOMString getPublicID();
  DOMString getSystemID();
  Node getNode();
};
getColumnNumber
  The column number, or -1 if none is available.
getLineNumber
  The line number, or -1 if none is available.
getNode
  The Node, or null if none is available.
getPublicID
  A string containing the public identifier, or null if none is available.
getSystemID
  A string containing the system identifier, or null if none is available.
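An application-side error handler of the kind described above might simply collect what is reported to it. The sketch below is hypothetical throughout: the class names, the snake_case method names, and the decision to raise from fatal_error are assumptions. The draft only says that each method may raise DOMSystemException and does not define how the how and why strings are to be interpreted, so they are stored unchanged.

```python
from dataclasses import dataclass
from typing import Optional


class DOMSystemException(Exception):
    """Hypothetical stand-in for the exception named in the IDL."""


@dataclass
class Locator:
    """Hypothetical counterpart of DOMLocator; -1 and None mean unknown."""
    line_number: int = -1
    column_number: int = -1
    system_id: Optional[str] = None
    public_id: Optional[str] = None


class CollectingErrorHandler:
    """Records every report; stops processing on a fatal error."""

    def __init__(self):
        self.reports = []

    def warning(self, where, how, why):
        self.reports.append(("warning", where, how, why))

    def error(self, where, how, why):
        self.reports.append(("error", where, how, why))

    def fatal_error(self, where, how, why):
        self.reports.append(("fatalError", where, how, why))
        # Assumption: a fatal error aborts the operation in progress.
        raise DOMSystemException(f"{how}: {why}")


handler = CollectingErrorHandler()
handler.warning(Locator(line_number=3, column_number=7), "how", "why")
```

Because CM and Load/Save are meant to share one reporting mechanism, a single handler instance like this could receive reports from both validation and parsing.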
Editing and generating a content model falls in the CM-editing world. The most obvious need in this area is for tools that author content models, either under user control (i.e., explicitly designed document types) or generated from other representations. The latter class includes transcoding tools, e.g., synthesizing an XML representation to match a database schema.
It's important to note here that a DTD's "internal subset" is part of the Content Model, yet is loaded, stored, and maintained as part of the individual document instance. This implies that even tools which do not want to let users change the definition of the Document Type may need to support editing operations upon this portion of the CM. It also means that our representation of the CM must be aware of where each portion of its content resides, so that when the serializer processes this document it can write out just the internal subset. A similar issue may arise with external parsed entities, or if schemas introduce the ability to reference other schemas. Finally, the internal-subset case suggests that we may want at least a two-level representation of content models, so a single DOM representation of a DTD can be shared among several documents, each potentially also having its own internal subset; it's possible that entity layering may be represented the same way.
The API for altering the content model may also be the CM's official interface with parsers. One of the ongoing problems in the DOM is that there is some information which must currently be created via completely undocumented mechanisms, which limits the ability to mix and match DOMs and parsers. Given that specialized DOMs are going to become more common (sub-classed, or wrappers around other kinds of storage, or optimized for specific tasks), we must avoid that situation and provide a "builder" API. Particular pairs of DOMs and parsers may bypass it, but it's required as a portability mechanism.
Note that several of these applications require that a CM be able to be created, loaded, and manipulated without/before being bound to a specific Document. A related issue is that we'd want to be able to share a single representation of a CM among several documents, both for storage efficiency and so that changes in the CM can quickly be tested by validating it against a set of known-good documents. Similarly, there is a known problem in DOM Level 2 where we assume that the DocumentType will be created before the Document, which is fine for newly-constructed documents but not a good match for the order in which an XML parser encounters this data; being able to "rebind" a Document to a new CM, after it has been created may be desirable.
As noted earlier, questions about whether one can alter the content of the CM via its syntax, via higher-level abstractions, or both, exist. It's also worth noting that many of the editing concepts from the Document tree still apply; users should probably be able to clone part of a CM, remove and re-insert parts, and so on.
In addition to using the content model to validate a document instance, applications would like to be able to use it to guide construction and editing of documents, which falls into the document-editing world. Examples of this sort of guided editing already exist, and are becoming more common. The necessary queries can be phrased in several ways, the most useful of which may be a combination of "what does the DTD allow me to insert here" and "if I insert this here, will the document still be valid". The former is better suited to presentation to humans via a user interface, and when taken together with sub-tree validation may subsume the latter.
It has been proposed that in addition to asking questions about specific parts of the content model, there should be a reasonable way to obtain a list of all the defined symbols of a given type (element, attribute, entity) independent of whether they're valid in a given location; that might be useful in building a list in a user-interface, which could then be updated to reflect which of these are relevant for the program's current state.
Remember that namespaces also weigh in on this issue. In the case of attributes, a "can-this-go-there" query may prompt a namespace-well-formedness check and warn you if you're about to conflict with or overwrite another attribute with the same namespaceURI/localName but a different prefix, or the same nodeName but a different namespaceURI.
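The two attribute conflicts just described can be expressed as a small check. This is only a sketch: attribute_conflicts is a hypothetical helper, and attributes are modelled as plain (prefix, localName, namespaceURI) tuples rather than DOM Attr nodes.

```python
def attribute_conflicts(existing, candidate):
    """Report namespace conflicts between a candidate attribute and the
    attributes already on an element.

    Each attribute is a (prefix, local_name, namespace_uri) tuple; prefix
    is None for an unprefixed attribute.
    """
    def node_name(prefix, local):
        return f"{prefix}:{local}" if prefix else local

    prefix, local, uri = candidate
    warnings = []
    for e_prefix, e_local, e_uri in existing:
        if (e_uri, e_local) == (uri, local) and e_prefix != prefix:
            warnings.append("same namespaceURI/localName, different prefix")
        elif (node_name(e_prefix, e_local) == node_name(prefix, local)
              and e_uri != uri):
            warnings.append("same nodeName, different namespaceURI")
    return warnings


existing = [("a", "href", "urn:x")]
print(attribute_conflicts(existing, ("b", "href", "urn:x")))
```

A guided editor could run such a check as part of a canSetAttributeNS-style query and surface the warning before the attribute is actually set.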
As mentioned above, we have to deal with the fact that the shortest distance between two valid documents may be through an invalid one. Users may want to know several levels of detail (all the possible children, those which would be valid given what precedes this point, those which would be valid given both preceding and following siblings). Also, once XML Schemas introduce context sensitive validity, we may have to consider the effect of children as well as the individual node being inserted.
The most obvious use for a content model (DTD or XML Schema or any Content Model) is to use it to validate that a given XML document is in fact a properly constructed instance of the document type described by this CM. This again falls into the document-editing world. The XML spec only discusses performing this test at the time the document is loaded into the "processor", which most of us have taken to mean that this check should be performed at parse time. But it is obviously desirable to be able to validate a document -- or selected subtrees -- again at other times. One such case would be validating an edited or newly constructed document before serializing it or otherwise passing it to other users. This issue also arises if the "internal subset" is altered -- or if the whole Content Model changes.
In the past, the DOM has allowed users to create invalid documents, and assumed the serializer would accept the task of detecting problems and announcing/repairing them when the document was written out in XML syntax... or that they would be checked for validity when read back in. We considered adding validity checks to the DOM's existing editing operations to prevent creation of invalid documents, but are currently inclined against this for several reasons. First, it would impose a significant amount of computational overhead on the DOM, which might be unnecessary in many situations, e.g., if the change is occurring in a context where we know the result will be valid. Second, "the shortest distance between two good documents may be through a bad document". Preventing a document from becoming temporarily invalid may impose a considerable amount of additional work on higher-level code and users. Hence our current plan is to continue to permit editing to produce invalid DOMs, but provide operations which permit a user to check the validity of a node on demand.
Note that validation includes checking that ID attributes are unique, and that IDREFs point to IDs which actually exist.
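The ID and IDREF checks can be sketched with Python's xml.dom.minidom. In a real implementation the content model would say which attributes have type ID or IDREF. Here the attribute names id and ref are hypothetical and are passed in explicitly, and check_ids is an illustrative helper rather than part of any DOM API.

```python
from xml.dom.minidom import parseString


def check_ids(document, id_attr="id", idref_attr="ref"):
    """Return a list of ID/IDREF problems found in the document.

    id_attr and idref_attr name the attributes assumed to be of type ID
    and IDREF; a content model would normally supply this information.
    """
    seen = set()
    references = []
    problems = []
    for element in document.getElementsByTagName("*"):
        if element.hasAttribute(id_attr):
            value = element.getAttribute(id_attr)
            if value in seen:
                problems.append(f"duplicate ID {value!r}")
            seen.add(value)
        if element.hasAttribute(idref_attr):
            references.append(element.getAttribute(idref_attr))
    # IDREFs are resolved only after all IDs are known, since an IDREF may
    # point forward in the document.
    problems.extend(f"IDREF {value!r} has no matching ID"
                    for value in references if value not in seen)
    return problems


doc = parseString('<r><a id="x"/><b id="x"/><c ref="y"/><d ref="x"/></r>')
print(check_ids(doc))
```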
XML defined the "well-formed" (WF) state for documents which are parsed without reference to their DTDs. Knowing that a document is well-formed may be useful by itself even when a DTD is available. For example, users may wish to deliberately save an invalid document, perhaps as a checkpoint before further editing. Hence, the CM feature will permit both full validity checking (see previous section) and "lightweight" WF checking, as requested by the caller, as well as processing entity declarations in the CM even if validation is not turned on. This falls within the document-editing world.
While the DOM inherently enforces some of XML's well-formedness conditions (proper nesting of elements, constraints on which children may be placed within each node), there are some checks that are not yet performed. These include:
In addition, Namespaces introduce their own concepts of well-formedness. Specifically:
namespaceNormalize operation, which would create the implied
declarations and reconcile conflicts in some reasonably
standardized manner. This may be a major undertaking, since some
DOMs may be using the namespace to direct subclassing of the nodes
or similar special treatment; as with the existing normalize
method, you may be left with a different-but-equivalent set of node
objects.
In the past, the DOM has allowed users to create documents which violate these rules, and assumed the serializer would accept the task of detecting problems and announcing/repairing them when the document was written out in XML syntax. We considered adding WF checks to the DOM's existing editing operations to prevent WF violations from arising, but are currently inclined against this for two reasons. First, it would impose a significant amount of computational overhead on the DOM, which might be unnecessary in many situations (for example, if the change is occurring in a context where we know the illegal characters have already been prevented from arising). Second, "the shortest distance between two good documents may be through a bad document" -- preventing a document from becoming temporarily ill-formed may impose a considerable amount of additional work on higher-level code and users. (Note possible issue for Serialization: In some applications, being able to save and reload marginally poorly-formed DOMs might be useful -- editor checkpoint files, for example.) Hence our current plan is to continue to permit editing to produce ill-formed DOMs, but provide operations which permit a user to check the well-formedness of a node on demand, and possibly provide some of the primitive (e.g., string-checking) functions directly.
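As an example of the kind of primitive string-checking function mentioned above, the sketch below tests a string against the Char production of XML 1.0 (#x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD, #x10000-#x10FFFF). The function name first_illegal_char and its return convention are hypothetical; the draft does not define such a function.

```python
import re

# Anything outside the XML 1.0 Char production is illegal in a document.
_ILLEGAL_CHAR = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]")


def first_illegal_char(text):
    """Return the index of the first character that XML 1.0 does not
    allow in a document, or -1 if every character is allowed."""
    match = _ILLEGAL_CHAR.search(text)
    return match.start() if match else -1


print(first_illegal_char("tab\tand newline\nare fine"))
print(first_illegal_char("bad\x00"))
```

A check like this is cheap enough to run on demand over text and attribute values before serialization, which fits the plan of permitting ill-formed intermediate states and verifying them only when asked.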