[sc34wg3] Canonicalization of embedded XML

Lars Marius Garshol larsga at ontopia.net
Fri Apr 21 04:18:44 EDT 2006

* Lars Heuer
> Your example
>       Foo <a d="e" b="c">bar</a>
> is a valid one for embedded XML?

Yes. (It's valid according to the RELAX-NG schema, and there's  
nothing in the prose prohibiting it.)

> The XTM reader has to search for the XML inside the character sequence
> "Foo <a d="e" b="c">bar</a>" and canonicalize it?

No. The XTM reader will be using an XML parser. That XML parser will  
parse this XML fragment in the same way as the rest of the document.  
The XTM reader then has to switch into "canonicalizing mode" inside  

Note, however, that this requirement is much less strong than it  
seems, since canonicalization is a procedure that only has two effects:

   (1) It preserves the namespace declarations in the source context  
       that might otherwise be lost.

   (2) It transforms the *syntactic expression* of the embedded XML.

What this means is that unless you are going to treat the embedded  
XML as a string you can ignore (2), and only preserve the namespace  
declarations. The downside is that for duplicate suppression  
(equality rules in TMDM etc) you actually do treat the embedded XML  
as a string. I wouldn't be too surprised if some Topic Maps software  
were to have a disclaimer on the box stating that it doesn't do this.
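
To illustrate why effect (2) matters for duplicate suppression, here is a minimal sketch in Python. It assumes the standard library's xml.etree.ElementTree.canonicalize (Python 3.8+), which implements C14N 2.0 and is not necessarily the canonicalization variant the specs call for; the helper name same_embedded_xml is hypothetical. It covers only a single-rooted fragment and does not address namespace declarations inherited from the enclosing document.

```python
from xml.etree.ElementTree import canonicalize


def same_embedded_xml(a: str, b: str) -> bool:
    """Compare two XML fragments by canonical form, not by raw string.

    Hypothetical helper for illustration; it uses Python's C14N 2.0
    implementation, which is an assumption and may differ from the
    canonicalization a Topic Maps spec requires.
    """
    return canonicalize(xml_data=a) == canonicalize(xml_data=b)


# Two serializations of the same element: attribute order, quote style
# and whitespace inside the start tag all differ.
first = '<a d="e" b="c">bar</a>'
second = "<a b='c'   d='e'>bar</a>"

raw_equal = first == second                         # False: the strings differ
canonical_equal = same_embedded_xml(first, second)  # True: same canonical form

print(canonicalize(xml_data=first))  # <a b="c" d="e">bar</a>
```

Software that compares the raw strings would treat these two values as distinct, which is exactly the case the duplicate suppression rules are meant to catch.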

> If I understood it correctly this would also be valid example for
> embedded XML:
>    <foo a="b">bla</foo>
>    <bar c="d">blub</bar>
> Is such thing c14n'able with standard conform XML canonicalizers since
> the root node is missing?


> BTW: The TMDM mentions xsd:anyType as datatype, the XTM mentions
> xsd:any as datatype.

Whoops. I'll look into this. Thanks!

Lars Marius Garshol, Ontopian               http://www.ontopia.net
+47 98 21 55 50                             http://www.garshol.priv.no
