[sc34wg3] Canonicalization of embedded XML

Lars Marius Garshol larsga at ontopia.net
Wed Mar 22 14:39:26 EST 2006


* Lars Heuer
>
> Requiring canonicalization makes implementing XTM writers more
> complex. Looking at the org.apache.xml.security.c14n package, I have
> to recognize that you need a lot of code to provide a correct XML
> canonicalizer.

It's true that canonicalizing XML is non-trivial, and it does add  
complexity to XTM readers. XTM exporters, however, are unaffected. It's  
perfectly OK to embed non-canonical XML in XTM; the requirement is that  
you must canonicalize it when you *read* the XTM.

In other words,

   <topic ...>
     ...
     <occurrence>
       <instanceOf>X</instanceOf>
       <resourceData>Foo <a b="c" d="e">bar</a></resourceData>
     </occurrence>
     <occurrence>
       <instanceOf>X</instanceOf>
       <resourceData>Foo <a d="e" b="c">bar</a></resourceData>
     </occurrence>
     ...
   </topic>

is a topic with a single occurrence of type X: the two occurrences  
differ only in the order of the attributes on <a>, and canonicalizing  
on reading removes that difference. It's still perfectly valid XTM,  
despite the embedded XML not being canonical.
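To make the reading-side rule concrete, here is a minimal sketch in Python (3.8 or later) that canonicalizes the two resourceData payloads from the example and compares them. It assumes Python's standard `xml.etree.ElementTree.canonicalize`, which implements W3C Canonical XML 2.0; whether XTM would reference that version or Canonical XML 1.0 is an open assumption here, but both sort attributes, which is the only difference in this example. The payloads are canonicalized together with their `resourceData` wrapper, because the bare content `Foo <a ...>bar</a>` is not a well-formed document on its own.

```python
from xml.etree.ElementTree import canonicalize

# The two resourceData payloads from the example; they differ only in
# the order of the attributes on <a>.
payload_1 = '<resourceData>Foo <a b="c" d="e">bar</a></resourceData>'
payload_2 = '<resourceData>Foo <a d="e" b="c">bar</a></resourceData>'

# canonicalize() returns the C14N 2.0 form as a string; among other
# things it puts attributes into a fixed (sorted) order.
c14n_1 = canonicalize(payload_1)
c14n_2 = canonicalize(payload_2)

# After canonicalization the payloads compare equal, so a reader treats
# the two occurrences as the same occurrence.
print(c14n_1 == c14n_2)  # True
```

The writer never has to call anything like this; only the reader does, which is why exporters stay simple.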

> If the c14n process is avoidable I'd like to avoid it.

Avoidable it certainly is, if we decide to omit that requirement.

> Reason 1) is IMO not very good, because duplicate suppression is
> not a task for XTM; it may be done internally, in an
> implementation-specific way.

What do you mean? We have equality rules for all item types...
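A reader that applies the equality rules can then suppress duplicates by keying occurrences on their canonical value. The sketch below is a simplification under stated assumptions: occurrences are modelled as hypothetical `(type, xml)` pairs, and real occurrence equality would also take scope, datatype, and the parent topic into account.

```python
from xml.etree.ElementTree import canonicalize

# Hypothetical parsed occurrences: (occurrence type, raw embedded XML).
# This representation is illustrative only, not an XTM API.
raw_occurrences = [
    ("X", '<resourceData>Foo <a b="c" d="e">bar</a></resourceData>'),
    ("X", '<resourceData>Foo <a d="e" b="c">bar</a></resourceData>'),
    ("Y", '<resourceData>Foo <a b="c" d="e">bar</a></resourceData>'),
]

def suppress_duplicates(occurrences):
    """Keep one occurrence per (type, canonical value), in input order."""
    seen = set()
    result = []
    for occ_type, xml in occurrences:
        # Equality is decided on the canonical form, not the raw text.
        key = (occ_type, canonicalize(xml))
        if key not in seen:
            seen.add(key)
            result.append(key)
    return result

merged = suppress_duplicates(raw_occurrences)
print(len(merged))  # 2: one occurrence of type X, one of type Y
```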

> Maybe the TM processor does duplicate suppression with XML
> C14N, but the creator / writer (either human or machine) of the XTM
> document shouldn't be forced to do the canonicalization.

He/she/it isn't, so that part is fine.

> Not forcing C14N also makes it easy for applications that are not TM
> processors (e.g. a PHP script or whatever) to spit out XTM.

Yep.

--
Lars Marius Garshol, Ontopian               http://www.ontopia.net
+47 98 21 55 50                             http://www.garshol.priv.no