[sc34wg3] Line breaks in CXTM

Kal Ahmed sc34wg3@isotopicmaps.org
Sun, 28 Mar 2004 21:30:37 +0100

On Fri, 2004-03-26 at 22:53, Robert Barta wrote:
> On Fri, Mar 26, 2004 at 03:43:43PM +0100, Steve Pepper wrote:
> > Secondly, the most readily available diff tools (such as Unix
> > diff) won't work with such documents because they are line or
> > record oriented. Again this is a great inconvenience which I
> > think is unnecessary.
> xmldiff? Or piping the XML through xmllint --format is not an
> option?
> > I would like therefore to propose that the spec be amended to
> > include the insertion of suitable line breaks. Essentially
> > section 5 should specify the insertion of line feeds such
> > that canonicalization according to XML-C14N would result in a
> > document with line feeds after every end-tag, and also after
> > every start-tag for elements that have element content or are
> > empty.
> Steve,
> If I understand that correctly, I would strongly discourage that. We
> are trying to fix a problem in the area of __canonicalized XML__ in a
> TM context.  As XTM has no mixed content creating a baseline with
> _any_ formatting tool is always possible and is orthogonal to CXTM.

Actually, it's not really broken in CXML - the recommendation specifies a
normalisation for whitespace between elements, but does not remove it.

What does need to be considered (and I hope to do that between now and
our meeting in Amsterdam) is if/how this affects the building of the XML
infoset that is described by the CXTM standard. I suspect that it does
not have any effect at all, but I need to make sure.

The other issue is exactly what form of line break is to be inserted
(CR, CR/LF etc.) but that's just a matter of choosing one.

> The argument, that this is difficult to read for a human is IMHO no
> argument here.

That was my original argument from a purist point of view. Then I tried
to implement a canonicalizer and test it. Testing it became a major pain
because visually diffing output and expected output quickly gets
difficult with large files.

In principle I agree with the purist standpoint. In practice, I now see
the distinct advantage in making CXTM output more human-readable because
doing so gives you much more information than just PASS or FAIL - it
lets you (as a developer or tester) see exactly where your output failed
to match what was expected and to figure out why.
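
To illustrate the point, here is a rough Python sketch of the kind of
comparison I mean. It is not part of any CXTM draft: the function names
and the sample element names are made up for the example, and it assumes
canonical output with no comments or processing instructions, so that
"><" only ever occurs where one tag ends and the next begins.

```python
import difflib


def break_after_tags(xml_text):
    """Split canonical XML into lines by putting a line feed between adjacent tags.

    Assumption: "><" only appears at tag boundaries, so text content
    (for example "<a>text</a>") stays on a single line.
    """
    return xml_text.strip().replace("><", ">\n<").splitlines()


def diff_canonical(expected, actual):
    """Return a line-oriented unified diff of two canonical documents."""
    return list(
        difflib.unified_diff(
            break_after_tags(expected),
            break_after_tags(actual),
            fromfile="expected",
            tofile="actual",
            lineterm="",
        )
    )


if __name__ == "__main__":
    # Hypothetical single-line outputs that differ in one attribute value.
    expected = '<topicMap><topic id="a"></topic></topicMap>'
    actual = '<topicMap><topic id="b"></topic></topicMap>'
    for line in diff_canonical(expected, actual):
        print(line)
```

With the line breaks in place the diff points at the one start-tag that
differs, instead of reporting that a single very long line has changed.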


Kal Ahmed <kal@techquila.com>