[sc34wg3] Canonical XTM: implementation report

Lars Marius Garshol sc34wg3@isotopicmaps.org
Thu, 22 Jan 2004 19:50:18 +0100

* Kal Ahmed
| Excellent! That makes you the winner of the first pint :)

Wheee! I hope the winners get real ale, as opposed to lager. :-)

| Yeah, the goal of CXTM should be string comparison, not
| human-readability. Although it would be useful to have some more
| human-readable formatting, I think that it's a minor issue really.

Yes and no. It does make it harder to work with CXTM if you have to
build a special viewer in order to be able to even look at the files.
I think in practice not being able to look directly at the files
(something you'll want to do often...) is going to be quite awkward.

* Lars Marius Garshol
| If anyone wants actual canonicalized documents with corresponding
| input I'll be happy to provide examples.
* Kal Ahmed
| That would be a Good Thing.

I can do a couple of examples and post them in a .zip.

* Lars Marius Garshol
|  - All the empty XML Infoset properties being specified throughout the
|    document makes the useful stuff drown in the dross and really makes
|    the document hard to read. I think it would be much easier to
|    review and implement this standard if we cut that out, since then
|    the substance would be visible rather than hidden.
* Kal Ahmed
| Yeah, it's a balance between readability and completeness. In the end
| the people round the table at Philadelphia voted for completeness.

One wonders if they've actually tried reading the document carefully. :)

| Let's see what comes out of the CD stage, but I take your comment on
| board.

| Ken and I have had problems with the HTML and PDF generated by the
| stylesheets and I got really busy and haven't got back to him yet on
| this. So there is no CD yet, but I'll try and get something back to
| Ken this week. It might make sense for me to edit in the stuff about
| relativizing addresses along with some of the points below before
| going out to CD, so there is some silver lining to this particular
| cloud.

* Lars Marius Garshol
|  - 3.3: TMDM already requires strings to be in NFC, so there's no need
|    to repeat it here. (It could go in as a note if someone feels it's
|    a useful clarification.)

* Kal Ahmed
| OK, there is no sense in repeating TMDM - though I think the note
| would be useful.

I've nothing against a note.

* Lars Marius Garshol
|  - 4.7: The RNC schema contradicts the order given here. The schema
|    has scope first, then type, while the text has it the other way
|    around.

* Kal Ahmed
| The text is right, this order was changed for consistency with other
| serialisation orders.

Good! ('Cause I followed the text. :)

* Lars Marius Garshol
|  - 4.10: Here we need to give more guidance on how to serialize
|    locators. I think we should stress that they should be
|    externalized, meaning that in URIs difficult characters should be
|    escaped etc. Referencing some relevant W3C document specifying this
|    would be good. (RFC 2396 is less clear than it could be on
|    precisely which characters *must* be escaped.)

* Kal Ahmed
| Is it necessary to escape the characters?

Actually, no, it isn't, and it's a lot easier not to do it, but then
you have to specify that the characters should *not* be escaped. Come
to think of it, it may be a better idea not to escape them.

| What is TMDM's position on this, I would have thought that the
| string value of the address property is a completely unescaped
| string. 

Yes. The TMDM is not a serialization format, though, so it's not
concerned about this kind of thing. I just automatically assumed that
since CXTM was writing stuff out we had to escape so that we could
read back in, but if we never read back in...

| After all, here we are not concerned with this address being usable
| as a URI or even with it conforming to the relevant RFCs, we just
| need a canonical string representation of the address. So I would
| have thought an unescaped URI string in NFC would do the trick.

It would. (And it will be in NFC by default, since that's a constraint
on the TMDM string type.)
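
To make the "unescaped URI string in NFC" idea concrete, here is a
minimal sketch in Python. It is only an illustration, not text from the
draft: the function name canonical_locator is made up, and it assumes
the incoming address may still carry %XX escapes (encoded as UTF-8) that
need undoing before the string is written out. If the TMDM address value
is already unescaped, the first step is simply a no-op.

```python
import unicodedata
from urllib.parse import unquote


def canonical_locator(address: str) -> str:
    """Hypothetical helper: return an unescaped, NFC-normalized address.

    Assumption: any %XX escapes in the input are UTF-8 encoded. The
    result is meant only for string comparison, not for use as a URI.
    """
    # Undo percent-escaping (unquote decodes as UTF-8 by default).
    unescaped = unquote(address)
    # Apply Unicode Normalization Form C, as TMDM requires for strings.
    return unicodedata.normalize("NFC", unescaped)


# Two spellings of the same address: one percent-escaped, one using a
# decomposed "e" + combining acute accent. Both map to the same string.
escaped = canonical_locator("http://example.org/caf%C3%A9")
decomposed = canonical_locator("http://example.org/cafe\u0301")
```

Note that unescaping like this also turns things like %2F back into "/",
which would matter if the result had to work as a URI. Since the output
here is only ever compared as a string, that should not be a problem.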

| [reified]
| I'm in two minds about this. On the one hand leaving it out does
| make the canonicalisation process simpler. On the other hand it
| means that there is no test that the [reified] property is set
| correctly. I think CXTM needs to be complete even if it makes it
| more onerous to implement.
| By the same logic, the topic.[roles] property must be canonicalised
| too.

Both of these are going to be real bastards to get right. I think you
are to some extent right that although these properties are strictly
speaking redundant it may be worthwhile to verify that implementations
have picked them up correctly. On the other hand, it's difficult to
see how someone could screw this up if they get the other stuff right.
I'm tempted to say that it's a goal to keep CXTM easy to implement in
order to encourage people to actually test their implementations. If
we raise the bar on this we make it harder for people to verify that
they are conformant.

| [4.13]
| Yeah, I guess that it's not really necessary to repeat 4.10 here - we
| should instead canonicalise the locator item that is the value of
| the [resource] property.

I'd prefer that, yes.

| I think that all the places that call out to canonicalising the
| [type] property say that it should only be done if the property
| value is not null. So I think we are covered.

You are indeed covered. My bad. (You could of course move the
condition out to the relevant section and shorten the text, but there
is no bug in the text as it stands.) 

Lars Marius Garshol, Ontopian         <URL: http://www.ontopia.net >
GSM: +47 98 21 55 50                  <URL: http://www.garshol.priv.no >