[sc34wg3] Comments on CXTM (N0454)

11 Dec 2003 16:35:27 +0100

* Kal Ahmed
|
| I have been instructed by the WG to produce a CD and then make a
| call for implementations, so it seems that all great minds are
| thinking alike on this point :-)

Great! Sounds like exactly what we want. When do you think you can
have a CD ready?

| We discussed this and the general feeling was that it would be
| better to provide explicit values everywhere (including all the
| missing ones).  <snip>

OK. I have no strong feelings about this.

* Lars Marius Garshol
|
| Here we refer to ISO 10646 character codes instead of Unicode scalar
| values, which is what TMDM uses. I realize the Infoset uses this
| terminology, but using USVs is a) consistent with TMDM, b) more
| accurate, and c) lets us consistently reference Unicode instead of
| ISO 10646 everywhere.

* Kal Ahmed
|
| You will have to help me out here, because character sets are not my
| strong point. Are Unicode scalar values the same as 10646 character
| codes?

Nearly. As far as I know, an ISO 10646 "character code" is the same as
a Unicode "code point". A Unicode "scalar value" is narrower, however:
every code point has one except the surrogate code points (U+D800 to
U+DFFF), which are explicitly excluded. This *is* a detail.
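To make the distinction concrete, here is a minimal Python sketch. The
helper names (is_code_point, is_scalar_value, utf8_encodable) are
hypothetical and come from neither CXTM nor TMDM; the point is only that
the two sets differ by the surrogate range, and that for every other
character the number is identical under either name. It also shows that
a lone surrogate cannot be written out as UTF-8 at all.

```python
# Hypothetical helpers for illustration only; not part of CXTM or TMDM.

SURROGATES = range(0xD800, 0xE000)  # U+D800..U+DFFF
MAX_CODE_POINT = 0x10FFFF

def is_code_point(n: int) -> bool:
    """Any Unicode code point (roughly, an ISO 10646 character code)."""
    return 0 <= n <= MAX_CODE_POINT

def is_scalar_value(n: int) -> bool:
    """A code point that is not a surrogate."""
    return is_code_point(n) and n not in SURROGATES

def utf8_encodable(n: int) -> bool:
    """Python strings may hold lone surrogates, but UTF-8 rejects them."""
    try:
        chr(n).encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False

for n in (0x41, 0xE9, 0xD800, 0x10FFFF):
    print(f"U+{n:04X}: code point={is_code_point(n)}, "
          f"scalar value={is_scalar_value(n)}, "
          f"UTF-8 encodable={utf8_encodable(n)}")
```

For U+0041 or U+00E9 the code point and the scalar value are the same
number, which is why the choice of term does not change any serialised
output.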

| If not, won't this cause problems in the XML canonicalisation step?

Nope. The numbers are the same, so the result will be the same. It's
just an editorial issue of consistency across 13250, really.

| I want to avoid having to write a CXTM-specific XML canonicalisation
| algorithm but just define an XML Infoset and then say "serialise
| this using CXML".

Agreed.

-- 
Lars Marius Garshol, Ontopian         <URL: http://www.ontopia.net >
GSM: +47 98 21 55 50                  <URL: http://www.garshol.priv.no >