[sc34wg3] Editorial structure of N0396

Martin Bryan sc34wg3@isotopicmaps.org
Tue, 22 Apr 2003 14:41:37 +0100


Patrick wrote

> On a more technical issue, you might want to note that definition of
> String in the SAM:
>
> > String
> >
> >     Strings are sequences of abstract Unicode characters conforming to
> >     Unicode Normalization Form C [unicode]
> >     <http://www.isotopicmaps.org/sam/sam-model/#unicode>
> >
>
> While following the W3C for XML 1.1 (see details at:
> http://www.w3.org/TR/charmod/) does exclude (unless this is one of those
> optional things) other normalization forms that may be required in
> non-Web based topic map contexts. This may be of particular significance
> for systems using Chinese/Japanese texts in non-web based topic maps.

Coming from someone who, I seem to remember, criticised me for suggesting
that something other than the concrete abstract syntax should be applicable
in SGML, I find this somewhat rich ;-)

W3C have, after much arguing and many years of wrangling, finally got around
to agreeing a single preferred normalization form for Unicode within XML
documents, and Patrick wants us to allow topic map users to adopt an
alternative normalization scheme!!! This is supposed to make integration
of topic maps easier in some way. Two topic maps, both in XTM but using
different encodings, cannot be merged safely if they adopt different
normalization methods.
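
To make the merging problem concrete, here is a minimal sketch using Python's
standard unicodedata module. The two names and the same_name() helper are
made up for illustration; nothing here is taken from the SAM or XTM texts.

```python
import unicodedata

# Two hypothetical topic names that render identically but are stored in
# different normalization forms.
name_a = "caf\u00e9"    # precomposed e-acute (already in Form C)
name_b = "cafe\u0301"   # "e" followed by a combining acute accent (Form D)

def same_name(a, b, form="NFC"):
    """Compare two strings after bringing both to one normalization form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

# Compared code point by code point, the names differ, so a merge keyed on
# the raw strings would not treat them as the same name.
raw_equal = name_a == name_b

# Once both sides are normalized to Form C, they compare equal.
normalized_equal = same_name(name_a, name_b)
```

Whichever layer ends up carrying the requirement, the comparison only works
if both maps are brought to the same form before merging.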

Having said that, I do believe that the Form C requirement should not be part
of the SAM model, but should be part of the XTM serialization of the model. As HyTM
is based on SGML rather than XML we can expect user-defined character sets
to be defined as part of HyTM. We can, of course, agree to differ as to
whether or not topic maps based on different character sets need to be
normalized to conform to Form C before being interchanged/merged. This
subject should be added to one of the discussion lists for London, but I'd
hate to suggest which one.

Martin Bryan