[sc34wg3] TMQL - Unicode as "native character set"?

Patrick Durusau patrick at durusau.net
Wed Sep 2 15:18:08 EDT 2009


I am not sure what we mean by:

3.1 Relationships to other standards -- where we say in #2:

> The native character set of TMQL shall be Unicode

The reference is to Unicode 3.0, but I assume we mean the current version 
of Unicode. Yes?

Shouldn't we also specify an encoding to be supported? Like UTF-8?

Or, for that matter, I am not really sure what "native" means. Default? 
Subject to specifying some other specific encoding? I don't think we 
should limit the data recorded in topic maps to strictly being in UTF-8, 
considering that data we wish to view as a topic map may be in any 
number of "native" encodings.

As far as requirements go, my suggestion would be the XML character set in 
UTF-8 as a default, NFC as the base normalization, with the ability to 
declare other encodings, normalizations and collations.
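
To make the suggestion concrete, here is a minimal sketch in Python (not 
TMQL, and not proposed text for the draft). The function name load_text 
and its parameters are hypothetical, chosen only for illustration. It 
assumes UTF-8 and NFC as defaults and lets the caller declare a different 
encoding or normalization form.

```python
import unicodedata


def load_text(data: bytes, encoding: str = "utf-8",
              normalization: str = "NFC") -> str:
    """Decode bytes using a declared encoding, then normalize.

    UTF-8 and NFC are the assumed defaults; both can be overridden.
    """
    text = data.decode(encoding)
    return unicodedata.normalize(normalization, text)


# The same character arriving in two different "native" forms:
decomposed_utf8 = "e\u0301".encode("utf-8")   # 'e' + combining acute accent
precomposed_latin1 = "\u00e9".encode("latin-1")

# After decoding and NFC normalization the two strings compare equal.
assert load_text(decomposed_utf8) == load_text(precomposed_latin1,
                                               encoding="latin-1")
```

The point of the defaults is that two sources in different encodings can 
be compared without either one having been stored as UTF-8.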

That sets a baseline but also allows applications to compete on their 
support for other encodings, normalizations and collations.

Hope everyone is having a great day!


PS: I would suggest that under 4 Requirements for the Language, 4.1 
Functionality, where we say:

> TMQL shall support all natural languages equally. That is, TMQL shall 
> be fully internationalized with respect to text representation, text 
> ordering, etc.

we drop that as a requirement, or at least define what we mean in some 
meaningful way, such as I have suggested above: define the Unicode 
support and normalization required, and identify other normalizations 
and collations (a good use for PSIs).

Before anyone protests in favor of internationalization, remember that 
Unicode now includes Sumerian (listed as Sumero-Akkadian), and while I 
would welcome TMQL providing the ability to query strings written in its 
base-60 number system, I really don't want to see TMQL delayed until we 
define that capacity. As another example, Ugaritic is known to have a 
different "native" sort order but is recorded in Unicode using the 
modern Hebrew order for similar characters.
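
A small Python sketch of what a declared collation could look like, as 
opposed to sorting by code point. The helper collation_key and the 
three-letter alphabet "cba" are hypothetical and stand in for any script 
whose native order differs from its encoded order; nothing here 
reproduces the actual Ugaritic order.

```python
def collation_key(alphabet: str):
    """Return a sort key that follows a declared alphabet order.

    Characters not in the declared alphabet sort after all declared
    ones, in code point order.
    """
    rank = {ch: i for i, ch in enumerate(alphabet)}

    def key(word: str):
        return [rank.get(ch, len(rank) + ord(ch)) for ch in word]

    return key


words = ["ba", "ab", "ca"]

# Default sort: code point order.
by_code_point = sorted(words)

# Declared collation: a hypothetical alphabet in which c < b < a.
by_declared_order = sorted(words, key=collation_key("cba"))

assert by_code_point == ["ab", "ba", "ca"]
assert by_declared_order == ["ca", "ba", "ab"]
```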

Supporting all natural languages *equally* is an excellent ideal but I 
would prefer that we enable *others* to support the languages of their 

Patrick Durusau
patrick at durusau.net
Chair, V1 - US TAG to JTC 1/SC 34
Convener, JTC 1/SC 34/WG 3 (Topic Maps)
Editor, OpenDocument Format TC (OASIS), Project Editor ISO/IEC 26300
Co-Editor, ISO/IEC 13250-1, 13250-5 (Topic Maps)
