[sc34wg3] Question on TNC / Montreal minutes

Marc de Graauw sc34wg3@isotopicmaps.org
Sat, 7 Sep 2002 00:17:01 +0200


* Steve Pepper
| To my mind, there are only two levels at which the TNC on/off switch
| can operate meaningfully:
|
| (1) At the level of the topic map as a whole.
| (2) At the level of individual scopes.

* Lars Marius Garshol
| Well, there is another level altogether: whether it is part of the
| core standard or up to individual applications.

* Steve Pepper
| Let's have Level Zero
|
| (0) At the level of the application.

This is "Let's ditch the TNC" phrased politely, isn't it? :-)

I think it would be a big mistake to leave the TNC up to applications. The TNC
supports some very generic behaviours that merit a place in the standard. As I
have said before, I do agree the TNC should be optional (and at level (2)).

I will use an example from my Business Maps article
(http://www.xml.com/pub/a/2002/08/21/topicmapb2b.html). There I needed topics
for data items in B2B vocabularies. The names of those data items will be
unique in the context of the containing B2B vocabulary. In my presentation in
Barcelona I had modelled this thus (data item 'CustomerName' in B2B vocabulary
'Bizwords!'):

<topic id="name">
  ...skipped...
  <subjectIdentity>
    <subjectIndicatorRef xlink:href="http://psi.bizwords.com#CustomerName"/>
  </subjectIdentity>
  <baseName>
    <scope>
      <topicRef xlink:href="#bizwords"/>
    </scope>
    <baseNameString>CustomerName</baseNameString>
  </baseName>
  ...skipped...
</topic>

I noted there are two ways in this topic to merge with another topic: merging
based on subjectIdentity and merging based on the TNC. As a former relational
database administrator I am completely allergic to unnecessary redundancy. It
will only lead to mistakes (one mechanism says merge, the other says don't)
and multiplication of unwarranted merges due to human mistakes. So one
mechanism had to go. The name is not a candidate: users will need the name to
navigate the Topic Map. So the subject identity is redundant: the name plus
the TNC already establishes subject identity. This applies not only to this
example, but to every case of a controlled vocabulary with unique names. Note
that in such vocabularies things like string matching are usually not going to
cause much trouble, since those vocabularies are well defined. There is one
big advantage names have over subject indicators: names are human readable,
subject indicators are not (by most humans).
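
To make the redundancy concrete, here is a minimal Python sketch of the two
merge tests. It assumes exact string comparison for names and treats a scope
as a set of topic ids. The class and function names, and the second subject
indicator (#CustName), are invented for illustration and are not part of any
Topic Maps API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class BaseName:
    string: str
    scope: frozenset  # ids of the scoping topics (assumed representation)


@dataclass
class Topic:
    topic_id: str
    subject_indicators: set = field(default_factory=set)
    base_names: set = field(default_factory=set)


def merge_by_identity(a: Topic, b: Topic) -> bool:
    # Merge when the topics share at least one subject indicator.
    return bool(a.subject_indicators & b.subject_indicators)


def merge_by_tnc(a: Topic, b: Topic) -> bool:
    # Merge when the topics share a base name string in the same scope.
    return bool(a.base_names & b.base_names)


bizwords = frozenset({"bizwords"})

# The topic from the example above.
a = Topic(
    "name",
    {"http://psi.bizwords.com#CustomerName"},
    {BaseName("CustomerName", bizwords)},
)

# Hypothetical second topic: same scoped name, mistyped subject indicator.
b = Topic(
    "cust",
    {"http://psi.bizwords.com#CustName"},
    {BaseName("CustomerName", bizwords)},
)

print(merge_by_identity(a, b))  # False
print(merge_by_tnc(a, b))       # True
```

With both mechanisms present, a single typo is enough to make them disagree,
which is exactly the kind of mistake that carrying only one of them avoids.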

There are two other arguments against subject identity in the case of a
controlled vocabulary:
1) The vocabulary is already there and can be used right away. PSI's have to
be made first, which is more effort.
2) Some realms do not lend themselves very well to defining PSI's because they
are too volatile. Social security numbers (privacy issues aside for the sake
of the argument) spring into existence every day. Do we really want to have
someone republish all those numbers as PSI's again? (We could of course use
PSI's which are not published on the WWW, but to me that sounds suspiciously
much like using names and disguising them as PSI's.)

Since the behaviour described is pretty common, I believe we have a very
strong case not to leave the TNC up to applications, but simply to make it
optional. The basic rules would then be:

If you have a neat, well defined, controlled vocabulary, use names and the
TNC.
If not, use subject identity.
Do not use both if you do not have to.

Lars has convinced me in a private email exchange, though, that the default
for the TNC should be "off":

* Lars Marius Garshol
| Names are not formal, and so may easily be
| spelt in different ways, have different punctuation, have different
| numbers of spaces etc.
|
| And most of them are not guaranteed to be unique within any scope...

Yes, most names are not unique in any scope. Since this is the general case
and well-defined controlled vocabularies are the exception (though still
common), the default should be "off". I do not think complete backward
compatibility justifies having all those new users creating Topic Maps with
non-unique names and ending up with a single gargantuan topic simply because
the default was "on".
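
A rough Python sketch of what a per-scope switch with default "off" could look
like. It is a simplification under stated assumptions: each topic carries one
name in one scope, names are compared as exact strings, and the function name,
the 'people' scope and the sample data are all hypothetical.

```python
from collections import defaultdict


def merge_groups(topics, tnc_scopes=frozenset()):
    """Group topic ids that would merge under the TNC.

    topics     -- iterable of (topic_id, name, scope) tuples
    tnc_scopes -- scopes in which the TNC is switched on (default: none)
    """
    groups = defaultdict(list)
    for topic_id, name, scope in topics:
        if scope in tnc_scopes:
            # TNC on for this scope: same name in same scope means same topic.
            key = ("tnc", scope, name)
        else:
            # TNC off: the name alone never causes a merge.
            key = ("id", topic_id)
        groups[key].append(topic_id)
    return sorted(groups.values())


topics = [
    ("t1", "CustomerName", "bizwords"),  # controlled vocabulary
    ("t2", "CustomerName", "bizwords"),
    ("t3", "John Smith", "people"),      # two different people, same name
    ("t4", "John Smith", "people"),
]

print(merge_groups(topics))
# [['t1'], ['t2'], ['t3'], ['t4']]
print(merge_groups(topics, {"bizwords"}))
# [['t1', 't2'], ['t3'], ['t4']]
print(merge_groups(topics, {"bizwords", "people"}))
# [['t1', 't2'], ['t3', 't4']]
```

The last call shows the unwanted outcome: with the TNC on everywhere, the two
unrelated "John Smith" topics collapse into one.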

Marc