[sc34wg3] The interpretation of mnemonics

Sun, 27 Apr 2003 02:38:11 +0200

This posting is about the issue of mnemonics in HyTM. You should
only read it if you are interested in using or supporting HyTM,
if you are an editor of ISO 13250, or if you care deeply about
the Topic Maps standard as a whole. It is intended as background
to the discussion on HyTime at the WG3 meeting on Friday May 2nd.
Page references are to ISO 13250:2000 at

   http://www.y12.doe.gov/sgml/sc34/document/0129.pdf

1. What are mnemonics?
----------------------

"Mnemonics" is the collective name I am using for a set of
attributes on many element types in the HyTM DTD, as follows:

   Element type       Mnemonic   Page  (a)  (b)  (c)
   -------------------------------------------------
   topic              linktype    12    N    Y    N
   occurs             occrl       17    Y    Y    Y
   assoc              linktype    19    Y    Y    Y
   assocrl            anchrole    20    Y    Y    Y
   facet              linktype    23    Y    Y    Y
   fvalue             facetval    24    N    Y   (Y)

These attributes do not have corresponding constructs in either
XTM or the current version of the SAM. The issue is therefore
how they should be handled when deserializing a HyTM topic map
document.

In fact, the mnemonics I have shown for <topic> and <fvalue> are
not explicitly termed "mnemonics" in 13250, although all the
others are. (This is the significance of column (a).) However, I
believe all these attributes share the same general properties,
serve the same purpose, and should be treated in the same way.

2. Mnemonics and generic identifiers
------------------------------------

One thing they all have in common is that they default to the
generic identifier of the element on which they are specified.
(This is the significance of column (b).) For those not familiar
with SGML architectures, this requires a short explanation:

The HyTM DTD is an architectural DTD. This means that it is
designed to be used as a basis from which other DTDs can be
derived according to specific rules described in the HyTime
standard. One of the things you can do in a derived (or client)
DTD is change the generic identifiers. For example you could
have an element type called <language> that is derived from the
<topic> element type (which is more properly called an
architectural form). When architectural processing is performed
on an element of this type, it turns into a <topic> element. The
client DTD can specify that the original GI ('language') becomes
the value of an attribute during architectural processing.

For example, you might have the following element in a HyTM
document that conforms to a derived DTD:

   <language>
     <name>Japanese</name>
   </language>

Since 13250 specifies that 'linktype' defaults to the GI, the
client DTD would have to be written such that this would be
interpreted as the following:

   <topic linktype="language">
     <basename>Japanese</basename>
   </topic>

The same thing can be done with all the other element types in
the list above. The advantage of this facility, which I believe
Martin Bryan has exploited quite extensively, is that it enables
information architects to craft conforming topic map DTDs that
are much more finely attuned to the needs of the information and
the users than the base DTD given in the standard.
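
A minimal sketch, in Python, of the GI-to-mnemonic defaulting described
in this section. The MNEMONIC_ATTR table is taken from the list in
section 1; the DERIVED_FROM mapping and the normalize() function are
hypothetical stand-ins for what a client DTD and a real architectural
engine would do, not part of HyTime or 13250.

```python
# Mnemonic attribute for each architectural form (from the table in section 1).
MNEMONIC_ATTR = {
    "topic": "linktype",
    "occurs": "occrl",
    "assoc": "linktype",
    "assocrl": "anchrole",
    "facet": "linktype",
    "fvalue": "facetval",
}

# Hypothetical client DTD: element type -> architectural form it derives from.
DERIVED_FROM = {"language": "topic"}


def normalize(gi, attrs):
    """Return (architectural GI, attributes) for one element.

    If the mnemonic attribute is not specified explicitly, it defaults
    to the element's original generic identifier.
    """
    form = DERIVED_FROM.get(gi, gi)
    result = dict(attrs)
    mnemonic = MNEMONIC_ATTR.get(form)
    if mnemonic is not None:
        result.setdefault(mnemonic, gi)
    return form, result
```

Under these assumptions, normalize("language", {}) yields the <topic>
form with linktype="language", as in the example above. An explicitly
specified mnemonic is left untouched.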

3. Mnemonics and the 'type' attribute
-------------------------------------

Another thing that most mnemonics have in common is that they
convey essentially the same kind of information as the 'type'
attribute on the corresponding element. This is stated
explicitly in 13250 for every mnemonic except those on <topic>
and <fvalue> elements. (This is the significance of column (c).)
For <topic> elements, 13250 explicitly states (Note 24, page 11)
that

   Neither the value of the linktype attribute nor the generic
   identifier of a topic link has any significance with respect
   to the topic mapping semantics defined by this International
   Standard.

I will return to this note later.

In the case of the <fvalue> mnemonic the text is less clear but
I believe the intention was that it too should convey the same
kind of information as the 'type' attribute.

By "convey the same kind of information" I mean that mnemonics
are used to express exactly the same semantics as the 'type'
attribute, but with less flexibility and representational power.
A number of notes with almost identical wording make this very
clear (see Notes 38, 40, 44, and 49). Note 38, on the <occurs>
element type, is typical:

   NOTE 38 The topic referenced via the type attribute can have
   many names in scopes designed for many different user
   contexts, including many different natural languages and
   delivery platforms, while the occrl attribute or generic
   identifier is just a single token. Therefore, the use of a
   topic, referenced by the type attribute, to characterize the
   occurrence role offers far more flexibility and
   representational power than the simple mnemonic naming
   facility offered by the occrl attribute or generic identifier.

What this note is essentially saying is that the 'occrl'
attribute provides a less powerful way of doing what the 'type'
attribute does. In other words:

   <occurs occrl="email">

is less powerful than

   <occurs type="email">

because the latter references a topic, which itself can have
characteristics, including multiple names in different
languages, thus allowing the type of the occurrence to be
expressed in more flexible ways.

In addition, the HyTM DTD contains a number of comments stating
that mnemonics will not be displayed to users if there exists a
suitable name for the topic referenced by the 'type' attribute.
So even if one were to use both, e.g.:

   <occurs occrl="email" type="email">

the 'occrl' attribute would be ignored for display purposes
whenever the topic with the ID "email" had a suitable name.
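
The display rule implied by those DTD comments could be sketched like
this in Python. The function name and its arguments are hypothetical,
and deciding which names count as "suitable" in a given context is
assumed to have been done by the caller.

```python
def display_label(mnemonic, type_topic_names):
    """Pick the label to show for an element's type.

    type_topic_names holds the names of the topic referenced by 'type'
    that are suitable in the current context. It may be empty, or None
    when the element has no 'type' attribute at all.
    """
    if type_topic_names:
        # A suitable topic name exists, so the mnemonic is not displayed.
        return type_topic_names[0]
    # Otherwise fall back to the mnemonic token.
    return mnemonic
```

The mnemonic is only ever a fallback here: it is shown when there is no
suitable topic name and hidden otherwise.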

On a historical note: The overlap between mnemonics and the
'type' attribute, and the greater flexibility and
representational power of the latter, were the reasons why
mnemonics were not included in XTM 1.0. I vividly recall one of
the editors saying at the TopicMaps.Org meeting in Dallas:
"Mnemonics? What are they? Oh yes, I remember. I've always hated
those things. Let's get rid of them. People should use topics
instead." I believe he was absolutely right and have never used
mnemonics myself.*

4. Inconsistencies with mnemonics
---------------------------------

Why are mnemonics treated differently for <topic> elements? I do
not know. There may be a good reason but it is nowhere stated.
It could conceivably have to do with the fact that <topic>
elements do not have 'type' attributes - they have 'types'
attributes (because topics, unlike all the other constructs, can
belong to multiple classes) - but I doubt it.

Most likely it is just another one of the inconsistencies and
lacunae 13250 is riddled with. (Those inconsistencies, by the
way, can mostly be traced back to the fact that 13250 was
standardized without a formal data model and in the absence of
sufficient implementation experience.) Perhaps it really was the
intention, despite Note 24, that the 'linktype' attribute should
convey the same kind of information as the 'types' attribute. In
that case, Note 24 should be regarded as a bug and removed.

If, on the other hand, Note 24 is to be taken at face value,
then anyone designing client DTDs with element types derived
from the <topic> architectural form (as in my <language>
example, above) has been wasting their time creating topic maps
whose semantics are not interchangeable.

A bug in 13250, or broken topic maps? Clearly, this issue needs
to be resolved, urgently.

So why are mnemonics treated differently for <fvalue> elements?
Again, I don't know. There could have been a Note 50, like the
notes on all the other mnemonic-bearing element types (except
for <topic>) but there isn't. Probably just another minor
inconsistency.

5. Mnemonics in the SAM
-----------------------

In the light of the preceding discussion, how should mnemonics
(and generic identifiers) be handled when deserializing to the
SAM?

In the case of generic identifiers, I think the answer is easy.

In order for a HyTM document to be interpreted correctly it must
first undergo architectural processing in conformance with the
HyTime standard. This results in a document that conforms to the
base DTD specified in 13250. At this point, the generic
identifiers have all been "normalized" and their tokens have
ended up as the values of the mnemonic attributes. So in
practice we can forget about the generic identifiers and just
focus on the mnemonics.

Since mnemonics convey the same kind of information as 'type'
attributes, they should be treated in a similar way. My proposal
is as follows:

1) When both mnemonic and 'type' attributes are present, the
mnemonic's value should become an additional base name in the
scope "http://psi.topicmaps.org/hytm/1.1/#mnemonic" on the topic
referenced by the 'type' attribute.

2) When only the mnemonic attribute is present, its value should
become the base name in the scope
"http://psi.topicmaps.org/hytm/1.1/#mnemonic" of a *new* topic.
That topic should be made the value of the [type] property of
the information item that corresponds to the element in question.

As an example of case 2):

   <occurs occrl="email">mailto:pepper@ontopia.net</occurs>

would become

   <topic id="xyz123">
     <baseName>
       <scope>
         <subjectIndicatorRef
           xlink:href="http://psi.topicmaps.org/hytm/1.1/#mnemonic"/>
       </scope>
       <baseNameString>email</baseNameString>
     </baseName>
   </topic>

   <occurrence>
     <instanceOf>
       <topicRef xlink:href="#xyz123"/>
     </instanceOf>
     <resourceRef xlink:href="mailto:pepper@ontopia.net"/>
   </occurrence>

Case 1) would be the same, except that there would already be a
topic with the ID "email" (due to the 'type' attribute) which
would get an additional base name identical to the one shown
above.
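
The two cases of the proposal can also be sketched procedurally. The
following Python is an illustration only: the Topic class, the
type_topic_for() function and the generated topic IDs are hypothetical
and are not taken from the SAM draft. The scope PSI is the one used in
the XTM example above.

```python
from dataclasses import dataclass, field
from itertools import count

# Scope PSI as used in the XTM example above.
MNEMONIC_SCOPE = "http://psi.topicmaps.org/hytm/1.1/#mnemonic"


@dataclass
class Topic:
    id: str
    base_names: list = field(default_factory=list)  # (scope, string) pairs


_ids = count(1)  # source of hypothetical IDs for newly created topics


def type_topic_for(topics, mnemonic=None, type_ref=None):
    """Return the topic that becomes the element's [type] value.

    topics maps topic IDs to Topic objects and is updated in place.
    """
    if type_ref is not None:
        # Case 1: reuse the topic referenced by the 'type' attribute.
        topic = topics.setdefault(type_ref, Topic(type_ref))
    elif mnemonic is not None:
        # Case 2: only the mnemonic is present, so create a new topic.
        topic = Topic(f"mnemonic-{next(_ids)}")
        topics[topic.id] = topic
    else:
        return None
    if mnemonic is not None:
        # In both cases the mnemonic becomes a base name in the mnemonic scope.
        topic.base_names.append((MNEMONIC_SCOPE, mnemonic))
    return topic
```

This sketch takes "new" literally and creates a fresh topic for every
mnemonic-only element. Whether elements with equal mnemonics should
share one topic is not settled by the proposal, so that choice is an
assumption.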

I believe this handles the mnemonic problem very cleanly and
entirely within the model of the current draft of the SAM. There
may be one or two tiny details that still need to be sorted out
(in addition to the lack of clarity regarding mnemonics on
<topic> conformant elements), but in principle I believe this
simple proposal is workable.

An alternative would be to throw out mnemonics altogether in a
revised and non-backward compatible HyTM 1.1. I imagine this
suggestion will drive Martin into a frenzy, but I know there are
others who would see this as the best solution, not just me and
the aforementioned editor in Dallas.

The argument, I think, is that mnemonics give us just two things
over and above what the 'type' attribute provides:

(1) the ability to create quick and dirty topic maps that don't
use topics when they should,* and

(2) a whole lot more overhead that we could really do without,
given the complexity of the model we already have.

I think those are two things we would be better off without,
but if Martin can present a convincing case for retaining
mnemonics (I doubt anyone else will even try :-), I'm willing to
do so based on the proposal above.

Steve

* If you want to say something, say it with a topic. That's my
   philosophy!