[sc34wg3] How Two Syntaxes Can Make One Standard

Steven R. Newcomb sc34wg3@isotopicmaps.org
Tue, 24 Jul 2001 19:12:32 -0500

We had intended to file a Defect Report for 13250, but we are
submitting this paper to the committee for discussion, instead.  We're
hoping that the questions we raise in this paper will provoke
discussion that will provide guidance for the development of a full
Defect Report, or perhaps for some other kind of approach to the
fulfillment of the community's goals for 13250.

We feel that if we're going to file a Defect Report, we should do it
in the context of consensus that that's the right thing to do, and
that the approach that such a Defect Report (if any) will outline will
be the right way to do it.

We feel that, even though this is more of a discussion kickoff than a
formal agenda-driver, it's an important enough paper to deserve an
SC34 number.  (So, James Mason and/or Sara Hafele, can you please
number this paper?  Thanks.)

Michel and Steve

Michel Biezunski, InfoLoom
Tel +33 1 44 59 84 29 Cell +33 6 03 99 25 29
Email: mb@infoloom.com  Web: www.infoloom.com

Steven R. Newcomb, Consultant
Coolheads Consulting
voice: +1 972 359 8160
fax:   +1 972 359 0270
1527 Northaven Drive
Allen, Texas 75002-1648 USA


ISO/IEC 13250:2000: How Two Syntaxes Can Make One Standard

July 24, 2001

Michel Biezunski and Steven R. Newcomb


The ISO/IEC 13250:2000 "Topic Maps" International
Standard, which seems about to integrate a second
interchange syntax, the XTM DTD, does not explain to
what degree, and exactly how, the two syntaxes are
functionally equivalent.  The standard should explain

How to describe the semantic commonalities of the syntaxes?

One might think that there are two ways to formalize
the semantic commonalities of the two syntaxes:

  (1) Describe a rigorous syntactic transformation
      process that will show how instances of one
      syntax can be transformed into instances of the
      other syntax, or

  (2) Describe how instances of each syntax can be
      transformed into instances of the common
      underlying model (which could be, but need not
      be, a syntactic model), and describe how
      instances of the underlying model can be
      transformed into instances of each syntax.

The first approach might seem easier, at least
superficially.  However, if we select this solution, we
are focusing on just two syntaxes, instead of
recognizing the fact that information that has the
character of topic map information may be expressed in
many different notations.  It is highly desirable to be
able to federate all kinds of "finding information",
not just the finding information that happens to be
expressed in one of only two syntaxes.  For example, it
would be inappropriate to exclude instances of RDF or
NewsML from the possibility of being understood as
interchangeable topic map documents, with their
information becoming directly available to topic map
application software.  If we adopt the first approach,
RDF and NewsML instances would be only indirectly
available, by means of some sort of syntactic
transformation into the form of a syntactic topic map,
which would then, in turn, be parsable as a topic map
and made available to topic map applications.  The
extra overhead and inconvenience of this transformation
would be a barrier for RDF and NewsML instances.

Unlike the first approach, the second approach will be
applicable to any number of notations, although the ISO
13250 standard would only actually apply the approach
to the two syntaxes.  The second approach is more
ambitious in the sense that it requires that the
underlying foundational model be made explicit, and it
will make topic map applications far more ubiquitous
and omnivorous over the long term.
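
The second approach can be pictured as a hub with one
reader and one writer per notation.  The Python sketch
below is only an illustration under stated assumptions:
the TopicMapModel class, the two toy line-based
notations "A" and "B", and every function name are
hypothetical, invented for this example, and are not
taken from 13250, XTM, RDF, or NewsML.

```python
from dataclasses import dataclass, field


@dataclass
class TopicMapModel:
    """Hypothetical stand-in for the common underlying model."""
    # Maps a subject identifier to the set of names given to that subject.
    topics: dict = field(default_factory=dict)

    def add(self, subject, name):
        self.topics.setdefault(subject, set()).add(name)


# Toy notation "A" (invented here): one "subject=name" pair per line.
def read_a(text):
    model = TopicMapModel()
    for line in text.splitlines():
        if line.strip():
            subject, name = line.split("=", 1)
            model.add(subject.strip(), name.strip())
    return model


def write_a(model):
    return "\n".join(f"{s}={n}"
                     for s in sorted(model.topics)
                     for n in sorted(model.topics[s]))


# Toy notation "B" (invented here): one "name <- subject" pair per line.
def read_b(text):
    model = TopicMapModel()
    for line in text.splitlines():
        if line.strip():
            name, subject = line.split("<-", 1)
            model.add(subject.strip(), name.strip())
    return model


def write_b(model):
    return "\n".join(f"{n} <- {s}"
                     for s in sorted(model.topics)
                     for n in sorted(model.topics[s]))


# Supporting a further notation means adding one reader and one writer here.
READERS = {"A": read_a, "B": read_b}
WRITERS = {"A": write_a, "B": write_b}


def convert(text, source, target):
    """Convert by way of the common model; no pairwise transform is written."""
    return WRITERS[target](READERS[source](text))


def converters_needed(n):
    """Return (directed pairwise transforms, readers plus writers) for n notations."""
    return n * (n - 1), 2 * n
```

With only two notations the pairwise approach needs
fewer pieces (2 transformations against 4 readers and
writers), which is why it can seem easier; from four
notations onward the hub needs fewer (8 against 12),
and each new notation costs the same fixed amount.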

The difference between topic map syntax and topic map

The structure of the topic maps that are represented
for interchange in either the existing HyTime-based
syntax of 13250, or in the newly-contributed XTM
syntax, is *not* identical to the syntactic structures
of the documents used to interchange them.  Therefore,
neither 13250-based nor XTM-based topic map documents
are "ready-to-use" by application-specific logic.  In
other words, a syntactically represented topic map
doesn't reflect exactly what a topic map software
application would be expected to understand from it.
Before a topic map software application can be expected
to perform its application-specific functions, generic
processing must be performed -- the processing needed
in order to understand the topic map that an
interchangeable instance of that topic map is designed
to represent -- to make the topic map "ready-to-use".

From an economic standpoint, there are significant
advantages in using a distinct software module that
implements this generic processing, commonly called a
"topic map engine" or a "topic map parser".  We urge
that the term "topic map parsing" be reserved to mean
all of the aspects of "topic map processing" that are
required to be done by all topic map software that
takes, as input, interchangeable topic maps that
conform to either the HyTime-based or XTM-based
syntaxes.  We urge that the term "topic map processing"
be used generically, so that it can be used to refer to
any kind of processing, including both topic map
parsing (as just defined) and application-specific
processing of ready-to-use topic maps.

Four rules must be applied by all topic map parsers:

-- the subject-based merging rule
-- the name-based merging rule
-- the node-demander rule
-- the no-redundancy rule

These rules are already implicit in 13250.  We propose
that 13250 should emphasize their definitions and
explain their ramifications.  These explanations will
be invaluable to users of the standard who need to
create conventions for the understanding of instances
of various (both ISO and non-ISO) notations as sources
of topic map information.
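
To make the two merging rules concrete, here is a
minimal Python sketch.  This paper names the rules
without defining them, so the sketch assumes the
reading commonly associated with XTM 1.0: topics that
share a subject indicator are merged, and topics that
share a base name in the same scope are merged.  The
Topic class and the merge_topics function are
hypothetical names invented for this example.  The
node-demander and no-redundancy rules are not sketched,
because nothing in this paper says what they require.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Topic:
    ident: str
    # Assumed representation: strings identifying the topic's subject.
    subject_indicators: frozenset = frozenset()
    # Each name is a (base name, scope) pair; scope is a frozenset of themes.
    names: frozenset = frozenset()


def merge_topics(topics):
    """Return the groups of topic idents that the two rules would merge."""
    parent = {t.ident: t.ident for t in topics}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    seen = {}  # merge key -> ident of the first topic seen with that key
    for t in topics:
        # Subject-based merging: same subject indicator.
        keys = [("subject", s) for s in t.subject_indicators]
        # Name-based merging: same base name in the same scope.
        keys += [("name", n) for n in t.names]
        for key in keys:
            if key in seen:
                union(t.ident, seen[key])
            else:
                seen[key] = t.ident

    groups = {}
    for t in topics:
        groups.setdefault(find(t.ident), set()).add(t.ident)
    return sorted(groups.values(), key=lambda g: sorted(g))
```

Merging is transitive in this sketch: if one topic
shares a subject indicator with a second, and the
second shares a scoped name with a third, all three end
up in one group.  A real parser would also have to
combine the characteristics of the merged topics, which
is omitted here.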

We urge that 13250 should fully explain and constrain
the topic maps parsing process, but only to the extent
of describing the rules and goals of the parsing
process, and how these rules and goals are to be
applied in the case of each of the two syntaxes.  For
the Topic Maps software industry, this is the
least-constraining approach that is consistent with
13250's goal of facilitating universal and accurate
understanding of Topic Maps information.  This approach
allows software vendors to compete on the grounds of
product differentiation, without unduly increasing the
cost of merging disparate topic maps emanating from
multiple, differently-specialized software

Two Underlying Models Have Been Proposed

Two different underlying models, both expressed in
terms of how XTM instances should be understood by
topic map parsers, have been contributed to the
discussion.  Both deserve serious attention.

 - An "XML Infoset"-like model, called "A Topic Map
   Data Model", has been proposed by Lars Marius
   Garshol.

 - A "Processing Model for XTM 1.0" has been proposed
   by Michel Biezunski and Steven R. Newcomb.

The two proposals do not necessarily contradict each
other, and the advantages and drawbacks of each of them
should be studied.

The underlying model that will be adopted by ISO must
clarify how specific applications of Topic Maps can be
defined and identified.

The documents that are available for study include:

 - Lars Marius Garshol, "A Topic Map Data Model -- An
   infoset-based proposal",

 - Michel Biezunski and Steven R. Newcomb,
   "Topicmaps.net's Processing Model for XTM 1.0,
   version 1.0.1" [now sometimes called "PMTM4"],

   Other materials offer help in understanding PMTM4:

   - Biezunski/Newcomb, "The Structure of Topic Maps
     Foundations," http://www.topicmaps.net/struct.htm

   - Biezunski/Newcomb, "A Topic Maps Graph in XML",
     http://www.topicmaps.net/simpleTMGraph3.htm and

   - Biezunski/Newcomb, "An API to a Topic Maps Graphs
     in XML", http://www.topicmaps.net/TMGraphAPI3.htm
     and http://www.topicmaps.net/TMGraphAPI3.dtd

The decisions taken on these issues will influence the
work that needs to be done to complete the topic map
query language and the topic map constraint language,
both of which are currently in progress.

We encourage the members of the ISO working group WG3
to read these documents and to send questions and
comments to the newly created mailing list for
discussion.  (The subscription server is
http://www.isotopicmaps.org/mailman/listinfo/sc34wg3 )