[sc34wg3] How Two Syntaxes Can Make One Standard

27 Jul 2001 10:48:12 +0200

Here is my response to the discussion kick-off paper. It is not really
a paper, but I've given it headers to make it easier to read, since it
is so long. Hopefully this means another step forward.

  INTRODUCTION
==============

I'm not sure exactly how to interpret the July 24 paper, but I assume
it is about finding out what the defects of ISO 13250 are, as they
relate to the need for a foundational model, and how we can fix those.
If so, I very much appreciate the attempt to start a formal discussion
on this. If not, please correct me.

I very much support the idea that before we start creating a
foundational model we should have some consensus on what that model is
supposed to achieve. I am hoping that the Montréal meeting can make
some progress towards achieving such consensus.

I feel that the paper as presented belabours the obvious, but on the
other hand it is difficult to criticize that, given the amount of
controversy and confusion we've already had on this issue. In this
response I try to do the following, in this order: summarize how I
understand the July 24 paper, summarize what I feel is missing from
it, address some issues (as I see them) with the July 24 text, and
then summarize what it is I think we need to do now.

  SUMMARY OF THE JULY 24 PAPER
==============================

As I read the paper, it says that we need a foundational model in
order to meet the requirements listed below, and that the lack of such
a model in ISO 13250 is a defect.

 - The structure of topic map information, as represented by the
   various syntaxes, must be explicitly defined in a way that is
   independent of any particular syntax. That is, there must be a
   foundational model.

 - The process of building instances of the model from the XTM 1.0 and
   HyTime-based syntaxes must be clearly defined. This will also serve
   to integrate the two syntaxes into a single standard with a single
   meaning. 

 - The specification must be written in such a way that independent
   parties can define mappings from other syntaxes and data models
   into the foundational model.

 - The specification of the model and the model-building processes
   must not unduly constrain implementations.

These requirements I fully agree with and support. It may be that
there were other similar requirements in the paper, but if so I did
not realize that they were considered to be of similar importance, and
would like to have them pointed out. I may also have misunderstood
parts of the paper, in which case I would like to be corrected. 
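
To make the first two requirements concrete, here is a minimal sketch
in Python of what "one model, several syntaxes" could look like.
Everything in it is an assumption made for illustration: the Topic
class is a deliberately tiny stand-in for a real model, and the two
toy input formats are not XTM 1.0 or the HyTime-based syntax. The only
point is that two different representations, each with its own
building process, end up as the same model instance.

```python
from dataclasses import dataclass

# Hypothetical, deliberately tiny "model": a topic map is a set of topics,
# each with an identifier and a set of names. A real model would be richer.
@dataclass(frozen=True)
class Topic:
    ident: str
    names: frozenset

def build_from_lines(text):
    """Toy syntax A (an assumption, not XTM): one 'id: name1, name2' per line."""
    topics = set()
    for line in text.strip().splitlines():
        ident, _, names = line.partition(":")
        topics.add(Topic(ident.strip(),
                         frozenset(n.strip() for n in names.split(","))))
    return frozenset(topics)

def build_from_pairs(pairs):
    """Toy syntax B (also an assumption): a flat list of (id, name) pairs."""
    grouped = {}
    for ident, name in pairs:
        grouped.setdefault(ident, set()).add(name)
    return frozenset(Topic(i, frozenset(ns)) for i, ns in grouped.items())

a = build_from_lines("puccini: Puccini, Giacomo Puccini\ntosca: Tosca")
b = build_from_pairs([("tosca", "Tosca"),
                      ("puccini", "Giacomo Puccini"),
                      ("puccini", "Puccini")])

# Different syntaxes, different ordering, same model instance.
same_model = (a == b)
```

A third party wanting to map some other syntax or data model into the
model would, in this picture, only need to write one more building
function.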

  OMISSIONS OF THE JULY 24 PAPER
================================

While I think I agree with most of the July 24 paper, I don't feel
that it covers all the requirements we have for the data model. For
all I know, comprehensiveness may not have been part of the intent.
However, I feel that if we are to develop a foundational model we must
first know what we wish to achieve with it.

Here are the main things I feel we need a foundational model to do:

 - Provide a framework in which the semantics of topic maps can be
   clearly defined, independent of any particular syntax.

 - Rigorously define the requirements on implementations when building
   instances of the model from serialized topic maps or other data
   sources. The requirements need to be spelled out in exhaustive
   detail, to the point where it is possible to create a suite of
   conformance tests.

 - Rigorously define the requirements implementations must conform to
   when maintaining topic maps internally. This is necessary because
   large and important classes of topic map applications may never
   actually load their topic maps from somewhere else, and it is
   necessary to spell out what is and is not allowed when a topic map
   is maintained in a topic map implementation, whether by human
   authors or by software.

 - Provide a foundation in terms of which standards like TMQL and TMCL
   can be defined. This requirement is a must-have for both these
   efforts. As long as it is not met the standards can either make no
   progress, or they must specify their own models, and run the risk
   of diverging from ISO 13250.

 - Rigorously define how to serialize topic maps from the model and
   into the two standardized syntaxes. This is necessary in order to
   fully describe the appropriate transformation from one syntax to
   the other. Coupled with other "topic map building specifications"
   and corresponding deserialization specifications, it will also
   describe how to convert between any two representations of topic
   maps.

How we can meet these requirements (as well as those from the July 24
paper) is a discussion I hope we can defer until after we have agreed
on what it is we want to achieve.
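
As a rough illustration of the last requirement in the list above,
the following Python sketch defines conversion between representations
as "build the model, then serialize it". All names here (read_lines,
write_json, convert, and so on) are hypothetical, the model is just a
dictionary, and neither toy format is one of the standardized
syntaxes. It is meant as a sketch of the idea, not as a proposal for
how the specification should be written.

```python
import json

# Hypothetical model: a dict mapping topic id -> sorted list of names.
def read_lines(text):
    """Build the model from a toy 'id: name1, name2' line format."""
    model = {}
    for line in text.strip().splitlines():
        ident, _, names = line.partition(":")
        model[ident.strip()] = sorted(n.strip() for n in names.split(","))
    return model

def write_lines(model):
    """Serialize the model to the toy line format, in a fixed order."""
    return "\n".join(f"{i}: {', '.join(model[i])}" for i in sorted(model))

def read_json(text):
    """Build the model from a toy JSON format."""
    return {i: sorted(ns) for i, ns in json.loads(text).items()}

def write_json(model):
    """Serialize the model to the toy JSON format."""
    return json.dumps(model, sort_keys=True)

READERS = {"lines": read_lines, "json": read_json}
WRITERS = {"lines": write_lines, "json": write_json}

def convert(text, source, target):
    """Conversion is defined as: build the model, then serialize it."""
    return WRITERS[target](READERS[source](text))

original = "tosca: Tosca\npuccini: Puccini, Giacomo Puccini"
as_json = convert(original, "lines", "json")
round_trip = convert(as_json, "json", "lines")

# A conformance-style check: both documents must build the same model.
assert read_lines(round_trip) == read_lines(original)
```

With one building function and one serialization function per
representation, any pair of representations can be converted without
a dedicated converter for each pair, and a check like the final one is
the kind of thing a conformance test suite could be built from.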

  DETAILED RESPONSE TO THE JULY 24 PAPER
========================================

The first question to be asked is of course: what is a foundational
model? Is it a data model? A conceptual model? A parsing model? All of
these, and perhaps even more than just these? I don't really know the
answer, but I think the authors of the July 24 paper chose their words
with care. What is a foundational model to you?

Whether to go for a syntactic transformation or a model approach ((1)
vs (2)) is to me obvious: only a model can possibly meet the
requirements. TMQL and TMCL cannot be specified syntactically, both
because they would then have to be specified once for each syntax, and
because such an approach would be unbearably clumsy.

As for the terms "topic map processing" and "topic map parsing" I
agree that "processing" is better used as a generic term. "Parsing",
on the other hand, may not be the perfect term. "Parsing" generally
means (in computer science) to build an abstract structure from a
textual input. 

Deserialization is more general, in that it does not imply that the
data source be textual, but it does imply that it is serial. Perhaps
"topic map building" or "topic map construction" would work better? It
seems to me that these terms can also encompass binary sources as well
as data sources that are effectively databases, or even servers
accessed via some protocol (such as LDAP).

These are just suggestions; I have no strong opinions on this. I just
feel finding the right terms is worth some effort, so that we can
stick to them and stop changing our terminology all the time.

"Four rules must be applied by all topic map parsers", the July 24
paper says. This list of rules is of course very far away from being
comprehensive, so I assume that it is not meant to be. There is a very
large number of cases that must be handled by topic map processors. I
described some of them in the infoset model, but even that is by no
means exhaustive.

"The two proposals do not necessarily contradict each other...". I
agree. In fact, I think much could be gained by carefully reconciling
them and ensuring that they are compatible. I am open to the
possibility that what may serve our needs best may be a specification
that combines both. 

The graph may be best suited to describing the semantics (or meaning,
or is-ness) of topic map information, while the infoset model may be
the best foundation for the "topic map building" models and the
ancillary specifications like TMQL and TMCL.

With the proper mapping between the two models, such an approach could
work very well. I am also open to the idea that something else
entirely would be the best approach.

"..., and the advantages and drawbacks of each of them should be
studied." I very strongly agree with this. However, it is impossible
to perform this evaluation without a clear idea of what the models are
supposed to accomplish. Before we know what the models should do, how
can we judge how well they do it, or even if they do it at all?

In my humble opinion, what we need right now is a requirements
document for the data model work that states what it is we need. The
failure of ISO 13250 to comply with these requirements will then be a
defect, and we can file our report.

  QUO VADIS, WG3?
=================

In my opinion, what we need more than anything else is a foundational
model. Further work on topic maps will basically be standing still
until we can get that in place.

The whole issue of the model has become rather contentious, and it has
been difficult to make any progress in terms of communication between
the different groups of people involved. As SRN and MB point out, we
need to evaluate the proposals that we do have and see how they can
best meet our needs.

As I've said before, to do that we need to agree on what our needs
actually are. In my opinion, the best way to reach agreement, and to
document what that agreement is, is to put together a requirements
document for the model work. I think working on such a document will
also have the benefit of making it easier for people with differing
opinions to communicate and progress towards a common understanding
of what it is we need.

I can think of no justification or excuse for why this should not be
done. If anyone else can, I would love to hear it.

Probably the best would be if there were a proposal for such a
document on the table ready for discussion in Montréal. The natural
people to write this would be the ISO 13250 editors, but whether they
agree and have the time to do it is not for me to say.

--Lars M.