[sc34wg3] Clarifying N0323

06 Feb 2003 18:08:48 +0100

Steve posted a proposal that we go back to N0323 as the starting point
for discussion, which I think is the right thing to do. However,
Patrick expressed concern that N0323 may not be clear enough, and I
suspect he is right. Here is an attempt to explain in more detail how
I see us moving forward from this point.

Expressions of assent and dissent are very welcome, as are questions
about anything that is still unclear.

  THE TOPIC MAP STANDARDS
===========================

  ISO 13250
-----------

The current ISO 13250 should be completely replaced by a new
multi-part standard, consisting of the following parts:

  Part 0: Guide to the topic map standards
  Part 1: The Standard Application Model
  Part 2: The Reference Model
  Part 3: The XML Topic Maps Syntax (XTM)
  Part 4: The HyTime Topic Maps Syntax (HyTM)
  Part 5: Canonical XTM

(These part numbers are only suggestions, based on the order of
appearance in N0323. Feedback welcome.)

Part 0 is intended to be a document that explains to the reader the
structure of the whole standard, so that the reader will know where to
look to find what he/she is looking for. It may be an idea to extend
this to be a tutorial explaining the whole standard in a more
accessible way than the normative text does. (Feedback on this is
requested.)

Part 1 is intended to be the SAM as it currently is, but rewritten to
ISO style. The SAM will then define the key topic map concepts such as
subject, topic, base name, occurrence, and so on. It will also define
the structure of topic maps and their merging rules, together with key
published subjects. APIs and storage models may conform to the SAM, as
may syntaxes.

Part 2 is (as far as I understand) intended to be a Reference Model
that explains the general principles behind topic maps as an identity-
based computing technology, independent of concepts such as base
names, occurrences, scope, associations, and so on. This will be a
later version of the document currently called the RM. (Corrections on
this requested.)

Part 3 is intended to be the new definition of the XTM syntax, which
will consist of two main parts: definition of the syntax itself, and a
specification for how to deserialize instances of it into SAM
instances. In addition, there will be a normative annex with the XTM
DTD, an informative annex with an XSDL schema for the syntax, and an
informative annex with a RELAX-NG schema for the syntax. This will be
a later version of the document that is currently SC34 N0328.

Part 4 is intended to be the new definition of the HyTM syntax, which
will be exactly like part 3, except that it will define HyTM instead
of XTM, and use an SGML DTD with no other kinds of schema.

Part 5 will provide an XML serialization of SAM instances with the
property that any two SAM topic map items considered equal by the
comparison rule for topic maps serialize to XML documents that are
identical byte-by-byte. The purpose of this is to facilitate automated
conformance testing. This document will be described as a SAM->XML
serialization, and will be an updated version of this document:
<URL: http://www.ontopia.net/topicmaps/materials/cxtm.html > (Again,
feedback on this would be welcome.)
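
As a rough illustration of the byte-by-byte property, here is a
minimal Python sketch. It is not the Canonical XTM algorithm: the
dictionary-based topic map and the canonicalize function are
hypothetical and assumed only for this example. The idea it shows is
that sorting everything whose order is not significant, and dropping
duplicates, makes equal inputs serialize identically.

```python
from xml.sax.saxutils import escape, quoteattr

def canonicalize(topics):
    """Serialize a toy topic map to bytes.

    topics is a dict mapping a topic id to a list of base name strings.
    This toy model is an assumption; the SAM is much richer.
    """
    lines = ["<topicMap>"]
    # Sort topic ids so dict insertion order cannot affect the output.
    for topic_id in sorted(topics):
        lines.append("  <topic id=%s>" % quoteattr(topic_id))
        # Remove duplicate names, then sort them.
        for name in sorted(set(topics[topic_id])):
            lines.append("    <baseName>%s</baseName>" % escape(name))
        lines.append("  </topic>")
    lines.append("</topicMap>")
    return "\n".join(lines).encode("utf-8")

# Two toy topic maps that differ only in ordering and a redundant name.
a = {"t1": ["Oslo", "Christiania"], "t2": ["Norway"]}
b = {"t2": ["Norway"], "t1": ["Christiania", "Oslo", "Oslo"]}
same = canonicalize(a) == canonicalize(b)
```

A conformance test can then compare the produced bytes against a
stored result file without parsing either of them.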

  ISO 18048 - TMQL
------------------

This standard will provide a way to query topic maps both to extract
information and later also to change the topic maps. It will be
specified in terms of the SAM, which means that it can support XTM,
HyTM, and any other syntax or model that has a mapping to the SAM.

Note that being "specified in terms of" the SAM means that if TMQL
were, for example, tolog-like and had a built-in predicate
"occurrence", its behaviour would be explained as something like this:

  A predicate clause of the form

    occurrence($A : topic, $B : type, $C : locator)

  would produce a virtual table of three columns, where there would be
  one row for every occurrence item in the topic map. The A column
  would hold the topic item in whose [occurrences] property the
  occurrence item is found, the B column the topic item in the
  occurrence item's [type] property, and the C column the locator item
  in the occurrence item's [resource] property.

This is how syntax independence and formality can be achieved at the
same time. (Feedback wanted.)
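
The predicate description above can be sketched in Python as follows.
This is only an illustration under stated assumptions: the Topic and
Occurrence classes are a hypothetical stand-in for SAM items, with
plain strings in place of topic and locator items, and the function is
not part of any real TMQL or tolog implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Occurrence:
    type: str      # stands in for the topic item in the [type] property
    resource: str  # stands in for the locator item in [resource]

@dataclass
class Topic:
    id: str
    occurrences: list = field(default_factory=list)  # [occurrences]

def occurrence(topic_map):
    """Yield one (A, B, C) row for every occurrence item in the topic map."""
    for topic in topic_map:
        for occ in topic.occurrences:
            yield (topic.id, occ.type, occ.resource)

# A toy topic map: one topic with an occurrence, one without.
tm = [
    Topic("ontopia", [Occurrence("homepage", "http://www.ontopia.net")]),
    Topic("xtm"),
]
rows = list(occurrence(tm))
```

Nothing in the function depends on whether the topic map was loaded
from XTM, HyTM, or anything else; it only touches the model.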

  ISO 19756 - TMCL
------------------

This standard will provide a way to constrain the allowed structure of
topic maps using some declarative language. It will be specified in
terms of the SAM, which means that it will be able to support any
topic map syntax or model that has a SAM mapping. Again, the fact that
it is specified in terms of the SAM will mean that the text will be
something like this (using OSL as an example):

  A topic constraint of the form

    <baseName min="x" max="y">
      <scope>
        <!-- topic references -->
      </scope>
    </baseName>

  is evaluated by traversing the [base names] property of each topic
  item matched by the topic class definition to which this constraint
  belongs. Every base name item whose value in the [scope] property
  matches the specified scope (reference to section defining the
  <scope> element) also matches the constraint. These must be at least
  x in number and no more than y.

Again, this is how we can be precise and at the same time support
multiple syntaxes and models. (Feedback wanted.)
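
The evaluation rule for the constraint above can be sketched in Python
like this. The dict-based topic representation and the function name
are hypothetical. Scope matching is assumed here to be exact set
equality, purely for illustration; the real rule would be whatever the
section defining the <scope> element says.

```python
def check_base_name_constraint(topic, scope, minimum, maximum):
    """Check a base name cardinality constraint against one topic.

    topic is a dict whose "base_names" entry is a list of
    (value, scope) pairs, standing in for the [base names] property.
    Returns True if the number of base names whose scope matches lies
    between minimum and maximum inclusive.
    """
    wanted = frozenset(scope)
    # Assumption: a base name matches only if its scope is exactly equal.
    matching = [value for value, name_scope in topic["base_names"]
                if frozenset(name_scope) == wanted]
    return minimum <= len(matching) <= maximum

topic = {
    "base_names": [
        ("Oslo", {"english"}),
        ("Oslo", {"norwegian"}),
        ("Christiania", {"norwegian"}),
    ]
}
ok = check_base_name_constraint(topic, {"norwegian"}, 1, 2)
too_few = check_base_name_constraint(topic, {"english"}, 2, 2)
```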

  XTM Conformance
-----------------

I think we should create an OASIS TC chartered to create an official
XTM conformance test suite. This test suite should consist of a set of
XTM documents, a set of Canonical XTM documents, and a topic map that
describes the test cases.

The topic map would be in XTM syntax, and would look something like
this (using LTM for brevity):

  [test-001 : test-case = "Test Case 001" % "001.xtm"]
  [test-002 : test-case = "Test Case 002" % "002.xtm"]
  {test-002, description, [[Adds a redundant base name to 001.xtm.]]}
  [test-003 : test-case = "Test Case 003" % "003.xtm"]
  {test-003, description, [[Adds a redundant occurrence to 001.xtm.]]}
  [result-001 : result = "Result File 001" % "001.cxtm"]

  canonicalization(test-001 : source, result-001 : result)
  canonicalization(test-002 : source, result-001 : result)
  canonicalization(test-003 : source, result-001 : result)

  [test-004 : test-case = "Test Case 004" % "004.xtm"]
  {test-004, description, [[Has <scope> inside <member>.]]}
  invalid(test-004 : invalid-tm)

The topic map could of course provide much additional information
beyond this, but this is the basic idea. If the ontology were based on
published subjects it could also easily be extended by anyone who
wanted to extend it.

Using this test suite one can quite easily build software for each
topic map implementation that verifies that the implementation
produces the correct canonicalized output for each test case, and that
it correctly detects all errors in the test suite.
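
A test driver along these lines could be sketched in Python as below.
Everything here is an assumption made for illustration: the run_suite
function, the convention that an implementation signals an invalid
topic map by raising ValueError, and the toy_canonicalize stand-in,
which treats a "topic map" as a whitespace-separated list of names. A
real driver would read the test cases from the describing topic map
and call the implementation under test.

```python
def run_suite(canonicalize, valid_cases, invalid_cases):
    """Return the names of the failing test cases, in sorted order.

    canonicalize:  callable taking source text and returning canonical
                   bytes, raising ValueError for invalid input
                   (assumed interface).
    valid_cases:   dict name -> (source, expected canonical bytes)
    invalid_cases: dict name -> source that must be rejected
    """
    failures = []
    for name, (source, expected) in sorted(valid_cases.items()):
        try:
            if canonicalize(source) != expected:
                failures.append(name)
        except ValueError:
            failures.append(name)  # rejected a valid test case
    for name, source in sorted(invalid_cases.items()):
        try:
            canonicalize(source)
        except ValueError:
            continue  # error correctly detected
        failures.append(name)  # accepted an invalid test case
    return failures

def toy_canonicalize(source):
    # Stand-in implementation: "!" marks an invalid document.
    if "!" in source:
        raise ValueError("invalid topic map")
    return " ".join(sorted(set(source.split()))).encode("utf-8")

valid = {
    "test-001": ("a b", b"a b"),
    "test-002": ("b a a", b"a b"),  # redundant name, same result file
}
invalid = {"test-004": "a ! b"}
failures = run_suite(toy_canonicalize, valid, invalid)
```

An empty list of failures means the implementation produced the
expected canonical output for every valid case and rejected every
invalid one.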

This would be extremely useful not only for the vendors themselves
(Ontopia already has such a test suite, but it is neither complete nor
up to date with the latest changes to SAM/XTM), but also for anyone
who wants to test the conformance of an implementation. In a sense it
would also provide additional guidance beyond that found in the text
of the standard.

-- 
Lars Marius Garshol, Ontopian         <URL: http://www.ontopia.net >
GSM: +47 98 21 55 50                  <URL: http://www.garshol.priv.no >