[sc34wg3] to advance Topic Maps

09 Apr 2003 22:46:02 +0200

I should probably begin by saying that I have no objections to the
spirit of what SRN is saying here, but that we do differ on the
details.  Whether the gap can be bridged remains to be seen, but it's
certainly possible in theory.

First, I should make it clear that there are certain practical
constraints here. The first is that all further progress on topic maps
(except for progress on the RM) hinges on the SAM. This applies to
TMQL, TMCL, the syntax specifications, conformance testing, fragment
interchange, and also published subjects. The first two are also
technologies for which there is appreciable commercial demand right
now.

We've spent two full years taking the SAM to where it is now, and as
it now stands it is pretty much finished. And by finished I actually
do mean finished. The SAM should go to CD after the London meeting,
and roll down the path towards IS as quickly as possible after that.
Given these two things it is difficult to argue that stopping all this
while we 

  a) straighten out the RM and 
  b) adapt the SAM to it 

is worth our while.

Having said that, constructive criticism and contribution to the work
MUST of course be welcome, as must attempts to try new ways forward.

That's enough preamble for now.

* Steven R. Newcomb
|
| In order to advance Topic Maps, it is urgent that we align the SAM
| with the requirements for TM Applications prescribed in the TMM.

I don't see any need for that. I'll try to explain why below.

| (1) The SAM should be expressed and constrained in such a way that
|     it is clear that the SAM can be extended, and that its extensions
|     can extend the rules for merging and number of relationship types
|     that can determine the subjects of their role players.
| 
|     Currently, the SAM makes no provision for such extensions.  The
|     SAM provides no general doctrine for merging, in terms of which it
|     explains both its own merging rules, and those that may be added
|     by TM Applications that include (inherit) and extend the SAM.

That's true up to a point. The SAM provides a specific set of merging
rules, but places no limits on what further merging may be applied
beyond that. That is a conscious design decision. It has been
considered inappropriate to restrict what merging may occur beyond
the basic merging required by the SAM *so long as* that merging
respects the structural constraints of the SAM.

|     Specifically, the SAM does not say how (or even whether) the
|     instances of user-defined association types can determine or
|     influence whether their role players should merge.

It does not say how, but it does say that this is allowed:

  "Merging is a process applied to topic maps in order to reduce the
  number of redundant information items representing the same
  information. Merging is required to be performed in certain cases,
  but this is insufficient to guarantee that there will always be one
  topic per subject. Applications are therefore allowed to merge
  topics as they see fit." -- 4, first para

If you want to argue that this should be broadened to cover all item
types I'll buy that.

But, again, no restrictions have been placed on the allowed mechanisms
for doing or expressing merging, because this is seen as an area in
which applications should be free to do as they choose.
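
To make the distinction concrete, here is a minimal Python sketch of
required merging plus an application-chosen extra rule. Everything in
it is an assumption made for illustration: the Topic class is a
drastically simplified stand-in for a SAM topic item, the
share_subject_identifier rule stands in for the merging the model
itself requires, and share_isbn is a hypothetical application rule.
It is not taken from the SAM text.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set, Tuple


@dataclass
class Topic:
    # Hypothetical, simplified stand-in for a SAM topic item.
    subject_identifiers: Set[str] = field(default_factory=set)
    occurrences: Set[Tuple[str, str]] = field(default_factory=set)  # (type, value)


MergeRule = Callable[[Topic, Topic], bool]


def share_subject_identifier(a: Topic, b: Topic) -> bool:
    # Stands in for merging that the model requires.
    return bool(a.subject_identifiers & b.subject_identifiers)


def share_isbn(a: Topic, b: Topic) -> bool:
    # Hypothetical application-defined rule: same ISBN means same subject.
    def isbns(t: Topic) -> Set[str]:
        return {value for (kind, value) in t.occurrences if kind == "isbn"}
    return bool(isbns(a) & isbns(b))


def merge_all(topics: List[Topic], rules: List[MergeRule]) -> List[Topic]:
    # The result is still just a list of topics, so the structure of the
    # model is respected whichever rules are plugged in.
    merged: List[Topic] = []
    for topic in topics:
        current = Topic(set(topic.subject_identifiers), set(topic.occurrences))
        changed = True
        while changed:  # keep absorbing until no rule matches any more
            changed = False
            for other in merged:
                if any(rule(current, other) for rule in rules):
                    current.subject_identifiers |= other.subject_identifiers
                    current.occurrences |= other.occurrences
                    merged.remove(other)
                    changed = True
                    break
        merged.append(current)
    return merged
```

An application that wants only the required behaviour passes
[share_subject_identifier]; one that wants more appends its own
rules to the list. Neither needs the model itself to be extended.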

| (2) The SAM should be expressed and constrained in such a way that it
|     is clear that topic maps that are based on the SAM can be merged
|     rigorously and predictably, not only with each other, but also
|     with topic maps that are not based on the SAM.
| 
|     The current SAM makes no provision for this.  

Absolutely correct, and this is also by design. The SAM is designed as
a self-contained expression of the TAO model and to make it extensible
would turn it into something completely different, a data model whose
basis is not the TAO model, but items-with-properties or something
similar. That may have its merits, but it is *not* what we set out to
do.

As I see it there are two worlds: the RM world, where you have a data
model that does *not* consist of topics, associations, and
occurrences, and where you can interoperate directly with other models
that are structured differently. That's fine, and those who wish to
live in such a world can do so. If there is a model for SAM-in-RM they
can also use the SAM in this world and be happy with that.

The other world is the SAM world, where you deal with topics,
associations, and occurrences, and if that is not enough for you
you'll have to go somewhere else. The point here is that when creating
a model one has to choose a set of building blocks, and in the SAM the
building blocks are topics, associations, and occurrences, and that
gives the model a certain flavour which is the whole point of having
it. To make it extensible would destroy that.

To make an analogy[1], XML is not extensible. It consists of elements
and attributes, and that's what you are given to play with. If you
want to use lists, strings, and atoms and make them play with elements
and attributes you have BNF. BNF lets you define whichever syntax you
want, and you can define both S-expressions and XML using BNF. I see
SAM as XML and the RM as BNF here. You can live in a constrained world
and be happy with that, or you can live in an unconstrained world and,
uh, be happy with that.

|     The TMM shows how the SAM can be expressed in such a way as to
|     allow other TM Applications, including but not limited to TM
|     Applications that inherit (or "include") the SAM, to be
|     independently designed and maintained without sacrificing the
|     integrity of the topic maps that are based on them when SAM and
|     non-SAM topic maps are merged.  

As I see it, that's the whole selling point of the RM and what it adds
to what the SAM has. That's why we have *two* models, rather than one.
If the SAM is made extensible what we basically end up with is one
model. That model would be the RM, the RM, and nothing but the RM, and
certainly no SAM except SAM-in-RM, which is not SAM, but a distant
cousin.

| It's possible to reconcile the SAM and the TMM.

I think so, too, and I think most of all it is a psychological issue.
What reconciliation means changes with how you look at it, and the
trick is to choose a view that works for all of us.

As I see it, the RM world can be consistent, it can work, and it can
have the SAM in it, and at the same time the SAM world can be
consistent, it can work, but it won't be extensible (in the sense that
you, SRN, mean extensible when you write it).

The important thing here is that we avoid derailing all the SAM-based
work for which there is actually commercial demand in order to achieve
something for which there is no commercial demand at the moment, but
which may become important in the future. To put it another way: we
have to cater to both the present and the future at the same time, and
I think we can do that.

==========================================================================

  BELOW FOLLOWS DETAIL
  (read if you have time)

==========================================================================

|     It's important to maintain the integrity of knowledge even after
|     it is merged with other knowledge.  The TMM is designed to meet
|     the requirement of preserving the integrity of merged topic
|     maps.
| 
|     Any data models that we publish for Topic Maps should be
|     informed by sensible doctrines that establish the general rubric
|     under which diverse merging rules must co-operate, despite the
|     diversity of the knowledge domains and world-views from which
|     they emanate.  The TMM proposes such a rubric.

Well. I'm not convinced that any of this is true, and I've seen no
substantive arguments to support it.

| (3) The SAM should be expressed and constrained in such a way that it
|     is clear that the SAM reflects the WG3's intentions regarding
|     which subjects it reifies (which subjects are capable of being
|     role players and are subject to merging), vs. which subjects are
|     not reifiable in systems that are governed only by the SAM.

All information item types are subject to merging, as section 4
clearly states. It is also clear that all of these, except locator
items, can be reified.

|     The current SAM document does not clarify this.  In the absence
|     of such clarification, there is no basis for any claims we (or
|     anybody else) might make about the integrity with which
|     knowledge is handled, even under the SAM's own rules.  The TMM
|     requires all TM Applications to make explicit the limits of
|     their support for the SLUO, and that their behaviors be
|     deterministic and predictable, even in multi-source,
|     multi-TM-Application environments.  (The "Subject Location
|     Uniqueness Objective (SLUO)" is the principle that all topics
|     that have the same subject should be merged.)  The support of
|     every TM Application for the SLUO is necessarily limited.  It's
|     important that users are able to know exactly how the SLUO is
|     met by any TM Application(s) they use.

These are all requirements put forward by the RM. They make sense
within the RM, but I am not sure that they have any applicability to
the SAM. HTML does not do this either, nor does RDF. Why should the
SAM? 

|     The SAM, as currently written, doesn't state the limits of its
|     support for the SLUO.  At least one of the things that the SAM
|     does needs an especially detailed disclosure: the SAM allows the
|     reification of subjects to be controlled, not by the inherent
|     logic of the SAM, but rather by syntactic constructs that are
|     used in a given interchangeable instance.  This makes the
|     merging responsibilities of implementations ambiguous.

This is untrue. Once the XTM and CXTM specifications are in place we
can build a set of automated conformance tests which can be applied to
any topic map implementation to verify whether it conforms to the XTM
specification or not. (Similarly for TMCL and TMQL when they are
ready.) How it is possible to get more specific and unambiguous than
that I honestly don't know.
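
As a rough illustration of what such a test could look like, here is
a small Python sketch. Note the assumptions: the canonical form used
here (sorted JSON) is made up for the example and is NOT the CXTM
format, and the representation of a topic map as a list of dicts of
string lists is likewise hypothetical. The point is only that, once
a canonical serialization exists, conformance reduces to comparing
two strings.

```python
import json
from typing import Dict, List

TopicMap = List[Dict[str, List[str]]]


def canonicalize(topic_map: TopicMap) -> str:
    # Hypothetical canonical form: sort the values inside each topic,
    # then sort the topics themselves, so that ordering differences
    # between implementations disappear.
    topics = [{key: sorted(values) for key, values in topic.items()}
              for topic in topic_map]
    topics.sort(key=lambda t: json.dumps(t, sort_keys=True))
    return json.dumps(topics, sort_keys=True, indent=1)


def conforms(actual: TopicMap, expected_canonical: str) -> bool:
    # An implementation passes if its output canonicalizes to exactly
    # the expected string.
    return canonicalize(actual) == expected_canonical
```

A test suite built this way would pair each input document with one
expected canonical output, and any implementation whose result
differs by a single character fails.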

|     It becomes impossible, in the general case, to preserve the
|     integrity of topic maps across merging operations with other
|     topic maps, because if a subject is reified in one topic map,
|     and unreified in another, the two topic maps cannot be merged
|     into a single topic map that preserves the integrity of both
|     originals.

Again, that is incorrect. If you reify a base name in one topic map and
that base name is equal to another base name (on the same topic) in
another topic map, the two base names will merge and the reification
will still be there in the merged topic map. If an implementation
fails to do this, the automated conformance tests will catch it out.
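
A minimal Python sketch of that behaviour follows. It rests on
assumptions that are mine, not quotations from the SAM: base names
are taken to be equal when value and scope match, and reification is
modelled as a topic having a subject identifier equal to one of the
base name's source locators. The BaseName class and both functions
are hypothetical names used only for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Set, Tuple


@dataclass
class BaseName:
    # Hypothetical, simplified stand-in for a base name item.
    value: str
    scope: FrozenSet[str] = frozenset()
    source_locators: Set[str] = field(default_factory=set)


def merge_base_names(names: List[BaseName]) -> List[BaseName]:
    # Merge the base names of ONE topic: equal names collapse into a
    # single item that keeps the source locators of all the originals.
    merged: Dict[Tuple[str, FrozenSet[str]], BaseName] = {}
    for name in names:
        key = (name.value, name.scope)
        if key in merged:
            merged[key].source_locators |= name.source_locators
        else:
            merged[key] = BaseName(name.value, name.scope,
                                   set(name.source_locators))
    return list(merged.values())


def reified_by(name: BaseName, reifier_subject_identifiers: Set[str]) -> bool:
    # Assumed reification test: the reifying topic identifies the item
    # through one of the item's source locators.
    return bool(name.source_locators & reifier_subject_identifiers)
```

Because the merged item keeps every source locator, a reifying topic
that pointed at the name in one topic map still points at the merged
name, whether or not the other topic map reified it.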

|     If we decide that the SAM really should be designed in such a
|     way that its implementations are exempted from respecting the
|     SLUO in this way, then we must disclose the fact, and we must
|     say exactly how all SAM implementations will uniformly resolve
|     all the ensuing ambiguities.  Again, the TMM doesn't care how
|     much or how little a TM Application respects the SLUO; it merely
|     demands that the limits be disclosed.

I think the SAM as already written does do this, but that it is the
way in which it is written that prevents you from seeing that it does
do so. Whether that is a failing of the prose or of something else I
cannot really judge.

[1] This is an analogy. I'm making a comparison to try to express a
    particular way of seeing it. Please resist the temptation to tell
    me all the other ways this could be seen and try to understand
    what I am trying to tell you. Thank you.

-- 
Lars Marius Garshol, Ontopian         <URL: http://www.ontopia.net >
GSM: +47 98 21 55 50                  <URL: http://www.garshol.priv.no >