[topicmaps-comment] RE: [sc34wg3] Re: PMTM4 and XTM Layer 1.0

Wed, 10 Oct 2001 21:49:50 -0500

[Graham Moore:]
> TopicMaps typically make the metamodel, i.e. classes and assoc templates
> part of the runtime environment, UML typically never has the metamodel
> as part of a run time environment. That's because the run time env is
> often Java or Python etc, which just don't have these features. Perhaps
> the closest we get to having access to the metamodel at runtime is
> something like Smalltalk.

Right.

> Secondly, topicmaps at the moment privilege certain things in order to
> achieve commonality and interchange. These things are names and identity
> structures. Occurrence I can quite happily see as Assocs in the model.
> Because these things are privileged in the data model in the layer 1.0
> model, there is no ambiguity about them - interchange and
> interoperability is strengthened.

> One concern I have about PMTM4 is that because everything is so regular,
> these distinctions are lost from the basic data model. PMTM4 is reliant
> on the fact that people interpret a Data driven model with the correct
> semantics rather than the semantics of the model (when I use semantics
> here I mean the ability to find a topics name etc, the ability to find
> out about the topic map from the data model.) being an integral,
> unambiguous, part of the model. 

Right.

> An example, if names are modelled as assocs in PMTM it requires that
> there are PSIs, probably, to identify name structures, these have to be
> interpreted by SW in order to impose the correct semantic upon these
> regular structures. 

Right.

> If naming is a first class principle then it should
> be there as such.

You seem to be suggesting that the notion of naming should *not* be
the subject of a topic.  Put another way, you seem to be suggesting
that the "naming assertion type" (PMTM4 calls it "the topic-basename
association template") should not appear as an association template,
and the connection between a topic and a name for that topic should
not be an association.  Is that right?

If that's what you're saying, I disagree with you.  I see significant
disadvantages, and no advantages.  The disadvantages include:

* Only one kind of naming.  What if a popular application needs to use
  an altogether different kind of naming (such as naming without the
  topic naming constraint that several people have decried as
  unreasonably burdensome)?  What if a popular application needs to
  use only a limited subclass of the standard kind of naming (see
  below an example of a powerful reason for supporting such
  subclassing)?  How can that be expressed if there's nothing explicit
  to subclass?  Shall we create a completely separate, special
  mechanism just for names?  Ugh.

* Inconsistency.  Naming assertions would be handled differently from
  all other kinds of assertions.  This would create more mechanisms to
  implement and understand, and lots of opportunity for many kinds of
  mischief.  We would be unable to apply the whole power of the
  topic maps paradigm to naming issues and to names.

I can imagine arguments in favor of your proposal that would go
something like this:

  Naming is special.  It's different from everything else.  We need to
  have namespaces for names, and we need to be able to use names as
  the addresses of the topics that they name.  We need to be able to
  take advantage of the special features of languages (e.g. Python)
  that provide convenience features for storing and looking up names.
  All such paradigms make a big distinction between naming and
  everything else.  Why should Topic Maps be any different?

I'm unsympathetic to these arguments.

Every assertion type is special, not just topic-basename assertions.
There is an unbounded number of assertion types, each of which can
impose an unbounded amount of processing complexity when used in
support of the applications for which they were created.

The basic attraction of the Topic Maps paradigm is that it provides a
way to simplify and unify the expression, interchange, merging, and
infoglut control of relationships between arbitrary subjects.  Surely
the concept of naming is itself a subject.  Surely, since I can say,
"The name 'Graham' is a perfectly nice name," the name "Graham" is
itself at least potentially a subject.  Simplicity and ease of
implementation are not well served by exempting certain subjects from
the same discipline to which all other subjects are, uh, subject.  (To
those readers for whom English is not their native language, I
apologize for the previous sentence, which uses the word "subject" in
two different senses.)

Nothing prevents the exploitation of the special features of various
languages (like Python) and systems (like RDBMSs) when implementing
the processing complexity of naming, or of any other assertion type.

It should not bother us that Topic Maps is different from every other
paradigm.  (Otherwise, why use Topic Maps?  People should use the
paradigms that suit their purposes.  If they need a paradigm that can
absolutely collate everything known about any given set of subjects,
then they might want to think about using Topic Maps.  And we'd better
make sure that Topic Maps can really, really do that.)  The purpose of
Topic Maps is to bring everything under one umbrella -- to facilitate
the merging of arbitrary knowledge from diverse sources.  If we make
Topic Maps a system of two umbrellas, one for names, and the other for
everything else, we will have violated the fundamental purpose of
Topic Maps.

> Returning to the UML / TM comparison, identity is *in* UML, but it
> doesn't make the distinction between resolvable topics and conceptual
> ones. However, being UML it could create a class to do this - the point
> is that it isn't standardised. I think this is one of the best things TM
> offers the world.
> 
> Another UML reference. PMTM4 sees everything as a topic, including
> strings and I assume integers - this brings it very close to the UML
> model that just connects Objects together.
> 
> As the full PMTM4 model isn't described yet - I don't see what in a data
> model a Topic now looks like :
> 
> e.g.
> 
> <topic>
> 	<baseName>
> 		<baseNameString>Graham</baseNameString>
> 	</baseName>
> </topic>
> 
> Generate 2 topics in PMTM4
> 
> 1. topic for the topic named 'graham'
> 
> and 
> 
> 2. a topic for the string 'graham'
> 
> Now, what I ask Steve and Michel is, how does this micro part of the
> model pan out, does for example a topic have a type - like string, int,
> Object, Topic?
> 
> Topic.value => "graham" ??
> Topic.type  => "String"

When I read your words above, I have this queasy feeling that Ted
Nelson calls "paradigm warp" -- a phenomenon that occurs when subtle
differences between world views make communication between people
extra challenging.

For me, topics don't have "values".  In the universe of Platonic
forms, they have subjects.  That's the closest thing I can think of to
the idea that a topic has a "value".  In computer-processable terms,
the closest thing I can think of to topics having "values" is the fact
that topics have subject identity points (subject indicators, plus
zero or one subject constituter).

Topics can have any number of types, not just one type, and each type
(or class) is itself a topic.  The type of a topic is never a string,
such as "String".  It's a topic, which, again, has subject indicators,
plus zero or one subject constituter.
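
To make the contrast with "Topic.value" and "Topic.type" concrete,
here is a minimal sketch in Python.  It is only an illustration under
my own assumptions, not anything defined by PMTM4 or XTM: the class
name, the field names, and the example.org addresses are all invented
for the purpose.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # identity-based equality, so topics can live in sets
class Topic:
    # Addresses of information that *indicates* the subject.
    subject_indicators: set = field(default_factory=set)
    # Address of the information that *is* the subject (zero or one).
    subject_constituter: Optional[str] = None
    # Any number of types, and each type is itself a Topic, never a string.
    types: set = field(default_factory=set)


# Hypothetical addresses, used only for illustration.
string_class = Topic(subject_indicators={"http://example.org/psi/string"})
name_class = Topic(subject_indicators={"http://example.org/psi/name"})

graham_name = Topic(
    subject_indicators={"http://example.org/names#graham"},
    types={string_class, name_class},
)
```

Note that there is no "value" field anywhere in the sketch.  Whatever
the topic is about is reached only through its subject identity
points.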

The nature of a subject indicator is not constrained, except that it
must be addressable information.  However, the nature of a subject
indicator for a topic whose subject is a topic name *can* be
considered to be constrained by the XTM syntax.  XTM provides a
special syntax just for this kind of subject indicator.  In XTM, as
you point out, it's normally a string which is the literal content of
a <baseNameString> element.

I know you know all this, Graham.  This is all a warmup so I can get
past my paradigm warp problem.  Please be patient with me.  Remember I
was a classroom teacher for over two decades; some habits never die.

Let's do a little exercise, now, that I hope will be helpful.

If it's true that, as PMTM4 claims, all the features of any syntax for
topic maps ultimately boil down to a set of assertion types, no more
and no less, and if it's true that topics always boil down to subject
identity points, no more and no less, then it must be true that we can
use a small subset of the XTM syntax -- just the association syntax --
to say exactly the same thing that any specialized feature of XTM
syntax allows us to say.  Let's test this claim by seeing if we can
really do without the very specialized <baseNameString> feature of
XTM.  (Note: I'm *not* proposing that we get rid of <baseNameString>.
This is just an exercise that I hope will be revealing.)

For example, how can we say that a topic, such as a specific dog, has
the base name "Maxwell", without using <baseNameString>?

First of all, there has to be a subject indicator for the dog.  Let's
imagine that we've chosen the entry for the dog whose name is Maxwell
in the American Kennel Club (AKC) registration records.  (We can
address this entry by means of its AKC registration number, for
example, but that's just a detail having to do with the mechanics of
addressing this entry in its own proper context.  If the AKC sees fit
to make these entries uniquely addressable via some system of URIs, so
much the better; we can then do this on the Web.  Whether the
registration is available by means of a URI on the Web or not doesn't
matter for purposes of this discussion.)

Since we're trying to attach a name to the topic whose subject is the
dog, and since we're not allowing ourselves to use <baseNameString>,
we must use an association that has, as its template, the
"topic-basename" association template.  The topic whose subject
indicator is the AKC registration entry is addressed as the
role-player of the "topic" role.  (Actually, as you know very well,
there doesn't have to be a <topic> element that explicitly addresses
the AKC registration as its subject indicator.  We can address the AKC
registration entry directly from the <association> element, using a
<subjectIndicatorRef>.  The fact that we make this reference has the
effect of demanding the existence of a topic node, among whose subject
indicators is the AKC registration entry.)

How will we make the name "Maxwell" play the "basename" role in our
"topic-basename" association?  For that purpose, we use a
<subjectIndicatorRef> that contains the address of the string,
"Maxwell".  It doesn't matter where the string actually is, as long as
we say that this string is in fact to be treated as a subject indicator
of the topic that is the role-player of the "basename" role.
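
The association described in the last three paragraphs can be
sketched as follows.  This is a rough illustration, not a normative
rendering: it uses Python's standard xml.etree.ElementTree to build
an XTM 1.0 <association> element, and the example.org addresses and
the "#topic-basename", "#topic", and "#basename" identifiers are
placeholders I have made up.

```python
import xml.etree.ElementTree as ET

XTM = "http://www.topicmaps.org/xtm/1.0/"
XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("", XTM)
ET.register_namespace("xlink", XLINK)


def ref(parent, tag, href):
    """Append an XTM reference element that carries an xlink:href."""
    element = ET.SubElement(parent, f"{{{XTM}}}{tag}")
    element.set(f"{{{XLINK}}}href", href)
    return element


def add_member(assoc, role_href, player_href):
    """Add a <member> whose player is addressed as a subject indicator."""
    member = ET.SubElement(assoc, f"{{{XTM}}}member")
    role_spec = ET.SubElement(member, f"{{{XTM}}}roleSpec")
    ref(role_spec, "topicRef", role_href)
    ref(member, "subjectIndicatorRef", player_href)
    return member


assoc = ET.Element(f"{{{XTM}}}association")
instance_of = ET.SubElement(assoc, f"{{{XTM}}}instanceOf")
ref(instance_of, "topicRef", "#topic-basename")  # hypothetical template id

# Hypothetical addresses: the registry entry and a string "Maxwell".
add_member(assoc, "#topic", "http://example.org/akc/registry#maxwell")
add_member(assoc, "#basename", "http://example.org/strings#maxwell")

print(ET.tostring(assoc, encoding="unicode"))
```

Nothing in the result is a <topic>, <baseName>, or <baseNameString>
element; both role-players are addressed only through
<subjectIndicatorRef>.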

At this point, some questions arise.

(1) Why should we use a <subjectIndicatorRef> rather than a
    <resourceRef> to address the string, "Maxwell"?

    If we used <resourceRef>, we'd be saying that only the particular
    instance of the string, "Maxwell", that we happen to be addressing
    is in fact the name of the dog.  This wouldn't make sense.  It
    doesn't matter where the string "Maxwell" occurs; wherever it
    occurs, it is one and the same name.  The name "Maxwell" is an
    abstract subject; it exists as a single unique Platonic form in
    the Universe of Platonic Forms.

(2) What if somebody wants to regard that particular instance of the
    string, "Maxwell", as a subject constituter?

    No problem.  That's a different subject.  Every addressable piece
    of information can be regarded as the subject identity point for
    exactly two distinct subjects:

     (i) the subject that is somehow compellingly *indicated* by this
         piece of information, including consideration of its context,
         and

    (ii) the subject that *is* (i.e., *is constituted by*) this piece
         of information itself, including consideration of its
         context.  (If another copy of it appears elsewhere, that copy
         is not the same subject constituter.)

(3) Look, OK, I have a topic whose subject indicator is the string,
    "Maxwell".  How am I supposed to know that this is a subject
    indicator for a name?  In other words, how do I know that the
    subject of the topic is the *name* "Maxwell", rather than being,
    for example, the concept of maximum health, or the deepest well in
    the world, or Jack Benny's infamous 1925 Maxwell brougham, or the
    name of a house that's full of coffee?

    Very good question.

      (i) First of all, I could try to duck this issue by invoking the
          fact that subject indicators have whatever meaning they have
          to whoever perceives them.  This is a very unsatisfactory
          answer.  I hate this answer.  The very essence of the issue
          we're discussing is: Whence cometh a name's name-ishness?
          If we say, "It's all in the mind of the beholder," we
          retreat into some sort of unimplementable philosophical
          fantasy land.  Computers don't have minds, and they don't
          behold anything.

     (ii) Well, then, what about the fact that this addressed string
          plays the "basename" role in a "topic-basename" association?
          Doesn't that establish that the string is in fact a name?
          Well, I would say "Yes", except that I hate this answer,
          too.  The reason I hate this answer is that it relies on a
          doctrine that I believe to be inimical to information
          interchange via topic maps.  I have always strongly resisted
          this dangerous and false doctrine, and I'm still resisting
          it, even in this case.  The doctrine is:

             We should be able to tell what the subject of a topic is
             by analyzing its characteristics (i.e., in PMTM4 terms,
             by analyzing all the associations in which it plays
             roles).

          Here is my argument against this doctrine:

          (a) Computers aren't smart enough to do that kind of
              analysis reliably.

          (b) People aren't smart enough, either.  There isn't
              necessarily enough information to make such an analysis
              possible, much less reliable.  A topic can exist even if
              it has no characteristics at all (PMTM4: even if it
              plays no roles in any associations).

          (c) The usefulness of the whole Topic Maps paradigm rests on
              the assumption that there is exactly one utterly
              changeless subject at the heart of every topic.  If Joe
              makes a topic in his topic map and fails to provide it
              with a compelling, precise, and unambiguous subject
              indicator, and Betty comes along and adds another
              assertion in which Joe's topic plays a role, how can
              Betty do that without knowing what Joe was really
              regarding as the subject of that topic?  Now Natalie
              comes along, sees both Betty's and Joe's assertions
              about this topic, makes her own assumption about the
              subject of the topic based on everything Joe and Betty
              said, and adds some more assertions.  See the problem?
              Here we have a topic with no real anchor.  There is no
              longer one subject at the heart of it, unless Betty and
              Natalie have psychic powers and can read Joe's mind.
              (By the way, Joe wrote his topic map and immediately
              died, so nobody can ask him what he was thinking about.
              What a mess.)  We aren't likely to have information
              interchange here.  We're much more likely to have
              confusion interchange, and it might be very dangerous.
              If we want Topic Maps to work reliably in the real
              world, we must do better than this.  We can't blithely
              assume that people can tell what we're talking about
              just because they can see what we said about it.  We
              might not say very much.

    (iii) The string "Maxwell" is, by itself, a lousy subject
          indicator.  It is not precise, nor unambiguous, nor
          compelling.  It sucks.  However (and this is a very big
          however): the string "Maxwell" has context, and the context
          of a subject indicator can be extremely significant.  For
          example, the string "Maxwell", when appearing between
          <baseNameString> tags, is very precise, unambiguous, and
          compelling.  We know it's a name, and we know that it's a
          name completely independently of any assertions in which the
          topic that is the name "Maxwell" plays any role.  (In XTM
          syntax, we also know which topic it's the name of, by virtue
          of the <topic> element in which the <baseNameString>
          appears, but I'm trying to ignore that for the moment.  I'm
          trying to show that, when all is said and done, we only need
          an assertion, like any other assertion, to represent the
          fact that some particular dog has the name "Maxwell".)
          Similarly, all by itself, the entry for a particular dog in
          the AKC registry would be a lousy subject indicator, but the
          same information is a *great* subject indicator when its
          context is known to be the AKC registry.  Let's imagine that
          there is a field in such registry entries for the common
          nickname of each dog (as opposed to the weird and lengthy
          unique names that each dog is given in the AKC registry,
          such as "Leed-A-Way Honey Girl" and "Smokey Wind Jerry").
          If we address the content of that entry, and the content is
          "Maxwell", we have a name, and even a computer can know that
          it's a name.

(4) OK, Steve, you've won a lot of points here, but there's a fatal
    flaw in your vision.  Suppose there are two topics, and they both
    have the same subject -- the dog whose name is Maxwell -- and each
    of them plays the "topic" role in a "topic-basename" assertion,
    and in both of these "topic-basename" assertions, the player of
    the "basename" role is the string, "Maxwell".  Unfortunately,
    though, the string "Maxwell" is referenced by one of the
    "topic-basename" assertions in the context of <baseNameString>
    markup, and the string "Maxwell" is referenced by the other
    "topic-basename" assertion in the context of the AKC registry
    entry for this dog.  If these two strings really indicate the same
    subject (the name "Maxwell"), then they must both be subject
    indicators for one and the same topic, after all merging has been
    completed.  How is this magic merging supposed to happen?

    Well, if this is a "fatal flaw", it is the fatal flaw of the Topic
    Maps paradigm as a whole, not just of the topic naming problem
    we're discussing.  We have never claimed that computers would
    always be able to detect situations in which two different subject
    indicators actually indicate the same subject.  We have always
    said that this kind of problem can only be attacked with
    heuristics and human sweat.  What we *have* claimed is that the
    topic maps paradigm can be exploited in such a way as to preserve
    the value of such hard work, even when, in a topic map that is the
    result of merging other topic maps, we need to replace one of the
    contributing topic maps with a new version of itself.  And this
    claim remains just as valid for name topics as for any other kinds
    of topics.
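
The answer to question (4) can be sketched in a few lines of Python.
This is only an illustration under my own assumptions: the function
name, the data layout, and the example.org addresses are invented,
and a real processor would do a great deal more.  Topics merge
automatically when they share a subject indicator; the knowledge that
two *different* indicators indicate the same subject is kept as
separate, human-supplied data, so it can be reapplied when one of the
contributing topic maps is replaced by a new version of itself.

```python
def merge(topics, equivalences=()):
    """Group topic ids that must become one topic.

    topics: dict mapping topic id -> set of subject indicator addresses.
    equivalences: human-asserted pairs of indicator addresses that
    indicate the same subject.
    Returns a set of frozensets of topic ids.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for topic_id, indicators in topics.items():
        for indicator in indicators:
            union(("topic", topic_id), ("indicator", indicator))
    for left, right in equivalences:
        union(("indicator", left), ("indicator", right))

    groups = {}
    for topic_id in topics:
        groups.setdefault(find(("topic", topic_id)), set()).add(topic_id)
    return {frozenset(group) for group in groups.values()}


# Hypothetical addresses for the two occurrences of the string "Maxwell".
XTM_NAME = "http://example.org/map-a#maxwell-basenamestring"
AKC_NAME = "http://example.org/akc/registry#maxwell-nickname"

map_a = {"name-a": {XTM_NAME}}
map_b = {"name-b": {AKC_NAME}}
human_work = [(XTM_NAME, AKC_NAME)]  # the product of heuristics and sweat

unaided = merge({**map_a, **map_b})
assisted = merge({**map_a, **map_b}, human_work)

# A new version of map B replaces the old one; the human work still applies.
map_b_v2 = {"name-b": {AKC_NAME}, "breed": {"http://example.org/akc#beagle"}}
after_update = merge({**map_a, **map_b_v2}, human_work)
```

In the sketch, "unaided" keeps the two name topics apart, while
"assisted" and "after_update" both contain a single merged name
topic.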

> I am concerned that by making everything a topic here we are getting
> into a lot of other issues, such as data types etc, and if this is
> the route we want to take then I think that the large similarities
> with UML and the fact that it does all of this stuff already, we
> should consider adapting the UML metamodel model to have the new
> properties of TopicMaps.

How can we make UML reflect the inclusion and full participation of
the model (the taxonomy of topic types and assertion types) in the
data?  Personally, I cannot accept the idea that the model must be
separate from the data.  That would be inconsistent with the central
claim of the topic maps paradigm, which demands that there is exactly
one nexus (i.e., one topic) for any given subject, no matter what it
is, and that that nexus is connected to every single thing that is
known about that subject.

> One last question that got me started thinking about PMTM4 integration
> with a higher abstraction,

> in the above example most people - users? - would imagine they had added
> 1 Topic and 1 String into the system. 

> Asking PMTM4 
> 
> TopicMap.getTopics().size() || TopicMap.topics.length || Card(topic) ||
> etc
> 
> would yield *2*
> 
> Asking a higher level of abstraction would yield *1*
> 
> At the moment I see this as a BIG stumbling block to getting an
> integrated model. I hope I've missed something obvious.

I think I understand what you're saying.  I think it's the same
issue that Martin Bryan and I have been discussing in another
sequence of notes.  It's a reasonable and necessary requirement that
we not create a situation in which our users discover, to their dismay
and chagrin, that they have created topics (and, for that matter,
associations) that they didn't intend to create.  If a user creates a
topic, and he gives it a name, then he doesn't think in terms of
having two topics.  He thinks he has a topic that has a name, full
stop.  (And, at that level of abstraction, he's absolutely right!)

The issue here should not be considered in terms of what's really in
the topic map.  The real issue is how the user *views* what's in the
topic map, and, if we believe the ancient SGML dogma that information
always turns out to have unforeseen and unforeseeable uses, we need to
fully protect the flexibility available to *all* kinds of applications
that might someday be used to create such arbitrary views.  

This doesn't diminish the importance of your more specific concern
(and Martin's).  We also need to meet the reasonable requirement that
authors be able to provide guidance to viewing applications that will
indicate what the author thought users should see and not see.  In
other words, an author should have the privilege of making
distinctions between the topics that users will normally be expected
to see, and the topics that users will not normally be expected to
see.  And, as it happens, we already have a way to differentiate
topics from each other in any way (and for any reasons) whatsoever:
it's called "associations."  There are at least two good ways to make
the distinction that we're talking about here, using associations.

Personally, I prefer the technique in which we assign the semantic of
"visibility/invisibility to users" on the basis of the roles played in
various association types.  For example, we could say that if a topic
plays the "basename" role in one or more topic-basename associations,
and it plays no other role in any association type that demands
visibility, then it stays invisible, according to any application that
respects that distinction.  So, a user would get a "1" in your
example, given an application that respects the distinction intended
by the author.  Note that the technique I'm proposing here requires
the ability for association templates to be subclassable by
applications, and specifically that the topic-basename association
template be subclassable.  The subclass would add the
visibility/invisibility semantic to its "basename" role.  I'm not
aware of any model of topic maps other than PMTM4 that offers this
feature.  I think it's crucial.
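
Here is a minimal sketch, in Python, of the role-based visibility
rule described above.  Everything in it is an assumption made for
illustration: the data layout, the function name, and the "etymology"
association type are invented, and the subclassed template is modeled
simply as a set of (association type, role) pairs that do not, by
themselves, demand visibility.  Every other role is treated as
demanding it.

```python
# Each association: (association type, {role: id of the role-player}).
associations = [
    ("topic-basename", {"topic": "graham", "basename": "graham-name"}),
]
all_topics = {"graham", "graham-name"}

# Hypothetical application rule: roles that do not demand visibility.
HIDDEN_ROLES = {("topic-basename", "basename")}


def visible_topics(topics, associations, hidden_roles=HIDDEN_ROLES):
    """Return the topics a user would normally be expected to see."""
    roles_played = {topic: set() for topic in topics}
    for assoc_type, members in associations:
        for role, player in members.items():
            roles_played.setdefault(player, set()).add((assoc_type, role))
    # A topic is hidden only if it plays at least one role and every
    # role it plays is in the hidden set.
    return {
        topic
        for topic in topics
        if not roles_played[topic] or not roles_played[topic] <= hidden_roles
    }


raw_count = len(all_topics)
user_count = len(visible_topics(all_topics, associations))

# If someone later says something else about the name, it becomes visible.
more = associations + [("etymology", {"name": "graham-name", "origin": "scots"})]
later_visible = visible_topics(all_topics | {"scots"}, more)
```

With the single topic-basename association, raw_count is 2 and
user_count is 1, which matches the two answers in the question above.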

Let me wrap up, now.

In PMTM4, a name can itself be a topic, while at the same time being a
name in every way.  This means that a single (set of) subject identity
point(s) is the single nexus of everything to do with that name,
including but not limited to its topic-name-ishness.  Anyone can say
anything about that name in the usual way, by means of any kind of
assertion (association).  If, contrary to the grand simplification
proposed by PMTM4, we say that topic names are not topics, then we
can't do that, and, consequently, topic maps are not fully mergeable.
Ugh.  Similarly, if we say that a name *can* be a topic, but that the
idea of a particular name of a particular topic *can't* be *exactly
the same thing* as a topic whose subject happens to be the same name
of the same topic, then, again, we can't truly merge topic maps,
because:

  * We can't tell, from the perspective of the name-ishness of a name,
    what other kinds of things are being said about the name itself,
    and

  * Conversely, we can't tell, from the perspective of the other
    things that are being said about the name, that it is also the
    name of a topic.

PMTM4 fully rationalizes this problem.  A topic name is always itself
the subject of another topic.  Thus, topic maps really work, and
there's no problem.  There is nothing we can't talk about, and
anything we do talk about has everything we say about it directly
connected to it.

-Steve

--
Steven R. Newcomb, Consultant
srn@coolheads.com

voice: +1 972 359 8160
fax:   +1 972 359 0270

1527 Northaven Drive
Allen, Texas 75002-1648 USA