[sc34wg3] RM4TM SLUO : Objective or Requirement?

Sat, 23 Nov 2002 02:42:27 -0500 (EST)

On Fri, 22 Nov 2002, Bernard Vatant wrote:

> Some thoughts about SLUO ...
> 
> In the introduction:
> 
> "Many of the key advantages of the Topic Maps paradigm derive from the
> achievement of its primary objective, the "Subject Location Uniqueness
> Objective", which is to make everything known about every subject in a
> topic space accessible from a single location within that space."

This is a more rigorous way of stating the objectives of the original use
case for topic maps, which was to enable master indexes for independently
maintained and constantly changing technical manuals. In the days when
indexes were created on "index cards":

    Some indexers wait until all the cards are done. Then, on one card,
    they:

    * Combine duplicate entries
    * Eliminate synonymous entries
    * List all page numbers for an entry in correct order
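
The indexer's consolidation pass can be sketched in Python. This is a
hypothetical illustration only, not anything defined by RM4TM or ISO
13250: the cards are assumed to be (term, page) pairs, and the synonym
table and the `consolidate` function are invented for the example.

```python
from collections import defaultdict

# Hypothetical index cards: (entry term, page number).
cards = [
    ("topic map", 12),
    ("topic maps", 3),  # duplicate entry under a different inflection
    ("TM", 40),         # synonym of "topic map"
    ("topic map", 7),
    ("subject", 5),
]

# Hypothetical synonym table: variant term -> preferred term.
synonyms = {"topic maps": "topic map", "TM": "topic map"}


def consolidate(cards, synonyms):
    """Put each entry on one 'card', with all its page numbers in order."""
    merged = defaultdict(set)
    for term, page in cards:
        preferred = synonyms.get(term, term)  # eliminate synonymous entries
        merged[preferred].add(page)           # combine duplicate entries
    # List all page numbers for each entry in ascending order.
    return {term: sorted(pages) for term, pages in sorted(merged.items())}


print(consolidate(cards, synonyms))
# {'subject': [5], 'topic map': [3, 7, 12, 40]}
```

After the pass each entry appears on exactly one card, the index-card
analogue of one node per subject.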

So we see that seeking the Subject Location Uniqueness Objective is
independent of medium (or, to put this another way, it would be possible
to create a topic map application entirely on index cards (!) -- a
suggestion I owe entirely to Patrick Durusau). All uncontroversial, I
should think...

> And further on:
> 
> 3.4.1   One subject for each node

[Because the statements below are referred to later, I'm highlighting
them with asterisks.]

*****[1]*********
> "In topic map graphs, only nodes can represent subjects, and every node represents a
> single subject." [1]
*****************

> Question en passant: Why not use the word "topic" instead of "node"
> throughout? 

Steve will correct me, but in the same way we tried to reserve association
for the SAM, and assertion for the RM, we tried to reserve topic for the
SAM, and node for the RM. Also, graphs have nodes (unless they have
vertices ;-)

> To what extent is the above different from all the existing
> prose in ISO 13250, XTM 1.0, Published Subjects TC ... "In a topic
> map, a topic is the formal representation of a single subject". The
                  ^^^ if it is "the", rather than "a", then that
                      is seeking the SLUO, is it not?    

> notion of having nodes in the TMG representing "implicit" subjects
> that are not topics in the corresponding topic map is IMO extremely
> confusing and hard to grasp.

"Confusing and hard to grasp" is a good description of set theor[y|ies],
at least to me -- a communication problem to be remedied with editorial
tools. Suggestions as to sections to change?

> Now a core issue ...
> 
> If I understand the SLUO correctly, from the RM prose and Steve
> Newcomb's recent comments, SLUO is not expressed by 3.4.1 but by the reverse:

We haven't written the glossary yet ...

****[2]********
> "In a topic map graph, every subject is represented by a single node"
> [2]
***************

> That I can't find anywhere explicitly expressed in the document - did
> I miss it? I assume 3.4.1. means what it says. If it is intended to
> express also the SLUO, it's a bug to be fixed.

3.4.1 does mean what it says -- "every node represents a single subject."

See also 3.9:

   In a well-formed topic map graph, every node represents a single
   subject, but some subjects may be represented by more than one node. In
   a fully merged topic map graph, every subject is represented by a
   single node. 

In my view, only a fully-merged topic map graph would *fully* meet the
Subject Location Uniqueness Objective.
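
The difference between a well-formed and a fully merged graph can be put
as a simple check. In this minimal sketch a graph is assumed to be a
plain dict from hypothetical node ids to the subject each node
represents, so every node represents exactly one subject by
construction; how an application decides which subject a node
represents (the job of its SIDPs) is deliberately left out.

```python
from collections import Counter

# Hypothetical graph: node id -> the single subject that node represents.
# Mapping each node to exactly one subject models well-formedness
# ("every node represents a single subject").
graph = {
    "n1": "Plato",
    "n2": "Halley's Comet",
    "n3": "Plato",  # a second node for the same subject: not yet merged
}


def duplicated_subjects(graph):
    """Return the subjects that are represented by more than one node."""
    counts = Counter(graph.values())
    return {subject for subject, count in counts.items() if count > 1}


def is_fully_merged(graph):
    """True when every subject is represented by a single node."""
    return not duplicated_subjects(graph)


print(duplicated_subjects(graph))  # {'Plato'}
print(is_fully_merged(graph))      # False
```

This graph is well-formed but not fully merged: it is still usable, it
simply does not yet fully meet the SLUO for "Plato".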

NOTE: A well-formed topic map graph IS STILL EXTREMELY USEFUL. 

An objective like the SLUO is just that -- a goal, an end, a telos, that
for which we strive. In the real world of engineering constraints,
trade-offs must be made, but that doesn't mean that the overall objective
is not sound.

****[3]**********
> OTOH if SLUO is only an Objective, it should be expressed by:
> 
> Recommendation: "In a topic map graph, every subject *should be* (as
> far as possible) represented by a single node" [3]
******************

It is the purpose of clause 5 to give applications the tools to define,
operationally, what "as far as possible" means. Such being the case, I
don't see the reason for this addition.

Let me take the position that even if "should" were to be used as in [3],
"shall" should be used for the SLUO (though at present it is not).

If you don't have the Subject Location Uniqueness Objective, you don't
have a topic map, any more than the indexer would if she had two cards for
the same entry.

SGML provides an instructive precedent. See for example 8879, clause
15.2.3:

    A conforming SGML application's documentation *shall* meet the
    requirements of this international standard. (See 15.5)

And 8879 clause 15.5:

    The *objectives* [clause 0.2] of this international standard will be
    met most effectively if users, at all levels, are aware that SGML
    documents conform to an International Standard that is independent of
    any application or parser. The documentation of a conforming SGML
    system or application *shall* further such awareness.

[italics mine] So, what do you mean, "only" an objective? The objective
is the raison d'etre of the entire standard -- it needs a "shall" not a
"should."

> But it seems that the SLUO is indeed a fundamental Requirement (in
> other words, an Axiom of the model), 

I am not sure "axiom" is the right word. An objective is a goal, something
to achieve through effort. I don't think that equations can be said to
have goals, or to achieve anything, since, as I understand it, they do
not exist in time.

> if I understand Steve Newcomb's various recent comments correctly. If it
> is, it has to be written clearly as such, and the various other
> requirements of the RM somehow derived from or at least proven
> consistent with it.

Again, I am not comfortable with the mathematical language. It is hard
enough to prove a program consistent, let alone an international
standard. Maybe we could use a formal specification language, like Z,
which would have the merit of offering a "proof," but would violate
Biezunski's principle -- "there's no point writing a standard that no one
can understand."

As for writing clearly -- there is always room for improvement.

> Well, now my view on that:
> 
> [1] has to be taken as a pragmatic definition. A node, like a topic,
> is intended to represent a single subject. But what this subject *is*
> no one can really tell. This is IMO the common pragmatic approach in
> real-world TM applications so far.

"is" is one of those philosophical words, and Zen masters and Plato, on
whatever plane they exist today, can argue about it....

In fact, RM4TM takes a pragmatic approach -- the SIDPs are there so that
*for the purposes of an application* what a subject "is" *can* be known.
If we grant the use of the word "ontology", the topic map ontologies (see
note 27) define the "is"-ness of subjects.

> For [2] ... in controlled environments where TM have been developed,
> the one-to-one correspondence topic-subject is assumed, 

I don't think it ought to be "assumed" -- I think it ought to be specified
in an application definition.

> but people are well aware of the fact that identifying the same
> subject from distributed sources is difficult to achieve, even if
> those sources are ontologies from the same industry.

And? 

Another way of saying "difficult to achieve" is "opportunity for
profit," is it not?

Again, realize that the RM does NOT say that a topic map that does not
fully achieve the SLUO is not useful, or not valid. 

> So [2] seems not only impossible to
> achieve in practice, but seems to express a fundamentalist approach of
> subjects 

Not so. ("fundamentalist," forsooth?!?) See section 4.3.2:

    4.3.2    Subject identity is the values of SIDPs

    All merging rules defined by a TM Application must serve the Subject
    Location Uniqueness Objective, and all must be expressed entirely in
    terms of the values of the SIDPs defined by that TM Application. TM
    Applications must define sufficient SIDPs, and constrain the
    calculations and assignments of their values, in sufficient detail to
    support all of the merging rules defined by the TM Application. 

"... defined by a TM Application ... defined by that TM Application. TM
Applications must define ... defined by the TM Application."

One way of looking at the RM is as enabling the practice that will
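application ("pragmatic") level.

A merging rule "expressed entirely in terms of the values of the SIDPs"
might look like the following sketch. Everything in it is an assumption
for illustration: the SIDP name `identifiers`, the rule that nodes
sharing any identifier value represent the same subject, and the
hypothetical `merge_groups` helper, which uses a union-find structure to
collect the nodes that must merge. The sample data echoes the Halley's
comet example discussed in this thread: the apparitions of 1531, 1607
and 1682 start out as distinct nodes, and a later node carrying all
three values causes them to merge.

```python
# Hypothetical SIDP values: each node carries a set of values for one SIDP,
# here called "identifiers" (a name assumed for this sketch, not from RM4TM).
nodes = {
    "a": {"identifiers": {"comet-1531"}},
    "b": {"identifiers": {"comet-1607"}},
    "c": {"identifiers": {"comet-1682"}},
}


def merge_groups(nodes):
    """Return the groups of nodes that must merge under one hypothetical
    rule: two nodes represent the same subject if their "identifiers"
    values intersect. The rule consults nothing but SIDP values."""
    parent = {n: n for n in nodes}

    def find(n):
        # Follow parent links to the group's root, halving the path as we go.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    owner = {}  # identifier value -> first node seen carrying it
    for node, props in nodes.items():
        for value in props["identifiers"]:
            if value in owner:
                # Shared value: join this node's group to the owner's group.
                parent[find(node)] = find(owner[value])
            else:
                owner[value] = node

    groups = {}
    for node in nodes:
        groups.setdefault(find(node), []).append(node)
    return sorted(sorted(g) for g in groups.values())


print(merge_groups(nodes))  # [['a'], ['b'], ['c']]: nothing merges yet

# A later node asserts that all three apparitions are one comet.
nodes["d"] = {"identifiers": {"comet-1531", "comet-1607", "comet-1682"}}
print(merge_groups(nodes))  # [['a', 'b', 'c', 'd']]
```

Whether "a" and "b" represent the same subject is decided here by the
application's SIDP values alone; the sketch never asks what the subject
"is."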
achieve [2] ("every subject is represented by a single node") at the
application ("pragmatic") level.

> ... There is *no way* to make sure that two distinct topics
> (nodes) do not *represent in fact the same subject* because of the
> above remark. 

In a "philosophical" sense, no -- the chasm, Plato, etc. In fact, with
subjects defining SIDPs, there is a way.

> Subjects that are considered implicitly distinct in a
> given topic map, on the basis that they are represented by distinct
> topics with distinct SIDPs, might be considered identical by another
> topic map on the basis of new discovered properties ...
> This is a frequent process in progress of knowledge that subjects
> considered as distinct at some point are discovered as being the same
> later on. Think about various historical apparitions of Halley's comet
> before Halley's discovery that they were the same one returning ...

> So the RM has to ensure that merging does not split existing subjects,

No, applications have to do that.

> but it has to allow merging of subjects considered previously as
> distinct, and admit that in many cases, the same subject will be
> represented by different nodes, because the identity of subject for
> those nodes has not yet been discovered ... In that spirit, SLUO
> should be considered only as a pragmatic guideline, and not an
> absolute Requirement.

It's an objective. (See clause 0.2 in 8879 for a good example of what an
objective is). It isn't an "absolute requirement." If it were, we would
have said that well-formed graphs that aren't fully merged are invalid, or
broken, or lesser citizens, or whatever. They aren't, and we don't.

So I hope the above discussion about the pragmatic nature of the RM
relieves some of your concerns on these points.

> This would hopefully make it possible to relax a number of the
> convoluted constraints discussed lately.

As I took issue with the word "weird", and the word "fundamentalist", so I
take issue with the word "convoluted." Words like that are a little too
abstract for my simple mind to grasp ...

Sam Hunting
eTopicality, Inc.

---------------------------------------------------------------------------
"Turn your searching experience into a finding experience."(tm)

Topic map consulting and training: www.etopicality.com
Free open source topic map tools:  www.gooseworks.org

XML Topic Maps: Creating and Using Topic Maps for the Web.
Addison-Wesley, ISBN 0-201-74960-2.
---------------------------------------------------------------------------