[sc34wg3] Draft Reference Model

16 Nov 2002 20:32:13 -0600

Sam Hunting <shunting@etopicality.com> writes:

[Bernard Vatant]
> > 2. Subject Identity Discriminating Properties
> > (SIDPs) vs Other Properties (OPs)

> > It is, if I understand it correctly, the coolest
> > thing in the whole proposal, and the way to settle
> > the whole identity-names-scope debate.

I'm very glad you like this.  The Reference Model is
intended to help us communicate with each other very
clearly about what each of us thinks the SAM really
should be, so that when consensus is achieved, the
intent of the consensus, as expressed, cannot be
misinterpreted by any of us.  The RM4TM provides a
discipline that requires every SAM design decision to
be documented from several perspectives.  (I hope
everybody feels that that's a good and valuable thing.
Personally, I think it's essential.)

> > 3. Minor various comments

> > > 3.5.1.3   Well-formed node Case 3 ("a-node")
> > > 3.5.1.3.1.2   The node serves as the A endpoint of two or more AC arcs.

> > Why "two or more"? There are many cases of assertions with a single role type (take
> > "sibling" for example)

> Jan and Steve can jump in here and rectify any
> confusion I cause. However:

> We felt that a relation with only one member was, by
> definition, not meaningful. [...]

> > Is this case ruled out by the model? I would
> > suggest "one or more" here

We've had numerous discussions about this, and the
following viewpoint has (so far, anyway) prevailed
consistently:

  A relationship *type* that has only one possible
  membership isn't a type of relationship at all.  In
  an instance of such a relationship type, nothing can
  have a relationship with anything else, by
  definition, so it's not a relationship.

  In the realm of syntax, however, it's frequently the
  case that a link points somewhere else to establish a
  relationship between the link and whatever the link
  is pointing at.  Just by looking at the syntax of the
  link, we might be fooled into thinking that,
  structurally speaking, there's only one role player
  and only one role type.  But, in fact, that's not the
  case, really.  The relationship is between two
  things: (1) the thing pointed at, and (2) the link
  that's doing the pointing.  In RM4TM terms, this
  relationship would have two role types, one for the
  link, and one for the thing pointed at.  So, it's a
  two-role relationship, even if it appears, in the
  syntax, to have only one role type and one role
  player.  (The other role type and role player are
  implicit in the syntax, but everything must be
  explicit in the topic map graph, so, in the graph,
  you must have two role types with a role player for
  each one.)
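
Here is a minimal Python sketch of the two-role reading. It assumes a hypothetical in-memory representation in which an assertion is an optional type plus a mapping from role type to role player. The assertion type name "link-target", the role type names "link" and "target", and the helper link_to_assertion are all invented for illustration; none of them is defined by the RM4TM or by XTM.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Assertion:
    # Hypothetical assertion: an optional assertion type plus a mapping
    # from each role type to the node that plays it.
    assertion_type: Optional[str]
    roles: Dict[str, str] = field(default_factory=dict)


def link_to_assertion(link_node: str, target_node: str) -> Assertion:
    """Make explicit what a one-ended link leaves implicit in the syntax.

    In the syntax only the target seems to play a role; in the graph the
    link itself is the second role player, so the result has two role types.
    """
    return Assertion(
        assertion_type="link-target",  # hypothetical assertion type
        roles={"link": link_node, "target": target_node},
    )


a = link_to_assertion("link-42", "chapter-3")
print(sorted(a.roles))  # ['link', 'target']
```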

If a relationship type has two possible role types, and
an instance of that relationship type has a role player
for only one of the role types, that's OK.  In fact,
it's a frequent state of affairs in the real world: the
relationship exists, and one of the role players
exists, but there's no role player for the other role.
The RM4TM has no problem with this.
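
A minimal sketch of this "missing role player" case follows. It assumes a hypothetical table, DECLARED_ROLE_TYPES, in which each assertion type lists the role types it defines; the "author-work" type and its role names are made up for the example. The check rejects a player cast in an undeclared role type, but a declared role type that has no player passes.

```python
from typing import Dict, Set

# Hypothetical declaration: the role types that each assertion type defines.
DECLARED_ROLE_TYPES: Dict[str, Set[str]] = {
    "author-work": {"author", "work"},
}


def castings_ok(assertion_type: str, roles: Dict[str, str]) -> bool:
    """Check an assertion's role-type -> role-player mapping.

    Every player must be cast in a role type that the assertion type
    declares, but a declared role type may have no player at all.
    """
    return set(roles) <= DECLARED_ROLE_TYPES[assertion_type]


# An anonymous work: the relationship and the work exist, but no node
# plays the "author" role.
print(castings_ok("author-work", {"work": "beowulf"}))  # True
# A player cast in an undeclared role type is rejected.
print(castings_ok("author-work", {"editor": "x", "work": "beowulf"}))  # False
```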

> > > 3.5.1.4   Well-formed node Case 4 ("c-node")
> > > 3.5.1.4.1.3   The node serves as the C endpoint of a single CR arc.

> > That means role type is mandatory. I'm very happy
> > with that, vs <roleSpec> being optional in XTM 1.0.
> > OTOH assertion type is still optional ...

Yes.  The syntax doesn't necessarily have to change,
however.  There are several ways to handle this, and we
should consider all of them carefully.  For example,
the Syntax Processing Model for XTM can supply an
"unspecified role type" role type, but this may have
undesirable consequences for the merging of assertions.
Not to worry, though; there are other approaches, too.

> > > 3.5.1.3   Well-formed node Case 3 ("a-node")
> > > 3.5.1.3.1.3 The node may or may not serve as the
> > >             A endpoint of one AT arc.

> > I'm curious about the rationale for making role
> > type mandatory and assertion type optional.  (BTW
> > both are mandatory in Mondeca ITM)

Here's the rationale (at least as I see it):

(1) In the graph, all role types are mandatory because
    without them, there's literally no way to tell
    which role player is playing which role.

(2) Assertion types *are* mandatory for all assertions
    that determine the subjects of any of their role
    players.  

      Here are three examples of assertion types that a
      TM Application *might* define as determining the
      subjects of one of their role players:

      (a) set-member.  Each of these affects the
          subject of the "set" role player.

      (b) class-instance.  The subject of the
          "instance" role player is (I think probably)
          affected.

      (c) topic-subjectIndicator.  The subject of the
          "topic" role player is determined by the
          information that plays the "subjectIndicator"
          role type.

    The RM4TM constrains the definitions of assertion
    types: it requires them to say how the values of
    the Subject Identity Discriminating Properties
    (SIDPs) of their role-playing nodes are affected.

    Assertion types that affect the subjects of their
    role players must be defined, and their definitions
    must meet the criteria established in the RM4TM.
    If they don't, there won't be any way for the
    merging process to know which nodes must be merged,
    and the Subject Location Uniqueness Objective
    cannot be achieved.

(3) Assertion types are *not* mandatory (but are still
    a very good idea, I think) for assertions that
    don't determine the subjects of their role players.
    As far as the RM4TM is concerned, if an assertion
    has no impact on the achievement of the Subject
    Location Uniqueness Objective, it's not very
    interesting to the RM4TM, and the RM4TM doesn't
    care very much about it.

    There is one very interesting implication of an
    assertion's being "untyped" (i.e., of its type's
    being unspecified): such
    assertions can never merge, even if they have the
    same role players playing the same role types.
    This is because, if their types are unknown, it
    cannot be known whether they are really the same.
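
The merging consequence of point (3) can be sketched as follows. This assumes a hypothetical representation in which an assertion's castings are (role type, player) pairs, and it uses a deliberately simplified merge key (same assertion type and same castings) chosen only for illustration; under the RM4TM, what actually merges depends on how the governing TM Application defines its assertion types.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple

# Hypothetical representation: castings as (role type, role player) pairs.
Casting = Tuple[str, str]


@dataclass(frozen=True)
class Assertion:
    assertion_type: Optional[str]   # may be unspecified (None)
    castings: FrozenSet[Casting]    # role types are always present


def merge_key(a: Assertion) -> Optional[tuple]:
    """Return a key shared by assertions that should merge, or None.

    An untyped assertion gets no key: with its type unknown, nothing can
    establish that it is the same assertion as any other, so it never merges.
    """
    if a.assertion_type is None:
        return None
    return (a.assertion_type, a.castings)


def merge(assertions: List[Assertion]) -> List[Assertion]:
    merged: List[Assertion] = []
    seen = set()
    for a in assertions:
        key = merge_key(a)
        if key is None:
            merged.append(a)        # untyped: always kept separate
        elif key not in seen:
            seen.add(key)
            merged.append(a)        # first typed occurrence survives
    return merged


c = frozenset({("instance", "puccini"), ("class", "composer")})
typed = [Assertion("class-instance", c), Assertion("class-instance", c)]
untyped = [Assertion(None, c), Assertion(None, c)]
print(len(merge(typed)), len(merge(untyped)))  # 1 2
```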

> > > 3.5.1.6.3   Subjects of Case 6 nodes

> > > The subject of a t-node is a class of
> > > relationship, including the roles that can be
> > > played in instances of the class, and the values
> > > that are conferred on the properties of role
> > > players by virtue of their situations as players
> > > of specific roles in instances of the class.

> > Are those the
> > "assertionPattern-role-rolePlayerConstraints" of
> > the Draft Reference Model?  Have they been moved
> > out of the graph? Or does the RM4TM leave the way
> > of expressing those constraints free?

> In order: Yes, conceptually. The RM4TM doesn't put
> them in the graph, but it's likely that applications
> will have to as part of bootstrapping. "Free" within
> the constraints of defining an application.

Basically, under this draft RM4TM, the SAM can be
anything we want it to be, except ambiguous.  Whatever
we decide the SAM is, we have to document it, and the
RM4TM establishes criteria that such documentation must
meet.  But the RM4TM basically doesn't constrain what's
being documented.

The SAM is free to establish one or more assertion
types that determine the subjects of assertion types.
The SAM is also free *not* to do that, and instead to
provide its own limited set of built-in assertion
types.  If the SAM allows topic map authors to create
association types, however, as the HyQ and XTM syntaxes
do, I think the SAM probably must define some
assertion/association types that allow the subjects of
assertion types to be determined by means of them.  One
such association/assertion type could be the one that
you're mentioning:
"assertionPattern-role-rolePlayerConstraints".

> > > 3.6.4.2   Semantics of role playing
> > > 3.6.4.2.1 No multiple role players of a single
> > >           role type
> > > Note 21: However, the subject of a role player
> > >          can be a group of subjects ...

> > I'm uneasy with that. Having several subjects
> > playing the same role in an assertion looks to me
> > more natural than having to create first a subject
> > which is a group of subjects ...  If I think I am
> > linked to my children by a "father-child"
> > relationship, do I have to consider them first as a
> > group? Or, if I don't want that, must I split this
> > assertion into as many assertions as I have
> > children?

> > I would like the rationale of Note 21 to be
> > expanded, using this father-child relationship as
> > an example.

Here is the text of Note 21:

Note 21: However, the subject of a role player can be a
         group of subjects, if the governing TM
         Application defines the assertion types
         required to allow the subjects of nodes to be
         groups of subjects.

         No grouping semantics of any kind are defined
         by this RM4TM. This RM4TM requires all groups
         to be explicitly represented as nodes. Any
         other approach would open the possibility for
         knowledge about a group to fail to be
         connected to the single node whose subject is
         the group, and that would be contrary to the
         Subject Location Uniqueness Objective.

First of all, remember:

  ***************************************************
  *                                                 *
  *  The Subject Location Uniqueness Objective is   *
  *  to have one single subject per node, and for   *
  *  every participating subject to have one        *
  *  single node, even after any number of diverse  *
  *  topic maps have been merged together.          *
  *                                                 *
  ***************************************************

The first paragraph of the note says that if TM
Applications need to allow groups of subjects to play
single roles in single assertions, then they must
define assertion types, like "set-member", that are
used in the graph to establish the memberships of
subjects in groups.

The second paragraph of the note says that, in the case
of your "group of children" example, a node that has
the group as its subject must exist, because if it
doesn't, the Subject Location Uniqueness Objective
cannot be achieved.  Consider this scenario: The group
of children that is a role player in your assertion can
also be a role player in another assertion that you
don't know about, but which appears in another topic
map.  When that other topic map is merged with your
topic map, perhaps by a third party unknown to you,
what should happen?  The result can't be two subjects
(the group of children, twice) when there is really
only one subject (the group of children, once).  And we
can't have a subject (the group of children) that
participates in a topic map, but which has no node,
because every subject that participates in a topic map
graph must have a corresponding topic node.  
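
A minimal sketch of this scenario follows. It assumes, hypothetically, that the TM Application defines "set-member" so that a group's subject is determined by its membership (as with the "set" role player mentioned above), so group nodes can be keyed by their member sets. The helper merge_maps and the member names are invented; this is not an RM4TM-defined operation.

```python
from typing import Dict, FrozenSet, List

# Hypothetical fragment: each group node is identified by its members
# (established through "set-member" assertions) and carries the other
# assertions in which the group plays a role.
Group = FrozenSet[str]


def merge_maps(*maps: Dict[Group, List[str]]) -> Dict[Group, List[str]]:
    """Merge maps so that each group subject ends up with exactly one node.

    Two group nodes with the same membership have the same subject, so they
    are merged and the knowledge attached to each is pooled on one node.
    """
    merged: Dict[Group, List[str]] = {}
    for tm in maps:
        for members, facts in tm.items():
            merged.setdefault(members, []).extend(facts)
    return merged


children = frozenset({"alice", "bob"})
yours = {children: ["plays 'child' in father-child with 'bernard'"]}
theirs = {children: ["plays 'pupil' in school-enrolment with 'school-7'"]}

result = merge_maps(yours, theirs)
print(len(result), len(result[children]))  # 1 2
```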

The syntax can hide all this, of course.  (In the XTM
syntax, all this is certainly well-hidden.  And that's
probably why the XTM syntax is so intuitive for so many
people!)  Under the draft RM4TM, the Syntax Processing
Model defined by the SAM for XTM must specify how to
make what was only implicit in the XTM instance, such
as groups that play roles collectively, explicit in the
graph.
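
One way a Syntax Processing Model might do this is sketched below, assuming a loosely XTM-like input in which one role can list several players. The "set" role type comes from the set-member example above; the "member" role type, the group-node naming scheme, and the helper make_groups_explicit are hypothetical.

```python
from typing import Dict, List, Tuple

Assertion = Tuple[str, Dict[str, str]]  # (assertion type, role type -> player)


def make_groups_explicit(assoc_type: str,
                         roles: Dict[str, List[str]]) -> List[Assertion]:
    """Rewrite an association whose roles may have several players.

    Each role type gets at most one player in the output. Where the syntax
    listed several players for one role, a group node is introduced and
    linked to each member by a "set-member" assertion.
    """
    out: List[Assertion] = []
    castings: Dict[str, str] = {}
    for role_type, players in roles.items():
        if not players:
            continue                          # role with no player is allowed
        if len(players) == 1:
            castings[role_type] = players[0]
            continue
        group = "group(" + ",".join(sorted(players)) + ")"  # hypothetical id
        for member in players:
            out.append(("set-member", {"set": group, "member": member}))
        castings[role_type] = group
    out.append((assoc_type, castings))
    return out


for a in make_groups_explicit("father-child",
                              {"father": ["bernard"], "child": ["alice", "bob"]}):
    print(a)
```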

> Here I must defer as well, but with a few comments:

> 1. Personally, I don't think it's harder in the
> implementations I've done, but it does take a bit of
> mental reversal.

I think we'd all like topic maps to be fully
interchangeable, unambiguous, and platform-neutral,
because such a state of affairs will provide a platform
on which a huge knowledge-integration industry can
flourish.  I think of this draft of the RM4TM as the
first draft of a Java-like specification that describes
something like the instruction set for the Java Virtual
Machine.

It would not be entirely incorrect to regard the RM4TM
as establishing something like an assembly language for
a metaphorical "topic maps machine" that has only eight
instructions.  I would characterize the development of
a Syntax Processing Model, such as the Syntax
Processing Model for XTM, as being similar to the
development of a compiler for a high level programming
language that outputs code suitable for a RISC machine
that has only eight instructions -- the Eight Forms of
Connectedness.  It's not hard, really, but the idiom of
the "Eight Forms" assembly language, even with its
"assertion subgraph" macros, may take a bit of getting
used to.  That's always true for assembly languages.
Assembly languages must always make everything
explicit.  That's why we use compilers: high-level
languages allow us to ignore the complexity of managing
the total explicitness required by CPUs.
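
To illustrate the "compiler" analogy, the sketch below expands one high-level association into an explicit list of labelled arcs. The AT, AC and CR labels follow the draft's names quoted earlier in this thread; "CX", used here for the arc from a casting node to its role player, is an assumed name, and the node-id scheme is invented. The AT arc is omitted when the assertion type is unspecified, mirroring the draft's "may or may not" wording for AT arcs.

```python
from itertools import count
from typing import Dict, List, Optional, Tuple

Arc = Tuple[str, str, str]  # (arc label, from-node id, to-node id)


def compile_assertion(assertion_type: Optional[str],
                      roles: Dict[str, str]) -> List[Arc]:
    """'Compile' one association into explicit arcs, assembly-style."""
    ids = count(1)
    a_node = f"a{next(ids)}"                  # node for the assertion itself
    arcs: List[Arc] = []
    if assertion_type is not None:            # AT arc only if typed
        arcs.append(("AT", a_node, assertion_type))
    for role_type, player in roles.items():
        c_node = f"c{next(ids)}"              # one casting node per role type
        arcs.append(("AC", a_node, c_node))
        arcs.append(("CR", c_node, role_type))
        arcs.append(("CX", c_node, player))   # assumed arc name
    return arcs


for arc in compile_assertion("class-instance",
                             {"class": "composer", "instance": "puccini"}):
    print(arc)
```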

> 2. "create first" suggests to me that you are
> thinking in terms of the process by which the graph
> structure is created, rather than the graph structure
> itself (front end vs. back end).

The RM4TM doesn't constrain how graphs are created, or
in what order their nodes and arcs are added to them
while they are under construction.  It only defines the
terms "not well-formed," "well-formed" and "fully
merged", saying when each of those terms can be used to
characterize a topic map graph.
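
A minimal sketch of the "fully merged" test, assuming (as a simplification not taken from the draft) that each node's SIDP values can be reduced to a single hashable value. Under that assumption, a graph is fully merged when no two nodes share the same SIDP values, i.e., when no subject is still represented by more than one node.

```python
from typing import Dict, Hashable


def is_fully_merged(sidps: Dict[str, Hashable]) -> bool:
    """Hypothetical test: True when no two nodes carry the same SIDP values."""
    values = list(sidps.values())
    return len(values) == len(set(values))


print(is_fully_merged({"n1": ("puccini",), "n2": ("verdi",)}))    # True
print(is_fully_merged({"n1": ("puccini",), "n2": ("puccini",)}))  # False
```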

-- Steve

Steven R. Newcomb, Consultant
srn@coolheads.com

Coolheads Consulting
http://www.coolheads.com

voice: +1 972 359 8160
fax:   +1 972 359 0270

1527 Northaven Drive
Allen, Texas 75002-1648 USA