[sc34wg3] Illustrating SIDPs

Tue, 11 May 2004 12:43:17 +0200

On Mon, May 10, 2004 at 07:58:04AM -0400, Patrick Durusau wrote:
> I think yet another confusion is about to bite the dust! Read on:

Yes, we are getting closer! But the mails are getting longer. :-)

> The difference with your suggestions based on TMCL, is that the TMRM 
> would allow the topic map proper, separate and apart from TMCL, to 
> define such identities. That is to say that identity is defined in the 
> topic map and not some additional mechanism.

Well, unfortunately, TMRM does NOT provide any means for expressing
'disclosure' (read 'identity determination and merging rules'). It
only seems to say that all applications have to do it in some way. And
it simply uses properties (and functions thereof) to express when
topics should be regarded as being about the same subject.

> But the TMRM says that it is precisely that sort of identity that
> should be able to be captured.

> >TMRM is 'property' oriented. That is nice and easy for many cases, but this
> >cannot be regarded as 'arbitrarily complex'.
> >
> 
> Here is the source of confusion I alluded to above. When you say 
> "property," it appears that you are excluding "functions over topic 
> properties". Yes?

> The TMRM is using 'property' in the sense that includes "functions over 
> topic properties", including a topic's participation on various parts of 
> assertions in the topic map.

No, no, I understand that it also includes functions over properties.
What disturbs me from an architectural viewpoint is that it has a
built-in sound barrier.

Putting topics into equivalence classes can be more general than just
looking at the properties (or combinations thereof). Not that I would
expect TMCL to cover things like these, but consider a class-inducing
rule like:

 "all topics which are associated with the same number of topics (which are
  themselves not instances of persons) having an even number of properties not
  matching particular values .........are to be regarded the same"

Absurd maybe, but probably not expressible as a function of topic
properties.
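
To make the contrast concrete, here is a rough sketch in Python. The
toy map, the topic ids and the helper names are all invented for
illustration, and the second rule is a simplified version of the one
above (the "not matching particular values" part is left out). The
first key looks only at a topic's own properties; the second has to
walk the associations and inspect the neighbouring topics. Whether the
second still counts as a "function of topic properties" is exactly the
open question; the sketch only shows what each rule has to read.

```python
from collections import defaultdict

# Hypothetical toy map; every name and value here is made up.
topics = {
    "t1": {"person": True,  "props": {"email": "a@example.org"}},
    "t2": {"person": True,  "props": {"email": "a@example.org"}},
    "t3": {"person": False, "props": {"name": "X", "url": "http://example.org/x"}},
    "t4": {"person": False, "props": {"name": "Y"}},
    "t5": {"person": False, "props": {}},
}
associations = [("t1", "t3"), ("t2", "t5"), ("t3", "t4"), ("t4", "t5")]


def neighbours(tid):
    # Topics directly associated with tid.
    return [b if a == tid else a for a, b in associations if tid in (a, b)]


def email_key(tid):
    # Property-based rule: reads only the topic's own properties.
    # Assumption: a topic without an email stays in a class of its own.
    return topics[tid]["props"].get("email", ("no-email", tid))


def neighbour_key(tid):
    # Simplified map-wide rule: count the associated topics that are not
    # persons and that have an even number of properties.
    return sum(
        1
        for n in neighbours(tid)
        if not topics[n]["person"] and len(topics[n]["props"]) % 2 == 0
    )


def classes_by_key(key):
    # Topics with the same key value fall into the same equivalence class.
    groups = defaultdict(set)
    for tid in topics:
        groups[key(tid)].add(tid)
    return sorted(sorted(g) for g in groups.values())


print(classes_by_key(email_key))
print(classes_by_key(neighbour_key))
```

Both rules induce a partition of the topics, but only the first can be
evaluated by looking at one topic in isolation.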

--

Some people might reasonably ask why we allowed this to happen: a
TMRM island (with a non-formalized half-bridge) and a TM?L island with a
completely separate, incompatible formalism.  I would not be able to
argue against that.

> Formalism would help, but only if we also have agreed upon terms to 
> describe what appears in the formalism.

That's my point. The lack of a proper formalism is the source of
confusion.  We as the TM community have - whatever the history might be -
allowed ourselves to work in fluff-space for too long. It costs enormous
amounts of energy to explain ourselves to each other.  So our
advantage, that most of us are pragmatists, may also turn against us.

> >>The TMRM is not defining a syntax but the rules for a disclosure 
> >>statement, on which a syntax would be based.
> >>
> >>In other words, the TMRM allows disclosure of the basis for identity 
> >>that must underlie any equivalence or other functions.
> >
> >
> >... of course you say that. But to me this sound as saying "the nice

This should have been:

 "Of course you CAN say that."

--

> No, to some degree it is a question of 'where' it is done. Why repair 
> the weakness of a topic map to declare rules for identity by adding 
> TMCL? Why not simply have a model for topic maps that avoids the 
> weakness altogether?

This would mean that we leave identity (and connected merging rules) out
of the TMRM? I would certainly buy that!

TMRM would then capture all possible forms topic maps could
potentially have, without any constraint. TMRM would describe the
fundamental set of all possible models (in the mathematical sense,
now).

TMCL (probably a more primitive version of it) could then remedy the
lack of notation. TMCL itself would be directed to the end user
(ontology engineer).

> Certainly, some people may wish to have topic maps that require the use 
> of TMCL in order to have the identity rules they require. I don't see 
> the rationale for forcing that choice on everyone who wants to use topic 
> maps.

But according to your model you are forcing everyone into two
completely incompatible boats. Worse, you force TM vendors into
supporting a 'disclosure policy' AND supporting a TMCL implementation
which basically could do the same. The only argument we could have in
this area is that we say "property based identity and merging is so
common, that implementors should bias their implementations". But
supporting two different ways .... I would not implement that.

> I read "more formal, explicit way" to say you are designing a
> syntactical solution.

No, TMCL will have a syntax (maybe several, because different
communities are involved) AND it will have to have a semantics. The
latter will define which TMCL statements for which concrete maps
(models in the mathematical sense, not in the sense people use here)
will be true or false (or undecidable if TMCL is too expressive).
This semantics may be defined in terms of TMRM, but only if it is
MUCH, MUCH simpler than it is now.

A similar case can be made for TMQL.

> Sure, but you have to have some model upon which that syntax is
> being developed. There is always redundancy between a model and a
> syntax based upon it but that does not mean that the model is a
> second class citizen or not needed in some way.

Obviously, you are using a different concept of 'model'. As I said,
formalization helps.

> The other problem, as I noted above, is why not allow a topic map to 
> define the very rules of identity that you want to put into TMCL?

Just as a clarification: The plan is not to put the identity concept
into TMCL, but to allow the application developer to express the
identity with a TMCL statement.

> Other than following one notion of how topic maps ought to be 
> implemented, what is the advantage in that approach?

If it succeeds, then we have ONE formalism to express what a 'TM application'
is. And not 1 + 0.5 formalisms.

> NOTE: We should have a standard for implementing topic maps a la XTM. 
> But, that standard should also have a reference model that allows the 
> construction of topic maps and topic maps software that do not rely upon 
> XTM. XTM and its predecessor, HyTM, are, after all, only interchange 
> syntaxes that represent a way to exchange topic maps. You can process 
> topic maps using those syntaxes but that does not mean they define what 
> it means to be a topic map. (And yes, topic maps based on a reference 
> model would have topics, associations, occurrences, etc., but also the 
> robust identity rules that you want to place in TMCL.)
> 
> 
> >Yes, you are right that we are mixing levels, i.e. using TMCL now for
> >parts where TMRM has put in a claim. Maybe it is TMCL which has to be
> >at two levels:
> >
> >  - level one as a language to define 
> >
> >    - what actually properties are in terms of associations. So, for
> >      instance, a property 'email'
> >
> >      $t.email   <=>    $t -> entity \ has-email-address / email
> >
> >      (saying that any topic which is involved in a 'has-email-address' 
> >      association
> >      with the proper roles can be regarded to have an 'email' property)
> >
> >    - what derived properties are:
> >
> >      $t.age           <=>   now - $t.born
> >
> >    - what derived identity can be:
> >
> >      ident ($a, $b)   <=>   $a.email eq $b.email
> >
> >    - what identity can also be:
> >
> >      ident ($a, $b)   <=>   ...
> >
> 
> Rather than saying that you are using TMCL where the TMRM has put in a 
> claim I would say that TMCL is addressing a weakness in the current 
> model that the TMRM does not have.

Mumble :-)
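
For what it is worth, the level-one rules quoted above (a property
defined via an association, a derived property, a derived identity)
can be mimicked in a few lines of Python. This is only a sketch under
assumptions: the association records, the 'born' table and the topic
ids are invented, and the decision that two topics without any email
address are NOT identical is mine, not something the notation says.

```python
# Hypothetical data: associations as (type, {role: player}) records.
associations = [
    ("has-email-address", {"entity": "alice", "email": "alice@example.org"}),
    ("has-email-address", {"entity": "a.smith", "email": "alice@example.org"}),
    ("has-email-address", {"entity": "bob", "email": "bob@example.org"}),
]
born = {"alice": 1970}  # invented birth years


def email(t):
    # $t.email  <=>  $t -> entity \ has-email-address / email
    return {
        roles["email"]
        for kind, roles in associations
        if kind == "has-email-address" and roles.get("entity") == t
    }


def age(t, now):
    # $t.age  <=>  now - $t.born
    return now - born[t]


def ident(a, b):
    # ident ($a, $b)  <=>  $a.email eq $b.email
    # Assumption: topics without any email address are never identified.
    ea, eb = email(a), email(b)
    return bool(ea) and ea == eb


print(ident("alice", "a.smith"))
print(ident("alice", "bob"))
print(age("alice", 2004))
```

The point is not the Python, of course, but that each rule is a small
declarative statement which a constraint language could carry directly.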

> Yes, the TMRM deliberately lacks a syntax (at the urging of the WG as I 
> recall) and nothing in it compels someone to construct the identity 
> rules entirely in a topic map.

This sounds like a rather cumbersome process: You would have to use a
syntactic structure such as a topic map to define application-specific
rules. No operators, no quantifiers; all the millions of man-years
spent developing logic systems are simply ignored.

> You could build something like TMCL to 
> handle some parts (or all I suppose) of the identity question.

Yup.

> Stepping aside from the TMRM for a moment, recall that we discussed in 
> Amsterdam the need to have a "reference model" as the common basis for 
> TMCL/TMQL, etc., and have a workshop set for Montreal. What I would 
> suggest is that a "reference model" that provides the framework for 
> however one wishes to allocate the resolution of the identity issue is 
> the goal of that exercise.

I personally would love to be there, but I can't.

> Certainly, anyone can use TMCL to enforce identity rules if they like, 
> but I have yet to hear an argument that such rules could not properly be 
> part of a topic map or topic maps software.

That is a killer argument. Any computable recipe can be put into a
program of a Turing-equivalent language. But why then are there
languages like XPath or OWL? They are convenient, designed for a
particular job, and not Turing-equivalent.

And implementing something in a Java program is not what I would see
as 'disclosure'. Java programs are information-sinks by definition :-)

> Well, ultimately in any software process all you can do is compare strings.

And in ANY formalism you can ultimately only compare symbols.

> I think we are very close on the notion of what goes into determining 
> identity but fairly far apart on where that should be happening. My 
> preference is to have a reference model that enables that question to be 
> resolved as fits a particular situation.

I completely agree with the goal....

> Having a syntax/formalism is helpful but only just, since as proposed 
> (TMCL), it presumes a weakness that is not inherent in the notion of 
> topic maps.
> 
> Why not abstract out the formalism of TMCL that is not based on 
> remedying that weakness and propose it to the 'reference model' 
> workshop?

....but I have no idea what precisely this would mean.

--

My suggestion is still (modulo some technicalities):

  (a) to factor out of TMRM the 'topic-mappish' way to represent information;
    this provides us with all possible forms topic maps could take

  (b) to recapture the TMRM 'disclosure' requirements and formalize them into
    a "low-level" TMCL variant

  (c) to build the (high-level) TMCL semantics by defining it
    either directly based on (a) or in terms of (b), or via some other
    abstraction mechanism.

\rho