[sc34wg3] Re: FYI: Yet another TMRM Formalization (well, not really)

Robert Barta sc34wg3@isotopicmaps.org
Sat, 17 Jul 2004 10:51:21 +1000


On Fri, Jul 16, 2004 at 06:32:43AM -0400, Patrick Durusau wrote:
> >>Reasoning that TMCL would give one the ability to impose whatever 
> >>constraints are thought necessary but would not impose a requirement 
> >>that particular constraints be expressed.
> >
> >I do not quite understand this sentence.
> >
> What I was trying to say, perhaps in a clumsy fashion, was that in order 
> to have disclosure, we have to say what must be disclosed.

Patrick,

The way I think this may work is:

  - There is topic map data. We do not care where it comes from; it
    could be written by hand or come directly out of an RDB.
    The only requirement is that it is in assertion form, or is made
    to appear in that form by software.

  - There is an ontological description of that data in some TMCL. That
    includes type hierarchies and other constraints on the structure. It
    also includes information about _when_ particular things are to be
    regarded as 'identified', and hence as mergeable.

    Humans look at the TMCL statement(s) and say "Ahh, now I know what
    this is about". And software looks at the TMCL statement(s) and says
    "Ahh, now I know when to merge!".

I do not think that the data itself carries the information on how to
'disclose'. Rather, it is a particular application which looks at the
data through the eyes of a TMCL statement. Other applications may
have different views of the same data.

Modern databases use this concept heavily.
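
A minimal sketch of this idea, in Python rather than in any TMCL
syntax: the data is a plain list of assertions, and each application
brings its own rule for when two things count as identified. The
representation (sets of role/player pairs), the role names and the
function merge_groups are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical data: each assertion is a set of (role, player) pairs.
assertions = [
    {("person", "p1"), ("email", "rho@example.org")},
    {("person", "p2"), ("email", "rho@example.org")},
    {("person", "p3"), ("email", "other@example.org")},
]

def merge_groups(assertions, subject_role, identity_role):
    """Group players of subject_role that share a value for identity_role."""
    groups = defaultdict(set)
    for a in assertions:
        roles = dict(a)
        if subject_role in roles and identity_role in roles:
            groups[roles[identity_role]].add(roles[subject_role])
    return sorted(sorted(g) for g in groups.values())

# Application 1 treats a shared email address as identifying.
view_by_email = merge_groups(assertions, "person", "email")
# Application 2 uses nothing beyond the name itself, so nothing merges.
view_by_name = merge_groups(assertions, "person", "person")
```

The data is the same in both calls; only the rule passed in differs,
which is the sense in which the application, not the data, decides
what is merged.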

> For example, you refer below to topic maps that follow the SAME TMCL 
> statements. How am I to determine that two (or more) topic maps follow 
> the same TMCL statements?

Whoever defines the language TMCL has to define how a machine can
determine whether a given map m satisfies the conditions C given by
a set of TMCL statements. The designers of TMCL will (explicitly or
implicitly) define a relation |=, so that one can determine whether

    C |= m

is true or not.
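
As a rough illustration only (TMCL does not define its constraints this
way, and all names here are hypothetical), one can picture each
condition in C as a predicate over a whole map, with C |= m holding
when every predicate accepts m:

```python
def satisfies(constraints, tm):
    """Return True if the map tm meets every constraint in the set."""
    return all(check(tm) for check in constraints)

# Two made-up constraints over maps whose assertions are sets of
# (role, player) pairs.
def every_person_has_email(tm):
    return all("email" in dict(a) for a in tm if "person" in dict(a))

def at_most_two_roles(tm):
    return all(len(a) <= 2 for a in tm)

C = [every_person_has_email, at_most_two_roles]

m1 = [{("person", "p1"), ("email", "a@example.org")}]
m2 = [{("person", "p2")}]  # no email, so the first constraint fails

m1_ok = satisfies(C, m1)
m2_ok = satisfies(C, m2)
```

Two maps follow the same statements, in this reading, exactly when
the same C accepts both of them.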
 
> That is to say, I can look at any topic map and determine everytime, 
> what TMCL statement it follows?

That is the plan, yes.

As an aside: determining whether a given map (called an
'interpretation' in logic) satisfies a given (set of) condition(s) is
a comparatively 'cheap' operation. To _find_ a topic map for a given set
of constraints is much harder. This is the difference between 'validation'
and 'finding proofs' that TBL (Tim Berners-Lee) is talking about.
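
To make the cost difference concrete, here is a small sketch under
strong simplifying assumptions: constraints are plain predicates, and
'finding' means brute-force search over subsets of a fixed pool of
candidate assertions. Checking one map is a single pass; the search
may have to try up to 2^n subsets. All names are hypothetical.

```python
from itertools import combinations

def satisfies(constraints, tm):
    """Validation: one pass over the constraints for a given map."""
    return all(check(tm) for check in constraints)

def find_map(constraints, candidates):
    """Search: try every subset of candidates, smallest first."""
    for size in range(len(candidates) + 1):
        for subset in combinations(candidates, size):
            if satisfies(constraints, list(subset)):
                return list(subset)
    return None  # no subset of the pool satisfies the constraints

def has_person(tm):
    return any("person" in dict(a) for a in tm)

def every_person_has_email(tm):
    return all("email" in dict(a) for a in tm if "person" in dict(a))

# Assertions as tuples of (role, player) pairs.
candidates = [
    (("person", "p1"),),
    (("person", "p1"), ("email", "a@example.org")),
]

found = find_map([has_person, every_person_has_email], candidates)
```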

> >If your two maps DO NOT follow the same TMCL statement, so that the
> >maps are incompatible, then one (or both) maps have to be transformed
> >into forms which are compatible. Transformations (also virtual ones)
> >you can do with TMQL.
> >
> >That way the process is very controlled and controllable.
> >
> But cf the question of how do I "know" two topic maps are following the 
> same TMCL statement.

And now we only have to test

    C |= m1   and    C |= m2


===

> So, how does a subject, whose identity may be determined by more than 
> "ONE thing," have its identity expressed?

\tau has no subjects, so this is not an issue. Complex identity is not
expressed inside a \tau map.

===

> There is another problem, one which I think is more responsible for
> the difficulty of the TMRM .... When you say that members contain a
> pair <r, p>, it has a certain simplicity that is attractive but
> there is a problem.

> Let's start with <r, p>.
> 
> I assume that 'p' represents a subject, in the sense we use the term in 
> TM land?

For the p's we only know that these are distinguishable things. No
internal structure is known, although they could be as big as another
universe.  For us they behave as if they were names only.

> Well, what if I want to talk about the subject of the relationship
> between "r, p"?  Note, not talking about 'r' or 'p' but the
> relationship between them.

Yes? If a particular r and a particular p have a relationship, then
there must be an assertion for that. A fully-fledged assertion. And if
I want to talk about that as a subject, I give that assertion an
identity.
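
A sketch of what that could look like, assuming an assertion is a set
of (role, player) pairs stored under an explicit name. The helper
assert_ and all role and player names are made up for illustration
and are not taken from \tau or TMRM.

```python
# The map: assertion name -> set of (role, player) pairs.
tm = {}

def assert_(tm, name, *members):
    """Store an assertion under an explicit identity and return it."""
    tm[name] = frozenset(members)
    return name

# The relationship between a particular r and a particular p.
a1 = assert_(tm, "a1", ("employee", "alice"), ("employer", "acme"))

# A statement about that relationship itself, not about either player:
# the name of the first assertion is used as a player in the second.
a2 = assert_(tm, "a2", ("statement", a1), ("source", "staff-directory"))

players_of_a2 = dict(tm[a2])
```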

> And, if I can talk about that, I can also talk about the relationship 
> between the two tokens that arise from that relationship, and so on.

Tokens?

You probably refer to the TMRM feature that allows one to reify the
fact that a particular topic is cast into a particular role?

I have always questioned this 'feature'. It introduces an
inconsistency into TMRM in the sense that it violates the general
spirit that 'assertion' is the main paradigm. With this feature it is
possible to 'single out' a casting __IRRESPECTIVE__ of the assertion
in which this casting occurs.

And then I would also ask why only a single casting can become a
subject.  Why not a combination of two, or three, or some?

For me, such a casting only makes sense in the context of the WHOLE
assertion. And that can be reified anyway.

> The problem as I see it, is that in normal speech and even in viewing 
> examples, we all elide over subjects that are not really relevant to the 
> current conversation. That is to say that at some level, we realize 
> there is a subject of the relationship between <r, p> but we don't 
> express it because it does not seem to be relevant to the current 
> conversation.
> 
> The danger in that elision, however, we may arrive at a system that 
> inhibits statements about subjects that we elided over in the design.

A very valid concern. OTOH, we always lose information; there is NO
way around it because 'abstraction' is exactly this: losing
information.  Trying to avoid any information loss is a dead end, I
would think.

> Personally that is how I view the notion of subject identity as treated 
> in the TMDM. For my part, subject identity (we can all thank Steve 
> Pepper for the more meaningful and not to mention pronounceable, SIP, 

[ Indeed. :-) ]

> subject identity property) must be wholly left up to the topic map 
> designer. To do otherwise, is to privilege some notions of subject 
> identity over others, which TMAs may well do, but which ultimately 
> limits the reach of topic maps.

I would agree, but I would also fully accept a built-in bias towards
_some_ things which can become subjects (or actually are first-hand
subjects) and against others which cannot.

The reasoning behind this is that 'all data is effectively
teleological', that is 'all data is built with a purpose, a
goal'. Goals for different people may conflict. There is no such thing
as the ultra-abstract, world-spanning, application-independent database
scheme.

Having said this, it also means that - whatever you may foresee in
your model - people will ignore it. If, at some later stage, they
realize "oh, oh, we have built that in a stupid way, we cannot make
statements about this and that", then nothing should stop them from
converting the map.

And this is where TMQL kicks in again. It can extract information from
one (or more) existing maps and create a new map, which may be much
more clever than its predecessor. They could even co-exist (virtual
maps).
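
The following is not TMQL (no TMQL syntax is assumed here), only a
Python sketch of the kind of conversion meant: a function that derives
the assertions of a new map from an old one. Written as a generator,
it behaves like a 'virtual' map, since nothing is materialized until
someone iterates over it. The role names are hypothetical.

```python
def converted(tm):
    """Lazily yield the assertions of a new map derived from tm."""
    for a in tm:
        roles = dict(a)
        if "author" in roles and "document" in roles:
            # Restructure authorship assertions under new role names.
            yield frozenset({("creator", roles["author"]),
                             ("work", roles["document"])})
        else:
            # Everything else is carried over unchanged.
            yield frozenset(a)

old_map = [
    {("author", "x"), ("document", "d")},
    {("topic", "t")},
]

# The old map is untouched; the new one exists once it is materialized.
new_map = list(converted(old_map))
```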

> Note that most syntaxes will make those choices in advance and
> properly so. That will allow them to be optimized for particular
> areas and I see no reason why that should not be allowed.

Yes, a syntax where everything is possible at any time is hard to
use, hard to parse, hard to process and hard to handle.

> >Maybe. :-) As I mentioned to Jan already, we did not see the necessity
> >to hard-code this into the formalism. If we need it later (and I like the
> >cleanness of the idea), then we can easily add it.
> >
> We may be using the terms formalism and formal model differently. Are 
> you saying that that \tau model is a formalism (means of expression) or 
> a formal model (means of expression + something being expressed)?

The 'means of expression' of the \tau model is - of course - set
theory. The structures we can build on top of sets are models for topic
maps (abstract representations if you want). The extension of basic
maths by the rules given in \tau is the \tau formalism.

> For example, the TMDM has rules for merging of topic information items.
> If I use another set of rules for merging topic information items, am I 
> required to disclose those in a TMCL statement?

I would put the question the other way round (I slowly begin to
understand your way of thinking):

If the application finds that the basic merging rules are not
sufficient or are inappropriate for its purposes, then the
application has to do something about it. It is the application
which interprets the data.

The data itself is completely dumb.

> What if I took the route that Kal does and have implied topics? Part of 
> the software and I suspect you get different behavior/capabilities from 
> Kal's software than software that does not have that ability on the same 
> topic map.

I can only guess what 'implied topics' are (maybe it is the same as
what we call 'virtual maps' here), but yes, the meaning of the data is
very much determined by the application.

> >>You don't say how the pairs of objects and literals arise. I assume that 
> >>is intentional? (Thinking here of the TMRM's distinction between 
> >>built-in versus conferred properties.)
>
> >Yes. This is a postulated set.
>
> Ok, so are you planning on saying how the postulated set arises?

No, because there is no need to do so. There are - at least at this
stage - no further requirements on this set.

> So you are saying there is no general notion of identity?

No general one, no. Identity (and equivalence) are only 'natural' at
the literal level. "Rumsti" and "Rumsti" are usually regarded as the
same. That may not be true for a compiler for a programming language:

  var   a = "Rumsti";
  const b = "Rumsti";

Should the string "Rumsti" for a variable be the same allocated memory
as that for the constant? One is immutable, the other may not be.

> That is to say if I wanted to return to my issue above of the identity 
> of a subject being defined by more than "ONE thing," your most likely 
> response is that such rules for identity are defined elsewhere?

Yup. Not only because identity could be induced by two or more 'properties'
of the object under comparison; also because identity/equivalence may depend
on things _outside_ the two objects. The weather, the government, ...

> That is to say that the reference model only says that subjects have 
> identities and those are defined for particular applications?

Yes, the 'reference model' (I would not call \tau at this stage a
reference model, just an 'idea') would use a name (= label) to define
'atomic identity' and particular applications would use that and
additionally some app-specific equivalence induction.
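
One possible reading of this, sketched in Python with hypothetical
names: the model only compares labels, and an application passes in
its own equivalence test on top of that. The sketch assumes the test
really is an equivalence relation, because each name is only compared
against the first member of a class.

```python
def equivalence_classes(names, equivalent):
    """Partition names using an application-supplied equivalence test."""
    classes = []
    for n in names:
        for cls in classes:
            if equivalent(n, cls[0]):
                cls.append(n)
                break
        else:
            # No existing class accepted n, so it starts a new one.
            classes.append([n])
    return classes

names = ["Rumsti", "rumsti", "Ramsti"]

# 'Atomic identity': two names are the same only if the labels match.
atomic = equivalence_classes(names, lambda a, b: a == b)

# An application-specific rule: ignore case when comparing.
case_insensitive = equivalence_classes(
    names, lambda a, b: a.lower() == b.lower()
)
```

A test that also consults something outside the two objects would
fit the same shape, as long as the function passed in closes over
that outside context.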

--

Phew. :=)

\rho