[sc34wg3] Re: FYI: Yet another TMRM Formalization (well, not really)

Robert Barta sc34wg3@isotopicmaps.org
Thu, 15 Jul 2004 08:00:47 +1000


On Wed, Jul 14, 2004 at 02:13:59PM +0200, Jan Algermissen wrote:
> Robert Barta wrote:
> > 
> > On Wed, Jul 14, 2004 at 07:24:08PM +1000, Robert Barta wrote:
> > > Hi all,
> > >
> > > FYI, this is our current working paper here to capture "the essence of
> > > TMs".
> 
> Robert,
> 
> some comments/questions based on a short read:
> 
> 1.1:
> - Do you mean that Tau has two elements, the set of names and the set of literals
>   or that Tau is the union of them?

\mathcal{I} is a 'mixed' set yes.

[ BTW this is not a tau, it is a calligraphic I. ]

> 
> - Are the predefined identifiers (id, instance, class, superclass subclass)
>   names or literals or a third kind?

'Predefined Identifiers' are names. A literal would be a number (42) or
a quoted string ("Rumsti").

> - I assume that all names come from a single namespace, right?

Flat, yes.

>   What about the
>   literals? If you use literals for the purpose of identity, all literals
>   must also come from a single namespace, or they must carry around with them
>   an implicit namespace. OTOH, you say they don't, they are just numbers or
>   quoted strings. So, how does "grroop" provide identity?

Literals have an identity. "grroop" is the same as "grroop" and
different from "Rumsti". If I need name spaces, then that would
eventually be mapped to a flat set anyway. So why make things
complicated in the beginning?

> - You say N is a collection, so can the same name be contained in N twice?

It says in the first sentence:

"...two sets of objects: names and literals." The collection N here is
therefore a set, so it can't contain two copies of the same thing.

>   What are the implications for Tau?

None.

> - This is pretty close to RDF (single namespace for names (URIs) plus literals)
>   OTOH, RDF combines literals with domains to enable them to provide identity.

Yes, in previous versions of this we even had one which was actually RDFish.

> - Why are the names in a special collection they are also just literals, or?

The trick is that members contain a pair < r, p >. r is a name, p is either
a name or a literal. < basename, "Jan" > would be an example.

> 1.3:
> 
> - You write that "we do not want to build in any particular merging rules into
>   our model at this stage,..."

>   You are proposing an assertion model that allows a single role to be played
>   more than once

Yes.

>   (which is the only difference from the TMRM assertion model)

No.

>   and while this seems fine now[1] I promise you that the problems will
>   immediately arise when you try to *implement* merging (which requires to
>   detect assertion equality, which is hard to do efficiently in the case of
>   multiple role players[2]). 

<evil-laughter>Har har har</evil-laughter>.

In my opinion merging __on a formal basis__ should not be in a model
with this abstraction. If I want to merge

     "two people with the same birthdate and the same name and the
      same country of origin, but only from middle east countries"

   [ that actually happened, the man was arrested in the US ]

then this does belong to a TMCL.

This is NOT about efficient implementation. If I ask you to model the
natural numbers, {1, 2, 3, 4, .....} then you would have no idea how
to implement them unless you know what the agenda is, say, finding a
particular prime. The agenda determines your choice of implementation.

>   I suggest to never set merging aside 'for the beginning'. I did this 3 or 4
>   times only to find out my mistakes when I started to implement merging
>   in the end!

Negative, captain ;-)

I am NOT talking about implementation. In my Perl programs merging is
at the core of the engine. Here I am talking about a formal model. I would
not hesitate a nanosecond to do merging with a (formalized) TMQL:

   m3 := m1 + m2    # non-merged version
   RETURN
     MERGE-OPERATOR (m3, rule_1, rule_2, rule_3)

by explicitly providing the rules for when to merge.

This is in NO WAY efficient.
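
A minimal Python sketch of such an explicit merge, under assumptions that
are not part of the paper: a map is a plain set of assertions, an assertion
is a frozenset of < role, player > pairs, and a rule is a function that says
whether two objects are the same. The names merge_operator, same_basename,
players, values and rename are hypothetical, as is the identifier abc-123.
The pairwise scan is deliberately naive.

```python
from itertools import combinations

def players(m, role):
    """All players of a given role anywhere in map m."""
    return {p for a in m for (r, p) in a if r == role}

def values(m, obj, role):
    """Players of `role` in assertions where `obj` plays 'object'."""
    return {p for a in m if ("object", obj) in a
              for (r, p) in a if r == role}

def same_basename(m, x, y):
    """Hypothetical rule: two objects are the same if they share a basename."""
    return bool(values(m, x, "basename") & values(m, y, "basename"))

def rename(m, old, new):
    """Replace player `old` by `new` in every assertion."""
    return {frozenset((r, new if p == old else p) for (r, p) in a) for a in m}

def merge_operator(m, *rules):
    """Naive fixpoint: rescan all object pairs until no rule fires."""
    changed = True
    while changed:
        changed = False
        for x, y in combinations(sorted(players(m, "object")), 2):
            if any(rule(m, x, y) for rule in rules):
                m = rename(m, y, x)   # keep x, fold y into it
                changed = True
                break
    return m

m1 = {frozenset({("object", "xyz-007"), ("basename", "Robert")})}
m2 = {frozenset({("object", "abc-123"), ("basename", "Robert")}),
      frozenset({("object", "abc-123"), ("shoesize", 2004)})}
m3 = m1 | m2                          # non-merged version
merged = merge_operator(m3, same_basename)
```

The merge behaviour lives entirely in the rules passed in; the model itself
(sets of assertions) knows nothing about when two things are the same.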

> - I think the issues around merging and values (literals), and how to
>   query for certain values are much more important than the assertion
>   structure. How do you implement: "give me all literals which are
>   numeric and > 4487" without a complete scan of all literals?

Again I stress that this is NO IMPLEMENTATION MODEL. It can serve as a
REFERENCE for other things.

You cannot implement natural numbers. You cannot write code that
incorporates the set of natural numbers. But natural numbers are a
VERY useful concept we use every day for reference.

That's how I see the role of a reference model. Not that I take it
as it is and write my TM database based on it.

But I can map TMQL (and maybe later TMCL) onto it and have a sound
basis on which I can communicate with others like you: "This is what I
mean, and not what the English prose text may appear to say to you".

>   If literals are not typed, access paths (indexes) cannot be implemented
>   (besides hasing).

"Hashing", I guess.

>   I really urge us not to ignore all the research that has been done in
>   the RDBMS world for decades.

Does this mean you think that TMs should be based on RDBMS theory?
Then why do we have a TM paradigm by itself? Why not see this
as an interesting data structure where the only challenge is to
squeeze this into a set of tables?

Good question, here is my view on things:

Some "kind of data" is very tabular. Students who have IDs (no names,
please), students enrolled in courses, students enrolled in courses in
special degrees....

Some data is not so tabular. If you have that, you can move to an
OODBMS to give you more flexibility.

If your data has more variation, then you might want to use an
XML DB.

If your data has even more variations, then....you would put it
into an RDF or TM DB.

What is behind all that is the degree of entropy in the 'structure':
In a table the next row has EXACTLY the same structure as the current
row. No surprises here (hence 0 entropy).

In an OODBMS different objects may be a bit different, but not much.

In XML one <chapter> may be quite a bit different from the other
<chapter>.

And in "grey matter" information more surprises may happen.

This is a fundamental trade-off:

   Speed vs. Flexibility

If I do not expect any surprises, I can exploit any structural
information completely. This is why RDBMS are so 'fast'. The higher
you move up in the structure entropy, the more surprises, so there
are inherent limits on performance.
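
One way to put a number on this 'structure entropy', as a rough sketch only:
take the Shannon entropy of the distribution of record shapes, where a shape
is simply the set of field names a record has. This reading of 'entropy', the
function name structure_entropy and the sample records are all assumptions
made for illustration.

```python
from collections import Counter
from math import log2

def structure_entropy(records):
    """Shannon entropy (in bits) of the distribution of record shapes."""
    # A record's shape is the set of its field names.
    shapes = Counter(frozenset(rec) for rec in records)
    n = sum(shapes.values())
    return sum((c / n) * log2(n / c) for c in shapes.values())

# Tabular data: every row has exactly the same structure.
table = [{"id": 1, "course": "TM"}, {"id": 2, "course": "XML"}]

# Less regular data: every record looks different.
mixed = [{"id": 1, "course": "TM"}, {"id": 2, "degree": "MSc"},
         {"title": "Intro"}, {"id": 3, "course": "RDF", "note": "late"}]

print(structure_entropy(table))   # 0.0
print(structure_entropy(mixed))   # 2.0
```

With zero entropy an engine can fix one layout and exploit it everywhere; the
higher the value, the less it can assume about the next record.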

An XML DB **CANNOT** be as fast as an RDBMS for the same kind of data.

> - Interestingly, what you describe is an *Application* of the RM.

Uhm, hopefully not. :-)

>   You define (although implicitly) a set of properties and a certain
>   assertion structure (also only a set of properties as in the
>   assertion structure proposed by the RM). In essence, you define a
>   TMA (and operations on the properties provided by this TMA).

We never talk about properties. What is not defined does not exist.
'Properties' are completely assimilated by assertions:

   a1 = { < object, xyz-007>, < basename, "Robert" > } 

   a2 = { < object, xyz-007>, < shoesize, 2004 > }
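
The same two assertions as a small Python sketch, assuming an assertion is
encoded as a frozenset of < role, player > pairs and a map as a set of
assertions. The helper name properties is hypothetical; it only shows that
what looks like a 'property' can be read back out of the assertions an
object takes part in.

```python
a1 = frozenset({("object", "xyz-007"), ("basename", "Robert")})
a2 = frozenset({("object", "xyz-007"), ("shoesize", 2004)})
tm = {a1, a2}

def properties(m, obj):
    """All < role, player > pairs that co-occur with obj playing 'object'."""
    return {(r, p) for a in m if ("object", obj) in a
                   for (r, p) in a if r != "object"}

props = properties(tm, "xyz-007")
```

Returning a set of pairs rather than a dictionary keeps the case where one
role is played more than once.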

>   While you say "with a faint similarity with TMRM", let me
>   clarify that the purpose of the RM is to provide a means to
>   express TMAs (such as yours, or the TMDM) and to enable
>   interoperability between them.

I never postulated this purpose, but now that you mention it, ....

>   In fact, the RM enables you
>   to write a mapping TMA between yours and the TMDM.

I cannot see this. I know that this was the intention, but the TMRM
never had any formalism, language, .... to actually express a TMA. It
had only the framework. As I had mentioned earlier, this is like
building houses without a roof.

>   Rather simplified you can also put it that way: The RM enables
>   the definition of TM schemas and your paper defines such a
>   schema, just not in RM language (in essence: TMA == Schema).

For this argument I would see the \tau model at the same level as
TMRM. It does not predefine any 'application-specific' names and
also not a single rule.

My thinking is that a TMA is nothing other than a proper TMCL
statement.  Here I would define

  - what kinds of things I have,
  - how they are structured (properties, ....)
  - what my understanding is of when two things are the same
  - which app-specific rules they must follow....

If TMCL is based on something which is compatible with the \tau model,
then that would cover the TMA relationship you, Patrick and Steve,
et al. envisioned.

But it would have a language and a sound formalism. That's the
difference for me. The TMRM, as it stands, is very difficult to digest
for outside people (withholding 3rd party comments here).

> I hope this clarifies matters a bit. If my language sounds too
> offensive, please take my apologies, I do not mean to sound rude.
> I maybe just got carried away by the issues.

Ah, I *always* prefer men with passion and vision over those with
overpolite poker faces. The latter always survive but they ruined
the planet.

\rho