[sc34wg3] Re: FYI: Yet another TMRM Formalization (well, not really)

Jan Algermissen sc34wg3@isotopicmaps.org
Thu, 15 Jul 2004 13:52:12 +0200


Robert Barta wrote:
>
> [ BTW this is not a tau, it is a calligraphic I. ]

Ah....my Linux Acrobat Reader didn't really reveal that :-)
 
> >
> > - Are the predefined identifiers (id, instance, class, superclass, subclass)
> >   names or literals or a third kind?
> 
> 'Predefined Identifiers' are names. A literal would be a number (42) or
> a quoted string ("Rumsti").
> 
> > - I assume that all names come from a single namespace, right?
> 
> Flat, yes.
> 
> >   What about the
> >   literals? If you use literals for the purpose of identity, all literals
> >   must also come from a single namespace, or they must carry around with them
> >   an implicit namespace. OTOH, you say they don't, they are just numbers or
> >   quoted strings. So, how does "grroop" provide identity?
> 
> Literals have an identity. "grroop" is the same as "grroop" and
> different from "Rumsti". If I need name spaces, then that would
> eventually be mapped to a flat set anyway. So why make things
> complicated in the beginning?


Just to clarify:

Your names *provide* identity and the literals *have* identity - yes?


> > - Why are the names in a special collection? Aren't they also just literals?
> 
> The trick is that members contain a pair < r, p >. r is a name, p is either
> a name or a literal. < basename, "Jan" > would be an example.

Ok. Again, regarding the id (the identifier for assertions): is the set of ids
a subset of N? IOW, do they share the same namespace?
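
A minimal Python sketch of the pair structure described above. The class
names (Name, Literal, Member) are assumptions made for illustration; the
paper under discussion does not define them. It only shows that a literal
can carry its own identity by value while staying distinct from a name
that happens to be spelled the same way.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Name:
    """Hypothetical: an identifier drawn from the single flat namespace."""
    value: str


@dataclass(frozen=True)
class Literal:
    """Hypothetical: a number or quoted string, compared by value."""
    value: Union[int, float, str]


@dataclass(frozen=True)
class Member:
    """A pair <r, p>: r is a name, p is either a name or a literal."""
    role: Name
    player: Union[Name, Literal]


# The example from the quoted text: < basename, "Jan" >
basename_jan = Member(Name("basename"), Literal("Jan"))

# "grroop" is the same as "grroop" and different from "Rumsti".
same_literal = Literal("grroop") == Literal("grroop")
different_literal = Literal("grroop") == Literal("Rumsti")

# A name and a literal with the same spelling are not the same thing here.
name_vs_literal = Name("grroop") == Literal("grroop")
```

Whether assertion ids would be further Name values or a separate kind is
exactly the open question above; the sketch takes no position on it.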


> > 1.3:
> >
> > - You write that "we do not want to build in any particular merging rules into
> >   our model at this stage,..."
> 
> >   You are proposing an assertion model that allows a single role to be played
> >   more than once
> 
> Yes.
> 
> >   (which is the only difference from the TMRM assertion model)
> 
> No.

Why 'no'? Can you tell me what is different? Note that the 'Assertion Type'
in the RM is really just the group of roles (the t-sidp's value is the
group of roles, and t-sidp is what gives the T-topic the identity).

> 
> >   and while this seems fine now[1] I promise you that the problems will
> >   immediately arise when you try to *implement* merging (which requires to
> >   detect assertion equality, which is hard to do efficiently in the case of
> >   multiple role players[2]).
> 
> <evil-laughter>Har har har</evil-laughter>.
> 
> In my opinion merging __on a formal basis__ should not be in a model
> with this abstraction. If I want to merge
> 
>      "two people with the same birthdate and the same name and the
>       same country of origin, but only from middle east countries"
> 
>    [ that actually happened, the man was arrested in the US ]
> 
> then this does belong to a TMCL.


Well, your model does not allow for r-s or p-s to be merged without
loss of information (unless the to-be-merged p-s or r-s are the same name
or literal). Well, you could use assertions to preserve that information
(in the way that RDF can handle such cases via owl:sameAs).
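
To illustrate the easy half of the equality problem, here is a hedged
Python sketch. It assumes an assertion is a collection of (role, player)
pairs and that players compare by plain value; the function name
assertion_key and the sample data are hypothetical. It does not address
the hard case, where two players first have to be recognised as the same
subject before the assertions can be compared.

```python
from collections import Counter


def assertion_key(members):
    """Order-independent key for a collection of (role, player) pairs.

    Counting the pairs keeps an assertion in which a role is played
    several times distinct from one in which it is played once.
    """
    return frozenset(Counter(members).items())


a1 = [("object", "xyz-007"), ("basename", "Robert")]
a2 = [("basename", "Robert"), ("object", "xyz-007")]  # same pairs, other order
a3 = [("object", "xyz-007"), ("basename", "Robert"), ("basename", "Rumsti")]

# Group assertions by key; equal assertions end up in the same bucket.
index = {}
for label, assertion in (("a1", a1), ("a2", a2), ("a3", a3)):
    index.setdefault(assertion_key(assertion), []).append(label)

groups = sorted(index.values())
```

Under these assumptions a1 and a2 fall into one group and a3 into
another, and each lookup is a single hash probe rather than a pairwise
comparison.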

 
> This is NOT about efficient implementation. 

Ok, let's put it the other way round: What is it about? What I mean is
that any model we develop can only be evaluated against its suitability
for the purpose Topic Maps aim to fulfil. There is really no use in
providing models that express interpretations of 'Topic Maps'. First
we need a clear statement of what the purpose of Topic Maps is (what
problem do they solve) and then we can discuss which model is best
suited. And at this point, implementation efficiency is of critical
importance - or are we doing this as a <quote>scientific exercise</quote>?

So, let me restate this (IMHO) fundamental question, because I think
the major cause for all the disagreements is that it has not been
answered yet:


          WHAT IS THE PROBLEM THAT TOPIC MAPS SOLVE?


(I have a clear answer for myself (need not be the right answer of course)
but I'll give it later on so as not to influence you :-)


> > - I think the issues around merging and values (literals), and how to
> >   query for certain values are much more important than the assertion
> >   structure. How do you implement: "give me all literals which are
> >   numeric and > 4487" without a complete scan of all literals?
> 
> Again I stress that this is NO IMPLEMENTATION MODEL. It can serve as a
> REFERENCE for other things.

For what things?

 
> That's how I see the role of a reference model. Not that I take it
> as it is and write my TM database based on it.

To me (personally!) the role of the reference model is that it provides
an abstract, self-contained, logical definition of the objects, operators
and so forth that together constitute an abstract machine with which
users interact when accessing data.


> >   I really urge us not to ignore all the research that has been done in
> >   the RDBMS world for decades.
> 
> Should this mean that you think that TMs should be based on RDBMs
> theory?  

NO! But I resist the idea of a general datatype 'string'. Why not
use the notion of typed literals? Why do you think values in
the relational model are typed and not just opaque strings?
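
A small Python sketch of why typing matters for the "numeric and > 4487"
query. It assumes literals arrive as native Python values, so the type is
known without parsing; the names numeric_index and numeric_greater_than
are hypothetical. With a sorted index over the numeric literals the query
becomes a binary search instead of a scan of all literals.

```python
import bisect

# Mixed literals; "4490" is a string and must not match a numeric query.
literals = [42, "Rumsti", 4487, 5000, "grroop", 12000.5, "4490"]

# Built once: only the numeric literals, kept in sorted order.
numeric_index = sorted(
    v for v in literals
    if isinstance(v, (int, float)) and not isinstance(v, bool)
)


def numeric_greater_than(threshold):
    """Return all numeric literals strictly greater than threshold."""
    start = bisect.bisect_right(numeric_index, threshold)
    return numeric_index[start:]


result = numeric_greater_than(4487)
```

If literals were opaque strings, every one of them would have to be
inspected to decide whether it is numeric at all.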


> Then why do we have a TM paradigm by itself? 

Yeah, that is the question that needs to be answered first!
(see above)

 
> Some "kind of data" is very tabular. Students who have IDs (no names,
> please), students enrolled in courses, students enrolled in courses in
> special degrees....
> 
> Some data is not so tabular. If you have that you can move into
> OODBMS to give you more flexibility.
> 
> If your data has more variation, then you might want to use an
> XML DB.
> 
> If your data has even more variations, then....you would put it
> into an RDF or TM DB.
> 
> What is behind all that is a tradeoff between the entropical degree of
> 'structure': In a table the next row has EXACTLY the same structure as
> the current row. No surprises here (hence 0 entropy).
> 
> In OODBMs different objects may be a bit different, but not much.
> 
> In XML one <chapter> may be quite a bit different from the other
> <chapter>.
> 
> And in "grey matter" information more surprises may happen.
> 
> This is a fundamental trade-off:
> 
>    Speed vs. Flexibility

Ok, so why do we trade the speed for flexibility? (What are
Topic Maps for that justifies this trade?)
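
The "entropy of structure" argument quoted above can be made concrete
under one possible reading, sketched here in Python. The quoted text
gives no formula, so this is an assumption: take the "shape" of a record
to be its set of field names and compute the Shannon entropy of the
distribution of shapes. The function name structure_entropy and the
sample records are hypothetical.

```python
import math
from collections import Counter


def structure_entropy(records):
    """Shannon entropy (in bits) of the distribution of record shapes."""
    shapes = Counter(frozenset(record) for record in records)
    total = sum(shapes.values())
    return sum((n / total) * math.log2(total / n) for n in shapes.values())


# Tabular data: every row has exactly the same fields.
table = [
    {"id": 1, "course": "A"},
    {"id": 2, "course": "B"},
    {"id": 3, "course": "C"},
]

# Less regular data: three different shapes among four records.
mixed = [
    {"id": 1},
    {"id": 2, "course": "B"},
    {"title": "x", "author": "y"},
    {"id": 4},
]

table_entropy = structure_entropy(table)
mixed_entropy = structure_entropy(mixed)
```

With this reading the table scores zero and the mixed collection scores
higher, which matches the "no surprises" intuition in the quoted text.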

> 
> If I do not expect any surprises, I can exploit any structural
> information completely. This is why RDBMS are so 'fast'. The higher
> you move up in the structure entropy, the more surprises, so there
> are inherent limits on performance.
> 
> A XML DB **CANNOT** be as fast as a RDBMS for the same kind of data.
> 
> > - Interestingly, what you describe is an *Application* of the RM.
> 
> Uhm, hopefully not. :-)
> 
> >   You define (although implicitly) a set of properties and a certain
> >   assertion structure (also only a set of properties as in the
> >   assertion structure proposed by the RM). In essence, you define a
> >   TMA (and operations on the properties provided by this TMA).
> 
> We never talk about properties. What is not defined does not exist.

I said 'in essence'. Although there are no handles (topics) in your
model, your names/literals work like SIDPs.

- SIDP1: Name
- SIDP2: LiteralValue

Both provide the ability to talk about 'something'.


> 'Properties' are completely assimilated by assertions:
> 
>    a1 = { < object, xyz-007>, < basename, "Robert" > }
> 
>    a2 = { < object, xyz-007>, < shoesize, 2004 > }
> 
> >   While you say "with a faint similarity with TMRM", let me
> >   clarify that the purpose of the RM is to provide a means to
> >   express TMAs (such as yours, or the TMDM) and to enable
> >   interoperability between them.
> 
> I never postulated this purpose, but now that you mention it, ....
> 
> >   In fact, the RM enables you
> >   to write a mapping TMA between yours and the TMDM.
> 
> I cannot see this. I know that this was the intention, but the TMRM
> never had any formalism, language, .... to actually express a TMA. 

Well, it has been left out on purpose. No problem to add it in. My position
is that such a syntax depends on the overall technological environment
that Topic Maps are deployed in. If we deploy them in a HyTime/XML/Web context,
markup is likely to be the syntax of choice, but the RM does not constrain
Topic Maps to a particular technological environment. So why should it
include a syntax?



> It
> had only the framework. As I had mentioned earlier, this is like
> building houses without a roof.

Hmm.. will your proposal include such a mechanism in the end?

> 
> >   Rather simplified you can also put it that way: The RM enables
> >   the definition of TM schemas and your paper defines such a
> >   schema, just not in RM language (in essence: TMA == Schema).
> 
> For this argument I would see the \tau model at the same level as
> TMRM. It does not predefine any 'application-specific' names and
> also not a single rule.

Well, it defines Names and Literals.....note that the RM works without
doing so! It is one level of abstraction below.

> 
> My thinking is that a TMA is nothing else than a proper TMCL
> statement.  Here I would define
> 
>   - what kinds of things do I have,

Why 'kinds'?

>   - how are they structured (properties, ....)

yes
>   - what is my understanding when two things are the same

yes
>   - what app-specific rule must they follow....

??? what do you mean by app-specific?
> 
> If TMCL is based on something which is compatible with the \tau model,
> then that would cover the TMA relationship you, Patrick and Steve,
> et.al.  envisioned.

Do you have any idea what that will look like? Can you sketch what you mean?

> 
> But it would have a language and a sound formalism. That's the
> difference for me. The TMRM, as it stands, is very difficult to digest
> for outside people (withholding 3rd party comments here).

Yes, I agree that work needs to be done on it. OTOH (see above),
we need the answer to the question of what the purpose of Topic Maps is
to have clear guidance. If we can tell users: "this is what TMs
want to help with" then it is easier to say "and that's why they are how
they are".

I claim that no one has ever come up with a compelling reason why any of
the models need to be as they are.


> Ah, I *always* prefer men with passion and vision over those with
> overpolite poker faces. The latter always survive but they ruined
> the planet.

Yes. I hope you have the time/energy to continue this argument, it is
(at least to me) very revealing.

Jan


> 
> \rho

-- 
Jan Algermissen                           http://www.topicmapping.com
Consultant & Programmer                   http://www.gooseworks.org