[sc34wg3] Re: Going beyond SIPs?

Robert Barta sc34wg3@isotopicmaps.org
Sat, 4 Sep 2004 18:54:05 +1000


On Fri, Sep 03, 2004 at 10:05:03AM -0400, Steven R. Newcomb wrote:
> Robert Barta <rho@bigpond.net.au> writes:
> > But I can top it. Now I would like to identify as equal all persons
> > who are involved in more than 5 associations of type "knows". Slightly
> > bizarre, but possible. Or what about "all persons with the same
> > distance from G.W. Bush should be regarded equal" (not sure how
> > bizarre that is).
> > 
> > In these cases there is no property involved.

> But the last example, too, is completely amenable to this approach.
> .....  In this example, each of the "knows" associations triggers the
> operation of a conferral rule that confers a value component on each
> of the SIPs of each of its role players.

That's exactly what I find unnecessary and actually rather un-topicmappish:

First someone builds a map concentrating on relationships, because it is they
that carry most of the "meaning" for us humans. The topics are as boring as
bookmarks (if we ignore for a second the "exciting" task of connecting them with
reality).  Then someone else comes along and says "yes, I know that all your
relationships are completely unbiased, but now I pull some topics out of this
map and dogmatically say 'one is now a property of the other'".

The strange thing about this exercise is that these properties are created only
"for the sake of being combined and then compared", just to figure out whether
two topics are the same or not. Maybe these 'conferred properties' are simply
thrown away once they have served their purpose.  What is the benefit (for me,
for the formalism, ...) of having to make two steps instead of just one?

Unless TMRM frees itself from this ambivalence ("sometimes I only look at the
assertions, sometimes at the properties") it will not be a minimal model. \tau is
minimal in this sense: there are only assertions, and topics are reduced to
infinitesimally small points.
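
As a minimal sketch of the one-step version (Python, purely illustrative; the
`Assertion` layout and the function `merge_class` are invented for this mail and
are not part of \tau or the TMRM), the "more than 5 knows associations" rule can
be evaluated by looking at assertions only, without conferring any property on a
topic first:

```python
from collections import Counter
from typing import NamedTuple, Tuple


class Assertion(NamedTuple):
    """A neutral assertion: a type plus (role, player) pairs (hypothetical layout)."""
    type: str
    players: Tuple[Tuple[str, str], ...]


def merge_class(assertions, assoc_type="knows", threshold=5):
    """Return the topics that take part in more than `threshold` assertions
    of type `assoc_type`; these are the topics to be regarded as equal."""
    counts = Counter()
    for a in assertions:
        if a.type == assoc_type:
            # Count each topic once per assertion, whatever role it plays.
            for topic in {player for _role, player in a.players}:
                counts[topic] += 1
    return {topic for topic, n in counts.items() if n > threshold}


# Toy data: "alice" takes part in six "knows" assertions, everybody else in one.
assertions = [
    Assertion("knows", (("knower", "alice"), ("known", f"p{i}")))
    for i in range(6)
]
print(merge_class(assertions))  # {'alice'}
```

The count exists only as a local variable while the rule is evaluated; nothing
is attached to the topics themselves.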

To me this sounds as if there are two layers, amalgamated into one: The lower
layer, TMRM-A, consists of completely neutral assertions. They only connect
things, but do not have any bias as to what is the "object" and what is the
"property".

And there is the second (OO) layer, TMRM-O which blesses some things to be more
equal than the others by making them objects. Here you can declare existing
connections as properties or even define derived (conferred?)  properties which
are combinations of other properties.

---

> In TMRM-land, an "Application" governs a "topic map view".  It does *not*
> govern any [... uh, phooey, I can't use the term "information resource" the
> way I used to any more.  Hmmm.  Let's try this:] A TMA does *not* govern any
> representations of information resources.  It only governs *explicit
> understandings of representations of information resources* (here, "explicit
> understandings" are sets of subject proxies).  This means that we can have
> different *understandings* (i.e., different topic map views [TMVs]) of the
> same representation of a given information resource, and that those different
> TMVs can be governed by different TMAs.

I have to use a Rorschach method to guess what this could mean. I could imagine
that this means that someone is looking at an 'information resource', say, an
Excel sheet about bank accounts, with TMRM-A glasses on. The only things which
are visible are the connections between the things, together with a more or
less certain identification of each proxy with the real world. Theoretically, a
programmer could sit down and write an API which wraps the Excel sheet and
offers an "assertion-only" view to the surrounding application(s).

This is the "bare bones view" of the resource, pretty boring and
unsophisticated.

Now the bank manager wants to have the information interpreted in a particular
way. He wants to see "customers" (with names and addresses), "bank accounts"
(with balances and numbers). And some information should be completely
ignored. Is this what you call a TMV?

If so, then there only needs to be a language to map a TMRM-A instance into a
TMRM-O instance. Piece of cake. Maybe ...
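
To make the "piece of cake" a bit more concrete, here is a rough sketch in
Python of such a mapping. Everything in it is an assumption made up for
illustration: the assertion layout, the association types and role names, the
`Customer` class and the rule table `CUSTOMER_VIEW`. It only builds the
"customers" part of the bank manager's view and silently drops whatever the
view does not mention:

```python
from dataclasses import dataclass, field

# TMRM-A side: neutral assertions as (type, {role: player}) pairs (hypothetical layout).
assertions = [
    ("has-name",    {"bearer": "c1", "name": "J. Smith"}),
    ("has-address", {"resident": "c1", "address": "1 Main St"}),
    ("owns",        {"owner": "c1", "account": "a1"}),
    ("audited-by",  {"account": "a1", "auditor": "x9"}),  # not part of this view
]


@dataclass
class Customer:
    """TMRM-O side: the object the bank manager wants to see."""
    id: str
    name: str = ""
    address: str = ""
    accounts: list = field(default_factory=list)


# The mapping itself: assertion type -> (role that becomes the object,
# attribute on the object, role that becomes the attribute value).
CUSTOMER_VIEW = {
    "has-name":    ("bearer",   "name",     "name"),
    "has-address": ("resident", "address",  "address"),
    "owns":        ("owner",    "accounts", "account"),
}


def customer_view(assertions, rules=CUSTOMER_VIEW):
    """Build Customer objects from assertions according to `rules`."""
    customers = {}
    for atype, players in assertions:
        if atype not in rules:
            continue  # information this view ignores
        obj_role, attr, val_role = rules[atype]
        oid = players[obj_role]
        customer = customers.setdefault(oid, Customer(oid))
        if isinstance(getattr(customer, attr), list):
            getattr(customer, attr).append(players[val_role])
        else:
            setattr(customer, attr, players[val_role])
    return customers


print(customer_view(assertions)["c1"])
```

The bias (who is the object, what is the property) lives entirely in the rule
table; the assertions underneath stay neutral.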

> (For this discussion, let's call such rules "Topic Map View Construction
> Rules" (TMVCRs).)

...this could be its name. Sounds scary enough to me!

> An additional but important detail: If the two TMVs are governed by different
> TMAs, then, obviously, the rules used to understand the representation of the
> information resource must also be different, because they produce subject
> proxies that have different properties.

Accordingly, we would have two different instances of TMRM-O, but only one
TMRM-A. You would need two of these mappings then, yes.

> It's also possible, however, for two different sets of TMVCRs, both of which
> produce TMVs that are governed by the same TMA, and both of which know how to
> handle the same kind of representation of an information resource, to produce
> different TMVs, even though both TMVs are governed by the same TMA and both
> TMVs were created from the same representation of an information resource.

Mumble. :-)

> So there are several things here that we need to distinguish among, if
> we're going to understand each other:
> 
> (1) A representation of an information resource (which may or may not
>     be represented in a syntax that we call a "topic map syntax" -- in
>     the TMRM, that distinction doesn't matter).

In my framework this would be the Excel sheet. It does not look like a TM, but
it certainly can be "thought of that way". This is still _outside_ the TM
framework.

> (2) An understanding of a representation of an information resource (a
>     "topic map view" -- TMV -- a set of subject proxies).

In my 'framework' this would be a representation of a resource in the form of
assertions only. No bias here, just the facts. Like the neurons in a brain.

In the Excel sheet example above this would be captured by a "low-level
ontology". That simply says, we have these roles and these association types and
everything has to be this and that way. \tau path expressions (or something
which is derived from them) should be able to do that. Let us call this ontology
level OA (Ontology for assertions).

> (3) The schema and rules that govern the TMV -- the kinds of
>     properties that the TMV's subject proxies have, its conferral
>     rules etc.  -- a "Topic Map Application (TMA)".

OK, this is just another "description of what your data looks like", i.e.
another ontology. Let us call it 'level OO' (Ontology for Objectified view).

> (4) The rules under which a representation of an information resource
>     is understood as a TMV -- the Topic Map View Construction Rules
>     (TMVCRs).  Every such set of rules is necessarily TMA-specific,
>     but TMAs are not TMVCR-specific.  I.e., there is no limit on the
>     number of sets of TMVCRs that can be used to understand different
>     (or even the same) kinds of representations of information
>     resources as TMVs governed by exactly the same TMA.

Then the rules are simply (!, well) a mapping between OA and OO. If you
are curious how this works, then have a look at

   http://ausweb.scu.edu.au/aw03/papers/barta2/paper.html

There I have mapped literature references (according to one ontology) into a
more object-oriented ontology for BibTeX. This also was the motivation for me to
argue that TMQL has a "transformation component" between topicmappish data.

> ............In an effort to move things forward, I propose to try to avoid
> using the unqualified term "topic map" in our conversation.  If by "topic map"
> I mean "a representation of an information resource that is in XTM notation",
> I'll try to remember to use the term "representation of an information
> resource" instead of "topic map".  If by "topic map" I mean the information
> that a topic map document presumably conveys, I'll try to remember to use the
> term "topic map view".

In my framework, I do not need to distinguish between the two. An "XTM File" is
at the same level as the Excel sheet above. Yes, maybe there is less work to be
done with an XTM file, but both are resources, have a syntax and can be
interpreted as containing topicmappish data.

Why make differences for things which are the same? Actually my software just
uses different "drivers" for different data. If you say

   my $tm = new TM (tau => 'file:map.xtm + file:map.atm');

then it is the task of the selected drivers (XTM in the first place, and then
AsTMa in the second) to _provide a TM view_. It also would work for an Excel
sheet:

   my $tm = new TM (tau => 'file:map.xtm + file:accounts.xls');
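
The driver idea itself can be sketched in a few lines. The following Python is
only an illustration of the dispatch and is not how the Perl package above is
implemented: the driver functions are stubs, the `DRIVERS` table and `load` are
invented names, and '+' is treated here as plain concatenation of what the
drivers deliver, which is a simplifying assumption.

```python
import os


def xtm_driver(path):
    # Stub: a real driver would parse the XTM document here.
    return [("xtm", path)]


def astma_driver(path):
    # Stub: a real driver would parse AsTMa text here.
    return [("astma", path)]


def excel_driver(path):
    # Stub: a real driver would read rows and cells and emit assertions.
    return [("excel", path)]


# Hypothetical registry: file extension -> driver providing a TM view.
DRIVERS = {".xtm": xtm_driver, ".atm": astma_driver, ".xls": excel_driver}


def load(expr):
    """Resolve an expression like 'file:map.xtm + file:accounts.xls'
    by handing each term to the driver selected by its extension."""
    assertions = []
    for term in expr.split("+"):
        url = term.strip()
        path = url[len("file:"):] if url.startswith("file:") else url
        ext = os.path.splitext(path)[1]
        if ext not in DRIVERS:
            raise ValueError(f"no driver for {url!r}")
        assertions.extend(DRIVERS[ext](path))
    return assertions


print(load("file:map.xtm + file:accounts.xls"))
# [('xtm', 'map.xtm'), ('excel', 'accounts.xls')]
```

The surrounding application never sees which syntax a resource came in; it only
sees what the driver delivers.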

> I think we need to make a distinction between "constraints" that are:
> 
>   (a) criteria of subject sameness detection that, when subject
>       sameness is detected, require subject proxies to be merged (or,
>       more precisely, to appear to have been merged), and
> 
>   (b) criteria of semantic, stylistic, etc. validity, such as SteveP's
>       exemplary constraint that the names of countries must include at
>       least one that is in the primary language of the county.

I cannot yet see why this distinction must show up in the choice of the language
used to describe it. Yes, there is a distinction in what you want to achieve (a
teleological distinction), but you would not use two different versions of, say,
Java only because you write two different applications.

> We need to distinguish these two kinds of "constraints" because one of them
> actually determines the shape of the TMV, while the other only determines our
> attitude toward the TMV -- whether we think the TMV (or, more likely, the
> source materials from which the TMV was constructed) meets certain criteria of
> consistency, etc.

I would argue that "shape we want to see there" and "attitude we want to take"
are too close to call. I would have to see a very good example of why both
should not be captured with an ontology.
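
A small sketch of what I mean, in Python and with invented names throughout
(the toy map structure, `no_shared_names`, `has_some_name`, `validate`,
`enforce_by_merging`): both kinds of constraint are written in the same form, a
function returning the violations, and only the use made of them differs.

```python
from itertools import combinations

# A toy map: topic id -> set of names (hypothetical structure).
tm = {
    "t1": {"Austria", "Oesterreich"},
    "t2": {"Oesterreich"},
    "t3": {"Australia"},
}


def no_shared_names(tm):
    """Constraint: no two topics share a name. Returns the violating pairs."""
    return [(a, b) for a, b in combinations(sorted(tm), 2) if tm[a] & tm[b]]


def has_some_name(tm):
    """Constraint: every topic has at least one name. Returns the violating topics."""
    return [t for t in sorted(tm) if not tm[t]]


def validate(tm, constraint):
    """'Attitude': only report whether the map satisfies the constraint."""
    return not constraint(tm)


def enforce_by_merging(tm, constraint):
    """'Shape': make a pairwise constraint true by merging the violating topics."""
    tm = {t: set(names) for t, names in tm.items()}  # work on a copy
    pairs = constraint(tm)
    while pairs:
        a, b = pairs[0]
        tm[a] |= tm.pop(b)  # b disappears, its names move to a
        pairs = constraint(tm)
    return tm


print(validate(tm, no_shared_names))            # False
print(enforce_by_merging(tm, no_shared_names))  # t2 is merged into t1
print(validate(tm, has_some_name))              # True
```

Whether a violation leads to a merge or to a complaint is a decision of the
caller, not a property of the language the constraint is written in.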

> > My view is that this is a "constraint": It says "if I had my way,
> > then no two topics in a map exist which have a sufficiently
> > congruent geographic range".
> 
> You always get to have your way; there's no "if" about it.  All you
> need to do is to make either the TMA, or the TMVCRs, or both, be such
> that the TMV is the way you want it to be.

Yes, this is exactly how I meant it.

> >   (b) what must TM software do to make this constraint TRUE, i.e.
> >       change the map in such a way that it does not violate the
> >       constraint?
> > 
> > For (b), the answer seems easy: The software must remove all
> > situations which would conflict with the constraint. It will detect
> > the topics and will - without further control - merge them into
> > one. (A query/transformation language could actually put more control
> > on how the 'merged' map looks like)
> 
> Wait.  Please be patient with me, Robert; I'm having more
> communications problems here.
> 
> Before we can transform something in any rigorously deterministic way,
> we must first understand its existence in a rigorous way.  Until we
> know, with comprehensive explicitness, exactly what we're
> transforming, we are not in a position to be explicit about any
> transformation processes, or about the results of any such process.

Perfectly correct. Let's have a diagram (hopefully it will not be messed up in
transit):

Level          | Data Stack App1       | Data Stack App2            | "is-described-by" Ontology Stack
===============+=======================+============================+=================================
TMRM-O         | thought of as objects | thought of as objects      | OO_1 and OO_2
               | for bank manager      | for internal review people |
---------------+-----------------------+----------------------------+---------------------------------
TMRM-A         |                  assertions only                   | OA
===============+====================================================+=================================
Resource Level |                    Excel Sheet                     | [ syntax by Microsoft,
               |                                                    |   data by Excel clerk ]

At the resource level (outside the TM space) there is the resource. Just inside
the TM space someone has described this resource via an OA, without any bias as
to what is a property and what is an object. That is all done in the OO layer.

If someone wants to transform data from the OA level into OO_1, then, as I
assume both are described in a computer-interpretable language, a
transformation can be done. It can actually be done in one or, in some cases,
in both directions. The latter you need when you deal with resources which are
not read-only.
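
The both-directions case could look roughly like this. Again a Python sketch
under invented assumptions: the `RULES` table, the assertion layout and the
function names are hypothetical, and a real mapping would have to cope with
rules that are not invertible. One declarative rule table is read forwards to
get objects and backwards to write changes down to the assertion level:

```python
# Shared rules: assertion type -> (role of the object, attribute name, role of the value).
RULES = {
    "has-name":    ("bearer",  "name",    "name"),
    "has-balance": ("account", "balance", "amount"),
}


def to_objects(assertions):
    """OA -> OO: fold assertions into attribute dictionaries, one per object."""
    objects = {}
    for atype, players in assertions:
        if atype in RULES:
            obj_role, attr, val_role = RULES[atype]
            objects.setdefault(players[obj_role], {})[attr] = players[val_role]
    return objects


def to_assertions(objects):
    """OO -> OA: read the same rules backwards."""
    by_attr = {attr: (atype, obj_role, val_role)
               for atype, (obj_role, attr, val_role) in RULES.items()}
    assertions = []
    for oid, attrs in objects.items():
        for attr, value in attrs.items():
            atype, obj_role, val_role = by_attr[attr]
            assertions.append((atype, {obj_role: oid, val_role: value}))
    return assertions


oa = [
    ("has-name",    {"bearer": "c1", "name": "J. Smith"}),
    ("has-balance", {"account": "a1", "amount": 250}),
]
oo = to_objects(oa)
oo["a1"]["balance"] = 300   # an update made in the object view ...
print(to_assertions(oo))    # ... shows up again at the assertion level
```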

--

What does this now mean? I think it means that we should keep it architecturally
simple:

  - adopt the concept of "data is governed by an ontology"
      - this is the same as for XML: there is data and there are schemas

  - define an appropriate ontology language which
      - allows us to define "sameness" (or reciprocally "distinction")
      - allows us to define what the structure of a map is
          - equivalent: how an application could "talk" to the map
          - can be object-focussed, but does not need to be (if it
            helps, why not?)
      - unlike in XML: one schema language rules all

  - have a language to transform maps
      - similar to XML: XSLT, but using _semantic transformations_

Everything else follows from that. A "view" to a map is provided by
either

   (a) a second ontology: then somehow it has to be figured out
       how a map should be translated (uni- or bi-directional), or

   (b) by a transformation mechanism itself: then the ontology follows
       from what this outputs.

See http://ausweb.scu.edu.au/aw03/papers/barta2/mediate.gif .

\rho