[sc34wg3] Re: Going beyond SIPs?

Sun, 19 Sep 2004 13:34:52 +1000

On Mon, Sep 06, 2004 at 04:31:43PM -0400, Steven R. Newcomb wrote:
> In the scenario I was talking about, the person who asserted the
> relationships knew all along what the implications of asserting them
> would be.  Those implications, including the conferral rules, were
> comprehensively disclosed by the TMA.

Alright. In my terminology I would have said "the assertions which
someone has created are governed by an ontology". "Disclosing" would
be "the ontology contains all the necessary rules to describe the
data (=assertions)".

>  If the person who asserted the
> relationships didn't want those implications, he should have made
> assertions that were governed by a different TMA.

Alright. Also in my terminology it is no problem to describe one and the
same data with varying ontologies.

Now on to the discussion "objects, what they are and what not":

> > The strange thing with this exercise is that this only happens to
> > "create these properties for the sake of being combined and then
> > compared", just to figure out whether two topics are now the same or
> > not. Maybe these 'conferred properties' are just thrown away, after
> > they have served their purpose.  What is the benefit (for me, for
> > the formalism, ...) that I have to make two steps instead of just
> > one?
> 
> You seem to be saying that, in topic maps, there is something more
> important than "figuring out whether two topics have the same
> subject".  I'd be interested to know what that more important thing
> might be.

Well, I want to store information there or, to be more precise, use
the paradigm to look at information in a particular way. But, yes, if
I can get this "identification trick" for free, then this is nice.

>  What, for you, are topic maps all about?  For me, topic maps have
>  always been about facilitating the collation of information around
>  subjects.  Thus, for me at least, in topic maps there is nothing
>  more important than discovering whether two assertions are being
>  made about the same subject.

Well, whether two proxies stand for the same subject or not may depend
on the application. This is not axiomatic for me. In this sense,
"discovering whether two assertions are being made about the same
subject" is something which may or may not be an issue.

My usual example here is that of criminal investigation: If a Mr.
Unknown is involved in a murder and at the same time a Mr.
Sleazo received a lump sum from a known Mafia boss, then I might
suspect that Unknown = Sleazo.

> There are several benefits in making the subject identity properties
> of subject proxies explicit.  I mentioned one of them in my last note:
> facilitating users' ability to point the finger of blame.  (If I had
> to say the same thing in one word, that word would be "auditability".)

No doubt, blaming is always a good motivation :-)

> Another benefit of making SIPs explicit is that such explicitness
> makes it possible for subject proxies to appear simultaneously in
> multiple "semantic coordinate spaces"

Oh, a new word!

> (or, to use the philosophical lingo that I find most intuitive, but
> which seems to make most people cringe: multiple "universes of
> discourse").

And another!

>  It wouldn't be easy for the semantic address "Q".....

And another!

<rest deleted>

I think I will cringe here for a while.

> Let me put the benefit in another way: If we require that the values
> of subject proxies are always explicit, then TMAs can be modular.

Why?

[ I assume that "value of subject proxies" here stands for "what subject
this thing is really about". ]

Why is it necessary for modularity that I describe all subject proxies
by specifying what subjects they are really about?

If we could agree that an ontology is a set of rules which (positively
and negatively) describe the 'nature of the resource' I am looking at,
then the 'set of rules' gives a very natural way to modularize.

> There doesn't have to be one monolithic TMA.  The Topic Maps paradigm
> becomes applicable to maps of all kinds of universes of discourse.

Of course there should _never_ be the need to have one, global TMA.

>   Aside: Personally, I think the modularity of TMAs will turn out to
>          be essential for the success of topic maps, for the same
>          reason that it has been essential for the success of XML that
>          XML documents don't all have to conform to one monolithic
>          Document Type.

Yes, this is the essence of ontology engineering.

>          (But if we're comparing XML with Topic Maps, it's important
>          to recognize a big difference between the two paradigms.  XML
>          does not directly facilitate the collation of information
>          about subjects of conversation, whereas the Topic Maps
>          paradigm does (or at least is supposed to).  In other words,
>          it has never been important for XML documents to provide
>          explicit information about how they identify the subjects
>          that they're talking about, but topic maps really *must*
>          provide such information if, as has always been claimed,
>          independently-created topic maps will merge meaningfully (or
>          at least predictably).)

Yes: _IF_ they are merged. If they are not, then forcing someone to
disclose "identities of subjects" is a fruitless exercise.

> > Unless TMRM frees itself from this ambivalence "sometimes I only
> > look at the assertions, sometimes at the properties" it will not be
> > a minimal model. \tau is minimal in this sense. Only assertions,
> > topics are reduced to - infinitesimal small - points.
> 
> > For me this sounds as if there are two layers, amalgamated into one:
> > The lower layer, TMRM-A of completely neutral assertions. They only
> > connect things, but do not have any bias, what is the "object" and
> > what is the "property".
> 
> TMRM has a two-layer vision, too.  Properties are the low level layer;
> properties aren't objects, but they are what objects consist of.  (If
> we're willing to call "subject proxies" "objects".)

Yes, 'properties' connect other values to a node. You may want to call
that node an 'object'. So properties are just special assertions in
which two nodes are connected.

> > And there is the second (OO) layer, TMRM-O which blesses some things
> > to be more equal than the others by making them objects. Here you
> > can declare existing connections as properties or even define
> > derived (conferred?)  properties which are combinations of other
> > properties.
> 
> Sounds good to me, Robert.  I can see how a TMRM "property instance"
> might be modeled as a \tau "assertion".  Am I getting your point, or
> missing it?

Yes, a specific instance of a property, say 'has-color', would be
a \tau assertion:

  { <color, red> , <object, rumpelstilzkin> }
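
As a minimal sketch (not the actual \tau implementation), such an
assertion can be modelled as an unordered set of <role, player> pairs,
and a 'property' as a reading of a two-role assertion. The function
names and the choice of 'object' as the distinguished role are
assumptions made for this illustration only.

```python
def assertion(**roles):
    """Build an assertion: an unordered set of (role, player) pairs."""
    return frozenset(roles.items())


def as_property(a, object_role="object"):
    """Read a two-role assertion as (object, property name, value).

    This is only a view on the assertion; the assertion itself has no
    bias about which player is the 'object'.
    """
    members = dict(a)
    if len(members) != 2 or object_role not in members:
        raise ValueError("not a two-role assertion with an object role")
    obj = members.pop(object_role)
    (name, value), = members.items()  # the single remaining pair
    return obj, name, value


has_color = assertion(color="red", object="rumpelstilzkin")
```

The order in which the pairs are written does not matter, which matches
the set notation used above.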

> > >..................................................... Hmmm.  Let's
> > > try this:] A TMA does *not* govern any representations of
> > > information resources.  It only governs *explicit understandings
> > > of representations of information resources*

Hmm, yes.

> > > (here, "explicit understandings" are sets of subject proxies).

Hmmm, maybe.

> > >  This means that we
> > > can have different *understandings* (i.e., different topic map
> > > views [TMVs]) of the same representation of a given information
> > > resource, and that those different TMVs can be governed by
> > > different TMAs.

So be it.

> > If so, then there only needs to be a language to map a TMRM-A
> > instance into a TMRM-O instance. Piece of cake. Maybe ...
> 
> Don't we also have to say something about the connection types that
> are needed?  If there is only one kind of low-level connection, then
> how do we distinguish between the connections available from a given,
> uh, object?  

The idea is that ontology TMRM-A defines 'connection types' and
whatever else is needed, and TMRM-O does the same (although with
properties this may look a bit different, superficially).

> > > (For this discussion, let's call such rules "Topic Map View
> > > Construction Rules" (TMVCRs).)
> > 
> > ...this could be its name. Sounds scary enough to me!
> 
> Touché.  I'm cursed with a propensity to create names that scare
> people.  But, so far at least, I have resisted the temptation to use
> Greek letters as names.  (I'm not sure which approach is scarier,
> actually. (;^)

It is the number of letters in an acronym which makes it scary. :-)

> > > among, if we're going to understand each other:
> > > 
> > > (1) A representation of an information resource (which may or may
> > >     not be represented in a syntax that we call a "topic map syntax"
> > >     -- in the TMRM, that distinction doesn't matter).
> > 
> > In my framework this would be the Excel sheet. Does not look like a
> > TM, but certainly can be "thought that way". This is still _outside_
> > the TM framework.
> 
> Outside.  Yes.

So far so good.

> 
> > > (2) An understanding of a representation of an information
> > >     resource (a "topic map view" -- TMV -- a set of subject proxies).
> > 
> > In my 'framework' this would be a representation of a resource
> > in form of assertions only. No bias here, just the facts. Like the
> > neurons in a brain.
> 
> Good.  (Well... it's not really "facts".  Facts are unattainable.
> Assertions of presumed facts, yes.  Facts, no.)

Sure, assertions of presumed facts.

> > In the Excel sheet example above this would be captured by a
> > "low-level ontology". That simply says, we have these roles and
> > these association types and everything has to be this and that way.
> 
> This is where I get confused about what you are saying.  Here's why: 
> 
> * On the one hand, at the lowest level, you say there's nothing but
>   connections.
> 
> * On the other hand, you say that at the lowest level there are roles.
>   I don't understand how these can exist at that level, except as
>   things whose existence is independent of the existence of individual
>   connections.

Well, yes, at the lowest level there are only connections. But the
"topicmappish" thing about these connections is that they are all
labelled with roles. At that level there is no concept of an 'object',
so the labels are .... labels.

So your last paragraph above might hit the point.

> > \tau path expressions (or something which is derived from them)
> > should be able to do that. Let us call this ontology level OA
> > (Ontology for assertions).
> 
> Here I know I'm completely lost.  I thought I had a glimmer of
> understanding as to what a \tau path expression was, but the above
> statement persuades me that I don't have a clue, really.  How can a
> \tau path expression be used to define or declare an assertion type?

A \tau expression p is an expression which I can apply to a map m and
which returns something:

   m * p = ...something...

If I use a path expression to ask for "all things which are an
instance of 'cat'", then I will be able to get a list of cats (labels,
names, length of tails, etc.).

OTOH, I can also interpret such a query as a constraint. I ask for
cats and I _expect_ to get at least one. If the map does not contain
any instance of a cat, then the map "does not conform to my
expectation". In other words, the map m violates the constraint p.

The above constraint is non-conditional in that it says "I WANT CATS".
In more realistic cases you would introduce 'conditional rules', such
as "for all cats I want to see an assertion which connects that cat
with a length of its tail". So, first it is checked where the cats are,
and then whether the constraint can be satisfied.

Obviously, it is _incredibly_ powerful to reduce queries and
constraints to one common mechanism. But it also puts some pressure on
path expressions to be sufficiently rich to express quite a few
constraints, while being 'containable' in terms of computational
complexity.
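
The dual reading of a query as a constraint can be sketched as follows.
This is not \tau path expression syntax; the map is modelled simply as
a list of role-to-player dictionaries, and every name in the code is an
assumption made for illustration.

```python
# A toy map: each assertion is a dict from role to player.
toy_map = [
    {"instance": "tom", "class": "cat"},
    {"instance": "felix", "class": "cat"},
    {"instance": "rex", "class": "dog"},
    {"cat": "tom", "tail-length": 25},
]


def instances_of(cls):
    """Build a query p that returns all instances of cls (m * p)."""
    def query(m):
        return [a["instance"] for a in m if a.get("class") == cls]
    return query


def tail_length_of(cat):
    """Build a query that returns the recorded tail lengths of one cat."""
    def query(m):
        return [a["tail-length"] for a in m
                if a.get("cat") == cat and "tail-length" in a]
    return query


def satisfies(m, query):
    """Unconditional constraint: the query must yield at least one result."""
    return len(query(m)) > 0


def violations(m, for_all, then):
    """Conditional rule: for every x found by for_all, then(x) must hold."""
    return [x for x in for_all(m) if not satisfies(m, then(x))]
```

Here `satisfies(toy_map, instances_of("cat"))` plays the role of "I WANT
CATS", and `violations(toy_map, instances_of("cat"), tail_length_of)`
lists the cats for which no tail length is asserted.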

> *** SIDE-ISSUE DISCUSSION BELOW: ASSERTION REIFICATION ***
> 
> I have a remark about a side-issue raised by your paper,
> http://ausweb.scu.edu.au/aw03/papers/barta2/paper.html, which says:

I will factor this out to another mail.

> *** SIDE-ISSUE DISCUSSION ENDS HERE ***

> > Why make differences for things which are the same? Actually my
> > software just uses different "drivers" for different data. If you
> > say
> > 
> >    my $tm = new TM (tau => 'file:map.xtm + file:map.atm');
> > 
> > then it is the task of the selected drivers (XTM in the first place,
> > and then AsTMa in the second) to _provide a TM view_. It also would
> > work for an Excel sheet:
> > 
> >    my $tm = new TM (tau => 'file:map.xtm + file:accounts.xls');
> 
> Good.  We envision similar requirements.  I vigorously applaud the
> modularity of your vision!  I hope we can persuade more people of the
> overwhelming importance of such modularity.

Well, the best way to convince people is to have _running code_.
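
For readers who do not use the Perl module quoted above, the driver
idea can be sketched in a few lines. This is a hypothetical
illustration, not the module's actual behaviour: the extension table,
the function name and the way the 'tau' string is split are all
assumptions, and no parsing or merging is performed.

```python
import os

# Hypothetical table: which driver provides a TM view for which file type.
DRIVERS = {".xtm": "XTM", ".atm": "AsTMa", ".xls": "Excel"}


def plan_drivers(tau):
    """Split a 'tau' expression on '+' and pick a driver per source."""
    plan = []
    for source in tau.split("+"):
        source = source.strip()
        # Drop the 'file:' scheme prefix if present.
        path = source[len("file:"):] if source.startswith("file:") else source
        ext = os.path.splitext(path)[1]
        if ext not in DRIVERS:
            raise ValueError("no driver for " + path)
        plan.append((DRIVERS[ext], path))
    return plan
```

The point of the design is that the caller never sees the source
format; each driver only has to provide a TM view of its data.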

> > > I think we need to make a distinction between "constraints" that
> > > are:
> > > 
> > >   (a) criteria of subject sameness detection that, when subject
> > >       sameness is detected, require subject proxies to be merged
> > >       (or, more precisely, to appear to have been merged), and
> > > 
> 
> > >   (b) criteria of semantic, stylistic, etc. validity, such as 
> > >       SteveP's exemplary constraint that the names of countries
> > >       must include at least one that is in the primary language of
> > >       the country.
> 
> > I cannot yet see, why this distinction must exist in the choice of
> > the language to describe it.
> > ....
> ....
> My only objection to using TMCL for both is that it gives two very
> different missions to the TMCL effort.

I would be more concerned if these efforts were split. I see ONE
constraint language and, as I have pointed out several times, I also
think that this language should directly or indirectly allow one to
define "sameness".

> Each of the two missions is extremely important, quite independently
> of the other.  Personally, I feel that each of the two missions
> deserves the full attention of a dedicated editorial team, and each
> should proceed without being hung up by the problems and
> requirements encountered by the other.  I don't see why the two
> missions are necessarily inseparable, either.

Languages have a tendency to be holistic things, and more often than
not they are things which cannot be easily factored. Taking the
concept of 'variables' or 'if-statements' out of a programming
language does not make sense.

Consolidating two languages into one is more work or may even prove
impossible. ISNOGOOD, METHINKS.

> (1) Ontologies constrain what's sayable, and they state the
>     implications -- for the "shape of the view" -- of saying it.

Right.

> (2) On the other hand, it's quite possible, in many topic maps, for
>     conflicting assertions to be made, and it's not the ontology's
>     business to prevent that from happening.  On the contrary: the
>     ontology's business is normally to make sure that it's possible
>     for that to happen!  It's normally also possible (and necessary)
>     for a topic map to be able to contain incomplete information.  The
>     detection of conflicts and incompletenesses are the proper
>     business of something other than an ontology language.  

Yes. No. Maybe.

(a) It is perfectly ok to have any number of ontologies (even 0) for
    a particular (set of) mappish data.

(b) It is perfectly ok for these ontologies to be consistent or
    inconsistent (conflicting) with each other.

(c) You may or may not model in your ontology what you consider to
    be 'consistency' in your data.

    You may have cats with two or more tails or heads if you leave
    that open. It would look pretty inconsistent to me, but maybe
    not for cats on Alpha Centauri.

Consistency in the mappish data does not necessarily have anything to
do with the consistency of the different ontologies which describe
that data.

> SteveP's memorable example of a constraint statement makes no
> ontological commitments (it assumes plenty of them, but it makes
> none): "The names of countries should include at least one whose
> language is the principal language of the country."  This "constraint
> statement" is a query with the added semantic that, if there are any
> results (any countries whose sets of names don't include one that's in
> the country's principal language), the topic map is considered not to
> have met the declared constraint.

Looks perfectly ok to me. I cannot see why I would have to
'ontologically commit' myself to things at the outside of a map, when
my only concern is to express my understanding of how /countries/ are
to be connected with other things in my context.

> This constraint statement is only meaningful in terms of an ontology
> that allows for proxies whose subjects are countries, for those
> proxies to have names and principal languages, and for the names to
> have language properties.

Well, the above statement only implies that there are things like
countries, names and languages and that these have specific
relationships in a given map following that constraint. Not more, not
less.

> But all that happens elsewhere -- wherever the ontology is declared.
> SteveP's constraint statement is something quite different; it's not
> constraining or allowing things to be said, and it doesn't have any
> impact on the view.  It's only detecting and reporting on specific
> combinations of things said in the view.

It is expressing an expectation which he has of this view. This is
a perfectly declarative way to do it.
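
The constraint under discussion can be read as a query whose results
are the violations: if it returns anything, the map does not meet the
expectation. The following is only a sketch under assumed data; the
dictionary layout, the sample countries and the function name are
invented for illustration and are not TMCL.

```python
# Assumed toy data: per country, its principal language and its names,
# each name tagged with the language it is written in.
countries = {
    "france": {
        "principal_language": "fr",
        "names": [("France", "fr"), ("Frankreich", "de")],
    },
    "japan": {
        "principal_language": "ja",
        "names": [("Japan", "en"), ("Japon", "fr")],
    },
}


def countries_missing_native_name(data):
    """Return countries with no name in their own principal language."""
    return sorted(
        country
        for country, info in data.items()
        if all(lang != info["principal_language"]
               for _, lang in info["names"])
    )


def meets_constraint(data):
    """The map meets the constraint when the violation query is empty."""
    return not countries_missing_native_name(data)
```

Nothing here says how a violation is to be repaired; the statement only
declares what is expected of the view.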

> But I think that saying "No two topics exist which have a sufficiently
> congruent geographic range" doesn't say how the ontological constraint
> is supposed to be achieved.

Uhm, why should it? There are a gazillion ways to resolve this issue.
In the case of the cities, bombing one of them off the face of the
earth is only one of them.

Merging is an obvious one, but not the only way. Maybe in my
application it is a NONO to have such topics in a map, and then I have
detected an unresolvable inconsistency.

> If our standard is going to be meaningful, it must allow us to view
> representations of information resources as topic maps in a
> *deterministic* way, so that, given the same resource, and the same
> rules, the same topic map results.

But I would not necessarily want the standard to have one single
'resolution mechanism' hardcoded. I'm not overly religious about this,
but I do not see the necessity to patronize others. Yet.

> I therefore would prefer a statement more like the following:
> "Whenever two topics have a sufficiently congruent geographic range,
> they merge, and the result of their merger is a topic that exhibits
> the union of their geographic ranges as its geographic range."  (Or
> something like that -- something that makes the operation of viewing
> deterministic and implementation-neutral, anyway.)

Well, Bingo.

In your own use case "merging by union" is DEFINITELY NOT useful or
meaningful. What I end up with is a topic which now contains 2
geographic ranges. The most natural solution here would have been to
create a new range which is the smallest one containing both originals.

Merge-by-union is ok for many things, I might even say it is ok
at the TMDM-level, but for a _foundational_ model....
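
The difference between the two merge results can be sketched as
follows. Representing a geographic range as a bounding box is an
assumption made here for illustration (the discussion does not say how
ranges are represented), as are all names and coordinates.

```python
from typing import NamedTuple


class Range(NamedTuple):
    """Assumed representation: an axis-aligned bounding box in degrees."""
    west: float
    south: float
    east: float
    north: float


def merge_by_union(a, b):
    """Set-union merge: the merged topic simply carries both ranges."""
    return {a, b}


def merge_by_enclosure(a, b):
    """Domain-aware merge: the smallest box containing both ranges."""
    return Range(
        min(a.west, b.west),
        min(a.south, b.south),
        max(a.east, b.east),
        max(a.north, b.north),
    )


city = Range(16.18, 48.12, 16.58, 48.32)
metro = Range(16.0, 48.0, 16.5, 48.3)
```

With these inputs `merge_by_union` yields a topic with two ranges,
while `merge_by_enclosure` yields a single range covering both. Which
of the two is appropriate depends on the ontology, not on the core
model.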

> Are you saying that you want the core model to provide a limited
> repertoire of property types, which you call "assertions"?  Is it your
> vision that these property types are then used to assemble objects in
> various ways?  If so, that's a very interesting approach!

What I think is that 'properties' are just special 'assertions'. And -
for a CORE MODEL - I think that the number of concepts should be
minimal. Why should I talk about 'properties' if these can be mapped
into 'assertions' easily?

> > What does this now mean? I think it means that we should keep it
> > architecturally simple:
> > 
> >   - adopt the concept of "data is governed by an ontology"
> 
> >       - this the same as for XML: there is data and there are
> >         schemas
> 
> I like this, but I think you put it too simply.  There's also the
> possibility that the same data can be viewed through multiple
> ontologies.  Also, at your lowest "assertions-only" level, the same
> data can be seen as multiple different sets of assertions, given a
> different set of rules.

I never ruled out that for particular data there may be any number
of ontologies. Same as for XML.

> >       - allows us to define what the structure of a map is
> 
> >           - equivalent: how an application could "talk" to the map
> 
> >           - can be object-focussed, but does not need to be (if it
> >             helps, why not?)
> 
> What else could it be?  I'm baffled by your assertions-only level,
> frankly.  Aren't the role players there, too?  Is an assertion *not*
> an object?  Why isn't it?

I know why I hate that notion 'object'. It is sooooo inflationary and
so meaningless and so overloaded. It should be forbidden, but humans
are so focussed on objects that they ignore that the relationships
rule this universe.

Of course, an assertion is an 'object' in the sense that it exists as a
'thingy' (now it is my turn to invent impressive new terms :-). But an
assertion is NOT an object in the object-oriented sense, that is, one
with attributes that have attribute names and values and all that
blabla.

It may have these, but not necessarily so.

> >   - have a language to transform map
> >       - similar to XML: XSLT, but using _semantic transformations_
> 
> Are semantic transformations something other than data
> transformations?

Everything that can be done with a computer is a data
transformation. The difference between XSLT-based transformations and
_semantic transformations_ is that the former are purely syntactic. I
can build a bit of semantic understanding about a domain into the
XML documents (to allow applications and humans to interpret it
later), but the XSLT transformation operates only on the tree.

A _semantic transformation_ would, for instance, honor is-a and
is-subclass-of relationships without me (as user) explicitly coding
this either into the data or into the transformation. It would also
honor other rules which are part of the ontology.
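
What "honoring is-a and is-subclass-of" means in practice can be
sketched like this. The toy data, the restriction to single
inheritance and the function names are assumptions for illustration;
this is not how any particular transformation language does it.

```python
# Assumed toy ontology: each class has at most one direct superclass.
subclass_of = {"siamese": "cat", "cat": "mammal", "dog": "mammal"}
# Assumed toy data: each instance is asserted against one class only.
is_a = {"tom": "siamese", "rex": "dog"}


def superclasses(cls, subclass_of):
    """Yield cls followed by all of its transitive superclasses."""
    seen = set()
    while cls is not None and cls not in seen:  # guard against cycles
        seen.add(cls)
        yield cls
        cls = subclass_of.get(cls)


def syntactic_instances_of(cls, is_a):
    """Purely syntactic lookup: only literally asserted classes match."""
    return sorted(i for i, c in is_a.items() if c == cls)


def semantic_instances_of(cls, is_a, subclass_of):
    """Semantic lookup: the subclass hierarchy is applied implicitly."""
    return sorted(
        i for i, c in is_a.items() if cls in superclasses(c, subclass_of)
    )
```

Asking for instances of 'cat' finds nothing syntactically, because tom
is only asserted to be a 'siamese'; the semantic variant finds tom
without the user encoding the hierarchy into the data or the query.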

> > Everything else follows from that. A "view" to a map is provided by
> > either
> > 
> >    (a) a second ontology: then somehow it has to be figured out
> >        how a map should be translated (uni- or bi-directional), or
> > 
> >    (b) by a transformation mechanism itself: then the ontology follows
> >        from what this outputs.
> > 
> > See http://ausweb.scu.edu.au/aw03/papers/barta2/mediate.gif .
> 
> Robert, I'm still not getting it, but I'm trying.  I will be more
> convinced that I understand what you're saying when I feel confident
> that I understand exactly what's going on at your assertions-only
> level: 
> 
> * What the semantics of it are and aren't, 

Well, that is exactly what the paper describes:

   http://astma.it.bond.edu.au/junk/tau-model.pdf

> * Why you say it's minimal, 

Because I am not aware that the N concepts we use there can be reduced
to N-1 concepts without losing the 'topicmappishness'.

> * What the benefits of such minimality are, and

To understand minimality, consider the 2-dimensional cartesian plane.
Every point on this plane can be addressed by a vector, pointing from
the origin (0,0) to the point. You have an infinite number of such
vectors.

Each such vector can be written as a linear combination of two unit
vectors x_vector, y_vector. So every vector in the plane is of the form

   a * x_vector + b * y_vector

If you take away either of the two unit vectors, you lose the
'plane'ish nature of your mathematical structure. The system is 'more
minimal' than that with infinitely many vectors because now you deal
with just two which are stretched appropriately.

If a 'foundational model' is not 'minimal', then it is not
foundational. Then it is just what the author came up with ad hoc,
without striving to reduce the number of concepts until any further
reduction would lose what you wanted in the first place.  In many
communities this is seen as a major defect.

If, in the vector example above, I used 3 vectors, the system would
not be minimal anymore: one of the 3 can be expressed in terms of the
other two.
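
Written out, and assuming the usual unit vectors along the two axes
(the symbols z, z_1 and z_2 are introduced here only for illustration):

```latex
% Every vector in the plane is a combination of the two unit vectors:
\vec{v} = a\,\vec{x} + b\,\vec{y}, \qquad \vec{x} = (1,0),\quad \vec{y} = (0,1)
% Any third vector \vec{z} = (z_1, z_2) is already expressible in them:
\vec{z} = z_1\,\vec{x} + z_2\,\vec{y}
```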

The benefit of minimality is simplicity. And there cannot be any more
benefit than that.

> * What was traded away in order to achieve that minimality (there's no
>   benefit without cost, I suppose).

As far as I can compare it with TMRM, nothing was 'traded' away. I did
not include this - yet undisclosed - disclosure business in \tau,
because we can express it (or at least the part which I could guess
from the discussions) with \tau path expressions.

\rho