[sc34wg3] Re: Going beyond SIPs?

Steven R. Newcomb sc34wg3@isotopicmaps.org
03 Sep 2004 10:05:03 -0400


Robert Barta <rho@bigpond.net.au> writes:

> On Fri, Aug 27, 2004 at 02:17:56PM -0400, Patrick Durusau wrote:
> > You have said more than once that the \tau can go beyond SIPs (hard to 
> > think you can go beyond what is already unlimited but the \tau is your 
> > model so you're entitled to your opinion).
> 
> Patrick, et al.
> 
> What some of us are arguing (I seem to remember that Dmitry was with
> me here), is that 'properties' alone are just one way to pinpoint the
> identity of a subject.
> 
> Let us reconsider the example with the cities which are "identified"
> by their geographical coordinates. What happens here:
> 
>   - First someone writes a topic (subject proxy, if you wish) about
>     the city 'Konstantinopel'. He creates associations around it, such that
>     it borders the Bosporus, but also adds X and Y coordinates (and
>     maybe a radius R) to approximate the location of Konstantinopel.
> 
>   - then another topic about Istanbul is added, similar X/Y, similar R
> 
>   - and then another 'Byzantium' is added. Again, similar X/Y, similar R
> 
> Now, as already discussed, "it depends" whether the three topics
> should be identified (= "regarded as equivalent") or not. A historian
> would probably want to see all three of them, or maybe not.
> Obviously, it is unwise to somehow code this equivalence into the map
> itself. It is something which the "application", i.e. the context how
> the map is used, is supposed to define.

> This is where the TMRM concept 'TMA' enters the stage. Here "identity"
> is defined as "whatever topics have a circle, sufficiently congruent
> shall be regarded the same".

> My view is that this is a "constraint": It says "if I had my way, then
> no two topics in a map exist which have a sufficiently congruent
> geographic range".

> There are two questions:
> 
>   (a) what is the formal language (if any) to express this?
> 
>   (b) what must TM software do to make this constraint TRUE, i.e.
>       change the map in such a way that it does not violate the
>       constraint?
> 
> For (b), the answer seems easy: The software must remove all
> situations which would conflict with the constraint. It will detect
> the topics and will - without further control - merge them into
> one. (A query/transformation language could actually put more control
> on what the 'merged' map looks like)

> For (a) the answer is not so simple. Every language has a certain
> degree of expressivity. The higher that is, the more complex things
> one can express. But the price of this is that at some stage you lose
> the ability to "stay in control". For instance, it may be
> theoretically impossible to always decide whether two constraints are
> effectively saying the same (and are just using a different way to
> do so), or whether two constraints are just contradicting each other
> (making it impossible for maps to satisfy both constraints).
> 
> So this is a thin path to walk.

> ---
> 
> What you guys suggest within TMRM is to consider only properties of
> the subjects in question to be part of such a constraint. This is all
> well and good for many cases like the one above or "all persons having
> the same email address should be regarded the same".
> 
> But it is rather arbitrary. What about the following "constraint":
> 
>   Two "cities" (i.e. this constraint only applies to instances of the
>   concept "city") are to be regarded the same if both are directly or
>   indirectly linked to a geographic item (river, mountain, bay, sea,
>   lake, ...).

> In our case above, all three cities may be linked to the "Bosporus",
> so they may be candidates for a merge. Note, that we have left
> completely unspecified the exact nature of linking. This could be
> "is-bordering-at", or "on-the-banks-of".
> 
> Now you could argue that this is still to be done with properties,
> maybe properties which have to be defined ad-hoc-ishly for this very
> purpose.
> 
> But I can top it. Now I would like to identify as equal all persons
> who are involved in more than 5 associations of type "knows". Slightly
> bizarre, but possible. Or what about "all persons with the same
> distance from G.W. Bush should be regarded equal" (not sure how
> bizarre that is).
> 
> In these cases there is no property involved.

As you have correctly understood, TMRM proposes that the basis for
merging is always explicit in the subject identity properties (SIPs)
of the subject proxies to be merged.  And we both agree that your
examples of bases for merging, except, perhaps, for the last example
above, are amenable to this approach.

But the last example, too, is completely amenable to this approach.
Although that example is convoluted and extremely unlikely (it seems
to me to be quite far from any normal idea of what constitutes a
unique subject identity), it is just as amenable as the other examples
to an approach in which the only basis of merging is the values of
SIPs.  In this example, each of the "knows" associations triggers the
operation of a conferral rule that confers a value component on each
of the SIPs of each of its role players.  The subject-sameness
detection rule then sees which of those role players has more than
five such value components.  Q. E. D.
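
To make the mechanics of that argument concrete, here is a minimal
Python sketch.  The data model (associations as a type plus a dict of
role players), the function names, and the sample data are all
assumptions made for illustration; none of them come from the TMRM
draft.  It only shows the two steps described above: a conferral rule
that adds value components, and a sameness detection rule that reads
nothing but the resulting SIP values.

```python
from collections import defaultdict

def confer_knows_components(associations):
    """Conferral rule (hypothetical): every 'knows' association confers
    one value component on a SIP of each of its role players."""
    sip = defaultdict(set)
    for index, (assoc_type, roles) in enumerate(associations):
        if assoc_type != "knows":
            continue  # other association types confer nothing here
        for player in set(roles.values()):
            sip[player].add(index)  # the association's index is the component
    return sip

def same_subject_group(sip, threshold=5):
    """Sameness detection rule: proxies whose SIP holds more than
    `threshold` value components are regarded as the same subject."""
    return sorted(p for p, components in sip.items() if len(components) > threshold)

# Invented sample data: two "hubs" that each take part in six 'knows' associations.
associations = (
    [("knows", {"knower": "hub1", "known": f"p{i}"}) for i in range(6)]
    + [("knows", {"knower": "hub2", "known": f"q{i}"}) for i in range(6)]
    + [("born-in", {"person": "hub1", "place": "somewhere"})]
)

to_merge = same_subject_group(confer_knows_components(associations))
print(to_merge)
```

The point of the sketch is that the detection step never looks at the
associations themselves, only at the values the conferral rule made
explicit on the proxies.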

> I am not saying that a future TMCL should have these features, what I
> am saying is that we should orient ourselves towards the expressivity
> and not an artificial selection of what is going to be compared.

I don't see how expressivity is limited -- in any way! -- by requiring
that the basis of merging be explicit in the subject proxies.  What
is, in fact, limited by this requirement is the potential for
confusion about why things were merged, and the potential for
underspecification of merging rules -- rules that must be specified in
order for us to know what a representation of an information resource
is supposed to mean when it is viewed as a topic map.  Making the SIPs
explicit is an essential aid to users of topic maps when different
systems yield different results; it allows users to point the finger
of blame.  

***

Now I want to go back over certain parts of your note.

> Now, as already discussed, "it depends" whether the three topics
> should be identified (= "regarded as equivalent") or not. A historian
> would probably want to see all three of them, or maybe not.
> Obviously, it is unwise to somehow code this equivalence into the map
> itself. It is something which the "application", i.e. the context how
> the map is used, is supposed to define.

I want to use what you have just said to highlight what I
regard as an important reason for miscommunication in our community,
and then make some distinctions that I think will eventually give us
room on the ground to stand together.

In TMRM-land, an "Application" governs a "topic map view".  It does
*not* govern any [... uh, phooey, I can't use the term "information
resource" the way I used to any more.  Hmmm.  Let's try this:] A TMA
does *not* govern any representations of information resources.  It
only governs *explicit understandings of representations of
information resources* (here, "explicit understandings" are sets of
subject proxies).  This means that we can have different
*understandings* (i.e., different topic map views [TMVs]) of the same
representation of a given information resource, and that those
different TMVs can be governed by different TMAs.  

An additional but important detail: If the two TMVs are governed by
different TMAs, then, obviously, the rules used to understand the
representation of the information resource must also be different,
because they produce subject proxies that have different properties.
(For this discussion, let's call such rules "Topic Map View
Construction Rules" (TMVCRs).)  It's also possible, however, for two
different sets of TMVCRs to produce different TMVs from the same
representation of an information resource, even though both sets know
how to handle that kind of representation and both TMVs are governed
by the same TMA.

So there are several things here that we need to distinguish among, if
we're going to understand each other:

(1) A representation of an information resource (which may or may not
    be represented in a syntax that we call a "topic map syntax" -- in
    the TMRM, that distinction doesn't matter).

(2) An understanding of a representation of an information resource (a
    "topic map view" -- TMV -- a set of subject proxies).

(3) The schema and rules that govern the TMV -- the kinds of
    properties that the TMV's subject proxies have, its conferral
    rules etc.  -- a "Topic Map Application (TMA)".

(4) The rules under which a representation of an information resource
    is understood as a TMV -- the Topic Map View Construction Rules
    (TMVCRs).  Every such set of rules is necessarily TMA-specific,
    but TMAs are not TMVCR-specific.  I.e., there is no limit on the
    number of sets of TMVCRs that can be used to understand different
    (or even the same) kinds of representations of information
    resources as TMVs governed by exactly the same TMA.
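
The four-way distinction above can be sketched in a few lines of
Python.  Everything here is an assumption made for illustration: the
"TMA" is reduced to a set of permitted property kinds, the
"representation" is a list of invented records with made-up
coordinates, and the two construction functions stand in for two sets
of TMVCRs.  It is not a rendering of anything in the TMRM draft.

```python
# (3) Hypothetical TMA: the only property kinds a subject proxy may carry.
TMA_PROPERTIES = {"name", "location"}

def conforms_to_tma(tmv):
    """True if every proxy in the TMV uses only properties the TMA defines."""
    return all(set(proxy) <= TMA_PROPERTIES for proxy in tmv)

# (1) A representation of an information resource (invented records).
representation = [
    {"label": "Istanbul", "x": 10.0, "y": 20.0},
    {"label": "Byzantium", "x": 10.0, "y": 20.0},
]

# (4) First set of TMVCRs: one subject proxy per record.
def tmvcr_one_proxy_per_record(rep):
    return [{"name": {r["label"]}, "location": (r["x"], r["y"])} for r in rep]

# (4) Second set of TMVCRs: one subject proxy per distinct location.
def tmvcr_one_proxy_per_location(rep):
    by_location = {}
    for r in rep:
        loc = (r["x"], r["y"])
        proxy = by_location.setdefault(loc, {"name": set(), "location": loc})
        proxy["name"].add(r["label"])
    return list(by_location.values())

# (2) Two topic map views of the same representation, under the same TMA.
tmv_a = tmvcr_one_proxy_per_record(representation)
tmv_b = tmvcr_one_proxy_per_location(representation)
print(len(tmv_a), len(tmv_b), conforms_to_tma(tmv_a), conforms_to_tma(tmv_b))
```

Both views pass the same TMA check, yet they contain different numbers
of subject proxies, which is the situation item (4) describes.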

So, now, you may well ask, "What's a 'topic map'?"  Our use of this
term to mean different things has led to lots of misunderstandings and
general paralysis.  In an effort to move things forward, I propose to
try to avoid using the unqualified term "topic map" in our
conversation.  If by "topic map" I mean "a representation of an
information resource that is in XTM notation", I'll try to remember to
use the term "representation of an information resource" instead of
"topic map".  If by "topic map" I mean the information that a topic
map document presumably conveys, I'll try to remember to use the term
"topic map view".

***

> This is where the TMRM concept 'TMA' enters the stage. Here "identity"
> is defined as "whatever topics have a circle, sufficiently congruent
> shall be regarded the same".

> My view is that this is a "constraint": It says "if I had my way, then
> no two topics in a map exist which have a sufficiently congruent
> geographic range".

I think we need to make a distinction between "constraints" that are:

  (a) criteria of subject sameness detection that, when subject
      sameness is detected, require subject proxies to be merged (or,
      more precisely, to appear to have been merged), and

  (b) criteria of semantic, stylistic, etc. validity, such as SteveP's
      exemplary constraint that the names of countries must include at
      least one that is in the primary language of the country.

We need to distinguish these two kinds of "constraints" because one of
them actually determines the shape of the TMV, while the other only
determines our attitude toward the TMV -- whether we think the TMV
(or, more likely, the source materials from which the TMV was
constructed) meets certain criteria of consistency, etc.  I think that
if we continue to call both of these kinds of things "constraints", we
will continue to create avoidable misunderstandings and paralysis.
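
A small Python sketch may help show why the two kinds differ.  The
proxy layout, the property names ("code", "primary_language",
"names"), and the sample data are hypothetical and chosen only for
illustration.  The first function is a kind-(a) rule and returns a
differently shaped TMV; the second is a kind-(b) rule and returns only
a report, leaving the TMV as it was.

```python
def merge_same_code(tmv):
    """Kind (a): subject sameness detection.  Proxies sharing a 'code'
    value are merged, so the result has a different shape from the input."""
    merged = {}
    for proxy in tmv:
        target = merged.setdefault(
            proxy["code"],
            {"code": proxy["code"],
             "primary_language": proxy["primary_language"],
             "names": {}},
        )
        target["names"].update(proxy["names"])  # union of the names
    return list(merged.values())

def lacks_primary_language_name(tmv):
    """Kind (b): validity criterion.  Reports proxies with no name in
    their primary language; it never changes the TMV."""
    return [p["code"] for p in tmv if p["primary_language"] not in p["names"]]

# Invented sample data.
tmv = [
    {"code": "DE", "primary_language": "de", "names": {"en": "Germany"}},
    {"code": "DE", "primary_language": "de", "names": {"de": "Deutschland"}},
    {"code": "FR", "primary_language": "fr", "names": {"en": "France"}},
]

merged_tmv = merge_same_code(tmv)
print(len(tmv), len(merged_tmv))
print(lacks_primary_language_name(tmv), lacks_primary_language_name(merged_tmv))
```

Note that the outcome of the validity check depends on which TMV it is
applied to, which is one more reason to keep the two notions apart.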

Sorry, now let me repeat your statement so I can make just one more
remark about it:

> My view is that this is a "constraint": It says "if I had my way,
> then no two topics in a map exist which have a sufficiently
> congruent geographic range".

You always get to have your way; there's no "if" about it.  All you
need to do is to make either the TMA, or the TMVCRs, or both, be such
that the TMV is the way you want it to be.
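
As an illustration only, here is how the "sufficiently congruent
circle" rule might look as a sameness detection rule in Python.  The
definition of congruence (centres and radii each within a tolerance),
the tolerance values, the coordinates, and the extra city "Rome" are
all assumptions invented for the sketch.  Because this kind of
congruence is not transitive, the sketch merges by chaining: a proxy
joins every group that contains at least one congruent member.

```python
import math

def congruent(a, b, tol):
    """Hypothetical congruence test: centres and radii both within `tol`."""
    centre_gap = math.hypot(a["x"] - b["x"], a["y"] - b["y"])
    return centre_gap <= tol and abs(a["r"] - b["r"]) <= tol

def merge_congruent(proxies, tol):
    """Group proxies whose circles are congruent, directly or by chaining."""
    groups = []
    for proxy in proxies:
        keep, combined = [], [proxy]
        for group in groups:
            if any(congruent(proxy, member, tol) for member in group):
                combined.extend(group)  # proxy links this group in
            else:
                keep.append(group)
        groups = keep + [combined]
    return [sorted(member["name"] for member in group) for group in groups]

# Invented coordinates; only their relative closeness matters.
proxies = [
    {"name": "Konstantinopel", "x": 10.0, "y": 20.0, "r": 5.0},
    {"name": "Istanbul", "x": 10.2, "y": 20.1, "r": 5.3},
    {"name": "Byzantium", "x": 9.9, "y": 19.8, "r": 4.8},
    {"name": "Rome", "x": 50.0, "y": 60.0, "r": 5.0},
]

loose_view = merge_congruent(proxies, tol=1.0)
strict_view = merge_congruent(proxies, tol=0.1)
print(loose_view)
print(strict_view)
```

The same proxies yield one merged city under the loose tolerance and
three separate ones under the strict tolerance, so "it depends" becomes
a matter of which rule the TMA declares.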

***

> There are two questions:
> 
>   (a) what is the formal language (if any) to express this?
> 
>   (b) what must TM software do to make this constraint TRUE, i.e.
>       change the map in such a way that it does not violate the
>       constraint?
> 
> For (b), the answer seems easy: The software must remove all
> situations which would conflict with the constraint. It will detect
> the topics and will - without further control - merge them into
> one. (A query/transformation language could actually put more control
> on what the 'merged' map looks like)

Wait.  Please be patient with me, Robert; I'm having more
communications problems here.

Before we can transform something in any rigorously deterministic way,
we must first understand its existence in a rigorous way.  Until we
know, with comprehensive explicitness, exactly what we're
transforming, we are not in a position to be explicit about any
transformation processes, or about the results of any such process.

So when we talk about creating a language whose purpose is to specify
merging rules, we need to recognize that such a thing is not a
language for "transforming" topic map views.  It is a language for
declaring what a particular kind of topic map view *is* in the first
place.  Having first established what a TMV *is*, *then* we are in a
position to talk about transforming it.  We cannot say or imply that
declaring what a topic map view *is* is the same thing as declaring a
*transformation* of it.  The two things are necessarily distinct, and
existence is prior to transformation.

-- Steve

Steven R. Newcomb, Consultant
Coolheads Consulting

Co-editor, Topic Maps International Standard (ISO 13250)
Co-drafter, Topic Maps Reference Model 
  (http://www.coolheads.com/SRNPUBS/ontolog040610)

srn@coolheads.com
http://www.coolheads.com

direct: +1 540 951 9773
main:   +1 540 951 9774
fax:    +1 540 951 9775

208 Highview Drive
Blacksburg, Virginia 24060 USA