[sc34wg3] Re: paradigmatic PSIs

Lars Marius Garshol sc34wg3@isotopicmaps.org
10 Apr 2002 11:03:00 +0200


(Added the SC34WG3 list to the receivers. I think this thread is
sufficiently important to warrant that. Suggest the rest of you do the
same with your postings.)

* Steven R. Newcomb
| 
| As presently constituted, at least to the best of my understanding
| of it, the Topic Maps paradigm uses pieces of addressable
| information as binding points for specific semantics.
| 
| [...]
| 
| Since Web servers aren't currently able to report whether two
| addresses address the same location, merging processes normally fall
| back on the simple heuristic that two identical URIs address the
| same thing, and two different URIs are assumed to address different
| things.  This heuristic works pretty well, except that some merging
| opportunities are missed when the same subject indicator is
| addressed by two different addressing expressions.  (That's not good,
| but, so far, it hasn't been a killer problem.  It's probably more
| tolerable when topics *don't* merge that *should* merge, than when
| topics *do* merge that *shouldn't* merge.)

I agree with all of what you wrote here wholeheartedly.

There is also another reason why merging processes (as well as
processes that react to particular PSIs and treat them specially) do
not attempt to resolve URIs: it is *very* slow. Merging large topic
maps with large numbers of subject indicators in this way would just
not be feasible. (And in some cases, as Jim pointed out, it would not
even be possible, since one might be in a location where the URIs
cannot be resolved.)
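
To make the string-comparison heuristic concrete, here is a minimal
sketch, not taken from any actual topic map engine. It assumes each
topic is just an id with a set of subject indicator URIs; the function
name and the example.org URIs are hypothetical. URIs are compared as
plain strings and never resolved, so no network access is needed.

```python
from collections import defaultdict


def merge_by_identifier(topics):
    """Group topics that share at least one identical URI string.

    `topics` maps a topic id to a set of URI strings (an assumed,
    simplified data model). Returns a sorted list of sorted groups.
    """
    # Union-find parent table: every topic starts in its own group.
    parent = {topic: topic for topic in topics}

    def find(topic):
        # Follow parent links to the group representative,
        # shortening the path as we go.
        while parent[topic] != topic:
            parent[topic] = parent[parent[topic]]
            topic = parent[topic]
        return topic

    first_seen = {}  # URI string -> first topic that used it
    for topic, uris in topics.items():
        for uri in uris:
            if uri in first_seen:
                # Identical strings: assume the same subject, so merge.
                parent[find(topic)] = find(first_seen[uri])
            else:
                first_seen[uri] = topic

    groups = defaultdict(set)
    for topic in topics:
        groups[find(topic)].add(topic)
    return sorted(sorted(group) for group in groups.values())
```

Given topics "a" and "b" that both carry http://example.org/psi/x, the
two end up in one group. A topic carrying http://EXAMPLE.org/psi/x
stays separate even though the host name differs only in case: that is
exactly the kind of missed merging opportunity Steve describes, and the
price of not resolving or normalizing anything.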

This is also why the PubSubj TC introduced a new term, BTW: published
subject *identifier*, meaning the URI of a published subject
*indicator*. For merging and identification of subjects, the
identifier is used, while humans use the indicator. Bernard pointed
out that in the future computers might also use the indicator, and I
agree with that, even if we don't know of any such uses now.
 
| Some people might claim that PSIs must be deliberately created for
| all subjects on which merging is supposed to occur, and that each
| such PSI must list its own canonical addressing expression.  That's
| a good idea, as far as it goes, but it does not account for reality.
| [...]

I even agree with this part. If we could use any piece of text
anywhere as a subject indicator in a reliable way that would be
great. Current web infrastructure makes this very hard, though.

| The simple heuristic of comparing addressing-expression strings is
| what's behind the idea that XML namespaces, for example, can be
| identified by fictitious and non-resolvable URIs.  However, such
| simple, unsubstantiated string comparisons cannot form a trustworthy
| basis for the semantic merging features of the Topic Maps paradigm.
| There would be no unbroken chain of responsibility; the rules of
| evidence would be violated.  The same fictitious addressing
| expression could be created by two different authors to represent
| two different subjects, without having any way to know about each
| other's use of the same string. The use of a fictitious binding
| point is an irresponsible act that endangers the value not only of
| the topic map that utters it, but also of all the topic maps with
| which it may someday be merged.

I am not sure what you mean to say with this. Stupidity is real, and
cannot be outlawed, but I am not sure I consider this to be that much
of a problem.
 
| So we have a serious dilemma.  
| 
| * If there's no common "place" (or no common list of
|   equivalent "places") that is used as a surrogate for
|   a given subject, then there's no firm basis for
|   responsible topic map authoring in support of
|   universal mergeability.

Agreed. This is a problem for the PubSubj TC as well. If we publish
the subject indicators both as HTML and XTM, which document is the
"common place"?
 
| Here are some alternatives for resolving this dilemma:
| 
| (1) We can say that for all topic maps, forevermore,
|     there are these few *exceptional* topics which,
|     unlike any others, are found in *all* topic maps,
|     but, curiously enough, have no binding points,
|     themselves.  Every topic map processing system must
|     treat these particular topics differently from all
|     others, merging them even in the absence of any
|     binding points.

This sounds like a kluge to me. I would prefer not to do this.
 
| (2) We can use non-Web addressing, such as HyTime
|     <bibloc>, or SGML FPIs, to point at the relevant
|     parts of the Standard, considering the addressed
|     components as offline resources.  We can even
|     create a special IETF addressing "scheme" (like
|     HTTP, except that the addressing expressions are
|     unresolvable and there's no protocol).

This sounds like a symbolic URN scheme. There are quite a few
proposals for such schemes floating around, some of them very
interesting, and possibly useful for this.

<URL: http://www.taguri.org >
<URL: http://www.ietf.org/internet-drafts/draft-palmer-esl-uri-00.txt >

(Alternative for the first URI:
<URL: http://lists.w3.org/Archives/Public/uri/2001Apr/0013.html >)

These URI schemes could be used, methinks, especially the tag one.
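
As a rough illustration of what minting such identifiers could look
like, here is a sketch that assumes the general shape
tag:authority,date:specific-part described in the tag proposal. The
function name, the validation rules, and the example.org authority are
all assumptions for illustration, not part of any specification.

```python
import re

# Assumed date forms: YYYY, YYYY-MM, or YYYY-MM-DD.
_DATE = re.compile(r"\d{4}(-\d{2}(-\d{2})?)?")


def make_tag_uri(authority, date, specific):
    """Build a tag-style identifier string (hypothetical helper).

    `authority` is a domain name or e-mail address controlled by the
    minter on `date`; `specific` is whatever name the minter chooses.
    Nothing here is ever resolved; the result is only a string.
    """
    if not authority or "," in authority:
        raise ValueError("authority must be non-empty and contain no comma")
    if not _DATE.fullmatch(date):
        raise ValueError("date must look like YYYY, YYYY-MM or YYYY-MM-DD")
    if not specific:
        raise ValueError("specific part must be non-empty")
    return f"tag:{authority},{date}:{specific}"
```

The date is what lets the identifier stay unambiguous even if the
domain later changes hands, which speaks to Steve's worry about two
authors inventing the same string for different subjects.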

| (3) We can put the relevant components of the standard
|     on the Web, advertise their existence as PSIs, and
|     urge that their canonical addresses (which we will
|     provide) be used as the basis of merging.

We could do this, but the only advantage of (3) over (2) is that here
the PS identifiers actually resolve to the PS indicators. The
disadvantage is that at some point in the future they will almost
certainly *not* do so any more.

I think I prefer (2).
 
| I've been assuming that (2) is an unworkable idea, [...]

Well, no. We can do it with URIs.

| Perhaps even more to the point, if we choose (2), we sacrifice both
| mergeability and the trustworthiness of merged topic maps.

Why? What do you mean?

| So, as I see it, if we choose to be neither self-destructive nor
| self-deluded, we're left with the third choice, which is to put the
| paradigmatic PSIs on the Web.  Now the question becomes, "How best
| to put these things on the Web," rather than "Whether to put these
| things on the Web."

I'm not going to oppose this if it's what we choose, but I am not sure
we need to go this route.

-- 
Lars Marius Garshol, Ontopian         <URL: http://www.ontopia.net >
ISO SC34/WG3, OASIS GeoLang TC        <URL: http://www.garshol.priv.no >