[sc34wg3] RE: paradigmatic PSIs

Mason, James David (MXM) sc34wg3@isotopicmaps.org
Tue, 9 Apr 2002 16:16:02 -0400


I am leaving Steve's long and thoughtful message intact below for reference.

However, I am about to disagree with it philosophically, practically,
legalistically, and poetically.

I believe that anything in the standard is there by definition and not by
being addressable. It is therefore not necessary for it to *be anywhere*. It
simply *is*. An implementer just writes it directly into a program (assuming
the program is doing something that needs that part of the standard).

Merging, and other TM operations, should require tracing things back just as
far as necessary and no further. What that point is will depend on what's
being merged, but it will always be domain-specific. A consequence is that,
unless you're writing a commentary on the standard, it should never be
necessary to trace things all the way back to *any* subject in the standard,
and therefore there is no need for any PSIs in the standard except for such
references. 

We should stop well short of infinite regress. (I clearly do not believe
that what Steve calls "universal mergeability" is an achievable goal. If
someone wants to merge Pepper's Operamap with my map on Nuclear Assembly
Systems (assuming someone could even get at that latter map), fat luck; I
certainly hope there are no points in common and therefore no PSIs that can
be used for merging.)

My choice from Steve's alternatives below is obviously (1). 

I think this is the only workable approach, even if it leaves Steve less
than philosophically satisfied. If merging and other operations depend on
pointing all the way back to some PSIs from the standard placed in some
addressable spot, then I can't use Topic Maps. There is no way that I'm
going to be able to put those PSIs in all the places I'm going to need them.
In particular, having to address some network address is going to prevent me
from doing processing on my classified system which has no connection to any
network when I'm up in development mode. 

I shouldn't have to refer to some PSI that defines "association" or "role".
Living without directly addressing anything in the standard is a workable
proposition: I get along quite well in K42 and Omnigator on my stand-alone
box. So neither of them implements absolutely everything in ISO/IEC 13250 --
they do enough for me to get my customer's job done. (Does anyone think
everything in ISO 8879 has been implemented?)

Steve's (1) asserts that "Every topic map processing system must treat these
particular topics differently from all others". Precisely. They're in a
standard. That's what being in a standard means. They are intrinsically
different from anything any user could put into any practical map (except
for a commentary . . .). They're like bootstrap code.
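
A minimal sketch of what option (1) might look like inside a processor,
assuming the constructs defined by the standard are simply compiled into the
program. The names BUILTINS, Topic, same_subject and merge_maps are
hypothetical, chosen for illustration only; they come from neither ISO/IEC
13250 nor any existing tool. Built-in topics merge because the processor
knows them by definition, while all other topics merge only when their
subject indicator addresses match as plain strings. Nothing is ever
retrieved from a network.

```python
from dataclasses import dataclass, field

# Hypothetical names for constructs the standard defines. They are written
# directly into the program and are never looked up at any address.
BUILTINS = frozenset({"association", "role"})


@dataclass
class Topic:
    names: set = field(default_factory=set)
    addresses: set = field(default_factory=set)  # subject indicator addresses, compared as strings
    builtins: set = field(default_factory=set)   # non-empty only for topics the standard defines


def same_subject(a, b):
    """Built-ins match by definition; everything else by identical address string."""
    return bool(a.builtins & b.builtins) or bool(a.addresses & b.addresses)


def merge_maps(*maps):
    """Merge topic maps (lists of Topic) without resolving any address."""
    merged = []
    for topic in (t for m in maps for t in m):
        unknown = topic.builtins - BUILTINS
        if unknown:
            raise ValueError(f"not defined by the standard: {sorted(unknown)}")
        combined = Topic(set(topic.names), set(topic.addresses), set(topic.builtins))
        kept = []
        for existing in merged:
            if same_subject(existing, combined):
                # Fold the earlier topic into the combined one.
                combined.names |= existing.names
                combined.addresses |= existing.addresses
                combined.builtins |= existing.builtins
            else:
                kept.append(existing)
        kept.append(combined)
        merged = kept
    return merged
```

Two maps that each carry their own "association" topic end up with a single
one after merge_maps, even though neither map points at any address for it.
Topics from unrelated domains stay separate unless their address strings
happen to be identical.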

Aside from my practical inclination to take Occam's entity remover to (2)
and (3), they're outside the scope of SC34 (except <bibloc>) and therefore
outside the scope of ISO/IEC 13250. Therefore, they shouldn't be part of a
discussion of SC34 standards.

I'll leave you with a couple of favorite lines from Archibald MacLeish's "Ars
Poetica":

		A poem should not mean
		But be.

		(http://www.poets.org/poems/poems.cfm?prmID=985)

Jim Mason



> -----Original Message-----
> From:	Steven R. Newcomb [SMTP:srn@coolheads.com]
> Sent:	Tuesday, April 09, 2002 8:49 AM
> To:	Mason, James David (MXM) 
> Cc:	shunting@etopicality.com; Michel Biezunski; jan; masonjd@ornl.gov;
> em@w3.org; bernard.vatant@mondeca.com; nogievet@cogx.com;
> larsga@garshol.priv.no
> Subject:	Re: paradigmatic PSIs
> 
> "Mason, James David (MXM) " <mxm@ornl.gov> writes:
> 
> [a lot of good stuff]
> 
> Maybe we have to face a subtle techno-philosophical
> question, right here and right now.
> 
> As presently constituted, at least to the best of my
> understanding of it, the Topic Maps paradigm uses
> pieces of addressable information as binding points for
> specific semantics.  
> 
> If we regard a component of an ISO standard as a
> published subject indicator (PSI), in what sense is
> that component "a piece of addressable information"?
> 
> An ISO standard is not a piece of addressable
> information in the sense that computers can be expected
> to retrieve it from any single canonical address.
> That's a fact, and we can't change it.
> 
> More to the point, an ISO standard is not a piece of
> addressable information in the sense that, if two
> address-resolution processes arrive at it, they will be
> able to tell that they have both arrived at the same
> place.  The reason is that there's no *place* there for
> the two processes to meet, shake hands, and perhaps
> tell each other the different routes they took in order
> to arrive at that same place.  The necessity that
> binding points must really exist comes from the need to
> know whether or not two different addressing
> expressions (such as two non-identical URIs) address
> the same or different things.  Accurate merging depends
> on knowing the identities -- the unique locations -- of
> the binding points.
> 
> Since Web servers aren't currently able to report
> whether two addresses address the same location,
> merging processes normally fall back on the simple
> heuristic that two identical URIs address the same
> thing, and two different URIs are assumed to address
> different things.  This heuristic works pretty well,
> except that some merging opportunities are missed when
> the same subject indicator is addressed by two
> different addressing expressions.  (That's not good,
> but, so far, it hasn't been a killer problem.  It's
> probably more tolerable when topics *don't* merge that
> *should* merge, than when topics *do* merge that
> *shouldn't* merge.)
> 
> Some people might claim that PSIs must be deliberately
> created for all subjects on which merging is supposed
> to occur, and that each such PSI must list its own
> canonical addressing expression.  That's a good idea,
> as far as it goes, but it does not account for reality.
> In reality, the overwhelming majority of subject
> indicators are not marked up as PSIs, and will never be
> marked up as PSIs, but, even so, they are often the
> very best, the most stable, and the most authoritative
> PSIs.  (E.g., product catalog entries, laws and
> regulations, technical data points (such as those
> published by USP, NIST, DIN, etc.), any parts of any
> standards of any kinds published by anybody before
> Topic Maps were invented, etc. etc.)  The reality is
> that all these various kinds of things must ultimately
> be usable as subject indicators -- as binding points
> for subject-based merging.  Eventually, they WILL be
> used as binding points, regardless of whether they are
> marked up as XML.  And, we must not assume that the
> current limitations of Web-based addressing will endure
> forever.  Knowledge, and the basic requirements of
> knowledge management, will far outlast the Web as we
> know it.
> 
> The simple heuristic of comparing addressing-expression
> strings is what's behind the idea that XML namespaces,
> for example, can be identified by fictitious and
> non-resolvable URIs.  However, such simple,
> unsubstantiated string comparisons cannot form a
> trustworthy basis for the semantic merging features of
> the Topic Maps paradigm.  There would be no unbroken
> chain of responsibility; the rules of evidence would be
> violated.  The same fictitious addressing expression
> could be created by two different authors to represent
> two different subjects, without having any way to know
> about each other's use of the same string.  The use of
> a fictitious binding point is an irresponsible act that
> endangers the value not only of the topic map that
> utters it, but also of all the topic maps with which it
> may someday be merged.
> 
> So we have a serious dilemma.  
> 
> * If there's no common "place" (or no common list of
>   equivalent "places") that is used as a surrogate for
>   a given subject, then there's no firm basis for
>   responsible topic map authoring in support of
>   universal mergeability.
> 
> * ISO 13250 has subjects -- and even PSIs for those
>   subjects -- that are basic for *all* topic maps, and
>   that must *always* be merged whenever two topic maps
>   are merged, but the standard isn't located at any
>   canonical place or places.  It has no binding points.
> 
> Here are some alternatives for resolving this dilemma:
> 
> (1) We can say that for all topic maps, forevermore,
>     there are these few *exceptional* topics which,
>     unlike any others, are found in *all* topic maps,
>     but, curiously enough, have no binding points,
>     themselves.  Every topic map processing system must
>     treat these particular topics differently from all
>     others, merging them even in the absence of any
>     binding points.
> 
> (2) We can use non-Web addressing, such as HyTime
>     <bibloc>, or SGML FPIs, to point at the relevant
>     parts of the Standard, considering the addressed
>     components as offline resources.  We can even
>     create a special IETF addressing "scheme" (like
>     HTTP, except that the addressing expressions are
>     unresolvable and there's no protocol).
> 
> (3) We can put the relevant components of the standard
>     on the Web, advertise their existence as PSIs, and
>     urge that their canonical addresses (which we will
>     provide) be used as the basis of merging.
> 
> If we choose (1), I think we're implicitly saying that
> we're just kidding about Topic Maps; they don't really
> work.  The paradigm can't really handle *all* subjects,
> without any exceptions, because we're making some
> exceptions, right in the Standard itself!  This
> solution would be self-destructive.  We'd have to add a
> special merging rule just for these few topics, instead
> of using the same merging rule that is supposed to
> apply to everything, and that gives the paradigm its
> basic power to do semantic integration.  Also, the
> standard would be perceived as being unable to "eat its
> own dog food," if I may borrow this expression from
> Microsoft's internal development culture.  That
> perception, if accurate, would thoroughly discredit the
> standard.
> 
> I've been assuming that (2) is an unworkable idea,
> because very few people are planning to start using and
> supporting HyTime <bibloc>s, or SGML FPIs, or anything
> like them.  URIs are here to stay.  The reality is that
> "being on the Web" is the operative definition of both
> "being visible" and "being resolvably addressable".  We
> cannot afford to ignore that reality.  Perhaps even
> more to the point, if we choose (2), we sacrifice both
> mergeability and the trustworthiness of merged topic
> maps.
> 
> So, as I see it, if we choose to be neither
> self-destructive nor self-deluded, we're left with the
> third choice, which is to put the paradigmatic PSIs on
> the Web.  Now the question becomes, "How best to put
> these things on the Web," rather than "Whether to put
> these things on the Web."
> 
> Maybe I'm missing your point altogether, Jim.  Or,
> maybe I'm being too dogmatic (or dogfood-matic).  But I
> see this as an opportunity to demonstrate how
> communities can protect the integratability of their
> knowledge assets; we should take this opportunity to
> "eat our own dog food", and, in eating it, show others
> how to eat theirs.
> 
> -- Steve
> 
> Steven R. Newcomb, Consultant
> srn@coolheads.com
> 
> Coolheads Consulting
> http://www.coolheads.com
> 
> voice: +1 972 359 8160
> fax:   +1 972 359 0270
> 
> 1527 Northaven Drive
> Allen, Texas 75002-1648 USA