[sc34wg3] TM Data Model issue: prop-subj-address-values

Kal Ahmed sc34wg3@isotopicmaps.org
30 Oct 2003 23:54:08 +0000


On Thu, 2003-10-30 at 21:54, Geir Ove Grønmo wrote:
> * Kal Ahmed
> | My opinion is that [subject locator] has to be a single resource if the
> | concept of a subject-constituting resource is to have any real meaning
> | in a topic map.
> 
> Here we agree 100%.
> 
> | What would it mean for a topic to have two different subject
> | constituting resource locators? If the two locators resolve to
> | the same resource representation (e.g. mirror sites), then they are
> | *not* the same resource so you have a topic that represents two
> | different subjects and that is not allowed. If you want to assert that
> | the two resources provide representations of the same subject, then
> | you should be using subject indicators.
> 
> I agree. It has to be the exact same resource byte-by-byte. 
> 
> I think there is a balance to strike here: being able, in the topic map
> data model, to hold alternative [subject] locators for the same resource
> without forcing every topic map processor to implement the heuristics
> for identifying them. Either you store the equivalent locators in the
> topic map, or you leave proving-they-are-different-resources to the
> topic map processors. This, I think, is the essence of the issue.
> 
> | More tricky is the case where two locators return the same resource
> | (e.g. because of server-side settings that turn
> | http://www.techquila.com/ into http://www.techquila.com/index.html),
> | but in a heuristic (which is what this is), you have to sometimes
> | accept that you need to be inexact to produce something workable.
> 
> I _might_ consider http://example.org/, http://example.org/.,
> http://example.org/foobar/../, http://example.org/nodes.py?id=root,
> http://example.org/index.htm, http://example.org/index.jsp and
> http://example.org/index.html all to reference the same resource even
> though they are different locators. Not sure, but this issue may boil
> down to whether or not the same _resource_ can be referenced by more
> than _one_ URI. This depends on our definition of what a _resource_
> is.

I think that is the key point. The definition of resource is not a
clear-cut thing (not even the relevant RFCs, standards, and TAG
pronouncements seem to match on this). Also there is the issue of what
is a URI - are two URIs equivalent if they resolve to the same resource?
And then you get into a circular trap...
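To make the purely syntactic part of this concrete, here is a minimal Python sketch of a few of the syntax-based normalization steps later standardized in RFC 3986 (case normalization and dot-segment removal, section 6.2.2, plus the scheme-based rule that an empty http path means "/"). It assumes plain http URIs with no userinfo, explicit ports, or percent-encoding to normalize; the function names `remove_dot_segments` and `normalize` are illustrative, not from any topic map specification or library.

```python
from urllib.parse import urlsplit, urlunsplit


def remove_dot_segments(path: str) -> str:
    """Remove '.' and '..' segments (RFC 3986, section 5.2.4)."""
    output = []  # path segments, each keeping its leading "/" if it had one
    inp = path
    while inp:
        if inp.startswith("../"):
            inp = inp[3:]
        elif inp.startswith("./"):
            inp = inp[2:]
        elif inp.startswith("/./"):
            inp = inp[2:]  # "/./rest" -> "/rest"
        elif inp == "/.":
            inp = "/"
        elif inp.startswith("/../"):
            inp = inp[3:]  # "/../rest" -> "/rest", and drop the last output segment
            if output:
                output.pop()
        elif inp == "/..":
            inp = "/"
            if output:
                output.pop()
        elif inp in (".", ".."):
            inp = ""
        else:
            # Move the first segment (with its leading "/", if any) to the output.
            end = inp.find("/", 1 if inp.startswith("/") else 0)
            if end == -1:
                end = len(inp)
            output.append(inp[:end])
            inp = inp[end:]
    return "".join(output)


def normalize(uri: str) -> str:
    """Syntax-based normalization only; assumes http URIs without userinfo."""
    parts = urlsplit(uri)
    scheme = parts.scheme.lower()
    netloc = parts.netloc.lower()  # host names are case-insensitive
    path = remove_dot_segments(parts.path) or "/"  # empty http path means "/"
    return urlunsplit((scheme, netloc, path, parts.query, parts.fragment))


locators = [
    "http://example.org/",
    "http://example.org/.",
    "http://example.org/foobar/../",
    "http://example.org/nodes.py?id=root",
    "http://example.org/index.htm",
    "http://example.org/index.jsp",
    "http://example.org/index.html",
]
for uri in locators:
    print(f"{uri:40} -> {normalize(uri)}")
```

Run against Geir's list, the first three locators all normalize to http://example.org/, while the last four are left unchanged. Whether /index.html or /nodes.py?id=root is "the same resource" as / depends on server configuration, which no syntactic rule can see; only dereferencing can suggest it, and that is where the heuristic has to become inexact.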

> 
> | If you were to allow multiple subject locators, you would not only
> | allow the arguably correct case of two locators which return the same
> | resource, but also a whole raft of incorrect cases where the two
> | locators return different resources. [subject locator] is the lesser
> | of these two evils.
> 
> If it is incorrect, then it is the _author's_ fault. No more, no
> less. If it is incorrect, that's a human being's fault. Shit in, shit
> out, etc. That's life.

No, consider mirrored sites - site A is mirrored by site B, such that
each URL x on A is mirrored to URL x' on B. Most of the time a URL x'
resolves to exactly the same sequence of content bytes as URL x. But
when the resource at x is updated, there is a lag before x' is
synchronised again. So a topic with x and x' as subject locators
sometimes represents one resource and sometimes represents two
resources.

> 
> Why would we not want to trust topic maps authors?
> 

It's not a question of trust; it's a question of the model that we have for 
URIs, resources and subjects and the particular set of heuristics that we
choose. I feel that the original heuristics encoded in XTM 1.0 are right - 
one locator, one resource, one subject.
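As a rough illustration of "one locator, one resource, one subject", here is a hypothetical Python sketch of a topic that can hold at most one subject locator but any number of subject indicators. The `Topic` class, the `same_subject` function, and the example URLs are all made up for illustration; the identity test is deliberately simplified (equal locators or a shared indicator) and ignores the other XTM 1.0 merging criteria.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Topic:
    """Hypothetical, simplified topic; not XTM syntax or any real API."""
    # At most one subject-constituting resource: one locator, one resource, one subject.
    subject_locator: Optional[str] = None
    # Any number of subject indicators: several resources may indicate one subject.
    subject_indicators: set[str] = field(default_factory=set)


def same_subject(a: Topic, b: Topic) -> bool:
    """Simplified identity test: equal subject locators or a shared subject indicator."""
    if a.subject_locator is not None and a.subject_locator == b.subject_locator:
        return True
    return bool(a.subject_indicators & b.subject_indicators)


# Mirrored pages x and x' are recorded as subject indicators, not as two locators.
x = "http://example.org/doc.html"
x_mirror = "http://mirror.example.net/doc.html"
about_doc = Topic(subject_indicators={x, x_mirror})
other = Topic(subject_indicators={x_mirror})
page_itself = Topic(subject_locator=x)

print(same_subject(about_doc, other))        # True: they share an indicator
print(same_subject(page_itself, about_doc))  # False: the page itself is a different subject
```

Because `subject_locator` is a single optional value, the model cannot express a topic whose subject is "both x and x'"; saying that two mirrors describe the same subject has to go through the indicator set instead.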

Cheers,

Kal