[sc34wg3] Occurrences in the data model

08 Jan 2004 07:43:33 +0100

* Patrick Durusau
| 
| In data model terms (checking my understanding here so please
| correct if necessary) the occurrence has the values marked by ()'s:
| 
| 1. [value]: A string or null. (04h27m57.3s)
| 
| 2. [resource]: A locator item or null. (null)

Yep.

| 3. [scope]: A set of topic items. (unconstrained, inferred from the
|     definition of scope in 5.4.4)

The empty set, actually. (Though that does signify the unconstrained
scope.)

| 4. [type]: A topic item, or null. (<topicRef xlink:href="#ra"/>)

Contains the topic item that resulted from the <topic id="ra">
element. Not sure whether that's what you meant.

| 5. [reifier]: A topic item, or null. (null)
| 
| 6. [source locators]: A set of locator items. (none)
| 
| 7. [parent]: An information item. (<topic
|     id="*2MASXiJ0427573+261918">) (computed value)

Yep.
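The seven occurrence properties listed above can be pictured as a record type. The sketch below is a minimal, hypothetical Python rendering, not anything defined by the TMDM draft. The class and field names are invented for illustration, a locator is simplified to a plain string, and topic items compare by object identity. It fills in the values from the example:

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(eq=False)  # hypothetical topic item: equality is object identity
class Topic:
    id: str


@dataclass(eq=False)
class Occurrence:
    # Hypothetical field names mirroring the TMDM properties discussed above.
    value: Optional[str]              # [value]: a string or None
    resource: Optional[str]           # [resource]: a locator (plain string here) or None
    type_: Optional[Topic]            # [type]: a topic item or None
    parent: Topic                     # [parent]: computed from the enclosing topic
    scope: FrozenSet[Topic] = frozenset()          # [scope]: empty set = unconstrained
    reifier: Optional[Topic] = None                # [reifier]
    source_locators: FrozenSet[str] = frozenset()  # [source locators]


# The occurrence from the example, with [type] pointing at the <topic id="ra"> item.
ra = Topic("ra")
galaxy = Topic("*2MASXiJ0427573+261918")
occ = Occurrence(value="04h27m57.3s", resource=None, type_=ra, parent=galaxy)
```

Note that [type] holds the topic item itself rather than the `<topicRef>` element that pointed to it.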

| Assuming the foregoing analysis is correct, a second occurrence
| item would not be merged with this one:
| 
| 1. If the second occurrence (same parent) used a resource (#2) to
| point at a resource with the same value;
| 
| 2. If the second occurrence (same parent) had the same value (#1) but
| a different scope (#3);
| 
| 3. If the second occurrence (same parent) used a resource (#2) and had
| a different scope (#3);
| 
| 4. If the second occurrence had a different parent (#7) but all other
|     values were identical.

Absolutely correct.
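The four cases can be checked mechanically. The sketch assumes that duplicate detection compares [value], [resource], [scope], [type] and [parent], as the cases above imply, with topic items compared by identity. The helper names, the scoping topic and the `http://example.org/ra.txt` locator are all hypothetical:

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple


@dataclass(eq=False)  # hypothetical topic item: equality is object identity
class Topic:
    id: str


@dataclass(eq=False)
class Occurrence:
    value: Optional[str]
    resource: Optional[str]
    scope: FrozenSet[Topic]
    type_: Optional[Topic]
    parent: Topic


def merge_key(o: Occurrence) -> Tuple:
    # Hypothetical helper: the properties assumed to decide whether two
    # occurrence items are duplicates. Topics hash/compare by identity.
    return (o.value, o.resource, o.scope, o.type_, o.parent)


def would_merge(a: Occurrence, b: Occurrence) -> bool:
    return merge_key(a) == merge_key(b)


ra = Topic("ra")
galaxy = Topic("*2MASXiJ0427573+261918")
other = Topic("other-object")      # hypothetical second parent topic
some_scope = Topic("some-scope")   # hypothetical scoping topic
url = "http://example.org/ra.txt"  # hypothetical resource locator

base = Occurrence("04h27m57.3s", None, frozenset(), ra, galaxy)
dup = Occurrence("04h27m57.3s", None, frozenset(), ra, galaxy)
case1 = Occurrence(None, url, frozenset(), ra, galaxy)              # resource, not value
case2 = Occurrence("04h27m57.3s", None, frozenset({some_scope}), ra, galaxy)
case3 = Occurrence(None, url, frozenset({some_scope}), ra, galaxy)
case4 = Occurrence("04h27m57.3s", None, frozenset(), ra, other)     # different parent

for name, o in [("dup", dup), ("1", case1), ("2", case2), ("3", case3), ("4", case4)]:
    print(name, would_merge(base, o))
```

Only `dup` merges. Case 4 differs solely in [parent], and that is enough to keep the two items apart.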

| I don't find 1-3 (inclusive) troubling, but the failure to merge in
| #4 seems problematic.

I'm not surprised. We did a lot of back and forth with SRN on this,
but I *think* he eventually came to see it our way. Maybe you will,
too. :)

| To compel merging of occurrences within a topic to eliminate
| duplicate entries makes sense, although one assumes duplicates will
| really be a problem in only a limited number of cases.

Experience seems to indicate that so far, yes.

| The more interesting case arises with location information, such as
| Right Ascension/Declination in astronomy, longitude and latitude in
| GIS systems (and targeting systems), where finding all the
| occurrences that share a point on a particular axis could well be
| important.

I agree that's an important use case, but it turns out that there is a
conceptual reason for not merging here, as you suspected. As the TMDM
CD says, occurrences are the *relationships* between subjects and
information resources.

So let's say another topic had the same 'ra' value: if you then merged
the two occurrence items, you would also merge any topics reifying
them, and you would effectively be asserting that the relationship
between topic X and 'ra' value Y is the same as the relationship
between topic Z and 'ra' value Y, and clearly that is not the case.

The string values are the same, but there's no notion of merging
string values in the model, because strings are primitive, immutable
objects anyway, and so within the model there's really no way to tell
whether two equal strings are the 'same' string or not.

Note also that finding two different topics that have the same value
for the 'ra' occurrence type can be done like this using tolog 1.0:

  select $T1, $T2 from
    ra($T1, $V),
    ra($T2, $V),
    $T1 /= $T2?
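For readers without a tolog engine, the same join can be sketched in plain Python. It runs over a hypothetical flat list of (topic, occurrence type, value) triples. The function name and all data are made up for illustration. Values are matched by ordinary string equality, and no occurrence items are merged:

```python
from collections import defaultdict
from itertools import permutations
from typing import Dict, Iterable, Set, Tuple


def same_value_pairs(
    occurrences: Iterable[Tuple[str, str, str]], occ_type: str
) -> Set[Tuple[str, str]]:
    """Pairs (T1, T2), T1 != T2, whose occ_type occurrences share a value.

    Rough analogue of: select $T1, $T2 from
                         ra($T1, $V), ra($T2, $V), $T1 /= $T2?
    """
    # Group the topics by occurrence value (string equality only).
    by_value: Dict[str, Set[str]] = defaultdict(set)
    for topic, otype, value in occurrences:
        if otype == occ_type:
            by_value[value].add(topic)
    pairs: Set[Tuple[str, str]] = set()
    for topics in by_value.values():
        # Both orders are produced, as the symmetric tolog query would match
        # both; a value held by only one topic yields no pairs.
        pairs.update(permutations(sorted(topics), 2))
    return pairs


# Hypothetical sample data.
data = [
    ("galaxy-a", "ra", "04h27m57.3s"),
    ("galaxy-b", "ra", "04h27m57.3s"),
    ("galaxy-c", "ra", "05h00m00.0s"),
    ("galaxy-a", "dec", "+26d19m18s"),
]
print(sorted(same_value_pairs(data, "ra")))
# [('galaxy-a', 'galaxy-b'), ('galaxy-b', 'galaxy-a')]
```

The grouping step is the only place where values meet. The occurrences themselves stay distinct, which is the point made below.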

In short: you don't have to merge the occurrence items to be able to
do what you want, and there's a conceptual reason for not doing it.

| Note that I don't think making coordinates topics would solve the
| problem: given the fine-grained nature of coordinate systems, there
| would be a proliferation of topics for any relatively sophisticated
| system of coordinates. Not to mention that coordinates are commonly
| thought of as characteristics of objects/locations and not as
| subjects in their own right.

I fully agree.

-- 
Lars Marius Garshol, Ontopian         <URL: http://www.ontopia.net >
GSM: +47 98 21 55 50                  <URL: http://www.garshol.priv.no >