[sc34wg3] Best practices for representing occurrences, or: why are occurrences not associations?

23 Feb 2003 23:13:23 +0100

* dmitryv@cogeco.ca
| 
| I think that in many usage scenarios (covered by the 80/20 rule)
| information resources have to be explicitly represented in the topic
| map as first-class entities (as reified information resources).

I agree completely.

| The first approach works well if we need to represent additional
| information about the occurrence itself (such as "strength", for
| example).

This is the correct way to do it, as the "strength" concept applies to
the relationship between the subject and the resource (that is, the
occurrence) rather than to the resource itself.
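For illustration, here is a minimal XTM 1.0 sketch of this first approach. It assumes hypothetical topic ids, a hypothetical "strength" occurrence type, a placeholder URL, and the usual xlink namespace declaration on the enclosing topicMap element. The occurrence is given an id, and a second topic reifies it by pointing a subjectIndicatorRef at that id, which is my understanding of the XTM 1.0 reification convention. The strength is therefore stated about the topic/resource relationship rather than about the document.

```xml
<!-- Hypothetical subject whose occurrence carries an id so it can be reified -->
<topic id="topic-maps">
  <occurrence id="occ-topic-maps-report">
    <instanceOf><topicRef xlink:href="#article"/></instanceOf>
    <resourceRef xlink:href="http://example.org/report.html"/>
  </occurrence>
</topic>

<!-- Topic reifying that occurrence: its subject indicator is the occurrence element -->
<topic id="occ-topic-maps-report-topic">
  <subjectIdentity>
    <subjectIndicatorRef xlink:href="#occ-topic-maps-report"/>
  </subjectIdentity>
  <!-- "strength" describes the relationship, not the document (value is made up) -->
  <occurrence>
    <instanceOf><topicRef xlink:href="#strength"/></instanceOf>
    <resourceData>0.8</resourceData>
  </occurrence>
</topic>
```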

| But I am not sure if it works well for representing such information
| as "Publishing Date" and "Author(s)", because we typically have
| many-to-many relationships between "domain" objects and information
| resources.

Again, you are exactly right. "Publication date" is not a property of
the subject/resource relationship (the occurrence), but of the
resource itself. Therefore the right approach is to reify the resource
rather than the occurrence.

As you hint above, doing it the other way would be awkward, as you
would then have to attach the publication date information to each
occurrence of the same resource, which would be redundant and bad
practice.
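Here is a minimal sketch of reifying the resource instead. It again uses hypothetical topic ids and types and a placeholder URL and date, and it assumes the xlink namespace is declared on the enclosing topicMap element. The publication date is stated once, on the topic whose subject address is the document. Any number of subjects can keep pointing at the same URL through ordinary occurrences.

```xml
<!-- The document itself, reified: its subject address is the document URL -->
<topic id="report">
  <subjectIdentity>
    <resourceRef xlink:href="http://example.org/report.html"/>
  </subjectIdentity>
  <!-- Stated once, about the resource (placeholder value) -->
  <occurrence>
    <instanceOf><topicRef xlink:href="#publication-date"/></instanceOf>
    <resourceData>2003-01-01</resourceData>
  </occurrence>
</topic>

<!-- Two subjects with the same document as an occurrence; no date repeated here -->
<topic id="topic-maps">
  <occurrence>
    <instanceOf><topicRef xlink:href="#article"/></instanceOf>
    <resourceRef xlink:href="http://example.org/report.html"/>
  </occurrence>
</topic>
<topic id="rdf">
  <occurrence>
    <instanceOf><topicRef xlink:href="#article"/></instanceOf>
    <resourceRef xlink:href="http://example.org/report.html"/>
  </occurrence>
</topic>
```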

| In fact, XTM 1.0 has an example in section 3.9.1 which demonstrates
| this approach.

I would say that this example is incorrect. Steve Pepper explains the
background really well in this email:
<URL: http://www.infoloom.com/pipermail/topicmapmail/2003q1/004427.html >

| Personally, I like the second approach more, but in this case we are
| kind of "losing" the occurrence concept. We in fact introduce a new
| association (is there no standard for that?) which plays the role of
| an advanced occurrence (between a topic and a reified information
| resource). Because it is not a standard association, software tools
| cannot really use it in a compatible way.

Actually, they can. If you use

  <topic ...>
    <subjectIdentity>
      <resourceRef xlink:href="..."/>
    </subjectIdentity>
    ...
  </topic>

software will know that this topic represents the information resource
referred to by the URI. If that information resource is also an
occurrence of some topic, the software will know that the two
references point to the same information resource.
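To make this concrete, here is a minimal sketch of the "Mentioned in" association approach under this convention. The association and role types, topic ids and URL are hypothetical, the referenced type topics are omitted, and the xlink namespace is assumed to be declared on the enclosing topicMap element. Because the document topic has the URL as its subject address, an application that knows nothing about the mentioned-in type can still see that it represents the same resource as any occurrence with that URL.

```xml
<!-- The document, reified: its subject address is the document URL -->
<topic id="report">
  <subjectIdentity>
    <resourceRef xlink:href="http://example.org/report.html"/>
  </subjectIdentity>
</topic>

<topic id="topic-maps"/>

<!-- Hypothetical "mentioned-in" association between a subject and the document -->
<association>
  <instanceOf><topicRef xlink:href="#mentioned-in"/></instanceOf>
  <member>
    <roleSpec><topicRef xlink:href="#subject"/></roleSpec>
    <topicRef xlink:href="#topic-maps"/>
  </member>
  <member>
    <roleSpec><topicRef xlink:href="#document"/></roleSpec>
    <topicRef xlink:href="#report"/>
  </member>
</association>

<!-- A plain occurrence elsewhere with the same URL refers to the same resource -->
<topic id="rdf">
  <occurrence>
    <instanceOf><topicRef xlink:href="#article"/></instanceOf>
    <resourceRef xlink:href="http://example.org/report.html"/>
  </occurrence>
</topic>
```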

| I am totally with the authors regarding these benefits. But I think
| it is also a good example of how the concept of occurrences
| "disappears" with this approach. As you know, Omnigator has a special
| window for occurrences. This window makes it possible to quickly show
| all resources available about a specific topic. In the case of "let's
| introduce associations instead of occurrences", all document
| references are mixed with other associations.

That's an Omnigator issue, really.

| I am not saying that it is bad. I am just saying: if good practice
| is to reify resources as topics and use special "occurrence" kinds of
| associations (such as "Mentioned in"), why do we need regular
| occurrences?

Strictly speaking, we do not need either occurrences or base names,
since both are representable using associations (which is of course
exactly what the RM does). On the other hand, there is a rough
consensus in the TM community that the TAO model of SAM/XTM makes some
distinctions that are very useful for humans and machines alike.

| 1. Define that "occurrence" in XTM is a shortcut (the same way as
| instanceOf) for an association of type "occurrence", which is a
| subtype of the most general association type, "assertion".

This is essentially what the RM does. 

| 2. Define that <resourceRef> in an occurrence element is a shortcut
| for a resource reified as a topic, whose subject address equals the
| URI in the <resourceRef>.

We could do that, but I must admit I am hesitant to. Essentially it
means giving up the TAO model, which has proved to be very useful.

There's nothing technically wrong in what you say, but I think at the
end of the road you are starting us down lie RM and RDF. Both are
good models in several ways, but they are not topic maps as
I (and ISO 13250:2002) understand that term. Of course, we shouldn't
be wedded to the concept of occurrences for historical reasons only,
but I do think the occurrence concept is useful both for human beings
and for machines.

That said, I do like your idea, and have been entertaining similar
thoughts at times myself. It's worth thinking this through a couple
more times before we make up our minds.

| We can do the same "trick" with the resourceData element. Let's
| extend the "subjectIdentity" element and allow "resourceData" inside
| it. We can then reify "resourceData" the same way as we did with
| occurrences.

This is closely related to the SAM issue strings-as-subjects. Graham and I
will probably propose that this be achieved with

  <topic ...>
    <subjectIdentity>
      <resourceRef xlink:href="data:,42"/>
    </subjectIdentity>
    ...
  </topic>

or something similar.

-- 
Lars Marius Garshol, Ontopian         <URL: http://www.ontopia.net >
GSM: +47 98 21 55 50                  <URL: http://www.garshol.priv.no >