[sc34wg3] The interpretation of facets

Wed, 23 Apr 2003 14:44:26 +0200

The draft HyTM specification in N391 contains the first complete
attempt to explain the interpretation of facets with respect to
the model implicit in XTM. This was way past due and is a sign
that the restatement of 13250 really is leading to positive
results. However, I don't believe that the interpretation given
in N391 is correct. This posting outlines what I believe is a
more correct interpretation and a couple of problems I see.

Facets were always seen as being somewhat orthogonal to the
basic topic map model. I remember long discussions about whether
they should be in the standard at all. It was claimed by some
that facets were unnecessary once the concept of scope was
introduced to the model (courtesy of Whataburger), but in the
end a compromise was reached and facets made it into 13250:2000.

During the development of XTM, however, facets were finally
thrown out as being redundant. Unfortunately the allegation of
redundancy was never (to my knowledge) fully explained anywhere.
Here's my take on why facets are redundant and how they should
be interpreted when processing HyTM documents.

Facets are essentially a poor man's RDF with hooks into the
topic map model. They are like RDF in that they allow the
assignment of property-value pairs to information resources; the
"hooks" are the ability to represent those properties and values
as topics.

A facet element contains a set of fvalue elements. Each fvalue
element can be regarded as a set of triples, each of which
corresponds exactly to a simple RDF statement. An fvalue element
gives rise to one triple for each locator it contains. In RDF
terms, each triple is composed as follows:

   subject:    locator
   predicate:  facet (type)
   object:     facet value (type)

Here is a simple facet element, taken from the Italian Opera
topic map:

   <facet type="language">
     <fvalue type="norwegian">puccini.htm</fvalue>
   </facet>

This states that the information resource puccini.htm has a
"language" property whose value is "norwegian". In this case (because of
the use of 'type' attributes) we know that both "language" and
"norwegian" are (references to) topics.

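As a rough illustration of the triple reading above, here is a
minimal Python sketch. It assumes the facet element happens to be
well-formed XML (real HyTM is SGML and may not be) and that an
fvalue's content is a whitespace-separated list of locators. The
function name facet_triples is made up for this example.

```python
import xml.etree.ElementTree as ET


def facet_triples(facet_xml):
    """Return (locator, facet, value) triples for one facet element."""
    facet = ET.fromstring(facet_xml)
    triples = []
    for fvalue in facet.iter("fvalue"):
        # Assumption: locators are whitespace-separated tokens
        # in the content of the fvalue element.
        for locator in (fvalue.text or "").split():
            triples.append(
                (locator, facet.get("type"), fvalue.get("type"))
            )
    return triples


example = """
<facet type="language">
  <fvalue type="norwegian">puccini.htm</fvalue>
</facet>
"""

print(facet_triples(example))
```

An fvalue holding several locators would yield one triple per
locator, all sharing the same facet and facet value.
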
When 13250 was standardized in 1999/2000 the notion of resources
as subjects, i.e. addressable subjects, was not clear to us. It
should have been, of course, given the definition of "subject",
but it wasn't. It became clear and was spelled out in XTM 1.0:
Resources can also be subjects (since a subject can be any thing
at all) and therefore topics can represent resources.

Once this was realized it became clear that any assertions that
needed to be made about resources (such as their language) could
be done using topics, associations, and occurrences. Facets were
simply not needed. Everything then became much cleaner, much
simpler, and much more concise. We also get closer to the claim
that "the only way to say anything about any thing in topic maps
is by making it a topic".

What then should be the interpretation of a facet element in
terms of the SAM (when deserializing HyTM)?

I believe that each (locator,facet,fvalue) triple should, in
principle, give rise to an association whose type reflects the
facet and where the role playing topics reflect the locator and
the facet value.
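
To make the proposed mapping concrete, here is a minimal Python
sketch. The Role and Association classes are simplified stand-ins,
not the SAM's actual item definitions, and the role type names
"resource" and "value" are hypothetical, since nothing above says
what the role types should be.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Role:
    type: str    # role type (hypothetical names, see below)
    player: str  # reference to the role-playing topic


@dataclass(frozen=True)
class Association:
    type: str    # reference to the topic typing the association
    roles: tuple


def triple_to_association(locator, facet, value):
    """Map one (locator, facet, fvalue) triple to an association."""
    return Association(
        type=facet,
        roles=(
            # Stands for a topic whose subject is the resource.
            Role("resource", locator),
            # Stands for the topic representing the facet value.
            Role("value", value),
        ),
    )


assoc = triple_to_association("puccini.htm", "language", "norwegian")
print(assoc)
```

Here the facet type is used directly as the association type,
which is the simplest reading of the proposal.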

I say "in principle" because there are two unresolved issues in
my mind:

(1) If the nature of the property and value is not specified
via 'type' attributes, but rather via 'linktype' and 'facetval'
attributes respectively (or even by GIs), then it could be
argued that they should not be represented by topics, but simply
by strings or tokens. This would work for a facet value, which
could become the [value] of an occurrence item, but it would not
work for facets themselves because association types and
occurrence types must be topics. The resolution of this issue
depends on the resolution of the more general issue of how to
interpret HyTM mnemonics (i.e., HyTM-topic-linktypes etc.).

(2) It is not certain that the semantics of the topic that
represents the facet type is such that it can be used directly
as an association type. In my example, above, would "language"
be the most appropriate subject for typing the association
between a resource and a language? Surely "has-language" would
be more appropriate, with "language" being used to type topics
that represent the corresponding facet values? What this means
is that while it is clear that facet triples should become
associations, it is not necessarily clear that the facet types
should simply be used to type those associations.

Comments?

Steve