[sc34wg3] The interpretation of facets

Sat, 26 Apr 2003 21:39:57 +0200

Martin,

Thank you for your response to my posting. Unfortunately you
misunderstood a central part of my proposal, so your response
was a little off-target. The misunderstanding was partly my
fault for not being as clear and explicit as I could have been,
and for that I apologize.

I have now spent a lot of time formulating this reply, in order
to avoid further misunderstandings. I hope it can contribute to
making the HyTM discussions in London more productive.

To everyone else: This posting is *long*. If you feel that life
is too short for understanding facets as well as everything else
in 13250, please skip it. I will understand. However, I would
appreciate it if those who were originally involved in designing
13250 - especially the other two editors - would take this issue
seriously, because I feel we owe it to the topic map community
to clear up the confusion we have created.

I expressed my basic premise as follows:

Steve P:
 > I believe that each (locator,facet,fvalue) triple should, in
 > principle, give rise to an association whose type reflects the
 > facet and where the role playing topics reflect the locator and
 > the facet value.

To which you replied:

Martin B:
 > The naming of facets is much more complicated than the
 > simplistic model proposed by Steve implies.

I don't dispute this, but...

 > The facet name may be defined in another topic, which has
 > multiple names...

Do you see this as being a problem? I don't think it is.

 > If it is not specified using a reference to a topic the name
 > can be defined either as the value of the linktype attribute
 > or as the name of the generic identifier of the element. For
 > these to be used as part of an association they have to be
 > turned into addressable topics first.

This is what I call the "mnemonics" issue, which I mentioned in
my original posting as one of two unresolved issues:

Steve P:
 > (1) If the nature of the property and value are not specified
 > via 'type' attributes, but rather via 'linktype' and
 > 'facetval' attributes respectively (or even by GIs), then it
 > could be argued that they should not be represented by topics,
 > but simply by strings or tokens...

I didn't go into more detail on this because I think we need to
resolve the more general issue of mnemonics first, before
applying its resolution to facets. (I guess I should start
another thread on this issue.)

For the time being, all I want to do is establish the basic
premise that

(1) facets are about assigning property-value pairs to
information resources;

(2) the same functionality can in principle (if not in every
detail) be accomplished through binary associations;

(3) in such an association, one topic would represent the
information resource, one topic would represent the facet value,
and the association type would represent the property (in the
sense of the property class, not a property instance, i.e., the
facet itself).

Apparently you misunderstood this, since you wrote

Martin B:
 > It is the fvalue name, not the facet name, that needs to be
 > linked to the location(s) identified by the contents of the
 > fvalue element...

as if you believed I meant something different. Perhaps I was
not as clear as I could have been. I tried to explain this in
terms of RDF and kind of assumed a common understanding of how
to map an RDF statement to an association. Let me try the
example once more, to prove that we are in 100% agreement on
this matter at least!

Here is the original explanation and example:

 > A facet element contains a set of fvalue elements. Each fvalue
 > element can be regarded as a set of triples, each of which
 > corresponds exactly to a simple RDF statement. An fvalue element
 > gives rise to one triple for each locator it contains. In RDF
 > terms, each triple is composed as follows:
 >
 >   subject:    locator
 >   predicate:  facet (type)
 >   object:     facet value (type)
 >
 > Here is a simple facet element, taken from the Italian Opera
 > topic map:
 >
 >   <facet type="language">
 >     <fvalue type="norwegian">puccini.htm</fvalue>
 >   </facet>
 >
 >
 > This states that the information resource shown has a "language"
 > property whose value is "norwegian". In this case (because of
 > the use of 'type' attributes) we know that both "language" and
 > "norwegian" are (references to) topics.

(Note that once again I am deliberately simplifying and assuming
the use of 'type' attributes in order to avoid the extra
complication of mnemonics and GIs.)
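
As a rough illustration of the decomposition quoted above, the
following Python sketch extracts (locator, facet, fvalue) triples
from the simplified <facet> element. It is not part of 13250 or
N391. The function name facet_triples is hypothetical, and the
sketch assumes that an <fvalue> holding several locators separates
them with whitespace.

```python
import xml.etree.ElementTree as ET

# The simplified facet element from the Italian Opera example.
FACET = """
<facet type="language">
  <fvalue type="norwegian">puccini.htm</fvalue>
</facet>
"""

def facet_triples(facet_xml):
    """Return (locator, facet, fvalue) triples for one facet element."""
    facet = ET.fromstring(facet_xml)
    triples = []
    for fvalue in facet.findall("fvalue"):
        # Assumption: locators are whitespace-separated in the content.
        for locator in (fvalue.text or "").split():
            triples.append((locator, facet.get("type"), fvalue.get("type")))
    return triples

print(facet_triples(FACET))
# [('puccini.htm', 'language', 'norwegian')]
```

Each tuple reads as subject, predicate, object, in the same order
as the RDF breakdown in the quoted text.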

Let's now complete the example by turning the facet shown above
into XTM. We need three topics: one for the property
("language"), one for the property value ("norwegian"), and one
for the resource. And we need one association, expressing the
same relationship that is expressed by the facet element.

(Note that on the basis of the facet element shown above we know
*nothing* about any of these topics except the subject address
of the resource and the nature of the relationship that the
facet is expressing. Note also that the IDs of two of these
topics are a given, whereas the ID of the third one has to be
generated.)

Here is exactly the same information as in the <facet> element
above, shown in its most verbose XTM form (a less verbose form
is possible, but slightly less self-explanatory):

   -- Example 1 --

   <topic id="language"/>

   <topic id="norwegian"/>

   <topic id="puccini-resource">
     <subjectIdentity>
       <resourceRef xlink:href="puccini.htm"/>
     </subjectIdentity>
   </topic>

   <association>
     <instanceOf><topicRef xlink:href="#language"/></instanceOf>
     <member>
       <topicRef xlink:href="#puccini-resource"/>
     </member>
     <member>
       <topicRef xlink:href="#norwegian"/>
     </member>
   </association>

The association is between the resource whose locator is the
content of the <fvalue> element ("puccini.htm") and the topic
that represents the facet value ("norwegian").
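
The mapping in Example 1 is mechanical enough to sketch in code.
The Python below is only an illustration under stated assumptions:
the function names are hypothetical, and the scheme used to
generate the ID of the resource topic (file name stem plus
"-resource") is simply one possible choice, since nothing in the
facet element dictates it.

```python
from xml.sax.saxutils import quoteattr

def resource_topic_id(locator):
    """Generate an ID for the resource topic (hypothetical scheme)."""
    filename = locator.rsplit("/", 1)[-1]
    stem = filename.rsplit(".", 1)[0]
    return stem + "-resource"

def triple_to_xtm(locator, facet, fvalue):
    """Render one (locator, facet, fvalue) triple as XTM markup."""
    rid = resource_topic_id(locator)
    lines = [
        f"<topic id={quoteattr(facet)}/>",
        f"<topic id={quoteattr(fvalue)}/>",
        f"<topic id={quoteattr(rid)}>",
        "  <subjectIdentity>",
        f"    <resourceRef xlink:href={quoteattr(locator)}/>",
        "  </subjectIdentity>",
        "</topic>",
        "<association>",
        # The facet becomes the association type.
        f"  <instanceOf><topicRef xlink:href={quoteattr('#' + facet)}/></instanceOf>",
        # The resource and the facet value become the role players.
        f"  <member><topicRef xlink:href={quoteattr('#' + rid)}/></member>",
        f"  <member><topicRef xlink:href={quoteattr('#' + fvalue)}/></member>",
        "</association>",
    ]
    return "\n".join(lines)

print(triple_to_xtm("puccini.htm", "language", "norwegian"))
```

The two topic IDs taken from the 'type' attributes are passed
through unchanged; only the third ID is generated.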

So far I think we are in agreement.[1] If you can forget the
issue of mnemonics for the time being (only for the time being,
I promise!), do you agree that this is the only correct way,
within the terms of the basic Topic Maps model,[2] to represent
the information conveyed by my <facet> element?

I hope so, because then we can move on to address the other
issues, as follows:

Martin B:
 > Once again the fvalue name can be identified
 > in three ways: by pointing to a topic with multiple names, by
 > entering a specific value for the facetval attribute or by
 > using the name of the generic element of the conforming
 > element as the default value name. Again the last two need to
 > be defined as topics before they can form part of an
 > association.

Again, this is the issue of mnemonics, which I will discuss in a
separate thread.

 > As for associating a SAM locator with the facet value (which
 > is what Steve seems to be proposing in his triple), this is
 > possible only if the thing being located is part of a topic
 > map.

As I hope I have now made clear, I am not proposing to associate
a SAM locator with the facet value. The facet value becomes a
topic. (An RDF "subject-predicate-object" triple maps to an
association of the form "predicate(subject, object)".[3])

 > There is no requirement that the location defined in a
 > facet value be within the topic map. A topic map can assign
 > properties to a resource without that resource being an
 > element in a topic map. Like RDF, it is a simple mechanism for
 > association property/value pairs to resources. A topic map can
 > validly contain just facet definitions. In other words it can
 > simply be used, like any RDF resource, to assign metadata to
 > properties. So, for example, I could use facets to distinguish
 > the set of documents at a particular website that were in
 > English from those that were in French. (Remember that topic
 > maps preceded RDF: at the time we did this topic maps were the
 > only way of doing this.)

I agree with all of this. You went to the trouble of writing it
because you misunderstood what I was proposing. I'm sorry I
wasn't clearer in my original posting.

 > So the question comes: should facet-property-name/
 > facet-property-value pairs associated with locations within
 > topic maps be treated differently from those associated with
 > locations that do not refer to topic maps?

May I rephrase this to be sure that I understand you correctly?

If you are asking, does it make any difference what kind of
resource the property-value pair is assigned to (one internal to
the topic map - say, an occurrence element - or something
external to the topic map), then I think the answer should be
'no'.

 > At the present
 > moment N391 treats the location as a string, and then makes
 > this the value of a SAM Occurrence item. The reason for doing
 > this is simply to allow for matching of locations identified
 > by facets with those used by topics, which have the same
 > structure. Basically what facets do is identify occurrences of
 > resources to which a specific property value should be
 > applied, in the same way that occurrence elements identify
 > resources that cover a particular topic within a specific
 > role.

This seems very confused to me, partly because I'm not quite
sure how you are using the terminology. You keep talking about
"location" and I'm never quite sure if you mean "locator" or
"resource". The first mention in this paragraph ("treats the
location as a string") seems to have the sense of "locator". If
that is the case, why treat it as a string and not as a locator?
At the very least, if you insist on deserializing to an
occurrence item, you should use the [resource] property, not the
[value] property.

However, I think deserializing to an occurrence item in this
manner is inappropriate. It certainly does not accord with the
basic approach I have outlined above, of using associations, and
it does not even accord with a potential variant of that
approach in which occurrences are used as a special form of
association.

<digression subject="using occurrences instead of associations">
----------------------------------------------------------------

Before going any further, let me try out this variant approach
of using occurrences instead of associations.

We have agreed that we want to express a relationship between
"puccini.htm" (the resource) and the language "norwegian", and
that the nature of the relationship is "language", right?

In my first XTM example, the resource "puccini.htm" and the
topic "norwegian" are two role players in an association of type
"language". Let's try expressing this association as an
occurrence instead.

For this to work, one of the role players has to be a topic
that has an occurrence whose type corresponds to the association
type; the resource that the occurrence attaches to that topic
then corresponds to the other role-playing topic.

Since the facet essentially assigns a characteristic (i.e., a
property/value pair) to the resource "puccini.htm", one's first
thought might be to let the topic represent the resource
"puccini.htm", as follows:

   -- Example 2 --

   <topic id="puccini-resource">
     <subjectIdentity>
       <resourceRef xlink:href="puccini.htm"/>
     </subjectIdentity>
     <occurrence>
       <instanceOf>
         <topicRef xlink:href="#language"/>
       </instanceOf>
       <!-- so far, so good, but what goes here? -->
     </occurrence>
   </topic>

As can be seen, this leads to a problem: How do we get the value
"norwegian" into the occurrence? The answer is that we can't,
because an occurrence relates a topic to a resource, but
"norwegian" is not a resource; it is a non-addressable subject.

What are the alternatives for the place where I have inserted
the comment in the syntax example? Here are the ones I can think
of:

1) <resourceData>norwegian</resourceData>

    This is unacceptable because "norwegian" is not just a
    string, it's a topic. (Remember, it came from a 'type'
    attribute on the <fvalue> element.)

2) <resourceRef xlink:href="#norwegian"/>

    Also unacceptable, because it creates a relationship with the
    *topic element* (viewed as a resource) whose ID is
    "norwegian", rather than the topic itself. To create a
    relationship with the topic, we would need to use either
    <topicRef> or <subjectIndicatorRef>, but you can't do that
    with <occurrence> elements.

So this approach to using occurrences doesn't work.

The alternative is to say that since one of the things we want
to relate together *is* a resource (namely, "puccini.htm"),
let's use that as the occurrence and have "norwegian" be the
topic:

   -- Example 3 --

   <topic id="norwegian">
     <occurrence>
       <instanceOf>
         <topicRef xlink:href="#language"/>
       </instanceOf>
       <resourceRef xlink:href="puccini.htm"/>
     </occurrence>
   </topic>

This works, after a fashion (that is, syntactically). Whether it
makes a lot of sense semantically is another matter. Do we
really want to state that the resource "puccini.htm" is an
occurrence of the topic "norwegian"? I have serious doubts.
Occurrences are typically "information that is pertinent to a
subject": Do we really want to regard every resource written in
a particular language as being "pertinent" to that language?

I don't think so. In which case, this approach to using
occurrences to represent facets doesn't make sense either.

</digression>

Having looked at how occurrences might be used instead of
associations to represent facet triples, let us try to figure
out how the approach in N391 would look as XTM.

In 3.12 it states: "Each HyTM facet conformant element requires
the creation of a new SAM Topic item that can be associated with
each of the values assigned to the facet by fvalue conformant
elements."

What does this mean in the context of my example?

You want to create a topic for "language" (i.e., the facet).
Fair enough. That's what I did in my first XTM example above:

   <topic id="language"/>

Now you want to "associate" this with... what? "Each of the
values assigned to the facet." I can only take this to mean the
value "norwegian". Is that what you mean? That you want to
create an association between "language" and "norwegian"? So it
seems. And the type of that association is type-instance.

This does in fact make sense, up to a point. You are assuming
that whenever you have property-value pairs represented using
facets, there exists a type-instance relationship between the
property and the value. That certainly is the case in my
example: "norwegian" could be said to be an instance of
"language", if one assumes the normal semantics associated with
those labels. (Whether this is always the case, I don't yet
know.)

What we have now is the following:

   -- Example 4 --

   <topic id="language"/>

   <topic id="norwegian">
     <instanceOf><topicRef xlink:href="#language"/></instanceOf>
   </topic>

What now with the <fvalue> elements?

In 3.13 it states: "Each HyTM fvalue conformant element requires
the creation of a SAM Topic item to represent the value being
assigned to its parent facet. The locations to which the value
is assigned become Occurrence items for that that topic."

I'm not happy with the phrase "the value being assigned to its
parent facet". Assignment in 13250 is used in the sense "topic
characteristic assignment". Is the value ("norwegian") being
assigned as a characteristic of "language"? Not really. What you
are really doing is creating an association between the two, and
that is subtly different. But never mind. The wording can be
fixed.

The "meat" of 3.13 is that an internal occurrence should be
created for the topic "norwegian", if I understand correctly,
and the locator that is the content of the <fvalue> element
("puccini.htm") is turned into a string which is the value of
that occurrence, as follows:

   -- Example 5 --

   <topic id="norwegian">
     <occurrence>
       <resourceData>puccini.htm</resourceData>
     </occurrence>
   </topic>

Putting Examples 4 and 5 together, this becomes the following:

   -- Example 6 --

   <topic id="language"/>

   <topic id="norwegian">
     <instanceOf><topicRef xlink:href="#language"/></instanceOf>
     <occurrence>
       <resourceData>puccini.htm</resourceData>
     </occurrence>
   </topic>

...which is similar to my suggested possible use of occurrences
in Example 3, above, with the following differences:

(1) The "language" aspect (the facet property) is the class of
the facet value, rather than the type of the occurrence.

(2) The locator "puccini.htm" is now just a string.

I certainly think (2) is a problem, and to my mind, (1) is less
appropriate than the solution I have proposed.
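
One possible way to see why difference (2) matters, offered as an
illustration rather than as anything stated in N391 or the SAM: a
relative locator is normally resolved against a base address,
whereas a plain string is not. The Python sketch below uses the
standard urllib.parse.urljoin; the two base addresses and the
function names are made up for the example.

```python
from urllib.parse import urljoin

# Hypothetical addresses of two topic map documents (assumptions).
BASE_A = "http://example.org/opera/map.xtm"
BASE_B = "http://example.org/mirror/map.xtm"

def as_locator(reference, base):
    """Treat the fvalue content as a locator: resolve it against the base."""
    return urljoin(base, reference)

def as_string(reference, base):
    """Treat the fvalue content as an opaque string: the base is ignored."""
    return reference

# As locators, the two references identify different resources.
print(as_locator("puccini.htm", BASE_A))
print(as_locator("puccini.htm", BASE_B))

# As strings, they are indistinguishable.
print(as_string("puccini.htm", BASE_A) == as_string("puccini.htm", BASE_B))
```

Under these assumptions, the string form cannot tell the two
resources apart, while the locator form can.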

We come now to the last two paragraphs of 3.13. Let's see if I
can understand them:

"If the type attribute contains a valid reference to a topic
within the current topic map the rules defined in Clause 5.1 of
[SAM] are applied to create an association between the facet
value/pair and the topic identified within the attribute value."

The topic identified within the attribute value is "norwegian",
right? So you want to create an association between the topic
"norwegian" and the "facet value/pair". What is that? I will
assume what you actually mean is the topic that represents the
facet (i.e., "language"), since you make it clear that the
association to be created is of type "type-instance". So this
paragraph is basically telling us to do the same thing that 3.12
has already described and that is shown in Example 4.

The final paragraph of 3.13 starts as follows:

"For each fvalue conformant element within a specific facet
conformant element a SAM Association element is created to link
the facet property naming Topic item to the facet value Topic
item."

At first glance, this appears to mean that you want to
create *another* association between "language" and "norwegian".
Can that be true? It should be of type "facet-value" and the
role types should be "facet" and "value" respectively. [I'm
using tokens here as shorthand for the PSIs N391 requires.]

In XTM this would be as follows:

   -- Example 7 --

   <association>
     <instanceOf>
       <subjectIndicatorRef
         xlink:href="http://psi.topicmaps.org/hytm/1.0/#facet-value"/>
     </instanceOf>
     <member>
       <roleSpec>
         <subjectIndicatorRef
           xlink:href="http://psi.topicmaps.org/hytm/1.0/#facet"/>
       </roleSpec>
       <topicRef xlink:href="#language"/>
     </member>
     <member>
       <roleSpec>
         <subjectIndicatorRef
           xlink:href="http://psi.topicmaps.org/hytm/1.0/#value"/>
       </roleSpec>
       <topicRef xlink:href="#norwegian"/>
     </member>
   </association>

Now we have an association between the "facet property naming
Topic" and the "facet value Topic", which is what the first two
sentences of the last paragraph of 3.13 seem to require. The
association type and association role types are as described
("facet-value", "facet", and "value").

However, a closer reading of the rest of this paragraph reveals
that the role playing topics should not in fact be "language"
and "norwegian", but rather topics that represent the respective
mnemonics ... which my <facet> element example does not even
have.

As I see it, there are several problems here:

(1) N391 requires me to create an association between role
players that do not exist (since my example does not use
mnemonics).

(2) There is an inconsistency between the first sentence of this
paragraph on the one hand, and the two sentences beginning "The
[role player] property...", regarding what the role players
should be.

(3) Given that any mnemonics (if they existed) would essentially
be providing an alternative to the type attributes "language"
and "norwegian", it seems weird to have two almost parallel
associations (the one shown in Example 7, and the one implied by
the <instanceOf> element in Example 6).

The last paragraph of 3.13 obviously needs to be clarified.
Since its main purpose seems to be to handle mnemonics, I won't
bother too much about it here. (See my promised mnemonics
posting for that.) Let me conclude by comparing the N391
approach with the one I have advocated:

   -- My proposal --

   <topic id="language"/>

   <topic id="norwegian"/>

   <topic id="puccini-resource">
     <subjectIdentity>
       <resourceRef xlink:href="puccini.htm"/>
     </subjectIdentity>
   </topic>

   <association>
     <instanceOf><topicRef xlink:href="#language"/></instanceOf>
     <member>
       <topicRef xlink:href="#puccini-resource"/>
     </member>
     <member>
       <topicRef xlink:href="#norwegian"/>
     </member>
   </association>

   -- According to N391 --

   <topic id="language"/>

   <topic id="norwegian">
     <instanceOf><topicRef xlink:href="#language"/></instanceOf>
     <occurrence>
       <resourceData>puccini.htm</resourceData>
     </occurrence>
   </topic>

Although my proposal is slightly more verbose, I believe it more
truly represents the essence of what the <facet> element in my
example is really about. It could be made slightly less verbose,
syntactically, without any loss of information, as follows:

   <topic id="language"/>

   <topic id="norwegian"/>

   <association>
     <instanceOf><topicRef xlink:href="#language"/></instanceOf>
     <member>
       <resourceRef xlink:href="puccini.htm"/>
     </member>
     <member>
       <topicRef xlink:href="#norwegian"/>
     </member>
   </association>

Martin B:

 > I do not see any other way in which we can create associations
 > with items that are not otherwise part of the topic map using
 > the existing set of components in the SAM. I feel it should
 > not be necessary to abuse the SAM Occurrence item in this way,
 > but given the SAM team's refusal to add a proper Facet item to
 > the model the only choice I had was to force the creation of
 > topics for facet-property-name/facet-property-value pairs and
 > to associate these with SAM Occurrence items.

I don't know what you mean by "topics for
facet-property-name/facet-property-value pairs". Perhaps there
is something I have missed in my interpretation of N391, or do
you just mean that you forced the creation of a topic to
represent the facet ("language") and another to represent the
facet value ("norwegian")? If so, I wouldn't regard this as
being forced to do anything at all: Those were already topics
(because they were the values of 'type' attributes).

The example I have given provides a perfectly acceptable way of
representing the assignment of a property/value pair to any kind
of resource, whether external or internal to the topic map. I
don't see what the problem is, unless it is the following:

 > I have always
 > objected against needing to create topics specifically to
 > record the names of facets and the values assigned to them.

You must agree that in the case of my example, you have no
choice. I have used 'type' attributes, therefore I am, by
definition and without any shadow of a doubt, on the basis of
ISO 13250:2000, referring to topics. The only situation where
your objection makes sense is when handling mnemonics. Do we
agree on that point? (If so, let the discussion rest until I
have had time to write that other posting.)

Martin B. concludes:
 > The proposals in N391 are anethema to me, but they are the
 > only way I can see of applying the current set of SAM
 > information items to the recording of facets that allows all
 > possible uses of facets to be covered.

I agree that the way N391 proposes to use occurrences is not
good. At the very least, locators should remain locators. I do
think that my association-based approach is preferable...
assuming that it can be extended to handle mnemonics (about
which, more elsewhere).

What do you think, Martin?

And what do the other editors think about this?

Steve

[1] Transcribing the facet element into XTM like this does lay
bare another issue, however: What (if anything) are the role
types in this association?

[2] By the "Topic Maps model", I mean the basic model of using
topic characteristics to make assertions about the subjects that
topics represent.

[3] "predicate(subject, object)" is pseudo-LTM. The predicate is
the association type, and the subject and object are the role
playing topics.

P.S. If you are still with us, Michel, the length of this
posting (which I believe was necessary) should indicate why I
think 1.5 hours for the discussion on HyTM is far too little.