[sc34wg3] The interpretation of facets

Sun, 27 Apr 2003 13:55:55 +0100

Now to tackle Steve's long response re facets (you should read my response
on mnemonics first as this missive relies on some of the points made in
that.)

Again, I'll start by commenting on something that is at the end of the
missive first, as that is most germane to the discussion.

Steve wrote:

>    -- According to N391 --
>
>    <topic id="language"/>
>
>    <topic id="norwegian">
>      <instanceOf><topicRef xlink:href="#language"/></instanceOf>
>      <occurrence>
>        <resourceData>puccini.htm</resourceData>
>      </occurrence>
>    </topic>

and

>  -- My proposal --
>   <topic id="language"/>
>   <topic id="norwegian"/>
>   <topic id="puccini-resource">
>     <subjectIdentity>
>       <resourceRef xlink:href="puccini.htm"/>
>     </subjectIdentity>
>   </topic>

>   <association>
>     <instanceOf><topicRef xlink:href="#language"/></instanceOf>
>     <member>
>       <topicRef xlink:href="#puccini-resource"/>
>     </member>
>     <member>
>       <topicRef xlink:href="#norwegian"/>
>     </member>
>   </association>

Steve is proposing that we use associations, and I have proposed that
locations within facet values be treated as occurrences.

My decision is based on a comment that occurs in a note
associated with the type attribute of the fvalue element. This reads:
"A facet value type topic might exist by virtue of the fact that this fvalue
element is an occurrence (where the occurrence role means 'instance') of a
topic whose subject is the significance of the facet value name"

Taking this as gospel I proposed the creation of "a topic whose subject is
the significance of the facet value name" where the locations were stored as
occurrences of the value in specific resources. To me this is what facets
have always been: a way of adding property name/value pairs to occurrences
of resources. That seems a natural interpretation of the standard.

Now let me get on to Steve's other points and again I will only comment on
those parts I disagree with, and will cut out the stuff that I agree with,
or which is irrelevant to the argument to help reduce the size of this
response.

> Thank you for your response to my posting. Unfortunately you
> misunderstood a central part of my proposal, so your response
> was a little off-target. The misunderstanding was partly my
> fault for not being as clear and explicit as I could have been,
> and for that I apologize.

There is no need to apologize. I realised you were mixing locator, location
and location address, and that you were failing to consider the
relationships between facet and facet value properly, but decided I didn't
have time to handle that part of the issue yesterday. Today I realize I will
have to tackle these problems head on.

> I expressed my basic premise as follows:
>
> Steve P:
>  > I believe that each (locator,facet,fvalue) triple should, in
>  > principle, give rise to an association whose type reflects the
>  > facet and where the role playing topics reflect the locator and
>  > the facet value.

Before going any further, let me give my take on the problem of your use of
the term locator in the above. Let me start by reminding you that HyTM is
based on HyTime, which has a "location addressing module". In this we
clearly distinguished between the "address" used to identify a location, and
the "location" itself. For example, www.is-thought.co.uk and
www.sgml.u-net.com for part of last year both pointed to the same location.
Today they point to different locations. What HyTime is based on is the
relationships between addresses, not on the relationship between locations.
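
A minimal sketch of the address/location distinction, in Python. Only the two
host names come from the paragraph above; the resolution tables and the
location labels ("site-A", "site-B") are invented for illustration.

```python
# Hypothetical resolution tables: address string -> location it resolves to.
# The location labels are invented for this sketch.
resolves_last_year = {
    "www.is-thought.co.uk": "site-A",
    "www.sgml.u-net.com": "site-A",
}
resolves_today = {
    "www.is-thought.co.uk": "site-A",
    "www.sgml.u-net.com": "site-B",
}


def same_location(table, addr1, addr2):
    """Two distinct addresses may or may not identify the same location."""
    return table[addr1] == table[addr2]


a, b = "www.is-thought.co.uk", "www.sgml.u-net.com"
was_same = same_location(resolves_last_year, a, b)  # different addresses, one location
is_same = same_location(resolves_today, a, b)       # same addresses, two locations
```

The address strings never change; what they locate does.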

SAM, on the other hand, introduces the concept of locator to ISO 13250 for
the first time. It is defined as "a string that references one or more
information resources". It is further stated that "Locators are always
expressed in some notation, which defines their formal syntax and
interpretation. The definition of locator notation is outside the scope of
this Technical Specification."

[Aside: Interesting to see that you still used Technical Specification here.
This is one of the places where the CD text needs to be brought in line with
the decision to publish a multipart standard.]

SAM also introduces "locator items" that "represent locators". These have two
properties, one of which records the notation being used and the other of
which records a string "whose interpretation and syntax is governed by the
value of the [notation] property".

Wherever locators are referenced by another item in SAM the definition given
is "a set of locator items". This implies a set of notation/reference pairs:
a notation or a reference that is repeated on its own is kept, but a locator
item whose notation and reference both match those of an item already in the
set is dropped (because it is a set). Personally I find this definition
difficult to swallow, but I have to presume Lars Marius et al. know what they
are doing.
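
To make the set behaviour concrete, here is a rough Python sketch. It assumes
a locator item can be modelled as a (notation, reference) pair; the class
name, the notation names "URI" and "HyTime", and "tosca.htm" are all invented
for the example.

```python
from typing import NamedTuple


class LocatorItem(NamedTuple):
    """Hypothetical model of a locator item as a notation/reference pair."""
    notation: str
    reference: str


items = {
    LocatorItem("URI", "puccini.htm"),
    LocatorItem("URI", "tosca.htm"),       # notation repeats, reference differs: kept
    LocatorItem("HyTime", "puccini.htm"),  # reference repeats, notation differs: kept
    LocatorItem("URI", "puccini.htm"),     # both values repeat: collapses into the first
}
count = len(items)  # 3
```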

[Aside. My own take on it is that what the other properties should hold is a
list of references to locator items, but I'm not going to be around to argue
the case for this on Saturday as I must convene the DSDL meeting then.]

It is vital that we are very specific in our use of terms. Facets refer to
addresses of locations, not to locators created by these addresses.

> To which you replied:
>
> Martin B:
>  > The naming of facets is much more complicated than the
>  > simplistic model proposed by Steve implies.
>
> I don't dispute this, but...
>
>  > The facet name may be defined in another topic, which has
>  > multiple names...
>
> Do you see this as being a problem? I don't think it is.

It is a serious problem. The problem is that the value of the facet is
either a token (string without spaces) or the unique identifier of the
referenced topic. Because I cannot guarantee the uniqueness of the token in
the way I can the uniqueness of the ID, if I have to create a topic to
represent a specific facet/value pair it is safer to do this if I have
facets with IDs than those with tokens as values. But if I am relying on
software to generate the ids of newly created topics then I have to ensure
that these generated IDs are both unique and suitable to identify the facet
aspect of the name/value pair.

> For the time being, all I want to do is establish the basic
> premise that
>
> (1) facets are about assigning property-value pairs to
> information resources;
>
> (2) the same functionality can in principle (if not in every
> detail) be accomplished through binary associations;

And by assigning occurrences to topics created to represent the facets, as
stated in 13250.

> (3) in such an association, one topic would represent the
> information resource, one topic would represent the facet value,
> and the association type would represent the property (in the
> sense property class, not property instance, i.e., the facet
> itself).
>
> Apparently you misunderstood this, since you wrote
>
> Martin B:
>  > It is the fvalue name, not the facet name, that needs to be
>  > linked to the location(s) identified by the contents of the
>  > fvalue element...

I did not misunderstand you so much as disagree with you. You want the
association type to be controlled by the facet type (whether this is
specified by a topic referenced in the type attribute or by a GI or facetval
mnemonic.) But this makes it hard to determine the set of values associated
with a specific facet name. I was pointing out that it was the value that
was associated with the location, not its property.

> Let's now complete the example by turning the facet shown above
> into XTM. We need three topics: one for the property
> ("language"), one for the property value ("norwegian"), and one
> for the resource. And we need one association, expressing the
> same relationship that is expressed by the facet element.

I disagree fundamentally regarding the property value being a separate
topic. The same value may apply to a number of facets. For example, the
value 100 may apply to a "quantity" property and to an "age" property. The
"meaning" of the value, or its occurrences, should not depend only on the
value. It is the name/value pair that determines the topic being discussed.
In your case it is language-norwegian that is the topic. In mine it is
quantity-100 and age-100.
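
A small Python sketch of the difference, under the assumption that topics are
simply keyed by an ID string. The location addresses are invented, and the ID
pattern is the one I use later in this message
(~facet-language-value-norwegian).

```python
from collections import defaultdict

# (facet name, value, location address); the addresses are invented.
assignments = [
    ("quantity", "100", "order.htm"),
    ("age", "100", "centenarian.htm"),
]


def pair_topic_id(facet, value):
    """Build an ID for the topic representing one facet name/value pair."""
    return f"~facet-{facet}-value-{value}"


by_value = defaultdict(list)  # one topic per value: the two facets are merged
by_pair = defaultdict(list)   # one topic per name/value pair: kept apart
for facet, value, address in assignments:
    by_value[value].append(address)
    by_pair[pair_topic_id(facet, value)].append(address)
```

Keyed by value alone, both addresses land on the single topic "100"; keyed by
pair, each address stays with its own facet.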

> Here is the exact same information contained in the <facet>
> element above shown in its most verbose XTM form (a less verbose
> form is possible, but slightly less self-explanatory):
>
>    -- Example 1 --
>
>    <topic id="language"/>
>
>    <topic id="norwegian"/>
>
>    <topic id="puccini-resource">
>      <subjectIdentity>
>        <resourceRef xlink:href="puccini.htm"/>
>      </subjectIdentity>
>    </topic>
>
>    <association>
>      <instanceOf><topicRef xlink:href="#language"/></instanceOf>
>      <member>
>        <topicRef xlink:href="#puccini-resource"/>
>      </member>
>      <member>
>        <topicRef xlink:href="#norwegian"/>
>      </member>
>    </association>
>
> The association is between the resource whose locator is the
> content of the <fvalue> element ("puccini.htm") and the topic
> that represents the facet value ("norwegian").
>
> So far I think we are in agreement.[1] If you can forget the
> issue of mnemonics for the time being (only for the time being,
> I promise!), do you agree that this is the only correct way,
> within the terms of the basic Topic Maps model,[2] to represent
> the information conveyed by my <facet> element?

MOST DECIDEDLY IT IS NOT "the only correct way".
As mentioned above, an equally correct way, if not a more correct way, to
interpret the existing text of 13250 is to use occurrences of a topic that
is named language-norwegian.

>  > At the present
>  > moment N391 treats the location as a string, and then makes
>  > this the value of a SAM Occurrence item. The reason for doing
>  > this is simply to allow for matching of locations identified
>  > by facets with those used by topics, which have the same
>  > structure. Basically what facets do is identify occurrences of
>  > resources to which a specific property value should be
>  > applied, in the same way that occurrence elements identify
>  > resources that cover a particular topic within a specific
>  > role.
>
> This seems very confused to me, partly because I'm not quite
> sure how you are using the terminology. You keep talking about
> "location" and I'm never quite sure if you mean "locator" or
> "resource".

Hopefully this has now been clarified above. I was using "resource" in the
formal W3C role that applies to any XML document.

> The first mention in this paragraph ("treats the
> location as a string") seems to have the sense of "locator".

No, it was a reference to the fact that HyTime location address is defined
as a string, and the fact that, because SAM locators are defined as strings
as well, it seems natural that the location address should be recorded
during deserialization as a locator.

> If
> that is the case, why treat it as a string and not as a locator?
> At the very least, if you insist on deserializing to an
> occurrence item, you should use the [resource] property, not the
> [value] property.

I used the [value] property for the occurrence item because its definition is
"The string, if present, is the information resource the occurrence connects
with the subject" whereas [resource] is defined as "The locator, if set, is
a reference to the information resource the occurrence connects with the
subject." I was deliberately not requiring the creation of a locator, only
the recording of the location address string.

> However, I think deserializing to an occurrence item in this
> manner is inappropriate.

You're entitled to that opinion: I'm entitled to disagree :-)

> It certainly does not accord with the
> basic approach I have outlined above, of using associations, and
> it does not even accord with a potential variant of that
> approach in which occurrences are used as a special form of
> association.
>
> <digression subject="using occurrences instead of associations">
> ----------------------------------------------------------------
>
> Before going any further, let me try out this variant approach
> of using occurrences instead of associations.
>
> We have agreed that we want to express a relationship between
> "puccini.htm" (the resource) and the language "norwegian", and
> that the nature of the relationship is "language", right?

No. You want to express an association between the property
'language="norwegian"' and the location address "puccini.htm". That is what
facets do: associate property name/value pairs with location addresses.

> In my first XTM example, the resource "puccini.htm" and the
> topic "norwegian" are two role players in an association of type
> "language". Let's try expressing this association as an
> occurrence instead.
>
> For this to work, one of the role players has to be a topic
> which has an occurrence (whose type corresponds to the
> association type); the resource that the occurrence relates to
> that topic corresponds to the other role-playing topic.
>
> Since the facet essentially assigns a characteristic (i.e., a
> property/value pair) to the resource "puccini.htm", one's first
> thought might be to let the topic represent the resource
> "puccini.htm", as follows:
>
>    -- Example 2 --
>
>    <topic id="puccini-resource">
>      <subjectIdentity>
>        <resourceRef xlink:href="puccini.htm"/>
>      </subjectIdentity>
>      <occurrence>
>        <instanceOf>
>          <topicRef xlink:href="#language"/>
>        </instanceOf>
>        <!-- so far, so good, but what goes here? -->
>      </occurrence>
>    </topic>
>
> As can be seen, this leads to a problem: How do we get the value
> "norwegian" into the occurrence? The answer is that we can't,
> because an occurrence relates a topic to a resource, but
> "norwegian" is not a resource, it is non-addressable subject.

Again, because you're forcing an unnatural separation of the pair.

> What are the alternatives for the place where I have inserted
> the comment in the syntax example? Here are the ones I can think
> of:
>
> 1) <resourceData>norwegian</resourceData>
>
>     This is unacceptable because "norwegian" is not just a
>     string, it's a topic. (Remember, it came from a 'type'
>     attribute on the <fvalue> element.)
>
> 2) <resourceRef xlink:href="#norwegian"/>
>
>     Also unacceptable, because it creates a relationship with the
>     *topic element* (viewed as a resource) whose ID is
>     "norwegian", rather than the topic itself. To create a
>     relationship with the topic, we would need to use either
>     <topicRef> or <subjectIndicatorRef>, but you can't do that
>     with <occurrence> elements.
>
> So this approach to using occurrences doesn't work.

Only because you insist on separating the value from the facet, Steve. Come
on, time for a rethink methinks.

> The alternative is to say that since one of the things we want
> to relate together *is* a resource (namely, "puccini.htm"),
> let's use that as the occurrence and have "norwegian" be the
> topic:
>
>    -- Example 3 --
>
>    <topic id="norwegian">
>      <occurrence>
>        <instanceOf>
>          <topicRef xlink:href="#language"/>
>        </instanceOf>
>        <resourceRef xlink:href="puccini.htm"/>
>      </occurrence>
>    </topic>
>
> This works, after a fashion (that is, syntactically). Whether it
> makes a lot of sense semantically is another matter. Do we
> really want to state that the resource "puccini.htm" is an
> occurrence of the topic "norwegian"? I have serious doubts.
> Occurrences are typically "information that is pertinent to a
> subject": Do we really want to regard every resource written in
> a particular language as being "pertinent" to that language?
>
> I don't think so. In which case, this approach to using
> occurrences to represent facets doesn't make sense either.
>
> </digression>
>
> Having looked at how occurrences might be used instead of
> associations to represent facet triples, let us try and figure
> out how the approach in N391 would look as XTM.

Please don't mislead people into thinking that N391 has anything whatsoever
to do with XTM. It simply states how to deserialize HyTM into SAM. The
problems of getting from there to XTM have deliberately been ignored for the
time being.

> In 3.12 it states: "Each HyTM facet conformant element requires
> the creation of a new SAM Topic item that can be associated with
> each of the values assigned to the facet by fvalue conformant
> elements."
>
> What does this mean in the context of my example?

It means that there is a topic for the facet name as well as for each pair
of name/values so that users can navigate easily from the topic language to
the value language="norwegian".

> You want to create a topic for "language" (i.e., the facet).
> Fair enough. That's what I did in my first XTM example above:
>
>    <topic id="language"/>
>
> Now you want to "associate" this with... what? "Each of the
> values assigned to the facet." I can only take this to mean the
> value "norwegian". Is that what you mean?

No, for the reasons given above.

> That you want to
> create an association between "language" and "norwegian"? So it
> seems. And the type of that association is type-instance.

Yes.

> This does in fact make sense, up to a point. You are assuming
> that whenever you have property-value pairs represented using
> facets, there exists a type-instance relationship between the
> property and the value. That certainly is the case in my
> example: "norwegian" could be said to be an instance of
> "language", if one assumes the normal semantics associated with
> those labels. (Whether this is always the case, I don't yet
> know.)
>
> What we have now is the following:
>
>    -- Example 4 --
>
>    <topic id="language"/>
>
>    <topic id="norwegian">
>      <instanceOf><topicRef xlink:href="#language"/></instanceOf>
>    </topic>
>
> What now with the <fvalue> elements?
>
> In 3.13 it states: "Each HyTM fvalue conformant element requires
> the creation of a SAM Topic item to represent the value being
> assigned to its parent facet. The locations to which the value
> is assigned become Occurrence items for that topic."
>
> I'm not happy with the phrase "the value being assigned to its
> parent facet". Assignment in 13250 is used in the sense "topic
> characteristic assignment". Is the value ("norwegian") being
> assigned as a characteristic of "language"? Not really. What you
> are really doing is creating an association between the two, and
> that is subtly different. But never mind. The wording can be
> fixed.
>
> The "meat" of 3.13 is that an internal occurrence should be
> created for the topic "norwegian", if I understand correctly,
> and the locator that is the content of the <fvalue> element
> ("puccini.htm") is turned into a string which is the value of
> that occurrence, as follows:
>
>    -- Example 5 --
>
>    <topic id="norwegian">
>      <occurrence>
>        <resourceData>puccini.htm</resourceData>
>      </occurrence>
>    </topic>

Not that I'm up on XTM, but I would have something along the lines

    <topic id="~facet-language-value-norwegian">
      <occurrence>
        <resourceData>puccini.htm</resourceData>
      </occurrence>
    </topic>

> Putting Examples 4 and 5 together, this becomes the following:
>
>    -- Example 6 --
>
>    <topic id="language"/>
>
>    <topic id="norwegian">
>      <instanceOf><topicRef xlink:href="#language"/></instanceOf>
>      <occurrence>
>        <resourceData>puccini.htm</resourceData>
>      </occurrence>
>    </topic>
>
> ...which is similar to my suggested possible use of occurrences
> in Example 3, above, with the following differences:
>
> (1) The "language" aspect (the facet property) is the class of
> the facet value, rather than the type of the occurrence.
>
> (2) The locator "puccini.htm" is now just a string.
>
> I certainly think (2) is a problem, and to my mind, (1) is less
> appropriate than the solution I have proposed.

I like Example 6 because I do not see the string as a SAM locator but as a
HyTime location address. The only problem I have is with the ID, which is
bound to clash with any other facet that has the value of norwegian.

> We come now to the last two paragraphs of 3.13. Let's see if I
> can understand them:
>
> "If the type attribute contains a valid reference to a topic
> within the current topic map the rules defined in Clause 5.1 of
> [SAM] are applied to create an association between the facet
> value/pair and the topic identified within the attribute value."
>
> The topic identified within the attribute value is "norwegian",
> right?

No. It is one whose ID is ~facet-language-value-norwegian, as clearly
specified in the second para of 3.13.

> So you want to create an association between the topic
> "norwegian" and the "facet value/pair". What is that? I will
> assume what you actually mean is the topic that represents the
> facet (i.e., "language"), since you make it clear that the
> association to be created is of type "type-instance". So this
> paragraph is basically telling us to do the same thing that 3.12
> has already described and that is shown in Example 4.

No. Again you misinterpret the text of N391.

> The final paragraph of 3.13 starts as follows:
>
> "For each fvalue conformant element within a specific facet
> conformant element a SAM Association element is created to link
> the facet property naming Topic item to the facet value Topic
> item."
>
> At first glance, this seems to mean that you want to
> create *another* association between "language" and "norwegian".
> Can that be true?

Most definitely. I want to map each facet/value pair back to the facet name.
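
For what it is worth, here is a rough Python sketch of how I read the
deserialization. It is an illustration under my own assumptions, not the
normative text of N391: the class and function names are invented, the plain
tokens "facet-value", "facet" and "value" stand in for the PSIs, and location
addresses are recorded as plain strings.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    id: str
    occurrences: list = field(default_factory=list)  # location address strings


@dataclass
class Association:
    type: str
    roles: dict  # role type -> id of the topic playing that role


def deserialize_facet(facet_name, fvalues):
    """fvalues maps each facet value name to its location addresses."""
    topics = {facet_name: Topic(facet_name)}  # topic for the facet name
    associations = []
    for value, addresses in fvalues.items():
        # One topic per name/value pair, holding the addresses as occurrences.
        pair_id = f"~facet-{facet_name}-value-{value}"
        topics[pair_id] = Topic(pair_id, list(addresses))
        # Map the pair back to the facet name.
        associations.append(
            Association("facet-value", {"facet": facet_name, "value": pair_id})
        )
    return topics, associations


topics, associations = deserialize_facet(
    "language", {"norwegian": ["puccini.htm"]}
)
```

This yields two topics, "language" and "~facet-language-value-norwegian", and
one association that lets a user navigate from the facet name to the pair.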

> It should be of type "facet-value" and the
> role types should be "facet" and "value" respectively. [I'm
> using tokens here as shorthand for the PSIs N391 requires.]

Agreed

> In XTM this would be as follows:
>
>    -- Example 7 --
>
>    <association>
>      <instanceOf>
>        <subjectIndicatorRef
>          xlink:href="http://psi.topicmaps.org/hytm/1.0/#facet-value"/>
>      </instanceOf>
>      <member>
>        <roleSpec>
>          <subjectIndicatorRef
>            xlink:href="http://psi.topicmaps.org/hytm/1.0/#facet"/>
>        </roleSpec>
>        <topicRef xlink:href="#language"/>
>      </member>
>      <member>
>        <roleSpec>
>          <subjectIndicatorRef
>            xlink:href="http://psi.topicmaps.org/hytm/1.0/#value"/>
>        </roleSpec>
>        <topicRef xlink:href="#norwegian"/>
>      </member>
>    </association>
>
> Now we have an association between the "facet property naming
> Topic" and the "facet value Topic", which is what the first two
> sentences of the last paragraph of 3.13 seem to require. The
> association type and association role types are as described
> ("facet-value", "facet", and "value").

Apart from the forcing of the SAM deserialization into XTM syntax and the
misnaming of the topicRef for the language/norwegian pair this seems about
right.

> However, a closer reading of the rest of this paragraph reveals
> that the role playing topics should not in fact be "language"
> and "norwegian", but rather topics that represent the respective
> mnemonics ... which my <facet> element example does not even
> have.

Exactly

> I don't know what you mean by "topics for
> facet-property-name/facet-property-value pairs". Perhaps there
> is something I have missed in my interpretation of N391, or do
> you just mean that you forced the creation of a topic to
> represent the facet ("language") and another to represent the
> facet value ("norwegian")? If so, I wouldn't regard this as
> being forced to do anything at all: Those were already topics
> (because they were the values of 'type' attributes).

Please please please reread the second para of 3.13 again (and again and
again)

>  > I have always
>  > objected to needing to create topics specifically to
>  > record the names of facets and the values assigned to them.
>
> You must agree that in the case of my example, you have no
> choice.

I agree that SAM and XTM give me no choice. I would have a choice if I had
access to a model that allowed the recording of facets properly.

> I have used 'type' attributes, therefore I am, by
> definition and without any shadow of a doubt, on the basis of
> ISO 13250:2000, referring to topics. The only situation where
> your objection makes sense is when handling mnemonics. Do we
> agree on that point? (If so, let the discussion rest until I
> have had time to write that other posting.)

Partially. Facets are yet another case where the use of the type concept to
force the definition to a topic does not make sense. In fact, in this case,
it makes the exact opposite of sense because the whole idea of facets is to
allow the specification of properties that are not appropriate for
definition as scopes or associations.

> Martin B. concludes:
>  > The proposals in N391 are anathema to me, but they are the
>  > only way I can see of applying the current set of SAM
>  > information items to the recording of facets that allows all
>  > possible uses of facets to be covered.
>
> I agree that the way N391 proposes to use occurrences is not
> good. At the very least, locators should remain locators. I do
> think that my association-based approach is preferable...
> assuming that it can be extended to handle mnemonics (about
> which, more elsewhere).
>
> What do you think, Martin?

As stated above, the original thinking of the editors of 13250, as clearly
stated in the note in the HyTM definition of fvalue, is that facets are
applied to occurrences.

> And what do the other editors think about this?

Good question.

Martin Bryan