Section 4.3 Subject Identity RE: [sc34wg3] Strawman draft of ISO 13250-1

Murray Altheim
Wed, 05 Nov 2003 12:33:12 +0000

Bernard Vatant wrote:
> Steve, and all
> I guess you won't be surprised by the section on which I have comments :)
> In fact I am struck by the reduction happening in this section, which
> begins very very well, opening the door to a very good and generic
> definition of subject identity (although I would prefer subject
> identification, as explained below), and then restricts it practically to
> the use of subject locator and subject indicator - actually the mechanisms
> used by the Standard Data Model - although the Reference Model, through the
> notion of SIDP, offers the possibility of an unbounded number of other
> identification mechanisms for TM applications. What I propose below is an
> alternative wording that does not close the door to those. This is a first
> cut coming out of my breakfast coffee. Please fix my prose where needed.
> N446 : "Subject identity is a set of properties of a topic that enable
> applications (and humans) to know which subject the topic represents and,
> in particular, to know when two topics represent the same subject and must
> therefore be merged."
> I'm uneasy with this absolute definition of subject identity given by the
> first "is", because I don't think there is something like an absolute
> subject identity, only an interagreement on subject identification process.
> And more uneasy with the use of "know" in this context: We don't and will
> never know what "know" means for humans, and AFAIK it does not mean
> anything for applications :))
> The last "therefore must be merged" is superfluous here IMO. This section
> should focus on how subject identity is established, not on how it is used.
> Applications can do what they want with this information.
> Alt:
> "The core requirement for semantic interoperability of topic map
> applications is interagreement on subject identification mechanisms,
> enabling both humans and applications to establish when and how different
> topics, either from the same topic map or different ones, should be
> interpreted as representing the same subject and processed accordingly.
> Such identification mechanisms use a specific set of topic properties,
> defined by a topic map data model, and constituting the "subject identity"
> for applications conformant to that model."
> Note the relative and open nature of this definition. There is no such
> thing here as absolute subject identity, since it relies on agreed-upon
> mechanisms for subject identification.

Just as a side note in agreement with Bernard, I'm trying to build into
my application a subject identity based on a Faceted Classification (FC)
approach, where subject identity is determined by an accumulation of facets,
not by a canonical identifier; the identifier is assigned afterwards to the
identity created via FC.
To a great degree, identity is contextual and interpretative, not canonical,
and while I understand the point of creating canonical points of subject
identity, the means by which Topic Maps get to that identity should (IMO)
be open.

> N446 : "In recognition of the distinction between addressable subjects
> (i.e., information resources) and subjects in general (i.e., information
> resources and non-addressable subjects), Topic Maps provides two mechanisms
> for specifying subject identity, both of which use locators."
> Are those mechanisms not specific to a TM data model, even if it is the
> standard one (TMDM)? If so, it should be specified here, replacing "Topic
> Maps" by "the standard Topic Map Data Model".
> And something should be added about the possibility of other identification
> mechanisms in "non-standard" data models, based on any kind of properties
> specific TM applications would agree upon.
> It is a frequent case that no single property can be used for
> identification, but that a set of well-chosen properties provides identity.
> And I think it's a crucial choice for the TM standard to decide whether
> identification mechanisms should be based only on single property values
> (like subject locators) or could use "ad hoc" identifying sets of
> properties. Choosing the latter opens interesting paths that would be
> forbidden by the former. Suppose for example TM applications dedicated to
> document management agree upon a set of identifying properties being e.g. a
> subset of Dublin Core like:
> {dc:creator, dc:title, dc:publisher, dc:date, dc:format}
> Absent unique identifiers like ISBN or other PSI, it makes sense to use
> such a property set as a basis for subject identification: two topics
> represent the same document if those five properties are equal. This
> specific identifying set of properties for this specific class of topics
> (documents) could be formally declared in an ontology. A declaration of
> commitment to this ontology for a given topic map (using any relevant
> language: TMCL, OWL, whatever) would therefore provide specific ways for
> applications that care to establish identity for this specific class of
> topics.
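
The property-set identification Bernard describes could be sketched roughly
as below. This is only an illustration under stated assumptions, not anything
defined by TMDM or the draft: topics are modelled as plain dictionaries, the
names identity_key and group_by_subject are hypothetical, and "equal" is taken
to mean exact value equality on all five properties.

```python
from collections import defaultdict

# The identifying property set from the example above. Treating it as a
# fixed tuple is an assumption; an ontology could declare it per topic class.
IDENTIFYING_PROPERTIES = (
    "dc:creator", "dc:title", "dc:publisher", "dc:date", "dc:format",
)


def identity_key(topic):
    """Return a hashable identity key, or None if a property is missing."""
    try:
        return tuple(topic[p] for p in IDENTIFYING_PROPERTIES)
    except KeyError:
        return None


def group_by_subject(topics):
    """Group topics whose five identifying properties are all equal.

    Returns (groups, unidentified): groups maps each identity key to the
    topics sharing it; unidentified lists topics lacking a full key.
    """
    groups = defaultdict(list)
    unidentified = []
    for topic in topics:
        key = identity_key(topic)
        if key is None:
            unidentified.append(topic)
        else:
            groups[key].append(topic)
    return dict(groups), unidentified
```

Each group holds topics that an application committed to this property set
would treat as representing the same document. What it then does with them
(merge, link, report) is left to the application.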

Curiously enough, Dublin Core and Faceted Classification both come out of
the library community, and the means Bernard describes is exactly how I'm
trying to implement FC in XTM. Whenever a set of facets is identical,
a Topic is created, and *then* a subject identifier is assigned. Not the
other way around.
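
A minimal sketch of that ordering (facets first, identifier second) might
look like the following. It is not the actual implementation: the
FacetRegistry class, the dictionary-based topic, and the
"urn:x-example:subject:" prefix are all hypothetical, and facet values are
assumed to be hashable.

```python
import itertools


class FacetRegistry:
    """Create one topic per distinct facet set, then assign its identifier."""

    def __init__(self, base="urn:x-example:subject:"):
        self._base = base                  # hypothetical identifier prefix
        self._topics = {}                  # frozenset of facets -> topic
        self._counter = itertools.count(1)

    def topic_for(self, facets):
        # The accumulated facets, independent of order, are the identity.
        key = frozenset(facets.items())
        topic = self._topics.get(key)
        if topic is None:
            # First the topic is created from its facets...
            topic = {"facets": dict(facets), "subject_identifier": None}
            # ...and only then is a subject identifier assigned to it.
            topic["subject_identifier"] = f"{self._base}{next(self._counter)}"
            self._topics[key] = topic
        return topic
```

Asking the registry twice for the same facets, in any order, returns the same
topic, so the identifier is a consequence of the facet-based identity rather
than its source.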


Murray Altheim               
Knowledge Media Institute
The Open University, Milton Keynes, Bucks, MK7 6AA, UK