Section 4.3 Subject Identity RE: [sc34wg3] Strawman draft of ISO 13250-1

Bernard Vatant sc34wg3@isotopicmaps.org
Wed, 5 Nov 2003 11:59:48 +0100


Steve, and all

I guess you won't be surprised by the section on which I have comments :)

In fact I am struck by the reduction happening in this section. It begins
very well, opening the door to a good and generic definition of subject
identity (although I would prefer subject identification, as explained
below), and then restricts it in practice to the use of subject locators
and subject indicators - the mechanisms used by the Standard Data Model -
although the Reference Model, through the notion of SIDP, offers the
possibility of an unbounded number of other identification mechanisms for
TM applications. What I propose below is an alternative wording that does
not close the door to those. This is a first cut coming out of my
breakfast coffee. Please fix my prose where needed.

N446 : "Subject identity is a set of properties of a topic that enable
applications (and humans) to know which subject the topic represents and,
in particular, to know when two topics represent the same subject and must
therefore be merged."

I'm uneasy with the absolute definition of subject identity given by the
first "is", because I don't think there is such a thing as an absolute
subject identity, only an interagreement on a subject identification
process. And I am more uneasy with the use of "know" in this context: we
don't and never will know what "know" means for humans, and AFAIK it does
not mean anything for applications :))
The final "must therefore be merged" is superfluous here IMO. This section
should focus on how subject identity is established, not on how it is used.
Applications can do what they want with this information.

Alt:

"The core requirement for semantic interoperability of topic map
applications is interagreement on subject identification mechanisms,
enabling both humans and applications to establish when and how different
topics, either from the same topic map or different ones, should be
interpreted as representing the same subject and processed accordingly.
Such identification mechanisms use a specific set of topic properties,
defined by a topic map data model, and constituting the "subject identity"
for applications conformant to that model."

Note the relative and open nature of this definition. There is no such
thing here as an absolute subject identity, since identity relies on
agreed-upon mechanisms for subject identification.

N446 : "In recognition of the distinction between addressable subjects
(i.e., information resources) and subjects in general (i.e., information
resources and non-addressable subjects), Topic Maps provides two mechanisms
for specifying subject identity, both of which use locators."

Are those mechanisms not specific to a TM data model, even if it is the
standard one (TMDM)? If so, this should be stated here by replacing "Topic
Maps" with "the standard Topic Map Data Model".

And something should be added about the possibility of other identification
mechanisms in "non-standard" data models, based on any kind of properties
specific TM applications would agree upon.
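
To make the contrast concrete, here is a minimal sketch in Python of the two
locator-based mechanisms quoted above, written as one identification rule
among possibly many. The Topic class, its field names and the example.org
URIs are all hypothetical, invented for illustration; nothing here is taken
from N446 or from the TMDM text.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    # Hypothetical minimal topic holding two sets of locator strings.
    # subject_locators: the addressed resource IS the subject.
    # subject_indicators: the addressed resource DESCRIBES the subject.
    subject_locators: set = field(default_factory=set)
    subject_indicators: set = field(default_factory=set)


def same_subject(a: Topic, b: Topic) -> bool:
    """True if the topics share at least one locator of the same kind."""
    return bool(a.subject_locators & b.subject_locators) or bool(
        a.subject_indicators & b.subject_indicators
    )


# Illustrative data only (made-up URIs).
t1 = Topic(subject_indicators={"http://example.org/psi/paris"})
t2 = Topic(subject_indicators={"http://example.org/psi/paris",
                               "http://example.org/psi/paris-fr"})
t3 = Topic(subject_locators={"http://example.org/psi/paris"})

print(same_subject(t1, t2))  # the two topics share a subject indicator
print(same_subject(t1, t3))  # same URI, but indicator vs. locator
```

In this sketch the rule is a plain function, so a "non-standard" data model
could supply a different function without touching the Topic structure.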

It is frequently the case that no single property can be used for
identification, but that a set of well-chosen properties provides identity.
And I think it is a crucial choice for the TM standard whether
identification mechanisms should be based only on single property values
(like subject locators) or could use "ad hoc" identifying sets of
properties. Choosing the latter opens interesting paths that the former
would forbid. Suppose for example that TM applications dedicated to
document management agree upon a set of identifying properties, e.g. a
subset of Dublin Core like:

{dc:creator, dc:title, dc:publisher, dc:date, dc:format}

In the absence of unique identifiers like ISBN or other PSI, it makes sense
to use such a property set as a basis for subject identification: two
topics represent the same document if those five properties are equal. This
specific identifying set of properties for this specific class of topics
(documents) could be formally declared in an ontology. A declaration of
commitment to this ontology for a given topic map (using any relevant
language: TMCL, OWL, whatever) would therefore give applications that care
a specific way to establish identity for this specific class of topics.
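
As a rough sketch of the idea above, assuming the declaration boils down to
a mapping from topic class to identifying property names, the comparison
could look like this in Python. The names IDENTIFYING_PROPERTIES,
identity_key and same_subject_by_properties are hypothetical, as are the
sample documents; exact string equality and "a missing property means no
identity is established" are simplifying assumptions, not something the
draft specifies.

```python
# Hypothetical declaration, standing in for what an ontology (TMCL, OWL,
# ...) might state: the identifying property set for each topic class.
IDENTIFYING_PROPERTIES = {
    "document": ("dc:creator", "dc:title", "dc:publisher",
                 "dc:date", "dc:format"),
}


def identity_key(topic_class, properties):
    """Return a comparable key, or None if identity cannot be established."""
    names = IDENTIFYING_PROPERTIES.get(topic_class)
    if names is None:
        return None  # no identifying set declared for this class
    if any(name not in properties for name in names):
        return None  # assumption: every identifying property is required
    return (topic_class,) + tuple(properties[name] for name in names)


def same_subject_by_properties(class_a, props_a, class_b, props_b):
    """True if both topics yield the same, non-empty identity key."""
    key_a = identity_key(class_a, props_a)
    return key_a is not None and key_a == identity_key(class_b, props_b)


# Invented sample data, for illustration only.
doc_a = {"dc:creator": "A. Author", "dc:title": "Example Report",
         "dc:publisher": "Example Press", "dc:date": "2003-11-05",
         "dc:format": "application/pdf", "dc:language": "en"}
doc_b = dict(doc_a, **{"dc:language": "fr"})      # differs outside the set
doc_c = dict(doc_a, **{"dc:date": "2004-01-01"})  # differs inside the set

print(same_subject_by_properties("document", doc_a, "document", doc_b))
print(same_subject_by_properties("document", doc_a, "document", doc_c))
```

Properties outside the declared set (dc:language here) play no part in the
comparison, which is the point of declaring the identifying set separately
for each class of topics.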

They'll correct me if I am wrong, but it seems to me, from a recent
conversation with Michel and Steve (Newcomb), that this kind of perspective
is what the Reference Model is about.

Bernard

> -----Original Message-----
> From: sc34wg3-admin@isotopicmaps.org
> [mailto:sc34wg3-admin@isotopicmaps.org] On Behalf Of Steve Pepper
> Sent: Monday, 3 November 2003 20:31
> To: G. Ken Holman - ISO/IEC JTC 1/SC 34 Secretary
> Cc: sc34wg3@isotopicmaps.org
> Subject: [sc34wg3] Strawman draft of ISO 13250-1
>
>
> Attached please find N446, a first Working Draft of
> Part 1 of ISO 13250.
>
> Please read the Editors' Note at the beginning of the
> document *very carefully* before looking at the rest of
> the document!
>
> It explains why we regard this draft as a strawman and
> what issues we would like National Bodies to consider
> before the Philadelphia meeting.
>
>
> Best regards,
>
> Steve Pepper & Motomu Naito
> Editors, Part 1
>
>
> P.S. Ken: The title of this document is different from
> the one I gave you when requesting the document number.
> I'm not sure what the most appropriate is in this case,
> so feel free to change the one or the other before
> distributing the document to NBs. Thanks.
>