[sc34wg3] new working draft of 13250-5 (Reference Model)

Jack Park sc34wg3@isotopicmaps.org
Tue, 9 Nov 2004 09:46:23 -0800


I particularly like Bernard's ISBN example, since it hints that what he
called OPs could, in fact, be "conferred" properties. Here, I am
thinking of a case where a particular book had, say, two authors
in the first edition but took on another author in a second edition,
the new author being conferred into the identity "mix" with a scoped
assertion (association). I have been wrestling with the notion of
conferred identity since the first draft of TMRM, and now, maybe using
Bernard's example as a hint, there's hope :)
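Here is a minimal sketch of the conferred-author case. It assumes that an association applies in a given context when all of its scope themes are present in that context, and that an empty scope means the association always applies. The class and function names, and the placeholder topic identifiers, are hypothetical. None of them come from the TMRM draft.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Association:
    """Hypothetical 'authored-by' association, optionally restricted by scope themes."""
    book: str
    author: str
    scope: frozenset = field(default_factory=frozenset)  # empty scope: always applies


def authors_in_scope(assocs, book, context):
    """Return the authors asserted for `book` whose scope is contained in `context`."""
    return sorted(
        a.author for a in assocs
        if a.book == book and a.scope <= context  # subset test on scope themes
    )


assocs = [
    Association("book-1", "Author A"),
    Association("book-1", "Author B"),
    # The third author is conferred only in the second edition, via a scoped assertion.
    Association("book-1", "Author C", frozenset({"edition-2"})),
]

print(authors_in_scope(assocs, "book-1", frozenset({"edition-1"})))  # ['Author A', 'Author B']
print(authors_in_scope(assocs, "book-1", frozenset({"edition-2"})))  # ['Author A', 'Author B', 'Author C']
```

With this design the book's identity is not rewritten between editions. The extra author is simply one more assertion that holds only in the scope where it was conferred.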

Jack


On Tue, 9 Nov 2004 12:45:39 +0100, Bernard Vatant
<bernard.vatant@mondeca.com> wrote:
> 
> Steve and all
> 
> I've read the new TMRM draft with much interest. Some "thinking aloud" about it.
> 
> You are certainly aware of a current thread of thought on "identification as a dynamic
> process" vs "identity as a static set of properties" (see the various posts about it on
> the universimmedia blog linked below), and I have read the new draft along those lines.
> Please note that all of this is quite new thinking for me too, and I am not yet sure where it
> leads.
> 
> Anyway, I would like to make the case here for a shift from the static notion of "Subject
> Identity Properties" (SIP) to the dynamic one of "Subject Identification Rules" (SIR, yes
> Sir). I am already happy to see that the notion of "rules" now appears explicitly in many
> parts of the draft. I will focus on the requirement that a TMA disclose, among other things, "the
> rules for determining when multiple proxies are surrogates for the same subject".
> 
> I definitely like this way of putting things, which opens the door to any kind of
> identification rule. So I wonder why those rules should be restricted to the disclosure of SIP classes.
> Granted, a SIP can be as complex as you like, but what happens in most "natural"
> identification processes is that various rules are applied to properties which are not
> strictly SIPs. I will take two examples.
> 
> Example 1: Identifying books
> How do you make sure that a book X you find on Amazon is the same as the book Y you are
> looking for?
> I will use "X :: Y" to indicate that you decide that they are actually the same (whatever
> this sameness means).
> 
> I guess you could apply something like the following heuristic (a succession of rules):
> 
>         if ISBN(X) = ISBN(Y)
>                      then X :: Y
> 
>         else    if ISBN(X) or ISBN(Y) is not specified
>                 and AuthorName(X) = AuthorName(Y)
>                 and Title(X) = Title(Y)
>                 and PublicationDate(X) = PublicationDate(Y)
>                 and EditorName(X) = EditorName(Y)
>                         then X :: Y
> 
> In the TMA disclosure, ISBN would be defined as a SIP, whereas AuthorName, Title,
> PublicationDate and EditorName are OPs, although together they "act as" a SIP. Of course you
> can create a complex SIP from this particular combination, but it seems more natural to
> present it as an identification rule than as a "property" in the object-oriented
> sense of the term, or as a "complex property", a notion harder to explain and
> grasp than that of an identification rule.
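A minimal Python sketch of the book heuristic follows. It interprets the pseudocode as follows:

- An ISBN decides the question only when both books have one.
- Two missing ISBNs never count as equal.
- Two different ISBNs mean two different books.

The `Book` record, the `same_book` function, and the sample values are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Book:
    # Hypothetical record; fields mirror the properties used in the heuristic.
    isbn: Optional[str]  # None when the ISBN is not specified
    author_name: str
    title: str
    publication_date: str
    editor_name: str


def same_book(x: Book, y: Book) -> bool:
    """Return True when the heuristic concludes X :: Y."""
    # Rule 1: when both ISBNs are known, they alone decide (ISBN acts as a SIP).
    if x.isbn is not None and y.isbn is not None:
        return x.isbn == y.isbn
    # Rule 2: at least one ISBN is missing, so fall back on a combination of
    # other properties that together "act as" a SIP.
    return (x.author_name == y.author_name
            and x.title == y.title
            and x.publication_date == y.publication_date
            and x.editor_name == y.editor_name)


found = Book(None, "Some Author", "Some Title", "2004", "Some Publisher")
wanted = Book("isbn-0001", "Some Author", "Some Title", "2004", "Some Publisher")
print(same_book(found, wanted))  # True: one ISBN missing, all other properties match
```

Written this way, the second step reads as a rule applied to ordinary properties rather than as one more "complex property" of the book.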
> 
> Example 2: Identifying news
> How do you make sure that a news item X from Reuters today is the same as a news item Y from AFP
> yesterday?
> E.g. I already know from X that "George W. Bush was re-elected", so I don't need to be told
> by Y that "Bush is the new President of the USA".
> 
> This is much trickier. Assuming you have defined "news" as a class of documents, perhaps
> in this case your TMA includes a text mining engine that applies complex, context-sensitive
> linguistic analysis rules to infer that X and Y "have the same subject" and should therefore
> be identified as the same. Should the TMA disclosure include all the rules applied
> by the linguistic tool? What would the classes of SIPs be? Or would the TMA not simply
> disclose that it uses such-and-such a Text Mining Application, or the Google News
> algorithm, to compare news items?
> Actually, this is not academic. At Mondeca we have a text mining partner whose
> technology is plugged into ITM through an API, and the first application we made of this coupling
> was to mine Reuters Financial News successfully, with efficient extraction, storage
> and merging of subjects such as companies and their announced relationships (acquisition, merger,
> partnership, participation, ...).
> 
> So what I am questioning is whether "Subject Sameness Detection Rules" (which I would more
> simply call Subject Identification Rules) should always be linked to a class of SIPs.
> That is just the simplest case, like ISBN in Example 1.
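One way to picture rules that are not tied to a class of SIPs is to treat each rule as a function over two proxies. The function either gives a verdict or abstains, and SIP equality is just the simplest such function.

The sketch below is a hypothetical illustration. Three things in it are assumptions rather than anything the draft defines:

- the property keys,
- the rule names,
- returning the name of the deciding rule, so that the rule actually applied can be disclosed.

A rule backed by an opaque engine, such as a text mining tool, would simply be another callable in the list.

```python
from typing import Callable, Mapping, Optional, Sequence, Tuple

# A proxy is modeled here as a plain mapping from property names to values.
Proxy = Mapping[str, object]
# A rule returns True (same subject), False (different subjects) or None (no verdict).
Rule = Callable[[Proxy, Proxy], Optional[bool]]


def sip_rule(prop: str) -> Rule:
    """Simplest case: one identity property decides whenever both proxies carry it."""
    def rule(x: Proxy, y: Proxy) -> Optional[bool]:
        if prop in x and prop in y:
            return x[prop] == y[prop]
        return None
    return rule


def all_equal_rule(props: Sequence[str]) -> Rule:
    """Several other properties that can only confirm sameness, never refute it."""
    def rule(x: Proxy, y: Proxy) -> Optional[bool]:
        if all(p in x and p in y and x[p] == y[p] for p in props):
            return True
        return None
    return rule


def identify(x: Proxy, y: Proxy,
             rules: Sequence[Tuple[str, Rule]]) -> Tuple[bool, Optional[str]]:
    """Apply named rules in order; return the first verdict and the rule that gave it."""
    for name, rule in rules:
        verdict = rule(x, y)
        if verdict is not None:
            return verdict, name
    return False, None  # no rule fired: not identified


# Hypothetical disclosed rule list; property keys are illustrative only.
RULES = [
    ("isbn", sip_rule("isbn")),
    ("author+title+date+editor", all_equal_rule(["author", "title", "date", "editor"])),
]

book = {"author": "A", "title": "T", "date": "2004", "editor": "E"}
print(identify(dict(book, isbn="1"), book, RULES))  # (True, 'author+title+date+editor')
```

In this model, disclosing the rules amounts to disclosing the ordered list of names and what each one tests. Only the first entry is a SIP in the narrow sense.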
> 
> Bottom line: the word "rule" occurs so often in the document that it might
> deserve a definition in the Glossary.
> 
> My 0.02 euros, currently a little more than $0.02 :))
> 
> Bernard
> 
> **********************************************************************************
> 
> Bernard Vatant
> Senior Consultant
> Knowledge Engineering
> bernard.vatant@mondeca.com
> 
> "Making Sense of Content" :  http://www.mondeca.com
> "Everything is a Subject" :  http://universimmedia.blogspot.com
> 
> **********************************************************************************
> 
> > -----Original Message-----
> > From: sc34wg3-admin@isotopicmaps.org
> > [mailto:sc34wg3-admin@isotopicmaps.org] On behalf of Steven R. Newcomb
> > Sent: Monday, 8 November 2004 16:30
> > To: sc34wg3@isotopicmaps.org
> > Subject: [sc34wg3] new working draft of 13250-5 (Reference Model)
> 
> 
> >
> >
> > All -
> >
> > A new working draft of 13250-5, "Topic Maps - Reference Model",
> > is now available at http://www.jtc1sc34.org/repository/0554.htm
> >
> > It's significantly shorter, and we hope and believe it's easier to
> > understand, too.
> >
> > -- Steve
> >
> > Steven R. Newcomb, Consultant
> > Coolheads Consulting
> >
> > Co-editor, Topic Maps International Standard (ISO 13250)
> > Co-drafter, Topic Maps Reference Model
> >
> > srn@coolheads.com
> > http://www.coolheads.com
> >
> > direct: +1 540 951 9773
> > main:   +1 540 951 9774
> > fax:    +1 540 951 9775
> >
> > 208 Highview Drive
> > Blacksburg, Virginia 24060 USA
> >
> > _______________________________________________
> > sc34wg3 mailing list
> > sc34wg3@isotopicmaps.org
> > http://www.isotopicmaps.org/mailman/listinfo/sc34wg3
> >