[sc34wg3] Documenting merging rules in TMDM

Lars Marius Garshol sc34wg3@isotopicmaps.org
Sun, 14 Mar 2004 13:21:47 +0100

* Patrick Durusau
| 1. Does it make sense to delegate merging rules to a separate part of
|    the Topic Maps standards family?
| If I say yes, then why does the TMDM have any merging rules at all? 

Because some parts of the TMDM have semantics that require merging.
You can't define properties like [subject identifiers] without
requiring merging on their values, because that would conflict with
the semantics of the property.

The same argument applies in reverse to topic names, variant names,
and occurrences: to always require merging would be in conflict with
the semantics assigned to these constructs.
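
To make the contrast concrete, here is a minimal sketch in Python. The `Topic` class and `merge_by_subject_identifier` function are illustrative assumptions, not anything defined by the TMDM, and the sketch covers only the subject identifier rule: topics sharing a subject identifier are merged, while topics that merely share a name are left alone.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    # Hypothetical, heavily simplified topic: only two properties.
    subject_identifiers: set = field(default_factory=set)
    names: set = field(default_factory=set)


def merge_by_subject_identifier(topics):
    """Merge topics that share at least one subject identifier.

    Names are carried along into the merged topic but never trigger merging.
    """
    merged = []  # invariant: topics in this list have disjoint identifiers
    for topic in topics:
        current = Topic(set(topic.subject_identifiers), set(topic.names))
        remaining = []
        for other in merged:
            if current.subject_identifiers & other.subject_identifiers:
                # Shared identifier: fold the other topic into the current one.
                current.subject_identifiers |= other.subject_identifiers
                current.names |= other.names
            else:
                remaining.append(other)
        remaining.append(current)
        merged = remaining
    return merged
```

In this sketch, two topics that both carry the name "Oslo" but have different subject identifiers stay separate, which is the behaviour argued for above.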

Of course, for certain occurrence types and so on merging does make
sense, but the TMDM doesn't define any such occurrence types. (Though
it does define a type of occurrence types that can be used to express

Now, the reason we want this to go in as part of TMCL is that TMCL is
intended to be used for expressing the constraints on instances of
ontologies, and one constraint among many is "thou shalt only have
unique values for this occurrence type" (that is, if two topics have
the same value you can merge them).
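
As a rough illustration of what such a uniqueness constraint implies, the following Python sketch finds topics that could be merged because they share a value for an occurrence type declared unique. This is not TMCL syntax (TMCL is not yet defined); the data layout, the function name `merge_candidates`, and the occurrence type names are all assumptions made for the example.

```python
from collections import defaultdict


def merge_candidates(topics, unique_occurrence_type):
    """Group topic ids that share a value for the given occurrence type.

    `topics` maps a topic id to a list of (occurrence_type, value) pairs.
    Only groups with more than one topic are returned.
    """
    by_value = defaultdict(list)
    for topic_id, occurrences in topics.items():
        for occ_type, value in occurrences:
            if occ_type == unique_occurrence_type and topic_id not in by_value[value]:
                by_value[value].append(topic_id)
    return [ids for ids in by_value.values() if len(ids) > 1]


# Hypothetical data: "isbn" is treated as unique, "description" is not.
topics = {
    "t1": [("isbn", "0-13-110362-8"), ("description", "C book")],
    "t2": [("isbn", "0-13-110362-8")],
    "t3": [("isbn", "0-201-03801-3"), ("description", "C book")],
}
candidates = merge_candidates(topics, "isbn")
```

Here `t1` and `t2` come out as merge candidates because of the shared "isbn" value, while the shared "description" on `t1` and `t3` has no effect unless that type is also declared unique.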

| If the goal is clean separation of merging rules from other parts of
| the standard, wouldn't that be better served by having all the
| merging rules in one place?

To the extent that that makes sense, yes.

| The lack of a mechanism for specifying merging rules is a serious
| problem with the TMDM.

Do you mean "specifying merging rules" or do you mean "specifying
merging rules for information not in conformance with TMDM"?

Also, what's so special about merging rules? Why is it not a serious
problem that there is no mechanism for expressing constraints? Or for
expressing business logic in general?

| Kal's solution is optional, well I suppose all disclosure is optional
| in some sense. One could write up TMCL statements and bind them in a
| fancy binder that had nothing to do with the actual operation of an
| application.

And is that good or bad, from your point of view?

| Would you prefer "undocumented feature" to "vendor lock?" What I am
| concerned about are custom and non-disclosed merging rules embedded
| in an application that a customer relies upon without realizing that
| the rules are in operation. If another vendor comes along, say with
| an implementation that follows only the express rules of the TMDM,
| the customer can't understand why the behavior of their topic map is
| now different. Well, you could certainly say: "It must have had
| different merging rules." but that is hardly of any comfort to the
| customer.

I think what you are writing here has no real connection to reality.
First of all, what sort of application are you thinking of? End-user
applications? Well, either the customer will develop it themselves, or
they will write the contract with the consultant doing it in such a
way that they make sure they own the source code to the application
(though not the TM implementation it is based on, of course).

Now, having done this, could they just switch TM implementation
underneath? (Note that TMDM concerns itself with TM implementations
only; end-user applications are way out of scope.) The answer is:
obviously, no, since application code written for one TM
implementation will not work with another.

So they'll have to rewrite the thing anyway to switch from, say, TM4J
to OKS. In doing so they will have to deal with many more differences
than any differences in "merging rules". In fact, what they'll find is
that the merging rules are one of the few things that *are* the same.

Merging rules beyond what TMDM specifies are application-specific
anyway, and so will have been specified by the customer, and
implemented as part of the application. In short, the idea of vendor
lock-in through magic merging rules built into the TM engine is
simply hogwash.

I've spent large parts of the last year building topic map
applications for customers and helping other consultancies do so for
their customers, and I don't recognize what you are saying as having
any kind of relevance to any aspect of this.

I agree that being able to describe the merging rules as part of the
definition of the constraints on an ontology using a standard language
like TMCL has value. As for the rest of what you are saying I don't
think it makes any form of sense.

| The TMDM specifies some merging rules and does not say that other
| merging rules have to be specified by TMCL but says they are "freely
| allowed." If it is going to have merging rules, then it should have a
| mechanism for disclosing other merging rules that are "freely allowed."

| If TMCL takes, I don't know, another year to complete (random
| guess), does that mean that applications may have non-disclosed
| merging rules embedded in them? Is there a way to avoid
| non-disclosed merging rules?

No, there isn't. We could add to the TMDM a clause that says "every
application that uses this standard must include as part of its
documentation a description of the merging rules of that application
printed double-spaced on A4 paper and signed with the blood of seven
witnesses" but it would make absolutely no difference whatsoever.

Providing a standard mechanism for this is of course valuable, if that
mechanism can be implemented in software, so that those using it can
take their data + schema and move it to a different implementation and
get the same results there. And that is what TMCL will do.

Of course, they'll still have to reimplement their entire user
interface and business logic, which really will give some degree of
vendor lock-in, and whatever mechanism we define for "documenting
merging rules" will not affect that in the slightest.

Now TMQL *will* make a difference in that parts of the business logic
and user interface *can* be implemented with TMQL, and thus be
portable across implementations. TMQL will have precisely defined
behaviour, so that there will be no vendor lock-in there, and any
relation to merging will be carefully defined.

Another thing that will help is TMAPI. TMAPI defines how it interacts
with merging, so again there is no problem with magic undefined
"merging rules", since some will be part of TMAPI and the rest will
have to be in customer logic.

An executive summary of the above would read something like this:

  Patrick, what you are saying makes no sense.

We can continue to play this game for a long while yet, but that is
the reality.

| Other than the reified properties, topics are deemed equal on the
| basis of a variety of locator items.

| Locator items (well locators anyway) are defined as:

| 3.11 locator
| a string conforming to some locator notation that references one or
| more information resources

| That certainly is one implementation strategy for determining the
| identity of a subject. Compare the locators to see if they point to
| the same place. If they do then the two topics are equal.

| Another implementation strategy would be to resolve the locators and
| compare the content or results of operations on the content of what is
| found with the locators.

| Another would be to have additional properties for topics and to base
| merging on those additional properties. (Properties here not being
| locators as defined above.)

| This is not meant to be exhaustive.

These are not "implementation strategies"; they are different possible
rules for the standard. The standard says you must follow the first of
these approaches, and if you do anything else (without the application
expressly asking you to do so) you are not conformant.

Lars Marius Garshol, Ontopian         <URL: http://www.ontopia.net >
GSM: +47 98 21 55 50                  <URL: http://www.garshol.priv.no >