... and what about un-merging? RE: [sc34wg3] Documenting merging rules in TMDM

Mon, 15 Mar 2004 10:07:01 +0000

Lars Marius Garshol wrote:

>* Bernard Vatant
>| 
>| 1. Dmitry writes : "TMCL will require only one constraint -
>| haveSameSubject"
>| 
>| I deeply agree with that viewpoint. IMO the standard should simply
>| define the rules under which subjects are to be considered the same
>| or different : distinction between SIDPs and other properties. What
>| the applications will do with that, merging or otherwise, is their
>| own business.
>
>This is precisely my point of view as well, and also what the current
>CD does. (Though it does not use the "SIDP" terminology.)
>
>| So maybe the whole notion of "merging rules" is to be stricken from
>| the standard.
>
>Please see the top part of my reply to Patrick for why this is not on
>the cards.
> 
>| 2. There has always been an over-focusing on merging.  
>
>Indeed. :)
>
>| Un-merging has been completely overlooked, as if merging was a
>| monotonic, non-reversible process. Are topics like black holes,
>| that can only grow with time?
>
< cool stuff about TMs as information blackholes snipped for brevity ;-) >

>| IOW : do we make provision for merging to be a reversible process ?
>
>At the moment there is no explicit provision for this. If vendors want
>to add additional constructs to support it, they can. In general I
>think this is an area that needs more theoretical work, and that
>trying to standardize support for this or anything relating to it is
>too early yet.
>

Speaking as one who has done this (TM4J supports unmerging): it is not
trivial, but it is not too difficult to implement. Implementing it
efficiently, on the other hand, is harder. While it could be
represented in a data model, I made some fairly arbitrary decisions in
the implementation - specifically in the "chaining" together of merged
topics - that would probably not stand up to rigorous scrutiny. In
addition, there are efficiency issues in testing object equality
(especially when comparing sets of merged topics) and a big issue with
duplicate suppression: in TM4J there is no duplicate suppression in the
data model, because with dynamic merging, suppressing duplicates could
destroy information that is needed again after an unmerge.
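
To make the trade-off concrete, here is a minimal sketch of the general
idea. It is not TM4J's actual code or API: the class Topic and the
methods mergeWith, unmergeFrom, mergedSet, names and sameSubjectAs are
all hypothetical names invented for illustration. It assumes each topic
keeps only its own names and is "chained" to the topics it was merged
with, so the merged view is computed on demand and duplicates are only
suppressed at read time.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; these are NOT TM4J's classes or method names.
final class Topic {
    final String id;
    // Each topic stores only its own names. Nothing is copied or
    // deduplicated at merge time, so nothing is lost at unmerge time.
    final List<String> ownNames = new ArrayList<>();
    // Direct merge links: the "chain" to other merged topics.
    private final Set<Topic> links = new LinkedHashSet<>();

    Topic(String id, String... names) {
        this.id = id;
        for (String n : names) {
            ownNames.add(n);
        }
    }

    // Merging only records a symmetric link between the two topics.
    void mergeWith(Topic other) {
        if (other == this) {
            return;
        }
        links.add(other);
        other.links.add(this);
    }

    // Unmerging removes the link; each topic's own data is untouched.
    void unmergeFrom(Topic other) {
        links.remove(other);
        other.links.remove(this);
    }

    // All topics reachable through merge links, including this one.
    Set<Topic> mergedSet() {
        Set<Topic> seen = new LinkedHashSet<>();
        Deque<Topic> todo = new ArrayDeque<>();
        todo.push(this);
        while (!todo.isEmpty()) {
            Topic t = todo.pop();
            if (seen.add(t)) {
                todo.addAll(t.links);
            }
        }
        return seen;
    }

    // Merged view of the names: duplicates are hidden here, at read
    // time, rather than removed from the underlying topics.
    Set<String> names() {
        Set<String> out = new LinkedHashSet<>();
        for (Topic t : mergedSet()) {
            out.addAll(t.ownNames);
        }
        return out;
    }

    // "Equality" of merged topics needs a walk over the whole chain,
    // which is where the cost of comparing merged sets shows up.
    boolean sameSubjectAs(Topic other) {
        return mergedSet().contains(other);
    }
}

class UnmergeSketch {
    public static void main(String[] args) {
        Topic a = new Topic("a", "Puccini");
        Topic b = new Topic("b", "Puccini", "Giacomo Puccini");
        a.mergeWith(b);
        System.out.println(a.names());
        a.unmergeFrom(b);
        System.out.println(a.names() + " / " + b.names());
    }
}
```

If the duplicate name "Puccini" had been physically dropped from one of
the two topics when they were merged, that topic would come out of the
unmerge without it. Keeping the duplicates and walking the chain on
every read avoids that loss, at the cost of the equality and lookup
overhead described above.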

Cheers,

Kal