[sc34wg3] SAM, SAM+, SAM++ Or how to extend SAM

13 Feb 2003 13:19:24 +0100

* dmitryv@cogeco.ca
| 
| I think that a topic map represented in XTM, for example, has "hidden"
| dynamic information in the "mergeMap" element and in implicit references
| to other topic maps. When we represent this topic map in SAM we
| "lose" this information. More accurately, SAM forces us to load ALL
| maps and merge them in "one big step". 

This is true, and it was a deliberate design decision, in the sense
that we wanted the SAM to only have the information that has logical
(or ontological, if you want) impact, whereas we wanted to leave out
lexical information completely.

Part of the reason is that lexical information is horrendously complex
(was that "z" in the base name written with character references, and
was the character reference hex or decimal, and upper or lower case?),
so the line must be drawn somewhere. We chose to put all lexical
information on the other side of the line. (Well, nearly all. I
suppose it could be argued that the source locators are on the wrong
side of that line.)
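
To make the "z" example concrete, here is a small Python illustration using the standard library's xml.etree.ElementTree (it is not part of any specification). Three lexically different character references for "z" inside an XTM 1.0 baseNameString element all parse to the same string, so once the document has been read the lexical choice is simply no longer visible:

```python
import xml.etree.ElementTree as ET

# Three lexically different ways of writing the base name "z":
# a decimal, an upper-case hex, and a lower-case hex character reference.
variants = [
    "<baseNameString>&#122;</baseNameString>",
    "<baseNameString>&#x7A;</baseNameString>",
    "<baseNameString>&#x7a;</baseNameString>",
]

# After parsing, only the logical value survives; the lexical form is gone.
parsed = [ET.fromstring(v).text for v in variants]
print(parsed)  # ['z', 'z', 'z']
```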

I see that the XML Information Set has chosen to take in quite a bit
of lexical information (and it's taken a fair amount of criticism
because of that), and I also see customers now wanting to unmerge
their previously merged topic maps.

| So at the beginning we have an empty model (no topic map item). We do
| deserialization as "one big step" and we get the internal data model at
| the end. In this case we do not lose information. But the price is
| the "ONE big step". 

Actually, thanks to the source locators, we don't necessarily need to
do that. The part of the new XTM syntax specification where it is
specified whether or not you have to do a single step is not yet
finished. We could do it either way, and personally I don't have an
opinion yet.

XTM 1.0 is clear that implementations are free to decide whether to
load external XTM documents or not when they see references to them,
and we haven't yet worked out whether or not the new specification
should say the same.

| Is it compatible with the "open world" assumption (which I think is
| extremely important for Topic Maps)? 

| So I think that the topic map item in SAM should have a property which
| is the equivalent of (or, better said, the internal model for) "mergeMap".

You are thinking of something like this?

  6. [unmerged documents]: A set of locator items. This is the set
  containing the locators of every external document seen during the
  deserialization process that has not been deserialized.

It's tricky in that it implies that there has been a deserialization
process (there may not have been), and that it will continue at some
as yet unknown point in the future. We'd also need to keep track of
what syntax these external documents are written in, since they need
not be XTM.
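
A rough Python sketch of how such a property might be modelled, assuming locators are plain URI strings and that each pending document carries a syntax label (for example "XTM" or "LTM"), since the external documents need not be XTM. The names TopicMapItem, UnmergedDocument, note_reference and mark_deserialized are invented for illustration and do not come from the SAM:

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass(frozen=True)
class UnmergedDocument:
    # Hypothetical record: where the document is and what syntax it uses.
    locator: str  # simplified: a plain URI string stands in for a locator item
    syntax: str   # e.g. "XTM" or "LTM"; external documents need not be XTM


@dataclass
class TopicMapItem:
    # Hypothetical property 6: documents seen during deserialization
    # that have not (yet) been deserialized themselves.
    unmerged_documents: Set[UnmergedDocument] = field(default_factory=set)

    def note_reference(self, locator: str, syntax: str) -> None:
        """Record a reference to an external document that was not loaded."""
        self.unmerged_documents.add(UnmergedDocument(locator, syntax))

    def mark_deserialized(self, locator: str) -> None:
        """Drop a document from the set once it has been loaded and merged."""
        self.unmerged_documents = {
            d for d in self.unmerged_documents if d.locator != locator
        }
```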

And, finally, I'm not sure what this buys us. You can already keep
track of this information if you want, so where's the benefit?

* Lars Marius Garshol
|
| I agree with you that this functionality may be desirable in some
| contexts, but if we put it in the SAM we'll require *all*
| implementations to have it, which I think is exactly what you don't
| want. The SAM explicitly says that implementations are allowed to
| maintain additional information beyond what the SAM requires, and I
| think this relationship is an example of that.

* dmitryv@cogeco.ca
|
| I think we need some clarification of what these extensions could
| be. How can it be done? Is there any analogy with Java "extensions"? I
| think it is important to describe the extension mechanism because very
| often "extensions" become standard in the next generation.

I agree that if there were common extensions in use out there we
should perhaps document them somehow or maybe even include them, but
I don't actually know of any SAM extensions out there that are used in
actual engines. In fact, the SAM is already stretching it a bit,
compared with existing engines (source locators, association roles,
base locator on topic map).

But as for how it can be done, you can simply make the topic map
implementation you build maintain the extra information you need to
make this work. Add extra methods and attributes, or functions and
fields, or whatever best suits your implementation. You will
still conform to the SAM.
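
As an illustration of that advice, here is a minimal Python sketch with two hypothetical classes. TopicMap stands in for an engine that keeps only what the SAM requires (reduced here to topics keyed by subject identifier, with a much simplified merge rule), and TrackingTopicMap adds an extra attribute recording which source document each topic came from. Code that only knows about TopicMap keeps working, because the subclass merely adds state:

```python
from typing import Dict, Optional, Set


class TopicMap:
    """Hypothetical minimal engine holding only SAM-required data (simplified)."""

    def __init__(self) -> None:
        self.topics: Dict[str, Set[str]] = {}  # subject identifier -> base names

    def add_topic(self, subject_identifier: str, name: str) -> None:
        # Simplified merge rule: same subject identifier means same topic.
        self.topics.setdefault(subject_identifier, set()).add(name)


class TrackingTopicMap(TopicMap):
    """Extension: also remembers where each topic came from (not required by the SAM)."""

    def __init__(self) -> None:
        super().__init__()
        self.origins: Dict[str, Set[str]] = {}  # extra information beyond the SAM

    def add_topic(self, subject_identifier: str, name: str,
                  source: Optional[str] = None) -> None:
        super().add_topic(subject_identifier, name)
        if source is not None:
            self.origins.setdefault(subject_identifier, set()).add(source)
```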

* Lars Marius Garshol
|
| The best way to think of the SAM is as an invariant, if you are
| familiar with that concept. At the beginning and end of every
| operation you perform on your topic map it must conform to the SAM,
| including the merging and duplicate removal rules. So long as you
| meet that criterion you can implement any dynamic behaviour you
| want.
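
One way to picture the invariant is as a check wrapped around every operation. The Python sketch below uses a deliberately tiny stand-in for the SAM's rules: a topic is just a set of subject identifiers, and the invariant is that no identifier occurs on two topics (that is, everything that should have been merged has been). The names conforms, preserves_invariant and add_topic are hypothetical; a real engine would have many more rules and would more likely guarantee them by construction than test them at run time:

```python
import functools
from typing import List, Set

Topic = Set[str]  # simplified: a topic is just its set of subject identifiers


def conforms(topics: List[Topic]) -> bool:
    """Toy invariant: no subject identifier appears on more than one topic."""
    seen: Set[str] = set()
    for topic in topics:
        if seen & topic:
            return False
        seen |= topic
    return True


def preserves_invariant(operation):
    """Check the invariant at the beginning and at the end of an operation."""
    @functools.wraps(operation)
    def wrapper(topics, *args, **kwargs):
        if not conforms(topics):
            raise AssertionError("invariant broken before " + operation.__name__)
        result = operation(topics, *args, **kwargs)
        if not conforms(topics):
            raise AssertionError("invariant broken after " + operation.__name__)
        return result
    return wrapper


@preserves_invariant
def add_topic(topics: List[Topic], identifiers: Topic) -> None:
    """Add a topic, merging it with every existing topic sharing an identifier."""
    merged = set(identifiers)
    for topic in [t for t in topics if t & merged]:
        topics.remove(topic)
        merged |= topic
    topics.append(merged)
```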

* dmitryv@cogeco.ca
|
| Invariant? I like that... But the question is what the invariant is and
| which operations we take into consideration. 

All of them. :)

| I think that the basic operation is "load and merge one topic map with
| what I have already in SAM". 

It is, but only because of the way the specification texts are put
together at the moment. When we have a query language there will be
lots more operations, and the same thing will apply to them. Ideally
this should also apply to all methods in APIs, for example, though
whether you can really expect that is another matter.

| My suggestion is to add my favorite properties to the existing SAM
| invariant. So we have the existing SAM + "HasToBeMergedWith" +
| "AlreadyMergedWith". The operation (Load and Merge) updates the SAM and
| my favorite properties, so I have all the information in the extended
| SAM (SAM+?) to support the process: take a topic map from
| "HasToBeMergedWith", load and merge it with the SAM, delete it from
| "HasToBeMergedWith" and add it to "AlreadyMergedWith". It reminds me
| of Dijkstra's loop invariants...
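
The loop described above can be sketched in Python roughly as follows. The load_and_merge callback is hypothetical: it is assumed to merge one document into whatever model the implementation keeps and to return the locators of the documents it references (through mergeMap, for instance). The two collections play the roles of "HasToBeMergedWith" and "AlreadyMergedWith", and the comment inside the loop states the loop invariant:

```python
from typing import Callable, Iterable, List, Set


def load_all(initial: Iterable[str],
             load_and_merge: Callable[[str], Iterable[str]]) -> Set[str]:
    """Load and merge documents until no referenced document is left."""
    # Drop duplicate starting locators while keeping their order.
    has_to_be_merged_with: List[str] = list(dict.fromkeys(initial))
    already_merged_with: Set[str] = set()
    while has_to_be_merged_with:
        # Loop invariant: the two collections are disjoint, the pending list
        # has no duplicates, and every locator in already_merged_with has been
        # merged into the model exactly once.
        locator = has_to_be_merged_with.pop()
        referenced = load_and_merge(locator)
        already_merged_with.add(locator)
        for ref in referenced:
            if ref not in already_merged_with and ref not in has_to_be_merged_with:
                has_to_be_merged_with.append(ref)
    return already_merged_with
```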

Hmmmm. Why should we add it to the SAM? That would require all engines
to support this functionality if they are to conform to the SAM, it
would complicate them considerably, and it would also complicate the
XTM syntax specification quite a bit. I also think that it is entirely
appropriate for implementations to *not* support any of this, so we'd
have to come up with some way to make it optional if we did put it in.

And then, finally, there's the question of what we would gain by
adding it. 

* Lars Marius Garshol
|
| I think you can build that on top of the current SAM, but I may not
| have fully understood what you are thinking. What are the problems
| you have in mind that won't be solvable with the current SAM?

* dmitryv@cogeco.ca
|
| Now, we can extend SAM+ (SAM++?) and assume that each item in the SAM
| has an "IsSupportedBy" property which keeps a list of the IDs of its
| topic map "supporters". We can introduce the operation "Update one of
| the topic maps loaded in SAM with a new version". SAM++ has enough
| internal information to perform this operation. SAM does not have
| enough information to do it efficiently; SAM has to do the loading and
| merging from the beginning. BTW, "HasToBeMergedWith" can be updated
| by this operation. We can combine our two basic operations and keep
| the invariant.
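
A minimal Python sketch of the "IsSupportedBy" idea, assuming items can be treated as opaque hashable keys (strings here); the class and method names are hypothetical. Updating a map removes it as a supporter everywhere, drops items that no map supports any more, and merges the new version, without reloading the other maps. It deliberately ignores a harder problem a real engine would face: topics that were merged only because of something in the old version would have to be split apart again:

```python
from typing import Dict, Iterable, Set


class SupportedModel:
    """Hypothetical SAM++ sketch: each item records which loaded maps support it."""

    def __init__(self) -> None:
        self.is_supported_by: Dict[str, Set[str]] = {}  # item -> topic map IDs

    def load(self, map_id: str, items: Iterable[str]) -> None:
        # Merge: an item that is already present just gains another supporter.
        for item in items:
            self.is_supported_by.setdefault(item, set()).add(map_id)

    def unload(self, map_id: str) -> None:
        # Remove map_id as a supporter; items left without supporters disappear.
        for item in list(self.is_supported_by):
            supporters = self.is_supported_by[item]
            supporters.discard(map_id)
            if not supporters:
                del self.is_supported_by[item]

    def update(self, map_id: str, new_items: Iterable[str]) -> None:
        # Replace one loaded map with a new version of it; the other maps'
        # contributions are left in place rather than reloaded.
        self.unload(map_id)
        self.load(map_id, new_items)
```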

Yeah, but what I'm asking is whether you can do this even if SAM is
not extended. The point is that SAM++ is conformant with SAM, so your
implementation will be free to do this. The question is whether we
should force every other implementation to do it. I think it is
entirely legitimate for an implementation *not* to support this. That
needs to be considered, too.

-- 
Lars Marius Garshol, Ontopian         <URL: http://www.ontopia.net >
GSM: +47 98 21 55 50                  <URL: http://www.garshol.priv.no >