[sc34wg3] Re: SAM: comments (part I)

Lars Marius Garshol sc34wg3@isotopicmaps.org
04 Dec 2002 15:31:57 +0100


(Robert sent Graham and me these comments on the SAM because he
couldn't send to the list for some reason. I'm sending this reply to
the list so that people can see both what he wrote and the responses.
I've left out those of his comments which were so obviously correct
that there was nothing to do except incorporate them in the text.

If anyone has comments, please jump in.)


Hi Robert,

* Robert Barta
|
| I do not have posting access to the sc34wg3 mailing list (at
| least my posts do not get through), so I send this to you.
| Not that my contributions were earthshattering, but for my
| ego I would appreciate rw-access ;-)

I hope the access to the list has been fixed. If not, it certainly
should be.

| Please disregard everything which is settled already. Sorry
| for causing any unnecessary work.

Nono, this is good stuff. It's improved the document in several
places, and helped us see how people read it. As such it really does
help to make this a better standard.

| As usual, I tried to be as aggressive/critical as possible. :-)

Good! No use beating about the bush. :)

| I will have to do some thinking how query/constraint languages
| may benefit from SAM. At the moment I'm not so sure.

The thinking is that those two specifications will be based on the
SAM. They need to be based on something more abstract than the two
syntaxes, and SAM fits the bill.

Below I respond to those of your comments which seem to require an
answer, skipping those that were accepted without further ado.

| - abstract: general:
| 	You talk about semantics there 'using a formal data model'. I
| 	wonder what 'formal' means. The most 'formal' I see are UML
| 	diagrams although in the last para before 2.1 it says '...for
| 	purpose of illustration'.

Good point. I think we can lose the "formal" part. It's just
marketing, anyway.

| 	Semantics IS NOT provided by UML diagrams either. The only
| 	thing which SAM seems to do is to define nodes/
| 	objects/records with some attribute/fields.  Except the
| 	merging there is NO operation defined on these nodes. Without
| 	operation, no semantics.

Well, that depends on your point of view, and what you mean by
"semantics".

| 	Why not define operations like 'add_topic', 'remove_topic'...?
| 	Then the model could define how the changes propagate.
| 	This would provide semantics.

There's been some discussion of this. I've wanted to leave this out,
but to let the model implicitly say what the constraints on operations
would be, through the model itself and the SAM constraints. I think
that works, but people have been suggesting this. Nobody's really come
forward to push the idea, though.

| - sec1:style
| 	'This international...' Why is international important here?

This is the style ISO wants...

| - sec1:style
| 	'This internal standard' Why 'internal'?

Typo for international. :)

| - sec1:general
| 	There was already a discussion about 'application',
| 	'processor', ...
|
| 	I would have avoided this completely and simply used the term
| 	and abstract application interface. Like the DOM (which could
| 	to be a template).

This part needs to be discussed. Will try to do that in Baltimore, and
then we'll see.

| - sec2:content
| 	I am not too happy about 'sets being types'. Sets are
| 	algebras (type is only the 'sort' but to make a set a
| 	set you need operations [union, ..] and that is part
| 	of an algebra.
|
| 	I guess most readers will be some sort of educated
| 	programmers, so.....hmmm.

I think the choice made in the SAM is defensible. This is not a
mathematical specification, just a guide for developers using language
developers are used to. A mathematical specification would build on
this specification and make its own terminology and representation
choices.

| - sec2:content
| 	'...whenever two items in a set are found to be equal...'
|
| 	One sentence above you say that this never can happen
| 	in a set.

This is the issue of changes and time. It seems that some editorial
work is needed to make the text not just clear, but also inoffensive
to nitpickers. Will try to do that.

| - sec2:style
| 	I would reconsider the use of 'equal' in favor for
| 	'equivalent' in some cases.

What's the argument for using "equivalent"? What's wrong with "equal"?

| - sec2:content
| 	'null...no value. ....property is unknown'
|
| 	Is it the property which is unknown or the value?
|
| 	Why should the model make this distinction anyway?
|
| 	Either it is there, or not.

Null is there simply to give a simple name to what happens when a
property has no value and to have somewhere to hang the comparison
rule for that case. The "unknown" part is there because some people
read RDBMS semantics into this and thought it meant that the value
*had* to be unknown. That's not the case here, so it was thought best
to spell that out.

| - sec2:content
| 	'Certain properties.....computed properties...'
|
| 	Who cares? Either they are properties or not. Implementations
| 	may compute one from the other, or keep them duplicate for
| 	performance reasons.

Actually, this distinction is very important. The fact that these
properties are redundant means that either we must have a complicated
specification of constraints that ensure that they are in sync with
the rest of the model, or we can have them as computed properties.
(Or we can lose them entirely.)
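
The trade-off can be pictured with a minimal Python sketch. The
`Name`, `Topic` and `TopicMap` classes and the `parent_of` helper are
hypothetical illustrations, not items or property names taken from the
SAM: the point is only that a derived value which is computed on
demand cannot fall out of sync, so no extra constraint is needed.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Name:
    value: str


@dataclass(eq=False)
class Topic:
    names: list = field(default_factory=list)  # stored property


@dataclass(eq=False)
class TopicMap:
    topics: list = field(default_factory=list)  # stored property

    def parent_of(self, name):
        """Computed property: derived from the stored ones, never stored itself."""
        for topic in self.topics:
            if any(n is name for n in topic.names):
                return topic
        return None


norge = Name("Norge")
norway = Topic(names=[norge])
tm = TopicMap(topics=[norway])
print(tm.parent_of(norge) is norway)
```

Storing the parent on each name instead would be faster to read, but
every operation that moves a name would then have to update both
sides, which is the "complicated specification of constraints"
alternative.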

| 	The 'procedure' defining the relationship should rather be
| 	'function', I guess.

It could, but don't we then risk invoking all the connotations of that
term, even if they may not apply?

| - sec2:style
| 	'strictly redundant'. I do not think that there is a 'strict
| 	redundancy'.
|
| 	Maybe '.....are redundant'.

There isn't actually a strict redundancy, but strictly speaking (that
is, seen in a strict way), the properties are redundant. I think this
word is actually useful in that sentence.

| - sec2.1:style
| 	'....obviously inconsistent'.
|
| 	Are there not obvious inconsistencies? How to deal with them?

That's the task of TMCL. The "obvious inconsistencies" are those that
can be known to be inconsistent even without knowledge of the topic
map's ontology. I'll change the text to say so. (Good catch. :-)

| - sec2.1:style
| 	'...exhaustive...'
|
| 	Do you mean 'complete'? And why not complete? What use is a
| 	set of constraints and some are missing.

The word "exhaustive" does mean complete, but is slightly more
appropriate here. The answers to the other questions are those given
immediately above. I'll revisit this whole paragraph.

| - sec2.1:content
| 	'...when and how this detection....'
|
| 	'How' is clear but what about the 'when'? An implementation
| 	which - maybe for performance reason - is inconsistent
| 	relative to SAM may then report inconsistencies at _ANY_
| 	arbitrary time?

Yes. Inconsistencies arise when changes are made, and change implies
time, but this specification tries to avoid the issue of time
completely. Other specifications, like API specs, QL specs, syntax
specs etc deal with time and may add requirements as to when
constraint violations should be reported.

I don't really see how we could get a time requirement into this
spec. If you have ideas I'd be happy to hear them.

| - sec2:style
| 	there is no section 2.2. Why 2.1?

To get a nice heading before the bit on constraints. :-)

| - sec3.1:content
| 	'....URI.....RFC2396...'
|
| 	Why is this important? Where does SAM reference the internal
| 	structure of an URI? How does that affect the semantics?

It affects the semantics in the sense that unless you know a locator
is a URI you won't know how to interpret it. In the syntax
specifications and other specifications built on top of the SAM this
will be very important (resolution of relative locators will be a key
operation, for example, and to do that you must know the notation).

| 	Would it be possible to transparently pass this through as
| 	String?

Not really, because some locators may be HyTime locators, and others
may use other addressing mechanisms. You have to know the notation to
know how to interpret the locator. Implementations are not required to
support any particular notations, however, though to support certain
syntaxes you will have to support certain notations (HyTime for HyTM,
URIs for LTM/AsTMa/XTM).
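
A minimal Python sketch of why the notation has to travel with the
string. The `Locator` class, the notation label "URI" and the
`resolve` helper are assumptions made for illustration, not
definitions from the SAM; only `urllib.parse.urljoin` is a real
library call.

```python
from dataclasses import dataclass
from urllib.parse import urljoin


@dataclass(frozen=True)
class Locator:
    notation: str   # e.g. "URI"; the labels used here are illustrative
    reference: str


def resolve(base: Locator, relative: str) -> Locator:
    """Resolve a relative reference against a base locator."""
    if base.notation != "URI":
        # Other notations (HyTime, for instance) need their own rules,
        # which this sketch does not implement.
        raise NotImplementedError(f"unsupported notation: {base.notation}")
    return Locator("URI", urljoin(base.reference, relative))


base = Locator("URI", "http://example.org/maps/tm.xtm")
print(resolve(base, "#norway").reference)
```

With a bare string there would be nothing to dispatch on, so the
resolution rules could not be chosen correctly.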

| - sec3.2:content
| 	'construct....source locators......whether from inside or
| 	outside.....'
|
| 	What is the 'inside' of a map? Is it the TM document? 'Construct'
| 	is syntactical?

It should say 'item' rather than construct, actually. (Thanks for
catching that one.) References inside the map are in the SAM model.
Will try to explain that better.

| 	'....point back....to the source'
|
| 	Who would need this? If I load 3 million topics from a DB,
| 	what should I keep to point back? The topics are completely
| 	virtual, created on the fly.
|
| 	For XML serializations it is XPointer, XPath?
|
| 	For non-XML serializations it is line number?
|
| 	This is a nice add-on but in a standard...?

This just adds a capability. You don't _have_ to use it unless you
want to. The syntax specifications do require you to do so, however,
and when you think about it, that makes sense. How can you do correct
loading and merging of topic maps without keeping track of all the
URIs? If you load a topic map with 3 million topics you can't very
well keep the URI map in memory...
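
As a rough Python sketch of the bookkeeping involved: the `Loader`
and `Topic` classes and the `topic_for` method are hypothetical, and
the dictionary merely stands in for wherever the source locators end
up being stored (for a large map that would be the model itself
rather than a separate in-memory table).

```python
class Topic:
    def __init__(self):
        self.source_locators = set()


class Loader:
    def __init__(self):
        # source locator (absolute URI string) -> Topic
        self._by_locator = {}

    def topic_for(self, uri):
        """Return the topic already known under this URI, or create one."""
        topic = self._by_locator.get(uri)
        if topic is None:
            topic = Topic()
            topic.source_locators.add(uri)
            self._by_locator[uri] = topic
        return topic


loader = Loader()
first = loader.topic_for("http://example.org/tm.xtm#norway")
again = loader.topic_for("http://example.org/tm.xtm#norway")
print(first is again)
```

Without the locators there is no way to tell that the second
reference points at a topic that has already been created.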

Also, source locators are very useful in the QL and CL for referring
to topics, as well as in other contexts, so they do serve a purpose.
We may want to allow processors to have an option to not store all the
SLs, if asked to skip them by the user, however.

| - sec3.3:content
| 	I find the first two paragraphs slightly contradictory:
|
|  	- 'topic maps are only containers, they represent only
|  	themself.'
|
| 	This means there is an inside and an outside? If I add
| 	a topic, I add it to a particular map?

Yes.

| 	- topic maps can be reified (so they must be a subject) so
| 	  that they can be part of assocs...
|
| 	'these statements may provide.....' Why be specific here?
| 	What can we know what people will do with a topic being
| 	a topic map?

We don't. The spec is just giving examples to help people understand
what this construct means. In general I don't like giving examples in
specifications, but I think this concept is difficult to understand
without having some. I'll add the phrase "for example" to make it
clear that these are just examples, though.

| 	And even if so, what about adding a topic to such a map?
| 	Is the author still the same?
|
| 	I think there is a confusion between the document (which
| 	undoubtedly has an author and all this meta data) and the
| 	inhaled, loaded topic map instance conforming to SAM.
|
| 	I am not sure whether I understand this part.

Assigning an author to the topic map is a judgement call made by
whoever makes the topic map. I think a TM in a file being loaded into
a SAM instance still has the same author. Whether that remains true
after additional processing happens is application-specific, as indeed
is the whole notion of an author.

Does this part make sense once you realize that this is just an
example, and not something defined or required by the SAM?

| - sec3.4:general
| 	The part about the subject...is it necessary here? Looks
| 	more as an abstract introduction into TMs.

That's what the issue 'term-subject-def' is about. General opinion
seems to be that it doesn't belong here, and general opinion is about
to get what it wants. :)

| - sec3.4.1:style
| 	'...referred from a topic map...'
|
| 	Why is 'map' here relevant? Is it not just 'topic'?

Because resources may be referred to from topics, variant names, and
occurrences. It's easier to not be so specific.

| - sec3.4.1:style
| 	' ....a subject identifier is a locator that refers to a
| 	subject indicator....'
|
| 	I am not sure what happens here.

Hmmmm. What's the problem? This should be pretty clear, no?

| - sec3.4.3:style
| 	'scope....extent of being valid.....context .... set of subjects...'
|
| 	Is there a reason to use many terms for the same thing?
| 	Or is it not the same?

The only duplication I see here is:

  - context/extent, which does seem unnecessary; will try to get rid
    of it,

  - scope/set of elements, where the first is a name for a particular
    use of the second. I think this is useful as an explanation.

| 	'Outside....assignment is not known to be valid....'
|
| 	Is there a subtle difference to 'Outside ....not valid' which
| 	might be insinuated?

Not sure what you mean. This sentence is there because there were
questions about whether

  [norway = "Norge" / norwegian]

meant that one could conclude that "Norge" was *not* the name of
Norway in any language other than Norwegian. This sentence makes it
clear that that inference is not warranted.
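
A small Python sketch of that reading. The `name_applies` function is
hypothetical and deliberately looks at a single theme only, so it
sidesteps the question of how multi-theme and empty scopes are
interpreted: the answer is either "stated to be valid" or "not
known", never "stated to be invalid".

```python
def name_applies(scope, theme):
    """True if the assignment is stated to be valid for `theme`.

    Returns None (unknown) otherwise. It never returns False, because
    the topic map says nothing about themes outside the scope.
    """
    return True if theme in scope else None


norge_scope = {"norwegian"}
print(name_applies(norge_scope, "norwegian"))
print(name_applies(norge_scope, "swedish"))
```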

| - sec3.4.3:content
| 	'is the empty set...unlimited'
|
| 	Hmmm, I understand that in a text document - as XTM - leaving
| 	the scope 'empty' is interpreted by a processor as
| 	'universal-scope', but for an abstract interface we should be
| 	explicit:
|
| 	"the unconstrained scope is represented by
| 	THE-UNIVERSE-AS-WE-KNOW-IT.  If the scope is empty, then the
| 	assignment does not apply at all."

Whether the empty set is the unconstrained scope or the never-valid
scope depends on the scope interpretation used, whether it's
all-subjects or any-subjects. Reading Marc's paper about this (in the
ISO registry) should make this issue quite clear, I think.

  <URL: http://www.y12.doe.gov/sgml/sc34/document/0327.htm >
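
The dependence on the interpretation can be shown in a few lines of
Python. The two functions below are one plausible formalisation of
the "all-subjects" and "any-subjects" readings, written for
illustration only; they are assumptions, not text taken from the SAM
or from the paper referenced above.

```python
def valid_all_subjects(scope, context):
    """All-subjects reading: every theme in the scope must be in the context."""
    return all(theme in context for theme in scope)


def valid_any_subjects(scope, context):
    """Any-subjects reading: at least one theme in the scope must be in the context."""
    return any(theme in context for theme in scope)


context = {"norwegian"}
empty_scope = set()
print(valid_all_subjects(empty_scope, context))  # nothing to violate
print(valid_any_subjects(empty_scope, context))  # nothing to match
```

Under the first reading the empty scope holds in every context (the
unconstrained scope); under the second it holds in none (the
never-valid scope).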

| - sec3.4.3:content
| 	'Precisely how.....but left for those creating....'
|
| 	Is this not an authoring issue?

Sure, but not all topic maps are authored. Some are generated
automatically. I think this wording covers both situations and is
quite clear, but I cleaned it up a bit, so it now only says "those
creating topic maps" without being quite as specific any more.

| - sec3.4.5:content
| 	'Two topic items are equal if they .... are required to be
| 	merged...'
|
| 	If they are merged, then they are only one topic? If they are
| 	not merged, then the map is inconsistent relative to SAM.
| 	Confusing.

This is the time issue again. We'll look into it.

| - sec3.4.5:content
| 	The reification information seems to be bidirectional: Topics
| 	point to 'information items' via the 'reified' property and
| 	information items (<> topics) point to the topics which reify
| 	them.
|
| 	Q1: rename 'reified' to 'reifies'? Why past?

It's not the past tense, it's short for "whatever is reified by this
topic".

| 	Q2: is not this graph representation somewhat arbitrary?

What do you mean?

| 	Q3: Maybe describe this separately, then also the description
| 	'how this can be computed' can be there.

Again I can't quite follow. Please explain.

| - sec3.5:content
| 	variants seem to be underspecified. How is for instance the
| 	second example of
|
| 		http://www.topicmaps.org/xtm/#elt-variant
|
| 	represented?

As a flat set of variants where the parameters have been inherited
downwards in the element tree. The XTM deserialization specification
explains this. It turns out that the intent behind the XTM spec was
that the nesting was just syntactic sugar so that you wouldn't have to
repeat all commonly shared parameters. So the nesting has no
significance, only the parameters that the variants end up with when
you've done the inheritance.
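
The flattening can be sketched in Python as follows. The dictionary
layout ("parameters", "value", "children") and the `flatten` function
are invented for this illustration and are not the data structures of
the XTM deserialization specification.

```python
def flatten(variant, inherited=frozenset()):
    """Yield (value, parameters) pairs for a nested variant tree.

    `variant` is a dict with "parameters" (iterable), an optional
    "value", and optional "children" (a list of nested variants).
    """
    params = inherited | frozenset(variant.get("parameters", ()))
    if variant.get("value") is not None:
        yield (variant["value"], params)
    for child in variant.get("children", ()):
        # Children inherit everything collected so far.
        yield from flatten(child, params)


tree = {
    "parameters": ["display"],
    "value": "Norway",
    "children": [{"parameters": ["short"], "value": "NO"}],
}
for value, params in flatten(tree):
    print(value, sorted(params))
```

After this step only the flat pairs remain; the nesting itself
carries no information.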

We were quite surprised to discover that this was the case, but also
very happy, since it allows much more efficient storage of variants,
so I think this is the right design decision in the model.

| - sec3.6:content
| 	The way this is described this sounds like 'scoped basenames'
| 	to me, so it makes me wonder where the difference is.

Good question. Will look at this, too.

| - sec3.7:content
| 	It is possible to have [value] and [resource] NULL?
|
| 	Or have it exclusive OR'ed?

It's not possible. We need to add a constraint here, and to variant
names.

| - sec3.7:content
| 	How can an occurrence type be null if XTM enforces a type?
|
| 		3.9.1 <occurrence> Element
|
| 		The <occurrence> element specifies a resource supplying
| 		information relevant to a topic. The class of which the
| 		occurrence is an instance is indicated via the <instanceOf>
| 		child element. If no such element is present, the occurrence
| 		type defaults to the class defined by the occurrence
| 		published subject.

This is the psi-generics issue:
  <URL: http://www.ontopia.net/omnigator/models/topic_complete.jsp?tm=tm-standards.xtm&id=psi-generics >

What's the use of a PSI for saying that an occurrence is an
occurrence? I don't see any, so we left it out.

| - sec3.8:content
| 	Association type can be NULL?

Yes. This is consistent with XTM.

| - sec3.9:content
| 	[role playing topic]: ....or null
|
| 	What about "exactly one topic item, maybe the predefined topic
| 	'psi-whatever-topic' if none as provided..."

We couldn't do that, since it would imply that all unspecified types
are the same type, which is unlikely to be true. In Montréal a
different resolution was decided upon:

  <URL: http://www.ontopia.net/omnigator/models/topic_complete.jsp?tm=tm-standards.xtm&id=assoc-role-player-type >

| - sec3.9:content
| 	[ role type]: ...or null
|
| 	Ditto.

Ditto. :)

| - sec4:content
| 	'....where it is not clear....different subjects....'
|
| 	Any reason why not less complicated:
|
| 	'....is clear....some subjects...'?

Because what the sentence is actually saying is that you are allowed
to merge topics as long as their [subject address] properties are not
equal, because when they are the topics are known to represent
different subjects.

| - *:general
| 	I assume that the deserialisation of XTM is defined separately
| 	using SAM?

Yes.

  <URL: http://www.y12.doe.gov/sgml/sc34/document/0328.htm >

| - sec4:general
| 	The merging process is defined rather operationally. Any
| 	chance to have this more declarative?

We'd be happy to have a more declarative definition, but haven't been
able to write one. Feel free to contribute a better text. (And I
really do mean that. This does not make for easy reading, but I wish
it did.)

| - sec4.3:general
| 	The merging operation is defined rather asymmetric, like
|
| 		A += B
|
| 	instead of the more conventional (and functional)
|
| 		A + B
|
| 	This will make problems in defining ANY declarative semantics.
| 	We lock out all but state machines to define a formal model.
| 	Not so good.

It was originally symmetric, but that made the XTM syntax spec much
harder to formulate, so it was changed. As for declarative syntax I
think that will be free to choose either approach so long as it
produces an equivalent result. So I don't think this is a problem.
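
To illustrate why the two formulations need not conflict, here is a
Python sketch in which the functional form is built on top of the
in-place one. The `Topic` class with its two set-valued properties
and the functions `merge_into` and `merged` are hypothetical and far
simpler than the merging rules in section 4.

```python
class Topic:
    def __init__(self, names=(), identifiers=()):
        self.names = set(names)
        self.identifiers = set(identifiers)


def merge_into(a, b):
    """Asymmetric form (A += B): fold b's properties into a, in place."""
    a.names |= b.names
    a.identifiers |= b.identifiers
    return a


def merged(a, b):
    """Symmetric form (A + B): build a new topic, leaving a and b untouched."""
    return merge_into(merge_into(Topic(), a), b)


a = Topic(names={"Norge"}, identifiers={"http://example.org/psi/norway"})
b = Topic(names={"Norway"})
c = merged(a, b)
print(sorted(c.names), sorted(a.names))
```

In this toy model both forms end up with the same set of property
values, which is the sense of "equivalent result" used above.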

| - sec5:content
| 	What is this good for if applications are free to define their
| 	own?

The point is that

  a) applications may define published subjects for completely
     different things than the subjects given in section 5,

  b) applications may use these published subjects if they want to get
     the defined behaviour from implementations, and

  c) applications may use different published subjects if they do not
     want the defined behaviour.

This means everyone can get what they want.

| - sec5.1:content
| 	'....this spec....does not require implementations to actually
| 	represent the type-instance relationship using associations as
| 	long...'
|
| 	Behold! Is this not the idea of having a model that all
| 	implementations behave the same way in WHAT they represent?

Yes and no. See section 6.

| 	And what is special about type-instance what supertype-subtype
| 	has not?

type-instance is very very common in topic maps, so it makes sense to
optimize its representation. The same does not apply to
supertype-subtype.

| - sec5.3:content
| 	This is necessary? Only for 13250 compatibility? Maybe move to
| 	an appendix?

Sort names are very useful, but display names are not necessary so
long as the TNC is not required. So it is partly ISO 13250/XTM
compatibility, and partly that they are useful. The compatibility part
is just a paragraph, though, so I think it's better to keep it here
than to create a very short appendix.

-- 
Lars Marius Garshol, Ontopian         <URL: http://www.ontopia.net >
ISO SC34/WG3, OASIS GeoLang TC        <URL: http://www.garshol.priv.no >