[sc34wg3] Almost arbitrary markup in resourceData

Mon, 17 Nov 2003 19:15:03 +0100

Hello all

I hesitated for quite a while before jumping into this can of worms.
Rather than follow up on any of the ongoing threads, which have turned out
to be quite hot, I will step back to Lars Marius's original question. It
seems to me that a pragmatic, non-theological answer can be given to it
without conflicting with the fundamental principles of the Topic Maps
paradigm, on which I am optimistic enough to believe everyone in this
forum (Murray included) agrees.

*Lars Marius

> do we want to allow XTM elements to appear inside these
> elements? That is, is
>
>   <resourceData>XTM is an <topicRef xlink:href="#XML"/>-based markup
>   language.</resourceData>
>
> OK? If so, what does it mean?

The latter is the fundamental question. My first-cut answer would be, along
the lines of Graham's ("no big deal"), that it just *can't mean anything*
from the viewpoint of a TM application.

From a TM application viewpoint, <resourceData> content should be a black
box. The default behaviour of a TM application should be to store that box
and pass it to its environment "as is", without opening it, processing it,
or trying by any means to figure out what is inside and interpret it. And,
agreeing again with Graham on that, there is IMO absolutely no difference,
other than syntactic, from what happens with <resourceRef>. The
specification does not put any limits or constraints on what you can get
when you dereference a <resourceRef>, and that's good, so why should it put
limitations or constraints on what you can get when you "open", so to
speak, the <resourceData> box? I've always understood <resourceData> as
just a shortcut for <resourceRef id="foo" xlink:href="#foo">. I'm not sure
this is valid syntax, but you see what I mean: the referenced resource is
*here* in the file.
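
To illustrate the parallel, here is a rough sketch of the two forms side by
side, assuming XTM 1.0 syntax. The URL is made up for the example:

  <!-- resource referenced: it lives somewhere else -->
  <occurrence>
    <resourceRef xlink:href="http://example.org/about-xtm.txt"/>
  </occurrence>

  <!-- resource inlined: it lives *here* in the file -->
  <occurrence>
    <resourceData>XTM is an XML-based markup language.</resourceData>
  </occurrence>

In both cases the TM application would simply hand the resource over as is.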

So, even if it allows extra markup in the <resourceData> box, the XTM
specification IMO should not say anything about allowed, recommended or
forbidden syntax there, and even less about the semantics of any of it, and
conformance of a TM application should not require any ability to handle
it.

Now if a specific application wants to develop specific features based on
the markup embedded in <resourceData> (and I believe Jim and Eric and
Martin have excellent, different reasons to want that), the architecture of
the application should carefully keep distinct what belongs to TM
processing (handling <resourceData> as black boxes) and what is "ad hoc"
processing able to open the boxes and deal with their content ... And any
implementation of that kind should be well aware that this ad hoc
processing has nothing to do with Topic Maps or the XTM specification.

Where I agree with Murray is that any kind of mixing of the XTM namespace
with extra namespaces, defined by the XTM specification itself, would be
opening a can of worms and is not a good idea at all.

Now I won't argue about what validation is or is not, or whether the DTD or
the RELAX-NG schema should be normative. I'm quite agnostic about that, as
long as the schema makes clear the above boundary between the outside and
inside of "resource boxes".

So I would see some recommendation in the spec along the lines of:
"You can do that, but think twice about what will be interoperable with
whom."

And also
"You can do that, but think twice about whether you could do it another,
more interoperable way."

I don't think the latter has been considered that much in the current
debate. Allowing embedded markup could open the door to lazy modeling. By
that I mean it might often be the case (and well, if you adopt the
Reference Model philosophy, it certainly *is* always the case) that the
semantics captured in the embedded markup could have been expressed as
proper TM information at a finer level of granularity. And the
specification prose should recommend doing so whenever possible.

Example of a "lazy occurrence" of type "PostalAddress" for topic "John
Smith".

<resourceData>
	<street>Main Street</street>
	<number>23</number>
	<city>Nothing Gulch</city>
</resourceData>

It's clear that the lazy TM author could (should?) have defined
"PostalAddress" as a topic class, then "street", "number" and "city" as
occurrence types, and linked "John Smith" to "John Smith's address" using a
"PersonalAddress" association.

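To make that concrete, here is a rough sketch of the finer-grained version,
assuming XTM 1.0 syntax. The topic ids (john-smith, john-smith-address) and
the role types (person, address) are made up for the illustration, and the
topics for the types themselves are left out:

  <topicMap xmlns="http://www.topicmaps.org/xtm/1.0/"
            xmlns:xlink="http://www.w3.org/1999/xlink">

    <topic id="john-smith">
      <baseName><baseNameString>John Smith</baseNameString></baseName>
    </topic>

    <!-- the address becomes a topic of class "PostalAddress" -->
    <topic id="john-smith-address">
      <instanceOf><topicRef xlink:href="#PostalAddress"/></instanceOf>
      <occurrence>
        <instanceOf><topicRef xlink:href="#street"/></instanceOf>
        <resourceData>Main Street</resourceData>
      </occurrence>
      <occurrence>
        <instanceOf><topicRef xlink:href="#number"/></instanceOf>
        <resourceData>23</resourceData>
      </occurrence>
      <occurrence>
        <instanceOf><topicRef xlink:href="#city"/></instanceOf>
        <resourceData>Nothing Gulch</resourceData>
      </occurrence>
    </topic>

    <!-- "PersonalAddress" association linking the person to the address -->
    <association>
      <instanceOf><topicRef xlink:href="#PersonalAddress"/></instanceOf>
      <member>
        <roleSpec><topicRef xlink:href="#person"/></roleSpec>
        <topicRef xlink:href="#john-smith"/>
      </member>
      <member>
        <roleSpec><topicRef xlink:href="#address"/></roleSpec>
        <topicRef xlink:href="#john-smith-address"/>
      </member>
    </association>

  </topicMap>

Each <resourceData> box then holds plain text only, so nothing is left
inside it for ad hoc processing to open.
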
I'm not claiming that every embedded markup case can practically and easily
be reduced in that way, but my field experience so far, in Mondeca's
real-world implementations, is that even where fine-grained information
embedded in existing resources had to be represented, a way to avoid
embedded markup has been found.

Bernard

Bernard Vatant
Senior Consultant
Knowledge Engineering
Mondeca - www.mondeca.com
bernard.vatant@mondeca.com