[sc34wg3] Almost arbitrary markup in resourceData

Lars Marius Garshol sc34wg3@isotopicmaps.org
18 Nov 2003 00:31:37 +0100

* Lars Marius Garshol
| do we want to allow XTM elements to appear inside these
| elements? That is, is
|   <resourceData>XTM is an <topicRef xlink:href="#XML"/>-based markup
|   language.</resourceData>
| OK? If so, what does it mean?

* Bernard Vatant
| The latter is the fundamental question. My first-cut answer would
| be, along the lines of Graham's one ("no big deal"), that it just
| *can't mean anything* from the viewpoint of a TM application.

This is my view as well, and this is the reason why I feel we should
exclude markup in the XTM namespace from appearing within
<resourceData>. It's not clear from what you write whether you agree
with that or not.

| From a TM application viewpoint <resourceData> content should be a
| black box. The default behaviour of a TM application should be to
| store and pass that box to its environment "as is" without opening
| it, processing it, or trying by any means to figure what is inside
| and interpret it. And, agreeing again with Graham on that, there is
| IMO absolutely no difference, other than syntactic, with what
| happens with <resourceRef>.

| The specification does not put any limits nor constraints to what
| you can get when you dereference a <resourceRef>, and that's good,
| so why should it put limitations or constraints on what you can get
| when you "open", so to speak, the <resourceData> box? I've always
| understood <resourceData> as just a shortcut for <resourceRef
| id="foo" xlink:href="#foo">. Not sure this is valid syntax, but see
| what I mean : the referenced resource is *here* in the file.

And yep again.

| Where I agree with Murray is that any kind of mix-up of XTM
| namespace with extra namespaces, defined by XTM specification
| itself, would be opening a can of worms and is not a good idea at
| all.  Now I won't argue about what validation is or is not, and if
| DTD or RELAX-NG schema should be normative. I'm quite agnostic about
| that, as long as the schema makes clear the above limit between the
| outside and inside of "resource boxes".

I think it does. You can check for yourself when the draft arrives.

| So I would see some recommendation in the spec along the lines of :
| "You can do that, but think twice about what will be interoperable
| with whom."

| I don't think the latter has been considered that much in the
| current debate. Allowing embedded markup could open the door to lazy
| modeling, meaning by that it might often be the case (and well, if
| you adopt the Reference Model philosophy, it certainly *is* always
| the case) that semantics captured in the embedded markup could have
| been expressed as proper TM information at a finest level of
| granularity. And the specification prose should recommend to do so
| whenever possible.

This I agree with, and this is something that's bothered me a bit
about allowing embedded XML. We may well see people representing stuff
with elements and attributes that they really should be representing
with topics and associations. Or, even worse, we may see them doing
both, so that they have (horrors!) redundant data.

Whether we can, and whether we should, do anything about that I am not
sure. So far I've leaned towards just leaving it to "user education",
but I could be convinced that that's wrong.

| Example of a "lazy occurrence" of type "PostalAddress" for topic
| "John Smith".
| <resourceData>
| 	<street>Main Street</street>
| 	<number>23</number>
| 	<city>Nothing Gulch</city>
| </resourceData>
| It's clear that the lazy TM author could (should?) have defined
| "PostalAddress" as a topic class, then "street", "number" and "city"
| as occurrence types, and linked "John Smith" to "John Smith's
| address" using a "PersonalAddress" association.

Wow. I would have argued the opposite, actually, that this is just a
piece of data, and that it shouldn't be topic mapped, because you'd be
unlikely to want to ever say anything more about the address than what
you do in the example above. 
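
For comparison, the fully topic-mapped version described in the quoted
text might look roughly like this in XTM 1.0 syntax. This is only a
sketch: the topic ids and the role types "resident" and "address" are
assumptions made up for illustration, and the typing topics are
assumed to be declared elsewhere in the map.

```xml
<!-- Sketch only: ids and role types are illustrative assumptions. -->
<topicMap xmlns="http://www.topicmaps.org/xtm/1.0/"
          xmlns:xlink="http://www.w3.org/1999/xlink">
  <!-- Typing topics (#PostalAddress, #street, #number, #city,
       #PersonalAddress, #resident, #address) are assumed to be
       declared elsewhere. -->
  <topic id="john-smith">
    <baseName><baseNameString>John Smith</baseNameString></baseName>
  </topic>

  <!-- The address becomes a topic of class PostalAddress, with one
       typed occurrence per former embedded element. -->
  <topic id="john-smith-address">
    <instanceOf><topicRef xlink:href="#PostalAddress"/></instanceOf>
    <occurrence>
      <instanceOf><topicRef xlink:href="#street"/></instanceOf>
      <resourceData>Main Street</resourceData>
    </occurrence>
    <occurrence>
      <instanceOf><topicRef xlink:href="#number"/></instanceOf>
      <resourceData>23</resourceData>
    </occurrence>
    <occurrence>
      <instanceOf><topicRef xlink:href="#city"/></instanceOf>
      <resourceData>Nothing Gulch</resourceData>
    </occurrence>
  </topic>

  <!-- The association linking the person to the address topic. -->
  <association>
    <instanceOf><topicRef xlink:href="#PersonalAddress"/></instanceOf>
    <member>
      <roleSpec><topicRef xlink:href="#resident"/></roleSpec>
      <topicRef xlink:href="#john-smith"/>
    </member>
    <member>
      <roleSpec><topicRef xlink:href="#address"/></roleSpec>
      <topicRef xlink:href="#john-smith-address"/>
    </member>
  </association>
</topicMap>
```

That is one extra topic, three occurrences, and an association to
carry what the embedded version carries in three elements.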

Of course, so long as the contents of <resourceData> are one level of
markup only, this works, but if they were nested you couldn't use this
approach.

Of course, we could also turn this around and say: create a topic for
the city (I have in most cases), create a topic for the street (I
wouldn't have), and make the street number an occurrence. (Of course,
this leaves you with zip code and all that, but still...)
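
One possible reading of that alternative, again as a sketch in XTM 1.0
syntax: the city is a topic, the street stays plain data, and street
and number are occurrences on the person. The ids, the "lives-in"
association type, and its role types are assumptions for illustration,
and attaching the occurrences directly to "John Smith" is only one way
to interpret the suggestion.

```xml
<!-- Sketch only: ids, association type and role types are assumed. -->
<topicMap xmlns="http://www.topicmaps.org/xtm/1.0/"
          xmlns:xlink="http://www.w3.org/1999/xlink">
  <!-- Typing topics (#city, #street, #number, #lives-in, #resident,
       #place) are assumed to be declared elsewhere. -->
  <topic id="nothing-gulch">
    <instanceOf><topicRef xlink:href="#city"/></instanceOf>
    <baseName><baseNameString>Nothing Gulch</baseNameString></baseName>
  </topic>

  <topic id="john-smith">
    <baseName><baseNameString>John Smith</baseNameString></baseName>
    <!-- Street and number stay as plain string occurrences. -->
    <occurrence>
      <instanceOf><topicRef xlink:href="#street"/></instanceOf>
      <resourceData>Main Street</resourceData>
    </occurrence>
    <occurrence>
      <instanceOf><topicRef xlink:href="#number"/></instanceOf>
      <resourceData>23</resourceData>
    </occurrence>
  </topic>

  <!-- Only the city is related to the person via an association. -->
  <association>
    <instanceOf><topicRef xlink:href="#lives-in"/></instanceOf>
    <member>
      <roleSpec><topicRef xlink:href="#resident"/></roleSpec>
      <topicRef xlink:href="#john-smith"/>
    </member>
    <member>
      <roleSpec><topicRef xlink:href="#place"/></roleSpec>
      <topicRef xlink:href="#nothing-gulch"/>
    </member>
  </association>
</topicMap>
```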

I think people do addresses as inline occurrences now, sometimes
structured as you suggest, and I sometimes structure them as I
suggest above. So, again, it seems that while embedded XML does
provide one more way of blowing your foot off, there are several
almost identical ones there already.

| I'm not pretending that any embedded markup cases can practically
| and easily boil down to that kind of reduction, but my ground
| experience, in Mondeca real world implementations, so far, is that
| even in cases where representation of fine-grained information
| embedded in existing resources has been needed, a workaround to
| embedded markup has been found.

I think that is true. The question is how bad we consider the
workarounds to be, and so far the general opinion seems to be that we
consider them bad enough that we want to get rid of them.

Lars Marius Garshol, Ontopian         <URL: http://www.ontopia.net >
GSM: +47 98 21 55 50                  <URL: http://www.garshol.priv.no >