[sc34wg3] Almost arbitrary markup in resourceData

Lars Marius Garshol sc34wg3@isotopicmaps.org
12 Nov 2003 21:01:51 +0100

* Murray Altheim
| [...] and I think it's fallacious to think that you can ignore,
| store, or throw away any arbitrary, well-formed markup within the
| document, as Eric has said, it's *part* of the document. 

If you mean that storing the markup doesn't take away the problem of
interpreting it when the time comes to, say, render the base name or
occurrence, then we agree. If you mean something else I don't know
what you mean, and I'm afraid you'll have to explain it to me.

| And if XTM is supposed to be an interchange syntax, you can't
| interchange with things you don't know about. If Eric sends me one
| of his hybrid *TM documents, with his embedded, proprietary markup,
| I can't use it. Sure, I can "store" it. But I can't use that *TM
| document unless I support the embedded markup. That's not
| interchange.

Let's try to be more precise here. You can process and interpret the
XTM parts without problems, which will give you a topic map with
embedded XML in it. What you may have difficulty with is processing
the XML in base names, variant names, and occurrences. Do you agree?
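
To make that split concrete, here is a minimal Python sketch of the
processor side: it parses the element and, if there is embedded XML,
serializes it back to a string and stores it without interpreting it.
The function name store_content and the sample base name are
hypothetical, and XTM 1.0 itself does not allow child elements inside
<baseNameString>; this only illustrates "store, don't interpret".

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def store_content(xml_text):
    """Return (is_markup, stored_string) for one element's content.

    Plain strings are stored as they are. Mixed content is serialized
    back to XML text and stored with no attempt to interpret it.
    store_content is a hypothetical name, not part of any XTM API.
    """
    elem = ET.fromstring(xml_text)
    if len(elem) == 0:
        # No child elements: an ordinary string value.
        return False, elem.text or ""
    # Leading text must be re-escaped so the stored string stays XML.
    parts = [escape(elem.text or "")]
    for child in elem:
        # tostring() also emits the text that follows the child (its tail).
        parts.append(ET.tostring(child, encoding="unicode"))
    return True, "".join(parts)


sample = ('<baseNameString xmlns:h="http://www.w3.org/1999/xhtml">'
          'The <h:em>Ring</h:em> cycle</baseNameString>')
is_markup, stored = store_content(sample)
```

What to do with the stored string when it has to be rendered is then
a question for the application, not for the XTM processor.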

If you do agree, that leaves the issue of the problems with the
embedded XML. Do you think it's better if people stick escaped markup
in there instead of embedded XML? Or if they start using codes like
[em]this[/em] to do what they want to do? People do this *already*, so
clearly we have the choice between ugly hacks like that, and allowing
markup in there.
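
As a rough illustration of those three options, the Python sketch
below parses the same (hypothetical) base name written with embedded
XML, with escaped markup, and with bracket codes, and reports what a
receiving XML parser actually sees. The sample strings and the
describe helper are made up for this example.

```python
import xml.etree.ElementTree as ET

# Illustrative fragments only; the sample name is invented.
XHTML = "http://www.w3.org/1999/xhtml"

embedded = ('<baseNameString xmlns:h="%s">The <h:em>Ring</h:em> cycle'
            '</baseNameString>' % XHTML)
escaped = ('<baseNameString>The &lt;em&gt;Ring&lt;/em&gt; cycle'
           '</baseNameString>')
bracket = '<baseNameString>The [em]Ring[/em] cycle</baseNameString>'


def describe(xml_text):
    """Return (leading text, list of child element tags) for a fragment."""
    elem = ET.fromstring(xml_text)
    return elem.text, [child.tag for child in elem]


for name, fragment in [("embedded", embedded),
                       ("escaped", escaped),
                       ("bracket", bracket)]:
    text, children = describe(fragment)
    print(name, repr(text), children)
```

Only the embedded variant shows up as an element with a namespace the
receiver can recognise. The other two arrive as plain strings, and the
receiver has to guess whether the angle brackets or bracket codes in
them were meant as markup.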

I think that the problem of knowing how to process the string data
that you receive will be *bigger* if we *don't* allow XML markup,
since then we won't know what hack people have chosen to represent
their XML, whereas if we do allow the markup at least we'll know it's
XML when we see it.

| On the technical, syntax side, the embedded markup is going to have
| to be completely specified, and legal via XML Namespaces. It
| probably should have a wrapper element in its own namespace. If you
| bother to look at Paul Grosso's XML Fragments draft at the W3C (and
| he is one of the world's markup experts, no question), you'll see
| this is a very complicated problem, not something where you can just
| stick <xhtml:b> and expect to get away with it.

Yeah, I know. That's why Graham and I are supposed to write up two
different proposals so that the committee can evaluate them. We're
already painfully aware that this is difficult. Dmitry had a pretty
good list of some of the issues in his posting.

| But there are still a lot of issues with RELAX-NG.

What issues? (We've chosen to make the RELAX-NG schema the normative
schema for XTM 1.1, so this is not a rhetorical question.)

Personally, I'd like to go on record and say that RELAX-NG is far and
away the coolest technology I've seen for the past four years (since
XSLT came out and I learned about topic maps, basically). It is
simple, powerful, and extraordinarily beautiful. The more I see of it,
the more I love it. If we can do half as well in our first attempt at
TMCL we shall consider ourselves very lucky indeed.

| And what about those systems that don't support RELAX-NG?

One reason for choosing RELAX-NG as the normative schema is that it
allows us to say "a <topicMap> element is valid if it conforms to this
RELAX-NG schema" and be done. We don't require XTM implementations to
do anything except verify that the document follows the schema. If
they do that by using a DTD validator, that's fine. If they do it with
custom validation code, that's also fine. We just use the schema as a
replacement for prose.
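
As a sketch of what "custom validation code" could look like, the
Python below checks two structural rules by hand instead of running a
RELAX-NG validator. The two rules (children of <topicMap> are <topic>,
<association> or <mergeMap>, and every <topic> has an id) are taken
from the XTM 1.0 DTD as I remember it and stand in for the full
schema; the function name check_topic_map is hypothetical, and a real
implementation would have to cover every rule in the normative schema.

```python
import xml.etree.ElementTree as ET

XTM_NS = "http://www.topicmaps.org/xtm/1.0/"


def q(name):
    """Qualify a local name with the XTM 1.0 namespace."""
    return "{%s}%s" % (XTM_NS, name)


# Assumed subset of the content model, not the complete schema.
ALLOWED_CHILDREN = {q("topic"), q("association"), q("mergeMap")}


def check_topic_map(xml_text):
    """Return a list of error strings; an empty list means no errors found."""
    root = ET.fromstring(xml_text)
    if root.tag != q("topicMap"):
        return ["root element is not %s" % q("topicMap")]
    errors = []
    for child in root:
        if child.tag not in ALLOWED_CHILDREN:
            errors.append("unexpected child element: %s" % child.tag)
        elif child.tag == q("topic") and "id" not in child.attrib:
            errors.append("topic element without an id attribute")
    return errors


good = '<topicMap xmlns="%s"><topic id="t1"/></topicMap>' % XTM_NS
bad = '<topicMap xmlns="%s"><topic/><foo/></topicMap>' % XTM_NS
```

Here check_topic_map(good) returns an empty list, while
check_topic_map(bad) reports the missing id and the unknown element.
The point is only that the schema defines what counts as valid; how
an implementation verifies it is up to the implementation.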

| Are you going to require all Topic Map systems based in XML to use
| RELAX-NG if they want validation? 

No. That would be suicide.

| No DTD?

The DTD is in Annex B. It just isn't normative.

| Bad marketing decision, if you ask me.

That's possible. We chose RELAX-NG because it allowed us to have a
clean XML policy without getting involved in the brokenness of XML 1.0
and whether people use validating or non-validating parsers and all
that. We also thought it might be politically correct, but the
technical argument was what tipped the scales.

* Lars Marius Garshol
| Either way it won't hurt those who don't use XML inside XTM.
* Murray Altheim
| I don't follow this argument at all. "Won't hurt" means only those
| who have no need to interchange with the rest of the Topic Map
| community. 

True. (That's effectively what I discuss above.)

* Lars Marius Garshol
| I agree this is something that must be considered. We've already
| changed XTM by adding <instanceOf> inside <baseName>, but admittedly
| this is a more invasive change.
* Murray Altheim
| I'd say that it is an order of magnitude kind of change. I didn't
| have to do much at all to implement <instanceOf> within <baseName>,
| whereas my entire processing model is different if I have to do
| something intelligent with whatever markup somebody sends me. I need
| probably mappings for each possible XMLNS encountered, perhaps MIME
| support, who knows?

We've been through this already: you're only required to store it.
("You" being the XTM processor. The application side of it is
discussed above.)

* Lars Marius Garshol
| What's changed my mind is that there is an obvious need for this,
| since so many people are doing this in different ways already.
* Murray Altheim
| Where's the metric on this last statement? 

There's no formal metric; it's just my subjective judgement.

| It's of the ilk of your last claim that I was alone in making my
| side of this argument. 

That you *seem* to be alone *on this mailing list*.

| There is no "obvious" need except for the statements of several
| people, all of whom have specific, proprietary projects in mind.
| We're not talking about what people do within their proprietary
| projects (heck, I muck with XTM within Ceryle), we're talking about
| taking requirements from proprietary, custom projects and applying
| them back into the interchange syntax.

Yes, we are.

| Unfortunately, there's not even a large population of people that
| really look at the details or understand the implications of many
| things that pass by standards bodies. 

That's true, and I'm not sure what we can do about it. The more people
who read these drafts and comment on them, the happier I'll be. I
certainly point anyone who seems to have any kind of use for them
towards these drafts, but, well, it's not likely to make a big

* Lars Marius Garshol
| OK, but why do you need the XHTML-stripping XSLT stylesheet, then?
* Murray Altheim
| Only to produce XTM 1.0 markup for processors that don't know what
| to do with XHTML+XTM.

OK. In that case I misunderstood you, and you would have had a
solution for those who would have been happy with XHTML+XTM. The
standard would have been exactly as complicated as what we are
proposing now, but you would have known what markup you could expect
to find inside inline base names and occurrences, and the problem of
interpreting the embedded XML would have gone away.

You would, however, have been left with an extensibility problem for
those people who need more than what XHTML provides.

But I admit what I said about your solution was based on a false

Lars Marius Garshol, Ontopian         <URL: http://www.ontopia.net >
GSM: +47 98 21 55 50                  <URL: http://www.garshol.priv.no >