[sc34wg3] Almost arbitrary markup in resourceData

Murray Altheim sc34wg3@isotopicmaps.org
Wed, 12 Nov 2003 20:51:09 +0000

Lars Marius Garshol wrote:
> * Murray Altheim
> | 
> | [...] and I think it's fallacious to think that you can ignore,
> | store, or throw away any arbitrary, well-formed markup within the
> | document, as Eric has said, it's *part* of the document. 
> If you mean that storing the markup doesn't take away the problem of
> interpreting it when the time comes to, say, render the base name or
> occurrence, then we agree. If you mean something else I don't know
> what you mean, and I'm afraid you'll have to explain it to me.

No, you understood me, and that holds regardless of the kind of
markup (expanding "markup" along the lines of Eric's message to
include even non-XML markup).

> | And if XTM is supposed to be an interchange syntax, you can't
> | interchange with things you don't know about. If Eric sends me one
> | of his hybrid *TM documents, with his embedded, proprietary markup,
> | I can't use it. Sure, I can "store" it. But I can't use that *TM
> | document unless I support the embedded markup. That's not
> | interchange.
> Let's try to be more precise here. You can process and interpret the
> XTM parts without problems, which will give you a topic map with
> embedded XML in it. What you may have difficulty with is processing
> the XML in base names, variant names, and occurrences. Do you agree?


> If you do agree that leaves the issue of the problems with the
> embedded XML. Do you think it's better if people stick escaped markup
> in there instead of embedded XML? Or if they start using codes like
> [em]this[/em] to do what they want to do? People do this *already*, so
> clearly we have the choice between ugly hacks like that, and allowing
> markup in there.

If they do this already, they know they're breaking the standard, *and*
any of this custom, non-XML markup obviously isn't breaking XML
validation since it's not using reserved characters (I'm assuming,
otherwise we're not talking XTM markup anyway).
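
A minimal sketch of the distinction drawn above, using Python's standard
xml.etree.ElementTree. The two fragments are invented for illustration; only
the <baseNameString> element name comes from XTM 1.0. Bracket codes arrive as
ordinary character data, so the content model is untouched, whereas embedded
elements change the structure a receiver has to cope with.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragments: the element name follows XTM 1.0, the content is invented.
bracket = "<baseNameString>An [em]important[/em] topic</baseNameString>"
embedded = (
    '<baseNameString xmlns:xhtml="http://www.w3.org/1999/xhtml">'
    "An <xhtml:em>important</xhtml:em> topic</baseNameString>"
)


def describe(fragment):
    """Return (concatenated text, number of child elements) for a fragment."""
    root = ET.fromstring(fragment)
    return "".join(root.itertext()), len(list(root))


# Bracket codes are plain character data: no child elements appear.
bracket_text, bracket_children = describe(bracket)

# Embedded markup parses too, but now there is an element inside the string.
embedded_text, embedded_children = describe(embedded)
```

Both fragments are well-formed. The difference is that the first still fits a
text-only content model and the second does not.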

> I think that the problem of knowing how to process the string data
> that you receive will be *bigger* if we *don't* allow XML markup,
> since then we won't know what hack people have chosen to represent
> their XML, whereas if we do allow the markup at least we'll know it's
> XML when we see it.

I'd still say that people who use something like "[em]this[/em]" and
send it out know it's not going to be correctly handled, so it's
less of a problem than somebody who assumes that <xhtml:applet> *is*
going to be handled. If somebody believes that any application that
receives <xhtml:applet> is going to correctly handle it (and lord
knows, they're out there, I can tell you from the HTML WG experience),
then simply prohibiting it is IMO a better approach for a standard.
It doesn't set up false expectations of applet, SVG, whatever, being
supported, and doesn't start up our own version of the browser wars.

> | On the technical, syntax side, the embedded markup is going to have
> | to be completely specified, and legal via XML Namespaces. It
> | probably should have a wrapper element in its own namespace. If you
> | bother to look at Paul Grosso's XML Fragments draft at the W3C (and
> | he is one of the world's markup experts, no question), you'll see
> | this is a very complicated problem, not something where you can just
> | stick <xhtml:b> and expect to get away with it.
> Yeah, I know. That's why Graham and I are supposed to write up two
> different proposals so that the committee can evaluate them. We're
> already painfully aware that this is difficult. Dmitry had a pretty
> good list of some of the issues in his posting.

Well, I'm glad you understand how mucky this really is. Most people
don't consider how bad, how non-trivial it can be. They think you
stick <xhtml:a> in there and suddenly you have functional links,
and they build applications around that kind of thinking. It's rampant
in the W3C, this behaviour-attached-to-markup idea.
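
As a rough illustration of how little the namespace alone gives a receiver,
the sketch below lists the non-XTM namespaces found in a document. The
function name foreign_namespaces is hypothetical, and the sketch assumes the
XTM 1.0 namespace URI. Knowing that a namespace is present says nothing about
whether the receiving application can render or act on it.

```python
import xml.etree.ElementTree as ET

# Assumed: the XTM 1.0 namespace URI.
XTM_NS = "http://www.topicmaps.org/xtm/1.0/"


def foreign_namespaces(xtm_source):
    """Return the set of non-XTM namespace URIs used by elements.

    Elements in no namespace are reported as the empty string.
    """
    found = set()
    for elem in ET.fromstring(xtm_source).iter():
        tag = elem.tag
        # ElementTree spells qualified names as "{namespace-uri}local-name".
        ns = tag[1:].split("}", 1)[0] if tag.startswith("{") else ""
        if ns != XTM_NS:
            found.add(ns)
    return found
```

A receiver could compare the result against the namespaces it actually
supports and reject or flag the rest, but that is a policy decision the
sketch does not make.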

> | But there are still a lot of issues with RELAX-NG.
> What issues? (We've chosen to make the RELAX-NG schema the normative
> schema for XTM 1.1, so this is not a rhetorical question.)

Nothing intrinsic to RELAX-NG itself, but in how to reliably specify
the grammar. *All* XML processors know what to do with a DOCTYPE,
whereas that's not true with RELAX-NG.

> Personally, I'd like to go on record and say that RELAX-NG is far and
> away the coolest technology I've seen for the past four years (since
> XSLT came out and I learned about topic maps, basically). It is
> simple, powerful, and extraordinarily beautiful. The more I see of it,
> the more I love it. If we can do half as well in our first attempt at
> TMCL we shall consider ourselves very lucky indeed.

I have sincerely the highest regard for both James Clark and Murata
Makoto, and I think it was truly courageous for them to buck the
trend and state publicly by their actions that XML Schema was not
a good solution.

> | And what about those systems that don't support RELAX-NG?
> One reason for choosing RELAX-NG as the normative schema is that it
> allows us to say "a <topicMap> element is valid if it conforms to this
> RELAX-NG schema" and be done. We don't require XTM implementations to
> do anything, except to verify that the document follows the schema. If
> they do that by using a DTD validator that's fine. If they do it with
> custom validation code that's also fine. We just use the schema as a
> replacement for prose.
> | Are you going to require all Topic Map systems based in XML to use
> | RELAX-NG if they want validation? 
> No. That would be suicide.
> | No DTD?
> The DTD is in Annex B. It just isn't normative.

If you can't specify the language in a normative DTD and rely on RELAX-NG
so that you can open up what is ostensibly a "markup language" to having
arbitrary markup, I think you've changed the definition of a markup language.
You now have a document format that is a markup language *plus* what is
not a markup language, i.e., a grammar plus an unspecified grammar.

> | Bad marketing decision, if you ask me.
> That's possible. We chose RELAX-NG because it allowed us to have a
> clean XML policy without getting involved in the brokenness of XML 1.0
> and whether people use validating or non-validating parsers and all
> that. We also thought it might be politically correct, but the
> technical argument was what tipped the scales.
> | What's changed my mind is that there is an obvious need for this,
> | since so many people are doing this in different ways already.
> * Murray Altheim
> |
> | Where's the metric on this last statement? 
> There's no formal metric; it's just my subjective judgement.
> | It's of the ilk of your last claim that I was alone in making my
> | side of this argument. 
> That you *seem* to be alone *on this mailing list*.

Only of those who've piped up. There are a lot of people on this
mailing list who've not said anything at all. Until there is a
vote, that's no metric. As I said, the vocal minority never counts
as a majority, myself included.

> | There is no "obvious" need except for the statements of several
> | people, all of whom have specific, proprietary projects in mind.
> | We're not talking about what people do within their proprietary
> | projects (heck, I muck with XTM within Ceryle), we're talking about
> | taking requirements from proprietary, custom projects and applying
> | them back into the interchange syntax.
> Yes, we are.
> | Unfortunately, there's not even a large population of people that
> | really look at the details or understand the implications of many
> | things that pass by standards bodies. 
> That's true, and I'm not sure what we can do about it. The more people
> who read these drafts and comment on them, the happier I'll be. I
> certainly point anyone who seems to have any kind of use for them
> towards these drafts, but, well, it's not likely to make a big
> difference.

The approach to standards, which I'm sure Jim will agree with, is
to *always* take a conservative approach.

> * Lars Marius Garshol
> |
> | OK, but why do you need the XHTML-stripping XSLT stylesheet, then?
> * Murray Altheim
> |
> | Only to produce XTM 1.0 markup for processors that don't know what
> | to do with XHTML+XTM.
> OK. In that case I misunderstood you, and you would have had a
> solution for those who would have been happy with XHTML+XTM. The
> standard would have been exactly as complicated as what we are
> proposing now, but you would have known what markup you could expect
> to find inside inline base names and occurrences, and the problem of
> interpreting the embedded XML would have gone away.

Yes. I think that is a very reasonable solution, and I maintain that
would probably suit the 80/20 point for this problem. I have no
metric for that either though. I just think it's a nice, conservative
solution, but apparently wouldn't solve the proprietary markup part
of the problem. I don't think there's a solution to proprietary
markup within standards, by definition.
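
The XHTML-stripping step discussed above was described as an XSLT stylesheet;
the sketch below shows the same idea in Python instead, purely for
illustration. It assumes the XTM 1.0 namespace URI and that embedded markup
appears only inside <baseNameString> and <resourceData>; the function name
strip_embedded_markup is hypothetical. Each such element is flattened to its
text so that a processor expecting plain XTM 1.0 sees character data only.

```python
import xml.etree.ElementTree as ET

# Assumed: the XTM 1.0 namespace URI and the two text-bearing elements.
XTM_NS = "http://www.topicmaps.org/xtm/1.0/"
TEXT_ELEMENTS = {
    "{%s}baseNameString" % XTM_NS,
    "{%s}resourceData" % XTM_NS,
}


def strip_embedded_markup(xtm_source):
    """Parse an XTM document and flatten embedded markup to plain text."""
    root = ET.fromstring(xtm_source)
    # Snapshot the element list first, since the tree is modified below.
    for elem in list(root.iter()):
        if elem.tag in TEXT_ELEMENTS and len(elem):
            # Gather all text, including text inside and after child elements.
            flat = "".join(elem.itertext())
            for child in list(elem):
                elem.remove(child)
            elem.text = flat
    return root
```

The flattening is lossy by design: emphasis, links, and anything else the
embedded markup expressed are discarded, which is the trade-off under
discussion.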

> You would, however, have been left with an extensibility problem for
> those people who need more than what XHTML provides.


> But I admit what I said about your solution was based on a false
> assumption.



Murray Altheim                         http://kmi.open.ac.uk/people/murray/
Knowledge Media Institute
The Open University, Milton Keynes, Bucks, MK7 6AA, UK                    .

   Entitled Continuing Collateral Damage: the health and environmental
   costs of war on Iraq, the report estimates that between 22,000 and
   55,000 people - mainly Iraqi soldiers and civilians - died as a direct
   result of the war.

   Entitled Continuing Collateral Damage? ...a euphemism for BushCo.