[sc34wg3] Almost arbitrary markup in resourceData

Lars Marius Garshol sc34wg3@isotopicmaps.org
12 Nov 2003 22:42:24 +0100

* Murray Altheim
| No, you understood me.

* Lars Marius Garshol
| If you do agree that leaves the issue of the problems with the
| embedded XML. Do you think it's better if people stick escaped
| markup in there instead of embedded XML? Or if they start using
| codes like [em]this[/em] to do what they want to do? People do this
| *already*, so clearly we have the choice between ugly hacks like
| that, and allowing markup in there.

* Murray Altheim
| If they do this already, they know they're breaking the standard,
| *and* any of this custom, non-XML markup obviously isn't breaking
| XML validation since it's not using reserved characters (I'm
| assuming, otherwise we're not talking XTM markup anyway).

I'm not sure I follow you here, Murray. When you talk about breaking
the standard you mean using unescaped non-XML markup inside
<resourceData>, right? People are doing that, but they are also doing
stuff like this:

  <resourceData><![CDATA[XTM is <em>really</em> cool.]]></resourceData>

  <resourceData>XTM is [em]really[/em] cool.</resourceData>

and what I was trying to say was that I think the interpretation
problems are even worse with gunk like this than with embedded XML,
because with XML you at least know it's XML. With this stuff you don't
know whether it's XML or not, and you don't know how to process it.
(Does the [] syntax support attributes? Processing instructions? And
so on.)
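
A minimal sketch of how the forms above look to a receiving
application, assuming Python's standard xml.etree.ElementTree as the
XML processor. The describe() helper and the three sample strings are
made up for illustration; the first string is the embedded-XML form
being debated, the other two are the workarounds shown above.

```python
import xml.etree.ElementTree as ET

# Hypothetical samples: the same sentence written three ways.
EMBEDDED = "<resourceData>XTM is <em>really</em> cool.</resourceData>"
CDATA = "<resourceData><![CDATA[XTM is <em>really</em> cool.]]></resourceData>"
BRACKETS = "<resourceData>XTM is [em]really[/em] cool.</resourceData>"


def describe(xml_text):
    """Return (character data seen by the parser, names of child elements)."""
    element = ET.fromstring(xml_text)
    text = "".join(element.itertext())
    children = [child.tag for child in element]
    return text, children


for label, doc in [("embedded", EMBEDDED), ("cdata", CDATA), ("brackets", BRACKETS)]:
    print(label, describe(doc))
```

Only the embedded form reports an em child element. In the other two
cases the parser hands back an opaque string, and the application has
to guess whether, and how, to parse it further.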

Given that we are going to have to deal with stuff like this whether
we allow embedded XML or not, I think allowing embedded XML is the
lesser of the two evils. At least that gives us *some* control, and
allows for things like validating the embedded markup by connecting
in schema/DTD fragments etc.

| I'd still say that people who use something like "[em]this[/em]" and
| send it out know it's not going to be correctly handled, so it's
| less of a problem than somebody who assumes that <xhtml:applet> *is*
| going to be handled. 

Well, that's only until you hit "[applet path=bing alt=blong]" or
something like it. (Note the absence of quotes and the absence of the
final slash. With a custom XML-like syntax you really lose badly on

| If somebody believes that any application that receives
| <xhtml:applet> is going to correctly handle it (and lord knows,
| they're out there, I can tell you from the HTML WG experience), then
| simply prohibiting it is IMO a better approach for a standard.  It
| doesn't set up false expectations of applet, SVG, whatever, being
| supported, and doesn't start up our own version of the browser wars.

What we could do, and probably even should do, is add a warning about
not relying on recipients supporting the semantics of the embedded XML
markup. I think that's probably all we can do, since if people want to
do this they'll find some way to do it even if we disallow it.

| Well, I'm glad you understand how mucky this really is. 

I'm not sure I'm so glad I do. This really is a pain, and I'm not
looking forward to dealing with it. :-(

| Most people don't consider how bad, how non-trivial it can be. They
| think you stick <xhtml:a> in there and suddenly you have functional
| links, and they build applications around that kind of
| thinking. It's rampant in the W3C, this behaviour-attached-to-markup
| idea.

All true.

* Lars Marius Garshol
| What issues? (We've chosen to make the RELAX-NG schema the normative
| schema for XTM 1.1, so this is not a rhetorical question.)

* Murray Altheim
| Nothing intrinsic to RELAX-NG itself, but in how to reliably specify
| the grammar. 

Not sure what you mean. Are you referring to the lack of RELAX-NG
support, or something else?

| *All* XML processors know what to do with a DOCTYPE, 

All validating ones, yes. Not all XML processors are validating,

| whereas that's not true with RELAX-NG.

True. The point is that the schema takes over for prose. You're not
required to actually use the schema.
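
A small sketch of the non-validating case, assuming Python's standard
xml.etree.ElementTree, which reads a DOCTYPE but does not validate
against it. The document and the child_names() helper are invented
for illustration and are not taken from the XTM DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the internal DTD subset allows only <topic>
# children, yet the body contains <bogus/>. A validating processor
# would reject this; a non-validating one only checks well-formedness.
DOC = """<!DOCTYPE topicMap [
  <!ELEMENT topicMap (topic*)>
  <!ELEMENT topic EMPTY>
]>
<topicMap><bogus/></topicMap>"""


def child_names(xml_text):
    """Parse without validation and list the names of the child elements."""
    return [child.tag for child in ET.fromstring(xml_text)]


print(child_names(DOC))
```

The parse succeeds and reports the bogus element, so with this kind of
processor the DTD constrains nothing unless a separate validation step
is added.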

| I have sincerely the highest regard for both James Clark and Murata
| Makoto, and I think it was truly courageous for them to buck the
| trend and state publicly by their actions that XML Schema was not
| a good solution.

Seems we agree, then. :)

* Lars Marius Garshol
| The DTD is in Annex B. It just isn't normative.

* Murray Altheim
| If you can't specify the language in a normative DTD and rely on
| RELAX-NG so that you can open up what is ostensibly a "markup
| language" to having arbitrary markup, I think you've changed the
| definition of a markup language.  You now have a document format
| that is a markup language *plus* what is not a markup language,
| i.e., a grammar plus an unspecified grammar.

I think you are right: XTM is not a markup language in the traditional
sense, it's an interchange syntax for a non-XML data model. DocBook
and XHTML are markup languages, but I don't think XTM is. (Regardless
of whether we allow arbitrary embedded markup.)

In either case it doesn't matter. I don't think whether XTM is a
markup language or not is a relevant issue. The *consequences* of it
may be.

| Only of those who've piped up. There are a lot of people on this
| mailing list who've not said anything at all. Until there is a vote,
| that's no metric. As I said, the vocal minority never counts as a
| majority, myself included.

Well, we agree on this, I think.

| [XHTML + XTM] 
| Yes. I think that is a very reasonable solution, and I maintain that
| would probably suit the 80/20 point for this problem. I have no
| metric for that either though. I just think it's a nice,
| conservative solution, but apparently wouldn't solve the proprietary
| markup part of the problem. I don't think there's a solution to
| proprietary markup within standards, by definition.

I can respect that opinion, but I don't agree with it.

Lars Marius Garshol, Ontopian         <URL: http://www.ontopia.net >
GSM: +47 98 21 55 50                  <URL: http://www.garshol.priv.no >