[sc34wg3] Almost arbitrary markup in resourceData

Murray Altheim sc34wg3@isotopicmaps.org
Wed, 12 Nov 2003 19:28:28 +0000


Lars Marius Garshol wrote:
> * Lars Marius Garshol
> |
> | I think that question is irrelevant, really. If there really were a
> | problem with embedding arbitrary markup in XTM the question would be
> | relevant, but I don't think there is, nor does anyone on the list
> | seem to think there is, except for you. And you can't even come up
> | with examples of potential problems or arguments for why simply
> | storing the markup is harmful.
> 
> * Murray Altheim
> | 
> | I think the problems in the RDF world are well known, as are they in
> | XML with XML Namespaces generally. I'm not going to enumerate those
> | here.
> 
> I don't think you have to, either, since we agree that those are
> problems. What I am asking you to do is to give examples or in some
> other way substantiate that those problems will apply to XTM 1.1 if
> XTM 1.1 allows arbitrary markup in <resourceData>. I don't think they
> apply in that case, but you clearly do. Why? That's what I'm asking
> you.

I don't see how arbitrary markup is in any way different than defining
a markup language that has big holes in it. Put it this way: if it had
been up to the majority of the W3C HTML WG, there would never have been
DTDs at all for XHTML. I fought alone for almost two years to convince
them that DTDs were necessary. Since then, they've been making the same
claim as you about things called markup languages that have these big
holes in them. I don't think they're markup languages at all if they
have holes, and I think it's fallacious to think that you can ignore,
store, or throw away any arbitrary, well-formed markup within the
document. As Eric has said, it's *part* of the document. And if XTM is
supposed to be an interchange syntax, you can't interchange with things
you don't know about. If Eric sends me one of his hybrid *TM documents,
with his embedded, proprietary markup, I can't use it. Sure, I can
"store" it. But I can't use that *TM document unless I support the
embedded markup. That's not interchange.

On the technical, syntax side, the embedded markup is going to have
to be completely specified, and legal via XML Namespaces. It probably
should have a wrapper element in its own namespace. If you bother to
look at Paul Grosso's XML Fragments draft at the W3C (and he is one
of the world's markup experts, no question), you'll see this is a
very complicated problem, not something where you can just stick
<xhtml:b> and expect to get away with it.
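
A minimal sketch of the interchange problem described above, in Python
with the standard-library ElementTree parser. The sample document, the
"ex" prefix, and the http://example.org/proprietary namespace are made
up for illustration, and markup inside <resourceData> is the proposed
XTM 1.1 behaviour, not valid XTM 1.0. The function only reports which
embedded namespaces a receiver would have to understand; it says
nothing about how to process them.

```python
import xml.etree.ElementTree as ET

XTM_NS = "http://www.topicmaps.org/xtm/1.0/"
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical document: XHTML plus a made-up proprietary vocabulary
# embedded in <resourceData>.
SAMPLE = """<topicMap xmlns="http://www.topicmaps.org/xtm/1.0/"
          xmlns:xhtml="http://www.w3.org/1999/xhtml"
          xmlns:ex="http://example.org/proprietary">
  <topic id="t1">
    <occurrence>
      <resourceData>Some <xhtml:b>bold</xhtml:b> and <ex:widget/> text</resourceData>
    </occurrence>
  </topic>
</topicMap>"""


def namespace_of(tag):
    """Return the namespace URI from an ElementTree '{uri}local' tag."""
    if tag.startswith("{"):
        return tag[1:].split("}", 1)[0]
    return ""


def foreign_namespaces(xtm_text, known=(XTM_NS,)):
    """Collect namespaces used inside <resourceData> that are not in 'known'."""
    root = ET.fromstring(xtm_text)
    found = set()
    for rd in root.iter("{%s}resourceData" % XTM_NS):
        for el in rd.iter():
            if el is rd:
                continue  # skip the <resourceData> element itself
            ns = namespace_of(el.tag)
            if ns not in known:
                found.add(ns)
    return found


if __name__ == "__main__":
    # A receiver that knows XTM and XHTML still cannot use ex:widget.
    print(foreign_namespaces(SAMPLE, known=(XTM_NS, XHTML_NS)))
```

Every namespace the function returns is one the receiving system can
store but not use.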

> (The two issues you list below are answers to that question, but see
> my replies.)
> 
> | As for potential problems, how about the fact that you could no
> | longer validate XTM? 
> 
> That's no problem. The RELAX-NG schema in the next XTM 1.1 draft will
> allow such validation. It won't validate the non-XTM markup, of
> course, but users can deal with that themselves, or ignore it.  

As for the W3C HTML WG, when I left them over three years ago the
group was all enthusiastic about RELAX-NG and XML Schema. They were being
forced by the W3C staff to do XML Schema. Well, if you look at the
XHTML 2.0 draft, they're on their fifth incarnation and the number
of issues is still growing. They've dropped XML Schema entirely (and
I'm sure they're getting a lot of flak for that) and are using
RELAX-NG. But there are still a lot of issues with RELAX-NG. And
what about those systems that don't support RELAX-NG? Are you going
to require all Topic Map systems based in XML to use RELAX-NG if
they want validation? No DTD? Bad marketing decision, if you ask me.

> Either way it won't hurt those who don't use XML inside XTM.

I don't follow this argument at all. "Won't hurt" applies only to those
who have no need to interchange with the rest of the Topic Map
community. If somebody sends you an XTM 1.0 document, you can at
least guarantee a minimal level of interchange. You lose that if
you require support for RELAX-NG and also require the ability to
properly handle proprietary markup within the documents. You have
created a two-tiered community, those who do and those who don't.

> | Or more generally, the fact that we're a small and fragile market
> | right now and that arguments for supposedly necessary features that
> | could enlarge our market might just as easily be met by arguments
> | against the destabilization you'll create by splintering that market
> | we currently have into XTM 1.0 and XTM 1.+.
> 
> I agree this is something that must be considered. We've already
> changed XTM by adding <instanceOf> inside <baseName>, but admittedly
> this is a more invasive change.

I'd say that it is an order of magnitude kind of change. I didn't have
to do much at all to implement <instanceOf> within <baseName>, whereas
my entire processing model is different if I have to do something
intelligent with whatever markup somebody sends me. I probably need
mappings for each possible XMLNS encountered, perhaps MIME support,
who knows?
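
To make the "mappings for each possible XMLNS" point concrete, here is
a rough Python sketch of a per-namespace handler table. The names
HANDLERS, render_xhtml, and handle_embedded are hypothetical, not part
of any XTM processor, and the XHTML handler just flattens the element
to text as a stand-in for real processing.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def render_xhtml(element):
    """Stand-in handler: flatten an XHTML element to its text content."""
    return "".join(element.itertext())


# One entry per namespace the processor claims to understand.
HANDLERS = {XHTML_NS: render_xhtml}


def handle_embedded(element):
    """Dispatch on the element's namespace; None means 'store only'."""
    tag = element.tag
    ns = tag[1:].split("}", 1)[0] if tag.startswith("{") else ""
    handler = HANDLERS.get(ns)
    if handler is None:
        return None  # unknown vocabulary: cannot be used, only kept
    return handler(element)


if __name__ == "__main__":
    known = ET.fromstring(
        '<b xmlns="http://www.w3.org/1999/xhtml">bold <i>text</i></b>')
    unknown = ET.fromstring('<w xmlns="http://example.org/proprietary"/>')
    print(handle_embedded(known))    # bold text
    print(handle_embedded(unknown))  # None
```

The table has to grow with every vocabulary a sender might embed, which
is the difference in scale from supporting <instanceOf> in <baseName>.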

> I've long been of the opinion that we shouldn't change XTM 1.0 at all
> if we could avoid it, but in the case of <baseName> we found no other
> solution to the TNC problem, and so we made the change.
> 
> Having made that change we've already done the splintering. Admittedly
> this change makes it bigger, and personally I would prefer not to make
> this change. What's changed my mind is that there is an obvious need
> for this, since so many people are doing this in different ways
> already.

Where's the metric on this last statement? It's of the ilk of your
last claim that I was alone in making my side of this argument. There
is no "obvious" need except for the statements of several people,
all of whom have specific, proprietary projects in mind. We're not
talking about what people do within their proprietary projects (heck,
I muck with XTM within Ceryle), we're talking about taking requirements
from proprietary, custom projects and applying them back into the
interchange syntax.

> It's a trade-off, but there seems to be a majority for trading off in
> this particular direction.
> 
> | The size of this group isn't large enough and the fact that we all
> | bring our own agendas means that neither you nor I are able to speak
> | for the needs of a general population, [...]
> 
> Of course not. That's why I brought it up in the ISO meeting, and the
> meeting ruled that we should do it. We've now discussed it one more
> time, and again there is a majority for making the change. 
> 
> What will happen now is that the change will go into the official ISO
> Committee Drafts, and that gives the wider user community a chance to
> voice its opinion on this in the ballot period. What more we can do to
> find out where people stand on this issue I don't really know, but
> what's clear is that I'm not going to drop this just because a single
> person objects.

No, I'd not expect you to. That's what standards bodies are for.
Unfortunately, there's not even a large population of people that
really look at the details or understand the implications of many
things that pass by standards bodies. My example of Microsoft deliberately
slipping corruptions into Unicode comes to mind.

> * Lars Marius Garshol
> |
> | As I've already explained to you: your proposed solution achieves
> | nothing, since it implies stripping the markup back out of the XTM
> | document before XTM processors can see it.
>  
> * Murray Altheim
> |
> | You've not implied correctly. The idea of having *one* hybrid markup
> | language would mean XTM processors do see it, but only see one kind
> | of markup, not arbitrary markup, and handled like any other XML. No
> | strange "storage", serialization and de-serialization of markup.
> 
> OK, but why do you need the XHTML-stripping XSLT stylesheet, then?

Only to produce XTM 1.0 markup for processors that don't know what to
do with XHTML+XTM.
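
As a rough illustration of that downgrade step, the following Python
sketch flattens any markup inside <resourceData> to plain text. It is
not the XSLT stylesheet under discussion, only an assumed equivalent
using the standard-library ElementTree parser; the function name
strip_embedded_markup and the sample input are invented for the example.

```python
import xml.etree.ElementTree as ET

XTM_NS = "http://www.topicmaps.org/xtm/1.0/"

# Hypothetical hybrid input with XHTML inside <resourceData>.
HYBRID = """<topicMap xmlns="http://www.topicmaps.org/xtm/1.0/"
          xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <topic id="t1">
    <occurrence>
      <resourceData>Some <xhtml:b>bold</xhtml:b> text</resourceData>
    </occurrence>
  </topic>
</topicMap>"""


def strip_embedded_markup(xtm_text):
    """Return the parsed tree with <resourceData> reduced to text only."""
    root = ET.fromstring(xtm_text)
    # Materialise the list first so the tree is not edited mid-iteration.
    for rd in list(root.iter("{%s}resourceData" % XTM_NS)):
        text = "".join(rd.itertext())  # text of rd and all descendants
        for child in list(rd):
            rd.remove(child)
        rd.text = text
    return root


if __name__ == "__main__":
    root = strip_embedded_markup(HYBRID)
    rd = next(root.iter("{%s}resourceData" % XTM_NS))
    print(rd.text)  # Some bold text
```

The result carries only character data in <resourceData>, which is what
an XTM 1.0 processor expects there.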

Murray

...........................................................................
Murray Altheim                         http://kmi.open.ac.uk/people/murray/
Knowledge Media Institute
The Open University, Milton Keynes, Bucks, MK7 6AA, UK                    .

   Entitled Continuing Collateral Damage: the health and environmental
   costs of war on Iraq, the report estimates that between 22,000 and
   55,000 people - mainly Iraqi soldiers and civilians - died as a direct
   result of the war.
   http://news.bbc.co.uk/1/hi/world/middle_east/3259489.stm

   Entitled Continuing Collateral Damage? ...a euphemism for BushCo.