[sc34wg3] Almost arbitrary markup in resourceData

Graham Moore sc34wg3@isotopicmaps.org
Thu, 13 Nov 2003 15:44:40 +0100


Murray, quoting Eric, said something in this post which got me thinking
about this embedded XML thingy.

 "I think it's fallacious to think that you can ignore, store, or throw away
any arbitrary, well-formed markup within the document, as Eric has said,
it's *part* of the document."

When I think about topic maps the last thing I ever really consider is the
syntax, the XTM. I see topic maps as some abstract model that references
real stuff, like documents. I don't consider an XTM document as a
'document'. I see it as the serialization of some well defined data model.
When XTM is processed it is turned into this model (although not always
explicitly).

In this model serialization (XTM) there are some elements that are used to
key the creation of the abstract constructs. These are clearly defined -
i.e. topic, association elements.

ResourceData semantically says to me - the stuff that follows is the
contents of a resource which has no identity (like an anonymous resource).
And whatever that TM processor wants to do with this local resource is up to
it. But it has no meaning within the TM model.
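
As a rough illustration of this "anonymous resource" reading, here is a
minimal Python sketch, not taken from any XTM processor. It assumes a
hypothetical XTM fragment in which <resourceData> carries embedded XHTML
(XTM 1.0 itself only allows character data there), and it hands the content
back as one uninterpreted string. The sample document and the function name
opaque_resource_data are made up for the example.

```python
import xml.etree.ElementTree as ET

XTM_NS = "http://www.topicmaps.org/xtm/1.0/"

# Hypothetical fragment: the embedded <b> is NOT valid XTM 1.0; it stands in
# for the "arbitrary markup" under discussion.
SAMPLE = """\
<topic xmlns="http://www.topicmaps.org/xtm/1.0/" id="example-topic">
  <occurrence>
    <resourceData>Some <b xmlns="http://www.w3.org/1999/xhtml">bold</b> note</resourceData>
  </occurrence>
</topic>
"""


def opaque_resource_data(xtm_text):
    """Return the content of each resourceData as one uninterpreted string."""
    root = ET.fromstring(xtm_text)
    results = []
    for rd in root.iter("{%s}resourceData" % XTM_NS):
        # Leading character data, then each child serialized as-is.
        parts = [rd.text or ""]
        for child in rd:
            # ET.tostring() also emits the text that follows the child (its tail).
            parts.append(ET.tostring(child, encoding="unicode"))
        results.append("".join(parts))
    return results


if __name__ == "__main__":
    for content in opaque_resource_data(SAMPLE):
        print(content)
```

The model side only ever sees a string; whether anything downstream
interprets the markup in it is left to the application.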

In XTM we don't restrict the kinds of things that occurrences can reference,
nor do we limit the resources that can be represented by topics. For me
then, conceptually, it follows that having any arbitrary markup in there is
no big deal.

Now there may be some practical issues with processing an all-XML document
but I hope that conceptually we could agree on what we mean when we use
resourceData and thus see a small, discernible but ultimately irrelevant
distinction between resRef and resData.

Cheers,

gra

----------------------------------------------------------------
Graham Moore, Ontopian            moore@ontopia.net
GSM: +47 926 82 437           http://www.ontopia.net


-----Original Message-----
From: sc34wg3-admin@isotopicmaps.org [mailto:sc34wg3-admin@isotopicmaps.org]
On Behalf Of Murray Altheim
Sent: 12 November 2003 20:28
To: sc34wg3@isotopicmaps.org

Lars Marius Garshol wrote:
> * Lars Marius Garshol
> |
> | I think that question is irrelevant, really. If there really were a
> | problem with embedding arbitrary markup in XTM the question would be
> | relevant, but I don't think there is, nor does anyone on the list
> | seem to think there is, except for you. And you can't even come up
> | with examples of potential problems or arguments for why simply
> | storing the markup is harmful.
>
> * Murray Altheim
> |
> | I think the problems in the RDF world are well known, as are they in
> | XML with XML Namespaces generally. I'm not going to enumerate those
> | here.
>
> I don't think you have to, either, since we agree that those are
> problems. What I am asking you to do is to give examples or in some
> other way substantiate that those problems will apply to XTM 1.1 if
> XTM 1.1 allows arbitrary markup in <resourceData>. I don't think they
> apply in that case, but you clearly do. Why? That's what I'm asking
> you.

I don't see how arbitrary markup is in any way different than defining a
markup language that has big holes in it. Put it this way: if it had been up
to the majority of the W3C HTML WG, there would never have been DTDs at all
for XHTML. I fought alone for almost two years to convince them that DTDs
were necessary. Since then, they've been making the same claim as you about
things called markup languages that have these big holes in them. I don't
think they're markup languages at all if they have holes, and I think it's
fallacious to think that you can ignore, store, or throw away any arbitrary,
well-formed markup within the document, as Eric has said, it's *part* of the
document. And if XTM is supposed to be an interchange syntax, you can't
interchange with things you don't know about. If Eric sends me one of his
hybrid *TM documents, with his embedded, proprietary markup, I can't use it.
Sure, I can "store" it. But I can't use that *TM document unless I support
the embedded markup. That's not interchange.

On the technical, syntax side, the embedded markup is going to have to be
completely specified, and legal via XML Namespaces. It probably should have
a wrapper element in its own namespace. If you bother to look at Paul
Grosso's XML Fragments draft at the W3C (and he is one of the world's markup
experts, no question), you'll see this is a very complicated problem, not
something where you can just stick <xhtml:b> and expect to get away with it.

> (The two issues you list below are answers to that question, but see
> my replies.)
>
> | As for potential problems, how about the fact that you could no
> | longer validate XTM?
>
> That's no problem. The RELAX-NG schema in the next XTM 1.1 draft will
> allow such validation. It won't validate the non-XTM markup, of
> course, but users can deal with that themselves, or ignore it.

As with the W3C HTML WG, I left them over three years ago with the group all
enthusiastic about RELAX-NG and XML Schema. They were being forced by the
W3C staff to do XML Schema. Well, if you look at the XHTML 2.0 draft,
they're on their fifth incarnation and the number of issues is still
growing. They've dropped XML Schema entirely (and I'm sure they're getting a
lot of flak for that) and are using RELAX-NG. But there are still a lot of
issues with RELAX-NG. And what about those systems that don't support
RELAX-NG? Are you going to require all Topic Map systems based in XML to use
RELAX-NG if they want validation? No DTD? Bad marketing decision, if you ask
me.

 > Either way it won't hurt those who don't use XML inside XTM.

I don't follow this argument at all. "Won't hurt" means only those who have
no need to interchange with the rest of the Topic Map community. If somebody
sends you an XTM 1.0 document, you can at least guarantee a minimal level of
interchange. You lose that if you require support for RELAX-NG and also
require the ability to properly handle proprietary markup within the
documents. You have created a two-tiered community, those who do and those
who don't.

> | Or more generally, the fact that we're a small and fragile market
> | right now and that arguments for supposedly necessary features that
> | could enlarge our market might just as easily be met by arguments
> | against the destabilization you'll create by splintering that market
> | we currently have into XTM 1.0 and XTM 1.+.
>
> I agree this is something that must be considered. We've already
> changed XTM by adding <instanceOf> inside <baseName>, but admittedly
> this is a more invasive change.

I'd say that it is an order of magnitude kind of change. I didn't have to do
much at all to implement <instanceOf> within <baseName>, whereas my entire
processing model is different if I have to do something intelligent with
whatever markup somebody sends me. I probably need mappings for each
possible XMLNS encountered, perhaps MIME support, who knows?

> I've long been of the opinion that we shouldn't change XTM 1.0 at all
> if we could avoid it, but in the case of <baseName> we found no other
> solution to the TNC problem, and so we made the change.
>
> Having made that change we've already done the splintering. Admittedly
> this change makes it bigger, and personally I would prefer not to make
> this change. What's changed my mind is that there is an obvious need
> for this, since so many people are doing this in different ways
> already.

Where's the metric on this last statement? It's of the ilk of your last
claim that I was alone in making my side of this argument. There is no
"obvious" need except for the statements of several people, all of whom have
specific, proprietary projects in mind. We're not talking about what people
do within their proprietary projects (heck, I muck with XTM within Ceryle),
we're talking about taking requirements from proprietary, custom projects
and applying them back into the interchange syntax.

> It's a trade-off, but there seems to be a majority for trading off in
> this particular direction.
>
> | The size of this group isn't large enough and the fact that we all
> | bring our own agendas means that neither you nor I are able to speak
> | for the needs of a general population, [...]
>
> Of course not. That's why I brought it up in the ISO meeting, and the
> meeting ruled that we should do it. We've now discussed it one more
> time, and again there is a majority for making the change.
>
> What will happen now is that the change will go into the official ISO
> Committee Drafts, and that gives the wider user community a chance to
> voice its opinion on this in the ballot period. What more we can do to
> find out where people stand on this issue I don't really know, but
> what's clear is that I'm not going to drop this just because a single
> person objects.

No, I'd not expect you to. That's what standards bodies are for.
Unfortunately, there's not even a large population of people that really
look at the details or understand the implications of many things that pass
by standards bodies. My example of Microsoft deliberately squeaking by
corruptions into Unicode comes to mind.

> * Lars Marius Garshol
> |
> | As I've already explained to you: your proposed solution achieves
> | nothing, since it implies stripping the markup back out of the XTM
> | document before XTM processors can see it.
>
> * Murray Altheim
> |
> | You've not implied correctly. The idea of having *one* hybrid markup
> | language would mean XTM processors do see it, but only see one kind
> | of markup, not arbitrary markup, and handled like any other XML. No
> | strange "storage", serialization and de-serialization of markup.
>
> OK, but why do you need the XHTML-stripping XSLT stylesheet, then?

Only to produce XTM 1.0 markup for processors that don't know what to do
with XHTML+XTM.
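
For illustration only, the kind of down-conversion meant here can be
sketched in a few lines of Python. This is not the XSLT stylesheet from the
thread (which is not shown), just an assumed equivalent: it flattens every
<resourceData> in a hypothetical hybrid document to plain character data so
that an XTM 1.0 processor sees only what it expects. The sample document
and the function name strip_embedded_markup are invented for the example.

```python
import xml.etree.ElementTree as ET

XTM_NS = "http://www.topicmaps.org/xtm/1.0/"

# Hypothetical hybrid input: XTM with XHTML embedded in <resourceData>.
SAMPLE = """\
<topic xmlns="http://www.topicmaps.org/xtm/1.0/" id="example-topic">
  <occurrence>
    <resourceData>Some <b xmlns="http://www.w3.org/1999/xhtml">bold</b> note</resourceData>
  </occurrence>
</topic>
"""


def strip_embedded_markup(xtm_text):
    """Flatten each resourceData to text only and return the serialized XML."""
    # Serialize XTM elements without a prefix (this is a process-wide setting).
    ET.register_namespace("", XTM_NS)
    root = ET.fromstring(xtm_text)
    for rd in root.iter("{%s}resourceData" % XTM_NS):
        # Collect all character data in document order, ignoring the tags.
        text = "".join(rd.itertext())
        for child in list(rd):
            rd.remove(child)
        rd.text = text
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(strip_embedded_markup(SAMPLE))
```

The conversion is lossy by design: the embedded markup is discarded and
only its character data survives in the XTM 1.0 output.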

Murray

...........................................................................
Murray Altheim                         http://kmi.open.ac.uk/people/murray/
Knowledge Media Institute
The Open University, Milton Keynes, Bucks, MK7 6AA, UK                    .

   Entitled Continuing Collateral Damage: the health and environmental
   costs of war on Iraq, the report estimates that between 22,000 and
   55,000 people - mainly Iraqi soldiers and civilians - died as a direct
   result of the war.
   http://news.bbc.co.uk/1/hi/world/middle_east/3259489.stm

   Entitled Continuing Collateral Damage? ...a euphemism for BushCo.
