[sc34wg3] Almost arbitrary markup in resourceData

Murray Altheim sc34wg3@isotopicmaps.org
Tue, 11 Nov 2003 15:27:57 +0000

Freese, Eric D. (LNG-DAY) wrote:
> OK, in earlier instances of this thread there was a mention made of user
> requirements.  Let me, as a user, explain why LexisNexis and Reed Elsevier
> see arbitrary markup added to a LIMITED number of places within XTM (i.e.
> names and resourceData) as an improvement to the standard.
> As I see it, the content within resourceData and names is specifically for
> human consumption, not a topic map processor.  [This could, I believe, be
> said of any PCDATA within an XML topic map, but that is another discussion.
> When the TCN is not in effect, doesn't all the cool stuff happen in the
> attributes?]  If the standard were changed to allow well-formed XML within
> these places, to make it more useable for humans, then I see that as a win.
> Any markup outside the XTM namespace is not to be processed by an XTM
> processor, just stored/passed through/not touched/whatever as part of the
> data - end of requirement.  Do I expect XTM processors to know what to do
> with XHTML?  No.  Do I expect XTM processors to know what to do with
> arbitrary XML?  No.  Do I expect XTM processors to process XTM?  Yes.  XTM
> is a tool in my toolbox, but I have others that also have specific purposes.
> I would like to be able to send the non-XTM (but still XML) data to the
> appropriate application and have something useful done with it.

Wouldn't we all. But that's a non sequitur, unless you're able to build
an application that can correctly process all known and unknown markup
languages, i.e., arbitrary markup.

> I really
> don't think that is muddying up the waters that much.  Also, if a tool DID
> try to do something with the additional markup, I'd be highly skeptical.  An
> API on the other hand might be a cool thing to differentiate products.  But
> I will buy a topic map processor, first and foremost, on its topic map
> functionality, not the bells and whistles.
> It has also been said that XTM is for interchange.  I agree.  However, the
> additional markup in my data represents some of the semantic information of
> my data - the same semantics I want to interchange. If I have to strip this
> semantic information out of my data for the sake of meeting some arbitrary
> requirement of an interchange standard, it might be considered a limitation
> that I don't necessarily need to live with.  Now this "interchange" standard
> is telling me what of my semantic information I can and can't interchange.
> So is it really a useful interchange standard?  One might wonder.  

That argument doesn't hold much water, in the sense that if we consider
the idea that Topic Maps should be able to interchange *any* semantics,
i.e., any markup, and we then allow any markup language within XTM, such
as MathML, SVG, XHTML, RSS, PDQ, XYZ (all manner of unknown markup), then
there's no way that anyone can expect to know what to do with an XTM
document. You have the same expectation of correctly processing XTM
with arbitrary markup as you do of processing arbitrary markup itself,
which is just about zilch. XTM's PCDATA is a minimal exchange.
You have to strip markup that you say bears semantic information to force
it into PCDATA. Fair enough. That's the kind of compromise one must make
in order to guarantee interchange. You don't use really rare and complex
language when speaking with a non-native speaker either.
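
As a concrete illustration of that stripping, here is a minimal sketch,
assuming Python's standard xml.etree.ElementTree and a made-up
resourceData fragment with XHTML-style inline markup. The fragment and
the function name flatten_to_pcdata are illustrative only; they are not
part of XTM or of any tool mentioned in this thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical resourceData content carrying XHTML-style inline markup.
fragment = "<resourceData>Water is H<sub>2</sub>O, <em>not</em> HO.</resourceData>"


def flatten_to_pcdata(xml_text: str) -> str:
    """Drop all child markup and keep only the character data."""
    root = ET.fromstring(xml_text)
    # itertext() yields every text and tail string in document order.
    return "".join(root.itertext())


print(flatten_to_pcdata(fragment))  # prints: Water is H2O, not HO.
```

The subscript and the emphasis are gone and only the character data
survives, which is the cost of the compromise described above.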

Now, a third time's a charm -- I've offered to create an XHTML+XTM DTD,
which I think would be a reasonable compromise in the balance between
interchange and complete anarchy. XHTML is a fairly big step up from PCDATA,
given that the entire Web has been satisfied with it. Now, I'm guessing there
are those who will still demand complete anarchy, the ability to embed
any kind of markup within an interchange syntax. They're hopefully in
the 5-10% use case scenario. Allowing arbitrary markup within XTM is by
definition creating a useless standard, a non-standard, a format that
guarantees a complete lack of guarantee of interoperability. That's what
"arbitrary" means, eh? You can't have it both ways. You can't have the
ability to do anything you like *and* have everyone be able to correctly
understand it.

So I'm guessing the XHTML+XTM DTD wouldn't do it?


Murray Altheim                         http://kmi.open.ac.uk/people/murray/
Knowledge Media Institute
The Open University, Milton Keynes, Bucks, MK7 6AA, UK                    .

   Entitled Continuing Collateral Damage: the health and environmental
   costs of war on Iraq, the report estimates that between 22,000 and
   55,000 people - mainly Iraqi soldiers and civilians - died as a direct
   result of the war.

   Entitled Continuing Collateral Damage? ...a euphemism for BushCo.