[sc34wg3] Almost arbitrary markup in resourceData

Freese, Eric D. (LNG-DAY) sc34wg3@isotopicmaps.org
Tue, 11 Nov 2003 12:48:24 -0500

> Freese, Eric D. (LNG-DAY) wrote:
> > I would like to be able to send the non-XTM (but still XML) data to the
> > appropriate application and have something useful done with it.
> Murray Altheim wrote:
> Wouldn't we all. But that's a non sequitur, unless you're able to build
> an application that can correctly process all known and unknown markup
> languages, i.e., arbitrary markup.

By "all known and unknown markup languages", do you mean applications of XML
or something bigger than that?

You seem to be looking at this from the view that a generalized XTM tool
must *do* something with the additional markup.  I'm not.  The only
arbitrary markup I care about is in XML.  I want the XTM tool to pass the
well-formed XML within a node through so that a subsequent tool can do
something interesting to the non-XTM data within the topic map.  For
example, I could use an XTM tool to drive a web site and allow stylesheets
to do some even better stuff with the XML within the nodes.  What does the
reference model say about nodes that have XML in them?  My guess is that it
doesn't care.  So why should XTM?  XML provides an ability to shield off the
markup that matters to an XTM processor, so why should it care about the
node contents?
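
To make the pass-through idea concrete, here is a minimal sketch in Python. It rests on assumptions: XTM 1.0 defines resourceData as PCDATA only, so the document below shows the relaxed form under discussion, and the "ln" namespace URI, the cite element and the foreign_payloads function are all made up for illustration. The processor only recognises names in the XTM namespace and hands everything else back untouched for a later tool.

```python
import xml.etree.ElementTree as ET

XTM_NS = "http://www.topicmaps.org/xtm/1.0/"
LN_NS = "urn:example:ln"  # hypothetical namespace, for illustration only

# Hypothetical document: resourceData carrying well-formed non-XTM XML.
DOC = f"""<topicMap xmlns="{XTM_NS}" xmlns:ln="{LN_NS}">
  <topic id="t1">
    <occurrence>
      <resourceData>See <ln:cite court="OH">123 Ohio 456</ln:cite> for details.</resourceData>
    </occurrence>
  </topic>
</topicMap>"""


def foreign_payloads(xml_text):
    """Map each topic id to the non-XTM elements found in its resourceData.

    The elements are returned as parsed, without being interpreted,
    so a subsequent tool (a stylesheet, say) can do what it likes with them.
    """
    root = ET.fromstring(xml_text)
    xtm_prefix = f"{{{XTM_NS}}}"
    result = {}
    for topic in root.iter(f"{xtm_prefix}topic"):
        payloads = []
        for rd in topic.iter(f"{xtm_prefix}resourceData"):
            for child in rd:
                # Anything outside the XTM namespace is passed through.
                if not child.tag.startswith(xtm_prefix):
                    payloads.append(child)
        result[topic.get("id")] = payloads
    return result


if __name__ == "__main__":
    for topic_id, elements in foreign_payloads(DOC).items():
        for element in elements:
            print(topic_id, element.tag, element.attrib)
```

The XTM side never needs to know what a cite element means; it only needs to leave it alone.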

> Freese, Eric D. (LNG-DAY) wrote:
> > It has also been said that XTM is for interchange.  I agree.  However,
> > additional markup in my data represents some of the semantic information
> > my data - the same semantics I want to interchange. If I have to strip
> > semantic information out of my data for the sake of meeting some
> > requirement of an interchange standard, it might be considered a
> > that I don't necessarily need to live with.  Now this "interchange"
> > is telling me what of my semantic information I can and can't
> > So is it really a useful interchange standard?  One might wonder.  
> Murray Altheim wrote:
> That argument doesn't hold much water, in the sense that if we consider
> the idea that Topic Maps should be able to interchange *any* semantics,
> i.e., any markup, and we then allow any markup language within XTM, such
> as MathML, SVG, XHTML, RSS, PDQ, XYZ (all manner of unknown markup), then
> there's no way that anyone can expect to know what to do with an XTM
> document. You have the same expectation in being able to correctly process
> XTM with arbitrary markup as you do being able to process arbitrary
> markup, which is just about zilch. XTM's PCDATA is a minimal exchange.
> You have to strip markup that you say bears semantic information to force
> it into PCDATA. Fair enough. That's the kind of compromise one must make
> in order to guarantee interchange. You don't use really rare and complex
> language when speaking with a non-native speaker either.

So now you appear to be saying that the topic map model is limited in the
semantics it can be used to interchange?  What does that do to the Newcomb &
Co. vision of "global knowledge federation"?  Has anyone bothered to
document these limitations to educate the public?  Is this really the right
group of people to determine what semantics can and cannot be interchanged?
When in France you speak French or hope you find someone that speaks your
language.  When processing data in a LexisNexis stream, we use LN markup.
Both ends of the process understand LN markup, and communication/processing
occurs.  Don't forget that people are using XML (and hopefully XTM) for
interchange between internal processes too.

> Now, a third time's a charm -- I've offered to create an XHTML+XTM DTD,
> which I think would be a reasonable compromise in the balance between
> interchange and complete anarchy. A fairly big step up from PCDATA, given
> that the entire Web has been satisfied with it. Now, I'm guessing there
> are those who will still demand complete anarchy, the ability to embed
> any kind of markup within an interchange syntax. They're hopefully in
> the 5-10% use case scenario. Allowing arbitrary markup within XTM is by
> definition creating a useless standard, a non-standard, a format that
> guarantees a complete lack of guarantee of interoperability.  That's what
> "arbitrary" means, eh? You can't have it both ways. You can't have the
> ability to do anything you like *and* have everyone be able to correctly
> understand it.

Thanks for the offer.  I think we should take you up on it.  It's a step in
the right direction.

Why do you assume that you must "understand" it?  If you have a topic map in
English and I have one in Swahili and some merging occurs based on the
well-defined rules for merging, you, most likely, cannot understand the
content of the data, correct?  Does that mean the new topic map is trash?
No.  So why would it be any different for additional markup?  It's just a
dialect that isn't understood, but yet has value.

Why can't we standardize the topic map-ish stuff and freely admit that we're
not going to touch anything outside of that domain?

Interoperability occurs when predictable things happen to the elements
within the XTM namespace.  I would expect that *within* LexisNexis,
interoperability would occur based on the other markup as well because
something predictable will happen to it as well.  I wouldn't expect Joe Blow
off the street to know what to do with the LN markup.  As a topic map owner,
I could translate the LN markup into XHTML or something more general or even
strip it before the topic map went out for public interchange.  But, as I
said, the markup is VERY useful in internal applications and that's where
the requirement comes from.
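
As a rough sketch of the "strip it before the topic map goes out" option, again in Python and again under assumptions (the "ln" namespace, the cite element and the strip_foreign_markup function are hypothetical, and the input is the relaxed resourceData form rather than anything XTM 1.0 allows today): each resourceData is flattened to plain character data, which is what the current PCDATA content model expects.

```python
import xml.etree.ElementTree as ET

XTM_NS = "http://www.topicmaps.org/xtm/1.0/"

# Hypothetical internal topic map with non-XTM markup inside resourceData.
SAMPLE = f"""<topicMap xmlns="{XTM_NS}" xmlns:ln="urn:example:ln">
  <topic id="t1">
    <occurrence>
      <resourceData>See <ln:cite court="OH">123 Ohio 456</ln:cite> for details.</resourceData>
    </occurrence>
  </topic>
</topicMap>"""


def strip_foreign_markup(xml_text):
    """Return the parsed topic map with every resourceData flattened to text."""
    root = ET.fromstring(xml_text)
    for rd in root.iter(f"{{{XTM_NS}}}resourceData"):
        # Collect the character data in document order, tags dropped.
        flat = "".join(rd.itertext())
        for child in list(rd):
            rd.remove(child)
        rd.text = flat
    return root


if __name__ == "__main__":
    public = strip_foreign_markup(SAMPLE)
    print(ET.tostring(public, encoding="unicode"))
```

Translating the LN markup into XHTML instead of stripping it would follow the same shape, with a mapping step in place of the flattening.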

> So I'm guessing the XHTML+XTM DTD wouldn't do it?

Not for my use case.  We plan to use real XML with semantic tag names and
everything.  But I could see its use for those whose application is only