[sc34wg3] Almost arbitrary markup in resourceData

Murray Altheim sc34wg3@isotopicmaps.org
Tue, 11 Nov 2003 18:32:54 +0000


Freese, Eric D. (LNG-DAY) wrote:
>>Freese, Eric D. (LNG-DAY) wrote:
> [...]
>>>I would like to be able to send the non-XTM (but still XML) data to the
>>>appropriate application and have something useful done with it.
>>
>>Murray Altheim wrote:
>>Wouldn't we all. But that's a non sequitur, unless you're able to build
>>an application that can correctly process all known and unknown markup
>>languages, i.e., arbitrary markup.
> 
> By "all known and unknown markup languages", do you mean applications of XML
> or something bigger than that?

I perhaps should have been clearer. We had been talking about arbitrary
XML. I meant arbitrary XML markup, unknown XML markup, ad hoc XML markup.

> You seem to be looking at this from the view that a generalized XTM tool
> must *do* something with the additional markup.  I'm not.  The only
> arbitrary markup I care about is in XML.  I want the XTM tool to pass the
> well-formed XML within a node through so that a subsequent tool can do
> something interesting to the non-XTM data within the topic map.

In your last message, you said "something useful." This time, "something
interesting." But if you don't know beforehand what kind of markup you're
going to receive, how are you going to handle it?

> For
> example, I could use an XTM tool to drive a web site and allow stylesheets
> to do some even better stuff with the XML within the nodes.  What does the
> reference model say about nodes that have XML in them?  My guess is that it
> doesn't care.  So why should XTM?  XML provides an ability to shield off the
> markup that matters to an XTM processor, so why should it care about the
> node contents?
> [...]

The reference model talks about Topic Maps. XTM is a specific interchange
syntax for Topic Maps. The RM is defined in terms of abstractions that
have nothing to do with XML, so it says nothing about the constraints
incurred by using XML. You can create a Topic Map using napkins and felt
markers and the RM says how that all works.

The idea that XML provides an ability to "shield off" markup that matters
is where the problem arises -- the non-XTM stuff is left hanging in some
nether world of unknown semantics, unknown processing, unknown application
handling.

>>Freese, Eric D. (LNG-DAY) wrote:
>>
>>>It has also been said that XTM is for interchange.  I agree.  However, the
>>>additional markup in my data represents some of the semantic information of
>>>my data - the same semantics I want to interchange. If I have to strip this
>>>semantic information out of my data for the sake of meeting some arbitrary
>>>requirement of an interchange standard, it might be considered a limitation
>>>that I don't necessarily need to live with.  Now this "interchange" standard
>>>is telling me what of my semantic information I can and can't interchange.
>>>So is it really a useful interchange standard?  One might wonder.

One might. I don't. If you require a standard interchange syntax to be
able to allow *everyone* to include *everything*, how can that possibly
still be considered an interchange syntax?

>>Murray Altheim wrote:
>>That argument doesn't hold much water, in the sense that if we consider
>>the idea that Topic Maps should be able to interchange *any* semantics,
>>i.e., any markup, and we then allow any markup language within XTM, such
>>as MathML, SVG, XHTML, RSS, PDQ, XYZ (all manner of unknown markup), then
>>there's no way that anyone can expect to know what to do with an XTM
>>document. You have the same expectation in being able to correctly process
>>XTM with arbitrary markup as you do being able to process arbitrary
>>markup, which is just about zilch. XTM's PCDATA is a minimal exchange.
>>You have to strip markup that you say bears semantic information to force
>>it into PCDATA. Fair enough. That's the kind of compromise one must make
>>in order to guarantee interchange. You don't use really rare and complex
>>language when speaking with a non-native speaker either.
> 
> So now you appear to be saying that the topic map model is limited in the
> semantics it can be used to interchange?  What does that do to the Newcomb &
> Co. vision of "global knowledge federation"?  Has anyone bothered to
> document these limitations to educate the public?  Is this really the right
> group of people to determine what semantics can and cannot be interchanged?
> When in France you speak French or hope you find someone that speaks your
> language.  When processing data in a LexisNexis stream, we use LN markup.
> Both ends of the process understand LN markup, and communication/processing
> occurs.  Don't forget that people are using XML (and hopefully XTM) for
> interchange between internal processes too.  

You're misinterpreting what I said, and I have difficulty believing
you think that's what I meant. The "semantics" (gad how I hate that
word) of text (e.g., PCDATA) is completely unlimited. We have hundreds
of thousands of books with plenty of "semantics" expressed in text. The
Topic Map paradigm does not rely on your ability to handle LexisNexis
in some custom application to still work. You can create custom Topic
Map applications all you like. The whole point of the last few years'
work has been to develop a means of establishing what a Topic Map
Application is. I'm arguing for purity of the interchange syntax, not
trying to stop you from creating your own TMAs. Global knowledge
interchange doesn't happen when you allow proprietary markup that
obscures the ability to interchange; that creates islands of proprietary
functionality. It runs completely counter to Newcomb & Co.'s vision.

>>Now, a third time's a charm -- I've offered to create an XHTML+XTM DTD,
>>which I think would be a reasonable compromise in the balance between
>>interchange and complete anarchy. A fairly big step up from PCDATA, given
>>that the entire Web has been satisfied with it. Now, I'm guessing there
>>are those who will still demand complete anarchy, the ability to embed
>>any kind of markup within an interchange syntax. They're hopefully in
>>the 5-10% use case scenario. Allowing arbitrary markup within XTM is by
>>definition creating a useless standard, a non-standard, a format that
>>guarantees a complete lack of guarantee of interoperability.  That's what
>>"arbitrary" means, eh? You can't have it both ways. You can't have the
>>ability to do anything you like *and* have everyone be able to correctly
>>understand it.
> 
> Thanks for the offer.  I think we should take you up on it.  It's a step in
> the right direction.

Well, I'm only interested if it has any meaning. If those controlling
the further development of the standard decide to allow arbitrary XML
markup, there's little need for an XHTML+XTM DTD. The game's up at that
point.

> Why do you assume that you must "understand" it?  If you have a topic map in
> English and I have one in Swahili and some merging occurs based on the
> well-defined rules for merging, you, most likely, cannot understand the
> content of the data, correct?  Does that mean the new topic map is trash?
> No.  So why would it be any different for additional markup?  It's just a
> dialect that isn't understood, but yet has value.

By "understand" it I mean that a Topic Map application can receive *any*
XTM document and be guaranteed of the ability to correctly process the
content, unambiguously, without having to throw away, "store", hide,
spindle, mutilate, or otherwise incorrectly handle it. If that means that
XTM documents can't directly embed LexisNexis content, that's absolutely
fine. XTM is meant as a standard, interchange format. Your application is
one of thousands. We can't accommodate everyone's pet project in a
standard (it was tried with HTML).

> Why can't we standardize the topic map-ish stuff and freely admit that we're
> not going to touch anything outside of that domain?

What does that mean? "not touch"? If I send you an XTM document today,
it will function essentially the same within any XTM-compliant application.
I can open Steve's Opera Topic Map with Ceryle. If suddenly the freedom
to embed any proprietary markup within XTM exists, my application now
has to deal with whatever the hell happens to be there.

> Interoperability occurs when predictable things happen to the elements
> within the XTM namespace.

Untrue. Interoperability occurs when predictable things happen to
the elements within XTM *documents*. Allowing non-XTM content means
that application A differs from application B differs from application
C in handling any given document containing markup that only some of
A, B, and C can correctly process. If application B differs from A and
C in being able to handle MathML, users of A and C don't have the same
experience as users of B. That's PRECISELY the problem with Microsoft's
approach to software. Hell, they managed to chair a Unicode committee
and added character-level codes for bold and italic and positioning
into Unicode, so that only Microsoft applications (or vendors who
were willing to do what Microsoft did) would "correctly" process those
weird-ass codes.
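
To make the two positions concrete, here is a minimal Python sketch, not
taken from any existing XTM tool. It assumes the XTM 1.0 namespace URI; the
function names and the sample document are hypothetical. The first function
reports which elements a processor would meet inside <resourceData> that it
has no rules for; the second shows the PCDATA compromise, keeping the text
and dropping the markup.

```python
import xml.etree.ElementTree as ET

# XTM 1.0 namespace; everything else in this sketch is illustrative.
XTM_NS = "http://www.topicmaps.org/xtm/1.0/"
RESOURCE_DATA = "{%s}resourceData" % XTM_NS

# Hypothetical document that embeds MathML inside resourceData.
SAMPLE = (
    '<topicMap xmlns="http://www.topicmaps.org/xtm/1.0/">'
    '<topic id="t1"><occurrence><resourceData>E = '
    '<m:mi xmlns:m="http://www.w3.org/1998/Math/MathML">mc</m:mi>'
    ' squared</resourceData></occurrence></topic></topicMap>'
)


def foreign_elements(xtm_text):
    """Return the sorted qualified names of all elements nested in resourceData."""
    root = ET.fromstring(xtm_text)
    found = set()
    for rd in root.iter(RESOURCE_DATA):
        for el in rd.iter():
            if el is not rd:  # iter() yields the resourceData element itself first
                found.add(el.tag)
    return sorted(found)


def flatten_to_pcdata(xtm_text):
    """Return the parsed tree with each resourceData reduced to its text content."""
    root = ET.fromstring(xtm_text)
    for rd in root.iter(RESOURCE_DATA):
        text = "".join(rd.itertext())  # collect text before removing children
        for child in list(rd):
            rd.remove(child)
        rd.text = text
    return root


if __name__ == "__main__":
    print(foreign_elements(SAMPLE))
    flat = flatten_to_pcdata(SAMPLE)
    print(next(flat.iter(RESOURCE_DATA)).text)
```

An application that knows MathML could act on what the first function
finds; one that doesn't can only fall back on something like the second.
Two such applications handle the same document differently.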

> I would expect that *within* LexisNexis,
> interoperability would occur based on the other markup as well because
> something predictable will happen to it as well.  I wouldn't expect Joe Blow
> off the street to know what to do with the LN markup.  As a topic map owner,
> I could translate the LN markup into XHTML or something more general or even
> strip it before the topic map went out for public interchange.  But, as I
> said, the markup is VERY useful in internal applications and that's where
> the requirement comes from.

So use it in internal applications. But don't expect a standardized
interchange syntax to allow it. Where's the logic in that?

>>So I'm guessing the XHTML+XTM DTD wouldn't do it?
> 
> Not for my use case.  We plan to use real XML with semantic tag names and
> everything.  But I could see its use for those whose application is only
> presentation.

I find it only humorous that you somehow think "real XML" has "semantic"
tag names (whatever that means), and by inference that XTM is perhaps
not "real XML". Eric, you're too much of a markup expert to seriously
mean that. Get real. SVG, MathML, all function because they are distinct
markup languages. What I hear now is that some people don't want markup
languages, they want arbitrary XML markup, i.e., no restrictions. This
sounds like Dave Raggett talking, not you. Few of the markup experts I've
talked to in the last five years (including about half of the original
XML WG) think XML Namespaces is anything but a colossal failure. If
that's "real XML" I prefer the unreal.

Murray

...........................................................................
Murray Altheim                         http://kmi.open.ac.uk/people/murray/
Knowledge Media Institute
The Open University, Milton Keynes, Bucks, MK7 6AA, UK                    .

   Entitled Continuing Collateral Damage: the health and environmental
   costs of war on Iraq, the report estimates that between 22,000 and
   55,000 people - mainly Iraqi soldiers and civilians - died as a direct
   result of the war.
   http://news.bbc.co.uk/1/hi/world/middle_east/3259489.stm

   Entitled Continuing Collateral Damage? ...a euphemism for BushCo.