[sc34wg3] Editors' drafts of TMDM and XTM 1.1

Murray Altheim sc34wg3@isotopicmaps.org
Wed, 11 Jan 2006 02:31:33 +0000


[I've had this sitting unfinished for a week or so, but I'm not
going to be able to devote any time to it in the next week, so
here it is, unvarnished glory or whatever. I haven't had a chance
to look over Steve P.'s recent message entitled "XTM 2.0" yet,
just hitting Send...]

Lars Marius Garshol wrote:
> * Murray Altheim
> 
>>If the sole good reason for maintaining that the new markup be
>>called "XTM" is a marketing one, I have little else to say on
>>the matter. Markup abuse is a form of semantic violence -- if
>>it's done in the name of marketing, so be it.
> 
> The first question that has to be settled is whether the new XTM  
> really is so different from the old. If it's not, then *that* is the  
> reason to keep the name. Personally, I think the differences are very  
> minor, semantically. The biggest differences are in naming, and in  
> that processing is now much simpler than it was.

The following message was edited over more than a week, so it's
not entirely succinct. Looking over the changes, I find they are not
minor and not merely about naming, and if the processing is simpler,
it's also less specified, more arbitrary, and now relies upon
external specifications that are either unfinished, more
complicated, or bring in unnecessary and unwanted semantics.
For example, by dumping XLink we've now entirely lost the linking
model without specifying a replacement, leaving linking semantics
and behaviour unspecified apart from some hand-waving text about links.

> The list of differences is long, but the differences are all small,  
> and most of them are made at the fringes of XTM, to remove features  
> much hated by all implementors, and used by no user in the history of  
> the syntax. (I'm referring to things like[...]

Pulling the list of differences from the XTM 1.1 draft:

> The differences are: 

> The namespace URI has changed.

Yes, absolutely.

> The version attribute has been added to the topicMap element. 

There was no 'version' attribute in XTM 1.0, as the XML namespace
identifier worked for that. I'm not sure why a 'version' attribute
is really necessary, as it's redundant with the XML namespace. If
the two are in conflict, what happens? Not specified.
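To make the question concrete: a processor now has two version signals to reconcile, and nothing says what to do when they disagree. The sketch below, using only Python's standard `xml.etree.ElementTree`, shows one possible check. The XTM 1.0 namespace URI is the real one; the second namespace entry is a hypothetical placeholder, and rejecting a mismatch is an assumed policy, not anything the draft specifies.

```python
import xml.etree.ElementTree as ET

# The XTM 1.0 namespace URI is real. The second entry is a
# HYPOTHETICAL placeholder for the namespace of the new draft.
KNOWN_NAMESPACES = {
    "http://www.topicmaps.org/xtm/1.0/": "1.0",
    "http://example.org/hypothetical-new-xtm/": "1.1",
}


def declared_version(xml_text):
    """Return the XTM version of a document, cross-checking the
    'version' attribute against the namespace URI."""
    root = ET.fromstring(xml_text)
    # ElementTree reports qualified names as '{namespace}localname'.
    if not root.tag.startswith("{"):
        raise ValueError("root element is not in any namespace")
    namespace, _, local_name = root.tag[1:].partition("}")
    if local_name != "topicMap":
        raise ValueError("root element is not topicMap")
    ns_version = KNOWN_NAMESPACES.get(namespace)
    attr_version = root.get("version")
    # Assumed policy: if both signals are present and disagree, refuse
    # the document. The draft does not say what should happen.
    if ns_version and attr_version and ns_version != attr_version:
        raise ValueError(
            f"namespace implies {ns_version}, version attribute says {attr_version}"
        )
    return attr_version or ns_version
```

Here a 1.0-namespace <topicMap> with no `version` attribute yields `"1.0"`, while the same element carrying `version="1.1"` raises a `ValueError`. Another processor could just as reasonably let either signal win, which is exactly the interoperability gap.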

> The parameters element has been replaced by scope. 

I'm not sure if people remember the reason we didn't use 'scope' in
the first place, but I thought there were some pretty good reasons,
namely that 'scope' isn't the proper term to describe parameters that
alter variants -- it's an entirely different function, and 'scope' is
being overloaded here.

> The roleSpec element has been replaced by instanceOf. 

Huh? This is considered a simplification, or are we deliberately
abusing the terminology? We need a specification of the role in
an Association, not of what the member is an instance of.

> The member element has been replaced by role. 

We have an Association that has no members, only roles? This
sounds like Associations are now being limited to not modeling
or containing actual Topics, only classes of Topics.

> A single topic reference is now required as the child of role. 

So we can no longer model Associations between groups of Topics,
and have specifically decided to limit Associations to being
bipartite (relations with only two members).

> The baseName element has been replaced by topicName. 

Throwing away the entire baseName structure is a minor change?

> The instanceOf element is now allowed inside topicName. 

Creating an entirely new inner structure within Topic names
is a minor change?

> The variantName and subjectIdentity elements have been removed. 

A simplification that removes a container.

> The variant element can no longer be nested. 

While some people might have found this odd, the hierarchy of
variants did allow selection of a specific Topic name based on
a specific set of accumulated variant parameters. If this
feature (which is decidedly complicated) is being removed,
that's hardly a 0.x version change. It removes an entire
feature of the language.

> The instanceOf element is now required inside occurrence,
> association, and role.

So forget authoring. I often store an XTM document before all
of its <instanceOf> elements are in place, as I typically
populate the Topic Map with content prior to adding in all
the class information. This requirement would prevent me from
storing my work-in-progress XTM documents as valid XTM.

This kind of thing should be described in a higher-level schema,
not at the syntax level.

> The mergeMap element no longer supports added scope. 

Ugh. So now when I merge in another XTM document I have no
ability to un-merge it or determine where a Topic comes from?

This is a minor change?  Uh -- no, not for anyone who actually
uses this feature.

> The id attribute has been removed from all elements except topic,
> and the reifies attribute has been added on some elements. 

While some people may not see the *need* for an ID on all elements,
it never hurt to have one. The presence of an ID doesn't mean
that any element must be reifiable; it means that the ID can
be used by things like XSLT stylesheets and other syntax-level
processes to canonically identify an XML element within the
document. Removing that ability means that processors would
likely need to rely on positional XPath expressions for certain
kinds of processing, and given that XTM is a bag and not a
sequence (the order of elements carries no meaning), some of
those processes would no longer be possible.

This did no harm and should be reconsidered. If some people are
concerned about the reification of those elements, make a list of
the elements that can be reified and include it in the prose.
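A small illustration of why element IDs matter to syntax-level tools when element order is not significant. In the sketch below (Python's standard `xml.etree.ElementTree`), the two documents contain the same topic with its occurrences in different order, as a bag permits. Looking up an occurrence by its `id` finds the same element in both; looking it up by position (here `xtm:occurrence[2]`, which ElementTree's limited XPath supports) does not. The topic, IDs, and resource data are invented for illustration, and the `id` attributes on <occurrence> follow XTM 1.0, which the new draft would no longer allow.

```python
import xml.etree.ElementTree as ET

NS = {"xtm": "http://www.topicmaps.org/xtm/1.0/"}

# Two serializations of the same (invented) topic. Only the order of the
# <occurrence> children differs, which means nothing in a bag.
DOC_A = """<topicMap xmlns="http://www.topicmaps.org/xtm/1.0/">
  <topic id="puccini">
    <occurrence id="occ-bio"><resourceData>Biography</resourceData></occurrence>
    <occurrence id="occ-works"><resourceData>Works</resourceData></occurrence>
  </topic>
</topicMap>"""

DOC_B = """<topicMap xmlns="http://www.topicmaps.org/xtm/1.0/">
  <topic id="puccini">
    <occurrence id="occ-works"><resourceData>Works</resourceData></occurrence>
    <occurrence id="occ-bio"><resourceData>Biography</resourceData></occurrence>
  </topic>
</topicMap>"""


def by_id(xml_text, ident):
    """Find an element by its id attribute, wherever it sits."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        if element.get("id") == ident:
            return element
    return None


def by_position(xml_text, n):
    """Find the n-th occurrence (1-based) of the first topic."""
    root = ET.fromstring(xml_text)
    return root.find(f"xtm:topic/xtm:occurrence[{n}]", NS)


def data_of(occurrence):
    """Return the text of an occurrence's <resourceData> child."""
    return occurrence.find("xtm:resourceData", NS).text
```

Looking up `"occ-works"` by ID gives `"Works"` for both serializations, whereas `by_position(..., 2)` gives `"Works"` for one and `"Biography"` for the other.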

> The itemIdentity, subjectLocator, and subjectIdentifier elements
> have been added. 

No problem? Just a minor change?  You've made substantive changes
to the whole way that identity and reification are being managed,
and that's a point version change? No.

> The subjectIndicatorRef and resourceRef elements have been removed. 

I wasn't aware of the reasons, but I see that we've decided to
completely revamp the whole subject identity handling. One of the
advantages of XTM over RDF was the ability to characterize the
relation between the reference and the referenced entity. I see
that the ISO committee no longer sees this as important? That
harmonization with RDF was the reason?  Hard to understand, or at
least hard to understand this as a point version change. Very
profound.

> XTM no longer uses XLink and XML Base.

Cripe! Really??????? Well, you've just lost the linking model, and
the ability to state the canonical location of an XTM document. In
the former case you'll need to restate in its entirety the linking
model from XLink; otherwise you've left linking completely arbitrary.
For xml:base there is no substitute, and at least for all of my own
work this alone would keep me from using XTM 1.1. I need to be able
to canonically specify the base address of my documents so that
they are portable, otherwise I've got to include a subject identity
URI statement for every single <topic>. Ugh.
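The portability argument can be sketched in a few lines. Relative references such as `#opera` resolve against the document's base address. With `xml:base` that address is declared inside the document, so a copy loaded from anywhere still yields the same canonical topic addresses; without it, they silently follow wherever the file happens to be. The sketch below assumes XTM 1.0 markup (XLink `href` on <topicRef>), uses only Python's standard library, and deliberately handles only a root-level `xml:base`. The document and the example.org address are invented.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Clark-notation names as ElementTree reports them. The 'xml' prefix is
# predeclared, so xml:base needs no namespace declaration in the document.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"
XTM = "{http://www.topicmaps.org/xtm/1.0/}"

# Invented example document; the example.org address is hypothetical.
DOC = """<topicMap xmlns="http://www.topicmaps.org/xtm/1.0/"
          xmlns:xlink="http://www.w3.org/1999/xlink"
          xml:base="http://example.org/maps/opera.xtm">
  <topic id="tosca">
    <instanceOf><topicRef xlink:href="#opera"/></instanceOf>
  </topic>
  <topic id="opera"/>
</topicMap>"""


def resolved_topic_refs(xml_text, retrieved_from):
    """Resolve every topicRef against the document's base address.

    Simplification: only an xml:base on the root element is honoured;
    nested xml:base attributes are ignored.
    """
    root = ET.fromstring(xml_text)
    # A declared xml:base overrides wherever this copy was loaded from.
    base = root.get(XML_BASE, retrieved_from)
    return [urljoin(base, ref.get(XLINK_HREF)) for ref in root.iter(XTM + "topicRef")]
```

Loaded from `file:///tmp/copy.xtm`, the reference resolves to `http://example.org/maps/opera.xtm#opera`; strip the `xml:base` attribute and the same reference becomes `file:///tmp/copy.xtm#opera`.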

> The mergeMap element must now come before all topic and association
> elements. 

There should be no ordering requirement of XTM documents. It's not
a sequence, it's a bag. If applications need to process <mergeMap>
elements first, they should pull them from the graph and process
them first.
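Pulling <mergeMap> elements out first takes only a few lines, which is the point: the syntax doesn't need an ordering constraint to support it. A minimal sketch with Python's standard `xml.etree.ElementTree`, assuming XTM 1.0 markup (`xlink:href` on <mergeMap>). The document and the file name `composers.xtm` are invented, and the merge itself is not shown.

```python
import xml.etree.ElementTree as ET

XTM = "{http://www.topicmaps.org/xtm/1.0/}"
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Invented document: the <mergeMap> sits between other elements,
# which XTM 1.0 allows.
DOC = """<topicMap xmlns="http://www.topicmaps.org/xtm/1.0/"
          xmlns:xlink="http://www.w3.org/1999/xlink">
  <topic id="tosca"/>
  <mergeMap xlink:href="composers.xtm"/>
  <topic id="puccini"/>
</topicMap>"""


def split_merge_maps(xml_text):
    """Return (mergeMap targets, other top-level element names), so the
    merge directives can be handled first wherever they appear."""
    root = ET.fromstring(xml_text)
    merge_targets = [m.get(XLINK_HREF) for m in root.findall(XTM + "mergeMap")]
    others = [child.tag[len(XTM):] for child in root if child.tag != XTM + "mergeMap"]
    return merge_targets, others
```

An application would then process `composers.xtm` before the remaining topics, regardless of where the <mergeMap> appeared in the file.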

> The datatype attribute has been added to resourceData, which also
> now supports embedded markup.

Both very substantial changes, both in terms of semantics and in
terms of processing requirements. And *please* don't pull a W3C
on me and say that you can "just ignore the markup you don't
understand." (please)

I think you're making an enormous mistake formally tying XTM to
XML Schema. There is a vast array of datatyping schemas, and the
majority of them that have any value have nothing to do with the
work of the W3C. Was this tie-in a specific decision of the ISO
committee, or just an assumption that if one needed datatyping,
the W3C had a convenient set of types? Might I suggest people be
aware of Bill Kent's book "Data and Reality" before jumping into
this fray too quickly?

If we're going to break validation and allow embedded markup
(a questionable strategy at best), I would at least highly
recommend including the 'datatype' attribute but not assigning
values outside of XTM's namespace, i.e., either leave it blank or
create XTM's own set for the necessary datatypes, with our own
definitions for our own purposes, not those tied in with
Description Logics, which is an entirely different domain than
Topic Maps, based on an entirely different set of core assumptions.
Data typing is really an application-level specification, and
probably shouldn't be included in the core graph syntax, within
the XTM namespace. If we're going to allow arbitrary markup,
the W3C has pointed the way: stuff it in anywhere you like and
ignore the consequences. Ignore the markup you don't understand,
to quote Tim and Dan. (At your peril, to quote Murray.) Put it
this way: the architecture for mixing markup languages is woefully
underspecified, and the meaning of specific instances of
embedded markup needs to be made clear, not simply in some kind
of attribute value, but in a meta-schema. If we want, e.g., XHTML
content within <resourceData>, it should be specified in a schema,
not simply (and stupidly) flagged with a namespace identifier.

> It's very difficult to argue that these changes make the new version
> into a different language.

Oh? I would say precisely that. These are substantive changes that
at *very least* warrant a 2.0, but really, there are a lot of pretty
fundamental changes. You've not just changed names, you've eliminated
a lot of specialized semantics in favour of reusing existing ones
that don't have the same meaning (e.g., <roleSpec> is by no definition
the same as <instanceOf>). You've also invented a lot of new features,
such as the reification and itemIdentity features. The whole subject
identity machinery has changed. That's pretty fundamental.

>>If the W3C were to put out a new version of XHTML that fundamentally
>>altered the way that browsers handled the markup, they'd have a bit
>>of a tough time selling the idea, marketing or not. The HTML WG had
>>a requirement that XHTML be roughly compatible with HTML, and even
>>so we changed the name. The changes being proposed for XTM are a lot
>>more fundamental than HTML --> XHTML. For example, TM4J would have to
>>make fundamental alterations to its processing of XTM documents, far
>>more than simply being able to parse the incoming markup.
>>[...]
>>I think this again is confusing the issue with XML Namespaces. This
>>isn't just a namespace change, it's a different underlying model too.
> 
> 
> I'm sorry, Murray, but what you are writing here just isn't true.  
> Every Topic Maps engine I know need only create a new importer and  
> exporter for the new XTM version, and they should be fine. (Except  
> for datatypes, but we added support for those more than a year ago,  
> and in any case that's just a simple extension and hardly a break  
> with previous tradition.)

If this were actually true, I'd have less of a problem with the issues.
It's just that you seem to be talking in two directions at once. In
your last message you said:

        The changes to XTM are *very* extensive, as nearly every part of
        the document has changed, so it's very important to give it a
        proper review.

Now you're saying that little has changed. Which is it?

I think the idea that things "should be fine" indicates that you're
willing to entirely sweep all the changes in semantics under the
carpet. This is the kind of thing the W3C does all the time, but I
hardly think we should follow suit.

> The new XTM version, in fact, maps to a model that is IDENTICAL to  
> the one that the old XTM version mapped to. Compare the 2005-07-20  
> and 2005-12-16 drafts, and you'll see. (Thanks to your suggestion a  
> few months back, they are both on the web.)

I'm comparing XTM 1.0 and the current draft, not TMDM versions both
from 2005.

> You could argue, of course, that XTM 1.0 had a different model, but  
> the truth is that it had no explicit model (much to the disgust of  
> many people at the time it was published). But TMDM really was  
> created to be that missing model, and in the last 4.5 years we've  
> bent over backwards to maintain compatibility with XTM 1.0, even if  
> this was actually quite hard.

I'm not sure if it's fair to say that there was disgust at there
being a missing model. There had been a very diligent attempt at
developing a model, which is of course roughly where the entire
rift in this community began. We did manage to include a Conceptual
Model as Annex F of XTM 1.0:

       http://www.topicmaps.org/xtm/1.0/index.html#conceptualmodel

for which I seem to remember creating the UML graphics. Now, yes, it
can be argued that an explicit data model of XTM did not exist
because the community could not agree on it. That the TMDM is now
the model endorsed by the ISO committee is a matter of history. Okay.
So we move forward.

> As for arguing that the markup languages themselves are wildly  
> different; well, just look at the example I posted. What's actually  
> different there from the same topic map in XTM 1.0? Two new  
> attributes, and four elements that have changed name. Of those  
> elements, two have reverted to what is effectively their names in ISO  
> 13250:2000. That leaves <value> which I firmly believe is an  
> improvement over <baseNameString>, and which could hardly be said to  
> be a major change. The fourth is <subjectIndicatorRef>, which has now  
> become <subjectIdentifier>. (Okay, so I skipped that <parameters> has  
> been replaced with <scope> and <roleSpec> with <instanceOf>, but  
> <scope> and <instanceOf> are XTM 1.0 elements, and this is just a  
> simple consistency change.)
> 
> So where is the big difference?

I don't have a real problem with changing <parameters> to <scope>,
except to echo what Michel and Steve N. were saying back in 2000:
that "scope" is actually an incorrect name, since variants have
parameters, not scope, which is reserved for names. Likewise,
<roleSpec> specifies the roles played by the members, not any
kind of "instanceOf" function. But that's nitpicking. We should
perhaps not care whether things are named correctly if we can be
more consistent, non? IOW, the arguments for using <parameters> and
<roleSpec> in XTM 1.0 were based on choosing a name consistent with
the actual meaning of the markup, not an attempt at economizing on
element types at the expense of accuracy.

The list of changes doesn't include two really important ones, the
choice to move from URIs (a recognized standard) to IRIs (which are
only an RFC right now and are rarely supported by existing software,
whereas URIs are built into Java, Python, etc.).
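To give a sense of what the switch asks of URI-only software: before an IRI can be handed to a URI library, it has to be mapped to a URI. A minimal sketch of that mapping, loosely following RFC 3987 section 3.1 and using only Python's standard `urllib.parse.quote`. The example IRI is invented, and the host name is assumed to be plain ASCII.

```python
from urllib.parse import quote

# Characters left untouched: RFC 3986 reserved characters, plus '%' so
# that existing escapes survive. Letters, digits and "_.-~" are always
# left alone by quote().
URI_SAFE = "!#$&'()*+,/:;=?@[]%"


def iri_to_uri(iri):
    """Map an IRI to a URI in the spirit of RFC 3987, section 3.1: every
    other character is UTF-8 encoded and percent-escaped. Assumes an
    ASCII host name; internationalized hosts need IDNA, not shown here."""
    return quote(iri, safe=URI_SAFE)
```

An all-ASCII URI such as the XTM 1.0 namespace passes through unchanged, while `http://example.org/tópico` becomes `http://example.org/t%C3%B3pico`.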

You've also used terminology like "topic item" rather loosely, as
it is not defined anywhere in the document. Should I assume that
the "3 Terms and definitions" section will eventually include
definitions for everything that is currently unspecified? This is
mirrored in the way that terms for reification, etc. are handled:
used but not defined.

The choice to base deserialization on the XML Infoset may be
welcomed by some, but certainly not by me. I don't see it as an
improvement, just a snazzy reference to a cool-sounding W3C spec that
seems to confuse a lot of people. I really love statements like this
one:

     "Reliance on any particular behaviour in the XML processors
      used by recipients is strongly discouraged."

If developers can't rely on any particular behaviour, what can they
rely upon? I don't get it. We suddenly have in front of us a number
of sketchy new ideas (e.g., IRIs, Infoset, reification, topic items)
that are either undefined or point at theoretically functional specs,
but taken as a whole I don't see the processing model through the
haze. While Annex F in XTM 1.0 was perhaps only partial, it at least
left developers with some scrap of an idea of what to build. The
current document does not sync up with XTM 1.0 and yet does not
point the way for XTM 1.0+ processors. If the spec for XTM 1.0+
isn't going to include any behaviours (and is just a data model),
then this should be clearly stated.

Perhaps I don't understand the rationale, not having been party
to the discussions, but I don't see the current specification as
an improvement over XTM 1.0. The things that you've characterized
as "simplifications" seem to commonly overload terms (i.e., element
types); removal of theoretically unnecessary container elements
removes some of the semantics as well as actually making it harder
to process (e.g., one must go up to the parent element -- usually
<topic> -- to obtain the list of multiple children, which might be
interspersed with other elements, rather than just locating the
properly-named container and its direct child elements); the whole
variant apparatus is gone and topic naming substantially altered;
all of the subject identity and reification changes are hardly
inconsequential, either semantically or in terms of processing.
The latter alone is an *enormously profound* change, one that is
certainly not discussed in Annex E.

All in all, I don't think I'll be using the newer version of XTM
for anything I'm working on. There are too many substantive changes,
and I disagree with many or most of them. This isn't just sour
grapes on my part: the draft changes the entire path XTM was on,
harmonizing it with a different model than the one it was
originally created for, and it now includes external, underspecified,
and unwanted semantics. To me this isn't just a version number issue,
it's a naming issue: this isn't XTM anymore, it's something else and
should be named accordingly, marketing arguments aside.

Murray

......................................................................
Murray Altheim                          http://www.altheim.com/murray/
Strategic Services Development Manager
The Open University Library and Learning Resources Centre
The Open University, Milton Keynes, Bucks, MK7 6AA, UK               .
