[sc34wg3] CTM: The arguments for standardization

Mason, James David (MXM) sc34wg3@isotopicmaps.org
Thu, 21 Jul 2005 16:15:44 -0400

Lars Marius has just made what looks to me to be the clearest statement so
far of why one part of WG3 wants to standardize CTM. I think we need to pin
this down further.

Our predecessor (SC18) had a formal process for "User Requirements". I [...]
it at the time (because it was used offensively, to attempt to derail
and prevent development of DSSSL and HyTime). However, I think we need
part of the process here.

Lars Marius's posting is a start on that process.

One of the first questions to be answered is "Who are the users?" (A lot of
the warfare in SC18 turned on that point, particularly the wide gap
between the SGML crowd, who were writing a language for end users to use
in creating documents, and the ODA crowd, who eventually admitted they were
writing for other standards developers. [Yes, we expected people to type
SGML, chicken lips and all, because there were no other tools than editors
like EMACS when we started. Having been through that is one reason I prefer
tools that can disguise the syntax issues entirely.])

I see several kinds of users emerging in Lars Marius's posting, ranging from
standards developers to people who type examples in papers. We need to
clarify those roles, even if some people (like LMG) may fill more than one.
Knowing who the users are is the first step to deciding whether this
needs to be standardized.

(For the record, I think saying this is needed for standards [...]
that it will be used in other standards, is a whole lot better excuse than
saying that people need a quick-and-dirty way of developing test TMs or
that they need an efficient notation for papers. Developing test data
won't cut it as an excuse for a standard, and needing a symbolic notation
for papers isn't much better. In the latter case, you might do as well using
the symbolic logic notation that's accepted in mathematics or the KM/KE
communities, though going as far as the full-blown logic notation that [...]
has provided us for TMRM is likely to be intimidating. See John Sowa.

I don't think compactness per se is justification for standardization. [...]
of the lumps in SGML are the result of attempts at compactness (e.g.,
DATATAG, attribute minimization). What we learned from SGML is that once a
standard gets real implementation support, the notation is written by
machines, not humans, and how ugly it is matters little. If you want
compactness, go binary. If a standard is successful, the only humans (?) who
have to deal with the raw notation are developers, and they're paid to

I don't believe that having a higher bar for implementation is a good
excuse to bar development. As several people have said, the hard part is
instantiating the underlying model, not building an input parser/output

I am concerned that we clarify why we have multiple notations. I think we
could start by stipulating that XTM is intended as a transfer syntax and CTM
is intended as a notation to be used inside something else but not for
transfer. I might buy that, but I'll resist any attempt to justify CTM as a
language for creating TMs per se. I don't care how you create TMs in the
darkness of your own laboratory, so long as you can ship them to me in XTM.
[I will say that for myself, I generate TMs either by processing other
data through XSLT or by using editors designed around the semantics of
TMs.] How you do it in your shop is of little concern to SC34.)

What I think we need now is a further clarification of who the users for CTM
are and, if TMCL and TMQL are part of the justification, why they need
special syntaxes. Again, I don't think compactness alone cuts it.

(As I have mentioned, one of our worst mistakes in creating SGML was to
create a special syntax for DTDs. It was more compact, but it was [...]
syntax, and we've paid dearly for it ever since. Perhaps what disturbs me
most about things like LTM is that they remind me of some of the early stages
of the development of the DTD language. I also see delimiters [...]
I've never quite recovered from some of the criticism of one of the early
versions of SGML for having too many delimiters. On the other hand, you
wouldn't want to go down one of the paths we tried for DTDs, which was
an entirely positional notation without delimiters [other than connectors
for optionality, sequence, or, etc.]. W3 Schema is hideously ugly, as well as
unspeakably verbose, but at least things are named for what they are, and
new delimiters aren't introduced. 19757-2, original notation, is an obvious
example of how to do the job better.)

'Nuff said for one day.

Jim Mason