[sc34wg3] CTM: The arguments for standardization

Mason, James David (MXM) sc34wg3@isotopicmaps.org
Thu, 21 Jul 2005 16:15:44 -0400


Lars Marius has just made what looks to me to be the clearest statement so
far of why one part of WG3 wants to standardize CTM. I think we need to tie
this down further.

Our predecessor (SC18) had a formal process for "User Requirements". I hated
it at the time (because it was used offensively, to attempt to derail SGML
and prevent development of DSSSL and HyTime). However, I think we need some
part of the process here.

Lars Marius's posting is a start on that process.

One of the first questions to be answered is "Who are the users?" (A lot of
the warfare in SC18 turned on that point, particularly the wide divergence
between the SGML crowd, who were writing a language for end users to employ
in creating documents, and the ODA crowd, who eventually admitted they were
writing for other standards developers. [Yes, we expected people to type in
SGML, chicken lips and all, because there were no other tools than editors
like EMACS when we started. Having been through that is one reason I prefer
tools that can disguise the syntax issues entirely.])

I see several kinds of users emerging in Lars Marius's posting, ranging from
standards developers to people who type examples in papers. We need to
clarify those roles, even if some people (like LMG) may fill more than one.
Knowing who the users are is the first step to deciding whether this thing
needs to be standardized.

(For the record, I think saying this is needed for standards development,
that it will be used in other standards, is a whole lot better excuse than
saying that people need a quick-and-dirty way of developing test TMs or even
that they need an efficient notation for papers. Developing test data just
won't cut it as an excuse for a standard, and needing a symbolic notation for
papers isn't much better. In the latter case, you might do as well using some
symbolic logic notation that's accepted in mathematics or the KM/KE
communities, though going as far as the full-blown logic notation that \rho
has provided us for TMRM is likely to be intimidating. See John Sowa.

I don't think compactness per se is justification for standardization. Many
of the lumps in SGML are the result of attempts at compactness (e.g.,
DATATAG, attribute minimization). What we learned from SGML is that once a
standard gets real implementation support, the notation is written by
machines, not humans, and how ugly it is matters little. If you want
compactness, go binary. If a standard is successful, the only humans (?) who
have to deal with the raw notation are developers, and they're paid to
suffer.

I don't believe that having a higher bar for implementation is a sufficient
excuse to bar development. As several people have said, the hard part is
instantiating the underlying model, not building an input parser/output
writer.

I am concerned that we clarify why we have multiple notations. I think we
could start by stipulating that XTM is intended as a transfer syntax and CTM
is intended as a notation to be used inside something else but not for data
transfer. I might buy that, but I'll resist any attempt to justify CTM as a
language for creating TMs per se. I don't care how you create TMs in the
darkness of your own laboratory, so long as you can ship them to me in XML.
[I will say that for myself, I generate TMs either by processing other XML
data through XSLT or by using editors designed around the semantics of my
TMs.] How you do it in your shop is of little concern to SC34.)

What I think we need now is a further clarification of who the users for CTM
are and, if TMCL and TMQL are part of the justification, why they need
special syntaxes. Again, I don't think compactness alone cuts it.

(As I have mentioned, one of our worst mistakes in creating SGML was to
create a special syntax for DTDs. It was more compact, but it was another
syntax, and we've paid dearly for it ever since. Perhaps what disturbs me
most about things like LTM is that they remind me of some of the early stages
of the development of the DTD language. I also see delimiters proliferating.
I've never quite recovered from some of the criticism of one of the ballot
versions of SGML for having too many delimiters. On the other hand, you
wouldn't want to go down one of the paths we tried for DTDs, which involved
an entirely positional notation without delimiters [other than connectors for
optionality, sequence, OR, etc.]. W3 Schema is hideously ugly, as well as
unspeakably verbose, but at least things are named for what they are, and new
delimiters aren't introduced. 19757-2 (RELAX NG), original notation, is an
obvious example of how to do the job better.)

'Nuff said for one day.

Jim Mason