[sc34wg3] Feedback on the CTM draft
Lars Marius Garshol
larsga at garshol.priv.no
Mon Aug 7 14:43:25 EDT 2006
Some new comments before I reply to your replies:
- Why use the terms "assertion" and "assertion blocks" when TMDM uses
"statement" for (maybe) the same thing? Could we just call them
- encoding-directive is missing from the EBNF in Annex A.
* Steve Pepper
> This is exactly true. But for a first draft we felt most people would need
> something more like a tutorial, and that it would be a waste of time to be
> too much like a specification until the general design has stabilized. It's
> the latter we really need feedback on at this stage.
I think that's fair enough, but it would probably help to have a note
to this effect in the draft.
> [not true that draft defines mapping to TMDM]
> This sentence is borrowed from the XTM spec and signifies our intent: it
> will be true once Annex B has been written. That, in turn, has to wait for
> the grammar to stabilize a little more.
You mean, the mapping to TMDM will be in annex B? Hmmmmm. It seems
very strange for the real content of the specification to be in an
annex, if you ask me.
> [which BNF to use]
> We would like more input on this. In general, ISO standards are obliged to
> reuse other ISO standards when appropriate ones exist. The question is: are
> there acceptable reasons for NOT using ISO 14977?
The main problem is that it's a highly idiosyncratic EBNF syntax. It
doesn't follow the normal +*? convention, nor is it the same as the
equally idiosyncratic (but rather better known) IETF EBNF. It uses
foo to mean non-terminal foo once, but {foo}
means the same as normal foo*. So using 14977 means everyone will
have to learn another gratuitously different EBNF syntax in order to
read the CTM specification. I think ISO 14977 is a misshapen creature
that deserves a quiet death in obscurity.
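
To make the notational difference concrete, here is a minimal sketch that rewrites a small subset of ISO 14977 (braces for repetition, square brackets for optionality, comma for concatenation, semicolon as rule terminator) into the conventional postfix style. The function name and the sample production are hypothetical and not taken from the CTM draft, and the other ISO 14977 features (exceptions, repetition counts, special sequences, comments) are deliberately ignored.

```python
def iso14977_to_postfix(rule: str) -> str:
    """Rewrite ISO 14977 brackets as the postfix operators * and ?.

    Only handles {x}, [x], ',' and ';'. Whitespace is normalised at the
    end, so runs of spaces inside quoted terminals are collapsed too.
    """
    out = []
    quote = None  # quote character of the terminal we are inside, if any
    for ch in rule:
        if quote:
            out.append(ch)          # copy terminal text untouched
            if ch == quote:
                quote = None
        elif ch in "\"'":
            quote = ch
            out.append(ch)
        elif ch in "{[":
            out.append("(")         # both open a group
        elif ch == "}":
            out.append(")*")        # {x} is zero or more x
        elif ch == "]":
            out.append(")?")        # [x] is an optional x
        elif ch in ",;":
            continue                # concatenation and terminator are implicit
        else:
            out.append(ch)
    return " ".join("".join(out).split())


print(iso14977_to_postfix('block = topic, { assertion }, [ "." ] ;'))
```

The point of the sketch is only that the two notations carry the same information; the cost of ISO 14977 is in readability, not in expressiveness.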
I think you are probably right that there is a guideline saying we
should use ISO standards where we can, but in this case I think we
should just quietly pretend ignorance of ISO 14977 for as long as we
> [normative references to XTM]
> There aren't, but there might be in the future.
I can't imagine why you would need this, but, well, you are the editors.
> [why have both single- and triple-quoted strings]
> The argument was that triple-quoted strings permit strings that contain
> unescaped quote marks and that this is familiar to many users through
> Python. The question is whether this advantage is big enough to warrant the
> additional syntax. What do others think?
But triple-quotes are longer than just using the normal escape syntax:
"""it's a "feature", they say"""
"it's a \"feature\", they say"
I think this just means extra syntax to no real gain for the user.
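
The length argument can be checked with a small sketch. It assumes backslash escaping as in the example above; the two helper names are hypothetical.

```python
def escaped_literal(text: str) -> str:
    """Single-quoted form: backslash-escape backslashes and quote marks."""
    body = text.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + body + '"'


def triple_quoted_literal(text: str) -> str:
    """Triple-quoted form: no escaping, three quote marks on each side."""
    return '"""' + text + '"""'


text = 'it\'s a "feature", they say'
print(len(escaped_literal(text)), len(triple_quoted_literal(text)))
```

The triple-quoted form always costs six delimiter characters, while the escaped form costs two plus one per embedded quote mark, so triple quotes only come out shorter once a string contains more than four quote marks.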
> [datatype in occurrence templates]
> Yes. The main reason is to enable greater compactness, since the datatype
> will not have to be specified on every individual instance of an occurrence
> type whose values always have a datatype that is not "autodetected".
I think that's good.
> An additional advantage of allowing datatypes in a template MIGHT be to
> enable more datatypes to be autodetected (e.g. "2006" could be recognized as
> an xsd:gYear rather than an xsd:Integer).
Didn't get that. Surely "2006" is a string, and not an integer? Also,
why would we autodetect a type that's hard-wired in the template? If
you meant 2006 without the quotes that would make sense, but I think
it would be simpler to say that all specially typed values must be
written as strings.
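
A minimal sketch of the distinction being drawn here, under assumed rules that are not taken from the draft: a datatype declared in the template always wins, a quoted value is otherwise a string, and only an unquoted run of digits is autodetected as an integer. The function name is hypothetical.

```python
import re


def resolve_datatype(token, template_datatype=None):
    """Pick a datatype for one literal token (assumed rules, see above)."""
    if template_datatype is not None:
        # Hard-wired in the template, so there is nothing left to detect.
        return template_datatype
    if len(token) >= 2 and token.startswith('"') and token.endswith('"'):
        return "xsd:string"
    if re.fullmatch(r"[+-]?\d+", token):
        return "xsd:integer"
    raise ValueError(f"cannot determine datatype of {token!r}")


print(resolve_datatype('"2006"'))               # quoted, no template
print(resolve_datatype("2006"))                 # unquoted, no template
print(resolve_datatype('"2006"', "xsd:gYear"))  # template decides
```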
> [why both . and EOL EOL as block terminators]
> We have gone back and forth on both of these options. It seems not to be
> possible to get rid of delimiters inside assertion blocks without
> reducing expressiveness (in the case of comma), or requiring some
> additional syntax (in the case of semicolon).
Why not just use line breaks for semicolon and leave the comma as it
is? That way you could use two line breaks for the terminator, and
ditch the period.
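
A minimal sketch of that rule, assuming one statement per line and an empty line as the only block terminator. The function name and the placeholder statements are hypothetical.

```python
import re


def split_blocks(text):
    """Split text into blocks (on empty lines) and statements (on line breaks)."""
    blocks = []
    for chunk in re.split(r"\n[ \t]*\n", text.strip()):
        statements = [ln.strip() for ln in chunk.splitlines() if ln.strip()]
        if statements:
            blocks.append(statements)
    return blocks


sample = "topic-a\nstatement-1\nstatement-2\n\ntopic-b\nstatement-3\n"
print(split_blocks(sample))
```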
> Regarding the termination of an assertion block, there seemed to be
> arguments in favour of both the period (consistency with comma and
> syntax; conservation of vertical space), and the empty line (likely to be
> used for readability anyway when editing lengthy topic maps). So we ended up
> giving the user the choice.
I guess this is a matter of taste but I would prefer to see just one
of these. If the linebreak has no significance anywhere else I don't
think it should have one here.
> One point regarding TMQL (and TMCL): CTM obviously has to be aligned with
> these standards, but the fact that one or the other has made a
> design choice in its current draft is not necessarily an argument to do
> things that way. We need to find solutions that fit the requirements of all
> three standards, and that may involve some modifications to the
> drafts of TMQL and TMCL.
I guess what you are saying is that we may decide to change TMQL instead
of making CTM do what TMQL does just because TMQL does it some
particular way. I agree.
> [remove clause 6 from standard]
> Why? Because you think CTM should support all of the TMDM, or for some other
This sort of thing does not belong in a standard (it's not
normative). If you really want it, I guess it could go in as a
non-normative annex. The rationale definitely does not belong in a standard.
* Lars Marius Garshol
> Comments should not be included in the grammar, since they are removed
> in the lexing stage.
* Steve Pepper
> And yet the XML spec, which you suggest using as a model in other
> *does* include comments.
In most formal languages comments are allowed anywhere where
whitespace is, which makes it a horrible pain to have to specify it
explicitly in the grammar, because it winds up having to go
everywhere, and it almost certainly will be forgotten somewhere where
it was intended to be allowed.
In addition, if you use a parser generator this means you have to
include the comment production in your code everywhere, which again
is a real pain, but it's necessary to ensure that comments don't
occur somewhere where they are not allowed. A much easier solution
(used in most cases) is to have the lexer recognize and discard
comments so that when you are matching the token stream against the
grammar you don't see the comments at all.
Neither of these two points applies to XML. XML parsers are not
implemented using parser generators (there were some exceptions
initially, but they were horribly slow), and XML only allows comments
in a couple of places in the grammar.
The way your grammar is currently written, the following would, for
example, not be allowed:
%version ctm 1.0 # I stick to 1.0 because 1.1 sucks
which I doubt you intended. Similarly, you don't allow
puccini # FIXME: don't have all the data yet
which I don't think was intentional, either.
To avoid all of this it's better to just allow comments everywhere
whitespace is allowed, and to state this just once. It does mean that
people can write things like
%version ctm # why do we have to say it's CTM, anyway?
but if they want to, why not?
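
A minimal sketch of the lexer approach: comments and whitespace are matched and thrown away, so the grammar only ever sees the remaining tokens. The token classes are assumptions made for illustration (a `#` comment running to end of line, double-quoted strings with backslash escapes, everything else a bare word); the real CTM token set would be larger.

```python
import re

TOKEN = re.compile(r'''
    (?P<string>"(?:\\.|[^"\\])*")   # quoted string, may contain '#'
  | (?P<comment>\#[^\n]*)           # comment runs to end of line
  | (?P<space>\s+)
  | (?P<word>[^\s"#]+)              # anything else up to space, quote or '#'
''', re.VERBOSE)


def tokenize(text):
    """Return the tokens of text with comments and whitespace discarded."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected input at offset {pos}")
        if m.lastgroup not in ("comment", "space"):
            tokens.append(m.group())
        pos = m.end()
    return tokens


print(tokenize("%version ctm 1.0 # I stick to 1.0 because 1.1 sucks"))
print(tokenize("puccini # FIXME: don't have all the data yet"))
```

Because the comment rule lives in one place, the grammar needs no comment production at all, and a comment is accepted wherever whitespace is.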
> Some of the editors felt it would be wrong to prevent this; others felt we
> should encourage the best practice of keeping all directives and
> in the header. More opinions on this are solicited.
You can count me in the "only allow them at the top" camp. One reason
for this is that if you write
sort-name "puccini, giacomo" .
sort-name "else, someone" .
Then one of these will be an occurrence, and the other a name. In
other words: you can fuck up your data by simply having one statement
too high up. If it gave a parsing error it wouldn't be a problem, but
this is. (Warnings are no good.)
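
A minimal sketch of the "parsing error, not a warning" behaviour, assuming that directives are the lines starting with `%` (as `%version` does) and that `#` starts a comment. The function name is hypothetical.

```python
def check_directives_first(lines):
    """Raise SyntaxError if a directive follows the first statement."""
    seen_statement = False
    for number, line in enumerate(lines, start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and comment lines are allowed anywhere
        if stripped.startswith("%"):
            if seen_statement:
                raise SyntaxError(
                    f"line {number}: directive after the first statement"
                )
        else:
            seen_statement = True


check_directives_first(["%version ctm 1.0", "", "puccini"])  # accepted
```

With a check like this, a directive that ends up one statement too low is rejected outright instead of silently changing how the surrounding statements are interpreted.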
Lars Marius Garshol, Ontopian http://www.ontopia.net
+47 98 21 55 50 http://www.garshol.priv.no