[sc34wg3] What now, WG3? (was: Montreal meeting recommendations)

18 Sep 2001 02:02:11 +0200

(I've split the agenda discussion out into a separate email with a
different subject to make this thread (and the lengthy replies) more
manageable. The discussion of the costs of model changes, the value of
conformance, and what we can learn from history does not seem to
affect the agenda discussion, so this seemed more useful to me. I'm
posting my response to the second part first, because it was much
easier to write.)

* Steven R. Newcomb
|
| Do I understand you correctly if I interpret your use of the term,
| "the infoset model", to refer to the style guide that
| implicitly/explicitly guided those who created the XML Infoset
| Recommendation?  Or do you mean something else?

By "the infoset model" I mean SC34 N 0242 and any future versions
and derivations thereof. 

If this suggests to you that the infoset model is a model very much in
need of a better name I can only agree. I'll try and come up with one
(or get Steve P. to rescue me).

* Lars Marius Garshol
|
| [suggestion:] We discuss the issues on this list. SRN & MB maintain
| a terminology document.

* Steven R. Newcomb
|
| Right.  Many standards, including ISO 13250, include a terminology
| section.  It's a normal and necessary aspect of the editorial work.

What I meant was: we need something that documents the terminology
common to the two models, the core model and the infoset model. I
suggested that you and MB maintain that document as a common reference
to be used during this work, preferably as a document separate from
the core model, at least to begin with.

* Lars Marius Garshol
|
| [documents to be written:] Infoset-requirements, infoset-model,
| PMTM4, terminology document.

Sounds reasonable to me. I guess it belongs in the core model (which I
carelessly called PMTM4 above).

|   * the parsing rules for the XTM syntax, in terms of
|     the core model.

IMHO this belongs in the infoset model.

We need to define how to get from an XTM document to an instance of
the infoset model as well as to an instance of the core model. This
means that we must either

  a) define a mapping from XTM to core, and then from core to infoset,
     or 
  b) define a mapping from XTM to infoset, and then from infoset to
     core.

As PMTM4 stands today, a) is not possible, because PMTM4 does not
explicitly contain all the information that is in the infoset
model[1]. Another argument for b) is that the infoset follows XTM much
more closely than does the core, and so b) is likely to be much easier
than a). A third argument for b) is that the editors of the infoset
model seem much more keen to fully specify the XTM deserialization
process than do the editors of the core model.
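
The first argument can be sketched in a few lines of Python. This is a
toy illustration only: the dictionaries and the function names
(xtm_to_infoset, infoset_to_core, route_b) are hypothetical stand-ins,
not the actual structure of the infoset model or of PMTM4. The one
thing taken from the text is the assumption that the core does not
explicitly carry the locator- and string-valued properties listed in
[1].

```python
# Toy illustration only: hypothetical stand-ins, not the real
# infoset model (SC34 N 0242) or PMTM4.

# Properties whose values are locators or strings (see footnote [1]).
LOCATOR_OR_STRING_PROPS = {
    "base locator", "source locators", "subject indicators",
    "subject address", "value", "resource",
}


def xtm_to_infoset(parsed_topic):
    """Pretend deserialization: keep every property that was parsed."""
    return dict(parsed_topic)


def infoset_to_core(item):
    """Pretend projection onto a core lacking locator/string values."""
    return {k: v for k, v in item.items()
            if k not in LOCATOR_OR_STRING_PROPS}


def route_b(parsed_topic):
    """Option b): XTM -> infoset -> core, yielding both instances."""
    infoset = xtm_to_infoset(parsed_topic)
    return infoset, infoset_to_core(infoset)


# Made-up input; the URLs are placeholders.
parsed = {
    "kind": "topic",
    "source locators": ["http://example.org/map.xtm#t1"],
    "subject indicators": ["http://example.org/psi/thing"],
}

infoset_item, core_node = route_b(parsed)

# What a core -> infoset mapping (option a) would have to recover
# from nothing, under the assumption stated above.
lost = set(infoset_item) - set(core_node)
```

Under that assumption the infoset-to-core step is a projection that
discards information, so it can be applied after the infoset has been
built but cannot be inverted to build the infoset from the core.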

|   * the parsing rules for the existing "HyTime-based"
|     13250 syntax, in terms of the core model.

IMHO this belongs in the infoset model, for the same reason as the
XTM deserialization above.

This may not be obvious, but in my opinion we should have a mapping
from the ISO 13250 syntax to the same infoset model that XTM maps to
(this is possible). We do not need the core as a go-between here.

Specifying the XTM deserialization process is not enough, by the way.
We must also specify the serialization process, that is, how to
generate XTM 1.0 and ISO 13250 documents from instances of the model.

|   * natural language expression of the semantics of
|     each of the assertion types that are built into
|     both the XTM and HyTime-based syntaxes, in terms of
|     the core model.

Can you give a rough example of such an expression, to help me
understand what you mean?

|   * One or more formal expressions of the semantics of
|     each of the assertion types documented in the above
|     item, as UML, Property Set, API, etc.

IMHO this is what the infoset model would do, or, more likely, this is
what the mapping between the core model and the infoset model would do.

|   * rules for expressing Doctrines for the Expression
|     of Scope
| 
|   * a Standard (default) Doctrine for the Expression of
|     Scope

I'm not sure what these are. I think I get what their relationship is,
but not what they are. Examples are probably the way to go.

| The definition of "assertion" (what we are saying when we write an
| <association>, for example) should be in the core model.

Fine by me.

| The semantics of those very few assertion types that are needed to
| allow us to define all assertion types should be defined in the core
| model.  These include template-role-RPR, class-instance, and
| superclass-subclass.

Sounds reasonable to me, at least as a first approach.

| The definitions of all the other assertion types that are explicitly
| or implicitly required by the parsing models for both syntaxes (XTM
| and the HyTime-based syntax) should be in the definition of the
| Standard Application.  These include topic-basename,
| basename-variantname, and topic-occurrence.

The "definition of the standard application" is what I call the
infoset model, right? If so, this is also fine.

| [which model should TMCL and TMQL build on top of?]
|
| One possibility for each language is that it contains primitives
| based on The Standard Application, and it is therefore related to
| and dependent on The Standard Application.  I think this possibility
| is the one that has been in almost everyone's mind, anyway.  My gut
| feeling is that this is what we should do, in fact, but I reserve
| the right to change my opinion based on conversations we all have
| not yet had.

What you write here matches my own feelings exactly. It seems easier
to base them on the infoset model, but there may be advantages to
basing them on the core model and providing shorthands for the more
specific constructs in the infoset model. It is too early to tell
which approach is best, but we should start out with the infoset
model and look for ways to generalize.

I should add that before the core model can serve as a basis for TMQL
and TMCL it will need a lot more work. It's not inconceivable that
it may be turned into a suitable foundation, however.

* Lars Marius Garshol
|
| - what is the relationship to the XTM 1.0 and ISO 13250
|   specifications to be? how much of these two specifications 
|   should be replaced by the new model-based ones?

* Steven R. Newcomb
|
| Second question first: I believe we must keep virtually
| *everything* that's currently found in both of these
| specifications (except for their bugs, of course).

Sorry, this question was confusing. When I said "specifications" I
meant "how much of the _text_ of these specifications". I think
compatibility with the older specifications is very important, and
that we should keep as much of it as we can.

| This may require us to explicitly resolve any existing
| vendor-specific differences of interpretation in one way or another.

This model work _will_ require that. The current specifications are so
vague that there are bound to be quite a few cases of this. That is why
I am so anxious to do this work. It does no one any good to have
implementations that are not fully interoperable.

| (All such differences can always be resolved.  The worst case
| scenario is that we are forced to recognize distinct syntaxes for
| Vendor A and for Vendor B.

I can't imagine that this would ever happen. Firstly, the syntaxes are
not important. Secondly, they are already agreed upon. Thirdly, I
don't think any of the vendors are crazy enough to want to force a
schism upon us, whether in model or in syntax.

| Now the first question: I believe that the XTM 1.0 syntax and the
| existing "HyTime-based" syntax are really just two distinct syntaxes
| for one and the same set of assertion types, which I've been
| collectively calling "The Standard Application of Topic Maps."

I agree, except that I've been calling that very same thing "topic
maps". 

* Lars Marius Garshol
|
| - where should the models go, once they are complete? Are they ISO
|   13250 2nd edition?  Should they be a normative technical report?  Or
|   what?

* Steven R. Newcomb
|
| My own preference is to see everything brought together in a single
| comprehensive standard that fully validates and protects existing
| investments in software and instances, while at the same time
| providing for a future in which we will have to adapt to conditions
| that are unforeseeable today.

This seems reasonable to me. The question is: what is the timeframe
for achieving this? How much time, if any, would we save by choosing a
different route? Would that time saved make an inferior route worth it?

--Lars M.

[1] I'm referring to the values of these properties: [base locator],
    [source locators], [subject indicators], [subject address],
    [value], and [resource]. That is, every value of type locator or
    string.