Response to Martin Bryan's comments

This is an early draft of a response to Martin Bryan's comments on SC34/WG3 N0299. A more fully considered response is likely to be forthcoming once it is clear whether those comments will be submitted as a formal document or not.

General comments

The Scope of ISO 13250 clearly states that one of the goals of the standard is that Topic Maps can be used:

To qualify the content and/or data contained in information objects as topics to enable navigational tools such as indexes, cross-references, citation systems, or glossaries. This design goal meant that it was considered important to the developers of ISO 13250 that Topic Maps could be embedded within other documents, to which they could make reference to identify those parts of the document that referred to specific topics. SAM does not appear to adequately allow for such embedded Topic Maps.

SAM as specified in N0299 is entirely agnostic as to how and where topic maps are stored. To the best of the authors' knowledge there is nothing in the text that either allows or disallows topic maps to be embedded in other documents. Any formulations or design decisions that prevent this (or make it needlessly difficult) should be considered bugs and reported to the authors.

SAM does not appear to allow for the use of facets to filter information views.

In SAM it is possible to create topics which represent information resources directly, thus removing the need for facets, as the information represented using facets in ISO 13250:2000 can now be represented directly using topics, associations, and occurrences. See this message from Steve Pepper for additional information.

The details of how to convert HyTM facet elements into topic and association items in the SAM are not yet decided. This must be decided as part of the work of writing the new HyTM syntax specification.

Specific comments

Abstract

The statement that "This specification supersedes [ISO13250] and [XTM]" is unacceptable. At best it can complement the existing standard.

The statement is admittedly somewhat stronger than it ought to be. The current plan of WG3 is that the Reference Model, SAM, HyTM-in-SAM, and XTM-in-SAM documents together will supersede ISO 13250:2000 and become the second edition of ISO 13250. See N0278 for more information.

The draft will be modified accordingly.

1. Introduction

The second paragraph suggests that "Topic maps may be represented ... mentally in the minds of humans." It is difficult to see how topic maps can be used within human minds: they were specifically designed for use by computers!

While topic maps are certainly designed to be used by computers, this does not preclude a human being from building a mental representation of a topic map.

Issue (term-theme): The concept of theme was deliberately introduced into ISO 13250 to avoid the ambiguity that occurred when the word topics was used to describe the set of referenced topic map concepts used to scope an item. Dropping this term is likely to lead to ambiguities within SAM.

The background for the issue is that the XTM 1.0 specification does not make use of the term theme, apparently without problems. It seems that there are 3 concepts that need to be referenced:

The set of subjects that make up the scope of a topic characteristic assignment. For this the term 'scope' seems suitable.
The use of a subject to scope a topic characteristic assignment. For this the phrase 'scoping topic' would seem to work, but 'theme' might be preferable. (This is what 'theme' originally signified.)
The context in which a topic characteristic assignment is valid. For this the term 'scope' seems suitable.

Issue (string-normalization): Normalization Form C should be adopted in conformance with W3C rules for XML.

That is an option, but it is not clear that it is necessary to require a particular normalization form, as the results of processing would be identical regardless of whether NFC or NFD is employed. For implementors normalization form D may have advantages, and so it may be better to leave the choice of normalization form open, although it seems clear that NFKC and NFKD are inappropriate for the [value] property. For the locator.[reference] property they might be more appropriate than NFC/NFD.

Requiring normalization also significantly increases the effort necessary to implement topic maps, thus raising the bar for implementors. This may not be worth it for the sake of slightly improved string comparison.

The opinion has been recorded. More detailed arguments to support it would be very much welcome.
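The reasoning about normalization forms in this response can be illustrated with a minimal sketch, assuming Python's standard unicodedata module as a stand-in for whatever an implementation would use. The helper name same_under and the sample strings are illustrative only and do not come from N0299. The sketch shows that comparison gives the same answer whether both strings are brought to NFC or to NFD, while the compatibility forms can alter the content of a string.

```python
import unicodedata


def same_under(form, a, b):
    """Compare two strings after bringing both to the same normalization form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# The same user-perceived character written in two ways (illustrative data).
composed = "\u00e9"      # e with acute accent as a single code point
decomposed = "e\u0301"   # plain e followed by a combining acute accent

raw_equal = composed == decomposed                   # code points differ
nfc_equal = same_under("NFC", composed, decomposed)  # equal once both are NFC
nfd_equal = same_under("NFD", composed, decomposed)  # equal once both are NFD

# Compatibility normalization rewrites characters such as the "fi" ligature,
# which is one reason it looks unsuitable for a [value] property.
ligature = "\ufb01"
nfc_ligature = unicodedata.normalize("NFC", ligature)
nfkc_ligature = unicodedata.normalize("NFKC", ligature)
```

The point of the sketch is only that the choice between NFC and NFD does not affect the outcome of comparison as long as one form is applied consistently; it says nothing about the implementation cost discussed above.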

2. The metamodel

The enforced merging of all sets within Topic Maps is an error. This was proposed as part of the original design of ISO 13250 and found not to be relevant in many instances. For this reason ISO 13250 only allows for merging based on scoped topic names or subject identity. Two topics may reference exactly the same set of scopes, occurrences and associations without being merged if they have differently scoped identical names. Similarly a topic can refer to the same objects more than once without generating an error. Sets should not cause duplicated entries to be removed from a topic map as there may be facets associated with these topics that could disambiguate them (for example, they could have different start and end dates of applicability without this being defined using a scope statement).

[[[THINK ABOUT THIS]]]

3.1 Locator items

Issue (locator-reference): To fully represent ISO 13250 Topic Maps SAM must allow for locators that do not address information resources.

Does this mean that HyTime (and thus also ISO 13250:2000) allows locators that do not reference information resources? If so, examples would be welcome.

[notation]: Referring to RFC 2396 is likely to only be relevant for a short while. Already IETF have published a draft on Internationalized Resource Identifiers (IRI) that may well replace this specification in the longer term. The requirement that all other values must begin with "x-" is unsupportable given that IRI might be needed in place of URI by the time SAM is published by ISO.

It is true that RFC 2396 will eventually be replaced by some other specification. This is a general problem with IETF recommendations: they have no version-independent identifiers, making it impossible to refer to the current version of the URI specification.

The "x-" prefix was introduced in order to ensure that others than ISO could define values, without any risk that values defined by ISO would collide with values introduced by third parties. Third parties would of course not have any guarantee of uniqueness, but they would at least be able to use this mechanism, and if their values have sufficient merit ISO would be able to adopt them.

Admittedly, the issue of how to keep the set of 'official' values for this property up to date is thorny, and so it has been allocated the ID "prop-notation-interp" to ensure that it is properly discussed and settled.
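The intent of the "x-" prefix can be sketched as follows. This is a hedged illustration only: the set OFFICIAL_NOTATIONS and the function name are hypothetical, and "URI" is included merely because it is the notation value used elsewhere in this response. Which values are actually official is exactly what issue "prop-notation-interp" is meant to settle.

```python
# Hypothetical set of officially defined [notation] values (an assumption).
OFFICIAL_NOTATIONS = {"URI"}


def notation_is_acceptable(notation: str) -> bool:
    """Accept official values, or third-party values carrying the "x-" prefix."""
    return notation in OFFICIAL_NOTATIONS or notation.startswith("x-")


# "x-iri" stands for a value introduced by a third party; "IRI" without the
# prefix would collide with the space reserved for official values.
checks = {name: notation_is_acceptable(name) for name in ("URI", "x-iri", "IRI")}
```

Under this reading a third party can start using "x-iri" immediately, and an unprefixed value only becomes acceptable once it has been officially adopted.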

3.2 Source locators

The third paragraph is illogical. A topic may be used as a) information on a subject b) a constraining scope of other topics and/or as c) the identification of an occurrence role, association role, etc. Therefore multiple "information items" (this term is not defined in SAM so its scope is unclear) may point to the same source.

The third paragraph reads: Source locators are used in this specification to define reification, in the syntax specifications to ensure that information loaded from different information resources is correctly collated, and it is expected that other specifications and technologies will use them to define mechanisms for referencing topic map constructs. The authors fail to see any contradictions between this paragraph and the statements made in the comments. Further explanation of the problem seems necessary.

The term "information item" is explained in section 2 as follows: Throughout this document the term "information item" is used to refer to information items in general, while particular information item types are referred to as "topic items", "base name items", and so on.

3.3 The topic map item

[source locators]: Under what circumstances can the set of locators items for the source locator be empty or contain more than one entry? (The second sentence is ungrammatical!)

The set of source locators of a topic map item can be empty when the topic map is created from a HyTM document (where the topicmap element has no ID), from an XTM document where the topicMap element has no ID, or when the topic map item is created by means not explicitly described by ISO 13250 2nd ed.

A topic map item can have more than one source locator when a topic map item has been assigned source locators by means not specified in the standard (as is explicitly allowed), for example by a human author in order to make the topic map easier to refer to.

The authors will modify the second sentence.

There seems to be no means of identifying which themes (scopes) have been added to all entries in a topic map as a result of an added themes element.

That is correct. This information is not considered to have model significance beyond the addition of topic items to all [scope] properties.

There is no mechanism for recording facets used to manage ISO 13250 Topic Maps.

This issue was addressed above.

[reifier]: How can a "topic item" reify a "topic map"? (They are two different element types, and the source document of a map identified by the source locator cannot possibly point to an element of type "topic").

For a topic to reify a topic map means that the subject of the topic is the topic map in which the topic is contained. This enables the topic map to make statements about itself, such as what its name is, when it was published, who its authors are, what version number it has, what class of topic maps it is an instance of, where to locate its schema, and much more. Practice has shown this mechanism to be extraordinarily useful.

Syntactically, there is no reliable way for a topic to express that it reifies the topic map in HyTM, while in XTM this is straightforward:

  <topicMap id='reified-topic-map'
     xmlns='http://www.topicmaps.org/xtm/1.0/'
     xmlns:xlink='http://www.w3.org/1999/xlink'>   
  
    <topic id='tm'>
      <subjectIdentity>
        <subjectIndicatorRef xlink:href="#reified-topic-map"/>
      </subjectIdentity>
      <baseName>
        <baseNameString>Topic map</baseNameString>
      </baseName>   
    </topic>   
  </topicMap>

The topic with ID "tm" in this topic map reifies the containing topic map in a way that is reliable and explicitly allowed by the XTM specification, although the mechanism is unfortunately underspecified in XTM, and the definition contains some mistakes.

Note that it is always the [subject indicators] property of the topic information item that refers to the information item being reified, rather than the other way around.
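The direction of the reference can be made concrete with a small sketch. It assumes one reading of the text: a topic reifies an item when one of the topic's subject indicators equals one of that item's source locators. The class names, the function find_reifiers, and the base address http://example.org/map.xtm are all hypothetical and are not defined by N0299 or XTM.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Locator:
    notation: str
    address: str


@dataclass
class Item:
    # Locators pointing back at the syntactic construct that produced the item.
    source_locators: set = field(default_factory=set)


@dataclass
class TopicItem(Item):
    subject_indicators: set = field(default_factory=set)


def find_reifiers(topics, item):
    """Return the topics whose subject indicators point at the given item."""
    return [t for t in topics if t.subject_indicators & item.source_locators]


BASE = "http://example.org/map.xtm"  # hypothetical document address

topic_map = Item(source_locators={Locator("URI", BASE + "#reified-topic-map")})
tm_topic = TopicItem(
    source_locators={Locator("URI", BASE + "#tm")},
    subject_indicators={Locator("URI", BASE + "#reified-topic-map")},
)
other_topic = TopicItem(source_locators={Locator("URI", BASE + "#other")})

reifiers = find_reifiers([tm_topic, other_topic], topic_map)
```

Note that the topic map item itself carries no pointer to the topic; the lookup always runs from the topic's subject indicators to the reified item, as stated above.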

Issue (prop-schema): Given the possible multiple types of schema/DTD for Topic Maps based on ISO 13250 architectural forms, SAM should record the source of this information.

The opinion is recorded, but it is not clear how SAM would record this, nor how the schema itself would be represented, nor if it should be represented in this way at all. Note also that the issue is about constraints on instances of the SAM model, not on syntactical expressions of it. Would such schemas/DTDs contain implicit constraints on model instances? If they are recorded, how should they be represented? More feedback on these issues is encouraged.

3.4 Topic items

ISO 13250 deliberately allows subject identity to be "inferred from the topic's characteristics." SAM does not seem to allow for this.

The authors have assumed that human beings will use topic characteristics as a last resort identification of subject identity in the absence of subject identifiers or subject addresses, but this was not explicitly stated.

The issue of whether or not this should be stated has been given the ID "subject-identity-establish" in order that it may be settled properly.

Issue(subject-vs-resource): ISO 13250 specifically states that the subject identity "may or may not be machine-interpretable, or may or may not be online". As noted above, it can also be "inferred from the topic's characteristics." Therefore SAM should not confuse subject and resource as they are clearly two different things.

The term "resource" as defined in RFC 2396 means "anything that has identity", and there is no requirement that it be machine-interpretable or online. The definition in RFC 2396 says explicitly that a human being is a resource, for example. (All this is in the quote from RFC 2396 given in N0299.) From this it would seem that the comment above is caused by confusion as to what RFC 2396 actually means by the term "resource".

Issue(term-subject-identity): There should be a clear mapping between all ISO 13250 defined terms (such as subject identity) with terms in SAM.

The editors agree, and have given this issue the ID "def-terms".

3.4.1 Establishing subject identity

In what way does subject address differ from source locator? Would it not be better to have a single concept for all references to external sources of information?

The [subject address] of a topic contains the address of the information resource that is the subject of the topic. In creating a topic to represent, say, the home page of ISO, one would set [subject address] of the topic to Locator(notation="URI", address="http://www.iso.ch").

This is entirely different from source locators, which point back to the syntactical constructs that caused the topic item to come into existence, such as topic elements. These elements cannot be said to be the subject of the topic, they just constitute its syntactical expression in serialized form.

The other two ways of referring to external information, subject indicators and occurrences, are distinguished from subject addresses and source locators because they have different semantics. This is reflected by the processing rules.
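The distinction drawn in this response can be sketched as a data structure. This is a minimal illustration under stated assumptions: the Python names are invented here, only the Locator(notation="URI", address="http://www.iso.ch") value comes from the text, and the source locator address is a hypothetical example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Locator:
    notation: str
    address: str


@dataclass
class TopicItem:
    # Address of the information resource that IS the subject (may be null).
    subject_address: Optional[Locator] = None
    # Resources that indicate the subject to a human reader.
    subject_indicators: set = field(default_factory=set)
    # Pointers back to the syntactic constructs that produced the item.
    source_locators: set = field(default_factory=set)


# A topic whose subject is the ISO home page itself.
iso_home = TopicItem(
    subject_address=Locator(notation="URI", address="http://www.iso.ch"),
    source_locators={Locator("URI", "http://example.org/map.xtm#iso-home")},
)

# A topic whose subject is not an information resource, or whose address
# is simply not specified: the subject address stays null.
some_person = TopicItem(
    source_locators={Locator("URI", "http://example.org/map.xtm#person")},
)
```

Keeping the three properties apart in the structure mirrors the point above: each has its own semantics, so an implementation cannot fold them into a single "external reference" slot without losing information.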

3.4.5 Reification

The apparent contradiction between the first sentences of the first and fourth paragraphs needs further explanation.

The second sentence of the fourth paragraph was meant to constitute the explanation, but apparently this is insufficient. The explanation will be extended.

3.4.6 Formal properties

Why is the identifier of a topic not one of its properties?

If by "identifier" the SGML ID of the topic element is meant, this information will be recorded in the [source locators] property of the topic item. Simply recording the ID might be sufficient for HyTM, but for XTM it is insufficient, as topic maps created from XTM serializations may span multiple XTM documents, and IDs might therefore clash. Source locators (essentially locators of the form file#id) are used to avoid collisions and at the same time preserve the identifier in its entirety.
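A small sketch shows why the file#id form avoids clashes. The function name and the two document addresses are hypothetical; the only thing taken from the text is the file#id shape of the locator.

```python
def source_locator(document_uri: str, element_id: str) -> str:
    """Build a locator of the form file#id for an element in a document."""
    return f"{document_uri}#{element_id}"


# Two XTM documents (hypothetical addresses) that both use the ID "puccini".
first = source_locator("http://example.org/composers.xtm", "puccini")
second = source_locator("http://example.org/operas.xtm", "puccini")

# The bare IDs clash, the source locators do not, and the original ID
# can still be read back from the fragment part.
ids_clash = "puccini" == "puccini"
locators_clash = first == second
recovered_id = first.rsplit("#", 1)[1]
```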

Why are scoping topics not identified as part of the model?

If by "scoping topics" is meant "topics used in scopes", they are identified in the sense that they can be found by scanning the [scope] properties of the items in the information set. This was thought to be sufficient identification of the scoping topics, as they were not considered to have any model significance beyond that of, say, typing topics.

Why is subjectAddress required? If the value is null what is implied?

Why [subject address] is part of the model is explained in the response to the comments on section 3.4.1 above.

If its value is null it means either that the topic's subject is not an information resource, or that it is an information resource, but that the address of that information resource has not been specified.

Under what conditions can a topic not have a source locator? (Is this for when the topic is generated because it was an implied role? If so, say so.)

A topic item can have no source locators when:

the information set was created by deserializing an XTM document which contained a subjectIndicatorRef element (for example as child of scope) for which there was no corresponding topic element,
the information set was created by deserializing a HyTM document, as the topics representing the "sort name" and "display name" published subjects will not have topic elements with SGML IDs, or
the information set was created by means not specified by ISO 13250 2nd ed, such as programmatically using an API, by mapping from RDF, or by some other method.

Re 1 [source locators], under what conditions could more than one source locator appear in the "set of locator items"?

A topic item can have more than one source locator when

it is the result of merging topic items created by more than one topic element, or
the topic has been assigned source locators by means not constrained by the standard, such as by a human author in order to make it easier to refer to the topic.

2 [classes] is not shown in the model diagram.

Indeed. It is shown as "types", which is an inconsistency. That will be corrected.

3.5 Base name items

The fourth sentence is incorrect and highly misleading. "John Smith" is a terrible base name, as are organization names, as they cannot be guaranteed to be unique. As topics with similar base names have to be merged it is vital that shared strings not be used as base names.

The text of N0299 assumes that the topic naming constraint has been removed.

The first sentence in the second paragraph is incorrect. Base names do not need to have scopes. They can be within topics of unconstrained scopes with no scope assigned to them. The following sentence is also misleading in that ISO 13250 allows multiple base names to be applied without assigning them separate scopes.

Base names must either have a constraining scope, or be in the unconstrained scope, which means that they must have a scope, whether the unconstrained or some other. There is no meaningful distinction between the unconstrained scope and the null scope or unspecified scope. The sentence as written therefore seems correct to the authors.

SAM also allows multiple base names to appear in the same scope, but it makes it clear that applications have no other criterion for choosing which name to display in situations where only a single name can be displayed. If authors are satisfied with the choice of name to be displayed being random that is their affair; the standard is just making it clear what the consequences of their choices are.

The requirement that variant names be associated with specific base names is incorrect. A topic with more than one base name would have to repeat the variants associated with that base name for each base name. (This was a deliberate design feature of ISO 13250 as it was realised that the same symbol could be used for a topic which had been assigned multiple base names, either as synonyms or as language-specific versions.)

"Incorrect" seems to overstate the case somewhat. Certainly, when deserializing instances of the HyTM syntax, variant names that apply to multiple base names will have to cause multiple variant items to be created. To the authors this seems acceptable, but the ID "variant-in-basename" has been assigned to this issue to ensure that it is properly tracked and settled.

SAM contains no method for indicating within the model that the sorting order of names should be different from that of the string contents. The use of variant name scopes as specified in section 5.3 needs to be specifically mentioned either in this section or the next one.

The authors do not understand the first sentence of this comment. Elaboration would be welcome.

The authors agree with the second sentence. An editorial note has been added to the text so that this will be corrected.

Under what conditions could more than one sort locator be assigned to a display name?

Is the question "under what conditions could more than one topic representing the "suitability for sorting" published subject be added to the [scope] property of a variant item"? If so, the answer is that the [scope] property is a set, which means that duplicates will be silently removed. This can therefore not happen.
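The set semantics behind this answer can be sketched in a few lines. The Topic class and the "#sort" identifier are placeholders for the topic representing the "suitability for sorting" published subject; neither is a definition taken from the SAM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Topic:
    subject_indicator: str


# Placeholder for the topic representing the sort published subject.
SORT = Topic("#sort")

# The [scope] property modeled as a set of topics.
variant_scope = set()
variant_scope.add(SORT)
variant_scope.add(Topic("#sort"))  # an equal topic added a second time

scope_size = len(variant_scope)
```

Because the second addition is equal to the first, the scope still holds a single topic, which is the behavior described above.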

Re Issue(prop-value), the name [label] would be a distinct improvement.

The opinion has been recorded.

3.6 Variant names

The second sentence of the first paragraph does not conform to ISO 13250, which allows unconstrained display and sort names.

SAM allows variants that correspond to the unconstrained display and sort names of ISO 13250:2000 as a display name is represented within SAM as a variant item with a topic representing the "suitability for sorting" published subject in its [scope] property. As this means that the [scope] of a display name will contain at least one topic, the requirements of both SAM and ISO 13250:2000 are satisfied.

ISO 13250 allows unconstrained scopes for variant names, so the statement in 4 [scope] that the set must be "non-empty" invalidates SAM with regard to ISO 13250. The claim that the scope of a variant must be a superset of the scope property of the base name item of its parent is also invalid. A variant of an unconstrained base name may be constrained by a scope statement. (Though, as stated above, this only occurs because of the unreasonable requirement that variants be subordinated to base names rather than topics.)

The issue of the [scope] property is discussed above.

As the comments say, "a variant of an unconstrained base name may be constrained by a scope statement," and if it is, the resulting scope of the variant item will be a superset of that of the base name item. The authors assume that the opposite is meant, as this is possible according to ISO 13250:2000, and would indeed constitute a contradiction between the SAM and ISO 13250:2000. This issue has been assigned the ID "prop-variant-scope-superset" in order that it can be properly tracked and settled.
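The superset argument can be checked mechanically. The sketch below assumes the constraint as described in the comment (a non-empty variant scope that is a superset of the base name's scope); the function name and the theme names "sort" and "norwegian" are illustrative only.

```python
def variant_scope_is_valid(base_name_scope: frozenset, variant_scope: frozenset) -> bool:
    """Check the constraint: non-empty, and a superset of the base name scope."""
    return len(variant_scope) > 0 and variant_scope >= base_name_scope


unconstrained = frozenset()  # the unconstrained scope modeled as the empty set

# A variant of an unconstrained base name, constrained by a scope statement:
# every set is a superset of the empty set, so the constraint holds.
case_in_comment = variant_scope_is_valid(unconstrained, frozenset({"sort"}))

# The opposite case: a base name scoped by "norwegian" whose variant scope
# does not include that theme. This is where the two documents would disagree.
opposite_case = variant_scope_is_valid(frozenset({"norwegian"}), frozenset({"sort"}))

# A variant that inherits the base name's theme and adds its own.
inherited_case = variant_scope_is_valid(
    frozenset({"norwegian"}), frozenset({"norwegian", "sort"})
)
```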

The final sentence seems to imply that items with the same variant information should be merged. As it was a deliberate design goal of ISO 13250 that the same display name/symbol could be used for more than one name any implication that a combination of [value], [resource] and [scope] must be unique would be unreasonable.

The final sentence does indeed imply that equal variant items that appear in the [variant] property of the same base name item should be merged. It is correct that merging all variant names that compare as equal across the entire information set would be unreasonable, but the text only implies that this merging should be performed for each base name item.

The use of variant name scopes as specified in section 5.3 needs to be specifically mentioned in this section.

The authors think this belongs in section 3.6, and have accordingly added an editorial note.

3.7 Occurrence items

The second sentence of the first paragraph requires an occurrence type to be a "subject" yet in ISO 13250 an occurrence role does not have to equate to a "topic" in the topic map. SAM does not seem to differentiate between implied subjects and specific subjects in a way that would allow ISO 13250 topic maps to be recorded accurately.

In ISO 13250:2000 an occurrence role type must be a topic, while an occurrence role name is the generic identifier of the element type derived from the occurrence architectural form. The occurrence role type of ISO 13250:2000 corresponds to the [occurrence type] of SAM, while the occurrence role name of ISO 13250:2000 is not considered to have model significance, and so is left out entirely. Arguments for why the occurrence role name should be considered significant are welcome.

In general SAM does not differentiate between implied and explicitly specified subjects, nor does there seem to be a need to do so. Arguments for why such a distinction is necessary are welcome.

Occurrence type, as defined in SAM, is compulsory, yet it can have a null value. In ISO 13250 occurrence role type is optional, and does not necessarily supply the occurrence role name (as SAM seems to expect it to do). It is unclear what the purpose of a null value for a SAM occurrence type is, or what should happen when an ISO 13250 Topic Map uses implied occurrence role names and no occurrence role type pointer.

As explained above the occurrence role name has no representation in SAM, so [occurrence type] corresponds to occurrence role type only. If, in a HyTM document, the occurrence role type is not explicitly specified the [occurrence type] property of the corresponding occurrence item will be null, regardless of whether an occurrence role name was given or not.

The last sentence of the first paragraph implies that you must either point to something within the topic map or to something within another file. It does not seem to allow for the case when an ISO 13250 topic map is embedded within another file and references occurrences of topics within that file.

The authors accept that this sentence needs improvement, and have correspondingly added an editorial note. Later drafts will address this issue.

The first sentence of the second paragraph is incorrect. Occurrences may have unconstrained scope. (ISO 13250 does not recognize the "occurrence" default type introduced for XTM. If this is imposed on the standard, then a means of clearly distinguishing default types from user-defined types needs to be added to the model so that imposed subjects specific to the management of the model can be distinguished from the subjects that the author specified as part of the Topic Map.)

As was explained above, the unconstrained scope is also a scope, and thus all topic characteristic assignments will have a scope.

The comments in parentheses address issue xtm-def-occurrence-type and have been duly recorded.

Under what conditions would the resource property be null?

This would happen when the occurrence resource is given as a string instead of being referred to using a locator. This could never happen in the HyTM syntax, but corresponds to the use of a resourceData element in an occurrence element in XTM 1.0.

The final sentence seems to imply that items with the same occurrence information should be merged. Any implication that a combination of [value], [resource], [scope] and [occurrence type] must be unique would be an unjustifiable added constraint to ISO 13250.

Again, this applies only within the [occurrences] property of a single topic item.

No mechanism is provided in SAM for recording the constraints on traversal direction recorded in the linktrav and listtrav attributes of an ISO 13250 occurrence.

That is correct. The issue of whether this information needs to be accorded model significance has been given the ID "occurrence-traversal".

3.8 Association items

The first sentence of the second paragraph is incorrect: ISO 13250 associations may have unconstrained scope.

As was explained above, the unconstrained scope is also a scope, and thus all topic characteristic assignments will have a scope.

Under what circumstances would the set of locator items in the source locators property be empty, or contain multiple entries?

The property would be empty when the association item was created by deserialization of a HyTM document, or an XTM 1.0 document where the association element had no id attribute. It could also be empty if the association were created by a human author or automatically generated by software.

The property would contain multiple entries if the item resulted from the merging of multiple association items with source locators of their own. It could also do so if additional source locators were assigned by human authors or by software.

Association type, as defined in SAM, is compulsory, yet it can have a null value. In ISO 13250 associations linktype and type attributes are both optional, so that the type attribute does not necessarily supply the association type name. This relationship should, therefore, be identified as being optional.

This issue seems to be the same as that of occurrence type discussed above, and the answer given above applies equally to this issue.

The final sentence seems to imply that items with the same association information should be merged. Any implication that a combination of [roles], [scope] and [association type] must be unique would be an unjustifiable added constraint to ISO 13250.

The implication is not that associations must be unique, but that repeated assertions of the same association (Puccini was born in Lucca, Puccini was born in Lucca, Puccini was born in Lucca, and so on) will be collapsed to a single association item. This is the approach taken by XTM 1.0, and ISO 13250:2000 is silent on the subject. If this is considered unacceptable, a clear statement to that effect, as well as arguments for why it is unacceptable, would be welcome.
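The collapsing of repeated assertions can be sketched with value-based equality. This is an illustration under assumptions: the Association class, the role and type names, and the "biography" theme are invented for the example, and equality is taken to depend on [roles], [scope] and [association type] as named in the comment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Association:
    association_type: str
    roles: frozenset                 # set of (role type, role player) pairs
    scope: frozenset = frozenset()   # empty set stands for the unconstrained scope


def born_in(person: str, place: str, scope: frozenset = frozenset()) -> Association:
    """Build a hypothetical 'born-in' association between a person and a place."""
    roles = frozenset({("person", person), ("place", place)})
    return Association("born-in", roles, scope)


# The same statement asserted three times, as in the example above.
repeated = [born_in("Puccini", "Lucca") for _ in range(3)]
collapsed = set(repeated)

# The same statement made in a different scope is not equal to the others,
# so it remains a separate association item.
scoped = born_in("Puccini", "Lucca", frozenset({"biography"}))
with_scoped = collapsed | {scoped}
```

Nothing in the sketch prevents two different associations from coexisting; only assertions that are equal in every compared property fold into one item.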

3.9 Association role items

Under what circumstances would the set of locator items in the source locators property be empty, or contain multiple entries?

It would be empty if the association role item were deserialized from a HyTM document, or if it were deserialized from an XTM 1.0 document. It might contain multiple entries if these were assigned by means not constrained by the standard.

No mechanism is provided in SAM for recording the constraints on traversal direction recorded in the linktrav and listtrav attributes of an ISO 13250 association role.

That is correct. The issue of whether this information should be recorded has been given the ID "association-traversal".

4.1 Merging topics

The suggestion in the second item of the first list that topics sharing a common source locator be merged suggests that there needs to be a requirement that such locators be specified in a specific notation that is able to identify specific elements within the source document, yet this constraint is not specifically stated.

It is true that this is a constraint, and that it is not explicitly stated. The authors agree that it should be stated, and have put in an editorial note to that effect.

4.4 Merging other information items

This section requires the merging of information that is not identified as requiring merging within ISO 13250. As noted above, it provides an unnecessary constraint on ISO 13250, where facets may be being used to differentiate between different instances that share the same properties.

[[[THINK ABOUT THIS]]]

5 Published subjects

How are users expected to know when something is a published subject? (They cannot possibly be expected to know of all the published subject indicators in existence, any more than a computer system can.)

This has also been recognized as an issue by the OASIS Published Subjects TC, which has assigned it the identifier ISSUE 1.

It is not clear whether this issue should be resolved as part of the SAM or as part of the PubSubj TC guidelines, or whether it is actually necessary to resolve it at all. In recognition of this, the issue has been given the ID "psi-identification".

How does one uniquely identify the set of public subjects defined in SAM?

If the question is "how does one create a topic representing the set of all published subjects defined as part of SAM in such a way that it can be reliably merged" the answer is that this has not yet been resolved. A likely answer is that a PSI will be defined for this purpose.

This issue has been given the ID "psi-set-psi". Note that it depends on PubSubj TC ISSUE 23.

5.3 Variant name scopes

The first sentence of the second paragraph claims that sort names are suitable as display names. (Did you really mean #sort here, or should it have been #display? If so ignore the rest of this paragraph.)

This was a simple copy-and-paste error; it should have been #display, and this has now been corrected.