ISO/IEC JTC1/SC34N0

ISO/IEC

ISO/IEC JTC1/SC34

Information Technology —

Document Description and Processing Languages

Title: Topic Maps — Canonicalization
Source: Kal Ahmed, JTC1 / SC34
Project: ISO 13250: Topic Maps
Project editor: Steven R. Newcomb, Michel Biezunski, Martin Bryan
Status: Committee draft
Action: For ballot
Date: 2004-11-01
Summary:
Distribution: SC34 and Liaisons
Refer to:
Supersedes:
Reply to: Dr. James David Mason
(ISO/IEC JTC1/SC34 Chairman)
Y-12 National Security Complex
Information Technology Services
Bldg. 9113 M.S. 8208
Oak Ridge, TN 37831-8208 U.S.A.
Telephone: +1 865 574-6973
Facsimile: +1 865 574-1896
E-mail: mailto:mxm@y12.doe.gov
http://www.y12.doe.gov/sgml/sc34/sc34oldhome.htm

Mr. G. Ken Holman
(ISO/IEC JTC 1/SC 34 Secretariat - Standards Council of Canada)
Crane Softwrights Ltd.
Box 266,
Kars, ON K0A-2E0 CANADA
Telephone: +1 613 489-0999
Facsimile: +1 613 489-0995
Network: jtc1sc34@scc.ca

Topic Maps — Canonicalization

Contents

1 Scope
2 Normative references
3 Terms and definitions
3   Normalisation of Locator References
4   Canonical Sort Order
4.1   Introduction
4.2   Information Type and Basic Type Sort Order
4.3   Comparison Of String Property Values
4.4   Comparison Of Set Property Values
4.5   Comparison Order For Locator Items
4.6   Canonical Sort Order For Topic Items
4.7   Canonical Sort Order For Topic Name Items
4.8   Canonical Sort Order For Variant Items
4.9   Canonical Sort Order For Occurrence Items
4.10   Canonical Sort Order For Association Items
4.11   Canonical Sort Order For Association Role Items
5   Transformation Of Topic Map Data Model To CXTM XML Infoset
5.1   Introduction
5.2   Encoding of string properties
5.3   Encoding of positional values
5.4   Default property values for element information items
5.5   Default property values for attribute information items
5.6   CXTM Document Information Item
5.7   Constructing a representation of a topic map information item
5.8   Constructing a representation of a topic item
5.9   Constructing a representation of the topic name item
5.10   Constructing a representation of a variant item
5.11   Constructing a representation of an occurrence item
5.12   Constructing a representation of an association item
5.13   Constructing a representation of the association role item
5.14   Constructing a representation of a locator item
5.15   Constructing a representation of the [reifier] property
5.16   Constructing a representation of the [reified] property
5.17   Constructing a representation of the [scope] property
5.18   Constructing a representation of the [source locators] property
5.19   Constructing a representation of the [type] property
5.20   Constructing a representation of the [value] property

Foreword

ISO (the International Organization for Standardization) and IEC (the International Electrotechnical Commission) form the specialized system for worldwide standardization. National bodies that are members of ISO or IEC participate in the development of International Standards through technical committees established by the respective organization to deal with particular fields of technical activity. ISO and IEC technical committees collaborate in fields of mutual interest. Other international organizations, governmental and non-governmental, in liaison with ISO and IEC, also take part in the work. In the field of information technology, ISO and IEC have established a joint technical committee, ISO/IEC JTC 1.

International Standards are drafted in accordance with the rules given in the ISO/IEC Directives, Part 2.

ISO/IEC 13250-4 was prepared by Joint Technical Committee ISO/IEC JTC 1, Information Technology, Subcommittee SC 34, Document Description and Processing Languages.

ISO/IEC 13250 consists of the following parts, under the general title Topic Maps:

Introduction

Topic maps are abstract structures that can encode knowledge and connect this encoded knowledge to relevant information resources. Topic maps are organized around topics, which represent subjects of discourse; associations, representing relationships between the subjects; and occurrences, which connect the subjects to pertinent information resources.

Topic maps may be represented in many ways: using topic map syntaxes in files, inside databases, as internal data structures in running programs, and even mentally in the minds of humans. All these forms are different ways of representing the same abstract structure, the Topic Maps Data Model defined in Part 2 of this standard.

Canonicalization is the process of serializing a data structure in such a way that two data structures considered to be the same result in the same serialization and two data structures not considered to be the same result in two different serializations. A canonical form enables direct comparison of two data model instances to determine equality by comparison of their canonical serialization.

This part of ISO/IEC 13250 defines a canonical sort order for any set of information items from the Topic Maps Data Model and a transformation of an instance of the Topic Maps Data Model to an instance of the XML Infoset model. The canonical sort order defined here can be applied not only to the set properties defined by the Topic Maps Data Model but also to other sets of topic map information items such as those generated as the result of processing a query.

This part of ISO/IEC 13250 also defines a transformation from the Topic Maps Data Model to the XML Infoset. Applications which serialize the XML Infoset model created by applying the transformation defined in this part of ISO/IEC 13250 must do so according to [XML-C14N]. When this serialization is performed, the resulting output string is a canonical representation of the Topic Maps Data Model instance.

Topic Maps — Canonicalization

1 Scope

This part of ISO/IEC 13250 specifies an algorithm for the canonicalization of an instance of the Topic Maps Data Model. It defines a canonical ordering for every information item defined by the Topic Maps Data Model and an XML serialisation of the information items and all of their properties. When the XML is serialized in accordance with [XML-C14N], the serialized file is the canonical representation of the Topic Maps Data Model instance.

2 Normative references

The following referenced documents are indispensable for the application of this document. For dated references, only the edition cited applies. For undated references, the latest edition of the referenced document (including any amendments) applies.

NOTE:

Each of the following documents has a unique identifier that is used to cite the document in the text. The unique identifier consists of the part of the reference up to the first comma.

ISO/IEC 10646-1, ISO/IEC 10646-1:2000: Information technology -- Universal Multiple-Octet Coded Character Set (UCS) -- Part 1: Architecture and Basic Multilingual Plane, ISO, 2000

ISO/IEC 10646-2, ISO/IEC 10646-2:2001 Information technology -- Universal Multiple-Octet Coded Character Set (UCS) -- Part 2: Supplementary Planes, ISO, 2001

Unicode, The Unicode Standard, Version 3.0, The Unicode Consortium, Reading, Massachusetts, USA, Addison-Wesley Developer's Press, 2000, ISBN 0-201-61633-5

RFC 2396, Uniform Resource Identifiers (URI): Generic Syntax, The Internet Engineering Taskforce, August 1998, http://www.ietf.org/rfc/rfc2396.txt

XML-C14N, Canonical XML, Version 1.0, World Wide Web Consortium, 15th March 2001, http://www.w3.org/TR/2001/REC-xml-c14n-20010315

XML Infoset, XML Information Set, World Wide Web Consortium, 24 October 2001, http://www.w3.org/TR/2001/REC-xml-infoset-20011024

3 Normalisation of Locator References

Whenever the [reference] property of the locator item is used for comparison or for creating values in the XML Infoset instance created by the canonicalization process, it must first be converted to a normalised form.

In the following description of the normalisation procedure for the [reference] property of a locator item, the terms "fragment identifier", "query" and "path segment" are used as defined in [RFC 2396].

The process of converting the value of the [reference] property of the locator item to its normalised form is as follows:

  1. Let the value P be the value of the [reference] property of the locator item with any fragment identifier and query removed and any trailing "/" character removed.

  2. If the value of the [reference] property starts with P, then the normalised value of the [reference] property is the substring starting from, and including, the character immediately following the string that matches P, with any leading "/" character removed.

  3. If the value of the [reference] property does not start with P and P can be interpreted as a URI with at least one path segment, then remove the last path segment from P and any trailing "/" character and repeat from step (2).

  4. If the value of the [reference] property is not modified by the steps above, then the value of the [reference] property is the normalised value of the property.

NOTE:

This process may result in the value of the [reference] property no longer being a syntactically valid or resolvable URI. However, the CXTM process described by this part of ISO/IEC 13250 does not require a conforming application to dereference these addresses.
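The steps above can be sketched in Python as follows. This is an illustrative reading, not a reference implementation: the function names `derive_p`, `drop_last_segment` and `normalise_reference` and the example URIs are hypothetical. Step 1 as written derives P from the locator's own [reference] value; the sketch keeps P as a separate argument so that the caller decides which value it is derived from.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def derive_p(value: str) -> str:
    """Step 1: drop any query and fragment identifier, then one trailing "/"."""
    stripped = value.split("#", 1)[0].split("?", 1)[0]
    return stripped[:-1] if stripped.endswith("/") else stripped


def drop_last_segment(p: str) -> Optional[str]:
    """Step 3 helper: remove the last path segment of P.

    Returns None when P has no path segment left to remove.
    """
    parts = urlsplit(p)
    segments = [s for s in parts.path.split("/") if s]
    if not segments:
        return None
    lead = "/" if parts.path.startswith("/") else ""
    path = lead + "/".join(segments[:-1])
    if path.endswith("/"):
        path = path[:-1]  # remove any trailing "/" as step 3 requires
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


def normalise_reference(reference: str, p: str) -> str:
    """Steps 2 to 4: strip the longest matching prefix derived from P."""
    while True:
        if reference.startswith(p):  # step 2
            rest = reference[len(p):]
            return rest[1:] if rest.startswith("/") else rest
        shorter = drop_last_segment(p)  # step 3
        if shorter is None:
            return reference  # step 4: value is left unmodified
        p = shorter


p = derive_p("http://example.org/maps/base.xtm?x=1#frag")
print(normalise_reference("http://example.org/maps/other.xtm#t1", p))
```

In this sketch the prefix test is a plain string comparison, mirroring the wording "starts with P" in step 2.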

4 Canonical Sort Order

4.1 Introduction

When transforming an instance of the Topic Maps Data Model to an instance of the XML Infoset model, all properties in the Topic Maps Data Model which are lists of information items must be encoded in the XML Infoset model by encoding each list item in the canonical sort order for the list. Clauses 4.2 to 4.11 define the canonical sort order for each information item type.

4.2 Information Type and Basic Type Sort Order

The following sort order applies to all information items and all instances of the basic types defined by the Topic Maps Data Model.

  1. NULL

  2. Topic map

  3. Topic

  4. Topic name

  5. Variant name

  6. Occurrence

  7. Association

  8. Association role

  9. Locator

  10. String

  11. Set
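As an illustration, the ordering above can be held in a lookup table that maps each type to its rank. The sketch below assumes that types are identified by the plain strings used in the list; `TYPE_ORDER`, `TYPE_RANK` and `compare_types` are hypothetical names.

```python
# Rank of each information item type and basic type, lowest first,
# in the order given by clause 4.2.
TYPE_ORDER = [
    "NULL",
    "Topic map",
    "Topic",
    "Topic name",
    "Variant name",
    "Occurrence",
    "Association",
    "Association role",
    "Locator",
    "String",
    "Set",
]
TYPE_RANK = {name: rank for rank, name in enumerate(TYPE_ORDER)}


def compare_types(a: str, b: str) -> int:
    """Return -1, 0 or 1 according to the relative rank of two type names."""
    return (TYPE_RANK[a] > TYPE_RANK[b]) - (TYPE_RANK[a] < TYPE_RANK[b])


print(sorted(["Set", "Locator", "NULL", "Topic"], key=TYPE_RANK.__getitem__))
```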

4.3 Comparison Of String Property Values

String values are compared on a character by character basis from the start of the string to the end. When the first pair of characters with different character codes is found, the string containing the character with the lower code sorts lower than the string containing the character with the higher code.

If two strings have exactly the same normalised form, then they will be considered equal.

NOTE:

The Topic Maps Data Model requires that all string values are expressed in Unicode Normalisation Form C, hence this algorithm is sufficient to provide canonical string comparison.
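A minimal sketch of this comparison is given below. The name `compare_strings` is hypothetical. The clause does not state how a string that is a prefix of another string sorts; the sketch assumes that the shorter string sorts lower, and that assumption is marked in the code.

```python
def compare_strings(a: str, b: str) -> int:
    """Compare two strings code point by code point; return -1, 0 or 1."""
    for char_a, char_b in zip(a, b):
        if char_a != char_b:
            # First differing pair decides the order.
            return -1 if ord(char_a) < ord(char_b) else 1
    # Assumption (not stated in the clause): when one string is a prefix
    # of the other, the shorter string sorts lower.
    return (len(a) > len(b)) - (len(a) < len(b))


print(compare_strings("abc", "abd"))
```

Python strings iterate over Unicode code points, so `ord` yields the character code directly; both inputs are assumed to be in Normalisation Form C already.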

4.4 Comparison Of Set Property Values

  1. Sets sort in order of the number of elements in the set. A set with fewer elements sorts lower than a set with more elements.

  2. For sets of equal size, first sort the items of each set into their canonical ordering. Starting with the lowest item in each sorted set, perform a pair-wise comparison of the items in each set until a non-equal comparison is found. The sets then sort in the order of the non-equal items.

  3. Sets with exactly the same members will be considered equal.
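The three rules can be sketched as a single function. The names `compare_sets` and `simple_compare` are hypothetical, and the item comparator is passed in by the caller because the canonical order of the members depends on their type.

```python
from functools import cmp_to_key
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def compare_sets(a: Iterable[T], b: Iterable[T],
                 compare_items: Callable[[T, T], int]) -> int:
    """Compare two sets: by size first, then member by member in sorted order."""
    xs = sorted(a, key=cmp_to_key(compare_items))
    ys = sorted(b, key=cmp_to_key(compare_items))
    if len(xs) != len(ys):  # rule 1: fewer elements sorts lower
        return -1 if len(xs) < len(ys) else 1
    for x, y in zip(xs, ys):  # rule 2: first non-equal pair decides
        result = compare_items(x, y)
        if result != 0:
            return result
    return 0  # rule 3: same members, sets are equal


def simple_compare(x: str, y: str) -> int:
    """Stand-in item comparator for the example (plain string order)."""
    return (x > y) - (x < y)


print(compare_sets({"a", "c"}, {"a", "b"}, simple_compare))
```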

4.5 Comparison Order For Locator Items

Locators are compared by comparing their properties in the following order.

  1. The normalised form of the [reference] property

  2. [notation]
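One way to realise this ordering is a sort key made of the two properties in the stated order. The `Locator` type, the `locator_key` function and the notation values in the example are hypothetical; the [reference] values are assumed to be normalised before the key is built.

```python
from typing import NamedTuple, Tuple


class Locator(NamedTuple):
    reference: str  # assumed to hold the normalised form already
    notation: str


def locator_key(locator: Locator) -> Tuple[str, str]:
    """Sort key: normalised [reference] first, then [notation]."""
    return (locator.reference, locator.notation)


locators = [
    Locator("b.xtm#t", "URI"),
    Locator("a.xtm#t", "URI"),
    Locator("a.xtm#t", "HyTime"),
]
for locator in sorted(locators, key=locator_key):
    print(locator.reference, locator.notation)
```

Python compares tuples element by element and strings by code point, which matches the string comparison described in 4.3.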

4.6 Canonical Sort Order For Topic Items

Topic items are compared by comparing their properties in the following order.

  1. [subject identifiers]

  2. [subject locators]

  3. [source locators]

NOTE:

A combination of these three properties is all that is required to compare two topics. Part 2 of this standard requires that all topic items have at least one value for one of these properties and, should two topics match in any one of these three properties, they must be merged.

4.7 Canonical Sort Order For Topic Name Items

Topic name items are compared by comparing their properties in the following order.

  1. [value]

  2. [type]

  3. [scope]

  4. [parent]

4.8 Canonical Sort Order For Variant Items

Variant items are compared by comparing their properties in the following order.

  1. [value]

  2. [resource]

  3. [scope]

  4. [parent]

4.9 Canonical Sort Order For Occurrence Items

Occurrence items are compared by comparing their properties in the following order.

  1. [value]

  2. [resource]

  3. [type]

  4. [scope]

  5. [parent]

4.10 Canonical Sort Order For Association Items

Association items are compared by comparing their properties in the following order.

  1. [type]

  2. [roles]

  3. [scope]

4.11 Canonical Sort Order For Association Role Items

Association role items are compared by comparing their properties in the following order.

  1. [role playing topic]

  2. [type]

  3. [parent]
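The property orders of clauses 4.6 to 4.11 can be collected in one table and applied by a generic comparison, on the reading that the first property with a non-equal comparison decides the order. In the sketch below the names `PROPERTY_ORDER`, `compare_items` and `simple_compare` are hypothetical, items are modelled as plain dictionaries, and the comparator for property values is supplied by the caller.

```python
from typing import Any, Callable, Dict, List, Mapping

# Properties to compare for each item type, in the order given
# by clauses 4.6 to 4.11.
PROPERTY_ORDER: Dict[str, List[str]] = {
    "topic": ["subject identifiers", "subject locators", "source locators"],
    "topic name": ["value", "type", "scope", "parent"],
    "variant": ["value", "resource", "scope", "parent"],
    "occurrence": ["value", "resource", "type", "scope", "parent"],
    "association": ["type", "roles", "scope"],
    "association role": ["role playing topic", "type", "parent"],
}


def compare_items(kind: str, a: Mapping[str, Any], b: Mapping[str, Any],
                  compare_value: Callable[[Any, Any], int]) -> int:
    """Compare two items of the same type property by property."""
    for name in PROPERTY_ORDER[kind]:
        result = compare_value(a[name], b[name])
        if result != 0:
            return result
    return 0


def simple_compare(x: Any, y: Any) -> int:
    """Stand-in value comparator for the example."""
    return (x > y) - (x < y)


# Property values are simplified to integers purely for illustration.
role_a = {"role playing topic": 1, "type": 2, "parent": 1}
role_b = {"role playing topic": 1, "type": 3, "parent": 1}
print(compare_items("association role", role_a, role_b, simple_compare))
```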

5 Transformation Of Topic Map Data Model To CXTM XML Infoset

5.1 Introduction

The transformation process creates an instance of the XML Infoset data model from an instance of the Topic Maps data model. The XML Infoset data model created is rooted with a CXTM document information item.

The names of properties in the Topic Maps Data Model and in the XML Infoset data model are written in square brackets: [property name].

Throughout the rest of this clause the value of the [parent] property of element information items and attribute information items is not specified. The [parent] property of an element information item must always be set to the element or document information item of which the element information item is a direct child. The [parent] property of an attribute information item must be set to the element information item of which the attribute is a child.

5.2 Encoding of string properties

Before encoding a string property as a sequence of character information items, the string must be normalised according to Unicode Normalization Form C (Unicode Standard Annex #15, Unicode Normalization Forms, [Unicode]). Each character information item must have the following properties:

  1. [character code] The ISO 10646 ([ISO/IEC 10646-1] and [ISO/IEC 10646-2]) character code for the character.

  2. [element content whitespace] unknown for whitespace characters (characters with character codes #x20, #x9, #xD and #xA), and false for all other characters.

  3. [parent] the containing element or attribute information item.
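The encoding can be sketched as follows, with each character information item modelled as a dictionary keyed by property name. The function name `encode_string` and the dictionary model are assumptions made for illustration only.

```python
import unicodedata
from typing import Dict, List

# Character codes treated as whitespace by this clause.
WHITESPACE_CODES = {0x20, 0x9, 0xD, 0xA}


def encode_string(value: str, parent: object) -> List[Dict[str, object]]:
    """Return one character information item per character of the NFC string."""
    normalised = unicodedata.normalize("NFC", value)
    return [
        {
            "character code": ord(char),
            "element content whitespace":
                "unknown" if ord(char) in WHITESPACE_CODES else False,
            "parent": parent,
        }
        for char in normalised
    ]


# "e" followed by a combining acute accent composes to U+00E9 under NFC.
items = encode_string("e\u0301 x", parent="value-element")
print([hex(item["character code"]) for item in items])
```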

5.3 Encoding of positional values

When the position of an item in a list is to be encoded, the encoded value is the index of that item in the list counting from 1 as the index of the first list item.
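Because many properties refer to items by position, it may be convenient to compute all positions once from the canonically sorted list. A minimal sketch, in which `positions` is a hypothetical helper and the topic identifiers are placeholders:

```python
from typing import Dict, Hashable, Sequence


def positions(sorted_items: Sequence[Hashable]) -> Dict[Hashable, int]:
    """Map each item to its 1-based index in an already sorted list."""
    return {item: index for index, item in enumerate(sorted_items, start=1)}


topic_positions = positions(["topic-a", "topic-b", "topic-c"])
print(topic_positions["topic-b"])
```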

5.4 Default property values for element information items

All element information items created by the canonicalization process must have the following property values:

  1. [namespace name] No value.

  2. [prefix] No value.

  3. [namespace attributes] The empty set.

  4. [in-scope namespaces] The empty set.

  5. [base URI] No value.

  6. [parent] The element information item or document information item of which the element is a direct child.

5.5 Default property values for attribute information items

All attribute information items created by the canonicalization process must have the following property values:

  1. [namespace name] No value.

  2. [prefix] No value.

  3. [attribute type] unknown.

  4. [references] unknown.

  5. [specified] the boolean value true.

  6. [owner element] the element information item that this attribute information item belongs to.

5.6 CXTM Document Information Item

There is exactly one CXTM document information item in the XML Infoset generated by the canonicalization of the Topic Maps Data Model.

The CXTM document information item has the following properties:

  1. [children] A list containing only the representation of the topic map information item in the Topic Maps Data Model instance.

  2. [document element] The element information item that represents the topic map information item in the Topic Maps Data Model instance.

  3. [notations] The empty set.

  4. [unparsed entities] The empty set.

  5. [base URI] No value.

  6. [standalone] No value.

  7. [version] No value.

  8. [all declarations processed] False.

5.7 Constructing a representation of a topic map information item

A topic map information item in the Topic Maps Data Model is represented by an element information item with the following properties:

  1. [local name] The string "topicMap"

  2. [children] A list of element information items in the following order:

    1. A representation of each topic information item in the [topics] property of the topic map information item in canonical sort order.

    2. A representation of each association information item in the [associations] property of the topic map information item in canonical sort order.

  3. [attributes] If the value of the [reifier] property is not null, then a representation of the [reifier] property, otherwise an empty list.

5.8 Constructing a representation of a topic item

A topic item is represented by an element information item in the XML Infoset. The element information item has the following properties.

  1. [local name] The string "topic"

  2. [children] A list of element information items in the following order:

    1. If the value of [subject identifiers] property of the topic item is not the empty set, then an element information item with the following properties:

      1. [local name] The string "subjectIdentifiers"

      2. [children] A representation of each of the locator items of the [subject identifiers] property in canonical sort order.

      3. [attributes] An empty list.

    2. If the value of the [subject locators] property of the topic item is not the empty set, then an element information item with the following properties:

      1. [local name] The string "subjectLocators"

      2. [children] A representation of each of the locator items of the [subject locators] property in canonical sort order.

      3. [attributes] An empty list.

    3. If the value of the [source locators] property of the topic item is not the empty set, then a representation of the [source locators] property.

    4. A representation of each of the topic name items of the [topic names] property in canonical sort order.

    5. A representation of each of the occurrence items of the [occurrences] property in canonical sort order.

    6. For each of the association role items of the [roles played] property in canonical sort order, an element information item with the following properties.

      1. [local name] set to the string "rolePlayed"

      2. [children] An empty list

      3. [attributes] A list containing one attribute information item with the following properties:

        1. [local name] set to the string "ref"

        2. [normalized value] A sequence of character information items representing a string value constructed as a concatenation of:

          1. The string "association.".

          2. The position of the association item which is the value of the [parent] property of the association role item, in the canonically sorted [associations] property of the parent topic map item.

          3. The string ".role.".

          4. The position of the association role item in the canonically sorted [roles] property of the parent association item.

  3. [attributes] If the value of the [reified] property is not null, then a representation of the [reified] property, otherwise an empty list.
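The value of the "ref" attribute of a "rolePlayed" element can be illustrated with a small helper. The name `role_played_ref` is hypothetical, and both arguments are assumed to be 1-based positions computed as described in 5.3.

```python
def role_played_ref(association_position: int, role_position: int) -> str:
    """Concatenate the four parts of the "ref" attribute value."""
    return "association." + str(association_position) + ".role." + str(role_position)


# A role that is first in the sorted [roles] of the third association.
print(role_played_ref(3, 1))
```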

5.9 Constructing a representation of the topic name item

Each topic name item in the Topic Maps Data Model is represented by an element information item with the following properties.

  1. [local name] The string "topicName"

  2. [children] A list of element information items in the following order:

    1. A representation of the [value] property.

    2. If the value of the [type] property is not null, a representation of the [type] property.

    3. If the value of the [scope] property is not the empty set, a representation of the [scope] property.

    4. A representation of each of the variant items of the [variants] property in canonical sort order.

    5. If the value of the [source locators] property is not the empty set, a representation of the [source locators] property.

  3. [attributes] If the value of the [reifier] property is not null, a representation of the [reifier] property, otherwise an empty list.

5.10 Constructing a representation of a variant item

A variant item in the Topic Maps Data Model is represented by an element information item with the following properties:

  1. [local name] The string "variant"

  2. [children] A list of element information items in the following order:

    1. If the value of the [value] property is not null, a representation of the [value] property.

    2. If the value of the [resource] property is not null, a representation of the locator item that is the value of the [resource] property.

    3. If the value of the [scope] property is not the empty set, a representation of the [scope] property.

    4. If the value of the [source locators] property is not the empty set, a representation of the [source locators] property.

  3. [attributes] If the value of the [reifier] property is not null, then a representation of the [reifier] property, otherwise an empty list.

5.11 Constructing a representation of an occurrence item

An occurrence item in the Topic Maps Data Model is represented by an element information item with the following properties:

  1. [local name] The string "occurrence"

  2. [children] A list of element information items in the following order:

    1. If the value of the [value] property is not null, a representation of the [value] property.

    2. If the value of the [resource] property is not null, a representation of the locator item that is the value of the [resource] property.

    3. If the value of the [type] property is not null, a representation of the [type] property.

    4. If the value of the [scope] property is not the empty set, a representation of the [scope] property.

    5. If the value of the [source locators] property is not the empty set, a representation of the [source locators] property.

  3. [attributes] If the value of the [reifier] property is not null, a representation of the [reifier] property, otherwise an empty list.

5.12 Constructing a representation of an association item

An association item in the Topic Maps Data Model is represented by an element information item with the following properties:

  1. [local name] The string "association"

  2. [children] A list of element information items in the following order:

    1. If the value of the [type] property is not null, a representation of the [type] property.

    2. A representation of each of the items of the [roles] property in canonical sort order.

    3. If the value of the [scope] property is not the empty set, a representation of the [scope] property.

    4. If the value of the [source locators] property is not the empty set, a representation of the [source locators] property.

  3. [attributes] If the value of the [reifier] property is not null, then a representation of the [reifier] property, otherwise an empty list.

5.13 Constructing a representation of the association role item

An association role item in the Topic Maps Data Model is represented by an element information item with the following properties:

  1. [local name] The string "role"

  2. [children] A list of element information items in the following order.

    1. An element information item with the following properties:

      1. [local name] The string "rolePlayer"

      2. [children] The empty list

      3. [attributes] A list containing one attribute information item with the following properties:

        1. [local name] The string "topicref"

        2. [normalized value] The position of the topic item that is the value of the [role playing topic] property within the canonically sorted list of all topic items in the Topic Maps Data Model being encoded.

    2. If the value of the [type] property is not null, a representation of the [type] property.

    3. If the value of the [source locators] property is not the empty set, a representation of the [source locators] property.

  3. [attributes] If the value of the [reifier] property is not null, a representation of the [reifier] property, otherwise an empty list.

5.14 Constructing a representation of a locator item

A locator item in the Topic Maps Data Model is represented by an element information item with the following properties:

  1. [local name] The string "locator"

  2. [children] An empty list.

  3. [attributes] A list containing the following attribute information items:

    1. An attribute information item with the following properties:

      1. [local name] set to "address"

      2. [normalized value] a sequence of character information items representing the normalised form of the [reference] property of the locator item.

    2. An attribute information item with the following properties:

      1. [local name] set to the string "notation"

      2. [normalized value] a sequence of character information items representing the value of the [notation] property of the locator item.
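For illustration, the "locator" element can be built with Python's `xml.etree.ElementTree`, used here only as a convenient stand-in for an XML Infoset. The function name `locator_element` and the example values are hypothetical, and the canonical serialisation required by [XML-C14N] is not shown.

```python
import xml.etree.ElementTree as ET


def locator_element(normalised_reference: str, notation: str) -> ET.Element:
    """Build an empty "locator" element carrying the two attributes."""
    return ET.Element(
        "locator",
        {"address": normalised_reference, "notation": notation},
    )


element = locator_element("other.xtm#t1", "URI")
print(element.tag, element.attrib)
```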

5.15 Constructing a representation of the [reifier] property

The [reifier] property of a topic map item, topic name item, variant item, occurrence item, association item or association role item is represented as an attribute information item with the following properties:

  1. [local name] The string "reifier"

  2. [normalized value] The position of the topic item that is the value of the [reifier] property in the canonically sorted list of all topic items in the Topic Maps Data Model being encoded.

5.16 Constructing a representation of the [reified] property

The [reified] property of a topic item is represented as an attribute information item with the following properties:

  1. [local name] The string "reified"

  2. [normalized value] A sequence of character information items representing a string value constructed as follows:

5.17 Constructing a representation of the [scope] property

The [scope] property of a topic name item, variant item, occurrence item or association item is represented by an element information item with the following properties:

  1. [local name] The string "scope"

  2. [children] A list of one element information item for each topic item in the value of the [scope] property in canonical sort order. Each element information item has the following properties:

    1. [local name] The string "scopingTopic"

    2. [children] An empty list.

    3. [attributes] A list containing a single attribute information item with the following properties:

      1. [local name] The string "topicref"

      2. [normalized value] The position of the topic item within the canonically sorted list of all topic items in the Topic Maps Data Model being encoded.

  3. [attributes] An empty list.
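A sketch of the "scope" element, again using `xml.etree.ElementTree` as a stand-in for an XML Infoset, is given below. The name `scope_element` is hypothetical. The sketch takes the positions of the scoping topics in the canonically sorted list of all topic items; since those positions follow the canonical order, sorting them numerically is assumed to give the scoping topics in canonical sort order.

```python
import xml.etree.ElementTree as ET
from typing import Iterable


def scope_element(topic_positions: Iterable[int]) -> ET.Element:
    """Build a "scope" element with one "scopingTopic" child per topic."""
    scope = ET.Element("scope")
    for position in sorted(topic_positions):
        ET.SubElement(scope, "scopingTopic", {"topicref": str(position)})
    return scope


scope = scope_element({7, 2, 11})
print([child.get("topicref") for child in scope])
```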

5.18 Constructing a representation of the [source locators] property

The [source locators] property of an information item in the Topic Maps Data Model is represented by an element information item with the following properties:

  1. [local name] The string "sourceLocators"

  2. [children] A representation of each of the locator items of the [source locators] property in canonical sort order.

  3. [attributes] An empty list.

5.19 Constructing a representation of the [type] property

The [type] property of a topic name item, occurrence item, association item or association role item is represented by an element information item with the following properties:

  1. [local name] The string "type"

  2. [children] An empty list.

  3. [attributes] A list containing an attribute information item with the following properties:

    1. [local name] The string "topicref"

    2. [normalized value] The position of the topic item that is the value of the [type] property within the canonically sorted list of all topic items in the Topic Maps Data Model being encoded.

5.20 Constructing a representation of the [value] property

A [value] property in the Topic Maps Data Model is represented by an element information item with the following properties:

  1. [local name] The string "value"

  2. [children] A sequence of character information items representing the string value of the [value] property.

  3. [attributes] An empty list.

A RELAX-NG Compact Syntax Schema for CXTM Documents (informative)

      
topicMap = element topicMap {
  attlist.reifier, topic*, association*
}

attlist.reifier &=
  attribute reifier { xsd:integer }?

topic = element topic {
  attlist.reified, 
  subjectIdentifiers?, 
  subjectLocators?, 
  sourceLocators?, 
  topicName*, 
  occurrence*, 
  rolePlayed*
}

attlist.reified &=
  attribute reified { text }?

subjectIdentifiers = element subjectIdentifiers {
  locator+
}

subjectLocators = element subjectLocators {
  locator+
}

sourceLocators = element sourceLocators {
  locator+
}

topicName = element topicName {
  attlist.reifier, value, type?, scope?, variant*, sourceLocators?
}

variant = element variant {
  attlist.reifier, value?, resource?, scope?, sourceLocators?
}

occurrence = element occurrence {
  attlist.reifier, value?, resource?, type?, scope?, sourceLocators?
}

rolePlayed = element rolePlayed {
  attribute ref { text }
}

association = element association {
  attlist.reifier, type?, role*, scope?, sourceLocators?
}

role = element role {
  attlist.reifier, rolePlayer?, type?, sourceLocators?
}

rolePlayer = element rolePlayer {
  attlist.topicref
}

attlist.topicref &= 
  attribute topicref {xsd:integer}

value = element value { text }

type = element type { attlist.topicref }

locator = element locator {
  attlist.locator
}

attlist.locator &=
  attribute notation { xsd:string },
  attribute address  { xsd:string }
  
scope = element scope {
  scopingTopic+
}

scopingTopic = element scopingTopic {
  attlist.topicref
}

resource = element resource {
  attlist.locator
}

start = topicMap