Early in development I decided to serialize the text and its annotations as XML, with all the problems XML brings. XML is a tree-oriented markup language, and this sharply contradicts the free positioning of annotations. I chose XML anyway because I hate writing my own parsers and designing document formats. Additionally, most developers today understand XML and the mechanisms behind it.
XML's inability to express overlapping elements is solved by introducing the a:id and a:ref attributes. The deserializer of the version 1.0 format pastes together annotation elements whose a:id and a:ref values match (note: the a: prefix is only a placeholder for the actual namespace mapping).
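The pasting step can be sketched roughly as follows. This is a minimal illustration, not the version 1.0 deserializer: it assumes that an overlapping annotation is split into contiguous fragment elements, that the first fragment carries a:id and the later ones carry a:ref with the same value, and that a: is bound to the annotation namespace URI used later in this text. The function names `collect_spans` and `merge_fragments` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: the placeholder prefix a: is bound to this URI.
ANN_NS = "http://www.speexx.de/ocean/ns/annotation/1.0/#"
ID = f"{{{ANN_NS}}}id"
REF = f"{{{ANN_NS}}}ref"


def collect_spans(root):
    """Flatten the document text and record (element, start, end) offsets."""
    parts, spans = [], []

    def walk(elem, offset):
        start = offset
        if elem.text:
            parts.append(elem.text)
            offset += len(elem.text)
        for child in elem:
            offset = walk(child, offset)
            if child.tail:  # text after the child belongs to the parent
                parts.append(child.tail)
                offset += len(child.tail)
        spans.append((elem, start, offset))
        return offset

    walk(root, 0)
    return "".join(parts), spans


def merge_fragments(root):
    """Join fragments whose a:ref matches an a:id into one annotation span."""
    text, spans = collect_spans(root)
    merged = {}  # a:id value -> [start, end, tag of the first fragment seen]
    for elem, start, end in spans:
        key = elem.get(ID) or elem.get(REF)
        if key is None:
            continue  # not an annotation fragment
        if key in merged:
            merged[key][0] = min(merged[key][0], start)
            merged[key][1] = max(merged[key][1], end)
        else:
            merged[key] = [start, end, elem.tag]
    return text, {k: (s, e, tag) for k, (s, e, tag) in merged.items()}


# Annotation y overlaps x: it starts inside x and continues after it.
SAMPLE = (
    '<doc xmlns:a="http://www.speexx.de/ocean/ns/annotation/1.0/#">'
    '<x a:id="1">one <y a:id="2">two</y></x><y a:ref="2"> three</y>'
    "</doc>"
)

text, annotations = merge_fragments(ET.fromstring(SAMPLE))
for key, (start, end, tag) in annotations.items():
    print(key, tag, repr(text[start:end]))
```

Taking the minimum start and maximum end only reconstructs the annotation correctly if its fragments are adjacent in the text, which is the assumption stated above.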
A simple XML example with two nested elements of the same name:
<example>
<a:b attr='1'>
<a:b attr='2'>
</a:b>
</a:b>
</example>
In plain XML this is not a problem, but in combination with the overlapping solution it becomes a real one.
There are two possible solutions, which I discuss below. The first is to introduce two new attributes: a:spos for the start position and a:epos for the end position of the annotation. This is valid XML because the annotation namespace is reserved and cannot be used for user attributes. The main problem with this solution is robustness. For that reason I had decided earlier not to store the start and end positions, together with the annotation's attributes, in a special header outside the text.
<example>
<head>
<a a:spos='2' a:epos='4' a:weight='100' abc:xyz='dummy' />
</head>
<text>This is an example text</text>
</example>
The example annotates 'is'. But this approach is not very
robust against manual changes to the text, or against problems during
serialization and deserialization by the XML processor, character
encoding issues, and so on.
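The fragility can be shown with a small sketch of the header approach. It assumes zero-based positions with a:epos exclusive (the text does not specify either), binds the a: prefix to the annotation namespace and the unbound abc: prefix to a made-up URN so the sample parses; `resolve_header` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Assumption: a: is bound to the annotation namespace.
ANN_NS = "http://www.speexx.de/ocean/ns/annotation/1.0/#"

SAMPLE = (
    '<example xmlns:a="http://www.speexx.de/ocean/ns/annotation/1.0/#" '
    'xmlns:abc="urn:example:abc">'  # hypothetical URN for the abc: prefix
    "<head><a a:spos='2' a:epos='4' a:weight='100' abc:xyz='dummy'/></head>"
    "<text>This is an example text</text>"
    "</example>"
)


def resolve_header(root):
    """Return the substring of <text> covered by each annotation in <head>."""
    text = root.findtext("text")
    covered = []
    for ann in root.find("head"):
        start = int(ann.get(f"{{{ANN_NS}}}spos"))
        end = int(ann.get(f"{{{ANN_NS}}}epos"))  # assumed exclusive
        covered.append(text[start:end])
    return covered


root = ET.fromstring(SAMPLE)
print(resolve_header(root))  # ['is']

# Simulate a manual edit of the text that does not update the header.
text_elem = root.find("text")
text_elem.text = "Now " + text_elem.text
print(resolve_header(root))  # ['w ']
```

After the edit the document is still well-formed and the header still parses, yet the annotation silently covers different characters. Nothing in the file signals the error.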
The second solution works differently: an algorithm checks for nested annotations that share the same namespace and local name. The annotation namespace http://www.speexx.de/ocean/ns/annotation/1.0/# is then extended with a counter (encoded in radix 36), terminated by the number sign (#) and followed by the namespace of the element. A new prefix mapping is created for this new namespace.
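A rough sketch of this renaming step might look like this. It assumes the counter is the number of enclosing ancestors with the same namespace and local name (so the outermost element keeps its namespace and the first nested one gets counter 1, matching the example), and that radix-36 digits are lowercase. `to_radix36`, `split_tag` and `disambiguate` are hypothetical names; ElementTree generates the new prefix mappings itself when serializing.

```python
import xml.etree.ElementTree as ET

ANN_NS = "http://www.speexx.de/ocean/ns/annotation/1.0/#"
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"  # assumed lowercase digits


def to_radix36(n: int) -> str:
    """Encode a non-negative integer in radix 36."""
    if n < 0:
        raise ValueError("counter must be non-negative")
    if n == 0:
        return "0"
    out = []
    while n:
        n, r = divmod(n, 36)
        out.append(DIGITS[r])
    return "".join(reversed(out))


def split_tag(tag):
    """Split an ElementTree tag '{ns}local' into (ns, local)."""
    if tag.startswith("{"):
        ns, local = tag[1:].split("}", 1)
        return ns, local
    return "", tag


def disambiguate(elem, open_names=None):
    """Move nested same-name elements into a counter-extended namespace."""
    if open_names is None:
        open_names = {}
    key = split_tag(elem.tag)  # original (namespace, local name)
    depth = open_names.get(key, 0)  # same-name ancestors currently open
    if depth > 0:
        ns, local = key
        elem.tag = f"{{{ANN_NS}{to_radix36(depth)}#{ns}}}{local}"
    open_names[key] = depth + 1
    for child in elem:
        disambiguate(child, open_names)
    open_names[key] = depth  # leaving this element


DUMMY_NS = "urn:speexx-de:dummynamespace"
root = ET.fromstring(
    f'<example xmlns:a1="{DUMMY_NS}"><a1:b attr="1"><a1:b attr="2"/></a1:b></example>'
)
disambiguate(root)
print(ET.tostring(root, encoding="unicode"))
```

Keying the counter on nesting depth keeps sibling elements in their original namespace; only true nesting triggers a rename.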
<example xmlns:a1='urn:speexx-de:dummynamespace'
xmlns:a2='http://www.speexx.de/ocean/ns/annotation/1.0/#1#urn:speexx-de:dummynamespace'>
<a1:b attr='1'>
<a2:b attr='2'>
</a2:b>
</a1:b>
</example>
This is a little more complex, but more robust than solution #1, so I
decided to implement it.
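On the deserializer side the extension has to be stripped again. The following sketch parses the namespace example above (with its closing tag written as `</example>`) and maps each element back to its original namespace and counter. It assumes the counter is terminated by the first `#` after the base annotation namespace; `split_tag` and `original_name` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET

ANN_NS = "http://www.speexx.de/ocean/ns/annotation/1.0/#"

SAMPLE = """<example xmlns:a1='urn:speexx-de:dummynamespace'
  xmlns:a2='http://www.speexx.de/ocean/ns/annotation/1.0/#1#urn:speexx-de:dummynamespace'>
  <a1:b attr='1'>
    <a2:b attr='2'>
    </a2:b>
  </a1:b>
</example>"""


def split_tag(tag):
    """Split an ElementTree tag '{ns}local' into (ns, local)."""
    if tag.startswith("{"):
        ns, local = tag[1:].split("}", 1)
        return ns, local
    return "", tag


def original_name(tag):
    """Return ((original namespace, local name), counter) for a tag."""
    ns, local = split_tag(tag)
    if ns.startswith(ANN_NS):
        counter, sep, original_ns = ns[len(ANN_NS):].partition("#")
        if sep and counter:  # extended form: <counter>#<namespace>
            return (original_ns, local), int(counter, 36)
    return (ns, local), 0  # ordinary element, not renamed


for elem in ET.fromstring(SAMPLE).iter():
    if elem.tag != "example":
        print(original_name(elem.tag), elem.get("attr"))
```

Both b elements resolve to the same original name, and the counter keeps them distinguishable, which is what lets the a:id/a:ref pasting operate on them without confusing nested elements.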