General
Default implementation
It is simply an API to annotate plain text documents with semantic
information. To do this, you annotate a range of characters with a
so-called Annotation, which is defined by the range of characters,
the namespace, the local name and some attributes.
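As a rough illustration of the four parts listed above, here is a minimal sketch of such a value. The type name SketchAnnotation, its field names, and the end-exclusive range are assumptions made for this example only; they are not the library's actual Annotation type.

```java
import java.util.Map;

// Hypothetical illustration only; not the library's actual Annotation type.
// Assumption: the range is [start, end), i.e. the end offset is exclusive.
record SketchAnnotation(int start, int end, String namespace,
                        String localName, Map<String, String> attributes) {

    SketchAnnotation {
        if (start < 0 || end < start) {
            throw new IllegalArgumentException("invalid range: " + start + "-" + end);
        }
        attributes = Map.copyOf(attributes); // defensive, immutable copy
    }

    // Returns the characters of the given plain text covered by this annotation.
    String coveredText(String text) {
        return text.substring(start, end);
    }
}
```

For example, an annotation over the range 0 to 5 of the text "Alice went home" covers the characters "Alice".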
No, it is not possible to annotate documents other than plain text
documents. To annotate documents in well-documented or proprietary
formats, it is necessary to extract the text part of the documents and
create a new document to annotate with the CoreBuilder API.
The combination of namespaces (as URIs) and local names defines a very large scope for individual annotations. There is no need to define a central repository of annotations. Each domain can define its own annotations.
Use the weight attribute of an Annotation.
The value of this attribute is a signed 32-bit integer, but it is
recommended that the weight stay in the range from 0 (lowest weight)
to 100 (highest weight). The default is 100.
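The recommended range and the default can be captured in a few constants. This is only a sketch: the class WeightSketch and its members are hypothetical names, and nothing here is part of the library's API.

```java
// Hypothetical helper; the names are assumptions for this example.
final class WeightSketch {
    static final int DEFAULT_WEIGHT = 100;  // default stated above
    static final int MIN_RECOMMENDED = 0;   // lowest recommended weight
    static final int MAX_RECOMMENDED = 100; // highest recommended weight

    private WeightSketch() {}

    // Any int is a legal weight; this only checks the recommended range.
    static boolean isRecommended(int weight) {
        return weight >= MIN_RECOMMENDED && weight <= MAX_RECOMMENDED;
    }
}
```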
Yes, you will find the definitions in Mimetype and file extension.
XML is not a very good format for serializing annotated documents. XML is format free and has no built-in support for overlapping elements. But it is easy to emulate overlapping.
It is possible to use RDF for semantic information. But it is not useful here, because RDF is a framework for adding semantic information outside the document. It is not very useful to use RDF inside a document.
Additionally, RDF is very heavyweight for this problem. It is not easy to store all attributes and position information in an RDF graph. And if it is done, the related document is not very robust: if you add or remove a character in the annotated text, the position information becomes invalid.
Finally, RDF inflates the documents considerably (much more than the default implementation).
Overlapping is emulated in the serialization by an a:id attribute
and its counterpart, the a:ref attribute. All annotations have an
internal ID. If there are overlapping annotations, the XML elements
are split into id- and ref-parts.
Example:
There are two annotations, one in the range of 2 to 6 and the
second in the range of 4 to 8. The first annotation gets the ID
a1. The second annotation gets the ID a2.
In the XML, the element for the first annotation starts at position
2 with the attribute a:id='a1'. At position 4 the element for the
second annotation starts with the attribute a:id='a2'.
At position 6 all elements are closed. After closing, the second
annotation is reopened with the attribute a:ref='a2'. The
deserializer rebuilds the annotation positions with the help of
this data.
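The steps of this example can be sketched in code. The following is an independent illustration of the id/ref splitting idea, not the default implementation's serializer. The class OverlapSketch, the Span type, and the use of a plain element name per annotation are assumptions made for the example. It also assumes that all ranges lie inside the text, and it emits only a fragment without the root element and namespace declarations.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical sketch of id/ref splitting; not the library's serializer.
final class OverlapSketch {

    // Hypothetical annotation: [start, end) range, element name, internal ID.
    record Span(int start, int end, String name, String id) {}

    static String serialize(String text, List<Span> spans) {
        // Every position where an element may have to open or close.
        TreeSet<Integer> cuts = new TreeSet<>();
        cuts.add(0);
        cuts.add(text.length());
        for (Span s : spans) {
            cuts.add(s.start());
            cuts.add(s.end());
        }

        StringBuilder out = new StringBuilder();
        List<Span> open = new ArrayList<>(); // currently open, outermost first
        Integer prev = null;
        for (int cut : cuts) {
            if (prev != null) {
                out.append(escape(text.substring(prev, cut)));
            }
            if (open.stream().anyMatch(s -> s.end() == cut)) {
                // Close all open elements, innermost first.
                for (int i = open.size() - 1; i >= 0; i--) {
                    out.append("</").append(open.get(i).name()).append('>');
                }
                open.removeIf(s -> s.end() == cut);
                // Reopen the annotations that continue, now as ref-parts.
                for (Span s : open) {
                    out.append('<').append(s.name())
                       .append(" a:ref='").append(s.id()).append("'>");
                }
            }
            // Open annotations starting here as id-parts (empty ranges are skipped).
            for (Span s : spans) {
                if (s.start() == cut && s.end() > s.start()) {
                    out.append('<').append(s.name())
                       .append(" a:id='").append(s.id()).append("'>");
                    open.add(s);
                }
            }
            prev = cut;
        }
        return out.toString();
    }

    // Minimal escaping of character data.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

With the text "0123456789" and the two spans (2, 6, "first", "a1") and (4, 8, "second", "a2"), the sketch produces:

```
01<first a:id='a1'>23<second a:id='a2'>45</second></first><second a:ref='a2'>67</second>89
```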
Yes, there is a namespace defined for the annotations, and the namespace is also dereferenceable at http://www.speexx.de/ocean/ns/annotation/1.0/#.
This is quite a simple mechanism. For each namespace the serializer
defines a prefix which is used in the XML document. The
namespace-to-prefix mapping is declared for all namespaces with
namespace attributes (xmlns:xyz) on the root element.
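A minimal sketch of such a mapping follows. The generated prefix scheme (ns0, ns1, ...), the class PrefixSketch, and the root element name used in the example are assumptions; the actual serializer may choose its prefixes differently.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; prefix naming is an assumption of this example.
final class PrefixSketch {

    private PrefixSketch() {}

    // Assigns one generated prefix per distinct namespace URI, in first-seen order.
    static Map<String, String> assignPrefixes(List<String> namespaceUris) {
        Map<String, String> prefixes = new LinkedHashMap<>();
        for (String uri : namespaceUris) {
            if (!prefixes.containsKey(uri)) {
                prefixes.put(uri, "ns" + prefixes.size());
            }
        }
        return prefixes;
    }

    // Builds the start tag of the root element with one xmlns attribute per namespace.
    static String rootStartTag(String rootName, Map<String, String> prefixes) {
        StringBuilder tag = new StringBuilder("<").append(rootName);
        for (Map.Entry<String, String> e : prefixes.entrySet()) {
            tag.append(" xmlns:").append(e.getValue())
               .append("=\"").append(escapeAttribute(e.getKey())).append('"');
        }
        return tag.append('>').toString();
    }

    // Minimal escaping for a double-quoted attribute value.
    private static String escapeAttribute(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
    }
}
```

Every element in the document can then use the prefix assigned to its namespace, because declarations on the root element are in scope for the whole document.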
At this time I don't know.
The tests work fine with JDK 1.5 and the IBM Java SDK 1.4.2, both on Linux.