Gabi und Sascha

With the Hadoop streaming API it is possible to read XML input. The εοs-toolkit uses the streaming API to convert Medline documents into EosDocuments on a cluster. The streaming API is not easy to use, however.

First: the implementation (Hadoop 0.16.2) does not pass the correct record to the mapper process. I have to adjust the record by removing a possible opening or closing tag of the parent XML element.
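To illustrate the kind of adjustment meant here, the following is a minimal sketch in plain Java, not the code used in the εοs-toolkit. The class name XmlRecordTrimmer, the method stripParentTags, and the parent element name MedlineCitationSet in the example are assumptions made for the illustration. The sketch simply deletes any opening or closing tag of the named parent element from the record text.

```java
import java.util.regex.Pattern;

final class XmlRecordTrimmer {

    /**
     * Removes every opening or closing tag of the given parent element
     * from the record and trims surrounding whitespace.
     * Hypothetical helper; the name and signature are illustrative only.
     */
    static String stripParentTags(String record, String parentElement) {
        String name = Pattern.quote(parentElement);
        // Matches <parent>, <parent attr="...">, and </parent>,
        // but not elements whose name merely starts with the parent name.
        String tag = "</?" + name + "(\\s[^>]*)?>";
        return record.replaceAll(tag, "").trim();
    }

    public static void main(String[] args) {
        // Assumed example: a record that still carries the parent's opening tag.
        String record = "<MedlineCitationSet>\n"
                + "<MedlineCitation><PMID>1</PMID></MedlineCitation>";
        System.out.println(stripParentTags(record, "MedlineCitationSet"));
    }
}
```

A regular expression is enough here because only the tags of one known element are removed; the record itself is not parsed.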

Second: the mapper does not receive the record in the value parameter of the map method. The record arrives in the key parameter instead. I could not find any documentation of this behavior in the Hadoop docs; only a look at the code of StreamXmlRecordReader reveals it.
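The consequence for a mapper can be sketched as follows. This is a simplified stand-in, not a real Hadoop mapper: plain Java strings replace Hadoop's key and value types and a list replaces the output collector, so that the snippet runs without a cluster. The class name KeyRecordMapperSketch and the PMID extraction are assumptions chosen only for the illustration. The point is the first line of the method body, which takes the record from the key and ignores the value.

```java
import java.util.ArrayList;
import java.util.List;

final class KeyRecordMapperSketch {

    /**
     * Stand-in for the body of a map method (hypothetical, simplified).
     * Assumption taken from the text: the XML record arrives in the key,
     * and the value carries no payload.
     */
    static void map(String key, String value, List<String> output) {
        String record = key; // read the record from the key, not from the value

        // Illustrative processing: pull out the text of the first PMID element.
        int open = record.indexOf("<PMID");
        int textStart = open < 0 ? -1 : record.indexOf('>', open) + 1;
        int textEnd = open < 0 ? -1 : record.indexOf("</PMID>", textStart);
        if (textStart > 0 && textEnd >= textStart) {
            output.add(record.substring(textStart, textEnd).trim());
        }
    }

    public static void main(String[] args) {
        List<String> output = new ArrayList<>();
        map("<MedlineCitation><PMID>12345</PMID></MedlineCitation>", "", output);
        System.out.println(output);
    }
}
```

A mapper that looks for the record in the value would see nothing to process, which is how the problem shows up in practice.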

See the εοs-toolkit converter contribution for more information.