Extending Apache Tika Capabilities
Apache Tika is a toolkit for extracting metadata and text content from a wide range of document formats. Tika implements parsers for some formats itself and relies on external libraries (such as Apache PDFBox and Apache POI) for many more.
Tika exposes a uniform Java API across all supported document formats, so calling code does not need to change with the format. Tika can also detect a document's type and the language of its content.
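To make the idea of a uniform API concrete, here is a toy sketch of the pattern described above: detect the document type first, then hand the bytes to a format-specific parser behind a single entry point. This is not Tika's code. The class name `UniformExtractorSketch`, its methods, and the placeholder PDF parser are all hypothetical, and the detection step only checks for the `%PDF-` signature that PDF files begin with.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration only; none of these names come from Tika.
final class UniformExtractorSketch {

    // Registry mapping a media type to a format-specific text extractor.
    private final Map<String, Function<byte[], String>> parsers = new HashMap<>();

    UniformExtractorSketch() {
        // Placeholder for a parser that would delegate to an external PDF library.
        parsers.put("application/pdf", data -> "[text extracted by a PDF library]");
        // Plain text needs no external library: decode the bytes directly.
        parsers.put("text/plain", data -> new String(data, StandardCharsets.UTF_8));
    }

    // Toy type detection: PDF files begin with the ASCII signature "%PDF-".
    // Anything else is treated as plain text in this sketch.
    String detect(byte[] data) {
        byte[] magic = "%PDF-".getBytes(StandardCharsets.US_ASCII);
        if (data.length < magic.length) {
            return "text/plain";
        }
        for (int i = 0; i < magic.length; i++) {
            if (data[i] != magic[i]) {
                return "text/plain";
            }
        }
        return "application/pdf";
    }

    // The single entry point callers use, whatever the underlying format is.
    String extractText(byte[] data) {
        return parsers.get(detect(data)).apply(data);
    }

    public static void main(String[] args) {
        UniformExtractorSketch extractor = new UniformExtractorSketch();
        byte[] note = "hello tika".getBytes(StandardCharsets.UTF_8);
        System.out.println(extractor.detect(note));
        System.out.println(extractor.extractText(note));
    }
}
```

The caller only ever sees `detect` and `extractText`; supporting another format in this sketch means registering one more entry in the map, without touching calling code.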
In my earlier …