Finding the Knowledge Needles in the Data Haystack
The explosion of unstructured data available for research and development is a general phenomenon, but it has already become a performance-defining factor in the medical, biotechnology and pharmaceutical sectors: without ICT-based tools for the automated mining of document databases, identifying and retrieving strategically important scientific and business information is either untenable or a significant drain on manpower. The situation in the pharmaceutical and biochemistry sectors is made more extreme by the reliance on multi-modal information in publications and documents, since chemical structures are represented not just in text form but also as structure diagrams.
A particular, representative focal point is patent search in the pharmaco-chemical context: mining patent documents requires text mining based on domain-specific vocabularies and ontologies, combined with information extraction from (printed versions of) chemical structure diagrams. With databases containing millions of complex documents, the automated analysis demands high-performance computing. To meet the needs of the many industrial small and medium-sized enterprises in the sector, it also calls for a delivery approach based on remote service computing, as offered by cloud and SaaS solutions.