Fraunhofer SCAI is one of the leading international groups in the challenging field of information extraction from the biomedical literature. SCAI has a special focus on the recognition of biological names and the disambiguation of synonyms.
Fraunhofer SCAI brings its expertise in information extraction and its existing software tools to the project. The SCAI approach is based on methods from computational linguistics and bioinformatics. Established recognition of biomedical and chemical names is of central importance for extracting information from text. The assignment of different synonyms to defined entities such as genes or chemical compounds (disambiguation) and the recognition of linguistic variants are highly relevant for the pharmaceutical industry. In cooperation with Aventis Pharma, SCAI has developed ProMiner, an internationally competitive platform for the identification and normalization of named entities. In the field of text mining, SCAI has demonstrated the excellent quality of this solution in public competitions.
In addition to extraction from text, SCAI has in recent years developed chemoCR, a novel prototype for the reconstruction of chemical structures from images. Chemical depictions can be found in many scientific publications and in chemistry-related patents. The pre-processing and layout analysis of complex documents, such as scientific papers and patents, is therefore of utmost importance. The combination of ProMiner (text) and chemoCR (images) currently gives SCAI a unique selling proposition. Worldwide, only a few academic groups are conducting research on the problem of chemical image mining.