SciPuRe: a new Representation of textual data for entity identification from scientific publications

A new paper has been accepted at WIMS 2020: SciPuRe: a new Representation of textual data for entity identification from scientific publications

Extracting information from the content of scientific documents faces a number of challenges. One of them is the assessment of the extracted entities for further processing, especially the identification of false positives. In this paper we present SciPuRe (Scientific Publication Representation), a new representation of entities.

The extraction process presented in this paper is driven by an Ontological and Terminological Resource (OTR). It is applied to the extraction of entities associated with food packaging permeabilities, which can be symbolic (e.g. the Packaging "low density polyethylene") or quantitative (e.g. the Temperature "25" "°C", or the H2O_Permeability "4.34 × 10^-3" with its unit of measure). A representation of each entity, composed of a set of features, is built during the extraction process. These features fall into three categories: Ontological, Lexical and Structural. The features of SciPuRe are used to compute Relevance scores that take into account the different information available for each extracted entity. Such Relevance scores show the usefulness of SciPuRe and can then be used to rank the extraction results and discard false positives.
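
To make the ranking idea concrete, here is a minimal Python sketch of how per-entity features could be combined into a single score and used to filter results. It is only an illustration under stated assumptions: the announcement does not give the scoring formula, so the weighted average, the weights, the feature values and the threshold below are all hypothetical, as are the names `Entity`, `relevance` and `rank_and_filter`.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Entity:
    """An extracted entity with one (hypothetical) score per feature category."""
    text: str
    concept: str  # concept of the OTR, e.g. "Packaging"
    features: Dict[str, float] = field(default_factory=dict)


# Hypothetical weights for the three feature categories named in the abstract.
WEIGHTS = {"ontological": 0.5, "lexical": 0.3, "structural": 0.2}


def relevance(entity: Entity, weights: Dict[str, float] = WEIGHTS) -> float:
    """Weighted average of the feature scores (an assumed formula, not the paper's)."""
    total = sum(weights.values())
    weighted = sum(w * entity.features.get(name, 0.0) for name, w in weights.items())
    return weighted / total


def rank_and_filter(entities: List[Entity], threshold: float = 0.5) -> List[Tuple[float, Entity]]:
    """Drop entities scoring below the threshold and sort the rest, best first."""
    scored = [(relevance(e), e) for e in entities]
    kept = [pair for pair in scored if pair[0] >= threshold]
    return sorted(kept, key=lambda pair: pair[0], reverse=True)


# Illustrative entities; the feature values are made up for the example.
candidates = [
    Entity("25", "Temperature",
           {"ontological": 0.6, "lexical": 0.5, "structural": 0.9}),
    Entity("low density polyethylene", "Packaging",
           {"ontological": 0.9, "lexical": 1.0, "structural": 0.8}),
    Entity("3", "H2O_Permeability",
           {"ontological": 0.2, "lexical": 0.1, "structural": 0.3}),
]

for score, entity in rank_and_filter(candidates):
    print(f"{score:.2f}  {entity.concept}: {entity.text}")
```

With these made-up values, the low-scoring candidate "3" falls below the threshold and is discarded as a likely false positive, while the other two are returned in decreasing order of score.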

Modification date: 18 July 2023 | Publication date: 02 June 2020 | Redactor: PB