Modern systems that deal with inference in texts need automated methods to extract meaning representations (MRs) from texts at scale. Open Information Extraction (IE) is a prominent way of extracting all potential relations from a given text in a comprehensive manner. Previous work in this area has mainly focused on the extraction of isolated relational tuples. Ignoring the cohesive nature of texts, where important contextual information is spread across clauses or sentences, state-of-the-art Open IE approaches are thus prone to generating a loose arrangement of tuples that lack the expressiveness needed to infer the true meaning of complex assertions. To overcome this limitation, we present a method that allows existing Open IE systems to enrich their output with additional meta information. By leveraging the semantic hierarchy of minimal propositions generated by the discourse-aware Text Simplification (TS) approach presented in Niklaus et al. (2019), we propose a mechanism to extract semantically typed relational tuples from complex source sentences. Based on this novel type of output, we introduce a lightweight semantic representation for Open IE in the form of normalized and context-preserving relational tuples. It extends the shallow semantic representation of state-of-the-art approaches, which takes the form of predicate-argument structures, by capturing intra-sentential rhetorical structures and hierarchical relationships between the relational tuples. In that way, the semantic context of the extracted tuples is preserved, resulting in more informative and coherent predicate-argument structures that are easier to interpret.
In addition, a comparative analysis shows that the semantic hierarchy of minimal propositions benefits Open IE approaches in a second dimension: the canonical structure of the simplified sentences is easier to process and analyze, and thus facilitates the extraction of relational tuples, resulting in improved precision (up to 32%) and recall (up to 30%) of the extracted relations on a large benchmark corpus.
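To make the idea of context-preserving relational tuples more concrete, the following is a minimal sketch of how such a representation might be modeled. It is an illustration under stated assumptions, not the format used by the authors: the class names `RelationalTuple` and `ContextLink`, the field `context_layer`, the helper `contexts_of`, the relation label `Contrast`, and the example sentence are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ContextLink:
    """Hypothetical typed link from one tuple to another."""
    relation: str   # rhetorical relation label (illustrative value)
    target_id: str  # identifier of the linked tuple


@dataclass
class RelationalTuple:
    """Hypothetical predicate-argument structure plus context metadata."""
    tuple_id: str
    arg1: str
    predicate: str
    arg2: str
    context_layer: int = 0  # assumption: 0 = core statement, 1+ = contextual
    links: List[ContextLink] = field(default_factory=list)


def contexts_of(
    tuple_id: str, tuples: List[RelationalTuple]
) -> List[Tuple[str, RelationalTuple]]:
    """Return (relation, linked tuple) pairs for the tuple with the given id."""
    by_id = {t.tuple_id: t for t in tuples}
    return [(link.relation, by_id[link.target_id]) for link in by_id[tuple_id].links]


# Invented example sentence (not taken from the paper):
# "Although the company reported losses, it hired new staff."
example = [
    RelationalTuple("t1", "the company", "hired", "new staff",
                    context_layer=0, links=[ContextLink("Contrast", "t2")]),
    RelationalTuple("t2", "the company", "reported", "losses",
                    context_layer=1),
]

# Walk from the core tuple to the context it depends on.
for relation, ctx in contexts_of("t1", example):
    print(relation, (ctx.arg1, ctx.predicate, ctx.arg2))
```

In this sketch, an isolated tuple such as `t1` keeps an explicit, typed pointer to the proposition that qualifies it, which is the kind of information a flat list of tuples would lose.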
Christina Marianne Niklaus, Matthias Cetto, André Freitas, Siegfried Handschuh
30 Mar 2023