Advancing NLP via a distributed-messaging approach
Gullo F
2016-01-01
Abstract
Natural Language Processing (NLP) constitutes a fundamental module for a plethora of domains where unstructured text is a predominant source. Despite the keen interest of both industry and the research community in developing NLP tools, current industrial solutions still suffer from two main drawbacks. First, the architectures underlying existing systems do not satisfy the critical requirements of large-scale processing, completeness, and versatility. Second, the algorithms typically employed for entity recognition and disambiguation (a core task common to all modern NLP systems) are still not well suited for deployment in a real industrial environment, owing to evident issues of efficiency and result interpretability. In this paper we present Hermes, a novel NLP tool that overcomes these two main limitations of existing solutions. By employing an efficient and extendable distributed-messaging architecture, Hermes achieves the critical requirements of large-scale processing, completeness, and versatility. Moreover, our tool includes an entity-disambiguation algorithm enhanced with a two-level hashing-based approximation technique to considerably improve efficiency, as well as a densest-subgraph-extraction method to increase result interpretability.