TaLTaC 3.0. A Multi-level Web Platform for Textual Big Data in the Social Sciences

De Gasperis Giovanni
2017-01-01

Abstract

The TaLTaC software package for lexical and textual analysis, in versions 1.0 and 2.0, was in use from 1999 to 2015. It now appears to have reached its technological limits. TaLTaC version 3.0 (hereafter T3) has been redesigned to overcome those limits. The redesign involved: (i) recoding all internal software components with modern web languages and standards; (ii) adopting a new kind of database (NoSQL) capable of handling corpora on the order of gigabytes; (iii) new criteria for data storage and data processing. The software architecture is modular and decouples user interaction from the actual data computation. Its two main components are the GUI (graphical user interface), based on HTML5/CSS/JavaScript, and the back-end processing CORE. The new design also makes it possible to run T3 on the mainstream operating systems: OS X, Windows, and Linux. From a single parsing operation, T3 produces multiple vocabularies for multi-level lexical analysis. This allows the different graphical forms in a text to be disambiguated semiautomatically on the basis of their concordances. It also allows simple forms to be virtually transformed into multi-words.
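The abstract names three lexical operations: one parsing pass that yields several vocabularies, disambiguation of graphical forms through their concordances, and a "virtual" merging of simple forms into multi-words. The Python sketch below illustrates those ideas on an in-memory token list only. It is not T3's implementation (which, according to the abstract, runs in a separate back-end CORE over a NoSQL store); the regex tokenizer and every name in it (`parse_once`, `Vocabularies`, `kwic`, `virtual_multiwords`) are hypothetical and chosen for illustration.

```python
import re
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical tokenizer (an assumption, not T3's): runs of word characters,
# optionally joined by one apostrophe, e.g. "dell'analisi" stays one form.
TOKEN_RE = re.compile(r"\w+(?:'\w+)?")


@dataclass
class Vocabularies:
    """Several vocabularies filled during ONE pass over the text (illustrative)."""
    tokens: list = field(default_factory=list)         # token stream, kept for concordances
    forms: Counter = field(default_factory=Counter)    # graphical forms, case preserved
    lower: Counter = field(default_factory=Counter)    # case-folded forms
    bigrams: Counter = field(default_factory=Counter)  # adjacent pairs (ignores sentence breaks)


def parse_once(text: str) -> Vocabularies:
    """Single parsing pass: every vocabulary is updated from the same token."""
    v = Vocabularies()
    prev = None
    for match in TOKEN_RE.finditer(text):
        tok = match.group()
        v.tokens.append(tok)
        v.forms[tok] += 1
        v.lower[tok.lower()] += 1
        if prev is not None:
            v.bigrams[(prev.lower(), tok.lower())] += 1
        prev = tok
    return v


def kwic(tokens: list, form: str, width: int = 3) -> list:
    """Key-word-in-context lines for each occurrence of `form` (case-sensitive),
    the kind of concordance a user would inspect to disambiguate a form."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == form:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}")
    return lines


def virtual_multiwords(tokens: list, expressions: list) -> Counter:
    """Recount case-folded forms, treating each expression (a tuple of
    lowercase words) as a single unit. The token stream itself is not
    modified, so the transformation is 'virtual' and can be undone freely."""
    low = [t.lower() for t in tokens]
    # Try longer expressions first; skip empty tuples to avoid an endless loop.
    ordered = sorted((e for e in expressions if e), key=len, reverse=True)
    counts = Counter()
    i = 0
    while i < len(low):
        for expr in ordered:
            n = len(expr)
            if tuple(low[i:i + n]) == expr:
                counts["_".join(expr)] += 1
                i += n
                break
        else:  # no expression starts at position i: count the simple form
            counts[low[i]] += 1
            i += 1
    return counts
```

For example, after `v = parse_once("Big data in the social sciences. Social sciences need big data tools.")`, the forms `social` and `Social` remain distinct in `v.forms` but merge in `v.lower`, `kwic(v.tokens, "Social", width=2)` returns `["social sciences [Social] sciences need"]`, and `virtual_multiwords(v.tokens, [("social", "sciences"), ("big", "data")])` counts `social_sciences` and `big_data` twice each while `v.tokens` keeps its original twelve tokens. Keeping the token stream intact is the design choice that makes the merge reversible.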
Year: 2017
ISBN: 978-3-319-55476-1

Use this identifier to cite or link to this document: https://hdl.handle.net/11697/120822
Citations
  • Scopus: 4