TaLTaC 3.0. A Multi-level Web Platform for Textual Big Data in the Social Sciences
De Gasperis Giovanni
2017-01-01
Abstract
The TaLTaC software package, a tool for lexical and textual analysis, was in use in its versions 1.0 and 2.0 over the last decades (1999--2015). It now appears to have reached its technological limits. TaLTaC version 3.0 (from now on T3) has been redesigned to overcome those limits. The process included: (i) recoding all internal software components with modern web-related languages and standards; (ii) adopting a new kind of database (NoSQL) capable of handling corpora on the order of gigabytes; (iii) introducing new criteria for data storage and data processing. The software architecture is modular and decouples user interaction from the actual data computation. The two main components are the GUI (graphical user interface), based on HTML5/CSS/JavaScript, and the back-end processing CORE. The new design also makes it possible to run T3 on the mainstream operating systems: OS X, Windows, and Linux. From a single parsing operation, T3 produces many vocabularies for multi-level lexical analysis. This allows one to disambiguate, in a semiautomatic fashion, between the different graphical forms in a text on the basis of their concordances. It also allows for a virtual transformation of simple forms into multi-words.
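To make the idea of "many vocabularies from a single parsing operation" concrete, here is a minimal Python sketch. It is not T3's implementation: the function `parse`, the three level names, and the greedy longest-match rule for multi-words are all assumptions chosen for illustration. It fills three frequency vocabularies in one loop over the tokens: graphical forms as written, case-folded forms, and forms after a virtual multi-word merge that leaves the token sequence itself untouched.

```python
from collections import Counter
import re

TOKEN_RE = re.compile(r"\w+")

def parse(text, multiwords=()):
    """Hypothetical single-pass parser filling several vocabularies at once."""
    tokens = TOKEN_RE.findall(text)
    lowered = [t.lower() for t in tokens]
    # Multi-words are stored as tuples of lowercase simple forms.
    mw = {tuple(m.lower().split()) for m in multiwords}
    max_len = max((len(m) for m in mw), default=1)
    vocab = {
        "graphical": Counter(),  # forms exactly as written
        "lowercase": Counter(),  # case-folded forms
        "multiword": Counter(),  # forms after virtual multi-word merging
    }
    covered_until = 0  # tokens before this index already belong to a multi-word
    for i, tok in enumerate(tokens):
        vocab["graphical"][tok] += 1
        vocab["lowercase"][lowered[i]] += 1
        if i < covered_until:
            continue
        # Greedy longest match (an assumed rule, not necessarily T3's).
        for n in range(max_len, 1, -1):
            cand = tuple(lowered[i:i + n])
            if cand in mw:
                vocab["multiword"][" ".join(cand)] += 1
                covered_until = i + len(cand)
                break
        else:
            vocab["multiword"][lowered[i]] += 1
    return vocab

levels = parse(
    "Big Data in the social sciences: big data and Data.",
    multiwords=["big data", "social sciences"],
)
```

Because the merge is recorded only in the `multiword` level, the simple-form counts in the other two levels stay available for comparison, which is the sense in which the transformation is "virtual" in this sketch.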