ESPLORES TEXT PROCESSING is a comprehensive module with advanced features for tokenization, stemming, and stopword filtering.
Text mining concerns itself with discovering structure and patterns in unstructured data. There are many different approaches to this task, some focused on taxonomies and ontologies, some focused on semantics and natural language processing, while others use various algorithms to categorize and summarize.
Unstructured data are estimated to make up over 80% of enterprise data. Under the GDPR, it would be a costly mistake to hold data that has been forgotten, ignored, and kept in an unstructured state. If a customer decides to exercise their right to be forgotten, honoring the request won’t even be technically possible with traditional analytics tools that base their approach only on structured data.
ESPLORES TEXT PROCESSING eliminates this risk by giving the user easy access to the contents of the unstructured data scattered across the various corporate network folders, and by helping the user understand the text content of documents (contracts, deeds, resolutions, determinations, sentences, etc.).
- Stemming: an algorithm that reduces each word to its root (e.g., the variants Computer, Computing, Compute, Computed, and Computable can all be traced back to the root Comput).
- Tokenization: the ability to split a stream of text into lexemes and categorize them as tokens.
- Stopword removal: an algorithm that identifies words which, because of their high frequency in a language, are usually considered not very significant (e.g., to, from, on, ...).
- Language detection: a Natural Language Processing (NLP) task. With this algorithm, Esplores can identify the language in which a text is written.
- Summarization: part of the machine learning and data mining capabilities of Esplores; it creates a summary of the text contained in the original document.
- Sentence detection: identifies the sentences in a text, where a sentence is a group of words that makes complete sense.
- Keyword extraction: automatically identifies the terms that best describe the subject of a document.
- Ranking: shows which terms recur most often and in which documents they appear, so that you can build search keys suited to your needs.
- Advanced Analytics (only if ESPLORES CORE is installed).
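To make the first few features in the list more concrete, here is a minimal Python sketch of tokenization, stopword removal, stemming, and term ranking chained together. It is not the Esplores implementation or API. The function names, the tiny stopword list, and the naive suffix-stripping stemmer are all assumptions chosen for illustration, and a production stemmer (for example a Porter-style algorithm) is considerably more careful.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not the product's list).
STOPWORDS = {"a", "an", "and", "from", "in", "is", "of", "on", "the", "to"}

# Suffixes removed by the naive stemmer, checked in this order.
SUFFIXES = ("able", "ing", "ed", "er", "e")


def tokenize(text):
    """Split text into lowercase alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stopwords(tokens):
    """Drop tokens that appear in the stopword list."""
    return [t for t in tokens if t not in STOPWORDS]


def naive_stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def rank_terms(text, top_n=3):
    """Return the top_n most frequent stems as (stem, count) pairs."""
    stems = [naive_stem(t) for t in remove_stopwords(tokenize(text))]
    return Counter(stems).most_common(top_n)
```

With these hypothetical helpers, the five variants from the stemming example (computer, computing, compute, computed, computable) all reduce to comput, and `rank_terms` counts them as a single term when ranking.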