ReaderBench - Semantic Models and Topic Mining
- Creator: UPB
- Publisher: Rage project
- Owner: Dascalu Mihai
Extracts the keywords of a text together with their relevance scores and semantic links between them.
Short non-technical description:
Extracts keywords and topics of a text, together with the corresponding relevance scores and semantic links between them.
This component represents a core constituent within all ReaderBench modules in terms of discourse analysis and text mining.
Given an input text, this component returns the list of concepts, their relevance and the links between them.
The component is available in the following languages: English and French. Dutch and Romanian languages will be available soon.
Technical description:
ReaderBench introduced a generalized model for assessment based on the cohesion graph, applicable to both plain essay- or story-like texts and CSCL conversations, in particular chats, forum discussion threads or blog communities.
Text cohesion, viewed as lexical, grammatical and semantic relationships that link together textual units, is defined within our implemented model in terms of semantic similarity measured through semantic distances in: lexicalized ontologies (e.g. WordNet), Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA).
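As an illustration of a vector-based semantic similarity, the sketch below computes the cosine similarity between two concept vectors, the measure commonly used with LSA spaces. It is a minimal example under stated assumptions, not the ReaderBench implementation: the class name VectorSimilarity and the method cosine are hypothetical, and the description above does not specify which distance function ReaderBench applies to each model.

```java
// Hypothetical helper; the class and method names are illustrative and are
// not taken from the ReaderBench code base.
final class VectorSimilarity {

    private VectorSimilarity() {
    }

    // Cosine similarity between two vectors of equal length.
    // Returns 0.0 when either vector has zero norm (no information to compare).
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same length");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

Two textual units whose vectors point in the same direction score close to 1, while units with no shared semantic dimensions score 0.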
Additionally, specific natural language processing techniques are applied to reduce noise and to improve the system’s accuracy: tokenizing, splitting, part-of-speech tagging, parsing, stop-word elimination, dictionary-only word selection, stemming, lemmatizing, named entity recognition and co-reference resolution.
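Two of these noise-reduction steps, tokenizing and stop-word elimination, can be sketched in plain Java as follows. This is a simplified illustration only: the class SimplePreprocessor, its method contentWords and the tiny stop-word list are assumptions made for the example, and the remaining steps (tagging, parsing, lemmatizing, named entity recognition, co-reference resolution) require a full NLP toolkit and are not shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical preprocessor; names and the stop-word list are illustrative
// and are not taken from the ReaderBench code base.
final class SimplePreprocessor {

    // A deliberately tiny English stop-word list, for demonstration only.
    private static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
            "a", "an", "the", "of", "and", "to", "in", "is", "are"));

    private SimplePreprocessor() {
    }

    // Lower-cases the text, splits it on any run of non-letter characters,
    // and drops empty tokens and stop words.
    static List<String> contentWords(String text) {
        List<String> result = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                result.add(token);
            }
        }
        return result;
    }
}
```

For the input "The cohesion of the text is measured.", contentWords returns the tokens cohesion, text and measured.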
Moreover, we have developed a topic mining module that integrates the previously defined semantic models (available for English, French and Italian).
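The output of topic mining (concepts, their relevance scores and the semantic links between them) could be represented by a structure along the following lines. This is a sketch for orientation only: TopicMiningResult, Keyword, Link and their fields are hypothetical names chosen for this example and do not reflect the actual ReaderBench classes or API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical result model; all class, field and method names are
// illustrative and are not taken from the ReaderBench code base.
final class TopicMiningResult {

    // A concept extracted from the text, with its relevance score.
    static final class Keyword {
        final String lemma;
        final double relevance;

        Keyword(String lemma, double relevance) {
            this.lemma = lemma;
            this.relevance = relevance;
        }
    }

    // A semantic link between two concepts, weighted by their similarity.
    static final class Link {
        final String source;
        final String target;
        final double similarity;

        Link(String source, String target, double similarity) {
            this.source = source;
            this.target = target;
            this.similarity = similarity;
        }
    }

    final List<Keyword> keywords = new ArrayList<>();
    final List<Link> links = new ArrayList<>();

    // Returns at most n keywords, ordered from most to least relevant.
    List<Keyword> topKeywords(int n) {
        List<Keyword> sorted = new ArrayList<>(keywords);
        sorted.sort(Comparator.comparingDouble((Keyword k) -> k.relevance).reversed());
        int limit = Math.max(0, Math.min(n, sorted.size()));
        return new ArrayList<>(sorted.subList(0, limit));
    }

    // Returns the links whose similarity is at least the given threshold.
    List<Link> linksAtLeast(double threshold) {
        List<Link> selected = new ArrayList<>();
        for (Link link : links) {
            if (link.similarity >= threshold) {
                selected.add(link);
            }
        }
        return selected;
    }
}
```

A client would typically read the highest-ranked keywords with topKeywords and keep only the strongest links, via linksAtLeast, when drawing a concept map.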
Support levels: The component is available "as is", without warranties or conditions of any kind. Reported bugs will be fixed, and support will continue for new versions of the OS and game engines. New features will be added according to the developer's roadmap, and can also be added upon request (requires a service contract).
Detailed description:
The ReaderBench framework can either be cloned from our GitLab repository or used as a deployment library.
The repository contains three projects:
- The ReaderBench Core
- The ReaderBench Desktop Client
- The ReaderBench API
The ReaderBench Core exposes the Natural Language Processing functionalities and operations performed by ReaderBench. You may either clone this project and explore its contents, or use it as a Maven dependency resolved from our Artifactory server.
The ReaderBench Desktop Client can be used to test ReaderBench functionalities through a Java Swing interface. This project uses the ReaderBench Core, so you may use it as a guide to integrating ReaderBench into your projects.
The ReaderBench API can be used to explore how the ReaderBench Application Programming Interface works. Like the ReaderBench Desktop Client, it also shows how to integrate the ReaderBench Core into a project.
Language: English, French
Access URL: https://git.readerbench.com/ReaderBench/ReaderBench.git
Download: ReaderBench-Semantic-Models-and-Topic-Mining.zip
keywords extraction
topic mining
semantic models
topics
Source code:
Documentation:
- http://readerbench.com/docs/semantic-models/manual
- https://git.readerbench.com/ReaderBench/ReaderBench/blob/v3.0/README.md
- https://git.readerbench.com/ReaderBench/ReaderBench/wikis/home
- http://readerbench.com/docs/api
Setup files:
- http://readerbench.com/docs/semantic-models/sdd
- https://git.readerbench.com/ReaderBench/ReaderBench/wikis/how-to/how-to-install-and-run-readerbench
- https://owncloud.readerbench.com/index.php/s/w33mnCcpH1Bp1zs/download?path=/&files=README.txt
Test:
Game development environment: Other
Target platform: Other
Programming language: Java
Version: 3.0
Version notes: Stable version after major project split.
Development status: Completed
Commit URL: https://git.readerbench.com/ReaderBench/ReaderBench/tags/v2.3.5
Type: Apache 2.0 (Apache License 2.0)