To build an individual educational trajectory, a learner needs a specific goal. EducationalGoalsClassifier is a classification model that estimates how specific a user's educational goals are in a distance education system. The algorithm is based on topic modeling with additive regularization. The model explicitly assumes that texts, words and topics each split into specific and non-specific groups.
It can also produce "topics - specificity" and "tokens - specificity" diagrams, as well as lists of the most characteristic tokens for each topic.
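As a rough illustration of what a list of the most characteristic tokens for a topic could look like, here is a minimal sketch. It assumes that "most characteristic" means the highest p(token | topic) in the Φ matrix, which is a common convention but is not confirmed by this repository. The function `top_tokens`, the dictionary layout, and all values in `toy_phi` are hypothetical and made up for the example.

```python
def top_tokens(phi, topic, n=3):
    """Return the n tokens with the highest p(token | topic).

    phi maps token -> {topic name: p(token | topic)}.
    This layout is an assumption made for the sketch.
    """
    scored = [(probs.get(topic, 0.0), token) for token, probs in phi.items()]
    # Sort by probability (descending), breaking ties alphabetically.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [token for _, token in scored[:n]]


# Toy values, invented for illustration only (not taken from the real model).
toy_phi = {
    "python": {"topic_0": 0.05, "topic_3": 0.60},
    "sql": {"topic_0": 0.05, "topic_3": 0.30},
    "learn": {"topic_0": 0.50, "topic_3": 0.05},
    "something": {"topic_0": 0.40, "topic_3": 0.05},
}

for topic in ("topic_0", "topic_3"):
    print(topic, top_tokens(toy_phi, topic, n=2))
```

In this toy matrix the non-specific topic is dominated by generic words and the specific topic by concrete skills, which is the kind of contrast the lists are meant to expose.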
The degree of specificity was calculated for all documents (educational goals). The degree of specificity for a document is defined as the likelihood that it relates to any of the specific topics. The figure shows the specificity of all documents.
Specific topics: topic_3, topic_4, topic_5, topic_6, topic_7, topic_8.
Non-specific topics: topic_0, topic_1, topic_2.
Documents related to specific topics do have high specificity, while documents related to non-specific topics have low specificity.
Use the Jupyter notebooks in the following order:
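The definition above can be sketched in a few lines. This is a minimal illustration, assuming that "the likelihood that a document relates to any of the specific topics" is the sum of p(topic | document) over the specific topics; the notebooks may compute it differently. The function `document_specificity` and the two example documents are hypothetical, with invented probabilities.

```python
# Topic split as listed in the text above.
SPECIFIC_TOPICS = [f"topic_{i}" for i in range(3, 9)]      # topic_3 .. topic_8
NON_SPECIFIC_TOPICS = [f"topic_{i}" for i in range(0, 3)]  # topic_0 .. topic_2


def document_specificity(topic_probs):
    """Return the share of probability mass that falls on specific topics.

    topic_probs maps topic name -> p(topic | document) for one document.
    Summing over specific topics is an assumption made for this sketch.
    """
    total = sum(topic_probs.values())
    if total <= 0:
        raise ValueError("topic distribution must have positive total mass")
    specific = sum(topic_probs.get(t, 0.0) for t in SPECIFIC_TOPICS)
    # Normalize in case the distribution does not sum exactly to 1.
    return specific / total


# Invented distributions for two hypothetical documents.
doc_a = {"topic_0": 0.05, "topic_1": 0.05, "topic_4": 0.60, "topic_7": 0.30}
doc_b = {"topic_0": 0.50, "topic_1": 0.30, "topic_2": 0.10, "topic_5": 0.10}

print(round(document_specificity(doc_a), 3))  # mostly specific topics
print(round(document_specificity(doc_b), 3))  # mostly non-specific topics
```

Under this reading, a document whose mass sits mostly on topic_3 through topic_8 scores close to 1, and one dominated by topic_0 through topic_2 scores close to 0.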
- Знакомство с данными.ipynb - Getting to know the data
- Предобработка и анализ исходных данных.ipynb - Preprocessing and analysis of initial data
- Baseline. ARTM и логистическая регрессия.ipynb - Baseline. Additive regularization topic model and logistic regression
- Main Decision
  - Main Decision. Ф00 по D0. Подбор параметров.ipynb - Main Decision. Φ_00 (a block of the Φ matrix) is built from non-specific documents. Parameter selection
  - Main Decision. Ф00 случайно. Подбор параметров.ipynb - Main Decision. Φ_00 (a block of the Φ matrix) is built randomly. Parameter selection