Warning! This project moved to aclai-lab/ModalDecisionTrees.jl.
This package provides algorithms for learning decision trees and decision forests with enhanced abilities. Leveraging the expressive power of Modal Logic, these models can extract temporal/spatial patterns, and can natively handle time series and images (without any data preprocessing). The package is currently available via MLJ.jl.
Features & differences with DecisionTree.jl:
- Ability to handle attributes that are `AbstractVector{<:Real}` or `AbstractMatrix{<:Real}`;
- Supports multimodal learning (i.e., learning from combinations of scalars, time series and images);
- Fully optimized implementation (fancy data structures, multithreading, memoization, minification, Pareto-based pruning optimizations, etc.);
- A unique algorithm that extends CART and C4.5;
- Slightly different set of hyperparameters (e.g., no `min_samples_split`, `post_prune` or `merge_purity_threshold`).
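To make the attribute types above concrete, here is a minimal sketch of a multimodal dataset laid out as a Tables.jl-style column table (a `NamedTuple` of equal-length vectors), where each instance has a scalar, a time series (`AbstractVector{<:Real}`) and an image (`AbstractMatrix{<:Real}`). The column names, sizes and labels are hypothetical, and whether `ModalDecisionTree` accepts exactly this layout is an assumption to check against the package documentation.

```julia
using Random

Random.seed!(1)
n = 20   # number of instances (hypothetical)

# Column table: a NamedTuple of equal-length vectors, one entry per instance.
# Column names, sizes and labels are made up for illustration only.
X = (
    age  = rand(n) .* 80,                 # scalar attribute (Real)
    ecg  = [rand(100) for _ in 1:n],      # time series: AbstractVector{<:Real}
    xray = [rand(32, 32) for _ in 1:n],   # image: AbstractMatrix{<:Real}
)
y = rand(["healthy", "sick"], n)          # class labels
```

Following the usage example in this README, such a table would then be passed as `machine(ModalDecisionTree(), X, y)`; with MLJ, string labels usually need to be coerced first (e.g., `y = coerce(y, Multiclass)`), which is also an assumption here.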
Current limitations (also see TODOs):
- Only supports numeric features;
- Only supports classification tasks;
- Does not support `missing` or `NaN` values.
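Since `missing` and `NaN` values are not supported, instances containing them have to be dropped (or imputed) before fitting. The following is a minimal sketch, with hypothetical helper names `has_invalid` and `valid_rows` that are not part of the package, which finds the instances of a column table whose scalar, vector or matrix attributes are all free of `missing`/`NaN`.

```julia
# Hypothetical helpers (not part of ModalDecisionTrees.jl).
# `has_invalid` is true if a value (scalar, time series or image)
# is or contains `missing` or `NaN`.
has_invalid(::Missing) = true
has_invalid(v::Real) = isnan(v)
has_invalid(v::AbstractArray) = any(has_invalid, v)

# Indices of the instances of a column table (a NamedTuple of equal-length
# vectors) whose attributes are all free of `missing` and `NaN`.
valid_rows(X::NamedTuple) =
    [i for i in eachindex(first(X)) if !any(col -> has_invalid(col[i]), X)]

X = (age   = [30.0, NaN, 45.0, 50.0],
     heart = [[1.0, 2.0], [3.0, 4.0], [missing, 1.0], [0.5, 0.7]])
keep = valid_rows(X)                 # [1, 4]
X_clean = map(col -> col[keep], X)   # same columns, invalid instances dropped
```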
Check out the 8-minute lightning talk at JuliaCon 2022!
```julia
# Install packages
using Pkg; Pkg.add(url="https://github.com/giopaglia/ModalDecisionTrees.jl")
Pkg.add("MLJ")

# Import packages
using MLJ
using ModalDecisionTrees
using Random

# Load an example dataset (a temporal one)
X, y = @load_japanesevowels
N = length(y)

# Instantiate an MLJ machine based on a Modal Decision Tree with ≥ 4 samples per leaf
mach = machine(ModalDecisionTree(min_samples_leaf=4), X, y)

# Split dataset (80% train, 20% test); seed the RNG for reproducibility
Random.seed!(1)
p = randperm(N)
n_train = round(Int, 0.8 * N)
train_idxs, test_idxs = p[1:n_train], p[n_train+1:end]

# Fit
fit!(mach, rows=train_idxs)

# Perform predictions, compute accuracy
# (stored in `acc` so as not to shadow MLJ's exported `accuracy` measure)
yhat = predict(mach, X[test_idxs,:])
acc = sum(yhat .== y[test_idxs]) / length(yhat)

# Print model
report(mach).print_model(3)

# Access raw model
model = fitted_params(mach).model
```
TODOs:
- Enable loss functions different from Shannon's entropy (untested)
- Enable regression (untested)
- Proper test suite
- Visualizations of modal rules/patterns
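For reference, the TODO above implies that the split criterion currently used is Shannon's entropy, H = -Σ p_k log2(p_k) over the class proportions p_k of a node. Below is a minimal sketch of it next to the Gini impurity, G = 1 - Σ p_k², a common alternative used by CART. The function names are hypothetical and this is not the package's internal implementation; entropy is computed in bits here, and the log base only rescales the value.

```julia
# Count occurrences of each class label.
function class_counts(labels)
    counts = Dict{eltype(labels),Int}()
    for l in labels
        counts[l] = get(counts, l, 0) + 1
    end
    return counts
end

# Shannon entropy in bits: H = -Σ p_k log2(p_k). Assumes a non-empty vector.
function shannon_entropy(labels)
    n = length(labels)
    return -sum(c / n * log2(c / n) for c in values(class_counts(labels)))
end

# Gini impurity: G = 1 - Σ p_k². Assumes a non-empty vector.
function gini_impurity(labels)
    n = length(labels)
    return 1 - sum((c / n)^2 for c in values(class_counts(labels)))
end

labels = ["a", "a", "b", "b"]
shannon_entropy(labels)  # 1.0
gini_impurity(labels)    # 0.5
```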
Most work in symbolic learning is based either on Propositional Logics (PLs) or First-order Logics (FOLs). PLs are the simplest kind of logic and can only handle tabular data, while FOLs can express complex entity-relation concepts. Machine learning with FOLs enables handling data with complex topologies, such as time series, images, or videos; however, these logics are computationally challenging. Modal Logics (e.g., Interval Logic), instead, offer a favorable trade-off between computational tractability and expressive power, and naturally lend themselves to expressing some forms of temporal/spatial reasoning.
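To illustrate the kind of temporal reasoning described above, the sketch below evaluates the interval-logic formula ⟨L⟩(min ≥ t) on a time series, read as "there exists an interval later than the current one on which the series never drops below t". As simplifying assumptions, intervals are inclusive index ranges `[x, y]` with `x ≤ y`, and "later" means the second interval starts strictly after the first one ends (Allen's "before" relation, as used by the ⟨L⟩ modality of Halpern and Shoham's interval logic). All names are hypothetical; this shows the technique, not the package's internals.

```julia
# A "world": an interval of a time series, as an inclusive index range [x, y].
struct Interval
    x::Int
    y::Int
end

# All intervals over a series of length n (assumption: single points count).
all_intervals(n::Int) = (Interval(x, y) for x in 1:n for y in x:n)

# "Later" relation: w2 starts strictly after w1 ends.
is_later(w1::Interval, w2::Interval) = w2.x > w1.y

# Propositional condition on a single interval: min(series on w) ≥ t.
min_geq(series::AbstractVector{<:Real}, w::Interval, t::Real) =
    minimum(@view series[w.x:w.y]) >= t

# Modal condition ⟨L⟩(min ≥ t) at world w: does some later interval satisfy it?
function diamond_later(series::AbstractVector{<:Real}, w::Interval, t::Real)
    return any(v -> is_later(w, v) && min_geq(series, v, t),
               all_intervals(length(series)))
end

series = [1.0, 2.0, 5.0, 6.0, 3.0]
w = Interval(1, 2)
diamond_later(series, w, 5.0)   # true: [3, 4] is later than [1, 2] and its min is 5.0
diamond_later(series, w, 7.0)   # false: no later interval has min ≥ 7.0
```

A propositional tree could only test fixed, precomputed features of the whole series; the modal condition instead quantifies over intervals related to the current one, which is what lets a tree capture where in time a pattern occurs.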
Recently, symbolic learning techniques such as Decision Trees, Random Forests and Rule-Based models have been extended to Modal Logics of time and space. Modal Decision Trees and Modal Random Forests have been applied to classification tasks, showing statistical performance that is often comparable to that of functional methods (e.g., neural networks), while providing, at the same time, highly interpretable classification models. Examples of these tasks are COVID-19 diagnosis from cough/breath audio [1], [2], land cover classification from aerial images [3], EEG-related tasks [4], and gas turbine trip prediction. This technology also offers a natural extension to multimodal learning [5].
The package is developed by Giovanni Pagliarini (@giopaglia) and Federico Manzella (@ferdiu).
Thanks to ACLAI Lab @ University of Ferrara.
Thanks to Ben Sadeghi (@bensadeghi), original author of DecisionTree.jl.