Releases: lanl/T-ELF
v0.0.11
- Adds acronym identification and acronym substitution to Vulture.
- Fixes the dependency list in .yml files for installation.
v0.0.10
Adds a new text mining tool named Cheetah for fast search by keywords and phrases.
v0.0.9
- Fixed bug where masking in NMFk was not passed properly to the NMF optimization.
- Fixed bug where Vulture did not check cleaning steps in dataframe cleaning.
- Adds support for operator-based modules in Vulture. Vulture now supports cleaning and operator modules.
- Adds the NER operator module to Vulture.
v0.0.8
- Fixes a bug where consensus matrix would not fit in MPI communication for large matrices if multi-node factorization is performed.
- Fixes a bug where consensus matrix calculation would not do unpruning for matrices that are pruned.
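To illustrate what pruning and unpruning mean here, the following is a minimal sketch that drops all-zero rows and columns from a matrix, then pads a factor computed on the pruned matrix back to the original row count. The helper names prune and unprune_rows are hypothetical and are not T-ELF's API; the library's own implementation may differ.

```python
import numpy as np

def prune(X):
    """Drop all-zero rows and columns; return the pruned matrix and the kept indices."""
    rows = np.flatnonzero(X.any(axis=1))
    cols = np.flatnonzero(X.any(axis=0))
    return X[np.ix_(rows, cols)], rows, cols

def unprune_rows(W, rows, n_rows):
    """Re-insert zero rows so W lines up with the original, unpruned row order."""
    full = np.zeros((n_rows, W.shape[1]), dtype=W.dtype)
    full[rows] = W
    return full

# Hypothetical example: row 1 and column 1 are entirely zero.
X = np.array([[1.0, 0.0, 2.0],
              [0.0, 0.0, 0.0],
              [3.0, 0.0, 4.0]])
X_pruned, rows, cols = prune(X)
W = np.ones((X_pruned.shape[0], 2))  # stand-in for a factor computed on X_pruned
W_full = unprune_rows(W, rows, X.shape[0])
```

Any quantity computed from pruned data has to be mapped back through the kept indices like this before it is compared with, or combined with, data in the original shape.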
v0.0.7
Hot fix for a bug where WNMFk was not updating H latent factor with non-negativity constraint.
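For context, one common way to keep both factors non-negative in weighted NMF is to use multiplicative updates. The sketch below minimizes the masked squared error between X and W @ H, where the mask M marks observed entries. This is a textbook formulation offered as an assumption for illustration; it is not taken from T-ELF's WNMFk source, and the function name wnmf and its parameters are hypothetical.

```python
import numpy as np

def wnmf(X, M, k, n_iter=200, eps=1e-9, seed=0):
    """Weighted NMF via multiplicative updates (illustrative sketch only).

    X: non-negative data matrix, M: weights/mask with the same shape as X.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, k))
    H = rng.random((k, m))
    MX = M * X
    for _ in range(n_iter):
        # Each update multiplies by a ratio of non-negative terms,
        # so W and H stay non-negative when initialized non-negative.
        H *= (W.T @ MX) / (W.T @ (M * (W @ H)) + eps)
        W *= (MX @ H.T) / ((M * (W @ H)) @ H.T + eps)
    return W, H

# Hypothetical example: a rank-2 matrix with one entry treated as unobserved.
rng = np.random.default_rng(1)
X = rng.random((6, 2)) @ rng.random((2, 5))
M = np.ones_like(X)
M[0, 0] = 0.0
W, H = wnmf(X, M, k=2)
```

Skipping or mis-applying the H update in a scheme like this leaves H at values that no longer respect the constraint or the objective, which is the kind of failure the hot fix addresses.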
v0.0.6
- Fixes a bug in WNMFk that would result in issues in using GPUs
- Several Vulture bug fixes:
  - fixed a bug where case sensitivity would affect stopwords removal
  - fixed a bug where setting the SubstitutionCleaner.lower attribute to True would convert the entire document to lowercase instead of only ignoring case in substitution matching
  - fixed a bug where the input substitutions dictionary would be modified by reference in SubstitutionCleaner
  - fixed a bug where empty strings would be output by Vulture.dataframe_clean() for invalid input documents such as non-English text. These values are now set to np.nan in the output DataFrame
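The intended behavior described in these fixes can be sketched with the standard library. The functions substitute and remove_stopwords below are hypothetical stand-ins, not Vulture's SubstitutionCleaner or its stopword cleaner; they only show case-insensitive matching that leaves the rest of the document's case alone, and a substitutions dictionary that is not modified.

```python
import re

def substitute(text, substitutions, ignore_case=True):
    """Replace whole-word matches of each key with its value.

    Case is ignored only while matching; unmatched text keeps its case.
    """
    subs = dict(substitutions)  # work on a copy; the caller's dict is left untouched
    flags = re.IGNORECASE if ignore_case else 0
    # Longest keys first so longer phrases are replaced before their substrings.
    for key in sorted(subs, key=len, reverse=True):
        pattern = r"\b" + re.escape(key) + r"\b"
        # A function replacement avoids backslash handling in the value.
        text = re.sub(pattern, lambda m, v=subs[key]: v, text, flags=flags)
    return text

def remove_stopwords(text, stopwords):
    """Drop tokens that match a stopword regardless of case."""
    stop = {w.lower() for w in stopwords}
    return " ".join(t for t in text.split() if t.lower() not in stop)

subs = {"nmf": "non-negative matrix factorization"}
cleaned = substitute("The NMF Method Works", subs)
print(cleaned)  # The non-negative matrix factorization Method Works
print(remove_stopwords(cleaned, ["the"]))
```

Note that "Method" and "Works" keep their capitalization: only the comparison is case-insensitive, not the output.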
v0.0.5
New Features
- Adds ability to run TriNMFk without having to run NMFk first.
- Adds WNMFk for recommendation systems.
Bugs
- Fixes bug with n_jobs when using perturb multi-processing.
v0.0.4
Merge pull request #69 from lanl/develop rebase to main. version change
v0.0.3
Fixes bug with RESCALk.
v0.0.2
Tensor Extraction of Latent Features (T-ELF)
T-ELF is one of the machine learning software packages developed as part of the R&D 100 winning SmartTensors AI project at Los Alamos National Laboratory (LANL). T-ELF presents an array of customizable software solutions crafted for analysis of datasets. Acting as a comprehensive toolbox, T-ELF specializes in data pre-processing, extraction of latent features, and structuring results to facilitate informed decision-making. Leveraging high-performance computing and cutting-edge GPU architectures, our toolbox is optimized for analyzing large datasets from a diverse set of problems.
At the core of T-ELF are non-negative matrix and tensor factorization solutions for discovering multi-faceted hidden details in data, featuring automated model determination that facilitates estimating the number of latent factors, or rank. This functionality supports precise data modeling and the extraction of concealed patterns. Additionally, our software suite incorporates modules for both pre-processing and post-processing of data, tailored for diverse tasks including text mining and Natural Language Processing, along with robust tools for matrix and tensor analysis and construction.
T-ELF's adaptability spans across a multitude of disciplines, positioning it as a robust AI and data analytics solution. Its proven efficacy extends across various fields such as Large-scale Text Mining, High Performance Computing, Computer Security, Applied Mathematics, Dynamic Networks and Ranking, Biology, Material Science, Medicine, Chemistry, Data Compression, Climate Studies, Relational Databases, Data Privacy, Economy, and Agriculture.
Installation
Step 1: Install the Library
Option 1: Install via PIP
conda create --name TELF python=3.11.5
source activate TELF # or <conda activate TELF>
pip install git+https://github.com/lanl/T-ELF.git
Option 2: Install from Source
git clone https://github.com/lanl/T-ELF.git
cd T-ELF
conda create --name TELF python=3.11.5
source activate TELF # or <conda activate TELF>
pip install -e . # or <python setup.py install>
Option 3: Install via Conda
git clone https://github.com/lanl/T-ELF.git
cd T-ELF
conda env create --file environment_gpu.yml # use <conda env create --file environment_cpu.yml> for CPU only
conda activate TELF_conda
conda develop .
Step 2: Install Spacy NLP model and NLTK Packages
python -m spacy download en_core_web_lg
python -m nltk.downloader wordnet omw-1.4
Step 3: Install Cupy if using GPU (Optional - skip if you used Option 3 in Step 1)
conda install -c conda-forge cupy
Step 4: Install MPI if using HPC (Optional)
module load <openmpi> # On an HPC node
pip install mpi4py # or <conda install -c conda-forge mpi4py> depending on the system
Jupyter Setup Tutorial for using the examples (Link)
Other Considerations
On some Linux devices, depending on how CUDA was configured, you may get an error when using a GPU. Install cudatoolkit to resolve the error:
conda install cudatoolkit
conda install cudnn
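After working through the steps above, a quick way to see which of the mentioned packages are visible in the active environment is a short check like the one below. This is a minimal sketch using only the Python standard library; the helper name check_modules is hypothetical, and the list contains only the import names of packages named in the installation steps.

```python
from importlib.util import find_spec

def check_modules(names):
    """Map each top-level module name to True if it can be located in this environment."""
    return {name: find_spec(name) is not None for name in names}

# Import names for the packages mentioned in the installation steps.
status = check_modules(["spacy", "nltk", "cupy", "mpi4py"])
for name, found in status.items():
    print(f"{name}: {'found' if found else 'missing'}")
```

This only confirms that a package can be located, not that it works. For example, cupy may be found and still fail at import time if CUDA is misconfigured, which is the situation the cudatoolkit note above addresses.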
Capabilities
Please see our 📃 Publications for the capabilities
Modules
TELF.factorization
Method | Dense | Sparse | GPU | CPU | Multiprocessing | HPC | Description | Example | Release Status |
---|---|---|---|---|---|---|---|---|---|
NMFk | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | NMF with Automatic Model Determination | Link | ✅ |
Custom NMFk | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | Use Custom NMF Functions with NMFk | Link | ✅ |
TriNMFk | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | NMF with Automatic Model Determination for Clusters and Patterns | Link | ✅ | |
RESCALk | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | RESCAL with Automatic Model Determination | Link | ✅ |
RNMFk | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | Recommender NMFk | Link | ✅ |
SymNMFk | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | NMFk with Symmetric Clustering | Link | ✅ | |
BNMFk | | | | | | | Boolean NMFk | | 🔜 |
HNMFk | | | | | | | Hierarchical NMFk | | 🔜 |
SPLIT NMFk | | | | | | | Joint NMFk factorization of multiple data via SPLIT | | 🔜 |
SPLIT Transfer Classifier | | | | | | | Supervised transfer learning method via SPLIT and NMFk | | 🔜 |
CP-ALS | | | | | | | Alternating least squares algorithm for canonical polyadic decomposition | | 🔜 |
CP-APR | | | | | | | Alternating Poisson regression algorithm for canonical polyadic decomposition | | 🔜 |
NTDS_FAPG | | | | | | | Non-negative Tucker Tensor Decomposition | | 🔜 |
TELF.pre_processing
Method | Multiprocessing | HPC | Description | Example | Release Status |
---|---|---|---|---|---|
Vulture | ✔️ | ✔️ | Advanced text processing tool for cleaning and NLP | Link | ✅ |
Beaver | ✔️ | ✔️ | Fast matrix and tensor building tool for text mining | Link | ✅ |
iPenguin | | | Online Semantic Scholar information retrieval tool | | 🔜 |
Orca | | | Duplicate author detector for text mining and information retrieval | | 🔜 |
TELF.post_processing
| Method | ...