Releases: lanl/T-ELF

v0.0.11

26 Mar 16:20
a7e1036
  • Adds acronym identification and acronym substitution capabilities to Vulture.
  • Fixes the dependency lists in the .yml files used for installation.
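
As an illustration of what acronym identification and substitution can look like, here is a minimal sketch. It is not Vulture's implementation or API: the function names `find_acronyms` and `substitute_acronyms` are hypothetical, and the sketch assumes acronyms are introduced in the form "Long Form (LF)", with one capital letter per word.

```python
import re

def find_acronyms(text):
    """Return {acronym: long form} for 'Long Form (LF)' patterns in text."""
    found = {}
    for match in re.finditer(r"\(([A-Z]{2,})\)", text):
        acronym = match.group(1)
        # Take as many preceding words as the acronym has letters.
        words = text[:match.start()].split()
        candidate = words[-len(acronym):]
        # Accept only if each word's initial matches the acronym letter.
        if len(candidate) == len(acronym) and all(
            word[0].upper() == letter for word, letter in zip(candidate, acronym)
        ):
            found[acronym] = " ".join(candidate)
    return found

def substitute_acronyms(text, acronyms):
    """Replace each long form (with or without its parenthesized acronym) by the acronym."""
    for acronym, long_form in acronyms.items():
        text = text.replace(f"{long_form} ({acronym})", acronym)
        text = text.replace(long_form, acronym)
    return text

text = "We apply Natural Language Processing (NLP) here. Natural Language Processing is useful."
acronyms = find_acronyms(text)
print(acronyms)
print(substitute_acronyms(text, acronyms))
```

The direction of the substitution (long form to acronym) is an assumption made for the sketch; the release note does not say which direction Vulture uses.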

v0.0.10

13 Mar 23:40
18eebe8

Adds a new text mining tool named Cheetah for fast search by keywords and phrases.
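
Fast keyword and phrase search is commonly built on an inverted index. The sketch below shows that general technique only; it is not Cheetah's implementation or API, and the names `build_index` and `search` are hypothetical. It assumes whitespace tokenization and ignores punctuation handling.

```python
from collections import defaultdict

def build_index(documents):
    """Map each lowercase token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(documents):
        for token in text.lower().split():
            index[token].add(doc_id)
    return index

def search(index, documents, query):
    """Return sorted ids of documents containing the query keyword or phrase."""
    tokens = query.lower().split()
    if not tokens:
        return []
    # Candidate documents must contain every token of the query.
    candidates = set.intersection(*(index.get(token, set()) for token in tokens))
    # For phrases, confirm the tokens appear consecutively as whole words.
    phrase = " " + " ".join(tokens) + " "
    return sorted(
        doc_id for doc_id in candidates
        if phrase in " " + " ".join(documents[doc_id].lower().split()) + " "
    )

documents = [
    "Tensor factorization for text mining",
    "Fast keyword search",
    "Text mining with tensor methods",
]
index = build_index(documents)
print(search(index, documents, "text mining"))
print(search(index, documents, "keyword"))
```

The index narrows the search to a few candidate documents, so the slower phrase check runs only on those.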

v0.0.9

07 Mar 22:18
d12a05b
  • Fixed a bug where the mask in NMFk was not passed properly to the NMF optimization.
  • Fixed a bug where Vulture did not check cleaning steps in dataframe cleaning.
  • Added support for operator-based modules in Vulture. Vulture now supports both cleaning and operator modules.
  • Added an NER operator module to Vulture.

v0.0.8

08 Feb 21:06
3106597
  • Fixes a bug where the consensus matrix would not fit in MPI communication for large matrices when multi-node factorization is performed.
  • Fixes a bug where the consensus matrix calculation would not unprune matrices that had been pruned.

v0.0.7

07 Feb 18:21
493521d

Hotfix for a bug where WNMFk was not updating the H latent factor with the non-negativity constraint.
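
To show what a non-negativity-preserving update of H looks like in a weighted NMF, here is a minimal NumPy sketch of the standard multiplicative update rule. It is not the WNMFk code from T-ELF: the function name `update_h` is hypothetical, and the sketch assumes a 0/1 mask `M` that marks the observed entries of `X`.

```python
import numpy as np

def update_h(X, M, W, H, eps=1e-12):
    """One multiplicative update of H for min ||M * (X - W @ H)||_F^2 with H >= 0.

    X, W, and H are non-negative arrays; M is a 0/1 mask with the shape of X.
    """
    numerator = W.T @ (M * X)
    denominator = W.T @ (M * (W @ H)) + eps  # eps avoids division by zero
    # Every factor is non-negative, so the updated H stays non-negative.
    return H * numerator / denominator

rng = np.random.default_rng(0)
X = rng.random((6, 5))
M = (rng.random((6, 5)) > 0.3).astype(float)  # 1 = observed, 0 = missing
W = rng.random((6, 2))
H = rng.random((2, 5))

H_new = update_h(X, M, W, H)
print("non-negative:", bool((H_new >= 0).all()))
```

Because the update multiplies H by a ratio of non-negative terms, no explicit projection or clipping step is needed to keep H non-negative.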

v0.0.6

29 Jan 19:12
c4f6bd6
  • Fixes a bug in WNMFk that caused issues when using GPUs.
  • Several Vulture bug fixes:
    • Fixed a bug where case sensitivity would affect stopword removal.
    • Fixed a bug where setting the SubstitutionCleaner.lower attribute to True would convert the entire document to lowercase instead of only ignoring case during substitution matching.
    • Fixed a bug where the input substitutions dictionary would be modified by reference in SubstitutionCleaner.
    • Fixed a bug where Vulture.dataframe_clean() would output empty strings (for invalid input documents such as non-English text). These values are now set to np.nan in the output DataFrame.
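
The behaviors described by the Vulture fixes above can be sketched with the standard library alone. This is an illustration and not Vulture's code: the functions `substitute`, `remove_stopwords`, and `clean_documents` are hypothetical, and `math.nan` stands in for `np.nan` to keep the sketch dependency-free.

```python
import math
import re

def substitute(text, substitutions, ignore_case=True):
    """Apply substitutions, ignoring case in matching without lowercasing the text."""
    # Build a new mapping so the caller's dictionary is never modified.
    local = {key.strip(): value for key, value in substitutions.items()}
    flags = re.IGNORECASE if ignore_case else 0
    for old, new in local.items():
        pattern = re.compile(rf"\b{re.escape(old)}\b", flags)
        text = pattern.sub(lambda _match: new, text)
    return text

def remove_stopwords(text, stopwords):
    """Drop stopwords regardless of the case of either the text or the list."""
    stop = {word.lower() for word in stopwords}
    return " ".join(token for token in text.split() if token.lower() not in stop)

def clean_documents(documents, substitutions, stopwords):
    """Clean each document; documents that end up empty become NaN."""
    cleaned = []
    for document in documents:
        result = remove_stopwords(substitute(document, substitutions), stopwords)
        cleaned.append(result if result else math.nan)
    return cleaned

substitutions = {"nmf": "factorization"}
print(clean_documents(["The NMF Method", "the"], substitutions, ["The"]))
print(substitutions)  # the input dictionary is unchanged
```

Marking empty results as NaN rather than as empty strings lets downstream code drop or count failed documents with the usual missing-value tools.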

v0.0.5

10 Jan 21:01
b5c6b3d

New Features

  • Adds ability to run TriNMFk without having to run NMFk first.
  • Adds WNMFk for recommendation systems.

Bugs

  • Fixes bug with n_jobs when using perturb multi-processing.

v0.0.4

04 Dec 21:24
4497bf4
Merge pull request #69 from lanl/develop

rebase to main. version change

v0.0.3

29 Nov 19:27
7bb5821

Fixes bug with RESCALk.

v0.0.2

23 Nov 00:19
4ea56a2

Tensor Extraction of Latent Features (T-ELF)

T-ELF is one of the machine learning software packages developed as part of the R&D 100 winning SmartTensors AI project at Los Alamos National Laboratory (LANL). T-ELF provides a set of customizable software tools for dataset analysis. As a comprehensive toolbox, it covers data pre-processing, extraction of latent features, and structuring of results to support informed decision-making. By leveraging high-performance computing and GPU architectures, the toolbox is optimized for analyzing large datasets from a diverse set of problems.

At the core of T-ELF are non-negative matrix and tensor factorization methods for discovering hidden structure in data. These methods feature automated model determination, which estimates the number of latent factors (the rank). This functionality supports accurate data modeling and the extraction of hidden patterns. The software suite also includes modules for pre-processing and post-processing of data, covering tasks such as text mining and Natural Language Processing, along with tools for matrix and tensor analysis and construction.

T-ELF's adaptability spans across a multitude of disciplines, positioning it as a robust AI and data analytics solution. Its proven efficacy extends across various fields such as Large-scale Text Mining, High Performance Computing, Computer Security, Applied Mathematics, Dynamic Networks and Ranking, Biology, Material Science, Medicine, Chemistry, Data Compression, Climate Studies, Relational Databases, Data Privacy, Economy, and Agriculture.

Installation

Step 1: Install the Library

Option 1: Install via PIP

conda create --name TELF python=3.11.5
source activate TELF # or <conda activate TELF>
pip install git+https://github.com/lanl/T-ELF.git

Option 2: Install from Source

git clone https://github.com/lanl/T-ELF.git
cd T-ELF
conda create --name TELF python=3.11.5
source activate TELF # or <conda activate TELF>
pip install -e . # or <python setup.py install>

Option 3: Install via Conda

git clone https://github.com/lanl/T-ELF.git
cd T-ELF
conda env create --file environment_gpu.yml # use <conda env create --file environment_cpu.yml> for CPU only
conda activate TELF_conda
conda develop .

Step 2: Install the spaCy NLP Model and NLTK Packages

python -m spacy download en_core_web_lg
python -m nltk.downloader wordnet omw-1.4

Step 3: Install CuPy if using a GPU (Optional; skip if you used Option 3 in Step 1)

conda install -c conda-forge cupy

Step 4: Install MPI if using HPC (Optional)

module load <openmpi> # On an HPC node
pip install mpi4py # or <conda install -c conda-forge mpi4py> depending on the system
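
After the steps above, a quick way to see which packages are visible in the active environment is to check whether each one is importable. The script below is a minimal sketch that uses only the Python standard library; the helper name `check_packages` and the step labels are illustrative and not part of T-ELF.

```python
import importlib.util

# Import names of the packages installed in Steps 1 to 4.
PACKAGES = {
    "TELF": "Step 1 (library)",
    "spacy": "Step 2 (NLP model)",
    "nltk": "Step 2 (NLTK packages)",
    "cupy": "Step 3 (GPU support, optional)",
    "mpi4py": "Step 4 (HPC support, optional)",
}

def check_packages(packages=PACKAGES):
    """Return {import name: True/False} depending on whether it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in packages}

if __name__ == "__main__":
    for name, found in check_packages().items():
        status = "found" if found else "missing"
        print(f"{name:<8} {status:<8} {PACKAGES[name]}")
```

This only confirms that a package can be located. It does not verify that the spaCy model or NLTK data were downloaded, or that a GPU or MPI runtime actually works.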

Jupyter Setup Tutorial for using the examples (Link)

Other Considerations

On some Linux devices, depending on how CUDA was configured, you may get an error when using a GPU. Install cudatoolkit and cudnn to resolve the error:

conda install cudatoolkit
conda install cudnn

Capabilities

Please see our 📃 Publications for the capabilities

Modules

TELF.factorization

Method Dense Sparse GPU CPU Multiprocessing HPC Description Example Release Status
NMFk ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ NMF with Automatic Model Determination Link
Custom NMFk ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ Use Custom NMF Functions with NMFk Link
TriNMFk ✔️ ✔️ ✔️ ✔️ ✔️ NMF with Automatic Model Determination for Clusters and Patterns Link
RESCALk ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ RESCAL with Automatic Model Determination Link
RNMFk ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ Recommender NMFk Link
SymNMFk ✔️ ✔️ ✔️ ✔️ ✔️ NMFk with Symmetric Clustering Link
BNMFk Boolean NMFk 🔜
HNMFk Hierarchical NMFk 🔜
SPLIT NMFk Joint NMFk factorization of multiple data via SPLIT 🔜
SPLIT Transfer Classifier Supervised transfer learning method via SPLIT and NMFk 🔜
CP-ALS Alternating least squares algorithm for canonical polyadic decomposition 🔜
CP-APR Alternating Poisson regression algorithm for canonical polyadic decomposition 🔜
NTDS_FAPG Non-negative Tucker Tensor Decomposition 🔜

TELF.pre_processing

| Method | Multiprocessing | HPC | Description | Example | Release Status |
|:---|:---:|:---:|:---|:---:|:---:|
| Vulture | ✔️ | ✔️ | Advanced text processing tool for cleaning and NLP | Link | |
| Beaver | ✔️ | ✔️ | Fast matrix and tensor building tool for text mining | Link | |
| iPenguin | | | Online Semantic Scholar information retrieval tool | | 🔜 |
| Orca | | | Duplicate author detector for text mining and information retrieval | | 🔜 |

TELF.post_processing

| Method | ...
