    Repositories list

    • Jupyter Notebook
      0000 Updated Jan 18, 2025
    • Code for FinerWeb-10BT – tools for cleaning web data line by line using LLMs
      Python
      MIT License
      1000 Updated Jan 16, 2025
    • Code for the large LUMI run of ECCO OCR correction
      Python
      Apache License 2.0
      0000 Updated Jan 16, 2025
    • 0400 Updated Jan 14, 2025
    • Clusters with keywords grouped based on their word embeddings
      0000 Updated Jan 14, 2025
    • Turku NLP list of publications
      TeX
      2000 Updated Jan 14, 2025
    • Jupyter Notebook
      Apache License 2.0
      0300 Updated Jan 13, 2025
    • Python
      1100 Updated Jan 1, 2025
    • A Jekyll version of the "Editorial" theme by HTML5 UP.
      JavaScript
      Other
      150200 Updated Dec 19, 2024
    • Handwritten text recognition pipeline for table data
      Jupyter Notebook
      Apache License 2.0
      0000 Updated Dec 19, 2024
    • Handwritten text recognition annotations
      0000 Updated Dec 16, 2024
    • Code to try out OCR post-correction with language models
      Jupyter Notebook
      0010 Updated Dec 16, 2024
    • Jupyter Notebook
      4400 Updated Dec 9, 2024
    • HTML
      0360 Updated Dec 3, 2024
    • Python
      0000 Updated Nov 28, 2024
    • Different vLLM setups on different machines
      Python
      0000 Updated Nov 15, 2024
    • Materials for the University of Turku course TKO_8965 Deep Learning in Human Language Technology (previously named TKO_2101 Natural Language Processing)
      Jupyter Notebook
      Other
      111900 Updated Oct 15, 2024
    • Functions and code used to estimate the probabilities of OCR errors and to simulate them
      Python
      Apache License 2.0
      0200 Updated Oct 10, 2024
    • Code and data for multilingual situational analysis of web registers using LLMs.
      Apache License 2.0
      0000 Updated Oct 4, 2024
    • TurCORE
      Turkish Corpus of Online REgisters (TurCORE)
      0000 Updated Oct 3, 2024
    • Introduction to Natural Language Processing
      Jupyter Notebook
      Other
      26500 Updated Sep 17, 2024
    • 0000 Updated Sep 4, 2024
    • Jupyter Notebook
      Apache License 2.0
      0100 Updated Aug 8, 2024
    • Jupyter Notebook
      0000 Updated May 28, 2024
    • Python
      2300 Updated May 10, 2024
    • A neural parsing pipeline for segmentation, morphological tagging, dependency parsing, and lemmatization, with pre-trained models for more than 50 languages. A top-ranking system in the CoNLL-18 Shared Task.
      Python
      Apache License 2.0
      31112110 Updated May 7, 2024
    • 0200 Updated May 2, 2024
    • Data collection for the Finnish language using the OpenAssistant platform
      Python
      Apache License 2.0
      3.3k060 Updated Mar 27, 2024
    • Repository for all things related to classifying whether a text is toxic or not using data from https://github.com/TurkuNLP/wikipedia-toxicity-data-fi
      Python
      0200 Updated Mar 7, 2024
    • Emotion analysis for Finnish parliamentary speeches
      Jupyter Notebook
      0200 Updated Feb 26, 2024