    Repositories list

    • litellm

      Public
      Python SDK, Proxy Server (LLM Gateway) to call 100+ LLM APIs in OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq]
      Python
      Other
      1.9k000 Updated Jan 11, 2025
    • A high-throughput and memory-efficient inference and serving engine for LLMs
      Python
      Apache License 2.0
      5.1k002 Updated Jan 9, 2025
    • Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators
      C++
      Other
      139000 Updated Dec 20, 2024
    • vllm

      Public
      vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs
      Python
      Apache License 2.0
      5.1k8820 Updated Dec 20, 2024
    • LMCache

      Public
      ROCm support of Ultra-Fast and Cheaper Long-Context LLM Inference
      Python
      Apache License 2.0
      36000 Updated Dec 19, 2024
    • The driver for LMCache core to run in vLLM
      Python
      Apache License 2.0
      13000 Updated Dec 19, 2024
    • Python
      7000 Updated Dec 19, 2024
    • Mooncake

      Public
      Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
      C++
      Apache License 2.0
      134000 Updated Dec 16, 2024
    • ROCm Implementation of torchac_cuda from LMCache
      Cuda
      1000 Updated Dec 16, 2024
    • etalon

      Public
      LLM Serving Performance Evaluation Harness
      Python
      Apache License 2.0
      7000 Updated Dec 16, 2024
    • Infinity is a high-throughput, low-latency REST API for serving vector embeddings, supporting a wide range of text-embedding models and frameworks.
      Python
      MIT License
      119001 Updated Dec 7, 2024
    • Efficient Triton Kernels for LLM Training
      Python
      BSD 2-Clause "Simplified" License
      239000 Updated Dec 6, 2024
    • Efficient LLM Inference over Long Sequences
      Python
      Apache License 2.0
      17000 Updated Nov 29, 2024
    • JamAIBase

      Public
      The collaborative spreadsheet for AI. Chain cells into powerful pipelines, experiment with prompts and models, and evaluate LLM responses in real-time. Work together seamlessly to build and iterate on AI applications.
      Python
      Apache License 2.0
      2476510 Updated Nov 29, 2024
    • A calculator to estimate the memory footprint, capacity, and latency on NVIDIA, AMD, and Intel
      Python
      4000 Updated Nov 24, 2024
    • ROCm Quantized Attention that achieves speedups of 2.1x and 2.7x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.
      Cuda
      Apache License 2.0
      44000 Updated Nov 21, 2024
    • Go ahead and axolotl questions
      Python
      Apache License 2.0
      914000 Updated Nov 16, 2024
    • TypeScript documentation of JamAISDK
      HTML
      0000 Updated Nov 14, 2024
    • skypilot

      Public
      SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 12+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.
      Python
      Apache License 2.0
      547000 Updated Nov 7, 2024
    • A repository containing a CI/CD pipeline that builds Docker images with flash attention already compiled in, to facilitate quicker development and deployment of other frameworks.
      Shell
      Apache License 2.0
      0100 Updated Oct 26, 2024
    • ROCm fork of "Fast and memory-efficient exact attention" (this branch aims to produce a flash attention PyPI package that can be readily installed and used).
      Python
      BSD 3-Clause "New" or "Revised" License
      1.4k000 Updated Oct 26, 2024
    • A Python client for the Unstructured hosted API
      Python
      MIT License
      17001 Updated Oct 14, 2024
    • EmbeddedLLM: API server for Embedded Device Deployment. Currently supports CUDA/OpenVINO/IpexLLM/DirectML/CPU
      Python
      12972 Updated Oct 6, 2024
    • Go
      1000 Updated Sep 26, 2024
    • PowerToys

      Public
      Windows system utilities to maximize productivity
      C#
      MIT License
      6.7k000 Updated Aug 9, 2024
    • Arena-Hard-Auto: An automatic LLM benchmark.
      Jupyter Notebook
      Apache License 2.0
      84000 Updated Jul 15, 2024
    • Python
      Apache License 2.0
      132000 Updated Jul 11, 2024
    • Python
      Apache License 2.0
      54000 Updated Jul 9, 2024
    • Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
      HTML
      Apache License 2.0
      816100 Updated Jul 9, 2024
    • workshop

      Public
      Jupyter Notebook
      0000 Updated Jun 25, 2024