Releases · xorbitsai/inference
v1.2.0
What's new in 1.2.0 (2025-01-10)
These are the changes in inference v1.2.0.
New features
- FEAT: support HunyuanVideo by @qinxuye in #2721
- FEAT: support hunyuan-dit text2image by @qinxuye in #2727
- FEAT: support cline for vllm engine by @hwzhuhao in #2734
- FEAT: [UI] theme switch by @Minamiyama in #1335
- FEAT: support qwen2vl run on ascend npu by @Xu-pixel in #2741
- FEAT: [UI] Add language toggle for i18n support. by @yiboyasss in #2744
- FEAT: Support cogagent-9b by @amumu96 in #2740
- FEAT: Xavier: Share KV cache between VLLM replicas by @ChengjieLi28 in #2732
- FEAT: [UI] Add gguf_quantization, gguf_model_path, and cpu_offload for image models. by @yiboyasss in #2753
- FEAT: Support Marco-o1 by @Jun-Howie in #2749
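Models added in a release are started through Xinference's usual launch flow; a minimal sketch of a launch request body, assuming `marco-o1` is the registered model name and that the transformers engine and 7B size apply (all three are illustrative assumptions, not confirmed by these notes):

```python
import json

# Hypothetical launch payload for Xinference's model-launch endpoint;
# the model name, engine, and size below are illustrative assumptions.
payload = {
    "model_name": "marco-o1",
    "model_type": "LLM",
    "model_engine": "transformers",
    "model_size_in_billions": 7,
}

# Serialize as the request body a client would POST to the supervisor.
body = json.dumps(payload)
print(body)
```

The same shape applies to the other newly supported models; only the name and size fields change.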
Enhancements
- ENH: [UI] Update Button Style and Interaction Logic for Editing Cache in Model Card. by @yiboyasss in #2746
- ENH: Improve error message by @codingl2k1 in #2738
Bug fixes
- BUG: adapt mlx-vlm v0.1.7 by @qinxuye in #2724
- BUG: pin mlx<0.22.0 to prevent qwen2_vl failing in mlx-vlm by @qinxuye in #2752
Others
- FIX: [UI] Resolve bug preventing '/' input in model_path. by @yiboyasss in #2747
- FIX: [UI] Fix dark mode background bug. by @yiboyasss in #2748
- CHORE: Update new models in readme by @codingl2k1 in #2713
Full Changelog: v1.1.1...v1.2.0
v1.1.1
What's new in 1.1.1 (2024-12-27)
These are the changes in inference v1.1.1.
New features
- FEAT: support F5-TTS-MLX by @qinxuye in #2671
- FEAT: Support qwen2.5-coder-instruct model for tool calls by @Timmy-web in #2681
- FEAT: Support minicpm-4B on vllm by @Jun-Howie in #2697
- FEAT: support scheduling-policy for vllm by @hwzhuhao in #2700
- FEAT: Support QvQ-72B-Preview by @Jun-Howie in #2712
- FEAT: support SD3.5 series model by @qinxuye in #2706
Enhancements
- ENH: Guided Decoding OpenAIClient compatibility by @wxiwnd in #2673
- ENH: resample f5-tts-mlx ref audio when the sample rate does not match by @qinxuye in #2678
- ENH: support no images for MLX vlm by @qinxuye in #2670
- ENH: Update fish speech 1.5 by @codingl2k1 in #2672
- ENH: Update cosyvoice 2 by @codingl2k1 in #2684
- REF: Reduce code redundancy by setting default values by @pengjunfeng11 in #2711
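The guided-decoding compatibility work (#2673) targets the OpenAI-style client path. A minimal sketch of what such a request could carry, assuming a vLLM-style `guided_json` extra-body field (an assumption; check the PR for the exact field name), with a hypothetical model uid:

```python
# Sketch of an OpenAI-compatible chat request carrying a guided-decoding
# JSON schema; the `guided_json` field name is assumed, not confirmed here.
schema = {
    "type": "object",
    "properties": {"city": {"type": "string"}},
    "required": ["city"],
}

request_kwargs = {
    "model": "my-model-uid",  # hypothetical model uid
    "messages": [{"role": "user", "content": "Where is the Eiffel Tower?"}],
    "extra_body": {"guided_json": schema},
}
print(sorted(request_kwargs))
```

These kwargs would be passed to an OpenAI-compatible client's chat-completions call; the schema constrains the model's output to valid JSON matching it.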
Bug fixes
- BUG: Fix f5tts audio ref by @codingl2k1 in #2680
- BUG: `glm4-chat` cannot apply for continuous batching with transformers backend by @ChengjieLi28 in #2695
New Contributors
- @Timmy-web made their first contribution in #2681
Full Changelog: v1.1.0...v1.1.1
v1.1.0
What's new in 1.1.0 (2024-12-13)
These are the changes in inference v1.1.0.
New features
- FEAT: Support F5 TTS by @codingl2k1 in #2626
- FEAT: [UI] Add a hint for model running. by @yiboyasss in #2657
- FEAT: support VL models for MLX by @qinxuye in #2638
- FEAT: Add support for CLIP model by @Second222None in #2637
- FEAT: support llama-3.3-instruct by @qinxuye in #2661
Enhancements
- ENH: Optimize error message when user parameters are passed incorrectly by @namecd in #2623
- ENH: bypass the sampling parameter skip_special_tokens to vLLM backend by @zjuyzj in #2655
- ENH: unify prompt_text as cosyvoice for fish speech by @qinxuye in #2658
- ENH: Update glm4 chat model to new weights by @codingl2k1 in #2660
- ENH: upgrade sglang in Docker by @amumu96 in #2668
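The `skip_special_tokens` bypass (#2655) means that sampling parameter travels with the rest of the generate config to the vLLM backend; a sketch of how a caller might include it (the surrounding fields are ordinary OpenAI-style sampling parameters, shown for context):

```python
# Generation config passing `skip_special_tokens` through to the vLLM
# backend alongside ordinary sampling parameters.
generate_config = {
    "max_tokens": 128,
    "temperature": 0.7,
    "skip_special_tokens": False,  # keep special tokens in the output text
}
print(generate_config["skip_special_tokens"])
```

Setting it to `False` is useful when downstream code needs to see markers such as end-of-turn tokens verbatim.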
Bug fixes
- BUG: Cleanup Isolation tasks by @codingl2k1 in #2603
- BUG: fix qwq gguf download hub for modelscope by @redreamality in #2647
- BUG: fix ImportError when optional dependency FlagEmbedding is not installed by @zjuyzj in #2649
- BUG: use stream_generate in MLX by @qinxuye in #2635
- BUG: `stop` parameter leads to failure with transformers backend by @ChengjieLi28 in #2663
- BUG: fix FishSpeech Negative code found by @themanforfree in #2667
Documentation
- DOC: update new models by @qinxuye in #2632
- DOC: add doc about offline usage for SenseVoiceSmall by @qinxuye in #2654
Others
- FIX: fix launching bge-m3 with hybrid mode by @pengjunfeng11 in #2641
New Contributors
- @namecd made their first contribution in #2623
- @redreamality made their first contribution in #2647
- @Second222None made their first contribution in #2637
- @themanforfree made their first contribution in #2667
Full Changelog: v1.0.1...v1.1.0
v1.0.1
What's new in 1.0.1 (2024-11-29)
These are the changes in inference v1.0.1.
New features
- FEAT: Fish speech stream by @codingl2k1 in #2562
- FEAT: support sparse vector for bge-m3 by @pengjunfeng11 in #2540
- FEAT: whisper support for Mac MLX by @qinxuye in #2576
- FEAT: support guided decoding for vllm async engine by @wxiwnd in #2391
- FEAT: support QwQ-32B-Preview by @qinxuye in #2602
- FEAT: support glm-edge-chat model by @amumu96 in #2582
Enhancements
- ENH: Support fish speech reference audio by @codingl2k1 in #2542
Bug fixes
- BUG: GTE-qwen2 Embedding Dimension error by @cyhasuka in #2565
- BUG: request_limits does not work with streaming interfaces by @ChengjieLi28 in #2571
- BUG: Fix Codestral v0.1 URI for Pytorch Format by @danialcheung in #2590
- BUG: Correct the input bytes data by langchain_openai #2589 by @xiyuan-lee in #2600
New Contributors
- @pengjunfeng11 made their first contribution in #2540
- @danialcheung made their first contribution in #2590
- @xiyuan-lee made their first contribution in #2600
Full Changelog: v1.0.0...v1.0.1
v1.0.0
What's new in 1.0.0 (2024-11-15)
These are the changes in inference v1.0.0.
New features
- FEAT: Basic cancel support for image model by @codingl2k1 in #2528
- FEAT: Add qwen2.5-coder 0.5B 1.5B 3B 14B 32B by @frostyplanet in #2543
- FEAT: support kvcache in multi-round chat for MLX by @qinxuye in #2534
Enhancements
- ENH: add normalize to rerank model by @hustyichi in #2509
- ENH: Update fish audio by @codingl2k1 in #2555
Documentation
- DOC: Add paper citation by @luweizheng in #2533
Full Changelog: v0.16.3...v1.0.0
v0.16.3
What's new in 0.16.3 (2024-11-08)
These are the changes in inference v0.16.3.
New features
- FEAT: Add support for Llama 3.2-Vision models by @vikrantrathore in #2376
Enhancements
- ENH: Display model name in process by @frostyplanet in #1891
- REF: Remove replica total count in internal `replica_model_uid` by @ChengjieLi28 in #2516
Bug fixes
- BUG: Compat with ChatTTS 0.2.1 by @codingl2k1 in #2520
- BUG: transformers logs missing by @ChengjieLi28 in #2530
Full Changelog: v0.16.2...v0.16.3
v0.16.2
What's new in 0.16.2 (2024-11-01)
These are the changes in inference v0.16.2.
New features
- FEAT: add download from openmind_hub by @cookieyyds in #2504
Enhancements
- BLD: Remove Python 3.8 & Support Python 3.12 by @ChengjieLi28 in #2503
Bug fixes
- BUG: fix bge-reranker-v2-minicpm-layerwise rerank issue by @hustyichi in #2495
Documentation
- DOC: modify NPU doc by @qinxuye in #2485
- DOC: Add doc for ocr by @codingl2k1 in #2492
New Contributors
- @hustyichi made their first contribution in #2495
- @cookieyyds made their first contribution in #2504
Full Changelog: v0.16.1...v0.16.2
v0.16.1
What's new in 0.16.1 (2024-10-25)
These are the changes in inference v0.16.1.
New features
- FEAT: Add support for Qwen/Qwen2.5-Coder-7B-Instruct gptq format by @frostyplanet in #2408
- FEAT: Support GOT-OCR2_0 by @codingl2k1 in #2458
- FEAT: [UI] Image model with the lora_config. by @yiboyasss in #2482
- FEAT: added MLX support for Flux.1 by @qinxuye in #2459
Enhancements
- ENH: Support ChatTTS 0.2 by @codingl2k1 in #2449
- ENH: Pending queue for concurrent requests by @codingl2k1 in #2473
Bug fixes
- BUG: Remove duplicated call of model_install by @frostyplanet in #2457
- BUG: fix embedding model gte-Qwen2 dimensions by @JinCheng666 in #2479
New Contributors
- @JinCheng666 made their first contribution in #2479
Full Changelog: v0.16.0...v0.16.1
v0.16.0
What's new in 0.16.0 (2024-10-18)
These are the changes in inference v0.16.0.
New features
- FEAT: Adding support for awq/gptq vLLM inference to VisionModel such as Qwen2-VL by @cyhasuka in #2445
- FEAT: Dynamic batching for the state-of-the-art FLUX.1 `text_to_image` interface by @ChengjieLi28 in #2380
- FEAT: added MLX for qwen2.5-instruct by @qinxuye in #2444
Enhancements
- ENH: Speed up cli interaction by @frostyplanet in #2443
- REF: Enable continuous batching for LLM with transformers engine by default by @ChengjieLi28 in #2437
Full Changelog: v0.15.4...v0.16.0
v0.15.4
What's new in 0.15.4 (2024-10-12)
These are the changes in inference v0.15.4.
New features
- FEAT: Llama 3.1 Instruct support tool call by @codingl2k1 in #2388
- FEAT: qwen2.5 instruct tool call by @codingl2k1 in #2393
- FEAT: add whisper-large-v3-turbo audio model by @hwzhuhao in #2409
- FEAT: Add environment variable setting to increase the retry attempts after model download failures by @hwzhuhao in #2411
- FEAT: support getting progress for image model by @qinxuye in #2395
- FEAT: support qwen2-vl vllm engine by @amumu96 in #2428
Enhancements
- ENH: Launch the ChatTTS model with kwargs by @codingl2k1 in #2425
- REF: refactor controlnet for image model by @qinxuye in #2346
Bug fixes
- BUG: Pin ChatTTS<0.2 by @codingl2k1 in #2419
- BUG: tool call streaming output has duplicated list by @ChengjieLi28 in #2416
Full Changelog: v0.15.3...v0.15.4