
Introduction
Xorbits Inference (Xinference) is an open-source platform that simplifies running and integrating a wide range of AI models. With Xinference, you can run inference with any open-source LLM, embedding model, or multimodal model, in the cloud or on-premises, and build powerful AI applications.
Installing with Docker
Pull the Xinference image:
```bash
docker pull xprobe/xinference
```
Run the container (note: change the host paths to match your own environment):
```bash
docker run -d --name xinference --gpus all -v E:/docker/xinference/models:/root/models -v E:/docker/xinference/.xinference:/root/.xinference -v E:/docker/xinference/.cache/huggingface:/root/.cache/huggingface -e XINFERENCE_HOME=/root/models -p 9997:9997 xprobe/xinference:latest xinference-local -H 0.0.0.0
```
Explanation of the options:
- `-d`: run the container in the background.
- `--name xinference`: give the container the name `xinference`.
- `--gpus all`: let the container access all GPUs on the host, which is essential for compute-heavy workloads such as model inference.
- `-v E:/docker/xinference/models:/root/models`, `-v E:/docker/xinference/.xinference:/root/.xinference`, `-v E:/docker/xinference/.cache/huggingface:/root/.cache/huggingface`: mount host directories into the container so data persists across restarts and can be shared. For example, the first mount maps the host directory `E:/docker/xinference/models` to `/root/models` inside the container.
- `-e XINFERENCE_HOME=/root/models`: set the `XINFERENCE_HOME` environment variable to `/root/models`, which tells Xinference where to store its models and data inside the container.
- `-p 9997:9997`: map port 9997 on the host to port 9997 in the container, so the service is reachable from outside.
- `xprobe/xinference:latest`: the image and tag to run, here the `latest` version of `xprobe/xinference`.
- `xinference-local -H 0.0.0.0`: the command executed at container startup; it runs Xinference in local mode and listens on all network interfaces.
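To confirm the deployment before wiring anything else to it, you can query the server's OpenAI-compatible model-listing route. A minimal sketch in Python, assuming the default port mapping from the command above:

```python
import requests

# List currently launched models via the OpenAI-compatible route.
# Assumes the default -p 9997:9997 mapping from the docker run command above.
resp = requests.get("http://localhost:9997/v1/models", timeout=5)
resp.raise_for_status()
print(resp.json())  # an empty "data" list right after startup is expected
```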
Access address
Once the container is up, the Xinference web UI and API are served at http://localhost:9997 (via the port mapping above).
Official documentation:
https://inference.readthedocs.io/zh-cn/latest/index.html
Adding the Xinference container in Dify
When Dify itself runs in Docker, `localhost` inside its containers refers to the container rather than the host, so configure the Xinference server URL with the Docker host gateway alias:
http://host.docker.internal:9997
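Because Xinference exposes an OpenAI-compatible API, the same `host.docker.internal` address works for any OpenAI client running inside another container. A minimal sketch with the `openai` Python package; the model name `qwen2.5-instruct` is only a placeholder and must match a model you have actually launched:

```python
from openai import OpenAI

# Inside another container, "localhost" points at that container itself,
# so the Docker host gateway alias is used instead.
client = OpenAI(base_url="http://host.docker.internal:9997/v1", api_key="not-needed")

# "qwen2.5-instruct" is a placeholder; use the UID of a model you launched.
reply = client.chat.completions.create(
    model="qwen2.5-instruct",
    messages=[{"role": "user", "content": "Hello from another container"}],
)
print(reply.choices[0].message.content)
```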
Built-in Large Language Models
The following large language models are built into Xinference; a launch-and-chat sketch follows the table.
| ABILITIES | CONTEXT_LENGTH | DESCRIPTION |
|---|---|---|
| generate | 2048 | Aquila2 series models are the base language models |
| chat | 2048 | Aquila2-chat series models are the chat models |
| chat | 16384 | AquilaChat2-16k series models are the long-text chat models |
| generate | 4096 | Baichuan2 is an open-source Transformer based LLM that is trained on both Chinese and English data. |
| chat | 4096 | Baichuan2-chat is a fine-tuned version of the Baichuan LLM, specializing in chatting. |
| chat | 131072 | C4AI Command-R(+) is a research release of a 35 and 104 billion parameter highly performant generative model. |
| generate | 100000 | Code-Llama is an open-source LLM trained by fine-tuning LLaMA2 for generating and discussing code. |
| chat | 100000 | Code-Llama-Instruct is an instruct-tuned version of the Code-Llama LLM. |
| generate | 100000 | Code-Llama-Python is a fine-tuned version of the Code-Llama LLM, specializing in Python. |
| chat | 131072 | The open-source version of the latest CodeGeeX4 model series. |
| generate | 65536 | CodeQwen1.5 is the Code-Specific version of Qwen1.5. It is a transformer-based decoder-only language model pretrained on a large amount of code data. |
| chat | 65536 | CodeQwen1.5 is the Code-Specific version of Qwen1.5. It is a transformer-based decoder-only language model pretrained on a large amount of code data. |
| generate | 8194 | CodeShell is a multi-language code LLM developed by the Knowledge Computing Lab of Peking University. |
| chat | 8194 | CodeShell is a multi-language code LLM developed by the Knowledge Computing Lab of Peking University. |
| generate | 32768 | Codestral-22B-v0.1 is trained on a diverse dataset of 80+ programming languages, including the most popular ones, such as Python, Java, C, C++, JavaScript, and Bash. |
| chat, vision | 4096 | The CogAgent-9B-20241220 model is based on GLM-4V-9B, a bilingual open-source VLM base model. Through data collection and optimization, multi-stage training, and strategy improvements, CogAgent-9B-20241220 achieves significant advancements in GUI perception, inference prediction accuracy, action space completeness, and task generalizability. |
| chat, vision | 8192 | Compared to the previous generation of open-source CogVLM models, CogVLM2 achieves good results on many benchmarks, and its performance can compete with some closed-source models. |
| chat, vision | 8192 | CogVLM2-Video achieves state-of-the-art performance on multiple video question answering tasks. |
| chat | 32768 | csg-wukong-1B is a 1 billion-parameter small language model (SLM) pretrained on 1T tokens. |
| generate | 4096 | DeepSeek LLM, trained from scratch on a vast dataset of 2 trillion tokens in both English and Chinese. |
| chat | 4096 | DeepSeek LLM is an advanced language model comprising 67 billion parameters. It has been trained from scratch on a vast dataset of 2 trillion tokens in both English and Chinese. |
| generate | 16384 | Deepseek Coder is composed of a series of code language models, each trained from scratch on 2T tokens, with a composition of 87% code and 13% natural language in both English and Chinese. |
| chat | 16384 | deepseek-coder-instruct is a model initialized from deepseek-coder-base and fine-tuned on 2B tokens of instruction data. |
| chat | 163840 | DeepSeek-R1, which incorporates cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks. |
| chat | 131072 | deepseek-r1-distill-llama is distilled from DeepSeek-R1 based on Llama. |
| chat | 131072 | deepseek-r1-distill-qwen is distilled from DeepSeek-R1 based on Qwen. |
| generate | 128000 | DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. |
| chat | 128000 | DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. |
| chat | 128000 | DeepSeek-V2-Chat-0628 is an improved version of DeepSeek-V2-Chat. |
| chat | 128000 | DeepSeek-V2.5 is an upgraded version that combines DeepSeek-V2-Chat and DeepSeek-Coder-V2-Instruct. The new model integrates the general and coding abilities of the two previous versions. |
| chat | 163840 | DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated for each token. |
| chat, vision | 4096 | DeepSeek-VL possesses general multimodal understanding capabilities, capable of processing logical diagrams, web pages, formula recognition, scientific literature, natural images, and embodied intelligence in complex scenarios. |
| chat | 8192 | Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. |
| chat | 8192 | Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. |
| chat, vision | 8192 | GLM4 is the open source version of the latest generation of pre-trained models in the GLM-4 series launched by Zhipu AI. |
| chat | 8192 | The GLM-Edge series is an attempt to target end-side, real-life scenarios; it consists of two sizes of large-language dialogue models and multimodal understanding models (GLM-Edge-1.5B-Chat, GLM-Edge-4B-Chat, GLM-Edge-V-2B, GLM-Edge-V-5B). The 1.5B / 2B models mainly target platforms such as mobile phones and cars, while the 4B / 5B models mainly target PCs. |
| chat, vision | 8192 | The GLM-Edge series is an attempt to target end-side, real-life scenarios; it consists of two sizes of large-language dialogue models and multimodal understanding models (GLM-Edge-1.5B-Chat, GLM-Edge-4B-Chat, GLM-Edge-V-2B, GLM-Edge-V-5B). The 1.5B / 2B models mainly target platforms such as mobile phones and cars, while the 4B / 5B models mainly target PCs. |
| chat, tools | 131072 | GLM4 is the open source version of the latest generation of pre-trained models in the GLM-4 series launched by Zhipu AI. |
| chat, tools | 1048576 | GLM4 is the open source version of the latest generation of pre-trained models in the GLM-4 series launched by Zhipu AI. |
| chat | 4096 | OpenFunctions is designed to extend the Large Language Model (LLM) Chat Completion feature to formulate executable API calls given natural language instructions and API context. |
| generate | 1024 | GPT-2 is a Transformer-based LLM that is trained on WebText, a 40 GB dataset of Reddit posts with 3+ upvotes. |
| chat | 32768 | The second generation of the InternLM model, InternLM2. |
| chat | 32768 | InternLM2.5 series of the InternLM model. |
| chat | 262144 | InternLM2.5 series of the InternLM model, supporting a 1M long context. |
| chat, tools | 32768 | InternLM3 has open-sourced an 8-billion parameter instruction model, InternLM3-8B-Instruct, designed for general-purpose usage and advanced reasoning. |
| chat, vision | 32768 | InternVL 1.5 is an open-source multimodal large language model (MLLM) to bridge the capability gap between open-source and proprietary commercial models in multimodal understanding. |
| chat, vision | 32768 | InternVL 2 is an open-source multimodal large language model (MLLM) to bridge the capability gap between open-source and proprietary commercial models in multimodal understanding. |
| generate | 4096 | Llama-2 is the second generation of Llama, open-source and trained on a larger amount of data. |
| chat | 4096 | Llama-2-Chat is a fine-tuned version of the Llama-2 LLM, specializing in chatting. |
| generate | 8192 | Llama 3 is an auto-regressive language model that uses an optimized transformer architecture. |
| chat | 8192 | The Llama 3 instruction tuned models are optimized for dialogue use cases and outperform many of the available open source chat models on common industry benchmarks. |
| generate | 131072 | Llama 3.1 is an auto-regressive language model that uses an optimized transformer architecture. |
| chat, tools | 131072 | The Llama 3.1 instruction tuned models are optimized for dialogue use cases and outperform many of the available open source chat models on common industry benchmarks. |
| generate, vision | 131072 | The Llama 3.2-Vision instruction-tuned models are optimized for visual recognition, image reasoning, captioning, and answering general questions about an image. |
| chat, vision | 131072 | The Llama 3.2-Vision instruction-tuned models are optimized for visual recognition, image reasoning, captioning, and answering general questions about an image. |
| chat, tools | 131072 | The Llama 3.3 instruction tuned models are optimized for dialogue use cases and outperform many of the available open source chat models on common industry benchmarks. |
| chat, tools | 32768 | Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions. |
| chat | 4096 | MiniCPM is an End-Size LLM developed by ModelBest Inc. and TsinghuaNLP, with only 2.4B parameters excluding embeddings. |
| chat | 4096 | MiniCPM is an End-Size LLM developed by ModelBest Inc. and TsinghuaNLP, with only 2.4B parameters excluding embeddings. |
| chat | 4096 | MiniCPM is an End-Size LLM developed by ModelBest Inc. and TsinghuaNLP, with only 2.4B parameters excluding embeddings. |
| chat | 4096 | MiniCPM is an End-Size LLM developed by ModelBest Inc. and TsinghuaNLP, with only 2.4B parameters excluding embeddings. |
| chat | 4096 | MiniCPM is an End-Size LLM developed by ModelBest Inc. and TsinghuaNLP, with only 2.4B parameters excluding embeddings. |
| chat, vision | 8192 | MiniCPM-Llama3-V 2.5 is the latest model in the MiniCPM-V series. The model is built on SigLip-400M and Llama3-8B-Instruct with a total of 8B parameters. |
| chat, vision | 32768 | MiniCPM-V 2.6 is the latest model in the MiniCPM-V series. The model is built on SigLip-400M and Qwen2-7B with a total of 8B parameters. |
| chat | 32768 | MiniCPM3-4B is the 3rd generation of the MiniCPM series. The overall performance of MiniCPM3-4B surpasses Phi-3.5-mini-Instruct and GPT-3.5-Turbo-0125, being comparable with many recent 7B~9B models. |
| chat | 8192 | Mistral-7B-Instruct is a fine-tuned version of the Mistral-7B LLM on public datasets, specializing in chatting. |
| chat | 8192 | The Mistral-7B-Instruct-v0.2 Large Language Model (LLM) is an improved instruct fine-tuned version of Mistral-7B-Instruct-v0.1. |
| chat | 32768 | The Mistral-7B-Instruct-v0.2 Large Language Model (LLM) is an improved instruct fine-tuned version of Mistral-7B-Instruct-v0.1. |
| chat | 131072 | Mistral-Large-Instruct-2407 is an advanced dense Large Language Model (LLM) of 123B parameters with state-of-the-art reasoning, knowledge and coding capabilities. |
| chat | 1024000 | The Mistral-Nemo-Instruct-2407 Large Language Model (LLM) is an instruct fine-tuned version of Mistral-Nemo-Base-2407. |
| generate | 8192 | Mistral-7B is an unmoderated Transformer based LLM claiming to outperform Llama2 on all benchmarks. |
| chat | 65536 | The Mixtral-8x22B-Instruct-v0.1 Large Language Model (LLM) is an instruct fine-tuned version of Mixtral-8x22B-v0.1, specializing in chatting. |
| chat | 32768 | Mixtral-8x7B-Instruct is a fine-tuned version of the Mixtral-8x7B LLM, specializing in chatting. |
| generate | 32768 | The Mixtral-8x7B Large Language Model (LLM) is a pretrained generative Sparse Mixture of Experts. |
| chat, vision | 2048 | OmniLMM is a family of open-source large multimodal models (LMMs) adept at vision & language modeling. |
| chat | 8192 | Openhermes 2.5 is a fine-tuned version of Mistral-7B-v0.1, trained primarily on GPT-4-generated data. |
| generate | 2048 | OPT is an open-source, decoder-only, Transformer based LLM that was designed to replicate GPT-3. |
| chat | 4096 | Orion-14B series models are open-source multilingual large language models trained from scratch by OrionStarAI. |
| chat | 4096 | Orion-14B series models are open-source multilingual large language models trained from scratch by OrionStarAI. |
| generate | 2048 | Phi-2 is a 2.7B Transformer based LLM used for research on model safety, trained with data similar to Phi-1.5 but augmented with synthetic texts and curated websites. |
| chat | 128000 | The Phi-3-Mini-128K-Instruct is a 3.8 billion-parameter, lightweight, state-of-the-art open model trained using the Phi-3 datasets. |
| chat | 4096 | The Phi-3-Mini-4k-Instruct is a 3.8 billion-parameter, lightweight, state-of-the-art open model trained using the Phi-3 datasets. |
| generate | 4096 | Platypus-70B-instruct is a merge of garage-bAInd/Platypus2-70B and upstage/Llama-2-70b-instruct-v2. |
| chat, vision | 32768 | QVQ-72B-Preview is an experimental research model developed by the Qwen team, focusing on enhancing visual reasoning capabilities. |
| chat | 32768 | Qwen-chat is a fine-tuned version of the Qwen LLM trained with alignment techniques, specializing in chatting. |
| chat, vision | 4096 | Qwen-VL-Chat supports more flexible interaction, such as multiple image inputs, multi-round question answering, and creative capabilities. |
| chat, tools | 32768 | Qwen1.5 is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. |
| chat, tools | 32768 | Qwen1.5-MoE is a transformer-based MoE decoder-only language model pretrained on a large amount of data. |
| generate, audio | 32768 | Qwen2-Audio: a large-scale audio-language model capable of accepting various audio signal inputs and performing audio analysis or giving direct textual responses to speech instructions. |
| chat, audio | 32768 | Qwen2-Audio: a large-scale audio-language model capable of accepting various audio signal inputs and performing audio analysis or giving direct textual responses to speech instructions. |
| chat, tools | 32768 | Qwen2 is the new series of Qwen large language models. |
| chat, tools | 32768 | Qwen2 is the new series of Qwen large language models. |
| chat, vision | 32768 | Qwen2-VL: To See the World More Clearly. Qwen2-VL is the latest version of the vision language models in the Qwen model family. |
| generate | 32768 | Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters. |
| generate | 32768 | Qwen2.5-Coder is the latest series of Code-Specific Qwen large language models (formerly known as CodeQwen). |
| chat, tools | 32768 | Qwen2.5-Coder is the latest series of Code-Specific Qwen large language models (formerly known as CodeQwen). |
| chat, tools | 32768 | Qwen2.5 is the latest series of Qwen large language models. For Qwen2.5, we release a number of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters. |
| chat, vision | 128000 | Qwen2.5-VL is the latest version of the vision language models in the Qwen model family. |
| chat | 32768 | QwQ-32B-Preview is an experimental research model developed by the Qwen Team, focused on advancing AI reasoning capabilities. |
| generate | 8192 | We introduce SeaLLM-7B-v2, the state-of-the-art multilingual LLM for Southeast Asian (SEA) languages. |
| generate | 8192 | We introduce SeaLLM-7B-v2.5, the state-of-the-art multilingual LLM for Southeast Asian (SEA) languages. |
| generate | 4096 | Skywork is a series of large models developed by the Kunlun Group · Skywork team. |
| generate | 4096 | Skywork is a series of large models developed by the Kunlun Group · Skywork team. |
| chat | 4096 | We introduce Starling-7B, an open large language model (LLM) trained by Reinforcement Learning from AI Feedback (RLAIF). The model harnesses the power of our new GPT-4 labeled ranking dataset. |
| chat | 8192 | TeleChat is a large language model developed and trained by China Telecom Artificial Intelligence Technology Co., Ltd. The 7B model base is trained with 1.5 trillion and 3 trillion tokens of high-quality Chinese corpus. |
| generate | 2048 | The TinyLlama project aims to pretrain a 1.1B Llama model on 3 trillion tokens. |
| chat | 100000 | |
| chat | 2048 | WizardMath is an open-source LLM trained by fine-tuning Llama2 with Evol-Instruct, specializing in math. |
| generate | 2048 | XVERSE is a multilingual large language model, independently developed by Shenzhen Yuanxiang Technology. |
| chat | 2048 | XVERSE-Chat is the aligned version of the XVERSE model. |
| generate | 4096 | The Yi series models are large language models trained from scratch by developers at 01.AI. |
| generate | 4096 | Yi-1.5 is an upgraded version of Yi. It is continuously pre-trained on Yi with a high-quality corpus of 500B tokens and fine-tuned on 3M diverse fine-tuning samples. |
| chat | 4096 | Yi-1.5 is an upgraded version of Yi. It is continuously pre-trained on Yi with a high-quality corpus of 500B tokens and fine-tuned on 3M diverse fine-tuning samples. |
| chat | 16384 | Yi-1.5 is an upgraded version of Yi. It is continuously pre-trained on Yi with a high-quality corpus of 500B tokens and fine-tuned on 3M diverse fine-tuning samples. |
| generate | 262144 | The Yi series models are large language models trained from scratch by developers at 01.AI. |
| chat | 4096 | The Yi series models are large language models trained from scratch by developers at 01.AI. |
| generate | 131072 | Yi-Coder is a series of open-source code language models that delivers state-of-the-art coding performance with fewer than 10 billion parameters. It excels in long-context understanding with a maximum context length of 128K tokens, and supports 52 major programming languages, including popular ones such as Java, Python, JavaScript, and C++. |
| chat | 131072 | Yi-Coder is a series of open-source code language models that delivers state-of-the-art coding performance with fewer than 10 billion parameters. It excels in long-context understanding with a maximum context length of 128K tokens, and supports 52 major programming languages, including popular ones such as Java, Python, JavaScript, and C++. |
| chat, vision | 4096 | The Yi Vision Language (Yi-VL) model is the open-source, multimodal version of the Yi Large Language Model (LLM) series, enabling content comprehension, recognition, and multi-round conversations about images. |
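To serve one of these models, launch it first and then call it. A minimal sketch with the `xinference` Python client, assuming the server from the Docker section above; the model name is illustrative, and launch arguments (such as size, quantization, or engine) and the exact chat signature vary by model and Xinference version:

```python
from xinference.client import Client

# Connect to the Xinference server started by the Docker command above.
client = Client("http://localhost:9997")

# Launch a built-in chat model; "qwen2.5-instruct" is an example name,
# and extra arguments (size, quantization, engine) may be required.
model_uid = client.launch_model(model_name="qwen2.5-instruct", model_type="LLM")

# Chat with the launched model; the response follows the OpenAI schema.
model = client.get_model(model_uid)
result = model.chat(messages=[{"role": "user", "content": "Introduce Xinference in one sentence."}])
print(result["choices"][0]["message"]["content"])
```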
Embedding Models
The following embedding models are built into Xinference; a usage sketch follows the list:
- bce-embedding-base_v1
- bge-base-en
- bge-base-en-v1.5
- bge-base-zh
- bge-base-zh-v1.5
- bge-large-en
- bge-large-en-v1.5
- bge-large-zh
- bge-large-zh-noinstruct
- bge-large-zh-v1.5
- bge-m3
- bge-small-en-v1.5
- bge-small-zh
- bge-small-zh-v1.5
- e5-large-v2
- gte-base
- gte-large
- gte-Qwen2
- jina-clip-v2
- jina-embeddings-v2-base-en
- jina-embeddings-v2-base-zh
- jina-embeddings-v2-small-en
- jina-embeddings-v3
- m3e-base
- m3e-large
- m3e-small
- multilingual-e5-large
- text2vec-base-chinese
- text2vec-base-chinese-paraphrase
- text2vec-base-chinese-sentence
- text2vec-base-multilingual
- text2vec-large-chinese
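Any model in this list can be launched and queried for vectors. A minimal sketch using the `xinference` Python client, with `bge-m3` as the example model:

```python
from xinference.client import Client

client = Client("http://localhost:9997")

# Launch a built-in embedding model from the list above.
model_uid = client.launch_model(model_name="bge-m3", model_type="embedding")
model = client.get_model(model_uid)

# create_embedding returns an OpenAI-style payload, one vector per input.
result = model.create_embedding("Xinference simplifies model serving.")
print(len(result["data"][0]["embedding"]))  # dimensionality of the vector
```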
Image Models
Audio Models
The following audio models are built into Xinference; a transcription sketch follows the list:
- Belle-distilwhisper-large-v2-zh
- Belle-whisper-large-v2-zh
- Belle-whisper-large-v3-zh
- ChatTTS
- CosyVoice-300M
- CosyVoice-300M-Instruct
- CosyVoice-300M-SFT
- CosyVoice2-0.5B
- F5-TTS
- F5-TTS-MLX
- FishSpeech-1.5
- Kokoro-82M
- MeloTTS-Chinese
- MeloTTS-English
- MeloTTS-English-v2
- MeloTTS-English-v3
- MeloTTS-French
- MeloTTS-Japanese
- MeloTTS-Korean
- MeloTTS-Spanish
- SenseVoiceSmall
- whisper-base
- whisper-base-mlx
- whisper-base.en
- whisper-base.en-mlx
- whisper-large-v3
- whisper-large-v3-mlx
- whisper-large-v3-turbo
- whisper-large-v3-turbo-mlx
- whisper-medium
- whisper-medium-mlx
- whisper-medium.en
- whisper-medium.en-mlx
- whisper-small
- whisper-small-mlx
- whisper-small.en
- whisper-small.en-mlx
- whisper-tiny
- whisper-tiny-mlx
- whisper-tiny.en
- whisper-tiny.en-mlx
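The whisper-family models in this list can be used for speech-to-text. A minimal sketch, assuming a local `sample.mp3` file (a placeholder path) and that the audio-model handle exposes a `transcriptions` method; API details may differ across Xinference versions:

```python
from xinference.client import Client

client = Client("http://localhost:9997")

# Launch a built-in speech recognition model from the list above.
model_uid = client.launch_model(model_name="whisper-large-v3", model_type="audio")
model = client.get_model(model_uid)

# "sample.mp3" is a placeholder path for the audio file to transcribe.
with open("sample.mp3", "rb") as f:
    result = model.transcriptions(f.read())
print(result["text"])
```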
Rerank Models
The following rerank models are built into Xinference:
Video Models
The following video models are built into Xinference: