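Use FastAPI to build an API that takes a prompt, the name of the LLM to use, and a vector search type, and returns the LLM's output text. It calls the OpenAI GPT 4o3 API and the APIs of several LLMs on AWS Bedrock. The model and the vector search type are configured through internal classes, and the vector search types are the FAISS vector database and the AWS Kendra search service. Implement this logic with the Factory design pattern. In Python, package the project code with Docker, register the image in AWS ECR, and run it in an AWS ECS container; if it is already registered, reuse the existing one.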
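Use the Factory pattern so that the LLM and the vector search backend can be switched by name. The implementation steps are:
- Updated project structure: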
fastapi-on-ecs/
├─ app/
│ ├─ src/
│ │ ├─ factories.py
│ │ ├─ llms/
│ │ │ ├─ base.py
│ │ │ ├─ openai.py
│ │ │ ├─ bedrock.py
│ │ ├─ vector_db/
│ │ │ ├─ base.py
│ │ │ ├─ faiss.py
│ │ │ ├─ kendra.py
│ ├─ main.py
│ ├─ Dockerfile
│ ├─ deploy.sh
│ ├─ requirements.txt
- Updated main.py:
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

from src.factories import LLMFactory, VectorSearchFactory

app = FastAPI()


class InferenceRequest(BaseModel):
    prompt: str
    llm_name: str
    vector_search_type: str


# Plain def instead of async def: the OpenAI, Bedrock and Kendra calls are blocking,
# and FastAPI runs sync endpoints in a thread pool rather than on the event loop
@app.post("/generate")
def generate_text(request: InferenceRequest):
    try:
        # Vector search: retrieve context for the prompt
        vector_search = VectorSearchFactory.create(request.vector_search_type)
        context = vector_search.search(request.prompt)
        # LLM inference on the retrieved context plus the prompt
        llm = LLMFactory.create(request.llm_name)
        response = llm.generate(f"Context: {context}\nPrompt: {request.prompt}")
        return {"response": response}
    except ValueError as e:
        # Unknown model or vector DB name -> client error
        raise HTTPException(status_code=400, detail=str(e))
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))


@app.get("/models")
async def list_models():
    return {
        "llm_models": LLMFactory.list_models(),
        "vector_dbs": VectorSearchFactory.list_vector_dbs(),
    }


@app.get("/")
def root():
    return {"message": "Welcome to LLM Inference API"}
- Factory implementation (src/factories.py):
from typing import Dict, List, Tuple, Type

from src.llms.base import BaseLLM
from src.llms.openai import OpenAIGPT
from src.llms.bedrock import BedrockLLM
from src.vector_db.base import BaseVectorDB
from src.vector_db.faiss import FAISSDB
from src.vector_db.kendra import KendraDB


class LLMFactory:
    # Public model name -> (implementation class, provider model ID).
    # "gpt-4o3" is not an OpenAI model ID; "gpt-4o" is used here (use "o3" if that was meant).
    # Check the Bedrock model IDs against the models enabled in your region.
    _models: Dict[str, Tuple[Type[BaseLLM], str]] = {
        "gpt-4o": (OpenAIGPT, "gpt-4o"),
        "ai21-jamba": (BedrockLLM, "ai21.jamba-1-5-large-v1:0"),
        "claude-3-opus": (BedrockLLM, "anthropic.claude-3-opus-20240229-v1:0"),
        # other model mappings...
    }

    @classmethod
    def create(cls, model_name: str) -> BaseLLM:
        if model_name not in cls._models:
            raise ValueError(f"Unsupported model: {model_name}")
        llm_cls, model_id = cls._models[model_name]
        return llm_cls(model_id)

    @classmethod
    def list_models(cls) -> List[str]:
        return list(cls._models.keys())


class VectorSearchFactory:
    _dbs: Dict[str, Type[BaseVectorDB]] = {
        "faiss": FAISSDB,
        "kendra": KendraDB,
    }

    @classmethod
    def create(cls, db_type: str) -> BaseVectorDB:
        if db_type not in cls._dbs:
            raise ValueError(f"Unsupported vector DB: {db_type}")
        # A new instance per request; cache instances if loading an index is expensive
        return cls._dbs[db_type]()

    @classmethod
    def list_vector_dbs(cls) -> List[str]:
        return list(cls._dbs.keys())
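The factories above dispatch purely on a name. That dispatch can be tried in isolation with the self-contained sketch below, which maps each public name to a (class, model ID) pair so the name a client sends can differ from the provider's model ID. EchoLLM and UpperLLM are hypothetical stub backends, and the register() classmethod is an addition not in the project code, showing how a backend could be plugged in without editing create().

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple, Type


class BaseLLM(ABC):
    @abstractmethod
    def generate(self, prompt: str) -> str: ...


class EchoLLM(BaseLLM):
    """Hypothetical stub backend: echoes the prompt tagged with its model ID."""

    def __init__(self, model_id: str):
        self.model_id = model_id

    def generate(self, prompt: str) -> str:
        return f"[{self.model_id}] {prompt}"


class UpperLLM(EchoLLM):
    """Hypothetical stub backend: upper-cases the prompt."""

    def generate(self, prompt: str) -> str:
        return prompt.upper()


class LLMFactory:
    # Public name -> (implementation class, provider-specific model ID)
    _models: Dict[str, Tuple[Type[BaseLLM], str]] = {
        "echo": (EchoLLM, "echo-v1"),
        "upper": (UpperLLM, "upper-v1"),
    }

    @classmethod
    def register(cls, name: str, llm_cls: Type[BaseLLM], model_id: str) -> None:
        # Adding a backend only touches the registry, never create() or the endpoint
        cls._models[name] = (llm_cls, model_id)

    @classmethod
    def create(cls, name: str) -> BaseLLM:
        if name not in cls._models:
            raise ValueError(f"Unsupported model: {name}")
        llm_cls, model_id = cls._models[name]
        return llm_cls(model_id)

    @classmethod
    def list_models(cls) -> List[str]:
        return list(cls._models.keys())
```

Because callers only ever see BaseLLM, the endpoint code stays the same whichever concrete class the factory returns.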
- Base class (src/llms/base.py):
from abc import ABC, abstractmethod


class BaseLLM(ABC):
    @abstractmethod
    def generate(self, prompt: str) -> str:
        pass
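factories.py also imports BaseVectorDB from src/vector_db/base.py, which is not shown. A minimal version mirroring BaseLLM could look like this; the single search(query) -> str method is inferred from how main.py calls it, not taken from the original project.

```python
from abc import ABC, abstractmethod


class BaseVectorDB(ABC):
    """Common interface for vector search backends (FAISS, Kendra, ...)."""

    @abstractmethod
    def search(self, query: str) -> str:
        """Return the retrieved context for `query` as a single string."""
```

Since search() is abstract, instantiating BaseVectorDB directly raises TypeError, so every backend registered in VectorSearchFactory must implement it.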
- OpenAI implementation (src/llms/openai.py):
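from openai import OpenAI

from .base import BaseLLM


class OpenAIGPT(BaseLLM):
    def __init__(self, model_name: str):
        # openai>=1.0 uses a client object (openai.ChatCompletion was removed);
        # the API key is read from the OPENAI_API_KEY environment variable
        self.client = OpenAI()
        self.model_name = model_name

    def generate(self, prompt: str) -> str:
        response = self.client.chat.completions.create(
            model=self.model_name,
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content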
- Bedrock implementation (src/llms/bedrock.py):
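import boto3

from .base import BaseLLM


class BedrockLLM(BaseLLM):
    def __init__(self, model_id: str):
        # Region comes from the standard boto3 configuration (e.g. AWS_DEFAULT_REGION);
        # credentials come from the ECS task IAM role
        self.client = boto3.client("bedrock-runtime")
        self.model_id = model_id

    def generate(self, prompt: str) -> str:
        # The Converse API uses one request format for all Bedrock chat models
        # (Claude 3, Jamba, ...), unlike invoke_model, whose body differs per provider
        response = self.client.converse(
            modelId=self.model_id,
            messages=[{"role": "user", "content": [{"text": prompt}]}],
            inferenceConfig={"maxTokens": 1024},
        )
        return response["output"]["message"]["content"][0]["text"]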
- Vector database example (src/vector_db/faiss.py):
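import faiss
import numpy as np

from .base import BaseVectorDB


class FAISSDB(BaseVectorDB):
    def __init__(self):
        # 768 is the assumed embedding dimension; it must match the embedding model
        self.index = faiss.IndexFlatL2(768)
        # Load the prebuilt index here

    def search(self, query: str) -> str:
        # Placeholder: embed the query, search the index, map the hit IDs back to text
        return "FAISS search results"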
- Updated requirements.txt:
fastapi==0.109.2
uvicorn==0.27.1
openai>=1.0.0
boto3>=1.35.0
faiss-cpu>=1.7.4
Deployment steps:
- Build and push the Docker image:
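export AWS_ACCOUNT_ID=123456789012
export AWS_REGION=us-west-2
export REPOSITORY_NAME=llm-api
export TAG=latest
export ECR_REGISTRY=$AWS_ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com
# Build the image (run from app/, where the Dockerfile is)
docker build --platform linux/amd64 -t $REPOSITORY_NAME .
# Reuse the ECR repository if it already exists, otherwise create it
aws ecr describe-repositories --repository-names $REPOSITORY_NAME --region $AWS_REGION >/dev/null 2>&1 \
  || aws ecr create-repository --repository-name $REPOSITORY_NAME --region $AWS_REGION
# Log in to ECR, then tag and push the image
aws ecr get-login-password --region $AWS_REGION | docker login --username AWS --password-stdin $ECR_REGISTRY
docker tag $REPOSITORY_NAME $ECR_REGISTRY/$REPOSITORY_NAME:$TAG
docker push $ECR_REGISTRY/$REPOSITORY_NAME:$TAG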
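The question asks for the ECR and ECS steps to be driven from Python and for an existing registration to be reused. Below is a minimal boto3 sketch of a hypothetical deploy.py (not part of the original project): it reuses the ECR repository when describe_repositories finds it, and redeploys the ECS service when it is already ACTIVE instead of creating a new one. The environment variable names, the task definition family name and the Fargate networking settings are all assumptions.

```python
"""deploy.py: hypothetical Python counterpart of deploy.sh (sketch)."""
import base64
import os
import subprocess
from typing import List


def ensure_repository(ecr, name: str) -> str:
    """Return the repository URI, creating the repository only if it is missing."""
    try:
        repos = ecr.describe_repositories(repositoryNames=[name])["repositories"]
        return repos[0]["repositoryUri"]  # already registered: reuse it
    except ecr.exceptions.RepositoryNotFoundException:
        return ecr.create_repository(repositoryName=name)["repository"]["repositoryUri"]


def build_and_push(ecr, repo_uri: str, tag: str = "latest", context: str = ".") -> str:
    """Build for linux/amd64, log in with a temporary ECR token, and push."""
    image = f"{repo_uri}:{tag}"
    subprocess.run(["docker", "build", "--platform", "linux/amd64", "-t", image, context], check=True)
    auth = ecr.get_authorization_token()["authorizationData"][0]
    # The token decodes to "AWS:<password>"
    user, password = base64.b64decode(auth["authorizationToken"]).decode().split(":", 1)
    subprocess.run(
        ["docker", "login", "--username", user, "--password-stdin", auth["proxyEndpoint"]],
        input=password.encode(), check=True,
    )
    subprocess.run(["docker", "push", image], check=True)
    return image


def deploy_service(ecs, cluster: str, service: str, task_definition: str,
                   subnets: List[str], security_groups: List[str]) -> str:
    """Redeploy an existing ACTIVE service; otherwise create it on Fargate."""
    found = ecs.describe_services(cluster=cluster, services=[service])["services"]
    if found and found[0]["status"] == "ACTIVE":
        # Existing service: force a new deployment so tasks pull the new image
        ecs.update_service(cluster=cluster, service=service,
                           taskDefinition=task_definition, forceNewDeployment=True)
        return "updated"
    ecs.create_service(
        cluster=cluster, serviceName=service, taskDefinition=task_definition,
        desiredCount=1, launchType="FARGATE",
        networkConfiguration={"awsvpcConfiguration": {
            "subnets": subnets, "securityGroups": security_groups,
            "assignPublicIp": "ENABLED"}},
    )
    return "created"


def main() -> None:
    import boto3  # imported here so the helpers above can be tested with stub clients

    region = os.environ["AWS_REGION"]
    ecr = boto3.client("ecr", region_name=region)
    ecs = boto3.client("ecs", region_name=region)
    repo_uri = ensure_repository(ecr, os.environ.get("REPOSITORY_NAME", "llm-api"))
    build_and_push(ecr, repo_uri)
    # Hypothetical settings: cluster, service, task definition family, VPC networking
    result = deploy_service(
        ecs,
        os.environ["ECS_CLUSTER"],
        os.environ["ECS_SERVICE"],
        os.environ.get("TASK_DEFINITION", "llm-api"),
        os.environ["SUBNET_IDS"].split(","),
        os.environ["SECURITY_GROUP_IDS"].split(","),
    )
    print(f"ECS service {result}")
```

Calling main() (for example from an `if __name__ == "__main__":` guard) with those variables set runs the whole flow. The sketch assumes the task definition family already exists and points at the `:latest` image; attaching the service to the ALB used in the curl test would also need create_service's loadBalancers argument, which is not shown.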
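- The ECS task role (the task definition's taskRoleArn) needs the following permissions. The separate task execution role needs ECR pull and CloudWatch Logs access, for example through the AmazonECSTaskExecutionRolePolicy managed policy.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["bedrock:InvokeModel", "bedrock:InvokeModelWithResponseStream"],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": ["kendra:Retrieve", "kendra:Query"],
      "Resource": "*"
    }
  ]
}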
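src/vector_db/kendra.py appears in the project tree but is not shown. A minimal sketch based on the Kendra Retrieve API (which requires the kendra:Retrieve permission) might look like this. Reading the index ID from a KENDRA_INDEX_ID environment variable is an assumption, and the client is injectable so the class can be exercised without AWS; in the project the class would subclass BaseVectorDB.

```python
import os
from typing import Optional


class KendraDB:  # in the project: class KendraDB(BaseVectorDB)
    """Retrieves passages from an Amazon Kendra index to use as LLM context."""

    def __init__(self, index_id: Optional[str] = None, client=None, top_k: int = 5):
        # Assumption: the index ID is supplied via KENDRA_INDEX_ID, so the factory
        # can still construct the class with no arguments
        self.index_id = index_id or os.environ["KENDRA_INDEX_ID"]
        if client is None:
            import boto3  # lazy import so a stub client can be injected in tests
            client = boto3.client("kendra")
        self.client = client
        self.top_k = top_k

    def search(self, query: str) -> str:
        # Retrieve returns relevant passages in ResultItems[*]["Content"]
        response = self.client.retrieve(
            IndexId=self.index_id, QueryText=query, PageSize=self.top_k
        )
        passages = [item["Content"] for item in response.get("ResultItems", [])]
        return "\n\n".join(passages)
```

Retrieve returns longer passages than Query's excerpts, which generally suits prompt context better; that is the reason for preferring it here.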
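Key points of this implementation:
- The Factory pattern maps a model name or vector DB type to an implementation class, so a new backend is added by registering it in the factory without changing the endpoint
- AWS Bedrock is called through boto3 (the bedrock-runtime client)
- A small REST API: POST /generate, GET /models and GET /
- Containerized deployment: the image is stored in ECR and run on ECS
- Basic error handling: an unknown model or vector DB name returns 400, and any other failure returns 500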
Testing the API:
curl -X POST http://<ALB_DNS>/generate \
-H "Content-Type: application/json" \
-d '{
"prompt": "Explain quantum computing",
"llm_name": "claude-3-opus",
"vector_search_type": "kendra"
}'
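The same request can be sent from Python with only the standard library. The function names below are hypothetical, and the base URL is a placeholder for the ALB DNS name, as in the curl command.

```python
import json
import urllib.request


def build_generate_request(base_url: str, prompt: str, llm_name: str,
                           vector_search_type: str) -> urllib.request.Request:
    """Build a POST /generate request whose body matches InferenceRequest."""
    body = json.dumps({
        "prompt": prompt,
        "llm_name": llm_name,
        "vector_search_type": vector_search_type,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(base_url: str, prompt: str, llm_name: str, vector_search_type: str) -> str:
    req = build_generate_request(base_url, prompt, llm_name, vector_search_type)
    # urlopen raises urllib.error.HTTPError for the API's 400 and 500 responses
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read())["response"]
```

For example, `generate("http://<ALB_DNS>", "Explain quantum computing", "claude-3-opus", "kendra")` sends the same payload as the curl command once the placeholder is replaced.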