OpenAI API: Core Concepts

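Table of contents

Text generation and prompting
Images and vision
Audio and speech
Structured Outputs
Conversation state
Streaming API responses
File inputs
Reasoning models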

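Text generation and prompting

Learn how to prompt a model to generate text.

https://platform.openai.com/docs/guides/text

With the OpenAI API, you can use a large language model to generate text from a prompt, much as you would in ChatGPT. Models can generate almost any kind of text response, such as code, mathematical equations, structured JSON data, or human-like prose.
Here is a simple example using the Chat Completions API.
Generate text from a simple prompt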

import OpenAI from "openai";
const client = new OpenAI();

const completion = await client.chat.completions.create({
    model: "gpt-4o",
    messages: [
        {
            role: "user",
            content: "Write a one-sentence bedtime story about a unicorn.",
        },
    ],
});

console.log(completion.choices[0].message.content);

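The content generated by the model is an array in the choices property of the response. In this simple example there is only one output, which looks like this: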

[
    {
        "index": 0,
        "message": {
            "role": "assistant",
            "content": "Under the soft glow of the moon, Luna the unicorn danced through fields of twinkling stardust, leaving trails of dreams for every child asleep.",
            "refusal": null
        },
        "logprobs": null,
        "finish_reason": "stop"
    }
]

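In addition to plain text, you can also have the model return structured data in JSON format. This feature is called Structured Outputs.

Message roles and instruction following

You can use message roles to give the model instructions (prompts) with different levels of authority.

Generate text with messages using different roles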

import OpenAI from "openai";
const client = new OpenAI();

const completion = await client.chat.completions.create({
    model: "gpt-4o",
    messages: [
        {
            role: "developer",
            content: "Talk like a pirate."
        },
        {
            role: "user",
            content: "Are semicolons optional in JavaScript?",
        },
    ],
});

console.log(completion.choices[0].message);

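The OpenAI Model Spec describes how our models give different levels of priority to messages with different roles.

`developer`	`user`	`assistant`
`developer` messages are instructions provided by the application developer, and are weighted ahead of `user` messages.	`user` messages are instructions provided by an end user, and are weighted behind `developer` messages.	Messages generated by the model have the `assistant` role.

A multi-turn conversation may contain several of these message types, along with other content types provided by you and by the server. Learn more about managing conversation state.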

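Choosing a model

A key choice when generating content through the API is which model to use: the model parameter in the code samples above. A full list of available models is on the models page.

Which model should I choose?

Here are a few factors to consider when choosing a model for text generation.

Reasoning models generate an internal chain of thought to analyze the input prompt, and excel at understanding complex tasks and multi-step planning. They are generally slower and more expensive to use than GPT models.
GPT models are fast, cost-efficient, and highly intelligent, but benefit from more explicit instructions on how to accomplish a task.
Large and small (mini) models offer trade-offs between speed, cost, and intelligence. Large models are more effective at understanding prompts and solving problems across domains, while small models are generally faster and cheaper. Small models like GPT-4o mini can also be trained to excel at a specific task through fine-tuning and distillation of results from larger models.

When in doubt, gpt-4o offers a solid combination of intelligence, speed, and cost effectiveness.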

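Prompt engineering

Creating effective instructions for a model to generate content is a process known as prompt engineering. Because the content a model generates is non-deterministic, building a prompt that produces the right content is a combination of art and science. The prompt engineering guide covers the topic more fully, but here are some general guidelines:

Be detailed in your instructions to the model, to eliminate ambiguity in how you want it to respond.
Give the model examples of the kind of input you expect and the kind of output you want for that input. This technique is called few-shot learning.
When using a reasoning model, describe the task in terms of goals and desired outcomes, rather than specific step-by-step instructions for how to accomplish it.
Create evaluations (evals) for your prompts, using test data that resembles what you expect to see in production. Because results vary inherently between models, using evals to see how your prompts perform is the best way to ensure they work as expected.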
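
As a rough illustration of the few-shot guideline above, the sketch below assembles a messages list in which each example becomes a user/assistant pair placed ahead of the real input. The helper name build_few_shot_messages and the sample strings are assumptions made for illustration; they are not part of the OpenAI SDK.

```python
def build_few_shot_messages(instructions, examples, user_input):
    """Build a Chat Completions message list with few-shot examples.

    `examples` is a list of (input, expected_output) pairs. Each pair is
    replayed as a user turn followed by an assistant turn.
    """
    messages = [{"role": "developer", "content": instructions}]
    for example_input, example_output in examples:
        messages.append({"role": "user", "content": example_input})
        messages.append({"role": "assistant", "content": example_output})
    # The real input always goes last, so the model answers it.
    messages.append({"role": "user", "content": user_input})
    return messages


# Hypothetical sentiment-labelling prompt.
messages = build_few_shot_messages(
    "Label the sentiment of each review as positive or negative.",
    [("I love this keyboard.", "positive"), ("It broke in a week.", "negative")],
    "Shipping was slow but the product is great.",
)
for message in messages:
    print(message["role"], "->", message["content"])
```

The resulting list can be passed as the messages argument of client.chat.completions.create, as in the examples earlier in this section.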

Iterating on the prompt is often all you need to get great results from a model, but you can also explore fine-tuning.

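Images and vision

Learn how to use vision capabilities to understand images.

https://platform.openai.com/docs/guides/images

Vision is the ability to use images as input prompts to a model and to generate responses based on the data inside those images. To find out which models are vision-capable, see the models page. To generate images as output, see the specialized image generation models.
You can provide images as input to generation requests either by giving the fully qualified URL of an image file, or by providing the image as a Base64-encoded data URL.

Passing a URL

Analyze the content of an image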

from openai import OpenAI
client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image?"},
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
                },
            },
        ],
    }],
)

print(response.choices[0].message.content)

Passing a Base64-encoded image

import base64
from openai import OpenAI

client = OpenAI()

# Function to encode the image
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

# Path to your image
image_path = "path_to_your_image.jpg"

# Getting the Base64 string
base64_image = encode_image(image_path)

completion = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                { "type": "text", "text": "what's in this image?" },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{base64_image}",
                    },
                },
            ],
        }
    ],
)

print(completion.choices[0].message.content)

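Image input requirements

Input images must meet the following requirements to be used in the API.

File types	Size limits	Other requirements
PNG (.png); JPEG (.jpeg and .jpg); WEBP (.webp); non-animated GIF (.gif)	Up to 20MB per image; low resolution: 512px x 512px; high resolution: 768px (short side) x 2000px (long side)	No watermarks or logos; no text; no NSFW content; clear enough for a human to understand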
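
The file type and size limits in the table can be checked before an image is uploaded. The following is a minimal sketch under two assumptions: the function name check_image_input is made up for illustration, and "20MB" is read as 20 * 1024 * 1024 bytes. It does not check resolution or image content.

```python
from pathlib import Path

# Extensions listed in the requirements table above.
ALLOWED_EXTENSIONS = {".png", ".jpeg", ".jpg", ".webp", ".gif"}
# Assumption: "20MB" is interpreted as 20 MiB.
MAX_IMAGE_BYTES = 20 * 1024 * 1024


def check_image_input(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    extension = Path(filename).suffix.lower()
    if extension not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported file type: {extension or '(none)'}")
    if size_bytes > MAX_IMAGE_BYTES:
        problems.append(f"file is {size_bytes} bytes, above the 20MB limit")
    return problems


print(check_image_input("boardwalk.JPG", 3_500_000))
print(check_image_input("scan.tiff", 3_500_000))
```

Note that a .gif extension alone does not show whether the GIF is animated, so that requirement still needs a separate check.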

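Specifying the image input detail level

The detail parameter tells the model what level of detail to use when processing and understanding the image (low, high, or auto to let the model decide). If you omit the parameter, the model uses auto. Put it inside the image_url object, after the url, like this: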

"image_url": {
    "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
    "detail": "high",
},

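You can save tokens and speed up responses by using "detail": "low". This lets the model process the image with a budget of 85 tokens. The model receives a low-resolution 512px x 512px version of the image. This is fine if your use case does not require the model to see high-resolution detail (for example, if you are asking about the dominant shape or color in the image).

Alternatively, give the model more detail to work with by using "detail": "high". This lets the model see the low-resolution image (using 85 tokens) and then create detailed crops using 170 tokens for each 512px x 512px tile.

Note that the token budgets above do not currently apply to the GPT-4o mini model, although its image processing cost is comparable to GPT-4o. For the most precise and up-to-date estimates for image processing, use the image pricing calculator.

Providing multiple image inputs

The Chat Completions API can take in and process multiple image inputs, either Base64-encoded or as image URLs. The model processes each image and uses information from all of them to answer the prompt.

Multiple image inputs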

from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What are in these images? Is there any difference between them?",
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
                    },
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
                    },
                },
            ],
        }
    ],
    max_tokens=300,
)
print(response.choices[0])

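Here, the model is shown two copies of the same image. It can answer questions about both images or about each image independently.

Limitations

While models with vision capabilities are powerful and can be used in many situations, it is important to understand their limitations. Here are some known limitations:

Medical images: The model is not suitable for interpreting specialized medical images such as CT scans and should not be used for medical advice.
Non-English: The model may not perform optimally when handling images with text in non-Latin alphabets, such as Japanese or Korean.
Small text: Enlarge text within the image to improve readability, but avoid cropping important details.
Rotation: The model may misinterpret rotated or upside-down text and images.
Visual elements: The model may struggle to understand graphs or text where colors or styles (such as solid, dashed, or dotted lines) vary.
Spatial reasoning: The model struggles with tasks requiring precise spatial localization, such as identifying chess positions.
Accuracy: The model may generate incorrect descriptions or captions in certain scenarios.
Image shape: The model struggles with panoramic and fisheye images.
Metadata and resizing: The model does not process original file names or metadata, and images are resized before analysis, affecting their original dimensions.
Counting: The model may give approximate counts for objects in images.
CAPTCHAs: For safety reasons, our system blocks the submission of CAPTCHAs.

Calculating costs

Image inputs are metered and charged in tokens, just as text inputs are. The token cost of an image is determined by two factors: size and detail.

Any image with "detail": "low" costs 85 tokens. To calculate the cost of an image with "detail": "high", we do the following:

Scale to fit within a 2048px x 2048px square, maintaining the original aspect ratio
Scale so that the image's shortest side is 768px long
Count the number of 512px squares in the image; each square costs 170 tokens
Add 85 tokens to the total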

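Cost calculation examples

A 1024 x 1024 square image in "detail": "high" mode costs 765 tokens
- 1024 is less than 2048, so there is no initial resize.
- The shortest side is 1024, so we scale the image down to 768 x 768.
- 4 512px square tiles are needed to represent the image, so the final token cost is 170 * 4 + 85 = 765.
A 2048 x 4096 image in "detail": "high" mode costs 1105 tokens
- We scale the image down to 1024 x 2048 to fit within the 2048 square.
- The shortest side is 1024, so we further scale it down to 768 x 1536.
- 6 512px tiles are needed, so the final token cost is 170 * 6 + 85 = 1105.
A 4096 x 8192 image in "detail": "low" mode costs 85 tokens
- Low-detail images have a fixed cost regardless of input size.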
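
The calculation steps and worked examples above can be written as a short function. This is a sketch rather than an official calculator: the name image_token_cost is made up, any detail value other than "low" is treated as "high", and images smaller than the targets are assumed to be left unscaled, which the text does not state.

```python
import math


def image_token_cost(width: int, height: int, detail: str = "high") -> int:
    """Estimate the token cost of one image from its pixel size and detail level."""
    if detail == "low":
        return 85  # flat cost, whatever the input size

    # Step 1: shrink to fit inside a 2048 x 2048 square, keeping the aspect ratio.
    fit = min(1.0, 2048 / max(width, height))
    scaled_w, scaled_h = width * fit, height * fit

    # Step 2: shrink so that the shortest side is 768px.
    # Assumption: smaller images are left as they are rather than enlarged.
    shrink = min(1.0, 768 / min(scaled_w, scaled_h))
    scaled_w, scaled_h = scaled_w * shrink, scaled_h * shrink

    # Step 3: count the 512px tiles needed to cover the image (170 tokens each).
    tiles = math.ceil(scaled_w / 512) * math.ceil(scaled_h / 512)

    # Step 4: add the 85-token base cost.
    return 170 * tiles + 85


print(image_token_cost(1024, 1024, "high"))  # 765
print(image_token_cost(2048, 4096, "high"))  # 1105
print(image_token_cost(4096, 8192, "low"))   # 85
```

The three calls reproduce the three examples listed above.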

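We process images at the token level, so each image we process counts toward your tokens per minute (TPM) limit. See the "Calculating costs" section for details on the formula used to determine the token count per image.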

Audio and speech

https://platform.openai.com/docs/guides/audio

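Explore audio and speech features in the OpenAI API.

The OpenAI API provides a range of audio capabilities. If you know what you want to build, find your use case below to get started. If you are not sure where to start, read this page as an overview.

Build with audio

Build voice agents
Build interactive voice-driven applications.
Transcribe audio
Convert speech to text quickly and accurately.
Speak text
Turn text into natural-sounding speech in real time.

A tour of audio use cases

LLMs can process audio by using sound as input, creating sound as output, or both. OpenAI has several API endpoints that help you build audio applications or voice agents.

Voice agents

Voice agents understand audio to handle tasks and respond in natural language. There are two main ways to build voice agents: with speech-to-speech models and the Realtime API, or by chaining together a speech-to-text model, a text language model to process the request, and a text-to-speech model to respond. Speech-to-speech has lower latency and feels more natural, but chaining is a reliable way to extend a text-based agent into a voice agent. If you are already using the Agents SDK, you can extend your existing agents with voice capabilities using the chained approach.

Streaming audio

Process audio in real time to build voice agents and other low-latency applications, including transcription use cases. You can stream audio in and out of a model with the Realtime API. Our advanced speech models provide automatic speech recognition for improved accuracy, low-latency interactions, and multilingual support.

Text to speech

To turn text into speech, use the Audio API audio/speech endpoint. Models compatible with this endpoint are gpt-4o-mini-tts, tts-1, and tts-1-hd. With gpt-4o-mini-tts, you can ask the model to speak in a certain way or with a certain tone of voice.

Speech to text

For speech to text, use the Audio API audio/transcriptions endpoint. Models compatible with this endpoint are gpt-4o-transcribe, gpt-4o-mini-transcribe, and whisper-1. With streaming, you can continuously pass in audio and get a continuous stream of text back.

Choosing the right API

There are multiple APIs for transcribing or generating audio:

API	Supported modalities	Streaming support
Realtime API	Audio and text inputs and outputs	Audio streaming in and out
Chat Completions API	Audio and text inputs and outputs	Audio streaming out
Transcription API	Audio inputs	Audio streaming out
Speech API	Text inputs and audio outputs	Audio streaming out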

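General use APIs vs. specialized APIs

The main distinction is between general use APIs and specialized APIs. With the Realtime and Chat Completions APIs, you can use the native audio understanding and generation capabilities of our latest models and combine them with other features such as function calling. These APIs can be used for a wide range of use cases, and you can select the model you want to use.

On the other hand, the Transcription, Translation, and Speech APIs are specialized to work with specific models and are meant for one purpose only.

Talking with a model vs. controlling the script

Another way to select the right API is to ask yourself how much control you need. To design conversational interactions, where the model thinks and responds in speech, use the Realtime or Chat Completions API, depending on whether you need low latency.

You won't know exactly what the model will say ahead of time, because it generates audio responses directly, but the conversation will feel natural.

For more control and predictability, you can use the speech-to-text / LLM / text-to-speech pattern, so you know exactly what the model will say and can control the response. Note that this approach adds latency.

This is what the Audio APIs are for: pair an LLM with the audio/transcriptions and audio/speech endpoints to take spoken user input, process it and generate a text response, and then convert that to speech the user can hear.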
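
The chained pattern described above amounts to three stages run in sequence. The sketch below shows only that control flow: run_chained_turn and the three fake_* stand-ins are hypothetical names, and in a real application each stage would be replaced by a call to the corresponding endpoint.

```python
from typing import Callable, Tuple


def run_chained_turn(
    audio_in: bytes,
    transcribe: Callable[[bytes], str],
    generate_reply: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> Tuple[str, str, bytes]:
    """Run one speech-to-text -> LLM -> text-to-speech turn."""
    user_text = transcribe(audio_in)        # stage 1: e.g. audio/transcriptions
    reply_text = generate_reply(user_text)  # stage 2: e.g. Chat Completions
    reply_audio = synthesize(reply_text)    # stage 3: e.g. audio/speech
    return user_text, reply_text, reply_audio


# Stand-in stages so the sketch runs without network access.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_reply(text: str) -> str:
    return f"You said: {text}"


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


heard, reply, audio_out = run_chained_turn(
    b"hello", fake_transcribe, fake_reply, fake_synthesize
)
print(reply)
```

Because the reply exists as text before it is spoken, it can be logged, filtered, or edited between stages 2 and 3, which is the extra control this pattern offers at the price of added latency.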

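Adding audio to your existing application

Models such as GPT-4o or GPT-4o mini are natively multimodal, meaning they can understand and generate multiple modalities as input and output.

If you already have a text-based LLM application that uses the Chat Completions endpoint, you may want to add audio capabilities. For example, if your chat application supports text input, you can add audio input and output: include audio in the modalities array and use an audio model, such as gpt-4o-audio-preview.

Audio is not yet supported in the Responses API.

Audio output from the model

Create a human-like audio response to a prompt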

import base64
from openai import OpenAI

client = OpenAI()

completion = client.chat.completions.create(
    model="gpt-4o-audio-preview",
    modalities=["text", "audio"],
    audio={"voice": "alloy", "format": "wav"},
    messages=[
        {
            "role": "user",
            "content": "Is a golden retriever a good family dog?"
        }
    ]
)

print(completion.choices[0])

wav_bytes = base64.b64decode(completion.choices[0].message.audio.data)
with open("dog.wav", "wb") as f:
    f.write(wav_bytes)

Audio input to the model

Use audio input to prompt the model

import base64
import requests
from openai import OpenAI

client = OpenAI()

# Fetch the audio file and convert it to a base64 encoded string
url = "https://cdn.openai.com/API/docs/audio/alloy.wav"
response = requests.get(url)
response.raise_for_status()
wav_data = response.content
encoded_string = base64.b64encode(wav_data).decode('utf-8')

completion = client.chat.completions.create(
    model="gpt-4o-audio-preview",
    modalities=["text", "audio"],
    audio={"voice": "alloy", "format": "wav"},
    messages=[
        {
            "role": "user",
            "content": [
                { 
                    "type": "text",
                    "text": "What is in this recording?"
                },
                {
                    "type": "input_audio",
                    "input_audio": {
                        "data": encoded_string,
                        "format": "wav"
                    }
                }
            ]
        },
    ]
)

print(completion.choices[0].message)

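Structured Outputs

Ensure responses adhere to a JSON schema.

https://platform.openai.com/docs/guides/structured-outputs

Try it out

Try it out in the Playground, or generate a ready-to-use schema definition to experiment with Structured Outputs.

Introduction

JSON is one of the most widely used formats in the world for applications to exchange data.

Structured Outputs is a feature that ensures the model always generates responses that adhere to your supplied JSON Schema, so you don't need to worry about the model omitting a required key or producing an invalid enum value.

Some benefits of Structured Outputs include:

Reliable type-safety: No need to validate or retry incorrectly formatted responses
Explicit refusals: Safety-based model refusals are now programmatically detectable
Simpler prompting: No need for strongly worded prompts to achieve consistent formatting

In addition to supporting JSON Schema in the REST API, the OpenAI SDKs for Python and JavaScript also make it easy to define object schemas using Pydantic and Zod respectively. Below, you can see how to extract information from unstructured text so that it conforms to a schema defined in code.

Getting a structured response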

from pydantic import BaseModel
from openai import OpenAI

client = OpenAI()

class CalendarEvent(BaseModel):
    name: str
    date: str
    participants: list[str]

completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "Extract the event information."},
        {"role": "user", "content": "Alice and Bob are going to a science fair on Friday."},
    ],
    response_format=CalendarEvent,
)

event = completion.choices[0].message.parsed

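Supported models

Structured Outputs is available in our latest large language models, starting with GPT-4o:

gpt-4.5-preview-2025-02-27 and later
o3-mini-2025-01-31 and later
o1-2024-12-17 and later
gpt-4o-mini-2024-07-18 and later
gpt-4o-2024-08-06 and later

Older models such as gpt-4-turbo and earlier may use JSON mode instead.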

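When to use Structured Outputs via function calling vs. via response_format

Structured Outputs is available in two forms in the OpenAI API:

When using function calling
When using a json_schema response format

Function calling is useful when you are building an application that bridges the model and the functionality of your application.

For example, you can give the model access to functions that query a database in order to build an AI assistant that helps users with their orders, or functions that can interact with the UI.

Conversely, Structured Outputs via response_format is more suitable when you want to indicate a structured schema for the model to use when it responds to the user, rather than when it calls a tool.

For example, if you are building a math tutoring application, you might want the assistant to respond to your user using a specific JSON Schema so that you can generate a UI that displays different parts of the model's output in distinct ways.

Put simply:

If you are connecting the model to tools, functions, data, and so on in your system, then you should use function calling
If you want to structure the model's output when it responds to the user, then you should use a structured response_format

The remainder of this guide focuses on non-function-calling use cases in the Chat Completions API. To learn how to use Structured Outputs with function calling, see the function calling guide.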

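Structured Outputs vs. JSON mode

Structured Outputs is the evolution of JSON mode. While both ensure that valid JSON is produced, only Structured Outputs ensures schema adherence. Both Structured Outputs and JSON mode are supported in the Responses API, Chat Completions API, Assistants API, Fine-tuning API, and Batch API.

We recommend always using Structured Outputs instead of JSON mode when possible.

However, Structured Outputs with response_format: {type: "json_schema", ...} is only supported with the gpt-4o-mini, gpt-4o-mini-2024-07-18, and gpt-4o-2024-08-06 model snapshots and later.

	Structured Outputs	JSON mode
Outputs valid JSON	Yes	Yes
Adheres to schema	Yes (see supported schemas)	No
Compatible models	`gpt-4o-mini`, `gpt-4o-2024-08-06`, and later	`gpt-3.5-turbo`, `gpt-4-*` and `gpt-4o-*` models
Enabling	`response_format: { type: "json_schema", json_schema: {"strict": true, "schema": ...} }`	`response_format: { type: "json_object" }`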
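
To make the last row of the table concrete, the sketch below builds the two response_format values as plain Python dictionaries. The helper names structured_output_format and json_mode_format are assumptions for illustration and are not SDK functions; the dictionary shapes follow the table.

```python
def structured_output_format(name: str, schema: dict) -> dict:
    """response_format value that turns on Structured Outputs for `schema`."""
    return {
        "type": "json_schema",
        "json_schema": {"name": name, "strict": True, "schema": schema},
    }


def json_mode_format() -> dict:
    """response_format value for JSON mode: valid JSON, no schema adherence."""
    return {"type": "json_object"}


# Hypothetical one-field schema used only to show the shape.
answer_schema = {
    "type": "object",
    "properties": {"final_answer": {"type": "string"}},
    "required": ["final_answer"],
    "additionalProperties": False,
}

print(structured_output_format("math_response", answer_schema))
print(json_mode_format())
```

Either dictionary would be passed as the response_format argument of client.chat.completions.create.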

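Examples

Chain of thought

You can ask the model to output an answer in a structured, step-by-step way, to guide the user through the solution.

Structured Outputs for chain-of-thought math tutoring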

from pydantic import BaseModel
from openai import OpenAI

client = OpenAI()

class Step(BaseModel):
    explanation: str
    output: str

class MathReasoning(BaseModel):
    steps: list[Step]
    final_answer: str

completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "You are a helpful math tutor. Guide the user through the solution step by step."},
        {"role": "user", "content": "how can I solve 8x + 7 = -23"}
    ],
    response_format=MathReasoning,
)

math_reasoning = completion.choices[0].message.parsed

Example response

{
  "steps": [
    {
      "explanation": "Start with the equation 8x + 7 = -23.",
      "output": "8x + 7 = -23"
    },
    {
      "explanation": "Subtract 7 from both sides to isolate the term with the variable.",
      "output": "8x = -23 - 7"
    },
    {
      "explanation": "Simplify the right side of the equation.",
      "output": "8x = -30"
    },
    {
      "explanation": "Divide both sides by 8 to solve for x.",
      "output": "x = -30 / 8"
    },
    {
      "explanation": "Simplify the fraction.",
      "output": "x = -15 / 4"
    }
  ],
  "final_answer": "x = -15 / 4"
}

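Structured data extraction

You can define structured fields to extract from unstructured input data, such as research papers.

Extracting data from research papers using Structured Outputs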

from pydantic import BaseModel
from openai import OpenAI

client = OpenAI()

class ResearchPaperExtraction(BaseModel):
    title: str
    authors: list[str]
    abstract: str
    keywords: list[str]

completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "You are an expert at structured data extraction. You will be given unstructured text from a research paper and should convert it into the given structure."},
        {"role": "user", "content": "..."}
    ],
    response_format=ResearchPaperExtraction,
)

research_paper = completion.choices[0].message.parsed

Example response

{
  "title": "Application of Quantum Algorithms in Interstellar Navigation: A New Frontier",
  "authors": [
    "Dr. Stella Voyager",
    "Dr. Nova Star",
    "Dr. Lyra Hunter"
  ],
  "abstract": "This paper investigates the utilization of quantum algorithms to improve interstellar navigation systems. By leveraging quantum superposition and entanglement, our proposed navigation system can calculate optimal travel paths through space-time anomalies more efficiently than classical methods. Experimental simulations suggest a significant reduction in travel time and fuel consumption for interstellar missions.",
  "keywords": [
    "Quantum algorithms",
    "interstellar navigation",
    "space-time anomalies",
    "quantum superposition",
    "quantum entanglement",
    "space travel"
  ]
}

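UI generation

You can generate valid HTML by representing it as recursive data structures with constraints, such as enums.

Generating HTML using Structured Outputs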

from enum import Enum
from typing import List
from pydantic import BaseModel
from openai import OpenAI

client = OpenAI()

class UIType(str, Enum):
    div = "div"
    button = "button"
    header = "header"
    section = "section"
    field = "field"
    form = "form"

class Attribute(BaseModel):
    name: str
    value: str

class UI(BaseModel):
    type: UIType
    label: str
    children: List["UI"] 
    attributes: List[Attribute]

UI.model_rebuild() # This is required to enable recursive types

class Response(BaseModel):
    ui: UI

completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "You are a UI generator AI. Convert the user input into a UI."},
        {"role": "user", "content": "Make a User Profile Form"}
    ],
    response_format=Response,
)

ui = completion.choices[0].message.parsed
print(ui)

Example response

{
    "type": "form",
    "label": "User Profile Form",
    "children": [
        {
            "type": "div",
            "label": "",
            "children": [
                {
                    "type": "field",
                    "label": "First Name",
                    "children": [],
                    "attributes": [
                        {
                            "name": "type",
                            "value": "text"
                        },
                        {
                            "name": "name",
                            "value": "firstName"
                        },
                        {
                            "name": "placeholder",
                            "value": "Enter your first name"
                        }
                    ]
                },
                {
                    "type": "field",
                    "label": "Last Name",
                    "children": [],
                    "attributes": [
                        {
                            "name": "type",
                            "value": "text"
                        },
                        {
                            "name": "name",
                            "value": "lastName"
                        },
                        {
                            "name": "placeholder",
                            "value": "Enter your last name"
                        }
                    ]
                }
            ],
            "attributes": []
        },
        {
            "type": "button",
            "label": "Submit",
            "children": [],
            "attributes": [
                {
                    "name": "type",
                    "value": "submit"
                }
            ]
        }
    ],
    "attributes": [
        {
            "name": "method",
            "value": "post"
        },
        {
            "name": "action",
            "value": "/submit-profile"
        }
    ]
}

Moderation

You can classify inputs into multiple categories, which is a common way of doing moderation.

Moderation using Structured Outputs

from enum import Enum
from typing import Optional
from pydantic import BaseModel
from openai import OpenAI

client = OpenAI()

class Category(str, Enum):
    violence = "violence"
    sexual = "sexual"
    self_harm = "self_harm"

class ContentCompliance(BaseModel):
    is_violating: bool
    category: Optional[Category]
    explanation_if_violating: Optional[str]

completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "Determine if the user input violates specific guidelines and explain if they do."},
        {"role": "user", "content": "How do I prepare for a job interview?"}
    ],
    response_format=ContentCompliance,
)

compliance = completion.choices[0].message.parsed

Example response

{
  "is_violating": false,
  "category": null,
  "explanation_if_violating": null
}

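How to use Structured Outputs with response_format

You can use the new SDK helper to parse the model's output into your desired format, or you can specify the JSON schema directly.

Note: the first request you make with any schema will have additional latency while our API processes the schema, but subsequent requests with the same schema will not.

SDK objects

Step 1: Define your object

First, you must define an object or data structure to represent the JSON Schema that the model should be constrained to follow. See the examples at the top of this guide for reference.

While Structured Outputs supports much of JSON Schema, some features are unavailable for performance or technical reasons. See the "Supported schemas" section for more details.

For example, you can define an object like this: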

from pydantic import BaseModel

class Step(BaseModel):
    explanation: str
    output: str

class MathResponse(BaseModel):
    steps: list[Step]
    final_answer: str

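Tips for your data structure

To maximize the quality of model generations, we recommend the following:

Name keys clearly and intuitively
Create clear titles and descriptions for important keys in your structure
Create and use evals to determine the structure that works best for your use case

Step 2: Supply your object in the API call

You can use the parse method to automatically parse the JSON response into the object you defined.

Under the hood, the SDK takes care of supplying the JSON schema corresponding to your data structure, and then parsing the response as an object.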

completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "You are a helpful math tutor. Guide the user through the solution step by step."},
        {"role": "user", "content": "how can I solve 8x + 7 = -23"}
    ],
    response_format=MathResponse,
)

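Step 3: Handle edge cases

In some cases, the model might not generate a valid response that matches the provided JSON schema.

This can happen in the case of a refusal, if the model refuses to answer for safety reasons, or if, for example, you reach the max tokens limit and the response is incomplete.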

try:
    response = client.chat.completions.create(
        model="gpt-4o-2024-08-06",
        messages=[
            {
                "role": "system",
                "content": "You are a helpful math tutor. Guide the user through the solution step by step.",
            },
            {"role": "user", "content": "how can I solve 8x + 7 = -23"},
        ],
        response_format={
            "type": "json_schema",
            "json_schema": {
                "name": "math_response",
                "strict": True,
                "schema": {
                    "type": "object",
                    "properties": {
                        "steps": {
                            "type": "array",
                            "items": {
                                "type": "object",
                                "properties": {
                                    "explanation": {"type": "string"},
                                    "output": {"type": "string"},
                                },
                                "required": ["explanation", "output"],
                                "additionalProperties": False,
                            },
                        },
                        "final_answer": {"type": "string"},
                    },
                    "required": ["steps", "final_answer"],
                    "additionalProperties": False,
                },
            },
        },
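    )
    choice = response.choices[0]
    if choice.finish_reason == "length":
        # The response was cut off at the token limit, so the JSON is incomplete.
        print("Incomplete response: token limit reached")
    elif choice.message.refusal:
        # The model refused for safety reasons; the schema does not apply.
        print(choice.message.refusal)
    elif choice.finish_reason == "content_filter":
        print("Response was blocked by the content filter")
    else:
        print(choice.message.content)
except Exception as e:
    # Network failures, rejected schemas, and other API errors end up here.
    print(f"Request failed: {e}")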

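Manual schema

Step 1: Define your schema

First, you must design the JSON Schema that the model should be constrained to follow. See the examples at the top of this guide for reference.

While Structured Outputs supports much of JSON Schema, some features are unavailable for performance or technical reasons. See the "Supported schemas" section for more details.

Tips for your JSON Schema

To maximize the quality of model generations, we recommend the following:

Name keys clearly and intuitively
Create clear titles and descriptions for important keys in your structure
Create and use evals to determine the structure that works best for your use case

Step 2: Supply your schema in the API call

To use Structured Outputs, simply specify

response_format: { "type": "json_schema", "json_schema": { "strict": true, "schema": … } }

For example: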

response = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "You are a helpful math tutor. Guide the user through the solution step by step."},
        {"role": "user", "content": "how can I solve 8x + 7 = -23"}
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "math_response",
            "schema": {
                "type": "object",
                "properties": {
                    "steps": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {
                                "explanation": {"type": "string"},
                                "output": {"type": "string"}
                            },
                            "required": ["explanation", "output"],
                            "additionalProperties": False
                        }
                    },
                    "final_answer": {"type": "string"}
                },
                "required": ["steps", "final_answer"],
                "additionalProperties": False
            },
            "strict": True
        }
    }
)

print(response.choices[0].message.content)

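Note: the first request you make with any schema will have additional latency while our API processes the schema, but subsequent requests with the same schema will not.

Step 3: Handle edge cases

In some cases, the model might not generate a valid response that matches the provided JSON schema.

This can happen in the case of a refusal, if the model refuses to answer for safety reasons, or if, for example, you reach the max tokens limit and the response is incomplete.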

try:
    response = client.chat.completions.create(
        model="gpt-4o-2024-08-06",
        messages=[
            {
                "role": "system",
                "content": "You are a helpful math tutor. Guide the user through the solution step by step.",
            },
            {"role": "user", "content": "how can I solve 8x + 7 = -23"},
        ],
        response_format={
            "type": "json_schema",
            "json_schema": {
                "name": "math_response",
                "strict": True,
                "schema": {
                    "type": "object",
                    "properties": {
                        "steps": {
                            "type": "array",
                            "items": {
                                "type": "object",
                                "properties": {
                                    "explanation": {"type": "string"},
                                    "output": {"type": "string"},
                                },
                                "required": ["explanation", "output"],
                                "additionalProperties": False,
                            },
                        },
                        "final_answer": {"type": "string"},
                    },
                    "required": ["steps", "final_answer"],
                    "additionalProperties": False,
                },
            },
        },
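    )
    choice = response.choices[0]
    if choice.finish_reason == "length":
        # The response was cut off at the token limit, so the JSON is incomplete.
        print("Incomplete response: token limit reached")
    elif choice.message.refusal:
        # The model refused for safety reasons; the schema does not apply.
        print(choice.message.refusal)
    elif choice.finish_reason == "content_filter":
        print("Response was blocked by the content filter")
    else:
        print(choice.message.content)
except Exception as e:
    # Network failures, rejected schemas, and other API errors end up here.
    print(f"Request failed: {e}")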

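Step 4: Use the generated structured data in a type-safe way

When using Structured Outputs, you will typically have a type or class in your programming language's type system that represents the JSON Schema as an object.

Once you have confirmed that you received JSON matching the schema you requested, you can safely parse it into the corresponding type.

For example: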

from pydantic import BaseModel, ValidationError
from typing import List

# Define types that match the JSON Schema using pydantic models
class Step(BaseModel):
    explanation: str
    output: str

class Solution(BaseModel):
    steps: List[Step]
    final_answer: str

...

try:
    # Parse and validate the response content
    solution = Solution.model_validate_json(response.choices[0].message.content)
    print(solution)
except ValidationError as e:
    # Handle validation errors
    print(e.json())

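Refusals with Structured Outputs

When using Structured Outputs with user-generated input, OpenAI models may occasionally refuse to fulfill the request for safety reasons. Since a refusal does not necessarily follow the schema you supplied in response_format, the API response includes a new field called refusal to indicate that the model refused to fulfill the request.

When the refusal property appears in your output object, you might present the refusal in your UI, or include conditional logic in the code that consumes the response to handle the refused request.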

class Step(BaseModel):
    explanation: str
    output: str

class MathReasoning(BaseModel):
    steps: list[Step]
    final_answer: str

completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "You are a helpful math tutor. Guide the user through the solution step by step."},
        {"role": "user", "content": "how can I solve 8x + 7 = -23"}
    ],
    response_format=MathReasoning,
)

math_reasoning = completion.choices[0].message

# If the model refuses to respond, you will get a refusal message
if math_reasoning.refusal:
    print(math_reasoning.refusal)
else:
    print(math_reasoning.parsed)

The API response for a refusal will look something like this:

{
  "id": "chatcmpl-9nYAG9LPNonX8DAyrkwYfemr3C8HC",
  "object": "chat.completion",
  "created": 1721596428,
  "model": "gpt-4o-2024-08-06",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "refusal": "I'm sorry, I cannot assist with that request."
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 81,
    "completion_tokens": 11,
    "total_tokens": 92,
    "completion_tokens_details": {
      "reasoning_tokens": 0,
      "accepted_prediction_tokens": 0,
      "rejected_prediction_tokens": 0
    }
  },
  "system_fingerprint": "fp_3407719c7f"
}
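
When working with the raw JSON rather than the SDK objects, the same refusal check can be done on the decoded response. The sketch below is illustrative only: read_structured_reply is a hypothetical helper, and it assumes the response has already been decoded into a dictionary shaped like the example above.

```python
import json


def read_structured_reply(response: dict):
    """Classify a decoded Chat Completions response.

    Returns ("refusal", text), ("incomplete", partial_content),
    or ("ok", parsed_json).
    """
    choice = response["choices"][0]
    message = choice["message"]
    if message.get("refusal"):
        # A refusal does not follow the schema, so do not try to parse it.
        return "refusal", message["refusal"]
    if choice.get("finish_reason") == "length":
        # Cut off at the token limit: the JSON is probably truncated.
        return "incomplete", message.get("content")
    return "ok", json.loads(message["content"])


refused = {
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "refusal": "I'm sorry, I cannot assist with that request.",
            },
            "finish_reason": "stop",
        }
    ]
}
print(read_structured_reply(refused))
```

Checking refusal before parsing matters because the refusal example above carries no content field at all.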

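Tips and best practices

Handling user-generated input

If your application uses user-generated input, make sure your prompt includes instructions on how to handle situations where the input cannot result in a valid response.

The model will always try to adhere to the provided schema, which can result in hallucinations if the input is completely unrelated to the schema.

You could include language in your prompt to specify that the model should return empty parameters, or a specific sentence, if it detects that the input is incompatible with the task.

Handling mistakes

Structured Outputs can still contain mistakes. If you see mistakes, try adjusting your instructions, providing examples in the system instructions, or splitting the task into simpler subtasks. Refer to the prompt engineering guide for more guidance on how to adjust your inputs.

Avoid JSON schema divergence

To prevent your JSON Schema and the corresponding types in your programming language from diverging, we strongly recommend using the native Pydantic/zod SDK support.

If you prefer to specify the JSON schema directly, you could add CI rules that flag when either the JSON schema or the underlying data objects are edited, or add a CI step that auto-generates the JSON Schema from type definitions (or vice versa).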
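
One possible shape for such a CI check is to compare the schema generated from your types with the hand-written one and fail the build on any difference. The sketch below uses only the standard library; schema_diff is a hypothetical helper, and how the two dictionaries are obtained (for example from a type definition and from a checked-in JSON file) is left as an assumption.

```python
def schema_diff(expected, actual, path="$"):
    """Return human-readable differences between two JSON Schema dicts."""
    if isinstance(expected, dict) and isinstance(actual, dict):
        differences = []
        for key in sorted(set(expected) | set(actual)):
            if key not in actual:
                differences.append(f"{path}.{key}: missing")
            elif key not in expected:
                differences.append(f"{path}.{key}: unexpected")
            else:
                differences.extend(
                    schema_diff(expected[key], actual[key], f"{path}.{key}")
                )
        return differences
    # Lists and scalars are compared as a whole, so list order matters.
    if expected != actual:
        return [f"{path}: {expected!r} != {actual!r}"]
    return []


checked_in = {
    "type": "object",
    "properties": {"unit": {"type": "string"}},
    "required": ["unit"],
}
generated = {
    "type": "object",
    "properties": {"unit": {"type": "integer"}},
    "required": ["unit"],
}
print(schema_diff(checked_in, generated))
```

A CI step could exit with a non-zero status whenever the returned list is not empty.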

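Streaming

You can use streaming to process model responses or function call arguments as they are being generated, and parse them as structured data.

That way, you don't have to wait for the entire response to complete before handling it. This is particularly useful if you want to display JSON fields one by one, or handle function call arguments as soon as they are available.

We recommend relying on the SDKs to handle streaming with Structured Outputs.

You can find an example of how to stream function call arguments without the SDK stream helper in the function calling guide.

Here is how you can stream a model response with the stream helper: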

from typing import List
from pydantic import BaseModel
from openai import OpenAI

class EntitiesModel(BaseModel):
    attributes: List[str]
    colors: List[str]
    animals: List[str]

client = OpenAI()

with client.beta.chat.completions.stream(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "Extract entities from the input text"},
        {
            "role": "user",
            "content": "The quick brown fox jumps over the lazy dog with piercing blue eyes",
        },
    ],
    response_format=EntitiesModel,
) as stream:
    for event in stream:
        if event.type == "content.delta":
            if event.parsed is not None:
                # Print the parsed data as JSON
                print("content.delta parsed:", event.parsed)
        elif event.type == "content.done":
            print("content.done")
        elif event.type == "error":
            print("Error in stream:", event.error)

final_completion = stream.get_final_completion()
print("Final completion:", final_completion)

You can also use the stream helper to parse function call arguments:

from pydantic import BaseModel
import openai
from openai import OpenAI

class GetWeather(BaseModel):
    city: str
    country: str

client = OpenAI()

with client.beta.chat.completions.stream(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": "What's the weather like in SF and London?",
        },
    ],
    tools=[
        openai.pydantic_function_tool(GetWeather, name="get_weather"),
    ],
    parallel_tool_calls=True,
) as stream:
    for event in stream:
        if event.type == "tool_calls.function.arguments.delta" or event.type == "tool_calls.function.arguments.done":
            print(event)

print(stream.get_final_completion())

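Supported schemas

Structured Outputs supports a subset of the JSON Schema language.

Supported types

The following types are supported for Structured Outputs:

String
Number
Boolean
Integer
Object
Array
Enum
anyOf

Root objects must not be `anyOf`

Note that the root-level object of a schema must be an object, and must not use anyOf. A pattern that appears in Zod (as one example) is using a discriminated union, which produces an anyOf at the top level. So code such as the following won't work: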

import { z } from 'zod';
import { zodResponseFormat } from 'openai/helpers/zod';

const BaseResponseSchema = z.object({ /* ... */ });
const UnsuccessfulResponseSchema = z.object({ /* ... */ });

const finalSchema = z.discriminatedUnion('status', [
    BaseResponseSchema,
    UnsuccessfulResponseSchema,
]);

// Invalid JSON Schema for Structured Outputs
const json = zodResponseFormat(finalSchema, 'final_schema');

All fields must be `required`

To use Structured Outputs, all fields or function parameters must be specified as required.

{
    "name": "get_weather",
    "description": "Fetches the weather in the given location",
    "strict": true,
    "parameters": {
        "type": "object",
        "properties": {
            "location": {
                "type": "string",
                "description": "The location to get the weather for"
            },
            "unit": {
                "type": "string",
                "description": "The unit to return the temperature in",
                "enum": ["F", "C"]
            }
        },
        "additionalProperties": false,
        "required": ["location", "unit"]
    }
}

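Although all fields must be required (and the model will return a value for each parameter), it is possible to emulate an optional parameter by using a union type with null.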

{
    "name": "get_weather",
    "description": "Fetches the weather in the given location",
    "strict": true,
    "parameters": {
        "type": "object",
        "properties": {
            "location": {
                "type": "string",
                "description": "The location to get the weather for"
            },
            "unit": {
                "type": ["string", "null"],
                "description": "The unit to return the temperature in",
                "enum": ["F", "C"]
            }
        },
        "additionalProperties": false,
        "required": [
            "location", "unit"
        ]
    }
}
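
The conversion from a schema with optional fields to one that meets these rules can be automated. The following is a minimal sketch under stated assumptions: `make_strict` is a hypothetical helper, it only looks at the top-level `properties` of one object, and it only handles properties that declare a plain `type` keyword.

```python
import copy


def make_strict(parameters):
    """Mark every property required; formerly optional ones become nullable."""
    schema = copy.deepcopy(parameters)  # leave the caller's schema untouched
    originally_required = set(schema.get("required", []))
    properties = schema.get("properties", {})
    for name, prop in properties.items():
        if name in originally_required:
            continue
        declared = prop.get("type")
        if declared is None:
            # Properties without a simple "type" are out of scope here.
            continue
        types = declared if isinstance(declared, list) else [declared]
        if "null" not in types:
            prop["type"] = [*types, "null"]
    schema["required"] = list(properties)
    schema["additionalProperties"] = False
    return schema


loose = {
    "type": "object",
    "properties": {
        "location": {"type": "string"},
        "unit": {"type": "string", "enum": ["F", "C"]},
    },
    "required": ["location"],
}
strict = make_strict(loose)
print(strict["required"], strict["properties"]["unit"]["type"])
```

Callers then treat a null value for `unit` as "not provided".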

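Objects have limitations on nesting depth and size

A schema may have up to 100 object properties in total, with up to 5 levels of nesting.

Limitations on total string size

In a schema, the total string length of all property names, definition names, enum values, and const values cannot exceed 15,000 characters.

Limitations on enum size

A schema may have up to 500 enum values across all enum properties.

For a single enum property with string values, the total string length of all enum values cannot exceed 7,500 characters when there are more than 250 enum values.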
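
A quick local check can catch oversized schemas before a request is sent. The sketch below is only an approximation: `measure` and `check_limits` are hypothetical helpers, the way nesting levels are counted (only objects with `properties` add a level) is an assumption, and `$defs` and `$ref` targets are not followed.

```python
MAX_PROPERTIES = 100   # total object properties, per the text above
MAX_DEPTH = 5          # levels of nesting, per the text above
MAX_ENUM_VALUES = 500  # enum values across all enum properties


def measure(schema, depth=1, stats=None):
    """Collect rough size statistics for a schema given as a dict."""
    if stats is None:
        stats = {"properties": 0, "depth": 0, "enum_values": 0}
    if not isinstance(schema, dict):
        return stats
    stats["enum_values"] += len(schema.get("enum", []))
    props = schema.get("properties", {})
    if props:
        stats["properties"] += len(props)
        stats["depth"] = max(stats["depth"], depth)
        for child in props.values():
            measure(child, depth + 1, stats)
    # Assumption: arrays and anyOf branches do not add a nesting level.
    if isinstance(schema.get("items"), dict):
        measure(schema["items"], depth, stats)
    for child in schema.get("anyOf", []):
        measure(child, depth, stats)
    return stats


def check_limits(schema):
    """Return a list of human-readable violations (empty if none found)."""
    stats = measure(schema)
    problems = []
    if stats["properties"] > MAX_PROPERTIES:
        problems.append(f"{stats['properties']} properties > {MAX_PROPERTIES}")
    if stats["depth"] > MAX_DEPTH:
        problems.append(f"nesting depth {stats['depth']} > {MAX_DEPTH}")
    if stats["enum_values"] > MAX_ENUM_VALUES:
        problems.append(f"{stats['enum_values']} enum values > {MAX_ENUM_VALUES}")
    return problems


weather = {
    "type": "object",
    "properties": {
        "location": {"type": "string"},
        "unit": {"type": "string", "enum": ["F", "C"]},
    },
}
print(measure(weather), check_limits(weather))
```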

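`additionalProperties: false` must always be set in objects

additionalProperties controls whether an object may contain additional keys/values that were not defined in the JSON Schema.

Structured Outputs only supports generating the specified keys/values, so developers are required to set additionalProperties: false to opt into Structured Outputs.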

{
    "name": "get_weather",
    "description": "Fetches the weather in the given location",
    "strict": true,
    "schema": {
        "type": "object",
        "properties": {
            "location": {
                "type": "string",
                "description": "The location to get the weather for"
            },
            "unit": {
                "type": "string",
                "description": "The unit to return the temperature in",
                "enum": ["F", "C"]
            }
        },
        "additionalProperties": false,
        "required": [
            "location", "unit"
        ]
    }
}
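
Setting the flag by hand on every nested object is easy to forget. As a minimal sketch, assuming schemas are plain Python dictionaries, the hypothetical helper `forbid_additional_properties` below walks `properties`, `$defs`, `anyOf`, and `items` and sets the flag on each object schema it finds.

```python
import copy


def forbid_additional_properties(schema):
    """Return a copy with additionalProperties: False on every object schema."""

    def visit(node):
        if not isinstance(node, dict):
            return
        declared = node.get("type")
        is_object = declared == "object" or (
            isinstance(declared, list) and "object" in declared
        )
        if is_object:
            node["additionalProperties"] = False
        for child in node.get("properties", {}).values():
            visit(child)
        for child in node.get("$defs", {}).values():
            visit(child)
        for child in node.get("anyOf", []):
            visit(child)
        visit(node.get("items"))

    result = copy.deepcopy(schema)  # do not mutate the caller's schema
    visit(result)
    return result


draft = {
    "type": "object",
    "properties": {
        "steps": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {"output": {"type": "string"}},
            },
        }
    },
}
fixed = forbid_additional_properties(draft)
print(fixed["additionalProperties"])
```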

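Key ordering

When using Structured Outputs, outputs are produced in the same order as the keys appear in the schema.

Some type-specific keywords are not yet supported

Notable keywords that are not supported include:

For strings: minLength, maxLength, pattern, format
For numbers: minimum, maximum, multipleOf
For objects: patternProperties, unevaluatedProperties, propertyNames, minProperties, maxProperties
For arrays: unevaluatedItems, contains, minContains, maxContains, minItems, maxItems, uniqueItems

If you turn on Structured Outputs by supplying strict: true and call the API with an unsupported JSON Schema, you will receive an error.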
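
To find such keywords before the API rejects the request, a schema can be scanned locally. This is a rough sketch: `find_unsupported` is a hypothetical helper, the keyword set is copied from the list above (which the text describes as notable, not exhaustive), and only `properties`, `$defs`, `anyOf`, and `items` are traversed.

```python
UNSUPPORTED_KEYWORDS = {
    # strings
    "minLength", "maxLength", "pattern", "format",
    # numbers
    "minimum", "maximum", "multipleOf",
    # objects
    "patternProperties", "unevaluatedProperties", "propertyNames",
    "minProperties", "maxProperties",
    # arrays
    "unevaluatedItems", "contains", "minContains", "maxContains",
    "minItems", "maxItems", "uniqueItems",
}


def find_unsupported(schema, path="#"):
    """Return 'path: keyword' strings for each unsupported keyword found."""
    found = []
    if not isinstance(schema, dict):
        return found
    for key in sorted(UNSUPPORTED_KEYWORDS.intersection(schema)):
        found.append(f"{path}: {key}")
    for name, child in schema.get("properties", {}).items():
        found += find_unsupported(child, f"{path}/properties/{name}")
    for name, child in schema.get("$defs", {}).items():
        found += find_unsupported(child, f"{path}/$defs/{name}")
    for index, child in enumerate(schema.get("anyOf", [])):
        found += find_unsupported(child, f"{path}/anyOf/{index}")
    if "items" in schema:
        found += find_unsupported(schema["items"], f"{path}/items")
    return found


candidate = {
    "type": "object",
    "properties": {
        "name": {"type": "string", "maxLength": 10},
        "tags": {
            "type": "array",
            "minItems": 1,
            "items": {"type": "string", "pattern": "^a"},
        },
    },
}
for issue in find_unsupported(candidate):
    print(issue)
```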

For `anyOf`, the nested schemas must each be a valid JSON Schema per this subset

Here is an example of a supported anyOf schema:

{
    "type": "object",
    "properties": {
        "item": {
            "anyOf": [
                {
                    "type": "object",
                    "description": "The user object to insert into the database",
                    "properties": {
                        "name": {
                            "type": "string",
                            "description": "The name of the user"
                        },
                        "age": {
                            "type": "number",
                            "description": "The age of the user"
                        }
                    },
                    "additionalProperties": false,
                    "required": [
                        "name",
                        "age"
                    ]
                },
                {
                    "type": "object",
                    "description": "The address object to insert into the database",
                    "properties": {
                        "number": {
                            "type": "string",
                            "description": "The number of the address. Eg. for 123 main st, this would be 123"
                        },
                        "street": {
                            "type": "string",
                            "description": "The street name. Eg. for 123 main st, this would be main st"
                        },
                        "city": {
                            "type": "string",
                            "description": "The city of the address"
                        }
                    },
                    "additionalProperties": false,
                    "required": [
                        "number",
                        "street",
                        "city"
                    ]
                }
            ]
        }
    },
    "additionalProperties": false,
    "required": [
        "item"
    ]
}

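Definitions are supported

You can use definitions to define subschemas that are referenced throughout your schema. The following is a simple example.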

{
    "type": "object",
    "properties": {
        "steps": {
            "type": "array",
            "items": {
                "$ref": "#/$defs/step"
            }
        },
        "final_answer": {
            "type": "string"
        }
    },
    "$defs": {
        "step": {
            "type": "object",
            "properties": {
                "explanation": {
                    "type": "string"
                },
                "output": {
                    "type": "string"
                }
            },
            "required": [
                "explanation",
                "output"
            ],
            "additionalProperties": false
        }
    },
    "required": [
        "steps",
        "final_answer"
    ],
    "additionalProperties": false
}
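
To make the reference mechanism concrete, the sketch below resolves a local `$ref` string against a schema held as a Python dictionary. `resolve_ref` is a hypothetical helper; it handles only local references such as `#` and `#/$defs/step` and ignores JSON Pointer escape sequences.

```python
def resolve_ref(root, ref):
    """Resolve a local reference such as '#' or '#/$defs/step' against root."""
    if not ref.startswith("#"):
        raise ValueError("only local references are handled in this sketch")
    node = root
    for part in ref[1:].split("/"):
        if part:  # skip the empty segment before the leading slash
            node = node[part]
    return node


schema = {
    "type": "object",
    "properties": {
        "steps": {"type": "array", "items": {"$ref": "#/$defs/step"}},
        "final_answer": {"type": "string"},
    },
    "$defs": {
        "step": {
            "type": "object",
            "properties": {
                "explanation": {"type": "string"},
                "output": {"type": "string"},
            },
            "required": ["explanation", "output"],
            "additionalProperties": False,
        }
    },
    "required": ["steps", "final_answer"],
    "additionalProperties": False,
}

step = resolve_ref(schema, schema["properties"]["steps"]["items"]["$ref"])
print(step["required"])
```

A reference of `#` resolves to the schema root itself, which is how a schema can refer back to its own top level.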

Recursive schemas are supported

Sample recursive schema using # to indicate root recursion.

{
    "name": "ui",
    "description": "Dynamically generated UI",
    "strict": true,
    "schema": {
        "type": "object",
        "properties": {
            "type": {
                "type": "string",
                "description": "The type of the UI component",
                "enum": ["div", "button", "header", "section", "field", "form"]
            },
            "label": {
                "type": "string",
                "description": "The label of the UI component, used for buttons or form fields"
            },
            "children": {
                "type": "array",
                "description": "Nested UI components",
                "items": {
                    "$ref": "#"
                }
            },
            "attributes": {
                "type": "array",
                "description": "Arbitrary attributes for the UI component, suitable for any element",
                "items": {
                    "type": "object",
                    "properties": {
                        "name": {
                            "type": "string",
                            "description": "The name of the attribute, for example onClick or className"
                        },
                        "value": {
                            "type": "string",
                            "description": "The value of the attribute"
                        }
                    },
                    "additionalProperties": false,
                    "required": ["name", "value"]
                }
            }
        },
        "required": ["type", "label", "children", "attributes"],
        "additionalProperties": false
    }
}

Sample recursive schema using explicit recursion:

{
    "type": "object",
    "properties": {
        "linked_list": {
            "$ref": "#/$defs/linked_list_node"
        }
    },
    "$defs": {
        "linked_list_node": {
            "type": "object",
            "properties": {
                "value": {
                    "type": "number"
                },
                "next": {
                    "anyOf": [
                        {
                            "$ref": "#/$defs/linked_list_node"
                        },
                        {
                            "type": "null"
                        }
                    ]
                }
            },
            "additionalProperties": false,
            "required": [
                "next",
                "value"
            ]
        }
    },
    "additionalProperties": false,
    "required": [
        "linked_list"
    ]
}

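JSON mode

JSON mode is a more basic version of the Structured Outputs feature. While JSON mode ensures that model output is valid JSON, Structured Outputs reliably matches the model's output to the schema you specify. We recommend using Structured Outputs if it is supported for your use case.

When JSON mode is turned on, the model's output is ensured to be valid JSON, except in some edge cases that you should detect and handle appropriately.

To turn on JSON mode with the Chat Completions or Assistants API, set response_format to { "type": "json_object" }. If you are using function calling, JSON mode is always turned on.

Important notes:

When using JSON mode, you must always instruct the model to produce JSON via some message in the conversation, for example via your system message. Without an explicit instruction to generate JSON, the model may generate an unending stream of whitespace, and the request may run continually until it reaches the token limit. To help ensure you don't forget, the API throws an error if the string "JSON" does not appear somewhere in the context.
JSON mode does not guarantee that the output matches any specific schema, only that it is valid and parses without errors. Use Structured Outputs to ensure it matches your schema or, if that is not possible, use a validation library and potentially retries to ensure the output matches your desired schema.
Your application must detect and handle the edge cases that can result in the model output not being a complete JSON object (see below).

Handling edge cases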

from openai import OpenAI

client = OpenAI()

we_did_not_specify_stop_tokens = True

try:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo-0125",
        messages=[
            {"role": "system", "content": "You are a helpful assistant designed to output JSON."},
            {"role": "user", "content": "Who won the world series in 2020? Please respond in the format {winner: ...}"}
        ],
        response_format={"type": "json_object"}
    )

    # finish_reason lives on the choice; refusal lives on the message
    choice = response.choices[0]

    # Check if the conversation was too long for the context window, resulting in incomplete JSON
    if choice.finish_reason == "length":
        # your code should handle this error case
        pass

    # Check if the OpenAI safety system refused the request and generated a refusal instead
    if choice.message.refusal:
        # your code should handle this error case
        # In this case, the .refusal field contains the explanation (if any) that the model generated for why it is refusing
        print(choice.message.refusal)

    # Check if the model's output included restricted content, so the generation of JSON was halted and may be partial
    if choice.finish_reason == "content_filter":
        # your code should handle this error case
        pass

    if choice.finish_reason == "stop":
        # In this case the model has either successfully finished generating the JSON object, or the model generated one of the tokens you provided as a "stop token"

        if we_did_not_specify_stop_tokens:
            # If you didn't specify any stop tokens, then the generation is complete and the content key will contain the serialized JSON object
            # This will parse successfully and should contain something like '{"winner": "Los Angeles Dodgers"}'
            print(choice.message.content)
        else:
            # Check if choice.message.content ends with one of your stop tokens and handle appropriately
            pass
except Exception as e:
    # Your code should handle errors here, for example a network error calling the API
    print(e)
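
The same checks can be collected into one function that is easy to unit-test without calling the API. This is a minimal sketch: `interpret_choice` and its status labels are hypothetical, and the caller is assumed to pass in the `finish_reason`, `content`, and `refusal` values read from the response.

```python
import json


def interpret_choice(finish_reason, content, refusal=None):
    """Classify a JSON-mode result as (status, parsed_value_or_detail)."""
    if refusal:
        return "refusal", refusal
    if finish_reason == "length":
        # Output was cut off, so the JSON is probably incomplete.
        return "truncated", None
    if finish_reason == "content_filter":
        return "filtered", None
    try:
        return "ok", json.loads(content)
    except (TypeError, json.JSONDecodeError) as exc:
        # Covers missing content (None) and text that is not valid JSON.
        return "invalid_json", str(exc)


print(interpret_choice("stop", '{"winner": "Los Angeles Dodgers"}'))
print(interpret_choice("length", '{"winner": "Los'))
```

Returning a status instead of raising keeps the retry or fallback policy in the calling code.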

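Resources

To learn more about Structured Outputs, we recommend browsing the following resources:

Check out our introductory cookbook on Structured Outputs
Learn how to build multi-agent systems with Structured Outputs

Conversation state

Learn how to manage conversation state during a model interaction.

https://platform.openai.com/docs/guides/conversation-state

OpenAI provides a few ways to manage conversation state, which is important for preserving information across multiple messages or turns in a conversation.

Manually manage conversation state

While each text generation request is independent and stateless (unless you're using the Assistants API), you can still implement multi-turn conversations by passing additional messages as parameters to your text generation request. Consider a knock-knock joke:

Manually construct a past conversation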

from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "user", "content": "knock knock."},
        {"role": "assistant", "content": "Who's there?"},
        {"role": "user", "content": "Orange."},
    ],
)

print(response.choices[0].message.content)

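By using alternating user and assistant messages, you capture the previous state of a conversation in one request to the model.
To manually share context across generated responses, include the model's previous response output as input, and append that input to your next request.
In the following example, we ask the model to tell a joke, followed by a request for another joke. Appending previous responses to new requests in this way helps ensure conversations feel natural and retain the context of previous interactions.
Manually manage conversation state with the Chat Completions API.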

from openai import OpenAI

client = OpenAI()

history = [
    {
        "role": "user",
        "content": "tell me a joke"
    }
]

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=history,
)

print(response.choices[0].message.content)

history.append(response.choices[0].message)
history.append({ "role": "user", "content": "tell me another" })

second_response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=history,
)

print(second_response.choices[0].message.content)
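
The append-and-resend pattern can be wrapped in a small class so the bookkeeping lives in one place. This is only a sketch: `Conversation` and `fake_complete` are hypothetical names, and the model call is injected as a plain function so the example runs offline. In a real application that function would call `client.chat.completions.create` and return `response.choices[0].message.content`.

```python
class Conversation:
    """Keeps the message history and resends it on every turn."""

    def __init__(self, complete):
        self.complete = complete  # callable: list of messages -> reply text
        self.history = []

    def ask(self, text):
        self.history.append({"role": "user", "content": text})
        reply = self.complete(self.history)
        # Store the reply so the next request carries the full context.
        self.history.append({"role": "assistant", "content": reply})
        return reply


def fake_complete(messages):
    """Stand-in for a model call; numbers the joke by user turn."""
    user_turns = sum(1 for m in messages if m["role"] == "user")
    return f"joke #{user_turns}"


chat = Conversation(fake_complete)
first = chat.ask("tell me a joke")
second = chat.ask("tell me another")
print(first, second, len(chat.history))
```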

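OpenAI APIs for conversation state

Our APIs make it easier to manage conversation state automatically, so you don't have to pass inputs manually with each turn of a conversation.

We recommend using the Responses API. Because it is stateful, managing context across conversations is simple.

If you're using the Chat Completions endpoint, you'll need to either manage state manually, as documented above, or create long-running threads with the Assistants API.

Managing the context window

Understanding context windows will help you successfully create threaded conversations and manage state across model interactions.

The context window is the maximum number of tokens that can be used in a single request. This maximum includes input, output, and reasoning tokens. To learn your model's context window, see the model details.

Managing context for text generation

As your inputs become more complex, or you include more turns in a conversation, you'll need to consider both the output token and context window limits. Model inputs and outputs are metered in tokens, which are parsed from inputs to analyze their content and intent and assembled to render logical outputs. Models have limits on token usage during the lifecycle of a text generation request.

Output tokens are the tokens generated by a model in response to a prompt. Each model has different limits for output tokens. For example, gpt-4o-2024-08-06 can generate a maximum of 16,384 output tokens.
A context window describes the total tokens that can be used for both input and output tokens (and, for some models, reasoning tokens). Compare the context window limits of our models. For example, gpt-4o-2024-08-06 has a total context window of 128k tokens.

If you create a very large prompt, usually by including additional context, data, or examples for the model, you run the risk of exceeding the allocated context window for a model, which might result in truncated outputs.

Use the tokenizer tool, built with the tiktoken library, to see how many tokens are in a particular string of text.

For example, when making an API request to Chat Completions with the o1 model, the following token counts apply toward the context window total:

Input tokens (the inputs you include in the messages array with Chat Completions)
Output tokens (tokens generated in response to your prompt)
Reasoning tokens (used by the model to plan a response)

Tokens generated in excess of the context window limit may be truncated in API responses.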
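
The arithmetic behind these limits is simple enough to sketch. The figures below are the gpt-4o-2024-08-06 numbers quoted in the text, with "128k" read as 128,000 tokens (an assumption); `output_budget` is a hypothetical helper, not part of any SDK.

```python
CONTEXT_WINDOW = 128_000    # total tokens; "128k" read as 128,000 (assumption)
MAX_OUTPUT_TOKENS = 16_384  # output cap quoted in the text


def output_budget(input_tokens, reasoning_tokens=0,
                  context_window=CONTEXT_WINDOW,
                  max_output=MAX_OUTPUT_TOKENS):
    """Tokens left for visible output once input and reasoning are counted."""
    remaining = context_window - input_tokens - reasoning_tokens
    return max(0, min(max_output, remaining))


print(output_budget(1_000))    # small prompt: limited by the output cap
print(output_budget(120_000))  # large prompt: limited by the context window
```

When the result is 0, the prompt alone already fills the window and needs to be shortened before the request is sent.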

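[Figure: context window visualization]

You can estimate the number of tokens your messages will use with the tokenizer tool.

Streaming API responses

Learn how to stream model responses from the OpenAI API using server-sent events.

https://platform.openai.com/docs/guides/streaming-responses

By default, when you make a request to the OpenAI API, we generate the model's entire output before sending it back in a single HTTP response. When generating long outputs, waiting for a response can take time. Streaming responses lets you start printing or processing the beginning of the model's output while it continues generating the full response.

Enable streaming

Streaming Chat Completions is fairly straightforward. However, we recommend using the Responses API for streaming, because it was designed with streaming in mind. The Responses API uses semantic events for streaming and is type-safe.

Stream a chat completion

To stream completions, set stream=True when calling the Chat Completions or legacy Completions endpoints. This returns an object that streams back the response as data-only server-sent events.

The response is sent back incrementally in chunks via an event stream. You can iterate over the event stream with a for loop, like this: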

from openai import OpenAI
client = OpenAI()

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": "Say 'double bubble bath' ten times fast.",
        },
    ],
    stream=True,
)

for chunk in stream:
    print(chunk)
    print(chunk.choices[0].delta)
    print("****************")

Read the responses

When you stream a chat completion, the responses have a delta field rather than a message field. The delta field can hold a role token, a content token, or nothing.

{ role: 'assistant', content: '', refusal: null }
****************
{ content: 'Why' }
****************
{ content: " don't" }
****************
{ content: ' scientists' }
****************
{ content: ' trust' }
****************
{ content: ' atoms' }
****************
{ content: '?\n\n' }
****************
{ content: 'Because' }
****************
{ content: ' they' }
****************
{ content: ' make' }
****************
{ content: ' up' }
****************
{ content: ' everything' }
****************
{ content: '!' }
****************
{}
****************

To stream only the text response of your chat completion, your code would look like this:

from openai import OpenAI
client = OpenAI()

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": "Say 'double bubble bath' ten times fast.",
        },
    ],
    stream=True,
)

for chunk in stream:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")
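
If the full text is needed once streaming ends, the deltas can be accumulated instead of printed. The sketch below uses stand-in chunk objects built with `SimpleNamespace` so it runs offline; `collect_text` and `fake_chunk` are hypothetical names. Skipping chunks whose `choices` list is empty is a defensive assumption, not something the text above states.

```python
from types import SimpleNamespace


def collect_text(stream):
    """Join the content deltas of a chat completion stream into one string."""
    parts = []
    for chunk in stream:
        if not chunk.choices:
            # Defensive: skip chunks that carry no choices.
            continue
        piece = chunk.choices[0].delta.content
        if piece is not None:
            parts.append(piece)
    return "".join(parts)


def fake_chunk(content=None):
    """Build an object with the same attribute path as a real chunk."""
    delta = SimpleNamespace(content=content)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


stream = [
    fake_chunk(""),           # first delta: role only, empty content
    fake_chunk("Why"),
    fake_chunk(" don't"),
    fake_chunk(" scientists"),
    fake_chunk(None),         # final delta: nothing
]
print(collect_text(stream))
```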

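Advanced use cases

For more advanced use cases, like streaming tool calls, check out the following dedicated guides:

Streaming function calls
Streaming structured output

File inputs

Learn how to use PDF files as inputs to the OpenAI API.

https://platform.openai.com/docs/guides/pdf-files

OpenAI models with vision capabilities can also accept PDF files as input. Provide PDFs either as Base64-encoded data or as file IDs obtained after uploading files to the /v1/files endpoint through the API or the dashboard.

How it works

To help models understand PDF content, we put both the extracted text and an image of each page into the model's context. The model can then use both the text and the images to generate a response. This is useful, for example, if diagrams contain key information that isn't in the text.

Uploading files

In the example below, we first upload a PDF using the Files API, then reference its file ID in an API request to the model.

Upload a file to use in a completion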

from openai import OpenAI
client = OpenAI()

file = client.files.create(
    file=open("draconomicon.pdf", "rb"),
    purpose="user_data"
)

completion = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "file",
                    "file": {
                        "file_id": file.id,
                    }
                },
                {
                    "type": "text",
                    "text": "What is the first dragon in the book?",
                },
            ]
        }
    ]
)

print(completion.choices[0].message.content)

Base64-encoded files

You can also send PDF file inputs as Base64-encoded inputs.

Base64 encode a file to use in a completion

import base64
from openai import OpenAI
client = OpenAI()

with open("draconomicon.pdf", "rb") as f:
    data = f.read()

base64_string = base64.b64encode(data).decode("utf-8")

completion = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "file",
                    "file": {
                        "filename": "draconomicon.pdf",
                        "file_data": f"data:application/pdf;base64,{base64_string}",
                    }
                },
                {
                    "type": "text",
                    "text": "What is the first dragon in the book?",
                }
            ],
        },
    ],
)

print(completion.choices[0].message.content)
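
The encoding step can be isolated in a small function that is testable without a network call. `pdf_file_part` is a hypothetical helper that builds the same `file` content part used in the request above from raw bytes; the sample bytes are a placeholder, not a valid PDF.

```python
import base64


def pdf_file_part(data, filename):
    """Build a Chat Completions 'file' content part from raw PDF bytes."""
    encoded = base64.b64encode(data).decode("utf-8")
    return {
        "type": "file",
        "file": {
            "filename": filename,
            "file_data": f"data:application/pdf;base64,{encoded}",
        },
    }


part = pdf_file_part(b"%PDF-1.4", "draconomicon.pdf")
print(part["file"]["file_data"][:40])
```

The resulting dictionary can be placed in the `content` list of a user message next to a `text` part.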

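Usage considerations

Below are a few considerations to keep in mind while using PDF inputs.

Token usage

To help models understand PDF content, we put both extracted text and an image of each page into the model's context, regardless of whether the page includes images. Before deploying your solution at scale, ensure that you understand the pricing and token usage implications of using PDFs as input. More on pricing.

File size limitations

You can upload up to 100 pages and 32MB of total content in a single request to the API, across multiple file inputs.

Supported models

Only models that support both text and image inputs, such as gpt-4o, gpt-4o-mini, or o1, can accept PDF files as input. Check model features here.

File upload purpose

You can upload these files to the Files API with any purpose, but we recommend using the user_data purpose for files you plan to use as model inputs.

Reasoning models

Explore advanced reasoning and problem-solving models.

https://platform.openai.com/docs/guides/reasoning

Reasoning models, like OpenAI o1 and o3-mini, are new large language models trained with reinforcement learning to perform complex reasoning. Reasoning models think before they answer, producing a long internal chain of thought before responding to the user. Reasoning models excel in complex problem solving, coding, scientific reasoning, and multi-step planning for agentic workflows.
As with our GPT models, we provide two models: a smaller, faster model (o3-mini) that is less expensive per token, and a larger model (o1) that is somewhat slower and more expensive but can often generate better responses to complex tasks and generalize better across domains.
The new o1-pro model has unique features, such as performing multiple model generation turns before producing a response. To support this and other advanced API features in the future, this model is currently only available in the Responses API.

Get started with reasoning

Reasoning models can be used through the Chat Completions endpoint as seen here.

Using a reasoning model in Chat Completions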

from openai import OpenAI

client = OpenAI()

prompt = """
Write a bash script that takes a matrix represented as a string with 
format '[1,2],[3,4],[5,6]' and prints the transpose in the same format.
"""

response = client.chat.completions.create(
    model="o3-mini",
    reasoning_effort="medium",
    messages=[
        {
            "role": "user", 
            "content": prompt
        }
    ]
)

print(response.choices[0].message.content)

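Reasoning effort

In the example above, the reasoning_effort parameter guides the model on how many reasoning tokens it should generate before creating a response to the prompt.

You can specify one of low, medium, or high for this parameter, where low favors speed and economical token usage, and high favors more complete reasoning at the cost of more tokens generated and slower responses. The default value is medium, which is a balance between speed and reasoning accuracy.

How reasoning works

Reasoning models introduce reasoning tokens in addition to input and output tokens. The models use these reasoning tokens to "think", breaking down their understanding of the prompt and considering multiple approaches to generating a response. After generating reasoning tokens, the model produces an answer as visible completion tokens and discards the reasoning tokens from its context.

Here is an example of a multi-step conversation between a user and an assistant. Input and output tokens from each step are carried over, while reasoning tokens are discarded.

[Figure: reasoning tokens aren't retained in context]

While reasoning tokens are not visible via the API, they still occupy space in the model's context window and are billed as output tokens.

Managing the context window

It's important to ensure there's enough space in the context window for reasoning tokens when creating completions. Depending on the problem's complexity, the models may generate anywhere from a few hundred to tens of thousands of reasoning tokens. The exact number of reasoning tokens used is visible in the usage object of the chat completion response object, under completion_tokens_details.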

"usage": {
    "prompt_tokens": 26,
    "completion_tokens": 637,
    "total_tokens": 663,
    "prompt_tokens_details": {
        "cached_tokens": 0,
        "audio_tokens": 0
    },
    "completion_tokens_details": {
        "reasoning_tokens": 448,
        "audio_tokens": 0,
        "accepted_prediction_tokens": 0,
        "rejected_prediction_tokens": 0
    }
}
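
Reading these numbers programmatically is a matter of subtraction. The sketch below assumes, as the billing note above suggests, that `completion_tokens` already includes the reasoning tokens; `visible_completion_tokens` is a hypothetical helper operating on the usage object as a plain dictionary.

```python
usage = {
    "prompt_tokens": 26,
    "completion_tokens": 637,
    "total_tokens": 663,
    "completion_tokens_details": {
        "reasoning_tokens": 448,
        "audio_tokens": 0,
        "accepted_prediction_tokens": 0,
        "rejected_prediction_tokens": 0,
    },
}


def visible_completion_tokens(usage):
    """Completion tokens the caller can actually see in the response."""
    details = usage.get("completion_tokens_details") or {}
    return usage["completion_tokens"] - details.get("reasoning_tokens", 0)


print(visible_completion_tokens(usage))  # 637 - 448 = 189
```

In this sample, roughly 70% of the billed completion tokens were spent on reasoning that never appears in the response body.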

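Context window lengths are found on the model reference page, and will differ across model snapshots.

Controlling costs

To manage costs with reasoning models, you can limit the total number of tokens the model generates (including both reasoning and completion tokens) by using the max_completion_tokens parameter.

In previous models, the max_tokens parameter controlled both the number of tokens generated and the number of tokens visible to the user, which were always equal. With reasoning models, however, the total tokens generated can exceed the number of visible tokens because of the internal reasoning tokens.

Because some applications might rely on max_tokens matching the number of tokens received from the API, we introduced max_completion_tokens to explicitly control the total number of tokens generated by the model, including both reasoning and visible completion tokens. This explicit opt-in ensures no existing applications break when using the new models. The max_tokens parameter continues to function as before for all previous models.

Allocating space for reasoning

If the generated tokens reach the context window limit or the max_completion_tokens value you've set, you'll receive a chat completion response with finish_reason set to length. This might occur before any visible completion tokens are produced, meaning you could incur costs for input and reasoning tokens without receiving a visible response.

To prevent this, ensure there's sufficient space in the context window or adjust the max_completion_tokens value to a higher number. OpenAI recommends reserving at least 25,000 tokens for reasoning and outputs when you start experimenting with these models. As you become familiar with the number of reasoning tokens your prompts require, you can adjust this buffer accordingly.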
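
One way to act on this advice is to detect the "ran out before answering" case and raise the limit on the next attempt. The sketch below is an assumption about policy, not a documented recipe: `needs_more_room` and `next_limit` are hypothetical helpers, and doubling the limit up to a ceiling is just one possible strategy.

```python
REASONING_BUFFER = 25_000  # starting reservation suggested in the text


def needs_more_room(finish_reason, content):
    """True when generation hit the limit before any visible output."""
    return finish_reason == "length" and not content


def next_limit(current_limit, ceiling):
    """Double the token limit for a retry without exceeding the ceiling."""
    return min(ceiling, current_limit * 2)


limit = REASONING_BUFFER
if needs_more_room("length", ""):
    limit = next_limit(limit, ceiling=100_000)
print(limit)
```

The updated value would be passed as `max_completion_tokens` on the retry. Each retry still incurs input and reasoning token costs, so a cap on the number of attempts is advisable.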

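Advice on prompting

There are some differences to consider when prompting a reasoning model versus a GPT model. Generally speaking, reasoning models give better results on tasks with only high-level guidance. This differs somewhat from GPT models, which often benefit from very precise instructions.

A reasoning model is like a senior co-worker: you can give them a goal to achieve and trust them to work out the details.
A GPT model is like a junior co-worker: they perform best with explicit instructions to create a specific output.

For more information on best practices when using reasoning models, read this guide.

Prompt examples

Coding (refactoring)

OpenAI o-series models are able to implement complex algorithms and produce code. This prompt asks the model to refactor a React component based on some specific criteria.

Refactor code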

from openai import OpenAI

client = OpenAI()

prompt = """
Instructions:
- Given the React component below, change it so that nonfiction books have red
  text. 
- Return only the code in your reply
- Do not include any additional formatting, such as markdown code blocks
- For formatting, use four space tabs, and do not allow any lines of code to 
  exceed 80 columns

const books = [
  { title: 'Dune', category: 'fiction', id: 1 },
  { title: 'Frankenstein', category: 'fiction', id: 2 },
  { title: 'Moneyball', category: 'nonfiction', id: 3 },
];

export default function BookList() {
  const listItems = books.map(book =>
    <li>
      {book.title}
    </li>
  );

  return (
    <ul>{listItems}</ul>
  );
}
"""

response = client.chat.completions.create(
    model="o3-mini",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": prompt
                },
            ],
        }
    ]
)

print(response.choices[0].message.content)

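Coding (planning)

OpenAI o-series models are also adept at creating multi-step plans. This example prompt asks the model to create a filesystem structure for a full solution, along with Python code that implements the desired use case.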

from openai import OpenAI

client = OpenAI()

prompt = """
I want to build a Python app that takes user questions and looks 
them up in a database where they are mapped to answers. If there 
is close match, it retrieves the matched answer. If there isn't, 
it asks the user to provide an answer and stores the 
question/answer pair in the database. Make a plan for the directory 
structure you'll need, then return each file in full. Only supply 
your reasoning at the beginning and end, not throughout the code.
"""

response = client.chat.completions.create(
    model="o3-mini",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": prompt
                },
            ],
        }
    ]
)

print(response.choices[0].message.content)

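STEM research

OpenAI o-series models have shown excellent performance in STEM research. Prompts asking for support with basic research tasks should show strong results.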

from openai import OpenAI

client = OpenAI()

prompt = """
What are three compounds we should consider investigating to 
advance research into new antibiotics? Why should we consider 
them?
"""

response = client.chat.completions.create(
    model="o3-mini",
    messages=[
        {
            "role": "user", 
            "content": prompt
        }
    ]
)

print(response.choices[0].message.content)

2025-03-29 (Sat)
