【全新第二版《Learn AI-assisted Python Programming》封面】
写在前面
本篇介绍了 Copilot 辅助编程的本地环境配置和它的基本工作流程,整体内容难度不大,主要是照顾零基础的 AI 开发者。最后的数据分析案例强烈建议动手操练,真实感受一下 Copilot 的不确定性。
Ch02: Getting started with Copilot
本章概要
- 本地环境搭建
- Copilot 工作流
- 理解 Copilot 在数据处理任务中的价值
1 本地环境搭建
配合本书内容的基础环境只需要三个工具:
Python
:只要是 Python3 就行(安装地址:https://www.python.org/downloads/
);VSCode
:与 Copilot 集成良好,且对编程入门者友好(安装地址:https://code.visualstudio.com/download
);GitHub Copilot
:一个有效的GitHub
帐号即可。Copilot
对新用户有 30 天试用期,之后按 $10/月或 $100/年收费。
另外,为了方便调试,还要在 VSCode
安装以下扩展插件(extensions):
GitHub Copilot
:Copilot
官方插件。GitHub Copilot Chat
:安装最新版 VSCode 会自动安装的第三方插件,方便打开聊天窗口。Python
(by Microsoft):其实可以一键安装Python Extension Pack
,里面包含该插件在内的常用扩展工具。
截图如下:
【图 2.1 VSCode 的 Python 扩展包,包含 7 个常用扩展工具】
【图 2.2 与 Copilot 相关的第三方扩展工具】
2 手动运行代码
先用 VSCode
打开桌面上的一个空文件夹(例如 ai-demo
),然后用刚才装好的 Python
扩展新建一个虚拟环境:
【图 2.3 用 Python 自带的 venv 模块在 VSCode 界面操作下创建虚拟环境】
【图 2.4 手动创建第一个测试文件并通过 Python 扩展工具运行的实测效果图】
3 检查 Copilot 是否就绪
清空刚才的代码内容,重新输入以下注释内容后回车:
# output "Hello Copilot" to the screen
如果看到下面的自动提示内容,则表明 Copilot 配置成功:
【2.5 Copilot 配置成功后根据 Python 注释自动生成的提示内容】
此时按 Tab 键正式写入相关代码:
【图 2.6 按 Tab 键后自动录入 Copilot 提示代码】
再举一个计算正方形面积的例子。输入以下内容后回车:
side = 5
area = side * side
# Print the side length is *** and the area is ***
Copilot
会根据注释中的提示词自动填空,最终生成语法正确的代码:
【图 2.7 实测 Copilot 自动填充占位符并生成相关代码的截图】
4 Copilot 常见问题排查清单
问题1:回车后继续保持注释状态
解决方法有四:
- 再多按一次回车换行;
- 输入提示的前两个字符(迫使
Copilot
按正常代码处理); - 将
#
标记改写为文档字符串(即"""
,推荐); - 按
Ctrl-Enter
查看相关建议。
问题2:代码提示有误
常见解决方法如下:
- 修改提示词,重新生成代码;
- 按
Ctrl-Enter
查看其他提示结果; - 清除
Copilot
缓存:关闭VSCode
,稍等片刻后重启,再重新生成代码; - 将问题拆解为更小的步骤(第 7 章详述);
- 直接调试代码;
- 求助其他 AI 工具(如
ChatGPT
、DeepSeek
等)。
问题3:输入回车后生成一句新注释:# YOUR CODE HERE
(在此处输入你的代码)
解决方案:这是 Copilot
早期容易见到的问题,可能在让 Copilot
解决一个类似课堂练习的问题时出现。最新的版本几乎很少出现这种状况。解决办法也很简单,通过 Ctrl-Enter
选择其他版本的提示内容即可。
问题4:代码依赖模块缺失
解决方案:用提示词让 Copilot
补齐所需依赖即可。
DIY补充:切换 Copilot 的底层模型
在 VSCode 中输入组合键 Ctrl + I(小写即可),会弹出一个用于提问的临时对话框。此时点击右边的下拉图标,即可切换不同的底层 LLM 大语言模型:
【图 2.8 利用临时对话框切换 Copilot 底层模型的实测截图】
5 Copilot 的基本工作流程
全书遵循的 Copilot
基本工作流程如下:
- 用注释标记
#
或文档字符串"""
录入提示词; - 让
Copilot
根据提示词生成相应回复; - 通读并测试回复内容,确保代码符合预期;
- 检查无误,则转到第 1 步,完成下一步操作;
- 检查有误,则先删除回复内容,再返回第 1 步,修改提示词重新生成。
6 实例验证 Copilot 提示的不确定性
本节通过一个简单的数据分析案例来验证 Copilot
固有的不确定性。
先从 kaggle 下载测试数据集(2019 - 2022 年度美国橄榄球联盟攻防统计数据),格式为 CSV
文件:
【图 2.9 从 Kaggle 官网下载测试数据集(CSV 格式文件)】
需求其实很简单,就是对数据集中的指定站位(四分卫 QB
)进行分类汇总,看看每名队员的传球总距离是多少,最后按倒序输出结果。
6.1 原始对照组
先将数据集文件 nfl_offensive_stats.csv
复制到当前工作目录下,然后按书中给出的如下回复内容运行一遍代码作为参照组:
"""
open the csv file called "nfl_offensive_stats.csv" and
read in the csv data from the file
"""
# import the csv module
import csv
# open the csv file
with open('nfl_offensive_stats.csv', 'r') as f:
# read the csv data
data = list(csv.reader(f))
"""
the 3rd column in data is player position, the fourth
column is the player, and the 8th column is the
passing yards. For each player whose position in
column 3 is "QB", determine the sum of yards from
column 8
"""
# create a dictionary to hold the player name and passing yards
passing_yards = {}
# loop through the data
for row in data:
# check if the player is a quarterback
if row[2] == 'QB':
# check if the player is already in the dictionary
if row[3] in passing_yards:
# add the passing yards to the existing value
passing_yards[row[3]] += int(row[7])
else:
# add the player to the dictionary
passing_yards[row[3]] = int(row[7])
"""
print the sum of the passing yards sorted by sum
of passing yards in descending order
"""
for player in sorted(passing_yards, key=passing_yards.get, reverse=True):
print(player, passing_yards[player])
运行结果如下:
【图 2.10 按照书中给出的完整提示代码得到的运行结果】
6.2 基于 o3-mini 大模型的第一版实测
接着,删除原始代码,将全过程需要的三段提示词合在一起(第 1 至 5 行 [^1]),底层调用 o3-mini
模型,得到如下结果:
"""
1. open the csv file called "nfl_offensive_stats.csv" and read in the csv data from the file
2. the 3rd column in data is player position, the fourth column is the player, and the 8th column is the passing yards. For each player whose position in column 3 is "QB", determine the sum of yards from column 8
3. print the sum of the passing yards sorted by sum of passing yards in descending order
"""
import csv
# open the csv file called "nfl_offensive_stats.csv" and read in the csv data from the file
with open("nfl_offensive_stats.csv", "r") as file:
reader = csv.reader(file)
data = list(reader)
# the 3rd column in data is player position, the fourth column is the player, and the 8th column is the passing yards. For each player whose position in column 3 is "QB", determine the sum of yards from column 8
qb_passing_yards = {}
for row in data[1:]:
if row[2] == "QB":
if row[3] in qb_passing_yards:
qb_passing_yards[row[3]] += int(row[7])
else:
qb_passing_yards[row[3]] = int(row[7])
# print the sum of the passing yards sorted by sum of passing yards in descending order
sorted_qb_passing_yards = sorted(qb_passing_yards.items(), key=lambda x: x[1], reverse=True)
for player, yards in sorted_qb_passing_yards:
print(f"{player}: {yards}")
运行结果:
【图 2.11 组合三个提示词后利用 o3-mini 模型生成的提示代码与运行结果截图】
6.3 基于【Ctrl + I】临时对话框的第二版实测
如果通过 Ctrl + I 产生的临时对话框分别输三次提示词,最后得到的结果如下(效果和合并后的 docstring
版本类似):
import csv
with open("nfl_offensive_stats.csv", mode="r", newline="") as file:
csv_reader = csv.reader(file)
qb_yards = {}
# Skip header row if there is one
# next(csv_reader)
for row in csv_reader:
if row[2] == "QB":
player = row[3]
try:
yards = float(row[7])
except ValueError:
yards = 0.0
qb_yards[player] = qb_yards.get(player, 0) + yards
sorted_qb_yards = sorted(qb_yards.items(), key=lambda item: item[1], reverse=True)
for player, total_yards in sorted_qb_yards:
print(f"{player}: {total_yards}")
6.4 基于 docstring 文档字符串的第三版实测
但是,如果完全按前后三个 docstring
的方式输入提示词,得到的结果差别较大:
"""
open the csv file called "nfl_offsensive_stats.csv" and
read in the csv data from the file
"""
import csv
# open the csv file
with open("nfl_offensive_stats.csv") as csvfile:
# read in the csv data from the file
nfl_data = csv.reader(csvfile)
# print the data to the console
for row in nfl_data:
print(row)
"""
the 3rd column in data is player position, the fourth
column is player, and the 8th column is the passing yards.
For each player whose position in column 3 is 'QB', determine
the sum of yards from column 8.
"""
# open the csv file
with open("nfl_offensive_stats.csv") as csvfile:
# read in the csv data from the file
nfl_data = csv.reader(csvfile)
# initialize the sum of passing yards
sum_passing_yards = 0
# iterate through the data
for row in nfl_data:
# check if the player position is 'QB'
if row[2] == 'QB':
# add the passing yards to the sum
sum_passing_yards += int(row[7])
# print the sum of passing yards
print(sum_passing_yards)
"""
print the sum of the passing yards sorted by sum of
passing yards in descending order
"""
# open the csv file
with open("nfl_offensive_stats.csv") as csvfile:
# read in the csv data from the file
nfl_data = csv.reader(csvfile)
# initialize the sum of passing yards
sum_passing_yards = 0
# create a dictionary to store the sum of passing yards
passing_yards = {}
# iterate through the data
for row in nfl_data:
# check if the player position is 'QB'
if row[2] == 'QB':
# add the passing yards to the sum
sum_passing_yards = passing_yards.get(row[3], 0)
sum_passing_yards += int(row[7])
passing_yards[row[3]] = sum_passing_yards
# sort the dictionary by the sum of passing yards
sorted_passing_yards = sorted(passing_yards.items(), key=lambda x: x[1], reverse=True)
# print the sorted dictionary
print(sorted_passing_yards)
只运行最后一段代码,结果虽然和前两个一致,但输出格式明显不同了:
【图 2.12 分开输入提示词得到的回复内容以及运行结果(格式完全不同前两版)】
6.5 实测需求变更后的 Copilot 相应情况
如果倒序输出结果时想屏蔽掉某个队员的数据(例如 Tom Brady
),则需要对上述代码引入一段新提示词,并在指定位置进行修改:
# open the csv file
with open("nfl_offensive_stats.csv") as csvfile:
# read in the csv data from the file
nfl_data = csv.reader(csvfile)
# initialize the sum of passing yards
sum_passing_yards = 0
# create a dictionary to store the sum of passing yards
passing_yards = {}
# iterate through the data
for row in nfl_data:
# check if the player position is 'QB'
if row[2] == 'QB':
# add the passing yards to the sum
sum_passing_yards = passing_yards.get(row[3], 0)
sum_passing_yards += int(row[7])
passing_yards[row[3]] = sum_passing_yards
"""
print the sum of the passing yards sorted by sum
of passing yards in descending order
Do not include Tom Brady because he wins too much
"""
# sort the passing yards in descending order
sorted_passing_yards = sorted(passing_yards.items(), key=lambda x: x[1], reverse=True)
# print the sorted passing yards
for player, yards in sorted_passing_yards:
if player != 'Tom Brady':
print(player, yards)
运行结果如下(符合预期):
【图 2.13 人为筛除某名队员的数据后得到的实测结果(符合预期)】
本章小结
本章介绍了学习 Copilot
辅助编程的本地环境搭建方法,并给出了全书遵循的 Copilot
基本工作流程,最后通过一个简单的数据分析案例验证了 Copilot
提示内容的不确定性。由于该案例的参考代码并未展开讲解,这里也只列出本地实测结果。
综上也可以得出以下认知:
- Copilot 功能十分强大:提问者无需手写代码也能解决实际问题;
- 问题的拆解十分关键:小巧而明确的问题描述总是优于大而笼统的提示词;
- 提问者仍需要对代码语法有一定了解:否则变更需求时新增代码不知道放哪儿;
- 代码测试环节非常重要:因为 Copilot 的不确定性,完全相同的提示词生成的代码可能千差万别;
- Python 的功能同样强大:主要体现在各种现成的模块,无需重复造轮子。