LangChain实战(十六):构建基于SQL数据库的数据分析Agent

发布于:2025-09-06 ⋅ 阅读:(12) ⋅ 点赞:(0)

本文是《LangChain实战课》系列的第十六篇,将手把手教你如何使用SQLDatabaseToolkit和Agent构建一个智能的数据分析助手。这个助手能够理解自然语言问题,自动编写和执行SQL查询,并对结果进行智能分析和解释,让数据查询变得像对话一样简单。

前言

在当今数据驱动的世界中,能够快速从数据库中提取和分析信息是至关重要的技能。然而,编写复杂的SQL查询对于非技术用户来说往往是一个挑战。通过结合LangChain的SQLDatabaseToolkit和Agent技术,我们可以创建一个智能的数据分析助手,让用户用自然语言提问,自动生成和执行SQL查询,并获得有意义的分析结果。

为什么需要SQL数据分析Agent?

传统数据查询的痛点

  1. 技术门槛高:需要熟练掌握SQL语法

  2. 效率低下:手动编写查询耗时且容易出错

  3. 理解困难:非技术人员难以理解查询结果

  4. 灵活性差:固定的查询难以应对多变的分析需求

SQL Agent的优势

  1. 自然语言交互:用户可以用日常语言提问

  2. 自动查询生成:AI自动编写优化的SQL查询

  3. 智能结果解释:不仅返回数据,还提供分析和洞察

  4. 安全可控:可以限制查询范围和操作权限

环境准备和安装

在开始之前,我们需要安装必要的依赖包:

# 安装LangChain核心库
pip install langchain

# 安装SQL相关依赖
pip install sqlalchemy

# 安装数据库驱动(根据使用的数据库选择)
pip install pymysql      # MySQL
pip install psycopg2-binary  # PostgreSQL
pip install sqlite3      # SQLite(通常已内置)

# 安装其他工具包
pip install pandas
pip install numpy
pip install python-dotenv

# 安装可选的可视化库
pip install matplotlib
pip install seaborn

创建示例数据库

首先,让我们创建一个丰富的示例数据库来演示SQL Agent的功能:

import sqlite3
import pandas as pd
from datetime import datetime, timedelta
import random

def create_sample_database():
    """创建包含多个表的示例数据库"""
    conn = sqlite3.connect('company_data.db')
    cursor = conn.cursor()
    
    # 创建员工表
    cursor.execute('''
    CREATE TABLE IF NOT EXISTS employees (
        employee_id INTEGER PRIMARY KEY,
        first_name TEXT NOT NULL,
        last_name TEXT NOT NULL,
        email TEXT UNIQUE NOT NULL,
        phone TEXT,
        hire_date DATE NOT NULL,
        job_title TEXT NOT NULL,
        department TEXT NOT NULL,
        salary REAL NOT NULL,
        manager_id INTEGER,
        FOREIGN KEY (manager_id) REFERENCES employees (employee_id)
    )
    ''')
    
    # 创建部门表
    cursor.execute('''
    CREATE TABLE IF NOT EXISTS departments (
        department_id INTEGER PRIMARY KEY,
        department_name TEXT UNIQUE NOT NULL,
        location TEXT NOT NULL,
        budget REAL NOT NULL
    )
    ''')
    
    # 创建销售表
    cursor.execute('''
    CREATE TABLE IF NOT EXISTS sales (
        sale_id INTEGER PRIMARY KEY,
        employee_id INTEGER NOT NULL,
        sale_date DATE NOT NULL,
        product_category TEXT NOT NULL,
        amount REAL NOT NULL,
        region TEXT NOT NULL,
        FOREIGN KEY (employee_id) REFERENCES employees (employee_id)
    )
    ''')
    
    # 创建项目表
    cursor.execute('''
    CREATE TABLE IF NOT EXISTS projects (
        project_id INTEGER PRIMARY KEY,
        project_name TEXT NOT NULL,
        start_date DATE NOT NULL,
        end_date DATE,
        budget REAL NOT NULL,
        status TEXT NOT NULL CHECK(status IN ('Active', 'Completed', 'On Hold'))
    )
    ''')
    
    # 创建员工项目关联表
    cursor.execute('''
    CREATE TABLE IF NOT EXISTS employee_projects (
        employee_id INTEGER NOT NULL,
        project_id INTEGER NOT NULL,
        role TEXT NOT NULL,
        hours_worked REAL DEFAULT 0,
        PRIMARY KEY (employee_id, project_id),
        FOREIGN KEY (employee_id) REFERENCES employees (employee_id),
        FOREIGN KEY (project_id) REFERENCES projects (project_id)
    )
    ''')
    
    # 插入部门数据
    departments = [
        (1, 'Engineering', 'San Francisco', 1500000),
        (2, 'Sales', 'New York', 1200000),
        (3, 'Marketing', 'Chicago', 800000),
        (4, 'HR', 'Austin', 500000),
        (5, 'Finance', 'Boston', 900000)
    ]
    cursor.executemany('INSERT INTO departments VALUES (?, ?, ?, ?)', departments)
    
    # 插入员工数据
    employees = [
        (1, 'John', 'Doe', 'john.doe@company.com', '555-0101', '2020-01-15', 'Senior Engineer', 'Engineering', 120000, None),
        (2, 'Jane', 'Smith', 'jane.smith@company.com', '555-0102', '2019-03-10', 'Engineering Manager', 'Engineering', 150000, 1),
        (3, 'Bob', 'Johnson', 'bob.johnson@company.com', '555-0103', '2021-06-20', 'Software Engineer', 'Engineering', 95000, 2),
        (4, 'Alice', 'Brown', 'alice.brown@company.com', '555-0104', '2018-11-05', 'Sales Director', 'Sales', 130000, None),
        (5, 'Charlie', 'Wilson', 'charlie.wilson@company.com', '555-0105', '2022-02-14', 'Sales Representative', 'Sales', 75000, 4),
        (6, 'Diana', 'Lee', 'diana.lee@company.com', '555-0106', '2020-07-30', 'Marketing Manager', 'Marketing', 110000, None),
        (7, 'Mike', 'Chen', 'mike.chen@company.com', '555-0107', '2021-09-12', 'Marketing Specialist', 'Marketing', 85000, 6),
        (8, 'Sarah', 'Davis', 'sarah.davis@company.com', '555-0108', '2019-12-01', 'HR Manager', 'HR', 100000, None),
        (9, 'Tom', 'Anderson', 'tom.anderson@company.com', '555-0109', '2022-04-25', 'HR Coordinator', 'HR', 60000, 8),
        (10, 'Lisa', 'Garcia', 'lisa.garcia@company.com', '555-0110', '2020-08-18', 'Finance Manager', 'Finance', 125000, None)
    ]
    cursor.executemany('INSERT INTO employees VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)', employees)
    
    # 插入项目数据
    projects = [
        (1, 'Website Redesign', '2023-01-10', '2023-06-15', 50000, 'Completed'),
        (2, 'Mobile App Development', '2023-03-01', None, 75000, 'Active'),
        (3, 'Market Research', '2023-02-15', '2023-04-30', 30000, 'Completed'),
        (4, 'ERP Implementation', '2023-05-01', None, 100000, 'Active'),
        (5, 'Product Launch', '2023-06-01', None, 60000, 'On Hold')
    ]
    cursor.executemany('INSERT INTO projects VALUES (?, ?, ?, ?, ?, ?)', projects)
    
    # 插入员工项目关联数据
    employee_projects = [
        (1, 1, 'Lead Developer', 120),
        (2, 1, 'Project Manager', 40),
        (3, 1, 'Developer', 80),
        (1, 2,<

网站公告

今日签到

点亮在社区的每一天
去签到