Crawl4ai Hands-on Practice 2

Published: 2025-06-24 ⋅ Views: 19 ⋅ Likes: 0


Chapter 1 - Basic Form

1.1 - Basic Type

import asyncio  # async runtime
from crawl4ai import AsyncWebCrawler  # web crawling tool
import os

OUTPUT_PATH = './outputs/markdown/'

def output_md(base_filename, md_str):
    # Create the output directory if it does not exist
    os.makedirs(OUTPUT_PATH, exist_ok=True)

    # Build a filename that embeds the content length
    length = len(md_str)
    name, ext = os.path.splitext(base_filename)
    filename = f"{name}({length}){ext}"

    # Full output path
    full_path = os.path.join(OUTPUT_PATH, filename)

    with open(full_path, 'w', encoding='utf-8') as f:
        f.write(md_str)

    print(f"Saved to: {full_path}")

# Fetch the page content asynchronously
async def main(output_filename):
    # Create the crawler; the async context manager guarantees the crawler
    # is closed and its resources released when the block exits
    async with AsyncWebCrawler() as crawler:
        # Visit the URL and wait for the response (await suspends here
        # until the fetch completes, then execution continues below)
        result = await crawler.arun("https://www.anthropic.com/news/agent-capabilities-api")

        # Print the crawl result
        print("Markdown length:", len(result.markdown))
        print(result.markdown[:300])

        # Save to a .md file
        output_md(output_filename, result.markdown)

# Start the async program
asyncio.run(main('1_1_Basic.md'))
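The `async with AsyncWebCrawler() as crawler:` line works because the crawler implements the async context-manager protocol (`__aenter__`/`__aexit__`): setup runs on entry, cleanup runs on exit even if the body raises. A minimal stdlib-only sketch of that same pattern, where `ManagedResource` and `fetch` are made-up stand-ins and not part of crawl4ai:

```python
import asyncio

class ManagedResource:
    async def __aenter__(self):
        print("opened")  # resource acquired (e.g. a browser launched)
        return self

    async def __aexit__(self, exc_type, exc, tb):
        print("closed")  # guaranteed cleanup, even if the body raised

    async def fetch(self, url):
        await asyncio.sleep(0)  # stand-in for real network I/O
        return f"content of {url}"

async def demo():
    async with ManagedResource() as r:
        return await r.fetch("https://example.com")

print(asyncio.run(demo()))
```

`await` suspends `demo` at each async call, letting the event loop run other tasks in the meantime; `asyncio.run` creates the loop, drives the coroutine to completion, and tears the loop down, just as the script above does with `main`.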
PS E:\AI-lab\n8n> & D:/anaconda3/envs/crawl4ai-python311/python.exe e:/AI-lab/n8n/crawl4ai-1.py
[INIT].... → Crawl4AI 0.6.3 
[FETCH]... ↓ https://www.anthropic.com/news/agent-capabilities-api                                                || ⏱: 3.40s 
[SCRAPE].. ◆ https://www.anthropic.com/news/agent-capabilities-api                                                || ⏱: 0.03s 
[COMPLETE] ● https://www.anthropic.com/news/agent-capabilities-api                                                || ⏱: 3.43s
Markdown length: 10941
[Skip to main content](https://www.anthropic.com/news/agent-capabilities-api#main-content)[Skip to footer](https://www.anthropic.com/news/agent-capabilities-api#footer)
[](https://www.anthropic.com/)
  * Claude
  * API
  * Solutions
  * Research
  * Commitments
  * Learn
[News](https://www.anthropic
Saved to: ./outputs/markdown/1_1_Basic(10941).md
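The log line above shows the saved filename, 1_1_Basic(10941).md, with the markdown length embedded before the extension. The naming scheme from output_md can be isolated as a small pure function; `build_filename` here is a hypothetical helper for illustration, not part of crawl4ai:

```python
import os

def build_filename(base_filename, content):
    # Split "1_1_Basic.md" into ("1_1_Basic", ".md"), then embed the length
    name, ext = os.path.splitext(base_filename)
    return f"{name}({len(content)}){ext}"

print(build_filename('1_1_Basic.md', 'x' * 10941))  # → 1_1_Basic(10941).md
```

Embedding the length makes repeated runs easy to compare at a glance: if an extraction strategy changes, the byte count in the filename changes with it.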
