PyQt6 Example_Batch PDF Download Tool_Batch PDF URL Collection

Published: 2025-04-02

Contents

Preface:

Steps:

step one Install packages

step two Get the stock codes

step three Write the code

step four Convert page URLs to PDF URLs

Videos


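Preface:

1 Posts in this series have titles that start with "PyQt6 Example_Batch PDF Download Tool" and are filed under the [PyQt6 Examples] column.
2 This post, "Batch PDF URL Collection", does not touch any PyQt6 topics; it is one step in building the batch PDF download tool.
3 The batch PDF download tool example uses downloading PDF files from cninfo (巨潮) as its scenario, so the PDF URLs are collected from cninfo.
4 Videos for this series are recorded on Bilibili, and the links are at the end of this post. I still recommend reading the post first and turning to the videos only for the parts that remain unclear; that is more efficient and saves time.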

Steps:

step one Install packages

1 Create a new project and a virtual environment

2 Install the package: pip install akshare

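step two Get the stock codes

In TongDaXin (通达信), go to Quotes -> A-shares, type "34" and press Enter

Take the code column and save it to a txt file, one code per line (the code in step three reads it as stock_ticker.txt)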
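An exported column often carries a header row, blank lines, or duplicates, so it can help to clean the list before handing it to the download code. Below is a minimal sketch of that cleanup. The names parse_tickers and load_tickers are hypothetical helpers, not part of the original code, and it assumes the file is UTF-8 with one six-digit code per line (if the export is in another encoding such as GBK, the encoding argument has to change).

```python
import re
from pathlib import Path

# A-share codes are six ASCII digits; headers and blank lines do not match.
TICKER_PATTERN = re.compile(r"[0-9]{6}")


def parse_tickers(text: str) -> list[str]:
    """Return the unique six-digit codes found in text, in their original order."""
    seen = set()
    tickers = []
    for line in text.splitlines():
        code = line.strip()
        if TICKER_PATTERN.fullmatch(code) and code not in seen:
            seen.add(code)
            tickers.append(code)
    return tickers


def load_tickers(path: str) -> list[str]:
    """Read a ticker file (assumed UTF-8, one code per line) and clean it."""
    return parse_tickers(Path(path).read_text(encoding="utf-8"))


if __name__ == "__main__":
    # Sample text with a header line, a blank line, a duplicate and padding.
    sample = "代码\n000001\n600000\n\n000001\n  300750  \n"
    print(parse_tickers(sample))  # ['000001', '600000', '300750']
```

With this in place, load_tickers('./stock_ticker.txt') could stand in for the manual read-and-split in step three.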

step three Write the code

import akshare as ak
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime

# Announcement categories (the values are passed to the API in Chinese, as-is):
# {'年报', '半年报', '一季报', '三季报', '业绩预告', '权益分派',
#     '董事会', '监事会', '股东大会', '日常经营', '公司治理', '中介报告',
#      '首发', '增发', '股权激励', '配股', '解禁', '公司债', '可转债', '其他融资',
#      '股权变动', '补充更正', '澄清致歉', '风险提示', '特别处理和退市', '退市整理期'}
def req_from_ak(thread_num: int, stock_ticker_list: list):
    category_str = '权益分派'  # equity distribution announcements
    end_date_str = '20250329'
    pre_dir = r'E:/temp003/'  # output directory; it must already exist
    print(f'thread {thread_num} start.')
    for symbol_str in stock_ticker_list:
        try:
            df = ak.stock_zh_a_disclosure_report_cninfo(symbol=symbol_str, market="沪深京",
                                                        category=category_str,
                                                        start_date="20000101",
                                                        end_date=end_date_str)
            df.to_excel(pre_dir + symbol_str + '.xlsx', engine='openpyxl')
        except Exception as e:
            # Report the failed ticker and continue with the rest of the list
            print(symbol_str, e)
    # Log the finish time down to the second
    print(f'thread {thread_num} execute end. {datetime.now().strftime("%Y-%m-%d %H:%M:%S")}')

def start_execute():
    with open('./stock_ticker.txt', mode='r', encoding='utf-8') as fr:
        contents = fr.read()
    # One ticker per line; skip blank lines such as a trailing newline
    stock_ticker_list = [line.strip() for line in contents.splitlines() if line.strip()]
    print(len(stock_ticker_list))
    thread_count = 5
    interval = len(stock_ticker_list) // thread_count
    if interval == 0:
        # Fewer tickers than threads: a single thread handles everything
        thread_count = 1
    params_list = []
    thread_num_list = []
    for i in range(0, thread_count):
        if i == thread_count - 1:
            # The last thread also takes the remainder
            pre_list = stock_ticker_list[i*interval:]
        else:
            pre_list = stock_ticker_list[i*interval:i*interval+interval]
        thread_num_list.append(i)
        params_list.append(pre_list)
    with ThreadPoolExecutor(max_workers=thread_count) as executor:
        executor.map(req_from_ak, thread_num_list, params_list)
    # Leaving the with block waits for every submitted task to finish
    print('All thread pool tasks have finished')


if __name__ == '__main__':
    start_execute()

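Using multiple threads makes the retrieval faster: the ticker list is split into one slice per thread, and each thread requests the announcements for its own slice.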
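The slicing above gives the whole remainder to the last thread, so with 9 tickers and 5 threads the slices have sizes 1, 1, 1, 1 and 5. For a list of several thousand tickers the difference is negligible, but if a more even split is wanted, the sketch below is one way to do it. split_evenly and process_chunk are hypothetical names used only for illustration, and process_chunk is a stand-in that counts items instead of calling akshare.

```python
from concurrent.futures import ThreadPoolExecutor


def split_evenly(items: list, parts: int) -> list:
    """Split items into at most `parts` contiguous chunks whose sizes differ by at most one."""
    if parts < 1:
        raise ValueError("parts must be at least 1")
    # Never create more chunks than items, but always create at least one.
    parts = min(parts, len(items)) or 1
    base, extra = divmod(len(items), parts)
    chunks = []
    start = 0
    for i in range(parts):
        # The first `extra` chunks each take one additional item.
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


def process_chunk(thread_num: int, chunk: list):
    # Stand-in for the real download function; it only counts the items.
    return thread_num, len(chunk)


if __name__ == "__main__":
    tickers = [f"{n:06d}" for n in range(1, 10)]  # nine made-up codes
    chunks = split_evenly(tickers, 5)
    with ThreadPoolExecutor(max_workers=len(chunks)) as executor:
        # list() consumes the results, so an exception raised in a worker surfaces here.
        results = list(executor.map(process_chunk, range(len(chunks)), chunks))
    print(results)  # [(0, 2), (1, 2), (2, 2), (3, 2), (4, 1)]
```

Wrapping executor.map in list() is a deliberate choice: the results are otherwise never read, and any exception raised inside a worker would go unnoticed.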

In each exported Excel file, the 公告链接 (announcement link) column is the one the next step uses.

step four Convert page URLs to PDF URLs

import os
import pandas as pd

def trans_url_to_pdfurl():
    pre_dir = r'E:/temp003/'  # Excel files produced in step three
    tar_dir = r'E:/temp005/'  # output directory; it must already exist
    file_list = os.listdir(pre_dir)
    for file_one in file_list:
        ticker = file_one[0:6]  # file names start with the six-digit code
        pre_file_path = pre_dir + file_one
        df = pd.read_excel(pre_file_path, engine='openpyxl')
        url_list = df['公告链接'].to_list()
        pdf_url_list = []
        for u_one in url_list:
            # Split the query string: the second item carries announcementId,
            # the last item carries announcementTime
            u_one_00 = u_one.split('&')
            node_00 = u_one_00[1].replace('announcementId=', '')
            node_01 = u_one_00[-1].replace('announcementTime=', '')
            node_01 = node_01[0:10]  # keep only the YYYY-MM-DD part
            tar_node = f'http://static.cninfo.com.cn/finalpage/{node_01}/{node_00}.PDF'
            pdf_url_list.append(tar_node)
        pdf_url_list_str = '\n'.join(pdf_url_list)
        # tar_dir already ends with a slash
        with open(f'{tar_dir}{ticker}.txt', mode='w', encoding='utf-8') as fw:
            fw.write(pdf_url_list_str)

if __name__ == '__main__':
    trans_url_to_pdfurl()
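The conversion above depends on the position of each parameter in the link, so it would break if the parameter order ever changed. A variant that looks the parameters up by name with the standard library's urllib.parse could look like the sketch below. to_pdf_url is a hypothetical helper, and the sample link is made up: its host, path and values are assumptions used only to show the shape, while the parameter names announcementId and announcementTime come from the code above.

```python
from urllib.parse import parse_qs, urlparse

PDF_BASE = "http://static.cninfo.com.cn/finalpage"


def to_pdf_url(detail_url: str) -> str:
    """Build the PDF URL from an announcement link by reading query parameters by name."""
    query = parse_qs(urlparse(detail_url).query)
    try:
        announcement_id = query["announcementId"][0]
        announcement_time = query["announcementTime"][0]
    except KeyError as missing:
        raise ValueError(f"missing query parameter {missing} in {detail_url!r}") from None
    # Keep only the YYYY-MM-DD part in case a time of day follows the date.
    return f"{PDF_BASE}/{announcement_time[:10]}/{announcement_id}.PDF"


if __name__ == "__main__":
    # Made-up link for illustration only; real links come from the 公告链接 column.
    sample = ("http://www.cninfo.com.cn/new/disclosure/detail"
              "?stockCode=000001&announcementId=1234567890"
              "&orgId=gssz0000001&announcementTime=2025-03-15")
    print(to_pdf_url(sample))
    # http://static.cninfo.com.cn/finalpage/2025-03-15/1234567890.PDF
```

Inside trans_url_to_pdfurl, the loop body could then shrink to pdf_url_list.append(to_pdf_url(u_one)).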

At this point, the PDF URLs that the batch PDF download tool will download from are ready.

Videos

https://www.bilibili.com/video/BV1ASZwYhEGn/
https://www.bilibili.com/video/BV1oEZwYDE6N/
https://www.bilibili.com/video/BV1wuZwYZEJe/
https://www.bilibili.com/video/BV1XtZwYyEo4/