Inserting DOIs directly into a Word document and generating the reference list automatically

Published: 2025-04-02

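This script processes a Word document by extracting the DOIs it contains, looking each one up, replacing it with a citation number, and generating a reference list. It can be divided into the following functional modules: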

1. Importing modules
import re
from docx import Document
from docx.oxml.ns import qn
from habanero import Crossref

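The script imports the regular-expression module re for text pattern matching, the Document class from the python-docx library for working with Word documents, the qn function for handling XML namespaces (not actually used in this code), and the Crossref class from the habanero library, which looks up reference metadata by DOI.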
2. DOI extraction function

def extract_dois(text):
    doi_pattern = r'(10\.\d{4,9}/[-._;()/:A-Z0-9]+)'
    return re.findall(doi_pattern, text, re.IGNORECASE)

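extract_dois takes a text argument, matches DOI-shaped strings with the regular expression doi_pattern, and returns the list of all matches found by re.findall, ignoring case. Because the character class includes '.', ';', ':' and ')', punctuation that immediately follows a DOI in running text is captured as part of the match.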
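A sentence such as "see 10.1000/xyz123." ends with a full stop that the pattern above accepts as part of the DOI. The following is a minimal sketch of one way to trim that, not part of the original script: extract_dois_clean is a hypothetical helper name, and the second DOI in the sample is made up for illustration. It assumes that a real DOI does not end in one of the stripped characters, which does not hold for every publisher.

```python
import re

# Same pattern as extract_dois, compiled once
DOI_PATTERN = re.compile(r'10\.\d{4,9}/[-._;()/:A-Z0-9]+', re.IGNORECASE)

def extract_dois_clean(text):
    """Find DOI-shaped strings and drop trailing sentence punctuation."""
    # rstrip removes any run of these characters from the right-hand end only
    return [match.rstrip('.,;:)') for match in DOI_PATTERN.findall(text)]

sample = "See 10.1000/xyz123. Also (10.1234/abc-def)."
print(extract_dois_clean(sample))
```

If a document is known to contain DOIs that really end in a closing parenthesis, the stripped set should be narrowed accordingly.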
3. Reference lookup function

def get_reference(doi):
    cr = Crossref()
    try:
        result = cr.works(ids=doi)
        if 'message' in result:
            message = result['message']
            # Extract author information
            authors = []
            if 'author' in message:
                for author in message['author']:
                    if 'family' in author and 'given' in author:
                        last_name = author['family']
                        first_initial = author['given'][0] if author['given'] else ''
                        authors.append(f"{last_name}, {first_initial}.")
            author_str = ', '.join(authors)
            # Extract the year, title, and other fields
            year = message['issued']['date-parts'][0][0] if 'issued' in message and 'date-parts' in message['issued'] and message['issued']['date-parts'] else 'n.d.'
            title = message['title'][0] if 'title' in message and message['title'] else 'No title'
            journal = message['container-title'][0] if 'container-title' in message and message['container-title'] else 'No journal'
            volume = message['volume'] if 'volume' in message else 'No volume'
            issue = message['issue'] if 'issue' in message else 'No issue'
            pages = message['page'] if 'page' in message else 'No pages'
            reference = f"{author_str} ({year}). {title}. {journal}, {volume}({issue}), {pages}. doi:{doi}"
            return reference
        else:
            return None
    except Exception:
        return None

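get_reference takes a DOI, creates a Crossref instance cr, and queries the metadata for that DOI. If the result contains a message field, the function extracts the authors, year, title, journal, volume, issue, and pages from it and formats them as an APA-style reference string, which it returns. Authors without both a family and a given name are skipped, and each missing field is replaced by a placeholder such as 'No volume'. If the lookup fails or any exception is raised, the function returns None.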
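The formatting step can be tried without a network call by separating it from the lookup. The sketch below is an illustration under stated assumptions rather than the original code: format_reference is a hypothetical helper, the sample dictionary is invented data shaped like a Crossref message, and the choices to omit absent volume, issue, and page fields and to write 'Unknown author' when no author is listed are design decisions of this sketch.

```python
def format_reference(message, doi):
    """Build an APA-like string from a Crossref-style 'message' dict."""
    authors = []
    for author in message.get('author', []):
        family = author.get('family')
        given = author.get('given', '')
        if family:
            # "Family, G." when a given name exists, otherwise the family name alone
            authors.append(f"{family}, {given[0]}." if given else family)
    author_str = ', '.join(authors) or 'Unknown author'

    # 'date-parts' is a list of [year, month, day] lists; the year may be missing
    date_parts = message.get('issued', {}).get('date-parts') or [[None]]
    year = date_parts[0][0] if date_parts[0] and date_parts[0][0] else 'n.d.'
    title = (message.get('title') or ['No title'])[0]
    journal = (message.get('container-title') or ['No journal'])[0]

    # Only include volume, issue, and pages that are actually present
    locator = message.get('volume', '')
    if message.get('issue'):
        locator += f"({message['issue']})"
    tail = ', '.join(p for p in (journal, locator, message.get('page', '')) if p)
    return f"{author_str} ({year}). {title}. {tail}. doi:{doi}"

# Invented sample data, for illustration only
sample = {
    'author': [{'family': 'Doe', 'given': 'Jane'}, {'family': 'Roe', 'given': 'Richard'}],
    'issued': {'date-parts': [[2020, 5]]},
    'title': ['An example title'],
    'container-title': ['Journal of Examples'],
    'volume': '12',
    'page': '34-56',
}
print(format_reference(sample, '10.1234/example'))
```

Keeping the formatter free of network access also makes it straightforward to unit-test against saved responses.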
4. Main processing function

def convert_dois_in_word(input_file, output_file):
    doc = Document(input_file)
    all_dois = []
    doi_original_index = {}
    index = 1
    # Extract every DOI in the document and number it
    for paragraph in doc.paragraphs:
        dois = extract_dois(paragraph.text)
        for doi in dois:
            if doi not in all_dois:
                all_dois.append(doi)
                doi_original_index[doi] = index
                index += 1
    references = []
    successful_dois = []
    failed_dois = []
    # Look up the reference for each DOI
    for doi in all_dois:
        reference = get_reference(doi)
        if reference:
            references.append(reference)
            successful_dois.append(doi)
        else:
            failed_dois.append(doi)
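    # Replace each resolved DOI in the document with a superscript citation number
    for paragraph in doc.paragraphs:
        for doi in successful_dois:
            index = successful_dois.index(doi) + 1
            for run in list(paragraph.runs):
                if doi not in run.text:
                    continue
                parts = run.text.split(doi)
                run.text = parts[0]
                anchor = run
                for part in parts[1:]:
                    cite_run = paragraph.add_run(f"[{index}]")
                    cite_run.font.superscript = True
                    tail_run = paragraph.add_run(part)
                    # add_run appends at the end of the paragraph, so move the new
                    # runs to sit directly after the run that contained the DOI
                    # (_r is the run's underlying XML element, a private attribute)
                    anchor._r.addnext(cite_run._r)
                    cite_run._r.addnext(tail_run._r)
                    anchor = tail_run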
    # Append the reference list at the end of the document
    doc.add_page_break()
    doc.add_heading('References', level=1)
    for i, reference in enumerate(references, start=1):
        doc.add_paragraph(f"[{i}] {reference}")
    doc.save(output_file)
    # Print the conversion results
    print("Successfully converted DOIs:")
    for doi in successful_dois:
        print(doi)
    print("\nDOIs that failed to convert:")
    for doi in failed_dois:
        original_index = doi_original_index[doi]
        print(f"{original_index}. {doi}")

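convert_dois_in_word takes the input and output file paths, input_file and output_file. It opens the input Word document, walks through the paragraphs to extract all DOIs, and numbers and stores each unique DOI. It then tries to fetch the reference for each DOI, keeping successful and failed DOIs in separate lists. Next it walks through the paragraphs again and replaces every successfully resolved DOI with a superscript citation number. The replacement works run by run, so a DOI that Word has split across several runs is left unchanged, as are DOIs whose lookup failed. Finally it appends a page break, a "References" heading, and the formatted reference list to the end of the document, saves the document, and prints the DOIs that were converted and those that failed.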
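The numbering logic can be checked on plain strings before involving a Word file. This is a minimal sketch with assumptions stated up front: number_citations is a hypothetical helper, the strings stand in for the text of paragraphs or runs, the DOIs are made up, and matching longer DOIs first is an addition of this sketch to cover the case where one DOI is a prefix of another.

```python
import re

def number_citations(paragraphs, resolved_dois):
    """Replace each resolved DOI in the text with a marker such as [1]."""
    # Citation numbers follow the order of the reference list
    numbers = {doi: i for i, doi in enumerate(resolved_dois, start=1)}
    if not numbers:
        return list(paragraphs)
    # Try longer DOIs first so a DOI that is a prefix of another cannot clobber it
    ordered = sorted(numbers, key=len, reverse=True)
    pattern = re.compile('|'.join(re.escape(doi) for doi in ordered))
    return [pattern.sub(lambda m: f"[{numbers[m.group(0)]}]", text)
            for text in paragraphs]

paragraphs = [
    "Earlier work 10.1234/abc showed this.",
    "Later, 10.1234/abcd and 10.1234/abc disagreed.",
]
resolved = ["10.1234/abc", "10.1234/abcd"]
for line in number_citations(paragraphs, resolved):
    print(line)
```

Superscript formatting is deliberately left out here, since it depends on the Word run objects rather than on the text itself.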
5. Usage example

input_file = 'input.docx'
output_file = 'output.docx'
convert_dois_in_word(input_file, output_file)

Set the input and output file paths, then call convert_dois_in_word to convert the DOIs in the Word document and generate the reference list.