Fixing the "Can't find model 'en_core_web_sm'" error

Published: 2025-03-11

I used the spaCy library to run named entity recognition on a piece of text, with the following code:

import spacy

# Load a pipeline using the name of an installed package, a string path or a Path-like object.
nlp = spacy.load("en_core_web_sm")

text = "Apple is looking at buying U.K. startup for $1 billion"
doc = nlp(text)
for ent in doc.ents:
    print(ent.text, ent.label_)

Running it in the Python IDLE Shell produced the following error:

Traceback (most recent call last):
  File "D:\source\python demo\entity-recognition.py", line 3, in <module>
    nlp = spacy.load("en_core_web_sm") # load a pipelinef using the name of an installed package, a string path or a Path-like object.
  File "C:\Users\JX\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.12_qbz5n2kfra8p0\LocalCache\local-packages\Python312\site-packages\spacy\__init__.py", line 51, in load
    return util.load_model(
  File "C:\Users\JX\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.12_qbz5n2kfra8p0\LocalCache\local-packages\Python312\site-packages\spacy\util.py", line 472, in load_model
    raise IOError(Errors.E050.format(name=name))
OSError: [E050] Can't find model 'en_core_web_sm'. It doesn't seem to be a Python package or a valid path to a data directory.

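Error message: OSError: [E050] Can't find model 'en_core_web_sm'. It doesn't seem to be a Python package or a valid path to a data directory.

In other words, spaCy cannot find the Python package "en_core_web_sm". The model is distributed as a separate package, so installing spaCy alone does not install it.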
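Before installing anything, it can help to confirm which of the two packages is actually missing from the interpreter that runs the script. The following is only a minimal sketch using the standard library; the helper name `is_importable` is made up for illustration and is not part of spaCy.

```python
import importlib.util


def is_importable(package_name: str) -> bool:
    """Return True if the current interpreter can locate the top-level package."""
    # find_spec returns None when no top-level package with this name is found.
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # spaCy itself and the model are two separate packages.
    for name in ("spacy", "en_core_web_sm"):
        print(name, "found" if is_importable(name) else "missing")
```

If "spacy" is found but "en_core_web_sm" is missing, the situation matches the E050 error above.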

【Solution and steps】

1. Check the installed spaCy version by entering this command in cmd:

>pip show spacy

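The current version shown is: 3.8.4

2. Find the en_core_web_sm package that matches the spaCy version

On the en_core_web_sm download page, find the link for the matching version, en_core_web_sm-3.8.0, and download the package en_core_web_sm-3.8.0.tar.gz. The model version only has to agree with spaCy in its first two numbers, so model 3.8.0 is the right match for spaCy 3.8.4.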
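The version matching in this step can be sketched in Python. This is a rough illustration that assumes a model fits when its major.minor version equals spaCy's; the function names `major_minor`, `is_compatible` and `installed_version` are hypothetical helpers, not spaCy APIs.

```python
from __future__ import annotations

from importlib import metadata


def major_minor(version: str) -> tuple[int, int]:
    """Return the (major, minor) part of a version string such as '3.8.4'."""
    major, minor, *_ = version.split(".")
    return int(major), int(minor)


def is_compatible(spacy_version: str, model_version: str) -> bool:
    """Assumed rule: the model fits when major.minor equals spaCy's major.minor."""
    return major_minor(spacy_version) == major_minor(model_version)


def installed_version(distribution: str) -> str | None:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Versions taken from this post: spaCy 3.8.4 and en_core_web_sm 3.8.0.
    print(is_compatible("3.8.4", "3.8.0"))
    print("installed spaCy:", installed_version("spacy"))
```

`installed_version("spacy")` reads the same version number that `pip show spacy` prints, provided it runs in the same Python environment.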

3. Install en_core_web_sm-3.8.0 by entering this command in cmd:

pip install XXX\en_core_web_sm-3.8.0.tar.gz

Note: XXX stands for the directory where en_core_web_sm-3.8.0.tar.gz is stored; replace it with your actual path.

A successful installation prints:

Defaulting to user installation because normal site-packages is not writeable
Processing f:\chromedownload\en_core_web_sm-3.8.0.tar.gz
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: en_core_web_sm
  Building wheel for en_core_web_sm (pyproject.toml) ... done
  Created wheel for en_core_web_sm: filename=en_core_web_sm-3.8.0-py3-none-any.whl size=12806171 sha256=4234cf698b46566be25d426394d8d095dee4b7563936c44c6714d8cf56619469
  Stored in directory: c:\users\jx\appdata\local\pip\cache\wheels\90\e0\2a\2251f0107678422c64ebd606676a42192a19277237c4575e03
Successfully built en_core_web_sm
Installing collected packages: en_core_web_sm
Successfully installed en_core_web_sm-3.8.0
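When several Python installations exist on one machine, a bare `pip` may belong to a different interpreter than the one running the script. As a hedged alternative to the cmd command above, the same install can be launched through `sys.executable`. The helper names and the example path below are placeholders for illustration.

```python
from __future__ import annotations

import subprocess
import sys
from pathlib import Path


def build_install_command(archive: Path) -> list[str]:
    """Build a pip command that targets the interpreter running this script."""
    return [sys.executable, "-m", "pip", "install", str(archive)]


def install_local_archive(archive: Path) -> None:
    """Install a downloaded model archive, failing early if the file is missing."""
    if not archive.is_file():
        raise FileNotFoundError(f"Archive not found: {archive}")
    subprocess.check_call(build_install_command(archive))


if __name__ == "__main__":
    # Placeholder path: point this at wherever the archive was downloaded.
    model_archive = Path("en_core_web_sm-3.8.0.tar.gz")
    print(" ".join(build_install_command(model_archive)))
```

Calling `install_local_archive(model_archive)` would then run the actual installation; the sketch only prints the command so that nothing is installed by accident.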

4. Run the entity recognition code from the beginning of this post again. It prints:

Apple ORG
U.K. GPE
$1 billion MONEY

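This shows that entity recognition on the text succeeded.

【Author's note】Many blog posts online install en_core_web_sm directly with the command: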

python -m spacy download en_core_web_sm

and the official documentation recommends this as well. When I tried it myself, I got the following error:

Traceback (most recent call last):
  File "C:\Users\JX\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.12_qbz5n2kfra8p0\LocalCache\local-packages\Python312\site-packages\urllib3\connection.py", line 199, in _new_conn
    sock = connection.create_connection(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\JX\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.12_qbz5n2kfra8p0\LocalCache\local-packages\Python312\site-packages\urllib3\util\connection.py", line 60, in create_connection
    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.2544.0_x64__qbz5n2kfra8p0\Lib\socket.py", line 978, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
socket.gaierror: [Errno 11004] getaddrinfo failed

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\JX\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.12_qbz5n2kfra8p0\LocalCache\local-packages\Python312\site-packages\urllib3\connectionpool.py", line 789, in urlopen
    response = self._make_request(
               ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\JX\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.12_qbz5n2kfra8p0\LocalCache\local-packages\Python312\site-packages\urllib3\connectionpool.py", line 490, in _make_request
    raise new_e

......

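As the socket.gaierror line in the traceback shows (it was highlighted in red in the original post), the root cause is a Winsock error raised during name resolution: [Errno 11004] getaddrinfo failed. On Windows, error 11004 is WSANO_DATA, which means the host name could not be resolved to an address. It is not WSAETIMEDOUT (10060, connection timed out). The likely cause is my network: the machine could not resolve the GitHub server that hosts the model package, so the download never started.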
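Since `getaddrinfo` in the traceback is the name-resolution step, one quick way to check the network is to try resolving the host from Python. This is a minimal sketch; the function name `can_resolve` is hypothetical, and using "github.com" as the host is an assumption based on the GitHub server mentioned above.

```python
import socket


def can_resolve(host: str, port: int = 443) -> bool:
    """Return True if the host name can be resolved to at least one address."""
    try:
        # Same lookup that failed in the traceback (socket.getaddrinfo).
        return len(socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)) > 0
    except socket.gaierror:
        # Name resolution failed, for example Errno 11004 on Windows.
        return False


if __name__ == "__main__":
    # Assumed host: the download is served from GitHub according to this post.
    host = "github.com"
    print(host, "resolves" if can_resolve(host) else "does not resolve")
```

If the host does not resolve, `python -m spacy download` will most likely fail in the same way, and installing from a manually downloaded archive, as in the steps above, is a reasonable workaround.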