I was using the spaCy library to run entity recognition on text data, with the following code:
import spacy
# Load a pipeline using the name of an installed package, a string path or a Path-like object.
nlp = spacy.load("en_core_web_sm")
text = "Apple is looking at buying U.K. startup for $1 billion"
doc = nlp(text)
for ent in doc.ents:
    print(ent.text, ent.label_)
Running it in the Python IDLE Shell produced the following error:
Traceback (most recent call last):
File "D:\source\python demo\entity-recognition.py", line 3, in <module>
nlp = spacy.load("en_core_web_sm") # load a pipelinef using the name of an installed package, a string path or a Path-like object.
File "C:\Users\JX\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.12_qbz5n2kfra8p0\LocalCache\local-packages\Python312\site-packages\spacy\__init__.py", line 51, in load
return util.load_model(
File "C:\Users\JX\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.12_qbz5n2kfra8p0\LocalCache\local-packages\Python312\site-packages\spacy\util.py", line 472, in load_model
raise IOError(Errors.E050.format(name=name))
OSError: [E050] Can't find model 'en_core_web_sm'. It doesn't seem to be a Python package or a valid path to a data directory.
Error message: OSError: [E050] Can't find model 'en_core_web_sm'. It doesn't seem to be a Python package or a valid path to a data directory.
In other words, Python cannot find a package named "en_core_web_sm" in the current environment.
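Before installing anything, it can help to confirm that the package really is missing from the interpreter that runs the script. The following is a minimal sketch using only the standard library; the helper name `is_installed` is my own and not part of spaCy.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if a top-level package can be imported by this interpreter."""
    return importlib.util.find_spec(package_name) is not None


# Prints False while the model package is missing, True once it is installed.
print(is_installed("en_core_web_sm"))
```

Run this from the same interpreter as the failing script (here, IDLE), because a model installed for a different Python installation will not be visible.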
【Solution and steps】:
1. Check the installed spaCy version. In cmd, run:
>pip show spacy
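The current version shown is 3.8.4.
2. Find the en_core_web_sm model that matches the spaCy version
Go to the en_core_web_sm download page, find the link for the matching release, en_core_web_sm-3.8.0, and download the package en_core_web_sm-3.8.0.tar.gz. Model versions follow spaCy's major.minor version, so spaCy 3.8.4 pairs with a 3.8.x model.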
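The version matching in this step can be sketched as a small check. It assumes the spaCy convention that a model's major.minor version equals the major.minor version of the spaCy release it targets; the function names are hypothetical helpers, not spaCy APIs.

```python
def major_minor(version: str) -> tuple[int, int]:
    """Extract (major, minor) from a version string such as '3.8.4'."""
    major, minor, *_ = version.split(".")
    return int(major), int(minor)


def versions_match(spacy_version: str, model_version: str) -> bool:
    """Assumed rule: the model fits if major and minor both agree."""
    return major_minor(spacy_version) == major_minor(model_version)


print(versions_match("3.8.4", "3.8.0"))  # True: the pair used in this post
print(versions_match("3.8.4", "3.7.1"))  # False: model built for spaCy 3.7
```

The installed spaCy version can also be read programmatically with `importlib.metadata.version("spacy")` instead of `pip show spacy`.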
3. Install en_core_web_sm-3.8.0. In cmd, run:
pip install XXX\en_core_web_sm-3.8.0.tar.gz
Note: XXX stands for the directory where en_core_web_sm-3.8.0.tar.gz is saved; adjust it to your own setup.
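If several Python installations exist on the machine, a bare `pip` may belong to a different interpreter than the one that runs the script. One way to avoid that, sketched below, is to build the install command around `sys.executable`. The helper name `pip_install_command` is hypothetical, and the archive path must be replaced with your own.

```python
import sys


def pip_install_command(archive_path: str) -> list[str]:
    """Build a pip command bound to the interpreter running this code."""
    return [sys.executable, "-m", "pip", "install", archive_path]


# Example path taken from the install log in this post; adjust as needed.
command = pip_install_command(r"f:\chromedownload\en_core_web_sm-3.8.0.tar.gz")
print(command)

# To actually run it (not executed here):
# import subprocess
# subprocess.run(command, check=True)
```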
A successful installation prints:
Defaulting to user installation because normal site-packages is not writeable
Processing f:\chromedownload\en_core_web_sm-3.8.0.tar.gz
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: en_core_web_sm
Building wheel for en_core_web_sm (pyproject.toml) ... done
Created wheel for en_core_web_sm: filename=en_core_web_sm-3.8.0-py3-none-any.whl size=12806171 sha256=4234cf698b46566be25d426394d8d095dee4b7563936c44c6714d8cf56619469
Stored in directory: c:\users\jx\appdata\local\pip\cache\wheels\90\e0\2a\2251f0107678422c64ebd606676a42192a19277237c4575e03
Successfully built en_core_web_sm
Installing collected packages: en_core_web_sm
Successfully installed en_core_web_sm-3.8.0
4. Run the entity recognition code from the beginning of this post again. It prints:
Apple ORG
U.K. GPE
$1 billion MONEY
This shows that entity recognition on the text now works.
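To get a clearer message if the model goes missing again (for example in a fresh environment), the load call can be wrapped. This is only a sketch: `load_pipeline` and its `loader` parameter are my own additions so the wrapper can be exercised without spaCy; by default it falls back to `spacy.load`.

```python
def load_pipeline(name: str, loader=None):
    """Load a pipeline, turning the E050 OSError into a readable hint."""
    if loader is None:
        import spacy  # imported lazily; only needed for the default loader

        loader = spacy.load
    try:
        return loader(name)
    except OSError as err:
        raise RuntimeError(
            f"Pipeline {name!r} is not installed. Install a release of it "
            "that matches your spaCy version, then run the script again."
        ) from err
```

In the original script, `nlp = spacy.load("en_core_web_sm")` would become `nlp = load_pipeline("en_core_web_sm")`.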
【Author's note】: Many blog posts online install en_core_web_sm directly with the command:
python -m spacy download en_core_web_sm
and the official documentation recommends the same. When I tried it myself, I got the following error:
Traceback (most recent call last):
File "C:\Users\JX\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.12_qbz5n2kfra8p0\LocalCache\local-packages\Python312\site-packages\urllib3\connection.py", line 199, in _new_conn
sock = connection.create_connection(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\JX\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.12_qbz5n2kfra8p0\LocalCache\local-packages\Python312\site-packages\urllib3\util\connection.py", line 60, in create_connection
for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.2544.0_x64__qbz5n2kfra8p0\Lib\socket.py", line 978, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
socket.gaierror: [Errno 11004] getaddrinfo failed

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "C:\Users\JX\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.12_qbz5n2kfra8p0\LocalCache\local-packages\Python312\site-packages\urllib3\connectionpool.py", line 789, in urlopen
response = self._make_request(
^^^^^^^^^^^^^^^^^^^
File "C:\Users\JX\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.12_qbz5n2kfra8p0\LocalCache\local-packages\Python312\site-packages\urllib3\connectionpool.py", line 490, in _make_request
raise new_e......
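As the socket.gaierror line shows, the root cause is a Winsock error. On Windows, Errno 11004 is WSANO_DATA: the host name could not be resolved to an address, which means the DNS lookup failed. It is not WSAETIMEDOUT (a connection timeout, error 10060). I suspect my own network was the problem, so the download could not reach the GitHub server.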
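Since the traceback fails inside `socket.getaddrinfo`, a quick way to check the network is to test whether a host name resolves at all. The sketch below uses only the standard library; `can_resolve` is a hypothetical helper, and `github.com` is an assumed example host, not necessarily the exact one the download command contacts.

```python
import socket


def can_resolve(host: str, port: int = 443) -> bool:
    """Return True if the host name resolves to at least one address."""
    try:
        socket.getaddrinfo(host, port)
    except socket.gaierror:
        # Same exception type as in the traceback above.
        return False
    return True


# Example call (not executed here, since it needs network access):
# print(can_resolve("github.com"))
```

If this returns False, the download command will likely fail the same way, and downloading the .tar.gz manually as in the steps above is a workable alternative.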