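Sticker images (表情包) show up in pretty much everyone's chats these days. Sometimes, after typing a message, sending it without a sticker just feels wrong. And it's not only chatting: even when I publish an article like this one, I add a few stickers.
When you chat with someone you've just met, dropping in a sticker every now and then seems to bring you closer faster ~
But saving stickers one at a time is a hassle. With Python you can save them all in one go!
Next, I'll show you how to save more than a thousand sticker images in one click with Python.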
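First, the development environment:
Python 3.6
PyCharm
plus the third-party packages requests and parsel (install them with pip install requests parsel).
Then comes the code.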
Import the modules:
import requests
import parsel
import re
import time
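The URL to request (page is the page number of the sticker listing):
page = 1  # e.g. the first listing page
url = f'fabiaoqing/biaoqing/lists/page/{page}.html'  # requests needs a full URL including the scheme (http:// or https://)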
Request headers (a browser User-Agent, so the request looks like it comes from a normal browser):
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36'
}
Fetch the page and get its HTML source:
response = requests.get(url=url, headers=headers)
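Parse the data:
selector = parsel.Selector(response.text)  # turn response.text into a Selector object
First extraction: grab every div tag that holds a sticker:
divs = selector.css('#container div.tagbqppdiv')  # CSS selectors pick out elements by id, tag and class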
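If you want to see what those selectors pick out without installing parsel or touching the network, here is a minimal sketch that does roughly the same job with the standard library's html.parser instead of parsel. The HTML snippet is invented for illustration; it only assumes the structure the selectors imply (a div with id="container" holding div.tagbqppdiv blocks, each with an img carrying data-original and title attributes), not the real site's markup. The class name EmojiParser is hypothetical.

```python
from html.parser import HTMLParser


class EmojiParser(HTMLParser):
    """Collect (data-original, title) from the first <img> inside each
    div.tagbqppdiv that sits under <div id="container"> (assumed structure)."""

    def __init__(self):
        super().__init__()
        self.div_depth = 0            # number of currently open <div>s
        self.container_depth = None   # depth at which #container was opened
        self.item_depth = None        # depth at which a tagbqppdiv was opened
        self.got_img = False          # mimic .get(): keep only the first img per item
        self.items = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'div':
            self.div_depth += 1
            if attrs.get('id') == 'container' and self.container_depth is None:
                self.container_depth = self.div_depth
            classes = (attrs.get('class') or '').split()
            if (self.container_depth is not None and self.item_depth is None
                    and 'tagbqppdiv' in classes):
                self.item_depth = self.div_depth
                self.got_img = False
        elif tag == 'img' and self.item_depth is not None and not self.got_img:
            self.items.append((attrs.get('data-original'), attrs.get('title')))
            self.got_img = True

    def handle_endtag(self, tag):
        if tag == 'div':
            # Close the item / container if this </div> matches the one that opened it
            if self.item_depth == self.div_depth:
                self.item_depth = None
            if self.container_depth == self.div_depth:
                self.container_depth = None
            self.div_depth -= 1


sample = '''
<div id="container">
  <div class="tagbqppdiv"><a href="#"><img data-original="http://example.com/a.jpg" title="happy"></a></div>
  <div class="tagbqppdiv"><a href="#"><img data-original="http://example.com/b.gif" title="sad?"></a></div>
</div>
<div class="tagbqppdiv"><img data-original="http://example.com/outside.png" title="ignored"></div>
'''

parser = EmojiParser()
parser.feed(sample)
parser.close()
print(parser.items)
```

The last div is skipped because it is not inside #container, which is exactly what the descendant selector '#container div.tagbqppdiv' expresses.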
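Loop over the divs. For each one, extract the image URL, the title and the file extension, clean up the title, then request the image to get its binary data:
for div in divs:
    # The image address is stored in the img tag's data-original attribute
    img_url = div.css('img::attr(data-original)').get()
    # The title
    title = div.css('img::attr(title)').get()
    # The image's file extension (everything after the last dot)
    name = img_url.split('.')[-1]
    # Replace characters that are not allowed in file names
    new_title = change_title(title)
    # Request the sticker image and get its binary data
    img_content = requests.get(url=img_url, headers=headers).content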
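Taking everything after the last dot works for plain URLs like .../abc.gif, but a query string (abc.JPG?w=200 gives 'JPG?w=200') or a path with no extension (http://example.com/bq/abc gives 'com/bq/abc', slash included) would leak stray text into the file name. A slightly more defensive sketch using only the standard library parses the URL first and takes the extension from the path alone; the function name image_extension and the 'jpg' fallback are arbitrary choices for illustration.

```python
import posixpath
from urllib.parse import urlsplit


def image_extension(img_url, default='jpg'):
    """Return the extension of an image URL without the leading dot.

    Only the URL's path is examined, so query strings and fragments are ignored.
    Falls back to `default` (an arbitrary choice) when the path has no extension.
    """
    path = urlsplit(img_url).path
    ext = posixpath.splitext(path)[1]  # e.g. '.gif', or '' if there is none
    return ext[1:].lower() if ext else default


print(image_extension('http://example.com/bq/abc.gif'))        # gif
print(image_extension('http://example.com/bq/abc.JPG?w=200'))  # jpg
print(image_extension('http://example.com/bq/abc'))            # jpg (fallback)
```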
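Save the data (the img folder must already exist):
def save(title, img_url, name):
    # Download the image and get its binary data
    img_content = requests.get(url=img_url, headers=headers).content
    try:
        with open('img/' + title + '.' + name, mode='wb') as f:
            # Write the image's binary data
            f.write(img_content)
        print('Saving:', title)
    except OSError as e:
        # Report names the OS still rejects instead of silently skipping them
        print('Failed to save:', title, e)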
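A save step like this needs the img folder to exist beforehand. As a minimal alternative sketch, pathlib can create the folder on demand and report failures; the function name save_image and its parameters are hypothetical, and in the scraper the bytes would come from requests.get(...).content. The demo writes fake bytes into a temporary directory so nothing is downloaded.

```python
import tempfile
from pathlib import Path


def save_image(title, ext, data, folder='img'):
    """Write image bytes to <folder>/<title>.<ext>, creating the folder if needed.

    Returns the written Path, or None if the operating system rejected the write.
    """
    target_dir = Path(folder)
    path = target_dir / f'{title}.{ext}'
    try:
        target_dir.mkdir(parents=True, exist_ok=True)  # no error if it already exists
        path.write_bytes(data)
    except OSError as exc:  # e.g. a name that is still invalid or too long
        print('Failed to save:', title, exc)
        return None
    print('Saving:', title)
    return path


# Demo: write fake image bytes into a throwaway temporary directory
with tempfile.TemporaryDirectory() as tmp:
    saved = save_image('happy', 'gif', b'GIF89a', folder=Path(tmp) / 'img')
    content = saved.read_bytes()
print(saved.name, content)
```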
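Replace special characters in the title: Windows file names cannot contain symbols such as \ / : * ? " < > |, so we replace them with underscores using a regular expression:
def change_title(title):
    mode = re.compile(r'[\\\/\:\*\?\"\<\>\|]')
    new_title = re.sub(mode, "_", title)
    return new_title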
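To see what that character class does, here is the same idea applied to a few made-up titles. This sketch uses the compiled pattern's own .sub method, which is equivalent to re.sub(mode, "_", title); the name safe_title and the 'untitled' fallback for a missing title (None) are illustrative assumptions.

```python
import re

# Characters Windows does not allow in file names: \ / : * ? " < > |
ILLEGAL_CHARS = re.compile(r'[\\/:*?"<>|]')


def safe_title(title):
    """Replace every illegal file-name character with an underscore."""
    return ILLEGAL_CHARS.sub('_', title or 'untitled')  # 'untitled' is an arbitrary fallback


for t in ['happy/sad', 'why?', 'a:b*c"d<e>f|g', 'normal']:
    print(t, '->', safe_title(t))
```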
Record the elapsed time: set time_1 = time.time() before scraping starts, then after it finishes:
time_2 = time.time()
use_time = time_2 - time_1
print(f'Total time: {use_time:.1f} s')
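time.time() follows the wall clock, which can jump if the system clock is adjusted while the scraper runs. For measuring how long something takes, the standard library also offers time.perf_counter(), a monotonic, high-resolution clock. A small sketch with a stand-in workload (the helper name timed is hypothetical, and time.sleep replaces the real scraping):

```python
import time


def timed(func, *args):
    """Run func(*args) and return (result, elapsed seconds) measured with perf_counter."""
    start = time.perf_counter()
    result = func(*args)
    elapsed = time.perf_counter() - start
    return result, elapsed


# Stand-in workload: sleep briefly instead of downloading anything
result, elapsed = timed(time.sleep, 0.2)
print(f'Total time: {elapsed:.2f} s')
```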
Folks, everything above is single-threaded. Below is the multithreaded version, so I'll go straight to the code:
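import os
import re
import time
import concurrent.futures

import requests
import parsel


def change_title(title):
    # Replace characters that are illegal in Windows file names with "_"
    mode = re.compile(r'[\\\/\:\*\?\"\<\>\|]')
    new_title = re.sub(mode, "_", title)
    return new_title


def get_response(html_url):
    # Send a GET request with a browser User-Agent and return the response
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36'
    }
    response = requests.get(url=html_url, headers=headers)
    return response


def save(title, img_url, name):
    # Download the image and write its binary data to img/<title>.<name>
    img_content = get_response(img_url).content
    try:
        with open(os.path.join('img', title + '.' + name), mode='wb') as f:
            f.write(img_content)
        print('Saving:', title)
    except OSError as e:
        print('Failed to save:', title, e)


def main(html_url):
    # Scrape one listing page and save every sticker on it
    html_data = get_response(html_url).text
    selector = parsel.Selector(html_data)
    divs = selector.css('#container div.tagbqppdiv')
    for div in divs:
        img_url = div.css('img::attr(data-original)').get()
        title = div.css('img::attr(title)').get()
        name = img_url.split('.')[-1]
        new_title = change_title(title)
        save(new_title, img_url, name)


if __name__ == '__main__':
    time_1 = time.time()
    os.makedirs('img', exist_ok=True)  # create the output folder if it is missing
    # 10 worker threads; each task scrapes one listing page (pages 1-200)
    with concurrent.futures.ThreadPoolExecutor(max_workers=10) as exe:
        futures = []
        for page in range(1, 201):
            # requests needs a full URL including the scheme (http:// or https://)
            url = f'fabiaoqing/biaoqing/lists/page/{page}.html'
            futures.append(exe.submit(main, url))
        # Exceptions raised inside a worker are stored on its future, so report them
        for future in concurrent.futures.as_completed(futures):
            if future.exception() is not None:
                print('Page failed:', future.exception())
    # Leaving the with block waits for all threads, like exe.shutdown()
    time_2 = time.time()
    use_time = time_2 - time_1
    print(f'Total time: {use_time:.1f} s')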
OK: more than a thousand images in a dozen or so seconds. That's pretty fast!
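Why is the threaded version so much faster? The scraper spends almost all of its time waiting on the network, and CPython releases the GIL while a thread blocks on I/O, so ten workers can wait on ten downloads at once. The sketch below simulates that with time.sleep in place of real requests (fake_download and its 0.1 s delay are made-up stand-ins) and compares a plain loop with ThreadPoolExecutor.map.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(page):
    """Stand-in for fetching one page: block for 0.1 s like a network wait."""
    time.sleep(0.1)
    return page


pages = list(range(1, 21))  # 20 pretend pages

start = time.perf_counter()
sequential = [fake_download(p) for p in pages]
sequential_time = time.perf_counter() - start

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=10) as executor:
    threaded = list(executor.map(fake_download, pages))  # results keep input order
threaded_time = time.perf_counter() - start

print(f'sequential: {sequential_time:.2f} s, 10 threads: {threaded_time:.2f} s')
```

With 20 pages of 0.1 s each, expect roughly 2 s for the loop and roughly 0.2 s for the pool. The gain comes from overlapping waits; for CPU-bound work, threads in CPython would not speed things up this way.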
Folks, if you found this useful, give me a like ~