A strange problem when running a Python crawler


F:\bs\bosszp>python bosszp.py
https://www.zhipin.com/c101190100/?query=ETL工程师&page=1&ka=page-1
Traceback (most recent call last):
  File "bosszp.py", line 167, in <module>
    get_job(url=url, conn=conn, cursor=cursor, city_name_x="北京")
  File "bosszp.py", line 117, in get_job
    soup = BeautifulSoup(html, 'lxml')
  File "D:\software\DevEnv\Python\lib\site-packages\bs4\__init__.py", line 246, in __init__
    % ",".join(features))
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?
F:\bs\bosszp>pip install parser
ERROR: Could not find a version that satisfies the requirement parser (from versions: none)
ERROR: No matching distribution found for parser

F:\bs\bosszp>

Attempted fix (did not work)

F:\bs\bosszp>pip install html_parser
Collecting html_parser
  Downloading html-parser-0.2.tar.gz (904 bytes)
Collecting ply
  Downloading ply-3.11-py2.py3-none-any.whl (49 kB)
     |████████████████████████████████| 49 kB 163 kB/s
Using legacy 'setup.py install' for html-parser, since package 'wheel' is not installed.
Installing collected packages: ply, html-parser
    Running setup.py install for html-parser ... done
Successfully installed html-parser-0.2 ply-3.11

F:\bs\bosszp>

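This did not help: the same FeatureNotFound error came back, because the html-parser package does not provide the lxml tree builder that BeautifulSoup is asking for.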


Final solution

pip install lxml

After that, the original call works:

soup = BeautifulSoup(html, 'lxml')
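
If the script has to run on machines where lxml may not be installed, one option is to check for it before building the soup and fall back to Python's built-in "html.parser". The following is only a minimal sketch: the helper name choose_parser is hypothetical and not part of bosszp.py, and the built-in parser can parse some malformed pages differently from lxml.

```python
import importlib.util


def choose_parser(preferred="lxml", fallback="html.parser"):
    """Return `preferred` if that module can be imported, else `fallback`.

    Hypothetical helper for illustration; it only checks that the
    module is installed, it does not import bs4 itself.
    """
    if importlib.util.find_spec(preferred) is not None:
        return preferred
    return fallback


# Intended use inside the crawler (assumes bs4 is installed):
#   from bs4 import BeautifulSoup
#   soup = BeautifulSoup(html, choose_parser())
```

With this in place the crawler keeps working without lxml, and picks lxml up automatically once pip install lxml has been run.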