Web Crawler Tutorial for Beginners

Time: 2026-04-23 10:48:16

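1. Open a Python editor.

>>> import requests

>>> html = requests.get('https://www.baidu.com')

This example uses Baidu as the demonstration site: import the requests library and send a GET request for the page. Note that requests.get needs a full URL including the scheme, not just the site name.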
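In practice a request is usually given a timeout and a User-Agent header, since some sites respond differently to clients that send neither. The following is a minimal sketch of that idea; the `fetch` helper, the `HEADERS` value, and the 10-second default are illustrative assumptions, not part of the original tutorial.

```python
import requests

# Hypothetical header value for illustration; use one appropriate to your client.
HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; tutorial-example)"}


def fetch(url, timeout=10):
    """Send a GET request with a timeout (in seconds) and custom headers.

    Without a timeout, requests may wait indefinitely for a slow server.
    """
    return requests.get(url, headers=HEADERS, timeout=timeout)
```

Calling `fetch('https://www.baidu.com')` would then play the same role as the `requests.get` call above.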

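2. >>> html.raise_for_status()

>>> print(html)

<Response [200]>

Check whether the request succeeded. raise_for_status() raises an exception for a 4xx or 5xx response, and a status of 200 confirms that the page was fetched without problems.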
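To see how raise_for_status() behaves on a failed request, the sketch below catches the exception it raises. The Response objects are built by hand purely so the example runs without network access; the `describe` helper is a hypothetical name, and a real script would pass in the result of `requests.get`.

```python
import requests


def describe(response):
    """Return a short status message instead of letting the error propagate."""
    try:
        response.raise_for_status()
    except requests.exceptions.HTTPError as exc:
        return f"request failed: {exc}"
    return f"ok: {response.status_code}"


# Hand-built responses (an assumption for offline demonstration only).
ok = requests.Response()
ok.status_code = 200

missing = requests.Response()
missing.status_code = 404

print(describe(ok))       # ok: 200
print(describe(missing))  # message begins with "request failed: 404"
```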

3. >>> from bs4 import BeautifulSoup

>>> soup = BeautifulSoup(html.content, 'lxml')

>>> print(soup)

Next, use BeautifulSoup with the lxml parser to parse the page, then print the result to check that it parsed correctly.
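The parsing step can be tried without any network request by feeding BeautifulSoup a small piece of markup. This sketch assumes a made-up sample page encoded as bytes (standing in for `html.content`) and uses Python's built-in "html.parser" so it runs even where lxml is not installed; with lxml available, 'lxml' can be passed instead.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page, encoded to bytes like html.content would be.
SAMPLE_BYTES = (
    "<html><head><meta charset='utf-8'><title>Café menu</title></head>"
    "<body><p>Crème brûlée</p></body></html>"
).encode("utf-8")

# BeautifulSoup detects the encoding of byte input and decodes it.
soup = BeautifulSoup(SAMPLE_BYTES, "html.parser")

print(soup.prettify())  # indented view of the parsed tree
print(soup.p.string)    # Crème brûlée
```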

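4. Open the Baidu page in a browser, right-click and choose Inspect Element, and check whether the page source matches what was just printed.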

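5. >>> print(soup.title)

<title>百度一下，你就知道</title>

>>> print(soup.title.string)

百度一下，你就知道

If the two match, move on to the next step. The simplest thing to scrape is the page title: soup.title returns the whole <title> tag, and soup.title.string returns only its text.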
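Not every page has a <title> tag, in which case soup.title is None and soup.title.string raises an AttributeError. A small guard avoids that; the `page_title` helper and the sample markup below are hypothetical, shown only as one way to handle the missing case.

```python
from bs4 import BeautifulSoup


def page_title(soup, default=""):
    """Return the stripped title text, or a default when no title exists."""
    title = soup.title
    if title is None or title.string is None:
        return default
    return title.string.strip()


# Made-up sample pages: one with a title, one without.
with_title = BeautifulSoup(
    "<html><head><title> Example page </title></head></html>", "html.parser"
)
without_title = BeautifulSoup(
    "<html><body><p>No title here</p></body></html>", "html.parser"
)

print(page_title(with_title))                           # Example page
print(page_title(without_title, default="(untitled)"))  # (untitled)
```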

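6. >>> print(soup.a)

>>> print(soup.p)

More often, though, we want the content of tags such as a and p. Accessing them as soup.a or soup.p returns only the first matching element, not all of them.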
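The difference between the first match and all matches can be checked on a small sample. The markup below is invented for illustration; the point is only that attribute access such as soup.a behaves like find() and gives one element (or None), while find_all() gives a list.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, standing in for a real page.
SAMPLE = """
<body>
  <a href="/news">News</a>
  <a href="/maps">Maps</a>
  <p>First paragraph</p>
  <p>Second paragraph</p>
</body>
"""

soup = BeautifulSoup(SAMPLE, "html.parser")

first_link = soup.a              # first <a> only, like soup.find("a")
all_links = soup.find_all("a")   # every <a>, in document order

print(first_link["href"])        # /news
print(len(all_links))            # 2
print(soup.find("section"))      # None, because no such tag exists
```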

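7. >>> print(soup.find_all(class_="mnav"))

>>> for i in soup.find_all(class_="mnav"):
...     print(i.string)

To get every match, use find_all (findAll is an older alias for the same method) and pass class_ to locate elements by their CSS class. The loop body must be indented.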
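One thing to watch for in a loop like this: .string is None whenever a tag contains more than one child, for example a link with nested markup. The sketch below, using invented sample markup, contrasts .string with get_text(), which collects all text inside the tag.

```python
from bs4 import BeautifulSoup

# Hypothetical navigation markup; the second link has nested content.
SAMPLE = """
<div id="nav">
  <a class="mnav" href="/news">News</a>
  <a class="mnav" href="/maps"><b>Maps</b> and more</a>
  <a class="other" href="/login">Log in</a>
</div>
"""

soup = BeautifulSoup(SAMPLE, "html.parser")
links = soup.find_all("a", class_="mnav")

# .string works only when the tag has a single text child.
strings = [link.string for link in links]

# get_text gathers every text fragment; " " joins them, strip trims each one.
labels = [link.get_text(" ", strip=True) for link in links]

print(strings)  # ['News', None]
print(labels)   # ['News', 'Maps and more']
```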

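8. >>> for i in soup.find_all(class_="mnav"):
...     print(i.get("href"))

The other thing every beginner needs to know is how to extract the links themselves, which usually means reading each tag's href attribute. Using get("href") returns None when the attribute is missing instead of raising an error.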
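The href values on a page are often relative paths, so they usually need to be resolved against the page URL before they can be requested. A minimal sketch using urljoin from the standard library follows; the `absolute_links` helper, the sample markup, and the paths in it are hypothetical.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://www.baidu.com/"

# Invented sample links: root-relative, relative, absolute, and one without href.
SAMPLE = """
<a class="mnav" href="/news">News</a>
<a class="mnav" href="more/index.html">More</a>
<a class="mnav" href="https://example.com/page">External</a>
<a class="mnav">No link</a>
"""


def absolute_links(soup, base_url, class_name):
    """Collect href values for a class and resolve them against base_url."""
    links = []
    for tag in soup.find_all("a", class_=class_name):
        href = tag.get("href")
        if href:  # skip anchors that have no href attribute
            links.append(urljoin(base_url, href))
    return links


soup = BeautifulSoup(SAMPLE, "html.parser")
for link in absolute_links(soup, BASE_URL, "mnav"):
    print(link)
```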
