Python Tutorial Series, Lesson 4: Douban Movie Top250 Scraper

张开发
2026/4/10 1:42:10 · 15 min read


1. Complete Python example code

```python
import requests
from lxml import etree
import csv
import time
from typing import List, Dict


class DoubanTop250Spider:
    """Douban Movie Top250 scraper: full crawl + cleaning + save to CSV"""

    def __init__(self):
        self.base_url = "https://movie.douban.com/top250"
        self.headers = {
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                          "AppleWebKit/537.36 (KHTML, like Gecko) "
                          "Chrome/120.0.0.0 Safari/537.36"
        }
        self.movie_list: List[Dict] = []

    def get_page_html(self, url: str) -> str:
        """Fetch the HTML source of a single page, with exception handling"""
        try:
            resp = requests.get(url, headers=self.headers, timeout=10)
            resp.raise_for_status()  # raise an error on 4xx/5xx responses
            return resp.text
        except requests.RequestException as e:
            print(f"Request failed: {e}")
            return ""

    def parse_page(self, html: str) -> List[Dict]:
        """Parse one page of movie data: rank, title, director, rating, quote"""
        if not html:  # get_page_html returns "" when the request failed
            return []
        tree = etree.HTML(html)
        items = tree.xpath('//div[@class="item"]')
        page_data = []
        # …
```
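`get_page_html` and `parse_page` each handle a single page. As a minimal sketch, assuming the Top250 list is split into 10 pages of 25 entries addressed by a `start` offset query parameter (e.g. `?start=25` for the second page), the page URLs and a polite crawl loop could look like the following. `build_page_urls` and `crawl` are hypothetical helpers, not part of the original code; `crawl` takes the fetch and parse functions as parameters so it can be tried out without network access.

```python
import time
from typing import Callable, Dict, List


def build_page_urls(base_url: str, total: int = 250, page_size: int = 25) -> List[str]:
    """Return one list-page URL per offset: start=0, 25, ..., total - page_size."""
    # Assumption: the list is paginated with a `start` offset query parameter
    return [f"{base_url}?start={offset}" for offset in range(0, total, page_size)]


def crawl(
    urls: List[str],
    fetch: Callable[[str], str],
    parse: Callable[[str], List[Dict]],
    delay: float = 1.0,
) -> List[Dict]:
    """Fetch and parse each page in order, pausing `delay` seconds between requests."""
    results: List[Dict] = []
    for i, url in enumerate(urls):
        html = fetch(url)
        if html:  # skip pages whose fetch failed (empty string)
            results.extend(parse(html))
        if i < len(urls) - 1:  # no need to wait after the last page
            time.sleep(delay)
    return results
```

With the class above this could be wired up as `crawl(build_page_urls(spider.base_url), spider.get_page_html, spider.parse_page)`, where `spider = DoubanTop250Spider()`. The `time.sleep` pause keeps the request rate low, which reduces the chance of being rate-limited.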
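The class docstring promises a cleaning step. A sketch of what such cleaning might involve, assuming the director appears in a single text line of the form `导演: <names>   主演: <names>` padded with `&nbsp;` characters (which decode to U+00A0), and that the rating is a numeric string such as `9.7`. `clean_text`, `extract_director`, and `parse_rating` are hypothetical helpers, and the real page markup may differ.

```python
import re
from typing import Optional


def clean_text(raw: str) -> str:
    """Collapse runs of whitespace into single spaces and trim both ends."""
    # In a str pattern, \s also matches U+00A0 (what &nbsp; decodes to),
    # so non-breaking spaces are normalised along with ordinary whitespace.
    return re.sub(r"\s+", " ", raw).strip()


# Assumption: the director line looks like "导演: <names>   主演: <names>",
# with either an ASCII or a full-width colon. The real markup may differ.
DIRECTOR_RE = re.compile(r"导演[:：]\s*(.*?)\s*(?:主演[:：]|$)")


def extract_director(info_line: str) -> str:
    """Return the director text from a single info line, or "" if absent."""
    m = DIRECTOR_RE.search(info_line)
    return clean_text(m.group(1)) if m else ""


def parse_rating(raw: str) -> Optional[float]:
    """Convert a rating string such as "9.7" to a float; None if not numeric."""
    try:
        return float(clean_text(raw))
    except ValueError:
        return None
```

Returning `None` for an unparseable rating, rather than a sentinel like `0.0`, keeps missing data distinguishable from a real score.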
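For the save-to-CSV step named in the class docstring (and suggested by `import csv`), one possible sketch uses `csv.DictWriter`. The column keys are assumptions based on the five fields listed in the `parse_page` docstring, and `save_to_csv` is a hypothetical helper.

```python
import csv
from typing import Dict, List

# Hypothetical column keys, based on the five fields in parse_page's docstring
FIELDNAMES = ["rank", "title", "director", "rating", "quote"]


def save_to_csv(movies: List[Dict], path: str) -> int:
    """Write a list of movie dicts to a CSV file and return the number of rows."""
    # utf-8-sig writes a BOM so Excel recognises the file as UTF-8 (Chinese titles)
    # newline="" lets the csv module control line endings, as its docs require
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        # missing keys are written as "", unknown keys are silently skipped
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(movies)
    return len(movies)
```

Called as `save_to_csv(spider.movie_list, "douban_top250.csv")`, this would write one header row followed by one row per movie.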
