Python Web Scraper / Crawler – HTML Tables to Excel Spreadsheet

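I'm trying to build a web scraper that pulls a table from a website and pastes it into an Excel spreadsheet. I'm an EXTREME beginner at Python (and at coding in general); I literally started learning a few days ago.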

So, how do I make this web scraper? Here is the code I have:

    import csv
    import requests
    from BeautifulSoup import BeautifulSoup

    url = 'https://www.techpowerup.com/gpudb/?mobile=0&released%5B%5D=y14_c&released%5B%5D=y11_14&generation=&chipname=&interface=&ushaders=&tmus=&rops=&memsize=&memtype=&buswidth=&slots=&powerplugs=&sort=released&q='
    response = requests.get(url)
    html = response.content

    soup = BeautifulSoup(html)
    table = soup.find('table', attrs={'class': 'processors'})

    list_of_rows = []
    for row in table.findAll('tr')[1:]:
        list_of_cells = []
        for cell in row.findAll('td'):
            text = cell.text.replace('&nbsp;', '')
            list_of_cells.append(text)
        list_of_rows.append(list_of_cells)

    outfile = open("./GPU.csv", "wb")
    writer = csv.writer(outfile)
    writer.writerow(["Product Name", "GPU Chip", "Released", "Bus", "Memory", "GPU clock", "Memory clock", "Shaders/TMUs/ROPs"])
    writer.writerows(list_of_rows)

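Right now the program works on the website in the code above.

Now I want to scrape the tables from the following page: https://www.techpowerup.com/gpudb/2990/radeon-rx-560d

Note that there are several tables on this page. What should I add or change to get the program to work on it? I'm trying to get all of the tables, but if anyone could help me get even one of them, I would appreciate it so much!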

Basically, you just need to modify the code in your question to account for the fact that the page has several tables!

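What is really neat (or, dare I say, beautiful) about BeautifulSoup (BS) is the findAll method! It returns a BS object (a list-like ResultSet) that you can iterate over!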

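So, say you have 5 tables in your source. You could run tables = soup.findAll("table"), which returns a list of every table object in the source! You can then iterate over that BS object and pull the information out of each table in turn.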
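To see that behaviour without making a network request, here is a minimal sketch that runs against a small inline HTML string. SAMPLE_HTML and the helper name extract_tables are made up for illustration and are not part of the page being scraped. It uses find_all, which is the bs4 spelling of findAll, and Python's built-in "html.parser", so lxml does not need to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for a downloaded page that contains two tables.
SAMPLE_HTML = """
<html><body>
<table>
  <tr><th>Name</th><th>Value</th></tr>
  <tr><td>alpha</td><td>1</td></tr>
</table>
<table>
  <tr><th>Name</th><th>Value</th></tr>
  <tr><td>beta</td><td>2</td></tr>
  <tr><td>gamma</td><td>3</td></tr>
</table>
</body></html>
"""


def extract_tables(html):
    """Return one list of rows per <table>; each row is a list of cell texts."""
    soup = BeautifulSoup(html, "html.parser")
    all_tables = []
    # find_all returns every matching tag, so this loop visits each table once.
    for table in soup.find_all("table"):
        rows = []
        for row in table.find_all("tr"):
            # Collect header (th) and data (td) cells in document order.
            cells = [cell.get_text(strip=True) for cell in row.find_all(["th", "td"])]
            if cells:
                rows.append(cells)
        all_tables.append(rows)
    return all_tables


tables = extract_tables(SAMPLE_HTML)
print(len(tables))
for rows in tables:
    print(rows)
```

Each entry of tables holds the rows of one table, so the two tables stay separate instead of being merged into a single list.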

Your code could look something like this:

    import csv
    import requests
    import bs4

    url = 'https://www.techpowerup.com/gpudb/2990/radeon-rx-560d'
    response = requests.get(url)
    html = response.content

    soup = bs4.BeautifulSoup(html, "lxml")
    tables = soup.findAll("table")

    tableMatrix = []
    for table in tables:
        # Here you can do whatever you want with the data!
        # You can findAll table row headers, etc...
        list_of_rows = []
        list_of_cells = []  # stays empty if the table has no rows after the first
        for row in table.findAll('tr')[1:]:
            list_of_cells = []
            for cell in row.findAll('td'):
                text = cell.text.replace('&nbsp;', '')
                list_of_cells.append(text)
            list_of_rows.append(list_of_cells)
        # Store this table's rows together with the cells of its last row
        tableMatrix.append((list_of_rows, list_of_cells))

    print(tableMatrix)

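This code works, but note that I did not carry over any of the CSV writing from the original code! You will have to rework that part so it suits you. I did leave a comment at the spot where you are free to do whatever you like with each table in the source. You could call findAll("th") on each table object and populate your CSV file that way, or you could pull the information from the cells themselves. Right now I save each table's cell data in a tuple, which I append to the list tableMatrix.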
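As one possible way to put the CSV writing back, here is a minimal sketch that writes each table to its own CSV file, which Excel can open. It assumes Python 3 and that every table has already been reduced to a plain list of rows. The function name write_tables_to_csv, the table_N.csv file names, and the sample data are all made up for illustration.

```python
import csv
import os
import tempfile


def write_tables_to_csv(tables, directory):
    """Write each table (a list of rows) to its own CSV file; return the paths."""
    paths = []
    for index, rows in enumerate(tables, start=1):
        path = os.path.join(directory, "table_{}.csv".format(index))
        # In Python 3, open in text mode with newline="" so the csv module
        # controls the line endings (the "wb" mode is a Python 2 idiom).
        with open(path, "w", newline="", encoding="utf-8") as outfile:
            csv.writer(outfile).writerows(rows)
        paths.append(path)
    return paths


# Hypothetical data in the shape "one list of rows per table".
sample_tables = [
    [["Name", "Value"], ["alpha", "1"]],
    [["Name", "Value"], ["beta", "2"], ["gamma", "3"]],
]

output_dir = tempfile.mkdtemp()
paths = write_tables_to_csv(sample_tables, output_dir)
print(paths)
```

Writing one file per table keeps tables with different columns from being mixed together. If you would rather have a single file, you could write all the rows through one writer and put a blank row between tables.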

I hope this helps you on your Python and BeautifulSoup adventure!

Sources:

Extracting data from multiple tables with BeautifulSoup
Python Web Scraper / Crawler – HTML Tables to Excel Spreadsheet
BeautifulSoup4 documentation
