Scraping box scores with BeautifulSoup and exporting to Excel with pandas

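I have been trying to figure out how to scrape baseball box scores from Fangraphs using Python 3.6 with the BeautifulSoup and pandas modules. My ultimate goal is to save different parts of the web page to different sheets in an Excel workbook.

To do that, I think I have to pull each table separately by its id attribute. Below is the code for the four tables (beneath the graph on the page) that should make up the first Excel sheet. Running the code results in this error: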

    Traceback (most recent call last):
      File "Fangraphs Box Score Scraper.py", line 14, in <module>
        df1 = pd.read_html(soup,attrs={'id': ['WinsBox1_dghb','WinsBox1_dghp','WinsBox1_dgab','WinsBox1_dgap']})
      File "C:\Python36\lib\site-packages\pandas\io\html.py", line 906, in read_html
        keep_default_na=keep_default_na)
      File "C:\Python36\lib\site-packages\pandas\io\html.py", line 743, in _parse
        raise_with_traceback(retained)
      File "C:\Python36\lib\site-packages\pandas\compat\__init__.py", line 344, in raise_with_traceback
        raise exc.with_traceback(traceback)
    TypeError: 'NoneType' object is not callable

    import requests
    from bs4 import BeautifulSoup
    import pandas as pd

    url = 'http://www.fangraphs.com/boxscore.aspx?date=2017-09-10&team=Red%20Sox&dh=0&season=2017'
    response = requests.get(url)
    soup = BeautifulSoup(response.text,"lxml")

    df1 = pd.read_html(soup,attrs={'id': ['WinsBox1_dghb','WinsBox1_dghp','WinsBox1_dgab','WinsBox1_dgap']})

    writer = pd.ExcelWriter('Box Scores.xlsx')
    df1.to_excel(writer,'Traditional Box Scores')

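You are using the wrong ids: you took them from the <div> elements, but the attrs passed to read_html have to match the <table> tags. I also don't think you need BeautifulSoup here. Try this: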

    import pandas as pd

    url = 'http://www.fangraphs.com/boxscore.aspx?date=2017-09-10&team=Red%20Sox&dh=0&season=2017'

    # These ids belong to the <table> elements, not to the enclosing <div>s
    df1 = pd.read_html(
        url,
        attrs={'id': ['WinsBox1_dghb_ctl00', 'WinsBox1_dgab_ctl00']}
    )

    # df1 is now a list of DataFrames
    writer = pd.ExcelWriter('Box Scores.xlsx')
    row = 0
    for df in df1:
        df.to_excel(writer, sheet_name='tables', startrow=row, startcol=0)
        row = row + len(df.index) + 3  # leave a gap before the next table

    writer.close()  # writes the file (writer.save() was removed in pandas 2.0)
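The code above stacks every table on a single sheet named 'tables'. Since the question asks for different parts of the page on different sheets, here is a minimal sketch of that variant. It assumes openpyxl is installed as the Excel engine. The helper `write_tables_to_sheets`, the sheet names, and the stand-in DataFrames are all hypothetical and only illustrate the idea; in practice the list would come from `pd.read_html`.

```python
import io

import pandas as pd


def write_tables_to_sheets(tables, sheet_names, target):
    """Write each DataFrame in `tables` to its own worksheet in `target`.

    `target` can be a file path or a binary file-like object.
    This helper is hypothetical, not part of pandas.
    """
    if len(tables) != len(sheet_names):
        raise ValueError("need exactly one sheet name per table")
    # The context manager saves and closes the workbook on exit.
    with pd.ExcelWriter(target, engine="openpyxl") as writer:
        for df, name in zip(tables, sheet_names):
            df.to_excel(writer, sheet_name=name, index=False)


# Stand-in data; in practice this would be the list returned by pd.read_html.
tables = [
    pd.DataFrame({"Player": ["A", "B"], "AB": [4, 3]}),
    pd.DataFrame({"Player": ["C"], "IP": [6.0]}),
]
sheet_names = ["Home Batting", "Home Pitching"]  # hypothetical sheet names

# An in-memory buffer keeps the sketch self-contained;
# a path such as 'Box Scores.xlsx' can be passed instead.
buffer = io.BytesIO()
write_tables_to_sheets(tables, sheet_names, buffer)
```

Excel limits sheet names to 31 characters, so names derived from page headings may need to be shortened before they are passed in.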
