Python BeautifulSoup scrape: writing data to Excel raises "NotImplementedError"

I'm trying to write a script that uses Python and BeautifulSoup to scrape a website and then write the data to an Excel sheet.

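It works up until the writing part, and then I get a NotImplementedError. I looked it up and wrapped the writing part of the code in try:/except: pass blocks. That made the error disappear from the Python interpreter console window, but my Excel sheet is blank.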

Here's what I have so far:

    import requests, openpyxl
    from bs4 import BeautifulSoup

    wb = openpyxl.Workbook('RDWM_CRM.xls')
    wb.create_sheet('Phone')
    sheet = wb.get_sheet_by_name('Phone')

    # nav to webpage I want to scrape
    url = "http://www.yellowpages.com/search?search_terms=roofing%20company&geo_location_terms=New%20York%2C%20NY&page=2"
    r = requests.get(url)
    soup = BeautifulSoup(r.content)

    # for loop finds info then prints
    for div in soup.find_all("div", {"class": "info"}):
        print (div.contents[0].text)
        print (div.contents[1].text)

    # for loop finds info then writes to excel cells
    for div in soup.find_all("div", {"class": "info"}):
        sheet['A1'] = div.contents[0].text
        sheet['B1'] = div.contents[1].text

    wb.save('RDWM_CRM.xls')

As I said above, even when there is no error I get a blank Excel sheet. Here is the traceback I see in the console:

    Neptune Construction
    Serving the New York Area.(866) 664-1759
    >>> # for loop finds info then writes to excel cells
    ... for div in soup.find_all("div", {"class": "info"}):
    ...     sheet['A1'] = div.contents[0].text
    ...     sheet['B1'] = div.contents[1].text
    ...
    Traceback (most recent call last):
      File "<stdin>", line 3, in <module>
      File "C:\Users\Josh\AppData\Local\Programs\Python\Python35\lib\site-packages\openpyxl\writer\write_only.py", line 223, in removed_method
        raise NotImplementedError
    NotImplementedError
    >>> wb.save('RDWM_CRM.xls')

That is the last piece of scraped data, followed by the error.





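Thanks for your help!! I'm still ending up with a blank Excel sheet... Here is the code I'm using now. It runs without errors... but the Excel sheet is blank. It creates the new sheet named Phone; the sheet is just empty...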

    import requests
    from bs4 import BeautifulSoup
    from openpyxl import Workbook

    url = "http://www.yellowpages.com/search?search_terms=roofing%20company&geo_location_terms=Seattle%2C%20WA&page=4"
    # nav to webpage I want to scrape
    r = requests.get(url)
    soup = BeautifulSoup(r.content)

    # create a dummy list of texts to write to excel file
    divs = []

    wb = Workbook()  # open new workbook, use load_workbook if existing
    ws = wb.create_sheet('Phone')
    for div in divs:
        row = [div.contents[0].text, div.contents[1].text]  # construct a row: shown only for example purposes
        ws.append(row)  # could use ws.append(div) since each div is a list

    wb.save('RDWM_CRM.xlsx')  # save workbook, will overwrite if exists

Any help is appreciated!

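Apologies in advance if I haven't fully understood your question, but there seem to be a few problems with how openpyxl is being used. `openpyxl.Workbook('RDWM_CRM.xls')` does not name or open a file: the first positional argument of `Workbook` is a flag that switches the workbook into write-only mode, and any non-empty string counts as true. Write-only worksheets do not support cell access such as `sheet['A1']`, which is why the traceback ends in `removed_method` in `write_only.py`. In addition, openpyxl writes the .xlsx format, not the old .xls format, and assigning to `sheet['A1']` and `sheet['B1']` inside the loop overwrites the same two cells on every iteration, so at most the last listing would be kept.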
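A minimal sketch of both fixes, assuming the two scraped strings per listing have already been collected into a list of pairs (the `listings` data and both function names are hypothetical, for illustration only). `write_listings` uses a normal workbook and gives each listing its own row; `write_listings_streaming` shows what write-only mode does allow, namely adding rows with `append()` only. It assumes a recent openpyxl where `Workbook` accepts the `write_only` keyword.

```python
from openpyxl import Workbook


def write_listings(listings, path):
    """Normal workbook: write (name, details) pairs one per row on sheet 'Phone'."""
    wb = Workbook()              # no positional argument, so cells can be assigned freely
    ws = wb.active               # reuse the default sheet rather than leaving it empty
    ws.title = "Phone"
    for row_idx, (name, details) in enumerate(listings, start=1):
        # The row number changes each iteration, so earlier listings are not overwritten
        ws.cell(row=row_idx, column=1, value=name)
        ws.cell(row=row_idx, column=2, value=details)
    wb.save(path)                # openpyxl writes .xlsx, so give the file that extension


def write_listings_streaming(listings, path):
    """Write-only workbook: rows can only be added with append(), never by coordinate."""
    wb = Workbook(write_only=True)   # a write-only workbook starts with no sheets
    ws = wb.create_sheet("Phone")
    for name, details in listings:
        ws.append([name, details])   # each call adds the next row
    wb.save(path)                    # a write-only workbook can only be saved once


# Hypothetical sample data standing in for the scraped text
listings = [
    ("Example Roofing A", "(555) 010-0001"),
    ("Example Roofing B", "(555) 010-0002"),
]
write_listings(listings, "RDWM_CRM.xlsx")
```

The normal mode is the simpler choice for a few hundred scraped rows; write-only mode mainly pays off when a sheet is too large to keep in memory.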

Below is an example of writing a worksheet with openpyxl that may help:

    from openpyxl import Workbook

    # create a dummy list of texts to write to excel file
    divs = [[chr(i)*8, chr(i+1)*8] for i in range(65, 75, 1)]

    wb = Workbook()  # open new workbook, use load_workbook if existing
    ws = wb.create_sheet(title="Example")
    for div in divs:
        row = [div[0], div[1]]  # construct a row: shown only for example purposes
        ws.append(row)  # could use ws.append(div) since each div is a list

    wb.save('example.xlsx')  # save workbook, will overwrite if exists

The dummy list divs looks like this:

    [['AAAAAAAA', 'BBBBBBBB'],
     ['BBBBBBBB', 'CCCCCCCC'],
     ['CCCCCCCC', 'DDDDDDDD'],
     ['DDDDDDDD', 'EEEEEEEE'],
     ['EEEEEEEE', 'FFFFFFFF'],
     ['FFFFFFFF', 'GGGGGGGG'],
     ['GGGGGGGG', 'HHHHHHHH'],
     ['HHHHHHHH', 'IIIIIIII'],
     ['IIIIIIII', 'JJJJJJJJ'],
     ['JJJJJJJJ', 'KKKKKKKK']]

and the Excel file example.xlsx contains this in the Example sheet:

         A          B
    1    AAAAAAAA   BBBBBBBB
    2    BBBBBBBB   CCCCCCCC
    3    CCCCCCCC   DDDDDDDD
    4    DDDDDDDD   EEEEEEEE
    5    EEEEEEEE   FFFFFFFF
    6    FFFFFFFF   GGGGGGGG
    7    GGGGGGGG   HHHHHHHH
    8    HHHHHHHH   IIIIIIII
    9    IIIIIIII   JJJJJJJJ
    10   JJJJJJJJ   KKKKKKKK
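To check what actually ended up in the file without opening Excel, the workbook can be read back with `load_workbook`. One detail worth knowing: `Workbook()` already contains a default sheet named "Sheet", and `create_sheet` adds a second one after it, so example.xlsx holds an empty "Sheet" in front of "Example". If Excel opens on the empty sheet, the file can look blank at first glance. This sketch assumes a reasonably recent openpyxl (for `wb.sheetnames` and `wb[title]`); `dump_workbook` is a hypothetical helper name.

```python
from openpyxl import Workbook, load_workbook


def dump_workbook(path):
    """Return {sheet title: [row values, ...]} for every sheet in an .xlsx file."""
    wb = load_workbook(path)
    return {
        name: [[cell.value for cell in row] for row in wb[name].iter_rows()]
        for name in wb.sheetnames
    }


# Rebuild example.xlsx the same way as the example code, then inspect it
wb = Workbook()
ws = wb.create_sheet(title="Example")
for i in range(65, 75):
    ws.append([chr(i) * 8, chr(i + 1) * 8])
wb.save("example.xlsx")

contents = dump_workbook("example.xlsx")
print(list(contents))           # ['Sheet', 'Example']
print(contents["Example"][0])   # ['AAAAAAAA', 'BBBBBBBB']
```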

You would build a row like this:

    row = [div.contents[0].text, div.contents[1].text]

Assuming div.contents is correct. Hope this helps. PS: I'm using openpyxl version 2.3.0.
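In the edited script in the question, the sheet stays blank because `divs = []` is an empty list: the `for div in divs:` loop never runs, so nothing is appended. The list has to come from the parsed page, for example `soup.find_all("div", {"class": "info"})`. Below is a sketch of the whole pipeline under stated assumptions: each `div.info` is assumed to hold the name and the details in its first two child elements, and the sample markup is made up rather than taken from the live site. `html_to_xlsx` is a hypothetical helper that takes the HTML as a string so it can be tried offline. It also picks child tags with `find_all(True, recursive=False)` instead of `div.contents`, because `contents` includes whitespace text nodes that can shift the indices.

```python
from bs4 import BeautifulSoup
from openpyxl import Workbook


def html_to_xlsx(html, path, sheet_title="Phone"):
    """Write one row per <div class="info"> found in html; return the rows written."""
    soup = BeautifulSoup(html, "html.parser")        # name the parser explicitly
    divs = soup.find_all("div", {"class": "info"})   # the scraped results, not an empty list
    wb = Workbook()
    ws = wb.active                                   # use the default sheet, no extra blank one
    ws.title = sheet_title
    rows = []
    for div in divs:
        # True matches every tag; recursive=False keeps only direct children,
        # so whitespace-only text between tags is skipped
        children = div.find_all(True, recursive=False)
        row = [child.get_text(strip=True) for child in children[:2]]
        ws.append(row)
        rows.append(row)
    wb.save(path)
    return rows


# Hypothetical markup: the real structure inside div.info is an assumption
sample = """
<div class="info"><h2>Example Roofing A</h2><p>(555) 010-0001</p></div>
<div class="info"><h2>Example Roofing B</h2><p>(555) 010-0002</p></div>
"""
print(html_to_xlsx(sample, "RDWM_CRM.xlsx"))

# With the live page instead (needs network access and the requests package):
#   import requests
#   r = requests.get(url)   # url as defined in the question
#   html_to_xlsx(r.content, "RDWM_CRM.xlsx")
```

If the site's markup differs, only the two lines that pick `divs` and `children` should need adjusting; the openpyxl part stays the same.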