How do I change a file extension?

I am trying to scrape an '.xlsx' file from the Tax Foundation website. Unfortunately, I keep getting this error message when I open the downloaded file: Excel cannot open the file '2017-FF-For-Website-7-10-2017.xlsx' because the file format or file extension is not valid. Verify that the file has not been corrupted and that the file extension matches the format of the file. I did some research, and it says the fix is to change the file extension to '.xls' instead of '.xlsx'. Can anyone help?

from bs4 import BeautifulSoup
import urllib.request
import os

url = urllib.request.urlopen("https://taxfoundation.org/facts-figures-2017/")
soup = BeautifulSoup(url, from_encoding=url.info().get_param('charset'))

FHFA = os.chdir('C:/US_Census/Directory')

seen = set()
for link in soup.find_all('a', href=True):
    href = link.get('href')
    if not any(href.endswith(x) for x in ['.xlsx']):
        continue

    file = href.split('/')[-1]
    filename = file.rsplit('.', 1)[0]
    if filename not in seen:  # only retrieve file if it has not been seen before
        seen.add(filename)  # add the file to the set
        url = urllib.request.urlretrieve('https://taxfoundation.org/' + href, file)
        print(filename)
        print(' ')

print("All files successfully downloaded.")

PS: I know the file can be downloaded by hand, but I am scraping it to automate a specific process.

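Your problem is the line url = urllib.request.urlretrieve('https://taxfoundation.org/' + href, file). If you go to the website and hover over the Excel download button, you will see that it points to a longer link, https://files.taxfoundation.org/20170710170238/2017-FF-For-Website-7-10-2017.xlsx (notice the 2017....238 ?). The href is already a complete URL on a different host, so prepending 'https://taxfoundation.org/' produces an address that does not point at the workbook. You were never actually downloading the Excel file, which is why Excel rejects what was saved; changing the extension would not help. This is the correct line: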

url = urllib.request.urlretrieve(href, file)

Everything else works fine.
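
If you want the script to cope with both absolute and relative links, and to notice when a download is not a real workbook, something along these lines may help. It is only a sketch: resolve_link and looks_like_xlsx are hypothetical helper names, not part of the original code, and PAGE_URL is just the page address from the question. It relies on urllib.parse.urljoin leaving an already absolute href unchanged, and on the fact that an .xlsx file is a ZIP archive, so a saved HTML error page fails zipfile.is_zipfile.

```python
import zipfile
from urllib.parse import urljoin

# Page being scraped in the question; used as the base for relative links.
PAGE_URL = "https://taxfoundation.org/facts-figures-2017/"


def resolve_link(href, base=PAGE_URL):
    """Return an absolute URL. Absolute hrefs pass through unchanged;
    relative ones are resolved against the page URL."""
    return urljoin(base, href)


def looks_like_xlsx(path):
    """Cheap sanity check: an .xlsx workbook is a ZIP archive, so an HTML
    error page saved under an .xlsx name returns False here."""
    return zipfile.is_zipfile(path)
```

Inside the loop you could then call urllib.request.urlretrieve(resolve_link(href), file) and print a warning whenever looks_like_xlsx(file) is False, instead of finding out later in Excel.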