Storing the columns of a spreadsheet in a Python dictionary

I have a table stored in an Excel file, as follows:

    Species        Garden  Hedgerow  Parkland  Pasture  Woodland
    Blackbird      47      10        40        2        2
    Chaffinch      19      3         5         0        2
    Great Tit      50      0         10        7        0
    House Sparrow  46      16        8         4        0
    Robin          9       3         0         0        2
    Song Thrush    4       0         6         0        0

I am using the xlrd Python library to read this data. I have no trouble reading it into a list of lists (each row stored as a list) with the code below:

    from xlrd import open_workbook

    wb = open_workbook("Sample.xls")

    headers = []
    sdata = []
    for s in wb.sheets():
        print("Sheet: %s" % s.name)
        if s.name.capitalize() == "Data":
            for row in range(s.nrows):
                values = []
                for col in range(s.ncols):
                    data = s.cell(row, col).value
                    if row == 0:
                        headers.append(data)
                    else:
                        values.append(data)
                # Skip the header row, whose values list stays empty
                if row > 0:
                    sdata.append(values)

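As is probably obvious, headers is a simple list storing the column titles, and sdata contains the table data, stored as a list of lists. This is what they look like:

headers: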

 [u'Species', u'Garden', u'Hedgerow', u'Parkland', u'Pasture', u'Woodland']

sdata:

    [[u'Blackbird', 47.0, 10.0, 40.0, 2.0, 2.0],
     [u'Chaffinch', 19.0, 3.0, 5.0, 0.0, 2.0],
     [u'Great Tit', 50.0, 0.0, 10.0, 7.0, 0.0],
     [u'House Sparrow', 46.0, 16.0, 8.0, 4.0, 0.0],
     [u'Robin', 9.0, 3.0, 0.0, 0.0, 2.0],
     [u'Song Thrush', 4.0, 0.0, 6.0, 0.0, 0.0]]

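But I want to store this data in a Python dictionary, with each column title as a key whose value is a list of all the values in that column. For example (showing only part of the data to save space):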

    dict = {
        'Species': ['Blackbird','Chaffinch','Great Tit'],
        'Garden': [47,19,50],
        'Hedgerow': [10,3,0],
        'Parkland': [40,5,10],
        'Pasture': [2,0,7],
        'Woodland': [2,2,0]
    }

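So my question is: how can I achieve this? I know I could read the data by columns instead of by rows as in the snippet above, but I can't figure out how to store the columns in a dictionary.

Thanks in advance for any help you can provide.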

Once you have the columns, this is easy:

 dict(zip(headers, sdata))

Actually, in your example it looks like sdata holds row data. Even so, this is still fairly easy, because you can also use zip to transpose the table:

 dict(zip(headers, zip(*sdata)))

One of these is what you asked for.
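To make the transposition concrete, here is a minimal sketch that hard-codes part of the headers and sdata values from the question instead of reading them from a file. It is written for Python 3, and the name `columns` is only illustrative. Note that `zip(*sdata)` yields tuples, so the sketch converts each column to a list.

```python
headers = ['Species', 'Garden', 'Hedgerow', 'Parkland', 'Pasture', 'Woodland']
sdata = [
    ['Blackbird', 47.0, 10.0, 40.0, 2.0, 2.0],
    ['Chaffinch', 19.0, 3.0, 5.0, 0.0, 2.0],
    ['Great Tit', 50.0, 0.0, 10.0, 7.0, 0.0],
]

# zip(*sdata) turns the rows into columns; zip(headers, ...) then pairs
# each header with its column. list() converts each tuple to a list.
columns = {header: list(column) for header, column in zip(headers, zip(*sdata))}

print(columns['Garden'])
```

If tuples are acceptable as values, the plain `dict(zip(headers, zip(*sdata)))` form is enough.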

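1. xlrd

I strongly recommend using defaultdict from the collections library. The value for each key starts out as a default value, in this case an empty list. I didn't put much exception handling in there; you may want to add exception handling based on your use case.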

    import xlrd
    import sys
    from collections import defaultdict

    result = defaultdict(list)
    workbook = xlrd.open_workbook("/Users/datafireball/Desktop/stackoverflow.xlsx")
    worksheet = workbook.sheet_by_name(workbook.sheet_names()[0])
    headers = worksheet.row(0)
    # Skip row 0 (the headers) and append each cell to its column's list
    for index in range(worksheet.nrows)[1:]:
        try:
            for header, col in zip(headers, worksheet.row(index)):
                result[header.value].append(col.value)
        except Exception:
            print(sys.exc_info())
    print(result)

Output:

 defaultdict(<type 'list'>, {u'Garden': [47.0, 19.0, 50.0, 46.0, 9.0, 4.0], u'Parkland': [40.0, 5.0, 10.0, 8.0, 0.0, 6.0], u'Woodland': [2.0, 2.0, 0.0, 0.0, 2.0, 0.0], u'Hedgerow': [10.0, 3.0, 0.0, 16.0, 3.0, 0.0], u'Pasture': [2.0, 0.0, 7.0, 4.0, 0.0, 0.0], u'Species': [u'Blackbird', u'Chaffinch', u'Great Tit', u'House Sparrow', u'Robin', u'Song Thrush']})
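The accumulation pattern itself can be tried without an Excel file. The sketch below assumes the rows have already been read into plain lists; the `rows` variable and its contents are made up for illustration, and the code is written for Python 3.

```python
from collections import defaultdict

# Hypothetical rows, as they might come out of a worksheet
rows = [
    ['Species', 'Garden', 'Hedgerow'],
    ['Blackbird', 47.0, 10.0],
    ['Chaffinch', 19.0, 3.0],
]

header_row, data_rows = rows[0], rows[1:]

# Missing keys start out as an empty list, so append() always works
result = defaultdict(list)
for row in data_rows:
    for header, value in zip(header_row, row):
        result[header].append(value)

print(dict(result))
```

Wrapping the result in `dict(...)` gives an ordinary dictionary, which avoids new keys being created silently on later lookups.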

2. pandas

    import pandas as pd

    xl = pd.ExcelFile("/Users/datafireball/Desktop/stackoverflow.xlsx")
    df = xl.parse(xl.sheet_names[0])
    print(df)

Output (you can't imagine how much flexibility you gain by using a data frame):

             Species  Garden  Hedgerow  Parkland  Pasture  Woodland
    0      Blackbird      47        10        40        2         2
    1      Chaffinch      19         3         5        0         2
    2      Great Tit      50         0        10        7         0
    3  House Sparrow      46        16         8        4         0
    4          Robin       9         3         0        0         2
    5    Song Thrush       4         0         6        0         0

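I'll contribute another answer to my own question!

After posting my question I discovered pyexcel, a very small Python library that acts as a wrapper around other spreadsheet-handling packages (namely xlrd and odfpy). It has a nice to_dict method that does exactly what I want (without even needing to transpose the table)!

Here is an example, using the data above: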

    from pyexcel import SeriesReader
    from pyexcel.utils import to_dict

    sheet = SeriesReader("Sample.xls")

    print(sheet.series())  #--- just the headers, stored in a list

    data = to_dict(sheet)
    print(data)  #--- the full dataset, stored in a dictionary

Output:

    [u'Species', u'Garden', u'Hedgerow', u'Parkland', u'Pasture', u'Woodland']
    {u'Garden': [47.0, 19.0, 50.0, 46.0, 9.0, 4.0],
     u'Hedgerow': [10.0, 3.0, 0.0, 16.0, 3.0, 0.0],
     u'Pasture': [2.0, 0.0, 7.0, 4.0, 0.0, 0.0],
     u'Parkland': [40.0, 5.0, 10.0, 8.0, 0.0, 6.0],
     u'Woodland': [2.0, 2.0, 0.0, 0.0, 2.0, 0.0],
     u'Species': [u'Blackbird', u'Chaffinch', u'Great Tit', u'House Sparrow', u'Robin', u'Song Thrush']}

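Hope it helps too!

If xlrd doesn't solve your problem, consider taking a look at xlwings. One of its example videos demonstrates how to take data from an Excel table and import it into a pandas data frame, which is more useful than a dictionary.

If you really want a dictionary, pandas can easily convert a data frame into one.

This script lets you convert Excel data into a list of dictionaries: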

    import xlrd

    workbook = xlrd.open_workbook('Sample.xls', on_demand=True)
    worksheet = workbook.sheet_by_index(0)

    # The first row holds the column names
    first_row = []
    for col in range(worksheet.ncols):
        first_row.append(worksheet.cell_value(0, col))

    # Transform the worksheet into a list of dictionaries, one per data row
    data = []
    for row in range(1, worksheet.nrows):
        elm = {}
        for col in range(worksheet.ncols):
            elm[first_row[col]] = worksheet.cell_value(row, col)
        data.append(elm)
    print(data)
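A list of dictionaries is row-oriented, whereas the question asks for one list per column. If the row dictionaries are the starting point, they can be folded into that shape. The following is a minimal sketch in Python 3, assuming every record has the same keys; `to_columns` is a hypothetical helper and `records` stands in for the `data` list built by the script.

```python
def to_columns(records):
    """Turn a list of row dictionaries into a dictionary of column lists."""
    columns = {}
    for record in records:
        for key, value in record.items():
            # Create the column list on first sight, then append to it
            columns.setdefault(key, []).append(value)
    return columns


# Hypothetical sample records, shaped like the script's output
records = [
    {'Species': 'Blackbird', 'Garden': 47.0},
    {'Species': 'Chaffinch', 'Garden': 19.0},
]
print(to_columns(records))
```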
