Organizing data read from Excel into a pandas DataFrame

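My goals with this script are to:

1. Read time series data from an Excel file (> 100,000k rows), along with the headers (labels, units).
2. Convert the Excel numeric dates to the best datetime object for a pandas DataFrame.
3. Be able to use timestamps to reference rows and series labels to reference columns.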

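So far I have used xlrd to read the Excel data into lists, built a pandas Series from each list with the time list as the index, combined the Series with their labels into a Python dictionary, and passed that dictionary to a pandas DataFrame. Despite my efforts, df.index seems to be set to the column headers, and I am not sure at which point to convert the dates into datetime objects.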

I only started using Python 3 days ago, so any advice would be great! Here is my code:

    import xlrd
    import pandas as pd
    from pandas import Series

    # Open excel workbook and first sheet
    wb = xlrd.open_workbook("C:\GreenCSV\Calgary\CWater.xlsx")
    sh = wb.sheet_by_index(0)

    # Read rows containing labels and units
    Labels = sh.row_values(1, start_colx=0, end_colx=None)
    Units = sh.row_values(2, start_colx=0, end_colx=None)

    # Initialize list to hold data
    Data = [None] * (sh.ncols)

    # Read column by column and store in list
    for colnum in range(sh.ncols):
        Data[colnum] = sh.col_values(colnum, start_rowx=5, end_rowx=None)

    # Delete unnecessary rows and columns
    del Labels[3], Labels[0:2], Units[3], Units[0:2], Data[3], Data[0:2]

    # Create Pandas Series
    s = [None] * (sh.ncols - 4)
    for colnum in range(sh.ncols - 4):
        s[colnum] = Series(Data[colnum+1], index=Data[0])

    # Create Dictionary of Series
    dictionary = {}
    for i in range(sh.ncols-4):
        dictionary[i] = {Labels[i] : s[i]}

    # Pass Dictionary to Pandas DataFrame
    df = pd.DataFrame.from_dict(dictionary)

You can use pandas directly here. I usually like to create a dictionary of DataFrames (keyed by sheet name):

    In [11]: xl = pd.ExcelFile("C:\GreenCSV\Calgary\CWater.xlsx")

    In [12]: xl.sheet_names  # in your example it may be different
    Out[12]: [u'Sheet1', u'Sheet2', u'Sheet3']

    In [13]: dfs = {sheet: xl.parse(sheet) for sheet in xl.sheet_names}

    In [14]: dfs['Sheet1']  # access DataFrame by sheet name

You can have a look at the docs for parse, which offers some more options (for example skiprows) that give you more control when parsing an individual sheet.
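On the datetime and index parts of the question, here is a minimal sketch of one way to get a timestamp-indexed frame from label and column lists like the ones built with xlrd. It assumes the workbook uses Excel's default 1900 date system, where serial day numbers count from 1899-12-30. The helper name `frame_from_columns` and the sample values are made up for illustration and are not taken from the original spreadsheet.

```python
import pandas as pd

# Assumption: the workbook uses Excel's 1900 date system, in which serial
# day numbers (for dates after February 1900) count from 1899-12-30.
EXCEL_EPOCH = "1899-12-30"


def frame_from_columns(labels, columns, serial_dates):
    """Build a DataFrame with one column per label, indexed by timestamp.

    labels       -- column names, e.g. the values read from the label row
    columns      -- list of equal-length value lists, one per label
    serial_dates -- Excel serial day numbers, one per data row
    """
    # Convert the serial day numbers to a DatetimeIndex in one vectorized call.
    index = pd.to_datetime(serial_dates, unit="D", origin=EXCEL_EPOCH)
    # A flat {label: values} mapping yields one column per label. Nesting a
    # dict inside each entry would make the outer keys the columns instead.
    data = dict(zip(labels, columns))
    return pd.DataFrame(data, index=index)


# Hypothetical sample values standing in for the spreadsheet contents.
labels = ["Flow", "Temp"]
columns = [[1.0, 2.0, 3.0], [10.5, 11.0, 11.5]]
serial_dates = [41275.0, 41275.5, 41276.0]

df = frame_from_columns(labels, columns, serial_dates)
print(df)
```

With the index converted up front, rows can be referenced by timestamp and columns by label, for example `df.loc[pd.Timestamp("2013-01-01 12:00"), "Flow"]`. Workbooks saved with the 1904 date system would need a different origin.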
