python - pandas read_excel gets the wrong numbers for index_col

I'm trying to read .xlsx files that each contain 4 sheets. Every sheet has a Time column and an Absorbance column, like this:

    Time    Absorbance
    0       0.1254
    5       0.1278
    10      0.128
    15      0.1286
    20      0.1303
    25      0.1295
    30      0.1296
    35      0.1308
    40      0.1301
    45      0.1301
    50      0.1309
    ...

I want a single DataFrame in which each sheet is a separate column and Time is the row index. My current code looks like this:

    import numpy as np
    import pandas as pd, datetime as dt
    import glob, os

    runDir = "/Users/AaronT/Documents/Lab/Cascade/DTRA"
    if os.getcwd() != runDir:
        os.chdir(runDir)

    files = glob.glob("PTE_Kinetics*.xlsx")
    df = pd.DataFrame()

    for each in files:
        sheets = pd.ExcelFile(each).sheet_names
        for sheet in sheets:
            df[sheet] = pd.read_excel(each, sheet, index_col='Time')
    print df

However, my output does not have the proper row index values:

        Forced Wash  Elution    Wash  Flow Through
    0        0.1254  -0.0062  0.0544        0.0443
    1           NaN      NaN     NaN           NaN
    2           NaN      NaN     NaN           NaN
    3           NaN      NaN     NaN           NaN
    4           NaN      NaN     NaN           NaN
    5        0.1278  -0.0027  0.0560        0.0459
    6           NaN      NaN     NaN           NaN
    7           NaN      NaN     NaN           NaN
    8           NaN      NaN     NaN           NaN
    9           NaN      NaN     NaN           NaN
    10       0.1280  -0.0004  0.0564        0.0467
    11          NaN      NaN     NaN           NaN
    12          NaN      NaN     NaN           NaN
    13          NaN      NaN     NaN           NaN
    14          NaN      NaN     NaN           NaN
    ...

Maybe I don't understand how index_col works. I could create a separate DataFrame for each sheet with the proper times, but I'd prefer to have them all in one table. Any suggestions?

Edit: here is a link to the Excel file.

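Note: each sheet is being read correctly; you're just not gluing them together properly. (In the session below, `e` is presumably `pd.ExcelFile("PTE_Kinetics_04-30-2015.xlsx")`.)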

    In [11]: for sheet in e.sheet_names:
       ....:     print(pd.read_excel("PTE_Kinetics_04-30-2015.xlsx", sheet, index_col='Time').head(3))
          Absorbance
    Time
    0         0.1254
    5         0.1278
    10        0.1280
          Absorbance
    Time
    0        -0.0062
    5        -0.0027
    10       -0.0004
          Absorbance
    Time
    0         0.0544
    5         0.0560
    10        0.0564
          Absorbance
    Time
    0         0.0443
    5         0.0459
    10        0.0467

Rather than assigning them into a DataFrame one by one, I would collect them in a dictionary:

    d = {}
    for sheet in e.sheet_names:
        d[sheet] = pd.read_excel("PTE_Kinetics_04-30-2015.xlsx", sheet, index_col='Time').head(3)
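As far as I know, more recent pandas releases can build the same dictionary in one call: passing `sheet_name=None` to `pd.read_excel` returns a dict that maps each sheet name to its DataFrame. A minimal sketch under that assumption (the function name `load_sheets` is hypothetical and not part of the original answer):

```python
import pandas as pd


def load_sheets(path):
    """Return {sheet name: DataFrame indexed by Time} for every sheet in the workbook."""
    # sheet_name=None asks read_excel for all sheets at once;
    # index_col="Time" is applied to each sheet as it is read.
    return pd.read_excel(path, sheet_name=None, index_col="Time")


# Usage (needs the workbook on disk and an Excel engine such as openpyxl installed):
# d = load_sheets("PTE_Kinetics_04-30-2015.xlsx")
```

Unlike the loop above, this keeps every row of each sheet, since there is no `.head(3)`.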

Now you can glue them together (without worrying about Excel):

    In [21]: pd.concat(d).unstack(0)
    Out[21]:
         Absorbance
            Elution Flow Through Forced Wash    Wash
    Time
    0       -0.0062       0.0443      0.1254  0.0544
    5       -0.0027       0.0459      0.1278  0.0560
    10      -0.0004       0.0467      0.1280  0.0564
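For anyone without the workbook, the gluing step can be reproduced with small in-memory frames. This is only a sketch: the values are copied from the first three rows shown above, and the `times` and `combined` names and the `droplevel` cleanup are additions for illustration, not part of the original answer.

```python
import pandas as pd

# Stand-ins for the per-sheet frames; values copied from the output above.
times = pd.Index([0, 5, 10], name="Time")
d = {
    "Forced Wash": pd.DataFrame({"Absorbance": [0.1254, 0.1278, 0.1280]}, index=times),
    "Elution": pd.DataFrame({"Absorbance": [-0.0062, -0.0027, -0.0004]}, index=times),
    "Wash": pd.DataFrame({"Absorbance": [0.0544, 0.0560, 0.0564]}, index=times),
    "Flow Through": pd.DataFrame({"Absorbance": [0.0443, 0.0459, 0.0467]}, index=times),
}

# concat on a dict uses the keys as an outer index level;
# unstack(0) then moves that level into the columns.
combined = pd.concat(d).unstack(0)

# Optional: drop the redundant "Absorbance" level so the columns are just sheet names.
combined.columns = combined.columns.droplevel(0)
print(combined)
```

The column order may differ between pandas versions, so select columns by name rather than by position.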

You just need to set the "Time" column as the index:

    In [5]: df = pd.ExcelFile('PTE_Kinetics_04-30-2015.xlsx')

    In [7]: sh = df.parse('Forced Wash')

    In [8]: sh.head()
    Out[8]:
       Time  Absorbance
    0     0      0.1254
    1     5      0.1278
    2    10      0.1280
    3    15      0.1286
    4    20      0.1303

    In [9]: sh.set_index('Time').head()
    Out[9]:
          Absorbance
    Time
    0         0.1254
    5         0.1278
    10        0.1280
    15        0.1286
    20        0.1303

Alternatively, pass the column name as index_col when parsing:

    In [12]: df.parse(i, index_col='Time').head()
    Out[12]:
          Absorbance
    Time
    0         0.0443
    5         0.0459
    10        0.0467
    15        0.0474
    20        0.0480
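Either route gives one properly indexed frame per sheet; what remains is combining them into a single table. A minimal sketch of that last step, using in-memory stand-ins for two parsed sheets (values copied from the outputs above; `sheets` and `combined` are illustrative names, not from the original answer):

```python
import pandas as pd

# Stand-ins for df.parse(<sheet name>) results: "Time" is still an ordinary column here.
sheets = {
    "Forced Wash": pd.DataFrame(
        {"Time": [0, 5, 10], "Absorbance": [0.1254, 0.1278, 0.1280]}
    ),
    "Flow Through": pd.DataFrame(
        {"Time": [0, 5, 10], "Absorbance": [0.0443, 0.0459, 0.0467]}
    ),
}

# set_index("Time") makes Time the row labels; selecting "Absorbance" gives a Series.
# Building a DataFrame from a dict of Series aligns them on that shared index.
combined = pd.DataFrame(
    {name: sh.set_index("Time")["Absorbance"] for name, sh in sheets.items()}
)
print(combined)
```

Because the Series are aligned on Time, sheets sampled at different times would still line up, with NaN wherever a sheet has no reading.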

The rest of your process looks like it is on the right track.