Extracting a pandas MultiIndex containing NaT from a DataFrame

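I am using pandas to parse an Excel spreadsheet. The spreadsheet has several worksheets, and each one looks like the example below. Note that each column has values corresponding to different dates, and the columns have different lengths: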

[Screenshot: Excel spreadsheet]

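For whatever reason, when pandas parses the Excel spreadsheet, on the first worksheet it parses the first date column as the index (even though the index_col argument is specified as None). That is still manageable.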

On the other worksheets, however, it parses the index as a MultiIndex:

[Screenshot: the resulting MultiIndex]

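What I ultimately want is to rebuild the DataFrames so that they all share one common date index, with any date that has no value filled with NaN. However, I cannot seem to extract the dates from the MultiIndex to even begin that process.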

I tried calling reset_index() on the DataFrame for both levels 0 and 1, but it complains IndexError: cannot do a non-empty take from an empty axes. I also tried unstack(), but that complains ValueError: Index contains duplicate entries, cannot reshape

I think you can use the read_excel parameters parse_cols, header and index_col. Then build a Series from each date/value column pair via iloc, and finally concatenate them into one:

    import pandas as pd

    # parse_cols is the older name of this argument; later pandas versions call it usecols
    df = pd.read_excel('f_name.xlsx',
                       parse_cols=[0, 1, 3, 4, 7, 8],
                       index_col=0,
                       header=0)

    # if you need to replace NaT in the index (not necessary):
    # df.index = df.index.to_series().fillna(0)
    print(df)
                Column_val1  Unnamed: 1  Column_val2  Unnamed: 3  Column_val3
    1999-01-01            4  2000-01-01            5  2000-01-01            5
    1999-01-02            1  2000-01-02            7  2000-01-02            7
    1999-01-03            2  2000-01-03            8  2000-01-03            8
    1999-01-04            3  2000-01-04            3  2000-01-04            3
    1999-01-05            3  2000-01-05            6  2000-01-05            6
    1999-01-06            3  2000-01-06            9  2000-01-06            9
    1999-01-07            4  2000-01-07            1  2000-01-07            1
    1999-01-08            6  2000-01-08            5  2000-01-08            5
    1999-01-09            8  2000-01-09            2  2000-01-09            2
    1999-01-10            2  2000-01-10            3  2000-01-10            3
    1999-01-11            4  2000-01-11           47  2000-01-11           47
    1999-01-12            5  2000-01-12            2  2000-01-12            2
    NaT                 NaN  2000-01-13            8  2000-01-13            8
    NaT                 NaN  2000-01-14            2  2000-01-14            2
    NaT                 NaN  2000-01-15           87  2000-01-15           87
    NaT                 NaN  2000-01-16            6  2000-01-16            6
    NaT                 NaN  2000-01-17           89  2000-01-17           89
    NaT                 NaN         NaT          NaN  2000-01-18            7
    NaT                 NaN         NaT          NaN  2000-01-19            8

    print(df['Column_val1'])
    1999-01-01      4
    1999-01-02      1
    1999-01-03      2
    1999-01-04      3
    1999-01-05      3
    1999-01-06      3
    1999-01-07      4
    1999-01-08      6
    1999-01-09      8
    1999-01-10      2
    1999-01-11      4
    1999-01-12      5
    NaT           NaN
    NaT           NaN
    NaT           NaN
    NaT           NaN
    NaT           NaN
    NaT           NaN
    NaT           NaN
    Name: Column_val1, dtype: float64

    print(df.set_index(df.iloc[:, 1])['Column_val2'])
    Unnamed: 1
    2000-01-01      5
    2000-01-02      7
    2000-01-03      8
    2000-01-04      3
    2000-01-05      6
    2000-01-06      9
    2000-01-07      1
    2000-01-08      5
    2000-01-09      2
    2000-01-10      3
    2000-01-11     47
    2000-01-12      2
    2000-01-13      8
    2000-01-14      2
    2000-01-15     87
    2000-01-16      6
    2000-01-17     89
    NaT           NaN
    NaT           NaN
    Name: Column_val2, dtype: float64

    print(df.set_index(df.iloc[:, 3])['Column_val3'])
    Unnamed: 3
    2000-01-01      5
    2000-01-02      7
    2000-01-03      8
    2000-01-04      3
    2000-01-05      6
    2000-01-06      9
    2000-01-07      1
    2000-01-08      5
    2000-01-09      2
    2000-01-10      3
    2000-01-11     47
    2000-01-12      2
    2000-01-13      8
    2000-01-14      2
    2000-01-15     87
    2000-01-16      6
    2000-01-17     89
    2000-01-18      7
    2000-01-19      8
    Name: Column_val3, dtype: int64

    df = pd.concat([df['Column_val1'],
                    df.set_index(df.iloc[:, 1])['Column_val2'],
                    df.set_index(df.iloc[:, 3])['Column_val3']])

    # sorting the index afterwards is recommended
    df = df.sort_index()
    print(df)
    NaT           NaN
    NaT           NaN
    NaT           NaN
    NaT           NaN
    NaT           NaN
    NaT           NaN
    NaT           NaN
    NaT           NaN
    NaT           NaN
    1999-01-01      4
    1999-01-02      1
    1999-01-03      2
    1999-01-04      3
    1999-01-05      3
    1999-01-06      3
    1999-01-07      4
    1999-01-08      6
    1999-01-09      8
    1999-01-10      2
    1999-01-11      4
    1999-01-12      5
    2000-01-01      5
    2000-01-01      5
    2000-01-02      7
    2000-01-02      7
    2000-01-03      8
    2000-01-03      8
    2000-01-04      3
    2000-01-04      3
    2000-01-05      6
    2000-01-05      6
    2000-01-06      9
    2000-01-06      9
    2000-01-07      1
    2000-01-07      1
    2000-01-08      5
    2000-01-08      5
    2000-01-09      2
    2000-01-09      2
    2000-01-10      3
    2000-01-10      3
    2000-01-11     47
    2000-01-11     47
    2000-01-12      2
    2000-01-12      2
    2000-01-13      8
    2000-01-13      8
    2000-01-14      2
    2000-01-14      2
    2000-01-15     87
    2000-01-15     87
    2000-01-16      6
    2000-01-16      6
    2000-01-17     89
    2000-01-17     89
    2000-01-18      7
    2000-01-19      8
    dtype: float64

    # if you need to remove the rows where the index is NaT
    print(df[pd.notnull(df.index)])
    1999-01-01      4
    1999-01-02      1
    1999-01-03      2
    1999-01-04      3
    1999-01-05      3
    1999-01-06      3
    1999-01-07      4
    1999-01-08      6
    1999-01-09      8
    1999-01-10      2
    1999-01-11      4
    1999-01-12      5
    2000-01-01      5
    2000-01-01      5
    2000-01-02      7
    2000-01-02      7
    2000-01-03      8
    2000-01-03      8
    2000-01-04      3
    2000-01-04      3
    2000-01-05      6
    2000-01-05      6
    2000-01-06      9
    2000-01-06      9
    2000-01-07      1
    2000-01-07      1
    2000-01-08      5
    2000-01-08      5
    2000-01-09      2
    2000-01-09      2
    2000-01-10      3
    2000-01-10      3
    2000-01-11     47
    2000-01-11     47
    2000-01-12      2
    2000-01-12      2
    2000-01-13      8
    2000-01-13      8
    2000-01-14      2
    2000-01-14      2
    2000-01-15     87
    2000-01-15     87
    2000-01-16      6
    2000-01-16      6
    2000-01-17     89
    2000-01-17     89
    2000-01-18      7
    2000-01-19      8
    dtype: float64
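
Note that this stacks the three series vertically, so a date that occurs in several columns appears several times in the result. If what you are after is a single frame in which every value column shares one date index and missing dates show up as NaN, one possible variation is to concatenate along axis=1 instead. The sketch below is only an illustration under assumptions: the frame raw, its column names and values, and the helper align_pairs are all made up, and it assumes the columns come in consecutive (date, value) pairs with unique dates inside each pair.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for one parsed sheet: (date, value) column pairs of
# different lengths, padded with NaT/NaN. Names and values are made up.
raw = pd.DataFrame({
    "date1": pd.to_datetime(["1999-01-01", "1999-01-02", None]),
    "val1": [4.0, 1.0, np.nan],
    "date2": pd.to_datetime(["1999-01-02", "1999-01-03", "1999-01-04"]),
    "val2": [5.0, 7.0, 8.0],
})


def align_pairs(frame):
    """Turn consecutive (date, value) column pairs into one date-indexed frame."""
    pieces = []
    for i in range(0, frame.shape[1], 2):
        dates = frame.iloc[:, i]
        values = frame.iloc[:, i + 1]
        keep = dates.notna()  # drop the NaT padding rows of this pair
        pieces.append(pd.Series(values[keep].to_numpy(),
                                index=pd.DatetimeIndex(dates[keep]),
                                name=frame.columns[i + 1]))
    # axis=1 aligns on the union of all dates; dates missing from a column become NaN
    return pd.concat(pieces, axis=1).sort_index()


aligned = align_pairs(raw)
print(aligned)
```

If a date is repeated within a single column, the axis=1 concatenation can no longer align the series unambiguously, so the duplicates would have to be dropped or aggregated first.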