Changing the stacking of a DataFrame in pandas

I have a dataframe like this:

    Name  2012              2013                                2014
          7  8  9  10 11 12 1  2  3  4  5  6  7  8  9  10 11 12 1  2  3  4  5  6  7  8  9  10 11 12
    A     a  b  c  d  e  f  g  a  b  c  d  e  f  g  h  i  j  k  l  m  a  b  c  d  e  f  g  h  i  j  k  l  m
    B     a  b  c  d  e  f  g  a  b  c  d  e  f  g  h  i  j  k  l  m  a  b  c  d  e  f  g  h  i  j  k  l  m

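And so on. 2012, 2013 and 2014 are the years, with their respective months listed below them, and a, b, c, d, e ... are the values for each Name (A, B, ...) in the corresponding month. The values a, b, c, d, e ... are different for every name and are used here for illustration only.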

So far I have done the following:

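    import pandas as pd

    workbook = pd.ExcelFile('XYZ.xlsx')
    # the keyword is sheet_name in current pandas (older releases spelled it sheetname)
    df = workbook.parse(sheet_name='Page1-2')
    df2 = pd.melt(df, id_vars=["Name"], var_name="Date", value_name="Value")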

That is, I imported my XYZ.xlsx file into df and then used pd.melt to reshape df into df2. The output of df2 looks like this:

    Name  Date     Value
    A     2012     a
    A     Unnamed  b
    A     Unnamed  c
    A     Unnamed  d
    A     Unnamed  e
    A     Unnamed  f
    A     Unnamed  g
    A     2013     a
    A     Unnamed  b
    A     Unnamed  c
    A     Unnamed  d
    A     Unnamed  e

And so on for the other years and names. I would like my Date column to look something like this:

    Date
    7/2012
    8/2012
    9/2012
    10/2012
    11/2012
    12/2012
    1/2013
    2/2013
    3/2013
    4/2013
    5/2013
    6/2013
    7/2013
    8/2013

That is, following the months and years given in the original dataframe. I don't know how to do this. Any help is highly appreciated!

Output of print(df.to_dict()) for my sample data:

 {'Name': {0: nan, 1: 'A', 2: 'B'}, 2012: {0: '07', 1: 'a', 2: 'a'},'Unnamed: 2': {0: '08', 1: 'b', 2: 'b'}, 'Unnamed: 3': {0: '09', 1: 'c', 2: 'c'}, 'Unnamed: 4': {0: '10', 1: 'd', 2: 'd'}, 'Unnamed: 5': {0: '11', 1: 'e', 2: 'e'}, 'Unnamed: 6': {0: '12', 1: 'f', 2: 'f'}, '2013': {0: '01', 1: 'a', 2: 'a'}, 'Unnamed: 8': {0: '02', 1: 'b', 2: 'b'}, 'Unnamed: 9': {0: '03', 1: 'c', 2: 'c'}, 'Unnamed: 10': {0: '04', 1: 'd', 2: 'd'}, 'Unnamed: 11': {0: '05', 1: 'e', 2: 'e'}, 'Unnamed: 12': {0: '06', 1: 'f', 2: 'f'}, 'Unnamed: 13': {0: '07', 1: 'a', 2: 'a'}, 'Unnamed: 14': {0: '08', 1: 'b', 2: 'b'}, 'Unnamed: 15': {0: '09', 1: 'c', 2: 'c'}, 'Unnamed: 16': {0: '10', 1: 'd', 2: 'd'}, 'Unnamed: 17': {0: '11', 1: 'e', 2: 'e'}, 'Unnamed: 18': {0: '12', 1: 'f', 2: 'f'}, '2014': {0: '01', 1: 'a', 2: 'a'}, 'Unnamed: 20': {0: '02', 1: 'b', 2: 'b'}, 'Unnamed: 21': {0: '03', 1: 'c', 2: 'c'}, 'Unnamed: 22': {0: '04', 1: 'd', 2: 'd'}, 'Unnamed: 23': {0: '05', 1: 'e', 2: 'e'}, 'Unnamed: 24': {0: '06', 1: 'f', 2: 'f'}, 'Unnamed: 25': {0: '07', 1: 'a', 2: 'a'}, 'Unnamed: 26': {0: '08', 1: 'b', 2: 'b'}, 'Unnamed: 27': {0: '09', 1: 'c', 2: 'c'}, 'Unnamed: 28': {0: '10', 1: 'd', 2: 'd'}, 'Unnamed: 29': {0: '11', 1: 'e', 2: 'e'}, 'Unnamed: 30': {0: '12', 1: 'f', 2: 'f'}} 

Use:

    # use the Name column as the index
    df = df.set_index('Name')
    # build a MultiIndex from the column labels (the 'Unnamed:' placeholders are
    # replaced by the preceding year) and the first row, which holds the months;
    # labels are cast to str so .str.contains also works if a year was read as an integer
    idx = pd.Series(df.columns).astype(str)
    df.columns = pd.MultiIndex.from_arrays([idx.mask(idx.str.contains('Unnamed:')).ffill(),
                                            df.iloc[0]],
                                           names=('Date','Month'))
    # remove the first row
    df = df.iloc[1:]
    print (df)
    Date  2012                   2013            ... 2014
    Month   07 08 09 10 11 12      01 02 03 04   ...   03 04 05 06 07 08 09 10 11 12
    Name                                         ...
    A        a  b  c  d  e  f       a  b  c  d   ...    c  d  e  f  g  h  i  j  k  l
    B        a  b  c  d  e  f       a  b  c  d   ...    c  d  e  f  g  h  i  j  k  l

    print (df.columns)
    MultiIndex(levels=[['2012', '2013', '2014'], ['01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11', '12']],
               labels=[[0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2], [6, 7, 8, 9, 10, 11, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]],
               names=['Date', 'Month'])
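
To see the header step in isolation, here is a minimal sketch of the mask-and-forward-fill idea. The column labels are made up for illustration (they only imitate what pandas produces for merged header cells), and `startswith` is used here instead of `contains` purely as a stylistic choice:

```python
import pandas as pd

# Hypothetical column labels: a year appears only above its first month,
# the remaining months of that year show up as 'Unnamed: N' placeholders
cols = pd.Index(['2012', 'Unnamed: 2', 'Unnamed: 3', '2013', 'Unnamed: 5'])

idx = pd.Series(cols).astype(str)
# Placeholders become NaN, then ffill copies the last real year forward
years = idx.mask(idx.str.startswith('Unnamed:')).ffill()
print(years.tolist())
```

Each placeholder inherits the nearest year to its left, which is what lets the years line up with the months in the MultiIndex.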

    # reshape
    df2 = df.unstack().reset_index(name='Value')
    df2['Date'] = df2['Month'] + '/' + df2['Date']
    df2 = df2.drop('Month', axis=1)
    print (df2)
           Date Name Value
    0   07/2012    A     a
    1   07/2012    B     a
    2   08/2012    A     b
    3   08/2012    B     b
    4   09/2012    A     c
    5   09/2012    B     c
    6   10/2012    A     d
    7   10/2012    B     d
    8   11/2012    A     e
    9   11/2012    B     e
    10  12/2012    A     f
    11  12/2012    B     f
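
The reshape step can also be tried on its own. The sketch below builds a tiny frame that already has (Date, Month) MultiIndex columns; the frame, its values and the names `wide` and `long_df` are invented for illustration and are not the data from the question:

```python
import pandas as pd

# Hypothetical wide frame whose columns are already a (Date, Month) MultiIndex
cols = pd.MultiIndex.from_tuples(
    [('2012', '11'), ('2012', '12'), ('2013', '01')],
    names=('Date', 'Month'),
)
wide = pd.DataFrame(
    [['a', 'b', 'c'], ['x', 'y', 'z']],
    index=pd.Index(['A', 'B'], name='Name'),
    columns=cols,
)

# unstack() moves the row index under the column levels and returns a Series
# indexed by (Date, Month, Name); reset_index turns the levels into columns
long_df = wide.unstack().reset_index(name='Value')
long_df['Date'] = long_df['Month'] + '/' + long_df['Date']
long_df = long_df.drop(columns='Month')
print(long_df)
```

Because the index here is named Name, the third column comes out as Name directly and no rename is needed.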

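If you can read df directly from the file, you can pass header=[0,1] so that the first and second rows become a MultiIndex, and index_col=[0] so that the first column (Name) becomes the index. The solution then changes slightly: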

    df = pd.read_csv('filename', header=[0,1], index_col=[0])
    idx = pd.Series(df.columns.get_level_values(0))
    df.columns = pd.MultiIndex.from_arrays([idx.mask(idx.str.contains('Unnamed:')).ffill(),
                                            df.columns.get_level_values(1)],
                                           names=('Date','Month'))
    print (df)
    Date  2012                   2013            ... 2014
    Month   07 08 09 10 11 12      01 02 03 04   ...   03 04 05 06 07 08 09 10 11 12
    Name                                         ...
    A        a  b  c  d  e  f       a  b  c  d   ...    c  d  e  f  g  h  i  j  k  l
    B        a  b  c  d  e  f       a  b  c  d   ...    c  d  e  f  g  h  i  j  k  l

    print (df.columns)
    MultiIndex(levels=[['2012', '2013', '2014'], ['01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11', '12']],
               labels=[[0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2], [6, 7, 8, 9, 10, 11, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]],
               names=['Date', 'Month'])

    # reshape
    df2 = df.unstack().reset_index(name='Value').rename(columns={'level_2':'Name'})
    df2['Date'] = df2['Month'].astype(str) + '/' + df2['Date'].astype(str)
    # optionally convert to datetimes (first day of each month); %Y matches a four-digit year
    #df2['Date'] = pd.to_datetime(df2['Date'].radd('1/'), format='%d/%m/%Y')
    df2 = df2.drop('Month', axis=1)
    print (df2)
          Date Name Value
    0  07/2012    A     a
    1  07/2012    B     a
    2  08/2012    A     b
    3  08/2012    B     b
    4  09/2012    A     c
    5  09/2012    B     c
    6  10/2012    A     d
    7  10/2012    B     d
    8  11/2012    A     e
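
The question asked for months without a leading zero (7/2012 rather than 07/2012). One possible way to get there, sketched on a few made-up values, is to parse the strings as dates and rebuild the label from the month and year numbers. The names `dates`, `parsed` and `short` are hypothetical:

```python
import pandas as pd

# Hypothetical 'MM/YYYY' strings, shaped like the Date column built above
dates = pd.Series(['07/2012', '12/2012', '01/2013'])

# Parse as the first day of each month; %Y is the four-digit year
parsed = pd.to_datetime(dates, format='%m/%Y')

# Rebuild the label from integers, which drops the zero padding
short = parsed.dt.month.astype(str) + '/' + parsed.dt.year.astype(str)
print(short.tolist())
```

Keeping the parsed datetimes around is usually handy as well, since they sort chronologically while the string labels do not.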