python: pandas – How do I combine the first two rows of a pandas DataFrame into the DataFrame header?

I want to read an Excel file that looks like this:

[image: screenshot of the Excel file]

I also have a script that converts this xlsx file into a list of csv files (if three sheets are available, it creates three different csv files).

This is what the csv file looks like:

    Unnamed: 0,Gender A,Unnamed: 2,Gender B,Unnamed: 4,Gender C,Gender D
    date,Male,Female,Male,Female,Male,Female
    2017-01-01 00:00:00,2,3,3,2,3,3
    2017-01-02 00:00:00,5,7,7,42,3,5
    2017-01-03 00:00:00,4,6,6,12,2,7
    2017-01-04 00:00:00,6,7,3,6,4,8
    2017-01-05 00:00:00,6,8,8,3,5,3
    2017-01-06 00:00:00,54,3,3,6,3,5
    2017-01-07 00:00:00,3,4,6,3,6,5
    2017-01-08 00:00:00,3,6,6,3,6,4
    2017-01-09 00:00:00,2,2,8,7,5,2
    2017-01-10 00:00:00,4,3,2,4,5,5
    2017-01-11 00:00:00,12,10,10,3,1,6
    2017-01-12 00:00:00,9,7,7,3,4,1

So my first question is: which is the better choice for processing these files, xlsx or csv?

Next, I want to read just the first two rows as the column headers, so that I can see how many males and females there are in each gender group.

Expected output:

    0   date                 Gender A_Male  Gender A_Female  Gender B_Male  Gender B_Female  Gender C_Male  Gender D_Female
    1   2017-01-01 00:00:00  2              3                3              2                3              3
    2   2017-01-02 00:00:00  5              7                7              42               3              5
    3   2017-01-03 00:00:00  4              6                6              12               2              7
    4   2017-01-04 00:00:00  6              7                3              6                4              8
    5   2017-01-05 00:00:00  6              8                8              3                5              3
    6   2017-01-06 00:00:00  54             3                3              6                3              5
    7   2017-01-07 00:00:00  3              4                6              3                6              5
    8   2017-01-08 00:00:00  3              6                6              3                6              4
    9   2017-01-09 00:00:00  2              2                8              7                5              2
    10  2017-01-10 00:00:00  4              3                2              4                5              5
    11  2017-01-11 00:00:00  12             10               10             3                1              6
    12  2017-01-12 00:00:00  9              7                7              3                4              1

Let's try:

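    import pandas as pd

    df = pd.read_excel('Untitled 2.xlsx', header=[0, 1])  # first two rows become a two-level header
    df.columns = df.columns.map('_'.join)                 # flatten to names like 'Gender A_Male'
    df.rename_axis('Date').reset_index()                  # move the index into a 'Date' column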

Output:

              Date  Gender A_Male  Gender A_Female  Gender B_Male  Gender B_Female  \
    0   2017-01-01              2                3              3                2
    1   2017-01-02              5                7              7               42
    2   2017-01-03              4                6              6               12
    3   2017-01-04              6                7              3                6
    4   2017-01-05              6                8              8                3
    5   2017-01-06             54                3              3                6
    6   2017-01-07              3                4              6                3
    7   2017-01-08              3                6              6                3
    8   2017-01-09              2                2              8                7
    9   2017-01-10              4                3              2                4
    10  2017-01-11             12               10             10                3
    11  2017-01-12              9                7              7                3

        Gender C_Male  Gender D_Female
    0               3                3
    1               3                5
    2               2                7
    3               4                8
    4               5                3
    5               3                5
    6               6                5
    7               6                4
    8               5                2
    9               5                5
    10              1                6
    11              4                1
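
For the csv route raised in the question, a minimal sketch might look like the following. It assumes the exported csv keeps the literal `Unnamed: N` placeholders shown in the question, and that such a placeholder means "same group as the column to its left". The `CSV_TEXT` sample (header plus the first three data rows only) and the `flat_names` helper are illustrative names, not part of the original script.

```python
import io

import pandas as pd

# Illustrative sample: the csv header from the question plus its first three data rows.
CSV_TEXT = """\
Unnamed: 0,Gender A,Unnamed: 2,Gender B,Unnamed: 4,Gender C,Gender D
date,Male,Female,Male,Female,Male,Female
2017-01-01 00:00:00,2,3,3,2,3,3
2017-01-02 00:00:00,5,7,7,42,3,5
2017-01-03 00:00:00,4,6,6,12,2,7
"""


def flat_names(top, bottom):
    """Combine two header rows into flat 'group_sub' names (hypothetical helper)."""
    names = []
    group = None  # most recent real (non-placeholder) label from the first row
    for upper, lower in zip(top, bottom):
        if not str(upper).startswith("Unnamed"):
            group = upper
        # Assumption: an 'Unnamed: N' cell inherits the group to its left.
        # Before any group has been seen (the date column), only the second row is used.
        names.append(lower if group is None else f"{group}_{lower}")
    return names


# Read only the two header rows, without letting pandas interpret them as a header.
header = pd.read_csv(io.StringIO(CSV_TEXT), header=None, nrows=2)
names = flat_names(header.iloc[0].tolist(), header.iloc[1].tolist())

# Read the data rows, skipping the two header lines and applying the flat names.
df = pd.read_csv(
    io.StringIO(CSV_TEXT),
    header=None,
    skiprows=2,
    names=names,
    parse_dates=["date"],
)

print(df.columns.tolist())
```

With a file on disk, the two `io.StringIO(CSV_TEXT)` arguments would be replaced by the csv path. Building the names by hand keeps the result independent of how a given pandas version labels blank or placeholder header cells.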