如何使用pandas / python来处理excel文件头

我正在使用pandas阅读https://www.whatdotheyknow.com/request/193811/response/480664/attach/3/GCSE%20IGCSE%20results%20v3.xlsx 。

保存了我的脚本

import sys import pandas as pd inputfile = sys.argv[1] xl = pd.ExcelFile(inputfile) # print xl.sheet_names df = xl.parse(xl.sheet_names[0]) print df.head() 

然而,这似乎并没有正确处理标题,因为它给

  GCSE and IGCSE1 results2,3 in selected subjects4 of pupils at the end of key stage 4 Unnamed: 1 Unnamed: 2 Unnamed: 3 Unnamed: 4 Unnamed: 5 Unnamed: 6 Unnamed: 7 Unnamed: 8 Unnamed: 9 Unnamed: 10 0 Year: 2010/11 (Final) NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1 Coverage: England NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 2 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 3 1. Includes International GCSE, Cambridge Inte... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 4 2. Includes attempts and achievements by these... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 

所有这些都应该被视为评论。

例如,如果将电子表格加载到libreoffice中,则可以看到列标题已正确parsing,并显示在第15行中,并带有下拉菜单,可让您select所需的项目。

如何让大pandas自动检测列标题的位置和libreoffice一样?

pandas是(是?)正确地处理文件,并正是你问它(他们?)的方式。 您没有指定header值,这意味着它默认从第0行获取列名称。 单元格的前几行不是以某种基本的方式进行的注释,而仅仅是您感兴趣的单元格。

简单地告诉parse你想跳过一些行:

 >>> xl = pd.ExcelFile("GCSE IGCSE results v3.xlsx") >>> df = xl.parse(xl.sheet_names[0], skiprows=14) >>> df.columns Index([u'Local Authority Number', u'Local Authority Name', u'Local Authority Establishment Number', u'Unique Reference Number', u'School Name', u'Town', u'Number of pupils at the end of key stage 4', u'Number of pupils attempting a GCSE or an IGCSE', u'Number of students achieving 8 or more GCSE or IGCSE passes at A*-G', u'Number of students achieving 8 or more GCSE or IGCSE passes at A*-A', u'Number of students achieving 5 A*-A grades or more at GCSE or IGCSE'], dtype='object') >>> df.head() Local Authority Number Local Authority Name \ 0 201 City of london 1 201 City of london 2 202 Camden 3 202 Camden 4 202 Camden Local Authority Establishment Number Unique Reference Number \ 0 2016005 100001 1 2016007 100003 2 2024104 100049 3 2024166 100050 4 2024196 100051 School Name Town \ 0 City of London School for Girls London 1 City of London School London 2 Haverstock School London 3 Parliament Hill School London 4 Regent High School London Number of pupils at the end of key stage 4 \ 0 105 1 140 2 200 3 172 4 174 Number of pupils attempting a GCSE or an IGCSE \ 0 104 1 140 2 194 3 169 4 171 Number of students achieving 8 or more GCSE or IGCSE passes at A*-G \ 0 100 1 108 2 SUPP 3 22 4 0 Number of students achieving 8 or more GCSE or IGCSE passes at A*-A \ 0 87 1 75 2 0 3 7 4 0 Number of students achieving 5 A*-A grades or more at GCSE or IGCSE 0 100 1 123 2 0 3 34 4 SUPP [5 rows x 11 columns]