When reading an Excel file, combine the first two entries in each column as the header

I have been searching for a while but still can't figure this out. I would appreciate any help.

I have an Excel file:

    ,       John,  James, Joan
    ,       Smith, Smith, Smith
    Index1, 234,   432,   324
    Index2, 2987,  234,   4354

I want to read it into a DataFrame so that "John Smith, James Smith, Joan Smith" are my column headers. I have tried the following, but my headers are still "John, James, Joan":

    xl = pd.ExcelFile(myfile, header=None)
    row = df.apply(lambda x: str(x.iloc[0]) + str(x.iloc[1]))
    df.append(row,ignore_index=True)
    nrow = df.shape[0]
    df = pd.concat([df.ix[nrow:], df.ix[2:nrow-1]])

Maybe it is easier to do it by hand?

    >>> import itertools
    >>> xl = pd.ExcelFile(myfile, header=None)
    >>> sh = xl.book.sheet_by_index(0)
    >>> rows = (sh.row_values(i) for i in xrange(sh.nrows))
    >>> hd = zip(*itertools.islice(rows, 2))[1:]  # read first two rows
    >>> df = pd.DataFrame(rows)  # create DataFrame from remaining rows
    >>> df = df.set_index(0)
    >>> df.columns = [' '.join(x) for x in hd]  # rename columns
    >>> df
            John Smith  James Smith  Joan Smith
    0
    Index1         234          432         324
    Index2        2987          234        4354

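If you like, you can keep the two levels separate. This can be useful if, for example, you want to filter columns based on the last name only. Otherwise, the other solutions are certainly better than this one.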
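To illustrate what keeping the two levels separate could look like, here is a minimal sketch. It assumes the sheet has already been read without a header row; the `raw` frame is a hand-built stand-in for the data in the question, and `split_two_level_header` and the level names `first` and `last` are hypothetical names chosen for this example.

```python
import pandas as pd

# Hand-built stand-in for the sheet in the question, read with no header row.
raw = pd.DataFrame([
    [None, "John", "James", "Joan"],
    [None, "Smith", "Smith", "Smith"],
    ["Index1", 234, 432, 324],
    ["Index2", 2987, 234, 4354],
])


def split_two_level_header(raw):
    """Use rows 0 and 1 as a two-level column index and column 0 as the row index."""
    body = raw.iloc[2:].set_index(0)
    body.index.name = None
    body.columns = pd.MultiIndex.from_arrays(
        [raw.iloc[0, 1:].tolist(), raw.iloc[1, 1:].tolist()],
        names=["first", "last"],  # hypothetical level names
    )
    return body


df = split_two_level_header(raw)

# Select every column whose last name is "Smith"; the "last" level is dropped
# from the result, leaving only the first names as column labels.
smiths = df.xs("Smith", axis=1, level="last")
```

With the levels kept apart, a single column is addressed by a tuple such as `df[("John", "Smith")]`.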

Normally this works for me:

    In [103]: txt = '''John,James,Joan
         ...: Smith,Smith,Smith
         ...: 234,432,324
         ...: 2987,234,4354
         ...: '''

    In [104]: x = pandas.read_csv(StringIO(txt), header=[0,1])
         ...: x.columns = pandas.MultiIndex.from_tuples(x.columns.tolist())
         ...: x
         ...:

But for some reason, the first data row is missing :/

    In [105]: x
    Out[105]:
        John  James   Joan
       Smith  Smith  Smith
    0   2987    234   4354

I'll check the pandas mailing list to see whether this is a bug.

I got it working by converting the Excel file to a CSV file and using the following approach:

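    df = pd.read_csv(myfile, header=None)
    # join the first two entries of each column (.iloc replaces the removed .ix)
    header = df.apply(lambda x: str(x.iloc[0]) + ' ' + str(x.iloc[1]))
    df = df[2:]  # keep only the data rows
    df.columns = header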

This is the output:

    Out[252]:
      John Smith James Smith Joan Smith
    2        234         432        324
    3       3453        2342        563

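However, when I read the file with pd.ExcelFile (and parse the specific sheet I am interested in), I run into a problem similar to @Paul H's. It seems the Excel reader treats the first row as column names by default, and returns something like: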

      Smith 234 Smith 432 Smith 324
    3      3453      2342       563
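One way to sidestep the first row being consumed as column names is to read the sheet with no header at all and build the labels afterwards. The sketch below assumes the raw frame would come from something like `pd.read_excel(myfile, header=None)`; a small hand-built frame stands in for it here so the example is self-contained. `combine_header_rows` is a hypothetical helper name, and the leading index column follows the layout shown in the question.

```python
import pandas as pd


def combine_header_rows(raw, sep=" "):
    """Join rows 0 and 1 of each column into one label; column 0 becomes the row index."""
    labels = [
        sep.join([str(first), str(last)])
        for first, last in zip(raw.iloc[0, 1:], raw.iloc[1, 1:])
    ]
    body = raw.iloc[2:].set_index(0)
    body.index.name = None
    body.columns = labels
    return body


# In practice `raw` might be obtained with
#     raw = pd.read_excel(myfile, header=None)
# (assumption: `myfile` is the path from the question). A stand-in is used here.
raw = pd.DataFrame([
    [None, "John", "James", "Joan"],
    [None, "Smith", "Smith", "Smith"],
    ["Index1", 234, 432, 324],
    ["Index2", 2987, 234, 4354],
])

df = combine_header_rows(raw)
```

Because the frame holds the header rows and the data in the same columns, the values end up with `object` dtype; a follow-up conversion such as `df.astype(int)` may be needed before doing arithmetic.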
