Convert an Excel or CSV file to a pandas multilevel DataFrame

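I've been given a fairly large Excel file (5k rows), also available as a CSV, that I would like to turn into a pandas multilevel DataFrame. The file is structured like this: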

    SampleID    OtherInfo    Measurements  Error    Notes
    sample1     stuff                               more stuff
                             36            6
                             26            7
                             37            8
    sample2     newstuff                            lots of stuff
                             25            6
                             27            7

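The number of measurements varies (sometimes it is zero). There are no completely blank rows between any of the information, and the "Measurements" and "Error" columns are empty on the rows that hold the other (string) data, which may make parsing harder. Is there an easy way to automate this conversion? My first idea was to parse the file with Python and then fill the data into DataFrame slots in a loop, but I don't know how to implement that, or whether it is even the best approach.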

Thanks in advance!

It looks like your file has fixed-width columns, so you can use read_fwf().

    from io import StringIO

    import pandas

    # Sample of the file as fixed-width text (column widths 12, 13, 14, 9, 15)
    data = """\
    SampleID    OtherInfo    Measurements  Error    Notes
    sample1     stuff                               more stuff
                             36            6
                             26            7
                             37            8
    sample2     newstuff                            lots of stuff
                             25            6
                             27            7
    """

    df = pandas.read_fwf(StringIO(data), widths=[12, 13, 14, 9, 15])

OK, now that we have the data, a little extra work gives you a frame on which you can call set_index() to create a MultiIndex (multi-level index).

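    # Move each Measurements/Error pair up one row so that it lines up with
    # the sample row (or previous measurement row) above it
    df[['Measurements', 'Error']] = df[['Measurements', 'Error']].shift(-1)
    # Forward-fill the descriptive columns down onto the measurement rows
    df[['SampleID', 'OtherInfo', 'Notes']] = df[['SampleID', 'OtherInfo', 'Notes']].ffill()
    # Rows left without a measurement are now redundant
    df = df.dropna()
    # The blank cells forced a float dtype; convert the measurements back to int
    df = df.astype({'Measurements': int, 'Error': int})
    print(df)
    # Result:
    #    SampleID  OtherInfo  Measurements  Error          Notes
    # 0   sample1      stuff            36      6     more stuff
    # 1   sample1      stuff            26      7     more stuff
    # 2   sample1      stuff            37      8     more stuff
    # 4   sample2   newstuff            25      6  lots of stuff
    # 5   sample2   newstuff            27      7  lots of stuff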
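To finish the step the answer mentions, the cleaned frame can be given a MultiIndex with set_index(). This is a minimal sketch that assumes SampleID, OtherInfo and Notes should become the index levels (that choice of levels is an assumption, not something the question specifies); the cleaned frame is rebuilt by hand so the block runs on its own.

```python
import pandas as pd

# The cleaned frame from the previous step, rebuilt by hand so this block is self-contained
df = pd.DataFrame({
    "SampleID": ["sample1"] * 3 + ["sample2"] * 2,
    "OtherInfo": ["stuff"] * 3 + ["newstuff"] * 2,
    "Measurements": [36, 26, 37, 25, 27],
    "Error": [6, 7, 8, 6, 7],
    "Notes": ["more stuff"] * 3 + ["lots of stuff"] * 2,
})

# Move the descriptive columns into a three-level index; the numeric columns stay as data
mi = df.set_index(["SampleID", "OtherInfo", "Notes"])

# Partial indexing on the outer level returns every measurement for one sample
sample1 = mi.loc["sample1"]

print(mi)
print(sample1)
```

Because the measurements stay as ordinary columns, per-sample summaries are then a one-liner such as `mi.groupby(level="SampleID").mean()`.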

This will at least clean it up for further processing:

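    import csv

    # Descriptive columns that are only filled in on each sample's first row
    FILL_KEYS = ['SampleID', 'OtherInfo', 'Notes']

    csv_file_name = '<csv_file_name>'  # replace with the path to your CSV file

    data = []
    last_seen = {}  # most recent non-empty value of each descriptive column
    with open(csv_file_name, newline='') as f:
        reader = csv.reader(f)
        keys = next(reader)  # header row
        for row in reader:
            r = dict(zip(keys, row))
            # Remember the descriptive values from sample rows before skipping them
            for key in FILL_KEYS:
                if r[key]:
                    last_seen[key] = r[key]
            # Rows without a measurement carry no data of their own
            if not r['Measurements'] or not r['Error']:
                continue
            # Fill blank descriptive cells from the most recent sample row
            for key in FILL_KEYS:
                if not r[key]:
                    r[key] = last_seen.get(key, '')
            data.append(r)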
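Since the data is also available as a CSV, the fixed-width step can probably be skipped: read_csv turns empty cells into NaN, after which forward-filling the descriptive columns and dropping rows without a measurement gives the same long-format frame, ready for set_index(). This is a minimal sketch assuming a comma-separated file with the column names from the question; the inline `csv_text` and the helper name `to_multiindex` are hypothetical stand-ins for the real file and your own code.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the real CSV; empty fields between commas become NaN
csv_text = """\
SampleID,OtherInfo,Measurements,Error,Notes
sample1,stuff,,,more stuff
,,36,6,
,,26,7,
,,37,8,
sample2,newstuff,,,lots of stuff
,,25,6,
,,27,7,
"""


def to_multiindex(raw: pd.DataFrame) -> pd.DataFrame:
    """Forward-fill sample info, keep measurement rows, index by sample."""
    info = ["SampleID", "OtherInfo", "Notes"]
    df = raw.copy()
    # Copy each sample's descriptive values down onto its measurement rows
    df[info] = df[info].ffill()
    # Sample rows have no measurement, so drop them; restore integer dtype
    df = df.dropna(subset=["Measurements", "Error"]).astype(
        {"Measurements": int, "Error": int}
    )
    return df.set_index(info)


result = to_multiindex(pd.read_csv(StringIO(csv_text)))
print(result)
```

For the Excel file, `pd.read_excel(path)` can be passed instead of the read_csv call, since it also reads empty cells as NaN (reading .xlsx files requires an engine such as openpyxl). A sample with zero measurements simply contributes no rows, because its only row is dropped.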
