How to set specific columns to int type in pandas

I have this script that writes several CSV files from a folder into a single Excel file:

    from pandas.io.excel import ExcelWriter
    import pandas
    import os

    path = 'data/'
    ordered_list = sorted(os.listdir(path), key = lambda x: int(x.split(".")[0]))
    with ExcelWriter('my_excel.xlsx') as ew:
        for csv_file in ordered_list:
            pandas.read_csv(path + csv_file).to_excel(ew, index = False, sheet_name=csv_file[:-4], encoding='utf-8')

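My problem is that all the values in some columns (say G:H) end up in string format (e.g. '400 or '10). I assumed they are strings because the CSV converts them to strings. I need them as int. How can I make G:H int? I'm using Python 3. Thanks!

PS (here is a sample CSV):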

    ANPIS,,,,,,,
    AGENTIA JUDETEANA PENTRU PLATI SI INSPECTIE SOCIALA TIMIS,,,,,,,
    ,,,,,,,
    Macheta Comparativa CREDITORI - numai pentru Beneficiile a caror Evidenta se tine si in Contabilitate si in aplicatia SAFIR,,,,,,,
    Situatie ANALITICA - NOMINAL la 30.06.2017,,,,,,,
    1. ALOCATIA DE STAT PENTRU COPII,,,,,,,
    Nr. Benef,Nume Prenume,CNP,Data Constituirii,Suma Contabilitate,Suma SAFIR,Differenta Suma,Explicatii daca exista diferente
    1,2,3,4,5,6,7=5-6,8
    1,CAZACU MIHAI,133121140,Aug 2016,84,84
    2,NICOARA PETRU,143152638,"Aug 2014, Sept 2014",126,84
    3,CERNEA NICOLAE DAN,143354723,Dec 2015,84,84
    4,LUDWIG PETRU,144091376,Nov 2014,42,42
    5,POPA REMUS,1440915363,Iun 2015,84,84
    6,BOGDAN MARCEL,144154726,"Feb 2015, Apr 2015, Sept 2015, Oct 2015, Feb 2016",336,336
    7,HENDRE AUGUSTIN,145054704,Feb 2015,42,42
    8,COJOC VASILE,147050307,"Sept 2014, Oct 2014",84,84
    9,RADULESCU VICTOR,147352628,"Sept 2014, Oct 2014, Nov 2014, Dec 2014",168,168
    10,RADAU DUMITRU,148054764,"Feb 2017, Mar 2017",168,168
    11,COVACIU PETRU,148054802,Iun 2016,84,84
    12,BOT IOAN,14808634,"Aug 2014, Sept 2014, Oct 2014, Nov 2014",168,168

^^ The header is this part:

    ANPIS,,,,,,,
    AGENTIA JUDETEANA PENTRU PLATI SI INSPECTIE SOCIALA TIMIS,,,,,,,
    ,,,,,,,
    Macheta Comparativa CREDITORI - numai pentru Beneficiile a caror Evidenta se tine si in Contabilitate si in aplicatia SAFIR,,,,,,,
    Situatie ANALITICA - NOMINAL la 30.06.2017,,,,,,,
    1. ALOCATIA DE STAT PENTRU COPII,,,,,,,
    Nr. Benef,Nume Prenume,CNP,Data Constituirii,Suma Contabilitate,Suma SAFIR,Differenta Suma,Explicatii daca exista diferente
    1,2,3,4,5,6,7=5-6,8

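You can read each file twice: first only the header, using the nrows parameter, and then the body, using skiprows.

Then you also need to write twice.

The solution is a bit complicated because pandas parses this data badly: an 8-level MultiIndex is not supported. And if no header is set, the data from the header rows is joined with the body and the output is messy.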

    with ExcelWriter('my_excel.xlsx') as ew:
        for csv_file in ordered_list:
            # first 8 lines are the header block, the rest is the body
            df1 = pandas.read_csv(path + csv_file, nrows=8, header=None)
            df2 = pandas.read_csv(path + csv_file, skiprows=8, header=None)
            df1.to_excel(ew, index = False, sheet_name=csv_file[:-4], encoding='utf-8', header=False)
            # write the body directly below the header rows
            row = len(df1.index)
            df2.to_excel(ew, index = False, sheet_name=csv_file[:-4], encoding='utf-8', startrow=row , startcol=0, header=False)
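To check what the two reads produce without touching any files, here is a minimal in-memory sketch. It assumes a shortened copy of the sample above (the `SAMPLE` string and the variable names `header_part` and `body_part` are made up for illustration) and only inspects the resulting frames; it does not write to Excel.

```python
import io
import pandas

# Hypothetical, shortened stand-in for one CSV file:
# 8 header lines followed by two data rows.
SAMPLE = """ANPIS,,,,,,,
AGENTIA JUDETEANA PENTRU PLATI SI INSPECTIE SOCIALA TIMIS,,,,,,,
,,,,,,,
Macheta Comparativa CREDITORI,,,,,,,
Situatie ANALITICA - NOMINAL la 30.06.2017,,,,,,,
1. ALOCATIA DE STAT PENTRU COPII,,,,,,,
Nr. Benef,Nume Prenume,CNP,Data Constituirii,Suma Contabilitate,Suma SAFIR,Differenta Suma,Explicatii daca exista diferente
1,2,3,4,5,6,7=5-6,8
1,CAZACU MIHAI,133121140,Aug 2016,84,84
2,NICOARA PETRU,143152638,"Aug 2014, Sept 2014",126,84
"""

# First read: only the 8 header lines, kept as plain rows.
header_part = pandas.read_csv(io.StringIO(SAMPLE), nrows=8, header=None)

# Second read: skip those 8 lines so only the data rows are parsed.
body_part = pandas.read_csv(io.StringIO(SAMPLE), skiprows=8, header=None)

print(header_part.shape)
print(body_part.dtypes)
```

Because the body is parsed on its own, the text in the header rows no longer shares columns with the amounts, so pandas can infer numeric types for them.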

Use apply to remove the ' with strip, then convert to int:

    cols = ['G','H']
    with ExcelWriter('my_excel.xlsx') as ew:
        for csv_file in ordered_list:
            df = pandas.read_csv(path + csv_file)
            # strip the leading ' from each value, then cast the columns to int
            df[cols] = df[cols].astype(str).apply(lambda x: x.str.strip("'")).astype(int)
            print (df.head())
            df.to_excel(ew, index = False, sheet_name=csv_file[:-4], encoding='utf-8')
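The conversion step can be tried on its own with a small frame. This is only a sketch: the column names `F`, `G`, `H` and the values are invented for the example, and it assumes every value in the chosen columns is a whole number once the apostrophe is stripped (an empty or non-numeric value would make `astype(int)` raise).

```python
import pandas

# Hypothetical data: G and H hold numbers stored as text with a leading apostrophe.
df = pandas.DataFrame({
    "F": ["a", "b"],
    "G": ["'400", "'10"],
    "H": ["'84", "'126"],
})

cols = ["G", "H"]

# Strip the apostrophe from every value in the selected columns, then cast to int.
df[cols] = df[cols].astype(str).apply(lambda x: x.str.strip("'")).astype(int)

print(df.dtypes)
```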

Another solution is to use the converters parameter with a custom function:

    cols = ['G','H']

    def converter(x):
        return int(x.strip("'"))

    #define each column
    converters={x:converter for x in cols}

    with ExcelWriter('my_excel.xlsx') as ew:
        for csv_file in ordered_list:
            df = pandas.read_csv(path + csv_file, converters=converters)
            print (df.head())
            df.to_excel(ew, index = False, sheet_name=csv_file[:-4], encoding='utf-8')
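The converters approach can also be checked in memory. In this sketch the CSV text and the column names `F`, `G`, `H` are made up for illustration; the converter receives each cell as a string, so it would raise on an empty cell.

```python
import io
import pandas

# Hypothetical CSV text with a real header row named F, G, H.
CSV_TEXT = "F,G,H\na,'400,'10\nb,'84,'126\n"

def converter(x):
    # x is the raw cell text; drop the apostrophe and parse as int
    return int(x.strip("'"))

cols = ["G", "H"]
converters = {name: converter for name in cols}

# The conversion happens while the file is parsed, so no later cast is needed.
df = pandas.read_csv(io.StringIO(CSV_TEXT), converters=converters)

print(df.dtypes)
```

Compared with the apply variant, this keeps the cleanup inside read_csv, at the cost of calling a Python function once per cell.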
