How to set specific columns to int type in pandas

I have this script that writes the CSV files in a folder into one Excel workbook:

    from pandas.io.excel import ExcelWriter
    import pandas
    import os

    path = 'data/'
    ordered_list = sorted(os.listdir(path), key = lambda x: int(x.split(".")[0]))

    with ExcelWriter('my_excel.xlsx') as ew:
        for csv_file in ordered_list:
            pandas.read_csv(path + csv_file).to_excel(ew, index = False, sheet_name=csv_file[:-4], encoding='utf-8')

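My problem is that all the values in some columns (say G:H) end up in string format in Excel (for example '400 or '10). I think they are strings because reading the CSV converts them to strings. I need them as int. How can I make G:H int? I am using Python 3. Thanks!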

PS (this is a CSV sample):

    ANPIS,,,,,,,
    AGENTIA JUDETEANA PENTRU PLATI SI INSPECTIE SOCIALA TIMIS,,,,,,,
    ,,,,,,,
    Macheta Comparativa CREDITORI - numai pentru Beneficiile a caror Evidenta se tine si in Contabilitate si in aplicatia SAFIR,,,,,,,
    Situatie ANALITICA - NOMINAL la 30.06.2017,,,,,,,
    1. ALOCATIA DE STAT PENTRU COPII,,,,,,,
    Nr. Benef,Nume Prenume,CNP,Data Constituirii,Suma Contabilitate,Suma SAFIR,Differenta Suma,Explicatii daca exista diferente
    1,2,3,4,5,6,7=5-6,8
    1,CAZACU MIHAI,133121140,Aug 2016,84,84
    2,NICOARA PETRU,143152638,"Aug 2014, Sept 2014",126,84
    3,CERNEA NICOLAE DAN,143354723,Dec 2015,84,84
    4,LUDWIG PETRU,144091376,Nov 2014,42,42
    5,POPA REMUS,1440915363,Iun 2015,84,84
    6,BOGDAN MARCEL,144154726,"Feb 2015, Apr 2015, Sept 2015, Oct 2015, Feb 2016",336,336
    7,HENDRE AUGUSTIN,145054704,Feb 2015,42,42
    8,COJOC VASILE,147050307,"Sept 2014, Oct 2014",84,84
    9,RADULESCU VICTOR,147352628,"Sept 2014, Oct 2014, Nov 2014, Dec 2014",168,168
    10,RADAU DUMITRU,148054764,"Feb 2017, Mar 2017",168,168
    11,COVACIU PETRU,148054802,Iun 2016,84,84
    12,BOT IOAN,14808634,"Aug 2014, Sept 2014, Oct 2014, Nov 2014",168,168

^^ The header is this part:

    ANPIS,,,,,,,
    AGENTIA JUDETEANA PENTRU PLATI SI INSPECTIE SOCIALA TIMIS,,,,,,,
    ,,,,,,,
    Macheta Comparativa CREDITORI - numai pentru Beneficiile a caror Evidenta se tine si in Contabilitate si in aplicatia SAFIR,,,,,,,
    Situatie ANALITICA - NOMINAL la 30.06.2017,,,,,,,
    1. ALOCATIA DE STAT PENTRU COPII,,,,,,,
    Nr. Benef,Nume Prenume,CNP,Data Constituirii,Suma Contabilitate,Suma SAFIR,Differenta Suma,Explicatii daca exista diferente
    1,2,3,4,5,6,7=5-6,8

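You can read each file twice: first only the header, using the nrows parameter, and then the body, using skiprows.

Each part then also has to be written separately, so there are two writes per sheet.

The solution is a bit complicated because pandas parses this data incorrectly: an 8-level MultiIndex header is not supported. If the header is not specified, the header rows get mixed in with the body and the output is messy.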

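    # reuses the imports, path and ordered_list from the question
    # note: to_excel accepts encoding= only in older pandas (removed in 2.0); drop it on newer versions
    with ExcelWriter('my_excel.xlsx') as ew:
        for csv_file in ordered_list:
            # first 8 rows: the report header, kept as-is
            df1 = pandas.read_csv(path + csv_file, nrows=8, header=None)
            # remaining rows: the data, read on its own so pandas can infer numeric dtypes
            df2 = pandas.read_csv(path + csv_file, skiprows=8, header=None)

            df1.to_excel(ew, index=False, sheet_name=csv_file[:-4], encoding='utf-8', header=False)
            # write the body directly below the header on the same sheet
            row = len(df1.index)
            df2.to_excel(ew, index=False, sheet_name=csv_file[:-4], encoding='utf-8', startrow=row, startcol=0, header=False)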
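To see why reading the header and the body separately helps, here is a minimal sketch on an in-memory CSV. The sample text, the two-row header and the helper name `split_header_body` are made up for illustration; the files in the question have 8 header rows.

```python
import io

import pandas as pd

# Hypothetical, shortened sample: 2 header rows followed by numeric body rows.
SAMPLE = (
    "Report title,,\n"
    "Name,Amount A,Amount B\n"
    "CAZACU MIHAI,84,84\n"
    "NICOARA PETRU,126,84\n"
)

HEADER_ROWS = 2  # assumption for this sketch; the real files use 8


def split_header_body(text, header_rows):
    """Read the same CSV text twice: the header rows, then the body rows."""
    head = pd.read_csv(io.StringIO(text), nrows=header_rows, header=None)
    body = pd.read_csv(io.StringIO(text), skiprows=header_rows, header=None)
    return head, body


head, body = split_header_body(SAMPLE, HEADER_ROWS)

# Read in one pass, the amount columns would mix text and numbers.
whole = pd.read_csv(io.StringIO(SAMPLE), header=None)

print(whole.dtypes)  # amount columns are not numeric here
print(body.dtypes)   # amount columns are inferred as integers here
```

Because the body is parsed without the header text above it, pandas should infer integer columns by itself, so no explicit cast is needed before writing to Excel.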

Alternatively, use apply to remove the ' with strip, then convert to int:

    # placeholder names: use the actual column labels of your DataFrame here
    cols = ['G','H']

    with ExcelWriter('my_excel.xlsx') as ew:
        for csv_file in ordered_list:
            df = pandas.read_csv(path + csv_file)
            # cast to str, strip the apostrophe, then cast to int
            df[cols] = df[cols].astype(str).apply(lambda x: x.str.strip("'")).astype(int)
            print (df.head())
            df.to_excel(ew, index = False, sheet_name=csv_file[:-4], encoding='utf-8')
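The strip-and-cast step can be checked on its own with a small frame. This is only a sketch: it assumes the columns are really labelled G and H (in the question these are Excel column letters, so the real labels will likely differ), and the frame and the helper name `strip_quote_to_int` are hypothetical.

```python
import pandas as pd

# Hypothetical frame whose numeric columns arrived as text with an apostrophe.
df = pd.DataFrame({
    "name": ["CAZACU MIHAI", "NICOARA PETRU"],
    "G": ["'400", "'10"],
    "H": ["'84", "'126"],
})


def strip_quote_to_int(frame, columns):
    """Return a copy with apostrophes stripped from the given columns and the values cast to int."""
    out = frame.copy()
    out[columns] = (
        out[columns]
        .astype(str)                          # make sure .str is available
        .apply(lambda s: s.str.strip("'"))    # strip the quote per column
        .astype(int)                          # raises ValueError on non-numeric text
    )
    return out


clean = strip_quote_to_int(df, ["G", "H"])
print(clean.dtypes)
```

Note that `astype(int)` fails on any cell that is not a plain integer after stripping, which is usually what you want when the column is supposed to be purely numeric.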

Another solution is to pass a custom function through the converters parameter of read_csv:

    # placeholder names: use the actual column labels of your CSV here
    cols = ['G','H']

    def converter(x):
        # strip the apostrophe and parse the rest as an integer
        return int(x.strip("'"))

    #define each column
    converters={x:converter for x in cols}

    with ExcelWriter('my_excel.xlsx') as ew:
        for csv_file in ordered_list:
            df = pandas.read_csv(path + csv_file, converters=converters)
            print (df.head())
            df.to_excel(ew, index = False, sheet_name=csv_file[:-4], encoding='utf-8')
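The converters approach can be tried without any files by reading from a string. A minimal sketch, assuming a CSV with a single header row and columns literally named G and H; the sample text and the name `to_int` are illustrative only.

```python
import io

import pandas as pd

# Hypothetical CSV text: one header row, values prefixed with an apostrophe.
CSV_TEXT = (
    "name,G,H\n"
    "CAZACU MIHAI,'84,'84\n"
    "NICOARA PETRU,'126,'84\n"
)


def to_int(value):
    """Converter applied by read_csv to each raw cell (a str) of the column."""
    return int(value.strip("'"))


df = pd.read_csv(io.StringIO(CSV_TEXT), converters={"G": to_int, "H": to_int})
print(df)
```

Compared with stripping after the read, the conversion here happens during parsing, so the frame never holds the quoted strings.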