Integers stored as strings after converting a CSV file to Excel - how do I convert them back?

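In this project, I have converted a csv file to an xls file and a txt file to an xls file. The goal is to compare the two xls files and print the differences to a third Excel file.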

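However, when the differences are printed, they include every entry containing an integer greater than 999, because any integer coming from my converted csv file is treated as a string rather than an integer. So, because of the commas in the converted csv Excel file, a value such as 1,200 (in the xls file converted from csv) is treated as different from 1200 (in the xls file converted from txt).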

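My question is: is there a way to convert the integers that are interpreted as strings back into integers? Failing that, is there a way to remove all the commas from my xls file? I tried the usual dataframe.replace method, and it had no effect.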

Here is my code:

    # import required libraries
    import datetime
    import xlrd
    import pandas as pd

    # define the time_handle string used to name the outputted excel files
    time_handle = datetime.datetime.now().strftime("%Y%m%d_%H%M")

    # identify XM1 file paths (for both csv origin and excel destination)
    XM1_csv = r"filepath"
    XM1_excel = r"filepath" + time_handle + ".xlsx"

    # identify XM2 file paths (for both txt origin and excel destination)
    XM2_txt = r"filepath"
    XM2_excel = r"filepath" + time_handle + ".xlsx"

    # remove commas from XM1 excel - failed attempts
    #XM1_excel = [col.replace(',', '') for col in XM1_excel]
    #XM1_excel = XM1_excel.replace(",", "")
    #for line in XM1_excel:
        #XM1_excel.write(line.replace(",", ""))

    # remove commas from XM1 CSV - failed attempts
    #XM1_csv = [col.replace(',', '') for col in XM1_csv]
    #XM1_csv = XM1_csv.replace(",", "")
    #for line in XM1_csv:
        #XM1_excel.write(line.replace(",", ""))

    # convert the csv XM1 file to an excel file, in the same folder
    pd.read_csv(XM1_csv).to_excel(XM1_excel)

    # convert the txt XM2 file to an excel file in the same folder
    pd.read_csv(XM2_txt, sep="|").to_excel(XM2_excel)

    # confirm XM1 filepath
    filepath_XM1 = XM1_excel

    # confirm XM2 filepath
    filepath_XM2 = XM2_excel

    # read relevant columns from the excel files
    df1 = pd.read_excel(filepath_XM2, sheetname="Sheet1", parse_cols="H, J, M, U")
    df2 = pd.read_excel(filepath_XM1, sheetname="Sheet1", parse_cols="C, E, G, K")

    # remove all commas from XM1 - failed attempts
    #df2 = [col.replace(',', '') for col in df2]
    #df2 = df2.replace(",", "")
    #for line in df2:
        #df2.write(line.replace(",", ""))

    # merge the columns from both excel files into one column each respectively
    df4 = df1["Exchange Code"] + df1["Product Type"] + df1["Product Description"] + df1["Quantity"].apply(str)
    df5 = df2["Exchange"] + df2["Product Type"] + df2["Product Description"] + df2["Quantity"].apply(str)

    # concatenate both columns from each excel file, to make one big column containing all the data
    df = pd.concat([df4, df5])

    # remove all whitespace from each row of the column of data
    df = df.str.strip()
    df = ["".join(x.split()) for x in df]

    # convert the data to a dataframe from a series
    df = pd.DataFrame({'Value': df})

    # remove any duplicates
    df.drop_duplicates(subset=None, keep=False, inplace=True)

    # print to the console just as a visual aid
    print(df)

    #output_path = r"filepath"

    # print the erroneous entries to an excel file
    df.to_excel("XM1_XM2Comparison" + time_handle + ".xls")

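Also, I realize that the XM1 and XM2 file names are a bit confusing in relation to df1 and df2, but I have only renamed my files. It does make sense with respect to the files and where they appear in the code!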

Thanks

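You can try a parameter called converters when reading the data into the dataframe; it lets you specify how each column is converted, including its data type. Example: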

 df= pd.read_excel(file, sheetname=YOUR_SHEET_HERE, converters={'FIELD_NAME': str}) 

converters is available in both read_csv and read_excel.
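
As a rough sketch of how converters might be applied to the comma problem itself: a converter does not have to be a type like str, it can be any callable that receives the cell value. The helper name parse_int and the sample data below are made up for illustration; only the Quantity column name comes from the question.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the csv file from the question.
csv_text = 'Exchange,Quantity\nCME,"1,200"\nICE,950\n'


def parse_int(value):
    # Hypothetical helper: drop thousands separators, then cast to int.
    return int(str(value).replace(",", ""))


# The converter runs on every cell of the Quantity column while reading.
df = pd.read_csv(io.StringIO(csv_text), converters={"Quantity": parse_int})
print(df["Quantity"].tolist())
```

A converter like this would fail on empty or non-numeric cells, so it assumes the column is fully populated with integer-like text.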

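I actually solved this problem myself; I'm posting this for future reference. When reading the csv with pd.read_csv, I added the thousands argument, so it looks like this: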

 pd.read_csv(XM1, thousands = ",").to_excel(XM1_excel)
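
To illustrate the effect, here is a small self-contained sketch using made-up in-memory data instead of the real files. It also shows one possible way to clean a column that has already been loaded as strings. A likely reason the earlier dataframe.replace(",", "") attempt had no effect is that, by default, DataFrame.replace only matches cells whose entire value equals ","; the string accessor on a single column replaces substrings instead.

```python
import io

import pandas as pd

# Hypothetical sample data; the real files from the question are not available.
csv_text = 'Exchange,Quantity\nCME,"1,200"\nICE,950\n'

# Without thousands=",", the quoted "1,200" forces the column to stay text.
raw = pd.read_csv(io.StringIO(csv_text))

# With thousands=",", pandas strips the separator and parses integers.
parsed = pd.read_csv(io.StringIO(csv_text), thousands=",")

# For data already loaded as strings: remove the commas, then cast.
fixed = raw["Quantity"].str.replace(",", "", regex=False).astype(int)

print(raw["Quantity"].tolist())
print(parsed["Quantity"].tolist())
print(fixed.tolist())
```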