drop_duplicates not working in pandas?

The purpose of my code is to import two Excel files, compare them, and write the differences to a new Excel file.

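However, after concatenating all the data and calling drop_duplicates, the code runs in the console without any error. Yet when the result is written to the new Excel file, the duplicates are still in it.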

Am I missing something? Is something keeping drop_duplicates from taking effect?

My code is as follows:

    import datetime
    import xlrd
    import pandas as pd

    # identify excel file paths
    filepath = r"excel filepath"
    filepath2 = r"excel filepath2"

    # read relevant columns from the excel files
    df1 = pd.read_excel(filepath, sheetname="Sheet1", parse_cols="B, D, G, O")
    df2 = pd.read_excel(filepath2, sheetname="Sheet1", parse_cols="B, D, F, J")

    # merge the columns from both excel files into one column each respectively
    df4 = df1["Exchange Code"] + df1["Product Type"] + df1["Product Description"] + df1["Quantity"].apply(str)
    df5 = df2["Exchange"] + df2["Product Type"] + df2["Product Description"] + df2["Quantity"].apply(str)

    # concatenate both columns from each excel file, to make one big column containing all the data
    df = pd.concat([df4, df5])

    # remove all whitespace from each row of the column of data
    df = df.str.strip()
    df = ["".join(x.split()) for x in df]

    # convert the data to a dataframe from a series
    df = pd.DataFrame({'Value': df})

    # remove any duplicates
    df.drop_duplicates(subset=None, keep="first", inplace=False)

    # print to the console just as a visual aid
    print(df)

    # print the erroneous entries to an excel file
    df.to_excel("Comparison19.xls")

You have inplace=False, so df itself is never modified. You want either

  df.drop_duplicates(subset=None, keep="first", inplace=True) 

or

  df = df.drop_duplicates(subset=None, keep="first", inplace=False) 
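To make the difference visible, here is a minimal sketch on a small made-up DataFrame (the sample values are hypothetical and stand in for the concatenated column from the question). It counts the rows after calling drop_duplicates without assigning the result, and again after assigning it back.

```python
import pandas as pd

# Hypothetical sample data standing in for the concatenated 'Value' column.
df = pd.DataFrame({"Value": ["a", "b", "a", "c", "b"]})

# The de-duplicated frame is returned but thrown away, so df keeps all rows.
df.drop_duplicates(subset=None, keep="first", inplace=False)
rows_without_assignment = len(df)

# Assigning the returned frame back to df keeps only the first occurrences.
df = df.drop_duplicates(subset=None, keep="first", inplace=False)
rows_with_assignment = len(df)

print(rows_without_assignment, rows_with_assignment)
```

The first count still includes the repeated rows, which is what ends up in the Excel file in the question; only the second count reflects the de-duplicated data.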

Using inplace=False tells pandas to return a new DataFrame with the duplicates dropped, so you need to assign it back to df:

 df = df.drop_duplicates(subset=None, keep="first", inplace=False) 

inplace=True tells pandas to drop the duplicates in the current DataFrame:

 df.drop_duplicates(subset=None, keep="first", inplace=True)
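One practical consequence of the two styles, sketched below with made-up data and hypothetical variable names: inplace=True changes the existing object, so any other name bound to that object sees the change, whereas reassigning only rebinds the one name and leaves the old object as it was.

```python
import pandas as pd

# Hypothetical data; 'alias' is a second name for the same DataFrame object.
original = pd.DataFrame({"Value": ["a", "b", "a"]})
alias = original

# inplace=True modifies the object itself, so alias sees the change too.
original.drop_duplicates(subset=None, keep="first", inplace=True)
rows_seen_by_alias = len(alias)

# Reassignment binds 'reassigned' to a new DataFrame; 'other' still
# refers to the old, unmodified one.
reassigned = pd.DataFrame({"Value": ["a", "b", "a"]})
other = reassigned
reassigned = reassigned.drop_duplicates(subset=None, keep="first", inplace=False)
rows_seen_by_other = len(other)

print(rows_seen_by_alias, rows_seen_by_other)
```

For the script in the question there is only one name, df, so either form should give the same Excel output.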