drop_duplicates not working in pandas?

The purpose of my code is to import two Excel files, compare them, and write the differences to a new Excel file.

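However, after concatenating all the data and calling the drop_duplicates function, the code runs in the console without errors, but when I write the result to the new Excel file, the duplicates are still there.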

Am I missing something? Is something wrong with the drop_duplicates function?

My code is as follows:

    import datetime
    import xlrd
    import pandas as pd

    #identify excel file paths
    filepath = r"excel filepath"
    filepath2 = r"excel filepath2"

    #read relevant columns from the excel files
    df1 = pd.read_excel(filepath, sheetname="Sheet1", parse_cols= "B, D, G, O")
    df2 = pd.read_excel(filepath2, sheetname="Sheet1", parse_cols= "B, D, F, J")

    #merge the columns from both excel files into one column each respectively
    df4 = df1["Exchange Code"] + df1["Product Type"] + df1["Product Description"] + df1["Quantity"].apply(str)
    df5 = df2["Exchange"] + df2["Product Type"] + df2["Product Description"] + df2["Quantity"].apply(str)

    #concatenate both columns from each excel file, to make one big column containing all the data
    df = pd.concat([df4, df5])

    #remove all whitespace from each row of the column of data
    df=df.str.strip()
    df=["".join(x.split()) for x in df]

    #convert the data to a dataframe from a series
    df = pd.DataFrame({'Value': df})

    #remove any duplicates
    df.drop_duplicates(subset=None, keep="first", inplace=False)

    #print to the console just as a visual aid
    print(df)

    #print the erroneous entries to an excel file
    df.to_excel("Comparison19.xls")

You have inplace=False, so df itself is never modified. You want either

  df.drop_duplicates(subset=None, keep="first", inplace=True)

or

  df = df.drop_duplicates(subset=None, keep="first", inplace=False)
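To see the difference, here is a minimal sketch with a small made-up DataFrame (the values are illustrative, not the asker's data). It shows that calling drop_duplicates with inplace=False and ignoring the return value leaves df untouched, while assigning the result gives the deduplicated rows:

```python
import pandas as pd

# Hypothetical sample data containing repeated values
df = pd.DataFrame({"Value": ["a", "b", "a", "c", "b"]})

# The deduplicated frame is returned but thrown away, so df is unchanged
df.drop_duplicates(subset=None, keep="first", inplace=False)
rows_after_discarded_call = len(df)

# Assigning the return value keeps the deduplicated result
deduped = df.drop_duplicates(subset=None, keep="first", inplace=False)
rows_after_assignment = len(deduped)

print(rows_after_discarded_call, rows_after_assignment)
```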

Using inplace=False tells pandas to return a new DataFrame with the duplicates dropped, so you need to assign it back to df:

 df = df.drop_duplicates(subset=None, keep="first", inplace=False)

Alternatively, inplace=True tells pandas to drop the duplicates in the current DataFrame:

 df.drop_duplicates(subset=None, keep="first", inplace=True)
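As a rough sketch of the in-place variant applied to the tail end of the question's pipeline, the example below uses two small made-up Series in place of df4 and df5 (the strings are hypothetical). It strips whitespace with a regex replace, which is a swapped-in alternative to the asker's list comprehension, and then drops duplicates in place. Note that with inplace=True the method returns None, so its result should not be assigned back to df:

```python
import pandas as pd

# Hypothetical stand-ins for the combined columns df4 and df5 from the question
first = pd.Series(["NYSE Future Gold 10", "CME Option Oil 5"])
second = pd.Series(["NYSEFutureGold10 ", "LME Future Copper 7"])

# Stack both columns into one and remove all whitespace from every entry
combined = pd.concat([first, second], ignore_index=True)
combined = combined.str.replace(r"\s+", "", regex=True)

df = pd.DataFrame({"Value": combined})

# In-place call: df itself is modified and the return value is None
result = df.drop_duplicates(subset=None, keep="first", inplace=True)

print(result)
print(df["Value"].tolist())
```

With keep="first", one copy of each repeated entry is retained; only the later repeats are removed.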
