Ignore duplicate rows in a CSV file

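I am trying to read a CSV file and write its rows to another CSV file. My input file has duplicate rows, and in the output I want each of them only once. As you can see in my sample script, I create a list called readers that holds all the rows of the input CSV. Then, inside the for loop, I call writer.writerow(readers[1] + ….), which essentially takes the first row after the header. The problem is that this first row gets written repeatedly. How can I adjust my script so that it is written only once?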

    import csv
    import glob

    # writer is assumed to be a csv.writer created earlier (not shown in the question)
    for path in glob.glob("out.csv"):
        if path == "out1.csv":
            continue
        with open(path, newline="") as fh:
            readers = list(csv.reader(fh))
            for row in readers:
                if row[8] == 'READ' and row[10] == '1110':
                    writer.writerow(readers[1] + [row[2]])
                elif row[8] == 'READ' and row[10] == '1011':
                    writer.writerow(readers[1] + [" "] * 3 + [row[2]])
                elif row[8] == 'READ' and row[10] not in ('1101', '0111'):
                    writer.writerow(readers[1] + [" ", row[2]])
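A side note on the last condition, which is meant to check whether row[10] is one of two codes: writing that as != against a tuple cannot work, because a string never compares equal to a tuple, so the test is true for every value. A membership test with not in expresses the intent. A minimal illustration, where code is a hypothetical stand-in for row[10]:

```python
# Hypothetical value standing in for row[10] from the question's loop
code = "1101"

# A str never compares equal to a tuple, so this is True for every code
always_true = code != ("1101", "0111")
# A membership test checks whether code is one of the listed values
excluded = code not in ("1101", "0111")

print(always_true)  # True
print(excluded)     # False
```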

Sample input

    ID No.  Name   Value  RESULTS
    28      Jason  56789  Fail
    28      Jason  56789  Fail
    28      Jason  56789  Fail
    28      Jason  56789  Fail

You can use the pandas package. It would look something like this:

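    import pandas as pd

    # Read the file (the first row is treated as the header by default).
    # "input.csv" is a placeholder for your input file's path.
    table = pd.read_csv("input.csv")
    # Drop rows that exactly duplicate an earlier row
    clean_table = table.drop_duplicates()
    # Save the clean data without adding the DataFrame index as an extra column
    clean_table.to_csv("data_without_duplicates.csv", index=False)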

You can find the reference documentation here.

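You can use a set to remove duplicates. Each row must first be converted to a tuple, because lists are not hashable and cannot be stored in a set. Note that a set does not preserve the original row order, so the header may no longer come first:

    readers_unique = list(set(map(tuple, readers)))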
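Because a set discards the original order, a variant that keeps the first occurrence of each row in place can use dict.fromkeys instead, since dicts preserve insertion order (Python 3.7+). A minimal sketch, assuming the sample input above is comma-separated (its real delimiter is not shown) and reading it from an in-memory string; the helper name dedupe_rows is hypothetical:

```python
import csv
import io

def dedupe_rows(rows):
    """Return rows with exact duplicates removed, keeping first occurrences in order."""
    # Lists are unhashable, so each row becomes a tuple to serve as a dict key.
    unique = dict.fromkeys(tuple(row) for row in rows)
    return [list(row) for row in unique]

# The sample input, assumed to be comma-separated
sample = (
    "ID No.,Name,Value,RESULTS\n"
    "28,Jason,56789,Fail\n"
    "28,Jason,56789,Fail\n"
    "28,Jason,56789,Fail\n"
    "28,Jason,56789,Fail\n"
)

rows = list(csv.reader(io.StringIO(sample)))
unique_rows = dedupe_rows(rows)

out = io.StringIO()
csv.writer(out).writerows(unique_rows)
print(out.getvalue())  # the header followed by a single data row
```

The header stays first because it is the first row seen; only the three repeated data rows are dropped.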

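While the answers above are basically correct, using pandas seems like overkill to me. Just keep track of the ID column values you have already seen while processing (assuming the ID column deserves its name, i.e. its values are unique; otherwise you have to use a composite key). Then check whether you have already seen the current value and, presto: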

    import csv
    import glob

    ID_COL = 1
    id_seen = set()  # a set makes the "already seen?" check fast
    # writer is assumed to be a csv.writer created earlier, as in the question
    for path in glob.glob("out.csv"):
        if path == "out1.csv":
            continue
        with open(path, newline="") as fh:
            for row in csv.reader(fh):
                if row[ID_COL] not in id_seen:
                    id_seen.add(row[ID_COL])
                    # write out whichever columns you need
                    writer.writerow(row)
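The same idea extends to the composite key mentioned above by building the key as a tuple of several columns. A minimal sketch, not the answerer's exact code: the function name write_unique_by_key is hypothetical, the sample input is assumed to be comma-separated, and in-memory buffers stand in for real files:

```python
import csv
import io

def write_unique_by_key(src, dst, key_cols):
    """Copy CSV rows from src to dst, keeping only the first row for each key.

    key_cols is a tuple of column indexes; several indexes form a composite key.
    """
    seen = set()  # keys already written
    writer = csv.writer(dst)
    for row in csv.reader(src):
        if not row:
            continue  # csv.reader yields [] for blank lines
        key = tuple(row[i] for i in key_cols)
        if key not in seen:
            seen.add(key)
            writer.writerow(row)

src = io.StringIO(
    "ID No.,Name,Value,RESULTS\n"
    "28,Jason,56789,Fail\n"
    "28,Jason,56789,Fail\n"
    "28,Jason,56789,Fail\n"
    "28,Jason,56789,Fail\n"
)
dst = io.StringIO()
# Composite key on ID No. and Name (columns 0 and 1 in this sample layout)
write_unique_by_key(src, dst, key_cols=(0, 1))
print(dst.getvalue())
```

With real files, open both the input and the output with newline="" as the csv module documentation recommends, and pass the file objects in place of the StringIO buffers.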
