Filter a column in Excel

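I am trying to organize a column by filtering its values. In other words, there are thousands of repeated names, and I want to take just one name from each "group" and copy it into another column.

So Column A is the current situation, and Column B is the result I want: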

    Column A                 Column B
    AB Mark Sociedad Ltda    AB Mark Sociedad Ltda
    AB Mark Sociedad Ltda    Acosta Acosta Manuel
    AB Mark Sociedad Ltda    ALBAGLI, ZALIASNIK
    AB Mark Sociedad Ltda
    Acosta Acosta Manuel
    Acosta Acosta Manuel
    Acosta Acosta Manuel
    ALBAGLI, ZALIASNIK
    ALBAGLI, ZALIASNIK
    ALBAGLI, ZALIASNIK

Finally, this is the script I am trying to use:

    import openpyxl
    from openpyxl import load_workbook
    import os

    os.chdir('path')
    workbook = openpyxl.load_workbook('abc.xlsx')
    page_i = workbook.get_sheet_names()
    sheet = workbook.get_sheet_by_name('Sheet1')

    for a in range(1, 10):
        representativex = sheet['A' + str(a)].value
        tuple(sheet['A1':'A10'])
        for row in sheet['A1':'A10']:
            if representativex in row:
                continue
            else:
                sheet['B' + str(a)].value
                sheet['B' + str(a)] = representativex

    workbook.save('abc.xlsx')

Unfortunately, it does not work.

I don't really use Python, but here is a rough approach that I found to be relatively fast.

    import openpyxl

    wb = openpyxl.load_workbook('test.xlsx')
    ws1 = wb.active

[Image: sample data]

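    # Collect every value in column A.
    names = []
    for row in ws1.columns[0]:
        names.append(row.value)

    # Drop duplicates, then sort the remaining names.
    names = sorted(list(set(names)))

    # Write the unique names into column B, starting at row 1.
    start = 1
    for name in names:
        ws1.cell(row=start, column=2).value = name
        start += 1

    wb.save('test.xlsx')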

[Image: sample output data]
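One thing to be aware of: `sorted(set(names))` reorders the names by character code, so "ALBAGLI, ZALIASNIK" ends up before "Acosta Acosta Manuel", which differs from the Column B shown in the question. If first-appearance order matters, a minimal sketch along these lines may help. The helper name `unique_in_order` is made up for illustration, and skipping `None` (empty cells) is an assumption about what is wanted.

```python
def unique_in_order(values):
    """Return the distinct non-empty values, keeping first-appearance order."""
    # dict keeps insertion order (Python 3.7+), so fromkeys drops duplicates
    # without reordering. None stands for an empty cell and is skipped.
    return list(dict.fromkeys(v for v in values if v is not None))


# Column A from the question (10 rows).
column_a = (
    ["AB Mark Sociedad Ltda"] * 4
    + ["Acosta Acosta Manuel"] * 3
    + ["ALBAGLI, ZALIASNIK"] * 3
)

print(unique_in_order(column_a))
# ['AB Mark Sociedad Ltda', 'Acosta Acosta Manuel', 'ALBAGLI, ZALIASNIK']

print(sorted(set(column_a)))
# ['AB Mark Sociedad Ltda', 'ALBAGLI, ZALIASNIK', 'Acosta Acosta Manuel']
```

The resulting list can be written to column B the same way as the sorted list.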

Edit: apparently, newer versions of openpyxl require a small change.

Change this:

    for row in ws1.columns[0]:
        names.append(row.value)

to this:

    for row in ws1.iter_cols(max_col=1, min_row=1):
        for cell in row:
            names.append(cell.value)
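Put together, the updated approach might look roughly like the sketch below. It builds a small in-memory workbook instead of loading `test.xlsx` so it can be run as is; the function name `dedupe_column_a_into_b` and the sample values are invented for illustration, and skipping empty cells is an assumption.

```python
from openpyxl import Workbook


def dedupe_column_a_into_b(ws):
    """Write the sorted distinct values of column A into column B."""
    names = []
    # iter_cols yields one tuple of cells per column; max_col=1 limits it to A.
    for column in ws.iter_cols(max_col=1, min_row=1):
        for cell in column:
            if cell.value is not None:  # assumption: ignore empty cells
                names.append(cell.value)

    unique_names = sorted(set(names))
    for row_index, name in enumerate(unique_names, start=1):
        ws.cell(row=row_index, column=2).value = name
    return unique_names


# Stand-in for openpyxl.load_workbook('test.xlsx').
wb = Workbook()
ws = wb.active
for name in ["b", "a", "b", "c", "a"]:
    ws.append([name])  # each call adds one row with a value in column A

dedupe_column_a_into_b(ws)
column_b = [ws.cell(row=r, column=2).value for r in range(1, 4)]
print(column_b)  # ['a', 'b', 'c']
```

With a real file, replace the in-memory workbook with `openpyxl.load_workbook(...)` and call `wb.save(...)` afterwards.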

In case your column is a different one, here is the description of iter_cols from the openpyxl documentation:

    iter_cols(min_col=None, max_col=None, min_row=None, max_row=None)

    Returns all cells in the worksheet from the first row as columns.

    If no boundaries are passed in the cells will start at A1.

    If no cells are in the worksheet an empty tuple will be returned.

    Parameters:
        min_col (int) – smallest column index (1-based index)
        min_row (int) – smallest row index (1-based index)
        max_col (int) – largest column index (1-based index)
        max_row (int) – largest row index (1-based index)
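As an illustration of those parameters, the sketch below reads only column C and skips a header row. The sheet contents are invented sample data, and the workbook is created in memory so the snippet runs without a file.

```python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active
ws.append(["id", "city", "name"])  # header row (made-up sample data)
ws.append([1, "Lima", "Ana"])
ws.append([2, "Quito", "Luis"])

# Column C has index 3 (1-based). Setting min_col and max_col to the same
# value selects a single column; min_row=2 skips the header.
values = [
    cell.value
    for column in ws.iter_cols(min_col=3, max_col=3, min_row=2)
    for cell in column
]
print(values)  # ['Ana', 'Luis']
```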