Compare columns from different Excel files and add a column to each with the result

Let me start by saying that I'm not an Excel expert, so I need some help.

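Say I have 3 Excel files: main.xlsx, 1.xlsx and 2.xlsx. Each of them has a column named Serial Numbers. I have to:

Look up all serial numbers from 1.xlsx and 2.xlsx and verify whether they are in main.xlsx.

If a serial number is found:

In the last column of main.xlsx, on the same row as the serial number that was found, write OK + name_of_the_file_in_which_it_was_found. Otherwise write NOK. At the same time, write OK or NOK in the last column of 1.xlsx and 2.xlsx, depending on whether the serial number was found.

Note: the serial number column can be at a different position in 1.xlsx and 2.xlsx.

Example: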

main.xlsx

    name  date  serial number  phone  status
    a     b     abcd           c              <-- ok,2.xlsx
    b     c     1234           d              <-- ok,1.xlsx
    c     d     3456           e              <-- ok,1.xlsx
    d     e     4567           f              <-- NOK
    e     f                    g              <-- skip, don't write anything to status column

1.xlsx

    name  date  serial number  phone  status
    a     b     1234           c              <-- OK (because it is found in main)
    b     c     lala           d              <-- NOK (because it is not found in main)
    c     d     3456           e              <-- OK (because it is found in main)
    d     e     jjjj           f              <-- NOK (because it is not found in main)
    e     f                    g              <-- skip, don't write anything to status column

2.xlsx

    name  date  serial number  phone  status
    a     b                    c              <-- skip, don't write anything to status column
    b     c     abcd           d              <-- OK (because it is found in main)
    c     d     4533           e              <-- NOK (because it is not found in main)
    d     e     jjjj           f              <-- NOK (because it is not found in main)
    e     f                    g              <-- skip, don't write anything to status column

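I tried doing this in Python, but I can't figure out how to write to the status column (I tried using DataFrames) on the same row where the serial number was found. Any help, or at least some guidance, would be much appreciated.

My problem is not finding the duplicates. It is keeping track of the rows (so the status is written next to the correct serial number) and writing to the specified column (the status column) in Excel.

My attempt: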

    import pandas as pd

    get_main = pd.ExcelFile('main.xlsx')
    get_1 = pd.ExcelFile('1.xlsx')
    get_2 = pd.ExcelFile('2.xlsx')

    sheet1_from_main = get_main.parse(0)
    sheet1_from_1 = get_1.parse(0)
    sheet1_from_2 = get_2.parse(0)

    column_from_main = sheet1_from_main.iloc[:, 2].real
    column_from_main_py = []
    for x in column_from_main:
        column_from_main_py.append(x)

    column_from_1 = sheet1_from_1.iloc[:, 2].real
    column_from_1_py = []
    for y in column_from_1:
        column_from_1_py.append(y)

    column_from_2 = sheet1_from_2.iloc[:, 2].real
    column_2_py = []
    for z in column_from_2:
        column_2_py.append(z)

Edit, following the suggestions:

    import pandas as pd

    get_main = pd.read_excel('main.xls', sheetname=0)
    get_1 = pd.read_excel('1.xls', sheetname=0)
    get_2 = pd.read_excel('2.xls', sheetname=0)

    column_from_main = get_main.ix[:, 'Serial No.'].real
    column_from_main_py = column_from_main.tolist()

    column_from_1 = get_1.ix[:, 'SERIAL NUMBER'].real
    column_from_1_py = column_from_1.tolist()

    column_from_2 = get_2.ix[:, 'S/N'].real
    column_from_2_py = column_from_2.tolist()

    # Tried to put example data at specific column
    df = pd.DataFrame({'Data': [10, 20, 30, 20, 15, 30, 45]})
    writer = pd.ExcelWriter('first.xlsx', engine='xlsxwriter')
    df.to_excel(writer, sheet_name='Sheet1')
    workbook = writer.book
    worksheet = writer.sheets['Sheet1']
    worksheet.set_column('M:M', None, None)
    writer.save()

First, you can skip ExcelFile and parse entirely by using pd.read_excel(filename, sheet_name=0) (the keyword was spelled sheetname before pandas 0.21).

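As for your columns, try accessing them by name rather than by index. And instead of using a for loop to build the lists, use the tolist method. So instead of column_from_main = sheet1_from_main.iloc[:, 2].real you could say: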

    # Select the column by label, then convert it to a plain Python list.
    column_from_main = get_main.loc[:, 'serial number']
    column_from_main_py = column_from_main.tolist()

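Do the same for the other files. This removes any problem caused by the serial number column sitting at a different index in each file, and it will run faster.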
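If the serial number column also carries a different header in each file, as the headers in the question's edit suggest ('Serial No.', 'SERIAL NUMBER', 'S/N'), one option is to rename it to a common name right after loading. Here is a minimal sketch of that idea, using small in-memory frames in place of the real workbooks. The SERIAL_HEADERS mapping and the normalize_serial_column helper are hypothetical names made up for illustration, not part of pandas:

```python
import pandas as pd

# Hypothetical mapping from each file's own header to one shared name.
SERIAL_HEADERS = {
    "Serial No.": "serial number",
    "SERIAL NUMBER": "serial number",
    "S/N": "serial number",
}


def normalize_serial_column(df):
    """Return a copy of df whose serial column is named 'serial number'.

    Headers that are not in SERIAL_HEADERS are left untouched.
    """
    return df.rename(columns=SERIAL_HEADERS)


# In-memory stand-ins for pd.read_excel('main.xlsx') and pd.read_excel('2.xlsx').
get_main = normalize_serial_column(
    pd.DataFrame({"name": ["a", "b"], "Serial No.": ["abcd", "1234"]})
)
get_2 = normalize_serial_column(
    pd.DataFrame({"S/N": ["abcd", "4533"], "name": ["b", "c"]})
)

# The same label now works for every file, whatever the column position.
column_from_main_py = get_main["serial number"].tolist()
column_from_2_py = get_2["serial number"].tolist()
```

After this step the rest of the code can refer to 'serial number' everywhere, regardless of which file a frame came from.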

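As for your comment about not being able to write to 'status' properly, can you show the code you tried? I'd be happy to help, but it would be good to see what you have done so far.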

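To check the values in main against the other two files, iterate over the lists you created and check whether each value in the main list is in the other lists. Inside that loop you can assign the status value depending on whether the serial number from main is present in one, neither, or both of the files: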

    get_main['status'] = ''
    get_1['status'] = ''
    get_2['status'] = ''

    for num in column_from_main_py:
        in_1 = num in column_from_1_py
        in_2 = num in column_from_2_py
        if not in_1 and not in_2:
            get_main.loc[get_main['serial number'] == num, 'status'] = 'NOK'
        elif in_1 and not in_2:
            get_main.loc[get_main['serial number'] == num, 'status'] = 'OK,1.xlsx'
            get_1.loc[get_1['serial number'] == num, 'status'] = 'OK'
        elif in_2 and not in_1:
            get_main.loc[get_main['serial number'] == num, 'status'] = 'OK,2.xlsx'
            get_2.loc[get_2['serial number'] == num, 'status'] = 'OK'
        # A serial number present in both files matches no branch here,
        # so its status is left as the empty string.

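The get_main.loc lines are where the OK or NOK value is written to the status column. Essentially, .loc finds the index where some condition is true and then lets you change the value of a specific column at that index. Once you have gone through the main list, you can go through the lists for 1 and 2 to find the serial numbers that are not in main. Similarly: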

    for num in column_from_1_py:
        if num not in column_from_main_py:
            get_1.loc[get_1['serial number'] == num, 'status'] = 'NOK'

    for num in column_from_2_py:
        if num not in column_from_main_py:
            get_2.loc[get_2['serial number'] == num, 'status'] = 'NOK'

This sets your NOK values. After that you should be good to go ahead and export the data frames to Excel (or CSV, HDF, SQL, etc.), and that should do it.
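The loops above can also be written without iterating over Python lists, by building boolean masks with Series.isin and assigning through .loc. What follows is only a sketch under stated assumptions: the sample frames reproduce the serial numbers from the example tables in the question, the column is assumed to be called 'serial number' in every frame, and mark_main and mark_other are hypothetical helper names. Unlike the loop version, a serial number found in several files ends up labelled with the last file checked.

```python
import pandas as pd


def mark_main(main, others, key="serial number"):
    """Return a copy of main with 'status' set to 'OK,<file>' or 'NOK'.

    Rows without a serial number keep an empty status. If a serial number
    occurs in several files, the last file in `others` wins.
    """
    main = main.copy()
    has_serial = main[key].notna()
    main["status"] = ""
    main.loc[has_serial, "status"] = "NOK"
    for filename, df in others.items():
        found = has_serial & main[key].isin(df[key].dropna())
        main.loc[found, "status"] = "OK," + filename
    return main


def mark_other(df, main, key="serial number"):
    """Return a copy of df with 'status' set to 'OK' or 'NOK' against main."""
    df = df.copy()
    has_serial = df[key].notna()
    found = df[key].isin(main[key].dropna())
    df["status"] = ""
    df.loc[has_serial, "status"] = "NOK"
    df.loc[has_serial & found, "status"] = "OK"
    return df


# In-memory stand-ins for the three workbooks (serials from the question).
main = pd.DataFrame({"serial number": ["abcd", "1234", "3456", "4567", None]})
one = pd.DataFrame({"serial number": ["1234", "lala", "3456", "jjjj", None]})
two = pd.DataFrame({"serial number": [None, "abcd", "4533", "jjjj", None]})

main_out = mark_main(main, {"1.xlsx": one, "2.xlsx": two})
one_out = mark_other(one, main)
two_out = mark_other(two, main)
```

Each result could then be written out with DataFrame.to_excel, for example main_out.to_excel('main_out.xlsx', index=False), where the output file name is just a placeholder.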

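There are many ways to index and select data in pandas, depending on what you want to do. I suggest reading the Indexing and Selecting Data page in the documentation, as it has been a great reference for me.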

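Please note that the input files provided in the question are not the actual input files being used. The information and script below were put together after the real input files were obtained. What follows does not currently solve the problem.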

To use the following example, first install petl and openpyxl (for your xlsx files):

    pip install openpyxl
    pip install petl

Script:

    import petl

    main = petl.fromxlsx('main.xlsx')
    one = petl.fromxlsx('1.xlsx', row_offset=1)
    two = petl.fromxlsx('2.xlsx')

    # Split main into rows without and with a serial number.
    non_serial_rows = petl.select(main, lambda rec: rec['serial number'] is None)
    serial_rows = petl.select(main, lambda rec: rec['serial number'] is not None)

    # Rows of main whose serial number also appears in 1.xlsx or 2.xlsx.
    main_join_one = petl.join(serial_rows, petl.cut(one, ['serial number']), key='serial number')
    main_join_one_file = petl.addfield(main_join_one, 'file', 'ok, 1.xlsx')
    main_join_two = petl.join(serial_rows, petl.cut(two, ['serial number']), key='serial number')
    main_join_two_file = petl.addfield(main_join_two, 'file', 'ok, 2.xlsx')
    stacked_joins = petl.stack(main_join_two_file, main_join_one_file)

    # Rows of main whose serial number appears in neither file.
    nok_rows = petl.antijoin(serial_rows, petl.cut(stacked_joins, ['serial number']), key='serial number')
    nok_rows = petl.addfield(nok_rows, 'file', 'NOK')

    output_main = petl.stack(stacked_joins, non_serial_rows, nok_rows)
    main_final = output_main


    def main_compare(table):
        # Mark each row of table as OK or NOK depending on whether its
        # serial number is present in main; rows without a serial number
        # are passed through without a marker.
        non_serial_rows = petl.select(table, lambda rec: rec['serial number'] is None)
        serial_rows = petl.select(table, lambda rec: rec['serial number'] is not None)
        ok_rows = petl.join(serial_rows, petl.cut(main, ['serial number']), key='serial number')
        ok_rows = petl.addfield(ok_rows, 'file', 'OK')
        nok_rows = petl.antijoin(serial_rows, petl.cut(main, ['serial number']), key='serial number')
        nok_rows = petl.addfield(nok_rows, 'file', 'NOK')
        return petl.stack(ok_rows, nok_rows, non_serial_rows)


    one_final = main_compare(one)
    two_final = main_compare(two)

    # Write each result to a new workbook and echo it to the console.
    petl.toxlsx(main_final, 'mainNew.xlsx')
    print(petl.lookall(main_final))
    petl.toxlsx(one_final, '1New.xlsx')
    print(petl.lookall(one_final))
    petl.toxlsx(two_final, '2New.xlsx')
    print(petl.lookall(two_final))

Output (text on the console, plus the actual modified xlsx files)
