Compare two Excel files in Python, keeping one column fixed as the key and matching rows on it

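We have two Excel files, one with 7.5k records and the other with 7k records. We need to compare one sheet against the other while keeping one specific column fixed, i.e. matching rows on the value in that column.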

For example, sheet1:

| Emp_ID | Name | Phone | Address |
|--------|------|-------|---------|
| 1      | A    | 123   | ABC     |
| 2      | B    | 456   | CBD     |
| 3      | C    | 789   | S       |

And sheet2:

| Emp_ID | Name | Phone | Address |
|--------|------|-------|---------|
| 1      | A    | 123   | ABC     |
| 3      | C    | 789   | S       |

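When the Python script runs, it should compare the two files on the basis of Emp_ID (here it should report that Emp_ID = 2 is missing from sheet2), with Emp_ID passed in as a parameter. I'm trying to use the xlrd module, but my code only compares cell by cell; it does not hold one column fixed and then compare each row with the matching row in the other Excel file.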

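    import xlrd  # note: xlrd >= 2.0 only reads legacy .xls files
    from itertools import zip_longest

    def compareexcel(oldSheet, newSheet):
        rowb2 = xlrd.open_workbook(oldSheet)
        rowb1 = xlrd.open_workbook(newSheet)
        sheet1 = rowb1.sheet_by_index(0)
        sheet2 = rowb2.sheet_by_index(0)
        # Positional comparison: row N of one sheet is compared with row N of the other
        for rownum in range(max(sheet1.nrows, sheet2.nrows)):
            # A row that exists in only one sheet is treated as empty in the other
            row_rb1 = sheet1.row_values(rownum) if rownum < sheet1.nrows else []
            row_rb2 = sheet2.row_values(rownum) if rownum < sheet2.nrows else []
            for colnum, (c1, c2) in enumerate(zip_longest(row_rb1, row_rb2)):
                if c1 != c2:
                    print("Row {} Col {} - {} != {}".format(rownum + 1, colnum + 1, c1, c2))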
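To show why a positional, cell-by-cell comparison does not fit this problem, here is a minimal stdlib-only sketch on the example rows. The rows are hard-coded lists standing in for the sheets, and `positional_diffs` is a hypothetical helper that mirrors the loop above. Once Emp_ID 2 is absent from sheet2, every later row is compared against the wrong employee.

```python
from itertools import zip_longest

# Example rows from the question, hard-coded in place of the Excel sheets (assumption)
sheet1_rows = [[1, "A", 123, "ABC"], [2, "B", 456, "CBD"], [3, "C", 789, "S"]]
sheet2_rows = [[1, "A", 123, "ABC"], [3, "C", 789, "S"]]


def positional_diffs(rows_a, rows_b):
    """Hypothetical helper: compare row N with row N, cell by cell."""
    diffs = []
    for rownum, (ra, rb) in enumerate(zip_longest(rows_a, rows_b, fillvalue=[])):
        for colnum, (c1, c2) in enumerate(zip_longest(ra, rb)):
            if c1 != c2:
                diffs.append((rownum + 1, colnum + 1, c1, c2))
    return diffs


diffs = positional_diffs(sheet1_rows, sheet2_rows)
for d in diffs:
    print("Row {} Col {} - {} != {}".format(*d))
```

Eight mismatches are reported although only one employee is missing; matching rows on Emp_ID instead of on row position avoids this.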

I have also written a function that searches for a column value in the other sheet and then runs the comparison on the row it finds:

    import xlrd
    from itertools import zip_longest

    def search(sheet2, s):
        # Return the index of the row in sheet2 whose first cell equals s, or None
        for row in range(sheet2.nrows):
            if s == sheet2.cell(row, 0).value:
                return row
        return None

    def compare(oldPerPaxSheet, newPerPaxSheet):
        rb1 = xlrd.open_workbook(oldPerPaxSheet)
        rb2 = xlrd.open_workbook(newPerPaxSheet)
        sheet1 = rb1.sheet_by_index(0)
        sheet2 = rb2.sheet_by_index(0)
        for rownum in range(sheet1.nrows):
            row_rb1 = sheet1.row_values(rownum)
            print("row_rb1 :", row_rb1)
            # Look up this row's key (first column) in the other sheet
            r = search(sheet2, sheet1.cell(rownum, 0).value)
            if r is not None:
                row_rb2 = sheet2.row_values(r)
                for colnum, (c1, c2) in enumerate(zip_longest(row_rb1, row_rb2)):
                    if c1 != c2:
                        print("Row {} Col {} - {} != {}".format(rownum + 1, colnum + 1, c1, c2))
            else:
                print("Row {} does not exist in the other sheet".format(rownum + 1))
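`search()` scans sheet2 once for every row of sheet1, which for the files described is on the order of 7.5k × 7k cell reads. A hedged alternative is to index the other sheet by key once in a dict and then look each key up directly. This sketch works on plain lists of row values (the shape `row_values()` returns); `index_by_key`, `compare_rows` and the `key_col` parameter are hypothetical names, not part of xlrd.

```python
from itertools import zip_longest


def index_by_key(rows, key_col=0):
    """Map each key value to its row (a later duplicate key overwrites an earlier one)."""
    return {row[key_col]: row for row in rows}


def compare_rows(rows_old, rows_new, key_col=0):
    """Return (missing_keys, cell_diffs), matching rows_old against rows_new by key."""
    new_by_key = index_by_key(rows_new, key_col)
    missing, cell_diffs = [], []
    for row in rows_old:
        key = row[key_col]
        other = new_by_key.get(key)
        if other is None:
            missing.append(key)  # key exists only in rows_old
            continue
        for colnum, (c1, c2) in enumerate(zip_longest(row, other)):
            if c1 != c2:
                cell_diffs.append((key, colnum, c1, c2))
    return missing, cell_diffs


old_rows = [[1, "A", 123, "ABC"], [2, "B", 456, "CBD"], [3, "C", 789, "S"]]
new_rows = [[1, "A", 123, "ABC"], [3, "C", 789, "S"]]
missing, cell_diffs = compare_rows(old_rows, new_rows)
print(missing, cell_diffs)
```

Building the dict costs one pass over the second sheet, after which each lookup is constant time on average.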

You can do this easily with `pandas.read_excel`.

I would create two DataFrames, using Emp_ID as the index:

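    import pandas as pd

    # Read both sheets from one workbook; index_col=0 makes the first column (Emp_ID) the index.
    # Older pandas called this parameter `sheetname`; current versions use `sheet_name`.
    sheets = pd.read_excel(excel_filename, sheet_name=[old_sheet, new_sheet], index_col=0)
    sheet1 = sheets[old_sheet]
    sheet2 = sheets[new_sheet]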
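The snippet above reads two sheets from one workbook, whereas the question has two separate files; in that case one would call `pd.read_excel(path, index_col=0)` once per file, assuming Emp_ID is the first column. To stay runnable without Excel files, this sketch builds the answer's extended example sheets in memory and sets the key with `set_index`, which gives the same DataFrame shape that `index_col=0` would. The filename in the comment is hypothetical.

```python
import pandas as pd

# In-memory stand-ins for the two sheets (the answer's extended example data).
# With real files this would be pd.read_excel("old.xlsx", index_col=0) (hypothetical path), etc.
sheet1 = pd.DataFrame(
    {"Emp_ID": [1, 2, 3, 5], "Name": ["A", "B", "C", "A"],
     "Phone": [123, 456, 789, 123], "Address": ["ABC", "CBD", "S", "ABC"]}
).set_index("Emp_ID")
sheet2 = pd.DataFrame(
    {"Emp_ID": [1, 3, 4, 5], "Name": ["A", "C", "D", "E"],
     "Phone": [123, 789, 12, 123], "Address": ["ABC", "S", "A", "ABC"]}
).set_index("Emp_ID")

print(sheet1)
print(sheet2)
```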

I added a few rows so the differences are more obvious.

Sheet1

           Name  Phone Address
    Emp_ID
    1         A    123     ABC
    2         B    456     CBD
    3         C    789       S
    5         A    123     ABC

Sheet2

           Name  Phone Address
    Emp_ID
    1         A    123     ABC
    3         C    789       S
    4         D     12       A
    5         E    123     ABC

Finding the missing Emp_IDs is then very simple:

    missing_in_1 = set(sheet2.index) - set(sheet1.index)
    missing_in_2 = set(sheet1.index) - set(sheet2.index)

    missing_in_1, missing_in_2
    # ({4}, {2})

So sheet1 lacks Emp_ID 4, which is in sheet2, and sheet2 lacks Emp_ID 2, as expected.
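pandas can also compute the missing keys directly on the index with `Index.difference`, which returns a sorted Index rather than a set. A small sketch, with the example key values hard-coded as an assumption; with the real DataFrames the call would be `sheet2.index.difference(sheet1.index)`.

```python
import pandas as pd

# Only the Emp_ID index matters for this step; the example keys are hard-coded
idx1 = pd.Index([1, 2, 3, 5], name="Emp_ID")
idx2 = pd.Index([1, 3, 4, 5], name="Emp_ID")

missing_in_1 = idx2.difference(idx1)  # keys present in sheet2 only
missing_in_2 = idx1.difference(idx2)  # keys present in sheet1 only
print(list(missing_in_1), list(missing_in_2))
```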

Then, to find the differences, we do an inner join of the two sheets on the index:

    combined = pd.merge(sheet1, sheet2, left_index=True, right_index=True, suffixes=('_1', '_2'))

The result, `combined`:

           Name_1  Phone_1 Address_1 Name_2  Phone_2 Address_2
    Emp_ID
    1           A      123       ABC      A      123       ABC
    3           C      789         S      C      789         S
    5           A      123       ABC      E      123       ABC
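The inner join drops unmatched Emp_IDs, which is why the missing keys are computed separately. As an alternative (not the answer's method), an outer join with `indicator=True` labels every row `both`, `left_only` or `right_only` in one pass. This sketch keeps only two columns for brevity; note that an outer join fills unmatched cells with NaN, so integer columns such as Phone may become float.

```python
import pandas as pd

sheet1 = pd.DataFrame(
    {"Name": ["A", "B", "C", "A"], "Phone": [123, 456, 789, 123]},
    index=pd.Index([1, 2, 3, 5], name="Emp_ID"),
)
sheet2 = pd.DataFrame(
    {"Name": ["A", "C", "D", "E"], "Phone": [123, 789, 12, 123]},
    index=pd.Index([1, 3, 4, 5], name="Emp_ID"),
)

# Outer join on the index; the _merge column records which side(s) each Emp_ID came from
outer = pd.merge(sheet1, sheet2, left_index=True, right_index=True,
                 how="outer", suffixes=("_1", "_2"), indicator=True)

only_in_1 = list(outer.index[outer["_merge"] == "left_only"])
only_in_2 = list(outer.index[outer["_merge"] == "right_only"])
both = outer[outer["_merge"] == "both"]
print(only_in_1, only_in_2, list(both.index))
```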

Then iterate over the columns of sheet1 to find the differences and store them in a dict:

    differences = {}
    for column in sheet1.columns:
        # Boolean Series: True where the two sheets disagree for this column
        diff = combined[column + '_1'] != combined[column + '_2']
        if diff.any():
            differences[column] = list(combined[diff].index)

    differences
    # {'Name': [5]}
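On pandas 1.1 or later, `DataFrame.compare` produces a similar per-cell report directly. It requires identically labelled frames, so this sketch (an alternative to the loop above, not part of the answer) first restricts both sheets to the common Emp_IDs in the same order. Example data is hard-coded in memory.

```python
import pandas as pd

sheet1 = pd.DataFrame(
    {"Name": ["A", "B", "C", "A"], "Phone": [123, 456, 789, 123],
     "Address": ["ABC", "CBD", "S", "ABC"]},
    index=pd.Index([1, 2, 3, 5], name="Emp_ID"),
)
sheet2 = pd.DataFrame(
    {"Name": ["A", "C", "D", "E"], "Phone": [123, 789, 12, 123],
     "Address": ["ABC", "S", "A", "ABC"]},
    index=pd.Index([1, 3, 4, 5], name="Emp_ID"),
)

# compare() needs the same index and columns on both sides
common = sheet1.index.intersection(sheet2.index).sort_values()
report = sheet1.loc[common].compare(sheet2.loc[common])

# Only differing rows and columns are kept, each split into "self" and "other"
print(report)
```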

If you want the full differing rows rather than just their Emp_IDs, change the last line to `differences[column] = combined[diff]`:

    differences
    # {'Name':         Name_1  Phone_1 Address_1 Name_2  Phone_2 Address_2
    #  Emp_ID
    #  5            A      123       ABC      E      123       ABC}
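The question asks for the key column to be passed as a parameter. A hedged sketch that wraps the answer's steps (set difference for missing keys, inner merge, per-column comparison) into one function; the name `compare_on_key` and its return shape are hypothetical, and the frames are built in memory rather than read from Excel.

```python
import pandas as pd


def compare_on_key(old, new, key):
    """Hypothetical helper: compare two DataFrames, matching rows on the `key` column."""
    old = old.set_index(key)
    new = new.set_index(key)
    missing_in_new = set(old.index) - set(new.index)
    missing_in_old = set(new.index) - set(old.index)

    # Inner join keeps only keys present in both frames
    combined = pd.merge(old, new, left_index=True, right_index=True, suffixes=("_1", "_2"))
    differences = {}
    for column in old.columns:
        if column not in new.columns:
            continue  # column exists only in the old frame; nothing to compare
        diff = combined[column + "_1"] != combined[column + "_2"]
        if diff.any():
            differences[column] = list(combined[diff].index)
    return missing_in_new, missing_in_old, differences


old = pd.DataFrame({"Emp_ID": [1, 2, 3, 5], "Name": ["A", "B", "C", "A"],
                    "Phone": [123, 456, 789, 123]})
new = pd.DataFrame({"Emp_ID": [1, 3, 4, 5], "Name": ["A", "C", "D", "E"],
                    "Phone": [123, 789, 12, 123]})
result = compare_on_key(old, new, "Emp_ID")
print(result)
```

Because the key is a parameter, the same function can be reused on any shared column, provided its values are unique in each file.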
