Replicating Excel's INDEX/MATCH in Python with pandas

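I have an Excel spreadsheet that I update frequently (2-3 times a day). Each update requires running an INDEX/MATCH to pull values from a table in another spreadsheet and write them into a column of the first one. The values overwrite the old ones rather than creating a new column.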

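I want to automate this process with pandas (and xlwings to write the data back to the spreadsheet, but I have no problems with that part). The first step is to replicate Excel's INDEX(MATCH()) in pandas. Overall, the function should: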

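Take as arguments the string headers of the column to read from, the column to write to, and the columns holding the values used to match the read and write columns
Iterate over the write column; on each iteration, find the value in the read column whose match-column value equals the match-column value of the row being written
If there is no matching value, write NaN or "#N/A" to the DataFrame (it is important to distinguish 0 from no match)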

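I expected pandas to have a native vlookup / index-match function, but the only things I could find were about joining or merging DataFrames, which is not what I want to do. I want to overwrite individual values in a DataFrame, and do so in an arbitrary index order.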

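I managed to get this working with a very ugly, script-specific function, but I thought it would be useful to try to generalize it for other uses. After some cleanup and rewriting, I have the following: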

    ## Index Match in Python with pandas
    # Remember that DataFrames start at 0, Excel starts at 1
    # This only works if both DFs have the same indices (integers, strings, whatever)
    import numpy as np
    import pandas as pd

    # sample dataframes
    d = {'Match Column': [0., 1., 2., 3., 4., 7., 'string'],
         'Read Column': ['zero', 'one', 'two', 'three', 'four', 'seven', 'string']}
    dfRead = pd.DataFrame(d)
    d2 = {'Match Column': [0., 1., 2., 3., 4., 5., 6., 7., '8'],
          'Write Column': [0, 0, 0, 0, 0, 0, 0, 0, '0']}
    dfWrite = pd.DataFrame(d2)

    # test arguments
    ReadColumn = 'Read Column'
    WriteColumn = 'Write Column'
    ReadMatchColumn = 'Match Column'
    WriteMatchColumn = 'Match Column'

    def indexmatch(dfRead, dfWrite, ReadColumn, WriteColumn,
                   ReadMatchColumn, WriteMatchColumn, skiprows=0):
        # convert the string inputs to a column number for each dataframe
        RCNum = np.where(dfRead.columns == ReadColumn)[0][0]
        WCNum = np.where(dfWrite.columns == WriteColumn)[0][0]
        RMCNum = np.where(dfRead.columns == ReadMatchColumn)[0][0]
        WMCNum = np.where(dfWrite.columns == WriteMatchColumn)[0][0]
        for i in range(skiprows, len(dfWrite.index), 1):
            # the value we're using to match the columns
            match = dfWrite.iloc[i, WMCNum]
            try:
                # label of the first row in dfRead whose match value equals `match`
                matchind = dfRead.index[np.where(dfRead[ReadMatchColumn] == match)[0][0]]
                # replaces DF NaN values with Excel's #N/A, optional method
                value = dfRead.fillna('#N/A').loc[matchind].iloc[RCNum]
                dfWrite.at[dfWrite.index[i], WriteColumn] = value
            except (KeyError, IndexError):
                # if there is no match, write NaN to the 'cell'
                dfWrite.at[dfWrite.index[i], WriteColumn] = np.nan

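This works, but it isn't pretty, and it fails when you want to match a column against another DataFrame's index (for example, matching a DataFrame against a pivot-table DataFrame).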

Is there a more robust and concise way to do this?

As requested, the expected input and output:

    In [2]: dfRead
    Out[2]:
      Match Column Read Column
    0            0        zero
    1            1         one
    2            2         two
    3            3       three
    4            4        four
    5            7       seven
    6       string      string

    In [3]: dfWrite
    Out[3]:
      Match Column Write Column
    0            0            0
    1            1            0
    2            2            0
    3            3            0
    4            4            0
    5            5            0
    6            6            0
    7            7            0
    8            8            0

    In [4]: indexmatch(dfRead, dfWrite, 'Read Column', 'Write Column', 'Match Column', 'Match Column')

    In [5]: dfWrite
    Out[5]:
      Match Column Write Column
    0            0         zero
    1            1          one
    2            2          two
    3            3        three
    4            4         four
    5            5          NaN
    6            6          NaN
    7            7        seven
    8            8          NaN

pd.Series.map accepts a Series as its argument and treats it like a dictionary whose keys are the Series' index.
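To make that behaviour concrete, here is a minimal sketch that maps the same keys through a plain dict and through a Series carrying the same labels. The names `lookup_dict`, `lookup_series` and `keys` are invented for this illustration and do not come from the question.

```python
import pandas as pd

# Hypothetical lookup table, once as a dict and once as a Series
lookup_dict = {1: 'one', 2: 'two'}
lookup_series = pd.Series(lookup_dict)  # dict keys become the index

keys = pd.Series([2, 1, 5])

via_dict = keys.map(lookup_dict)
via_series = keys.map(lookup_series)

# Keys with no entry in the lookup (here 5) come back as NaN in both cases
print(via_dict.tolist())
print(via_series.tolist())
```

Both calls look each key up by label, so the order of the rows in the lookup does not matter.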

Applied here, that looks like

    dfWrite['Write Column'] = dfWrite['Match Column'].map(dfRead.set_index('Match Column')['Read Column'])
    dfWrite
    Out[409]:
      Match Column Write Column
    0            0         zero
    1            1          one
    2            2          two
    3            3        three
    4            4         four
    5            5          NaN
    6            6          NaN
    7            7        seven
    8            8          NaN

which gives the same output as

    indexmatch(dfRead, dfWrite, 'Read Column', 'Write Column', 'Match Column', 'Match Column')
    dfWrite
    Out[413]:
      Match Column Write Column
    0            0         zero
    1            1          one
    2            2          two
    3            3        three
    4            4         four
    5            5          NaN
    6            6          NaN
    7            7        seven
    8            8          NaN
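Since the question asks for a general-purpose function, the map call can be wrapped into one. The sketch below is only one possible shape: the name `index_match` and its snake_case parameters are hypothetical, and keeping the first hit for a repeated match value is an assumption chosen to mirror Excel's MATCH. To my knowledge Series.map needs unique index labels in the lookup Series, which is why duplicates are dropped first.

```python
import pandas as pd

def index_match(df_read, df_write, read_col, write_col,
                read_match_col, write_match_col):
    """Overwrite df_write[write_col] in place with values looked up in df_read."""
    # Build the lookup: match values become the index, read values the data
    lookup = df_read.set_index(read_match_col)[read_col]
    # Assumption: like Excel's MATCH, keep only the first hit for a repeated key
    lookup = lookup[~lookup.index.duplicated(keep='first')]
    # Unmatched keys map to NaN, which keeps them distinct from a real 0
    df_write[write_col] = df_write[write_match_col].map(lookup)
    return df_write

# Sample data from the question
dfRead = pd.DataFrame({
    'Match Column': [0., 1., 2., 3., 4., 7., 'string'],
    'Read Column': ['zero', 'one', 'two', 'three', 'four', 'seven', 'string'],
})
dfWrite = pd.DataFrame({
    'Match Column': [0., 1., 2., 3., 4., 5., 6., 7., '8'],
    'Write Column': [0, 0, 0, 0, 0, 0, 0, 0, '0'],
})

index_match(dfRead, dfWrite, 'Read Column', 'Write Column',
            'Match Column', 'Match Column')
print(dfWrite)
```

Unlike the loop version, this does not convert NaN values read from dfRead into '#N/A'; chaining `.fillna('#N/A')` onto the lookup would restore that if it is wanted.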

To match against dfRead's index, skip the .set_index(...) step. To match on dfWrite's index, replace dfWrite['Match Column'].map with dfWrite.index.to_series().map.
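Both index variants can be sketched on small made-up frames. The names `read_by_index`, `write_df` and `write_by_index` are hypothetical; `read_by_index` stands in for something like a pivot-table result whose match values live in the index.

```python
import pandas as pd

# Hypothetical read table whose match values are in the index,
# as they would be after a pivot_table or groupby
read_by_index = pd.DataFrame({'Read Column': ['zero', 'one', 'seven']},
                             index=[0, 1, 7])

# Case 1: match a column of the write table against the read table's index
# (no set_index needed, the lookup Series is already keyed correctly)
write_df = pd.DataFrame({'Match Column': [0, 1, 5, 7]})
write_df['Write Column'] = write_df['Match Column'].map(read_by_index['Read Column'])

# Case 2: match the write table's own index against the read table's index
write_by_index = pd.DataFrame({'Write Column': [0, 0, 0]}, index=[1, 5, 7])
write_by_index['Write Column'] = (
    write_by_index.index.to_series().map(read_by_index['Read Column'])
)

print(write_df)
print(write_by_index)
```

In the second case `index.to_series()` returns a Series that is still indexed by the original labels, so the assignment lines up row by row.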

You can also use the merge function:

    dfWrite = pd.merge(left=dfWrite[['Match Column']], right=dfRead, on='Match Column', how='left')
    dfWrite = dfWrite.rename(columns={'Read Column': 'Write Column'})
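If it matters whether a NaN means "no match" or "matched, but the value read was itself missing", merge's `indicator` option can record that, and `validate` can guard against repeated match values in the read table adding rows. This is a sketch on made-up data; `df_read`, `df_write`, `merged` and `matched` are names chosen for the example.

```python
import pandas as pd

# Hypothetical data: key 1 exists in df_read but its value is missing
df_read = pd.DataFrame({'Match Column': [0, 1, 7],
                        'Read Column': ['zero', None, 'seven']})
df_write = pd.DataFrame({'Match Column': [0, 1, 5, 7],
                         'Write Column': [0, 0, 0, 0]})

merged = pd.merge(
    df_write[['Match Column']],
    df_read,
    on='Match Column',
    how='left',
    validate='many_to_one',  # raise if df_read repeats a match value
    indicator=True,          # adds a '_merge' column: 'both' or 'left_only'
)
merged = merged.rename(columns={'Read Column': 'Write Column'})

# True where the key exists in df_read, even if the value read there is missing
matched = merged['_merge'] == 'both'
print(merged)
```

Note that merge returns a new frame with a fresh integer index, so this suits the case where dfWrite is rebuilt rather than updated cell by cell.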
