用于合并两个电子表格的Python代码,与Excel中的“索引匹配匹配”function类似

我有两个单独的电子表格,并使用pandas为了将其转换成数据框架。 我正在寻找从电子表格2中提取信息并将这些信息合并到电子表格1中。我知道如何通过使用“索引匹配匹配”function来提取excel中的信息,但是我在使用Python因为它似乎不适合合并或连接方法…或者我只是错过了连接。

电子表格1将如下所示: 电子表格1

电子表格2将如下所示: 电子表格2

最终结果的头像是这样的:

区域分支Master_Product_Name Retrieved_#

  • 北,弗雷斯诺,产品1: 5,060
  • 北,弗雷斯诺,产品2: 2,429
  • 北,萨克拉门托,产品3: 405
  • 北,弗雷斯诺,产品6:10

会发生什么事是程序将插入一个新的列,并将基于索引列和行插入数据到单个数据点。 所以对于我的例子中的第一行,这是弗雷斯诺和产品1 …它将在电子表格中看两行,直到它到达产品1,然后直到find弗雷斯诺和插入5,060。

在目前的情况下,有地区栏不重要,但有能力检索基于多个标准是我能想象的。

IIUC您可以stackSpreadsheet 2创build的数据框df2 。 然后,您需要从行转换rename列 – Master Product Name

如果您需要通过“ Region ,“ Master Product Name和“ Branch列进行merge ,则必须小写两个Branch列以进行匹配。

 df1 = pd.read_csv('s1.csv', sep=";", index_col=None) print df1 Region Branch Master Product Name 0 North FRESNO Product 1 1 North FRESNO Product 2 2 North SACRAMENTO Product 2 3 North FRESNO Product 12 4 North INLAND EMPIRE Product 13 5 North FRESNO Product 4 6 North FRESNO Product 5 7 North SAN DIEGO Product 3 8 North FRESNO Product 3 9 North FRESNO Product 3 10 North LOS ANGELES Product 3 11 North FRESNO Product 6 12 North FRESNO Product 7 13 North FRESNO Product 8 14 North SAN FRANCISCO Product 8 15 North FRESNO Product 11 16 North FRESNO Product 9 17 South LOS ANGELES Product 12 18 South LOS ANGELES Product 3 19 North SAN FRANCISCO Product 1 20 North SAN FRANCISCO Product 2 21 North SAN FRANCISCO Product 12 22 North SAN FRANCISCO Product 13 23 North SAN FRANCISCO Product 4 24 North SAN FRANCISCO Product 5 25 South SAN JOSE Product 3 26 South SAN JOSE Product 3 
 df2 = pd.read_csv('s2.csv', sep=";", index_col=None) print df2 Branch Region Product 1 Product 2 Product 3 Product 4 Product 5 \ 0 Fresno North 5 060 2 429 675 540 10 1 Sacramento North 5 534 2 344 405 324 118 2 San Francisco North 2 924 1 167 111 89 32 3 San Jose North 4 039 2 019 179 143 102 4 Los Angeles South 4 453 2 273 302 298 -2 5 Inland Empire South 4 680 2 091 404 400 21 6 Orange County South 3 167 1 493 95 94 41 7 San Diego South 5 468 2 439 282 279 16 Product 6 0 10 1 117 2 32 3 102 4 76 5 41 6 67 7 188 
 df2 = df2.set_index(['Region', 'Branch']) df2 = df2.stack() df2 = df2.reset_index(name = 'Count') df2 = df2.rename(columns={'level_2':'Master Product Name'}) 
 print df2 Region Branch Master Product Name Count 0 North Fresno Product 1 5 060 1 North Fresno Product 2 2 429 2 North Fresno Product 3 675 3 North Fresno Product 4 540 4 North Fresno Product 5 10 5 North Fresno Product 6 10 6 North Sacramento Product 1 5 534 7 North Sacramento Product 2 2 344 8 North Sacramento Product 3 405 9 North Sacramento Product 4 324 10 North Sacramento Product 5 118 11 North Sacramento Product 6 117 12 North San Francisco Product 1 2 924 13 North San Francisco Product 2 1 167 14 North San Francisco Product 3 111 15 North San Francisco Product 4 89 16 North San Francisco Product 5 32 17 North San Francisco Product 6 32 18 North San Jose Product 1 4 039 19 North San Jose Product 2 2 019 20 North San Jose Product 3 179 21 North San Jose Product 4 143 22 North San Jose Product 5 102 23 North San Jose Product 6 102 24 South Los Angeles Product 1 4 453 25 South Los Angeles Product 2 2 273 26 South Los Angeles Product 3 302 27 South Los Angeles Product 4 298 28 South Los Angeles Product 5 -2 29 South Los Angeles Product 6 76 30 South Inland Empire Product 1 4 680 31 South Inland Empire Product 2 2 091 32 South Inland Empire Product 3 404 33 South Inland Empire Product 4 400 34 South Inland Empire Product 5 21 35 South Inland Empire Product 6 41 36 South Orange County Product 1 3 167 37 South Orange County Product 2 1 493 38 South Orange County Product 3 95 39 South Orange County Product 4 94 40 South Orange County Product 5 41 41 South Orange County Product 6 67 42 South San Diego Product 1 5 468 43 South San Diego Product 2 2 439 44 South San Diego Product 3 282 45 South San Diego Product 4 279 46 South San Diego Product 5 16 47 South San Diego Product 6 188 
 #for metching by columns Branch - string to lowercase df1['Branch'] = df1['Branch'].str.lower() df2['Branch'] = df2['Branch'].str.lower() df = pd.merge(df1, df2, on=['Region','Master Product Name', 'Branch']) print df Region Branch Master Product Name Count 0 North fresno Product 1 5 060 1 North fresno Product 2 2 429 2 North sacramento Product 2 2 344 3 North fresno Product 4 540 4 North fresno Product 5 10 5 North fresno Product 3 675 6 North fresno Product 3 675 7 North fresno Product 6 10 8 South los angeles Product 3 302 9 North san francisco Product 1 2 924 10 North san francisco Product 2 1 167 11 North san francisco Product 4 89 12 North san francisco Product 5 32