Looping through a Python array to match multiple conditions in a second array: is there a faster way?

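I'm a Python beginner and am wondering whether there is a faster way to do what this code does, so please forgive my ignorance. I have two Excel sheets. The first (Results) has about 30,000 rows of unique user ids, followed by 30 columns whose headers are questions, with empty cells below them. The second (Answers) has about 400,000 rows and 3 columns: the first column holds the user id, the second the question, and the third the user's answer to that question. What I want is essentially an Excel INDEX/MATCH array formula: fill the blank cells in sheet 1 with the answers from sheet 2 by matching on user id and question.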

(Images: Results sheet, Answers sheet)

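I have written some code that does this, but processing 4 columns from sheet 1 takes about 2 hours. I'm trying to figure out whether my approach fails to take full advantage of NumPy's functionality.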

    import pandas as pd
    import numpy as np

    # Need to take in data from 'answers' and merge it into the 'results' data.
    # Will require matching the data based on 'id' in column 1 of 'answers' and the
    # 'question' in column 2 of 'answers'.
    results = pd.read_excel("/Users/data.xlsx", 'Results')
    answers = pd.read_excel("/Users/data.xlsx", 'Answers')
    answers_array = np.array(answers)

    #########
    # Create a list of questions being asked that will be matched to column 2 in answers.
    # Just getting all the questions I want.
    column_headers = list(results.columns)
    formula_headers = []

    #########
    for header in column_headers:
        formula_headers.append(header)
    del formula_headers[0:13]

    # Create an empty array with ids in which the 'merged' data will be fed into
    pre_ids = np.array(results['Id'])
    ids = np.reshape(pre_ids, (pre_ids.shape[0], 1))
    ids = ids.astype(str)
    zero_array = np.zeros((ids.shape[0], len(formula_headers)))
    ids_array = np.hstack((ids, zero_array))

    ##########
    for header in range(len(formula_headers)):
        question_index = formula_headers[header]
        for user in range(ids_array.shape[0]):
            user_index = ids_array[user, 0]
            # This location formula is what I feel is messing everything up,
            # or it could be because of the nested loops
            location = answers_array[(answers_array[:, 0] == int(user_index)) & (answers_array[:, 1] == question_index)]

            # If can't find the user id and question in the answers array
            if location.size == 0:
                ids_array[user][header + 1] = ''
            else:
                row_location_1 = np.where(np.all(answers_array == location[0], axis=1))
                row_location = int(row_location_1[0][0])
                ids_array[user][header + 1] = answers_array[row_location][2]

    print(ids_array)

Instead of filling the first dataframe with data from the second, we can pivot the second dataframe.

    answers.set_index(['id', 'question']).answer.unstack()
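To make the pivot concrete, here is a minimal sketch on toy data. It assumes the Answers sheet has been loaded with columns named `id`, `question` and `answer`; those names and all the values below are made up for illustration and may not match your file.

```python
import pandas as pd

# Hypothetical long-format data: one row per (user, question) pair.
answers = pd.DataFrame({
    "id": [1, 1, 2, 3],
    "question": ["q1", "q2", "q1", "q2"],
    "answer": ["yes", "no", "maybe", "yes"],
})

# Move id and question into the index, then unstack the inner level
# (question) into columns: one row per user, one column per question.
wide = answers.set_index(["id", "question"]).answer.unstack()
print(wide)
```

Pairs that never occur in the long table (user 2 and `q2` here) come out as NaN, which plays the role of the blank cell in the Results sheet.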

If you need the rows and columns to match those of the `results` dataframe, you can add the `reindex_like` method.

    answers.set_index(['id', 'question']).answer.unstack().reindex_like(results)
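A small sketch of what `reindex_like` adds, again on made-up data. It assumes `results` is indexed by user id and contains only question columns. If the id sits in an ordinary column (such as `Id` in the question) or there are extra leading columns, you would probably need to set the index and select the question columns first.

```python
import pandas as pd

# Hypothetical answers in long format.
answers = pd.DataFrame({
    "id": [1, 1, 2],
    "question": ["q1", "q2", "q1"],
    "answer": ["yes", "no", "maybe"],
})

# Hypothetical empty results sheet: ids as the index, questions as columns.
# User 3 and question q3 have no rows in answers at all.
results = pd.DataFrame(index=pd.Index([1, 2, 3], name="id"),
                       columns=["q1", "q2", "q3"])

# Pivot, then align rows and columns to the layout of results.
filled = (answers.set_index(["id", "question"])
                 .answer.unstack()
                 .reindex_like(results))
print(filled)
```

Users and questions that appear in `results` but not in `answers` are kept as all-NaN rows or columns, so the output has the same shape and order as the sheet you want to fill.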

If you have duplicate (id, question) pairs, drop them first:

    cols = ['id', 'question']
    answers.drop_duplicates(cols).set_index(cols).answer.unstack()
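The reason duplicates matter is that `unstack` raises an error when the same (id, question) pair occurs more than once in the index. The sketch below, on hypothetical data, counts such pairs and then pivots after dropping them. Which of the duplicate answers you want to keep is an assumption you have to make about your own data.

```python
import pandas as pd

# Hypothetical answers where user 1 answered q1 twice.
answers = pd.DataFrame({
    "id": [1, 1, 1, 2],
    "question": ["q1", "q1", "q2", "q1"],
    "answer": ["yes", "changed", "no", "maybe"],
})

cols = ["id", "question"]

# Count rows that repeat an earlier (id, question) pair.
n_dupes = int(answers.duplicated(cols).sum())

# drop_duplicates keeps the first row of each pair by default;
# passing keep="last" would keep the last row instead.
wide = answers.drop_duplicates(cols).set_index(cols).answer.unstack()
print(n_dupes)
print(wide)
```

With the default `keep="first"`, user 1's answer to `q1` is the earlier row (`"yes"`), and the later one is discarded.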