Looping through a Python array to match multiple conditions in a second array: is there a faster way?

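I'm a beginner in Python and was wondering whether there is a faster way to write this code, so please forgive my ignorance. I have two Excel sheets. The first (Results) has about 30,000 rows of unique user ids, followed by 30 columns of questions whose cells are empty. The second (Answers) has about 400,000 rows and 3 columns: the first column holds the user id, the second the question, and the third the user's answer to that question. What I want is essentially Excel's INDEX/MATCH array formula: fill the empty cells in sheet 1 with the answers from sheet 2 by matching on user id and question.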

Results sheet, Answers sheet

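I have written some code, but it takes about 2 hours to process 4 columns from sheet 1. I'm trying to figure out whether my approach fails to take full advantage of NumPy's functionality.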

    import pandas as pd
    import numpy as np

    # Need to take in data from 'answers' and merge it into the 'results' data
    # Will requiring matching the data based on 'id' in column 1 of 'answers' and the
    # 'question' in column 2 of 'answers'
    results = pd.read_excel("/Users/data.xlsx", 'Results')
    answers = pd.read_excel("/Users/data.xlsx", 'Answers')

    answers_array = np.array(answers)

    #########
    # Create a list of questions being asked that will be matched to column 2 in answers.
    # Just getting all the questions I want
    column_headers = list(results.columns)
    formula_headers = []
    #########

    for header in column_headers:
        formula_headers.append(header)

    del formula_headers[0:13]

    # Create an empty array with ids in which the 'merged' data will be fed into
    pre_ids = np.array(results['Id'])
    ids = np.reshape(pre_ids, (pre_ids.shape[0], 1))
    ids = ids.astype(str)

    zero_array = np.zeros((ids.shape[0], len(formula_headers)))
    ids_array = np.hstack((ids, zero_array))

    ##########
    for header in range(len(formula_headers)):
        question_index = formula_headers[header]
        for user in range(ids_array.shape[0]):
            user_index = ids_array[user, 0]
            location = answers_array[(answers_array[:, 0] == int(user_index)) &
                                     (answers_array[:, 1] == question_index)]
            # This location formula is what I feel is messing everything up,
            # or could be because of the nested loops
            # If can't find the user id and question in the answers array
            if location.size == 0:
                ids_array[user][header + 1] = ''
            else:
                row_location_1 = np.where(np.all(answers_array == location[0], axis=1))
                row_location = int(row_location_1[0][0])
                ids_array[user][header + 1] = answers_array[row_location][2]

    print(ids_array)

Instead of filling the first dataframe from the second, we can pivot the second dataframe.

 answers.set_index(['id', 'question']).answer.unstack()
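To see what the pivot above produces, here is a minimal sketch on made-up data. The column names `id`, `question` and `answer` and the toy values are assumptions for illustration; the real sheet's headers may differ.

```python
import pandas as pd

# Hypothetical toy data in the long format described in the question:
# one row per (user id, question) pair.
answers = pd.DataFrame({
    "id": [1, 1, 2, 3],
    "question": ["Q1", "Q2", "Q1", "Q2"],
    "answer": ["yes", "no", "maybe", "yes"],
})

# Move 'question' from the row index to the columns:
# one row per user, one column per question.
wide = answers.set_index(["id", "question"])["answer"].unstack()
print(wide)
```

Pairs that never appear in `answers` (user 2 with Q2, for example) come out as NaN, which plays the role of the empty cell in the original loop. The whole reshape is a single vectorized pass over the answers, rather than one scan of the 400,000-row array per cell.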

If you need the rows and columns to match those of the results dataframe, you can add the reindex_like method:

 answers.set_index(['id', 'question']).answer.unstack().reindex_like(results)
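A small sketch of the `reindex_like` step, under one important assumption: `reindex_like` aligns on the index and column labels, so the `results` frame here is indexed by user id and its columns are the question labels. With the sheet from the question you would probably need something like `results.set_index('Id')` and only the question columns first; that preparation is a guess, not part of the original answer.

```python
import pandas as pd

# Hypothetical long-format answers.
answers = pd.DataFrame({
    "id": [1, 1, 2],
    "question": ["Q1", "Q2", "Q1"],
    "answer": ["yes", "no", "maybe"],
})

# Hypothetical results frame: indexed by user id, one empty column per question.
results = pd.DataFrame(
    index=pd.Index([1, 2, 3], name="id"),
    columns=["Q1", "Q2", "Q3"],
)

wide = answers.set_index(["id", "question"])["answer"].unstack()

# Conform the pivoted answers to the row and column labels of `results`.
filled = wide.reindex_like(results)
print(filled)
```

User 3 and question Q3 have no answers at all, so they appear in `filled` as all-NaN rows and columns instead of being dropped.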

If you have duplicates:

 cols = ['id', 'question']
 answers.drop_duplicates(cols).set_index(cols).answer.unstack()
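The reason duplicates matter: `unstack` cannot place two answers for the same (id, question) pair into one cell and raises a `ValueError`. The sketch below uses made-up data to show the failure and the effect of `drop_duplicates`, which keeps the first occurrence by default. Whether the first answer is the one you want is an assumption you should check against your data.

```python
import pandas as pd

# Hypothetical data where user 1 answered Q1 twice.
answers = pd.DataFrame({
    "id": [1, 1, 1, 2],
    "question": ["Q1", "Q1", "Q2", "Q1"],
    "answer": ["yes", "changed", "no", "maybe"],
})
cols = ["id", "question"]

# Without de-duplication the pivot fails on the repeated (1, 'Q1') pair.
try:
    answers.set_index(cols)["answer"].unstack()
    raised = False
except ValueError:
    raised = True

# Keep only the first row for each (id, question) pair, then pivot.
deduped = answers.drop_duplicates(cols)
wide = deduped.set_index(cols)["answer"].unstack()
print(wide)
```

Passing `keep='last'` to `drop_duplicates` would retain the most recent row for each pair instead.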
