Creating an averaged DataFrame with pandas

    From country     Austria  Belgium  Denmark  France  Germany  Italy  Luxembourg  Switzerland  The Netherlands  United Kingdom
    Austria                0        0        0       0        0      0           3            0                6               1
    Belgium                0        0        0       2        1      1           0            0                5               1
    Denmark                0        2        0       2        0      1           0            2                3               0
    France                 0        0        0       0        6      0           0            0                4               0
    Germany                0        2        0       6        0      0           0            1                1               0
    Italy                  0        0        3       0        1      0           4            1                1               0
    Luxembourg             0        0        0       4        0      1           0            1                3               1
    Switzerland            0        1        0       0        0      0           0            0                7               2
    The Netherlands        1        0        5       1        0      2           0            0                0               1
    United Kingdom         2        0        2       2        0      2           1            0                1               0

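Here I have a table whose values are the points awarded by the country in each row to the country in each column. I have 60 such tables in total, and I am trying to create a final table that looks the same but whose values are the averages across all 60 tables. I haven't found any function in pandas, or anywhere on Stack Exchange, that averages each value the way I am trying to. How can I solve this?

PS: some of the tables have more or fewer countries.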

You can call read_excel with the parameter sheet_name=None (spelled sheetname in older pandas releases) to get a dict of DataFrames, one per sheet. Then use concat to build one big df, groupby the second level of the index, and aggregate with mean:

    import pandas as pd

    dict_dfs = pd.read_excel('multiple_sheets.xlsx', sheet_name=None)
    print (dict_dfs)
    {'sheetname1':    a  b
    0  1  4
    1  2  8, 'sheetname2':    a  b
    0  7  1
    1  5  0, 'sheetname3':    a  b
    0  4  5}

    df = pd.concat(dict_dfs)
    print (df)
                  a  b
    sheetname1 0  1  4
               1  2  8
    sheetname2 0  7  1
               1  5  0
    sheetname3 0  4  5

    df = df.groupby(level=1).mean()
    print (df)
         a         b
    0  4.0  3.333333
    1  3.5  4.000000
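The concat and groupby steps can be tried without an Excel file. The following is a minimal sketch that assumes the sheets are already loaded; the in-memory dict below is a hypothetical stand-in for what read_excel returns, using the same small values as the example above.

```python
import pandas as pd

# Hypothetical stand-in for the dict returned by read_excel(..., sheet_name=None).
dict_dfs = {
    "sheetname1": pd.DataFrame({"a": [1, 2], "b": [4, 8]}),
    "sheetname2": pd.DataFrame({"a": [7, 5], "b": [1, 0]}),
    "sheetname3": pd.DataFrame({"a": [4], "b": [5]}),
}

# Concatenating a dict gives a two-level index: (sheet name, original row label).
stacked = pd.concat(dict_dfs)

# Group on the original row label (level 1) and average across the sheets.
averaged = stacked.groupby(level=1).mean()
print(averaged)
```

Row 1 is averaged over two sheets only, because sheetname3 has no row with that label.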

Edit:

With your sample data file:

    dict_dfs = pd.read_excel('multiple_sheets.xlsx', sheet_name=None, index_col=0)
    df = pd.concat(dict_dfs)
    df = df.groupby(level=1).mean()
    print (df)
                     Austria  Belgium  Denmark  France  Germany  Italy  \
    Fromcountry
    Austria                4        0        0       0        0      0
    Belgium                0        0        0       2        1      1
    Denmark                0        2        0       2        0      1
    France                 0        0        0       0        6      0
    Germany                0        2        0       6        0      0
    Italy                  0        0        3       0        1      0
    Luxembourg             0        0        0       4        0      1
    Switzerland            0        1        0       0        0      0
    The Netherlands        1        0        5       1        0      2
    USA                    3        4        0       0        0      0
    United Kingdom         2        0        2       2        0      2

                     Luxembourg  Switzerland  The Netherlands  USA  United Kingdom
    Fromcountry
    Austria                   3            0                6  4.0               1
    Belgium                   0            0                5  4.0               1
    Denmark                   0            2                3  5.0               0
    France                    0            0                4  0.0               0
    Germany                   0            1                1  0.0               0
    Italy                     4            1                1  0.0               0
    Luxembourg                0            1                3  0.0               1
    Switzerland               0            0                7  0.0               2
    The Netherlands           0            0                0  0.0               1
    USA                       0            0                0  0.0               0
    United Kingdom            1            0                1  0.0               0

If some sheets contain extra entries, finish with reindex to keep only the index and column names of a reference sheet:

    # reference sheet: sheetname1
    idx = dict_dfs['sheetname1'].index
    cols = dict_dfs['sheetname1'].columns
    df = df.reindex(index=idx, columns=cols)
    print (df)
                     Austria  Belgium  Denmark  France  Germany  Italy  \
    Fromcountry
    Austria                4        0        0       0        0      0
    Belgium                0        0        0       2        1      1
    Denmark                0        2        0       2        0      1
    France                 0        0        0       0        6      0
    Germany                0        2        0       6        0      0
    Italy                  0        0        3       0        1      0
    Luxembourg             0        0        0       4        0      1
    Switzerland            0        1        0       0        0      0
    The Netherlands        1        0        5       1        0      2
    United Kingdom         2        0        2       2        0      2

                     Luxembourg  Switzerland  The Netherlands  United Kingdom
    Fromcountry
    Austria                   3            0                6               1
    Belgium                   0            0                5               1
    Denmark                   0            2                3               0
    France                    0            0                4               0
    Germany                   0            1                1               0
    Italy                     4            1                1               0
    Luxembourg                0            1                3               1
    Switzerland               0            0                7               2
    The Netherlands           0            0                0               1
    United Kingdom            1            0                1               0
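Because some tables have more or fewer countries, it helps to decide how a missing country should count. The sketch below uses two made-up sheets (the names and values are assumptions, not the asker's data). It contrasts the default behaviour, where mean skips missing cells, with an alternative that treats a missing country as 0 points, and then filters to a reference sheet with reindex.

```python
import pandas as pd

# Two hypothetical sheets; the second one has an extra country (USA).
sheet1 = pd.DataFrame(
    {"Austria": [0, 2], "Belgium": [4, 0]},
    index=["Austria", "Belgium"],
)
sheet2 = pd.DataFrame(
    {"Austria": [0, 6, 3], "Belgium": [2, 0, 4], "USA": [4, 4, 0]},
    index=["Austria", "Belgium", "USA"],
)
dict_dfs = {"sheet1": sheet1, "sheet2": sheet2}

stacked = pd.concat(dict_dfs)

# Option A: missing cells become NaN and are skipped, so each cell is
# averaged only over the sheets in which it actually appears.
mean_present = stacked.groupby(level=1).mean()

# Option B: align every sheet to the union of labels and count a missing
# country as 0 points before averaging.
all_rows = stacked.index.get_level_values(1).unique()
all_cols = stacked.columns
aligned = {
    name: df.reindex(index=all_rows, columns=all_cols, fill_value=0)
    for name, df in dict_dfs.items()
}
mean_zero = pd.concat(aligned).groupby(level=1).mean()

# Keep only the countries that appear in the reference sheet.
filtered = mean_present.reindex(index=sheet1.index, columns=sheet1.columns)
print(filtered)
```

Which option is appropriate depends on what a missing country means in the data; option A matches the behaviour of the groupby approach shown above.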

Assume we have a list of DataFrames called tables:

    tables = [df.set_index('From country').copy() for _ in range(10)]

I set the index to 'From country' in case it is not already the index. If it already is, skip that step.

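Then convert the list of DataFrames into a pd.Panel and take the mean over axis zero. Note that pd.Panel was deprecated in pandas 0.20 and removed in 0.25, so this approach only works on older versions:

    pd.Panel(dict(enumerate(tables))).mean(0)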

If tables is already a dictionary, pass it directly to pd.Panel:

    pd.Panel(tables).mean(0)
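On pandas versions without pd.Panel, a similar "stack the tables and average over the first axis" idea can be sketched with NumPy. This is not the method from the answer above but a substitute under stated assumptions: the tables are made-up examples, and every table is aligned to the labels of the first one.

```python
import numpy as np
import pandas as pd

# Hypothetical input: three small tables sharing the same labels.
base = pd.DataFrame(
    [[0, 2], [4, 0]],
    index=pd.Index(["Austria", "Belgium"], name="From country"),
    columns=["Austria", "Belgium"],
)
tables = [base + i for i in range(3)]

# Use the first table as the reference for row and column labels.
ref = tables[0]

# Stack the aligned tables into a 3-D array: (table, row, column).
cube = np.stack([
    t.reindex(index=ref.index, columns=ref.columns).to_numpy(dtype=float)
    for t in tables
])

# Average over axis 0 (the table axis), ignoring cells that are NaN.
mean_table = pd.DataFrame(
    np.nanmean(cube, axis=0), index=ref.index, columns=ref.columns
)
print(mean_table)
```

Countries that are absent from the reference table are dropped by the reindex step; align to a union of labels instead if they should be kept.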
