Python: How do I split one variable based on the bins of another variable?

I want to create a histogram of variable V based on bins of variable X. To do this, I read an Excel file that looks like this:

    Column X    Column V
    99.9        0
    100.0       3
    25.17       2
    39.45       1
    66.52       1
    17.17       6
    9.25        2
    86.11       3
    84.09       3

For each bin of variable X, I want to compute the average of the V values associated with it. For example:

    X bin: 0-30   -> avg(V) = (2+6+2)/3   = 3.33
    X bin: 31-80  -> avg(V) = (1+1)/2     = 1.00
    X bin: 81-100 -> avg(V) = (3+3+0+3)/4 = 2.25

So the result I am after is:

    X bin     avg(V)
    0-30      3.33
    31-80     1.00
    81-100    2.25

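To do this, I wrote the code block below, in which I use several lists to collect all the V values inside each X bin (bin width = 10).

Edit

I have a problem with the length of my lists. For example, for an Excel file with 1000 rows, only 1 V value belongs to the 41-50 bin. However, len(islands_4150) returns 999. Where does the code get the other 998 values from?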

    from openpyxl import load_workbook

    wb = load_workbook(filename = 'myfile.xlsx')
    ws=wb.active
    cell_range_1 = ws['X2':'X1001']
    cell_range_2 = ws['V2':'V1001']

    cf_list=[] #List with X values
    island_list=[] #List with V values

    for row in range(2,1001):
        for column in 'X':
            cell_name_1="{}{}".format(column, row) #X
            cf_list.append(ws[cell_name_1].value)
            x=map(lambda x: int(x) if x%1==0 else x, cf_list)
        for column in 'V':
            cell_name_2="{}{}".format(column, row) #V
            island_list.append(ws[cell_name_2].value)
            v=map(lambda x: int(x) if x%1==0 else x, island_list)

    islands_010=[] #List with values from column V which corresponding values from column X are 0<=value<=10
    islands_1120=[]
    islands_2130=[]
    islands_3140=[]
    islands_4150=[]
    islands_5160=[]
    islands_6170=[]
    islands_7180=[]
    islands_8190=[]
    islands_91100=[]

    for i, val in enumerate(x):
        for j, elem in enumerate(v):
            if x[i]>=0 and x[i]<=10:
                islands_010.append(v[i])
            elif x[i]>=11 and x[i]<=20:
                islands_1120.append(v[i])
            elif x[i]>=21 and x[i]<=30:
                islands_2130.append(v[i])
            elif x[i]>=31 and x[i]<=40:
                islands_3140.append(v[i])
            elif x[i]>=41 and x[i]<=50:
                islands_4150.append(v[i])
            elif x[i]>=51 and x[i]<=60:
                islands_5160.append(v[i])
            elif x[i]>=61 and x[i]<=70:
                islands_6170.append(v[i])
            elif x[i]>=71 and x[i]<=80:
                islands_7180.append(v[i])
            elif x[i]>=81 and x[i]<=90:
                islands_8190.append(v[i])
            elif x[i]>=91 and x[i]<=100:
                islands_91100.append(v[i])

    if len(islands_010)==0:
        print ('Avg islands 0-10: 0')
    else:
        avg010=round(reduce(lambda x, y: x + y, islands_010) / len(islands_010),3)
        print ('Avg islands 0-10: '+str(avg010))

    if len(islands_1120)==0:
        print ('Avg islands 11-20: 0')
    else:
        avg1120=round(reduce(lambda x, y: x + y, islands_1120) / len(islands_1120),3)
        print ('Avg islands 11-20: '+str(avg1120))

    if len(islands_2130)==0:
        print ('Avg islands 21-30: 0')
    else:
        avg2130=round(reduce(lambda x, y: x + y, islands_2130) / len(islands_2130),3)
        print ('Avg islands 21-30: '+str(avg2130))

    if len(islands_3140)==0:
        print ('Avg islands 31-40: 0')
    else:
        avg3140=round(reduce(lambda x, y: x + y, islands_3140) / len(islands_3140),3)
        print ('Avg islands 31-40: '+str(avg3140))

    if len(islands_4150)==0:
        print ('Avg islands 41-50: 0')
    else:
        avg4150=round(reduce(lambda x, y: x + y, islands_4150) / len(islands_4150),3)
        print ('Avg islands 41-50: '+str(avg4150))

    if len(islands_5160)==0:
        print ('Avg islands 51-60: 0')
    else:
        avg5160=round(reduce(lambda x, y: x + y, islands_5160) / len(islands_5160),3)
        print ('Avg islands 51-60: '+str(avg5160))

    if len(islands_6170)==0:
        print ('Avg islands 61-70: 0')
    else:
        avg6170=round(reduce(lambda x, y: x + y, islands_6170) / len(islands_6170),3)
        print ('Avg islands 61-70: '+str(avg6170))

    if len(islands_7180)==0:
        print ('Avg islands 71-80: 0')
    else:
        avg7180=round(reduce(lambda x, y: x + y, islands_7180) / len(islands_7180),3)
        print ('Avg islands 71-80: '+str(avg7180))

    if len(islands_8190)==0:
        print ('Avg islands 81-90: 0')
    else:
        avg8190=round(reduce(lambda x, y: x + y, islands_8190) / len(islands_8190),3)
        print ('Avg islands 81-90: '+str(avg8190))

    if len(islands_91100)==0:
        print ('Avg islands 91-100: 0')
    else:
        avg91100=round(reduce(lambda x, y: x + y, islands_91100) / len(islands_91100),3)
        print ('Avg islands 91-100: '+str(avg91100))

Your code is quite messy right now, which obscures the problem.

The first issue is whitespace. You need some.

Next are for column in 'X': and for column in 'V':. Each of these loops iterates over a single character, so both are useless and can be replaced with:

    cell_name_1 = "X{}".format(row)  # X variable
    cell_name_2 = "V{}".format(row)  # V variable

Also, I suggest grabbing the cells' values first and then doing all the comparisons on them:

    x_val = float(ws[cell_name_1].value)
    v_val = int(ws[cell_name_2].value)
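One caveat with converting directly: as far as I know, openpyxl returns None for an empty cell, and float(None) raises a TypeError. Below is a minimal sketch of a guard. The helper read_pair is a hypothetical name, not part of openpyxl, and plain Python values stand in for the cell contents:

```python
def read_pair(x_raw, v_raw):
    """Convert one row's raw X and V values, or return None if either is missing.

    read_pair is a hypothetical helper for illustration, not an openpyxl function.
    """
    if x_raw is None or v_raw is None:
        return None
    return float(x_raw), int(v_raw)


# Stand-in for values read from the sheet; the second row has an empty V cell.
raw_rows = [(99.9, 0), (25.17, None), (17.17, 6)]

# Keep only the rows where both values were present.
pairs = [p for p in (read_pair(x, v) for x, v in raw_rows) if p is not None]
print(pairs)  # [(99.9, 0), (17.17, 6)]
```

Whether skipping incomplete rows is the right policy depends on your data; raising an error instead may be preferable if gaps are unexpected.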

Python's range includes the first number and excludes the last. So if the last row is 1001, the range for your first loop should end at 1002:

    for row in range(2, 1002):
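To make the off-by-one concrete, here is a quick check of both ranges, assuming the data rows run from 2 to 1001 as in the question:

```python
old_rows = list(range(2, 1001))  # the range used in the question
new_rows = list(range(2, 1002))  # the corrected range

print(len(old_rows))   # 999
print(old_rows[-1])    # 1000, so row 1001 is never read
print(len(new_rows))   # 1000
print(new_rows[-1])    # 1001
```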

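I suggest retrieving the worksheet with ws = wb.get_sheet_by_name("sheet_name") rather than ws = wb.active, to make sure you always get the sheet you want. (In recent openpyxl versions get_sheet_by_name is deprecated; ws = wb["sheet_name"] does the same thing.)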
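As for where the extra 998 values come from: the inner loop over v never uses j or elem, so for every i it simply repeats the same test and the same append once per element of v. With 999 rows read, the single matching value is appended 999 times. A small sketch with made-up stand-in data (three rows instead of 999) reproduces the effect and shows a single-pass alternative:

```python
# Hypothetical stand-in data: only one X value falls in the 41-50 bin.
x = [9.25, 45.0, 86.11]
v = [2, 7, 3]

islands_4150 = []
for i, val in enumerate(x):
    for j, elem in enumerate(v):        # runs len(v) times for each i
        if x[i] >= 41 and x[i] <= 50:
            islands_4150.append(v[i])   # same value appended on every pass

print(len(islands_4150))  # 3, one copy per element of v

# Pairing the two lists and looping once avoids the duplication.
fixed = [v_val for x_val, v_val in zip(x, v) if 41 <= x_val <= 50]
print(fixed)  # [7]
```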

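Finally, we get to the actual problem. Your current approach reads straight from Excel into the bins. What you should do instead is read all the data from Excel first, and then manipulate it to produce the bins you want. The first step is to get the data into whatever Python structure makes your life easiest. I recommend a list of tuples:

    islands.append((x_val, v_val))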

This produces something like:

    [(99.9, 0), (100.0, 3), (25.17, 2), (39.45, 1), (66.52, 1), (17.17, 6), (9.25, 2), (86.11, 3), (84.09, 3)]

Now we should sort the data by the column X values:

    islands.sort(key=lambda x: x[0])

which produces:

    [(9.25, 2), (17.17, 6), (25.17, 2), (39.45, 1), (66.52, 1), (84.09, 3), (86.11, 3), (99.9, 0), (100.0, 3)]

Now that our data is sorted, we can easily build a dictionary of V values keyed by the maximum value of each bin:

    bins = [30, 80, 100]
    binned_data = {key: [] for key in bins}
    for item in islands:
        for bin in bins:
            if item[0] <= bin:
                binned_data[bin].append(item[1])
                break

This results in a dictionary like the following (the key order may differ):

    {30: [2, 6, 2], 80: [1, 1], 100: [3, 3, 0, 3]}
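The inner loop over the bin edges can also be expressed with bisect_left from the standard library's bisect module, which finds the first edge that is greater than or equal to a value. This is an alternative sketch, not the approach used above. It only requires the list of edges to be sorted, not the data, so the sample rows are used here in their original order:

```python
from bisect import bisect_left

# Sample rows from the question, in file order (unsorted).
islands = [(99.9, 0), (100.0, 3), (25.17, 2), (39.45, 1), (66.52, 1),
           (17.17, 6), (9.25, 2), (86.11, 3), (84.09, 3)]

bins = [30, 80, 100]  # inclusive upper edges, must be sorted
binned_data = {upper: [] for upper in bins}

for x_val, v_val in islands:
    idx = bisect_left(bins, x_val)   # index of the first edge >= x_val
    if idx < len(bins):              # values above the last edge are skipped
        binned_data[bins[idx]].append(v_val)

print(binned_data[30])   # [2, 6, 2]
print(binned_data[80])   # [1, 1]
print(binned_data[100])  # [0, 3, 3, 3]
```

The values inside each list appear in input order, which does not matter for an average.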

From here you can compute the average for each bin:

    averages = {bin: sum(binned_data[bin])/float(len(binned_data[bin])) for bin in binned_data}
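Note that the expression above divides by the bin's length, so a bin that received no values raises ZeroDivisionError. A minimal sketch of one way to guard against that, using None for empty bins (the extra 120 bin is a made-up example to show the empty case):

```python
# Binned values from the example, plus a hypothetical empty bin.
binned_data = {30: [2, 6, 2], 80: [1, 1], 100: [3, 3, 0, 3], 120: []}

# float() keeps the division exact under Python 2; in Python 3, / is
# already true division, so the call is harmless but unnecessary.
averages = {
    upper: (sum(vals) / float(len(vals)) if vals else None)
    for upper, vals in binned_data.items()
}

print(round(averages[30], 2))  # 3.33
print(averages[80])            # 1.0
print(averages[100])           # 2.25
print(averages[120])           # None
```

The question's code prints 0 for an empty bin instead; None just keeps "no data" distinguishable from a true average of zero.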

Putting it all together:

    from openpyxl import load_workbook

    wb = load_workbook(filename='myfile.xlsx')
    ws = wb.get_sheet_by_name("sheet_name")

    # Read every (X, V) pair from rows 2..1001 into a list of tuples.
    islands = []
    for row in range(2, 1002):
        cell_name_1 = "X{}".format(row)  # X variable
        cell_name_2 = "V{}".format(row)  # V variable
        x_val = float(ws[cell_name_1].value)
        v_val = int(ws[cell_name_2].value)
        islands.append((x_val, v_val))

    # Sort by the X value.
    islands.sort(key=lambda x: x[0])

    # Put each V value into the first bin whose maximum is >= its X value.
    bins = [30, 80, 100]
    binned_data = {key: [] for key in bins}
    for item in islands:
        for bin in bins:
            if item[0] <= bin:
                binned_data[bin].append(item[1])
                break

    # Average the V values collected in each bin.
    averages = {bin: sum(binned_data[bin])/float(len(binned_data[bin])) for bin in binned_data}
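To get from the averages to the table asked for in the question, the same logic can be wrapped in a function that also builds the bin labels. This is only a sketch: average_by_bin is a hypothetical name, the sample rows replace the Excel read so it runs on its own, and the labels follow the question's integer style even though a value such as 30.5 would land in the 31-80 bin.

```python
def average_by_bin(pairs, upper_bounds):
    """Return a list of (label, mean V) rows, one per bin.

    pairs: iterable of (x, v) tuples.
    upper_bounds: sorted, inclusive upper edges of the bins.
    (average_by_bin is a hypothetical helper used for illustration.)
    """
    groups = {upper: [] for upper in upper_bounds}
    for x_val, v_val in pairs:
        for upper in upper_bounds:
            if x_val <= upper:
                groups[upper].append(v_val)
                break

    rows = []
    lower = 0
    for upper in upper_bounds:
        vals = groups[upper]
        mean = sum(vals) / float(len(vals)) if vals else None
        rows.append(("{}-{}".format(lower, upper), mean))
        lower = upper + 1
    return rows


# Sample rows from the question stand in for the spreadsheet.
islands = [(99.9, 0), (100.0, 3), (25.17, 2), (39.45, 1), (66.52, 1),
           (17.17, 6), (9.25, 2), (86.11, 3), (84.09, 3)]

rows = average_by_bin(islands, [30, 80, 100])
for label, mean in rows:
    text = "n/a" if mean is None else "{:.2f}".format(mean)
    print("{:<10}{}".format(label, text))
```

For the sample data this prints 3.33 for 0-30, 1.00 for 31-80 and 2.25 for 81-100, matching the expected table.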