Group by, bin, and cumulative sum in pandas

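I am moving from Excel COUNTIFS/SUM to pandas. In pandas, I want to group and bin some input data, take a cumulative sum, and then write the result to CSV as an output table.

My input table is a list of items, each timestamped with the date it occurred for a project, for example: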

    import pandas as pd

    df_in = pd.DataFrame({
        'Date' : [pd.Timestamp('20130101'),pd.Timestamp('20140101'),pd.Timestamp('20150101'),
                  pd.Timestamp('20160101'),pd.Timestamp('20160101'),pd.Timestamp('20160101')],
        'Type' : ['item1','item2','item2','item1','item1','item1'],
        'Proj' : ['PJ1','PJ1','PJ1','PJ1','PJ2','PJ2']})

    #giving
    # Proj  Date        Type
    # PJ1   2013-01-01  item1
    # PJ1   2014-01-01  item2
    # PJ1   2015-01-01  item2
    # PJ1   2016-01-01  item1
    # PJ2   2016-01-01  item1
    # PJ2   2016-01-01  item1

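I want a cumulative sum of each item type per project over a series of user-defined time windows. In the end, I want the cumulative number of items each project has achieved by the end of each period (month, quarter, year, etc.). My output, binned to the end date of each window, should look like: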

    Proj  Date_       item1  item2
    PJ1   2014-01-01  1.0    1.0
    PJ1   2016-01-01  2.0    2.0
    PJ2   2014-01-01  0.0    0.0
    PJ2   2016-01-01  2.0    0.0

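The code below works, but it seems clumsy and needs a loop. Is there a better way to get this output, perhaps something vectorized? Also, I always want to keep every output bin even when it contains no data, because I need the empty bins later for consistent plotting.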

    #prepare output table
    df_out = pd.DataFrame({
        'Date_' : [],
        'Proj' : [],
        'item1' : [],
        'item2' : []})

    #my time bins
    bins = [pd.Timestamp('20121229'),pd.Timestamp('20140101'),pd.Timestamp('20160101')]

    #group and bin data in a dataframe
    groups = df_in.groupby(['Proj',pd.cut(df_in.Date, bins),'Type'])
    allData = groups.count().unstack()

    #list of projects in data
    proj_list = list(set(df_in['Proj']))

    #build output table by looping per project
    for p in proj_list:
        #cumulative sum of items achieved per project per bin
        ProjData = allData.loc[p].fillna(0).cumsum()

        #output should appear binned to the end date
        ProjData=ProjData['Date'][:]
        ProjData['Date_']=pd.IntervalIndex(ProjData.index.get_level_values('Date')).right

        #include row wise project reference
        ProjData['Proj']=p

        #collapse the multi-dimensional dataframe for outputting
        ProjData.reset_index(level=0, inplace=True)
        ProjData.reset_index(level=0, inplace=True)

        #build output table for export
        #(pd.concat is used because DataFrame.append was removed in pandas 2.0)
        df_out = pd.concat([df_out, ProjData[['Date_','Proj','item1','item2']]])
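One possible loop-free route, offered as a sketch rather than a definitive answer: number the bins with `pd.cut(..., labels=False)`, count per project, bin, and type, reindex onto every project and bin combination so empty bins survive as zeros, and then take a cumulative sum within each project. The names `bin_ends`, `bin_no`, `binned`, `counts`, and `full_index` are illustrative and not from the question, and dropping dates that fall outside every bin is an assumption about the desired behaviour.

```python
import pandas as pd

df_in = pd.DataFrame({
    'Date': pd.to_datetime(['2013-01-01', '2014-01-01', '2015-01-01',
                            '2016-01-01', '2016-01-01', '2016-01-01']),
    'Type': ['item1', 'item2', 'item2', 'item1', 'item1', 'item1'],
    'Proj': ['PJ1', 'PJ1', 'PJ1', 'PJ1', 'PJ2', 'PJ2'],
})

# User-defined bin edges; each bin is reported at its right (end) edge
bins = [pd.Timestamp('20121229'), pd.Timestamp('20140101'), pd.Timestamp('20160101')]
bin_ends = pd.DatetimeIndex(bins[1:])

# 0-based bin number per row; NaN for dates outside every bin
bin_no = pd.cut(df_in['Date'], bins, labels=False)
inside = bin_no.notna()  # assumption: out-of-range rows are dropped
binned = df_in.loc[inside].assign(
    Date_=bin_ends.take(bin_no[inside].astype(int).to_numpy())
)

# Count items per (project, bin end), one column per item type
counts = (
    binned.groupby(['Proj', 'Date_', 'Type']).size()
          .unstack('Type', fill_value=0)
)

# Reindex onto every project x bin combination so empty bins are kept as zeros
full_index = pd.MultiIndex.from_product(
    [sorted(df_in['Proj'].unique()), bin_ends], names=['Proj', 'Date_']
)
all_types = sorted(df_in['Type'].unique())
counts = counts.reindex(index=full_index, columns=all_types, fill_value=0)

# Running total within each project, then flatten for export
df_out = counts.groupby(level='Proj').cumsum().reset_index()
df_out.columns.name = None
print(df_out)
```

The resulting `df_out` holds integer counts rather than the floats produced by the loop version, and it can be written out with `df_out.to_csv('output.csv', index=False)`, where the file name is a placeholder. Doing the reindex before the cumulative sum is what lets an empty bin carry the running total forward instead of disappearing.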
