Reporting with pivot tables in Excel using Python

I have data like this:

    ID,"address","used_at","active_seconds","pageviews"
    0a1d796327284ebb443f71d85cb37db9,"vk.com",2016-01-29 22:10:52,3804,115
    0a1d796327284ebb443f71d85cb37db9,"2gis.ru",2016-01-29 22:48:52,214,24
    0a1d796327284ebb443f71d85cb37db9,"yandex.ru",2016-01-29 22:14:30,4,2
    0a1d796327284ebb443f71d85cb37db9,"worldoftanks.ru",2016-01-29 22:10:30,41,2

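But the file is so large that Excel cannot open it. I need to split the total time by week and output the result for each address and each ID. It should look like this: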

    ID                                vk.com  2gis.ru  yandex.ru
    0a1d796327284ebb443f71d85cb37db9      23       40         56
    465a3fc01a62fd89a8094abdaccdcc99       0      100         45
    ...

I can compute the sum over the whole period:

    import pandas as pd

    data = pd.read_csv("desktop-visits-dnp.csv")
    group = data.groupby(['ID', 'address']).active_seconds.sum()

But I need to split it by week.

I don't have much Python experience, though, and I'm not sure how to accomplish this.

The following code creates a sum of active_seconds per ID, address and week.

First, generate some sample data similar to yours:

    import random
    import string
    from datetime import datetime

    import numpy as np
    import pandas as pd

    df = pd.DataFrame()
    # 10 random 16-character IDs and 10 random 10-letter addresses
    ids = [''.join([random.choice(string.ascii_lowercase + string.digits) for _ in range(16)]) for i in range(10)]
    addresses = [''.join([random.choice(string.ascii_lowercase) for _ in range(10)]) for i in range(10)]
    df['ID'] = np.random.choice(ids, size=10000)
    df['address'] = np.random.choice(addresses, size=10000)
    df['active_seconds'] = np.random.randint(0, 100, 10000)
    # one row per hour ('h'; older pandas versions spell this alias 'H')
    df['used_at'] = pd.date_range(start=datetime(2016, 1, 1, 0, 0, 0), freq='h', periods=10000)

Now set used_at, ID and address as the index and unstack() the last of them, which moves address into the columns with active_seconds as the values.

    df = df.set_index(['used_at', 'ID', 'address']).unstack().loc[:, 'active_seconds'].reset_index('ID')
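To make the reshaping step easier to follow, here is a minimal sketch on a hand-made three-row frame. The names `tiny` and `wide` and the values in the second and third rows are made up for illustration. Note that unstack() raises an error on duplicate index entries, so this approach assumes every (used_at, ID, address) combination occurs only once.

```python
import pandas as pd

# Hypothetical three-row sample in the same long format as the question's data.
tiny = pd.DataFrame({
    'used_at': pd.to_datetime(['2016-01-29 22:10:52',
                               '2016-01-29 22:48:52',
                               '2016-01-30 09:00:00']),
    'ID': ['a', 'a', 'b'],
    'address': ['vk.com', '2gis.ru', 'vk.com'],
    'active_seconds': [3804, 214, 10],
})

# Move all three keys into the index, then pivot the innermost level
# (address) into columns; .loc drops the outer 'active_seconds' column level,
# and reset_index('ID') turns ID back into an ordinary column.
wide = (tiny.set_index(['used_at', 'ID', 'address'])
            .unstack()
            .loc[:, 'active_seconds']
            .reset_index('ID'))

print(wide)
```

The result is indexed by used_at, with one ID column plus one column per address; cells for addresses that were not visited at that timestamp are NaN.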

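Next, group by ID, aggregate all values into weekly periods (a weekly resample of the used_at index), and reset ID to a column instead of an index level: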

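    # resample(..., how='sum') is gone from current pandas; pd.Grouper(freq='W') bins the used_at index by week.
    df = df.groupby(['ID', pd.Grouper(freq='W')]).sum(min_count=1).reset_index('ID')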
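As a rough illustration of the weekly binning, the sketch below runs the same kind of grouping on a tiny hand-made frame. The frame, its values and the name `weekly` are assumptions for the example, and it uses `pd.Grouper(freq='W')` with `sum(min_count=1)` as one way to express a weekly resample per ID.

```python
import pandas as pd

nan = float('nan')

# Hypothetical wide frame: DatetimeIndex named used_at, an ID column,
# and one column per address (all values invented for illustration).
wide = pd.DataFrame(
    {'ID': ['a', 'a', 'a', 'b'],
     'vk.com': [10.0, 5.0, 7.0, 1.0],
     '2gis.ru': [nan, 2.0, nan, nan]},
    index=pd.DatetimeIndex(
        ['2016-01-04', '2016-01-06', '2016-01-12', '2016-01-05'],
        name='used_at'),
)

# Group by ID and by week of the index. With freq='W' each bin is labelled
# by the Sunday that ends it. min_count=1 leaves NaN where a group has no
# values at all, instead of reporting 0.
weekly = (wide.groupby(['ID', pd.Grouper(freq='W')])
              .sum(min_count=1)
              .reset_index('ID'))

print(weekly)
```

Here the two visits by ID `a` on 4 and 6 January fall into the week ending Sunday 2016-01-10 and are summed, while the visit on 12 January lands in the week ending 2016-01-17.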

This gives the weekly usage per ID and address:

    df.head()

    address                   ID  afgpxizbum  cihchvzttw  dguznssmbi  irpvqtmuva  \
    used_at
    2016-01-03  06y2myiclyb2s4hr         NaN         NaN         NaN        19.0
    2016-01-10  06y2myiclyb2s4hr        57.0        15.0        66.0         NaN
    2016-01-17  06y2myiclyb2s4hr        13.0       144.0       152.0       139.0
    2016-01-24  06y2myiclyb2s4hr       186.0       112.0         NaN         NaN
    2016-01-31  06y2myiclyb2s4hr        15.0        68.0       128.0        63.0

    address     otlkynddwv  ptzzhghnfl  rgwbuevvez  tgvbvfibaf  toimlivump  \
    used_at
    2016-01-03        30.0         NaN         NaN        50.0         NaN
    2016-01-10        59.0        28.0         NaN         NaN       214.0
    2016-01-17       106.0        26.0       179.0        62.0        69.0
    2016-01-24        87.0        10.0       130.0       264.0         7.0
    2016-01-31       144.0         NaN       215.0         NaN       208.0

    address     uwsdzqyudi
    used_at
    2016-01-03        99.0
    2016-01-10       235.0
    2016-01-17       128.0
    2016-01-24        85.0
    2016-01-31        60.0

Now you can group by week, iterate over the result, and save each week to its own Excel file:

    for week, data in df.groupby(level=0):
        # week is a Timestamp; use only its date so the file name contains no colons
        data.to_excel('{}.xlsx'.format(week.date()))
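For the CSV from the question, a more compact variant might look like the sketch below. It is not the approach shown above but a swapped-in alternative: it groups the long-format data directly by week, ID and address and only then pivots, so repeated (used_at, ID, address) rows are summed rather than causing an error in unstack(). The function name `weekly_seconds` is hypothetical, and the four rows quoted in the question stand in for the real file.

```python
import io

import pandas as pd


def weekly_seconds(csv_source):
    """Sum active_seconds per (week, ID), with one column per address."""
    data = pd.read_csv(csv_source, parse_dates=['used_at'])
    return (data.groupby([pd.Grouper(key='used_at', freq='W'), 'ID', 'address'])
                ['active_seconds'].sum()
                .unstack('address', fill_value=0))


# The four rows quoted in the question, standing in for desktop-visits-dnp.csv.
sample = io.StringIO(
    'ID,"address","used_at","active_seconds","pageviews"\n'
    '0a1d796327284ebb443f71d85cb37db9,"vk.com",2016-01-29 22:10:52,3804,115\n'
    '0a1d796327284ebb443f71d85cb37db9,"2gis.ru",2016-01-29 22:48:52,214,24\n'
    '0a1d796327284ebb443f71d85cb37db9,"yandex.ru",2016-01-29 22:14:30,4,2\n'
    '0a1d796327284ebb443f71d85cb37db9,"worldoftanks.ru",2016-01-29 22:10:30,41,2\n'
)

table = weekly_seconds(sample)
print(table)
```

Passing the path "desktop-visits-dnp.csv" instead of `sample` should work the same way, assuming the file has the columns shown in the question. The result can then be split per week with `table.groupby(level='used_at')` and written with to_excel, which needs an Excel engine such as openpyxl to be installed.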