Reformatting Excel data with groupby and adding empty rows to a dataframe in Python

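I have a long Excel file of 60-minute rainfall counts covering one year. I want to read the Excel file, sum the rainfall values into daily totals (probably with groupby), and then put those values into a new dataframe in which every day of the year is a separate row: 0 if there was no rain that day, and the total daily rainfall Value if there was. I have outlined the steps I would take and my attempts at the code below. I am open to other suggestions if what I have tried to code is nonsense. The first rows of the Excel file look like this: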

    60 Minute Counts, []
    Time Stamp               Latitude  Longitude  Value ()
    Dec 27 2015 01:30:00 AM  0.297     36.900     0.25
    Dec 25 2015 01:00:00 PM  0.297     36.900     0.51
    Dec 25 2015 10:30:00 AM  0.297     36.900     0.25
    Dec 25 2015 07:30:00 AM  0.297     36.900     0.25
    Dec 25 2015 05:00:00 AM  0.297     36.900     0.25
    Dec 25 2015 04:30:00 AM  0.297     36.900     0.25
    Dec 17 2015 02:30:00 AM  0.297     36.900     0.25
    Dec 16 2015 02:30:00 PM  0.297     36.900     0.25
    Dec 16 2015 02:00:00 PM  0.297     36.900     0.76
    Dec 16 2015 12:30:00 PM  0.297     36.900     0.25
    Dec 16 2015 12:00:00 PM  0.297     36.900     0.76
    Dec 16 2015 11:30:00 AM  0.297     36.900     5.08
    Dec 16 2015 11:00:00 AM  0.297     36.900     0.51
    Dec 15 2015 03:30:00 PM  0.297     36.900     0.25

First I need to read the Excel file, which I have been experimenting with using:

    from openpyxl import load_workbook
    wb = load_workbook(filename = 'filename.xlsx')
    sheet_ranges = wb['60 minute counts']

But I don't know how to read the actual values from row 3 onwards.
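One possible way to read the values from row 3 onwards is to let pandas load the sheet directly instead of walking the openpyxl worksheet by hand. The sketch below assumes the layout shown above (a title in row 1, column names in row 2) and that the timestamps are stored as text such as `Dec 27 2015 01:30:00 AM`. The helper name `read_counts` is hypothetical, and the sheet name is taken from the snippet above. pandas needs openpyxl installed to open `.xlsx` files.

```python
import pandas as pd


def read_counts(path, sheet_name="60 minute counts"):
    """Load the rainfall sheet into a dataframe (hypothetical helper).

    Assumes row 1 holds a title and row 2 holds the column names,
    so header=1 (0-based) makes the data start at row 3.
    """
    df = pd.read_excel(path, sheet_name=sheet_name, header=1)
    # Keep only the two columns needed for the daily totals.
    df = df[["Time Stamp", "Value ()"]].copy()
    # Assumed text format: "Dec 27 2015 01:30:00 AM".
    df["Time Stamp"] = pd.to_datetime(
        df["Time Stamp"], format="%b %d %Y %I:%M:%S %p"
    )
    return df


# Usage (the file name is a placeholder):
# df0 = read_counts("filename.xlsx")
```

If the timestamps are stored as real Excel dates rather than text, the `format` argument is probably unnecessary.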

After defining a dataframe df0 for the Time Stamp and Value () columns, I need to convert Time Stamp to a format like YYYY-MM-DD, which might work with the following code:

    import pandas as pd
    df0["time"] = pd.to_datetime(df0["time"])
    df0["day"] = df0['time'].map(lambda x: x.day)
    df0["month"] = df0['time'].map(lambda x: x.month)
    df0["year"] = df0['time'].map(lambda x: x.year)

Then I need to combine the 60-minute rainfall counts into daily totals, with something like:

    df1 = df0.groupby(['Value ()', 'day', 'month', 'year'], as_index=False).sum()

Finally, I need to build a dataframe with one row for every day of the year, followed by the total daily rainfall. It would look like this:

    Date        Value
    2015-12-31  0
    2015-12-30  0
    2015-12-29  0
    2015-12-28  0
    2015-12-27  0.25
    2015-12-26  0
    2015-12-25  1.52
    2015-12-24  0
    2015-12-23  0
    2015-12-22  0
    2015-12-21  0
    2015-12-20  0
    2015-12-19  0
    2015-12-18  0
    2015-12-17  0.25
    2015-12-16  7.62

… and so on.

Let me know if it would help to post the whole file; I can add a Dropbox link.

It seems you need resample:

    df0.index = pd.to_datetime(df0["Time Stamp"])
    df1 = df0.resample('D')['Value ()'].sum().fillna(0).reset_index()
    print(df1)
       Time Stamp  Value ()
    0  2015-12-15      0.25
    1  2015-12-16      7.61
    2  2015-12-17      0.25
    3  2015-12-18      0.00
    4  2015-12-19      0.00
    5  2015-12-20      0.00
    6  2015-12-21      0.00
    7  2015-12-22      0.00
    8  2015-12-23      0.00
    9  2015-12-24      0.00
    10 2015-12-25      1.51
    11 2015-12-26      0.00
    12 2015-12-27      0.25

Or with Grouper:

    df0.index = pd.to_datetime(df0["Time Stamp"])
    df1 = df0.groupby(pd.Grouper(freq='D'))['Value ()'].sum().fillna(0).reset_index()
    print(df1)
       Time Stamp  Value ()
    0  2015-12-15      0.25
    1  2015-12-16      7.61
    2  2015-12-17      0.25
    3  2015-12-18      0.00
    4  2015-12-19      0.00
    5  2015-12-20      0.00
    6  2015-12-21      0.00
    7  2015-12-22      0.00
    8  2015-12-23      0.00
    9  2015-12-24      0.00
    10 2015-12-25      1.51
    11 2015-12-26      0.00
    12 2015-12-27      0.25
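For comparison, a plain groupby on the calendar day (closer to the attempt in the question) also sums correctly once 'Value ()' is no longer one of the grouping keys, but it only returns days that have at least one reading. The sketch below uses a few rows from the question, written in ISO format for brevity, to show the difference.

```python
import pandas as pd

# A handful of readings from the question (ISO timestamps are an assumption
# made for brevity; the Excel file uses "Dec 25 2015 01:00:00 PM" style).
df0 = pd.DataFrame({
    "Time Stamp": pd.to_datetime([
        "2015-12-25 13:00", "2015-12-17 02:30",
        "2015-12-16 14:30", "2015-12-16 14:00", "2015-12-15 15:30",
    ]),
    "Value ()": [0.51, 0.25, 0.25, 0.76, 0.25],
})

# Plain groupby on the date: only days that appear in the data are returned.
by_day = df0.groupby(df0["Time Stamp"].dt.normalize())["Value ()"].sum()

# resample on a DatetimeIndex: every day between the first and last reading.
resampled = df0.set_index("Time Stamp")["Value ()"].resample("D").sum()

print(len(by_day), len(resampled))
```

The dry days in between are what resample and Grouper(freq='D') add on top of the plain groupby.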

If necessary, add sort_index to get the days in descending order:

    df1 = df0.resample('D')['Value ()'].sum().sort_index(ascending=False).fillna(0).reset_index()
    print(df1)
       Time Stamp  Value ()
    0  2015-12-27      0.25
    1  2015-12-26      0.00
    2  2015-12-25      1.51
    3  2015-12-24      0.00
    4  2015-12-23      0.00
    5  2015-12-22      0.00
    6  2015-12-21      0.00
    7  2015-12-20      0.00
    8  2015-12-19      0.00
    9  2015-12-18      0.00
    10 2015-12-17      0.25
    11 2015-12-16      7.61
    12 2015-12-15      0.25

    # Parentheses allow the method chain to continue on the next line.
    df1 = (df0.groupby(pd.Grouper(freq='D'))['Value ()'].sum()
              .sort_index(ascending=False).fillna(0).reset_index())
    print(df1)
       Time Stamp  Value ()
    0  2015-12-27      0.25
    1  2015-12-26      0.00
    2  2015-12-25      1.51
    3  2015-12-24      0.00
    4  2015-12-23      0.00
    5  2015-12-22      0.00
    6  2015-12-21      0.00
    7  2015-12-20      0.00
    8  2015-12-19      0.00
    9  2015-12-18      0.00
    10 2015-12-17      0.25
    11 2015-12-16      7.61
    12 2015-12-15      0.25
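Note that resample only fills in days between the first and last reading. If a row is really needed for every day of the year, one option is to reindex the daily totals against a full calendar. The following is a minimal sketch, assuming the text timestamp format from the question; the function name `daily_totals_for_year` and the output column names `Date` and `Value` are chosen here for illustration.

```python
import pandas as pd

# Sample rows copied from the question (timestamp text, rainfall value).
rows = [
    ("Dec 27 2015 01:30:00 AM", 0.25),
    ("Dec 25 2015 01:00:00 PM", 0.51),
    ("Dec 25 2015 10:30:00 AM", 0.25),
    ("Dec 25 2015 07:30:00 AM", 0.25),
    ("Dec 25 2015 05:00:00 AM", 0.25),
    ("Dec 25 2015 04:30:00 AM", 0.25),
    ("Dec 17 2015 02:30:00 AM", 0.25),
    ("Dec 16 2015 02:30:00 PM", 0.25),
    ("Dec 16 2015 02:00:00 PM", 0.76),
    ("Dec 16 2015 12:30:00 PM", 0.25),
    ("Dec 16 2015 12:00:00 PM", 0.76),
    ("Dec 16 2015 11:30:00 AM", 5.08),
    ("Dec 16 2015 11:00:00 AM", 0.51),
    ("Dec 15 2015 03:30:00 PM", 0.25),
]
df0 = pd.DataFrame(rows, columns=["Time Stamp", "Value ()"])


def daily_totals_for_year(df, year):
    """Return one row per calendar day of `year`, newest first (illustrative)."""
    # Assumed text format: "Dec 27 2015 01:30:00 AM".
    ts = pd.to_datetime(df["Time Stamp"], format="%b %d %Y %I:%M:%S %p")
    # Daily sums between the first and last reading.
    daily = df.set_index(ts)["Value ()"].resample("D").sum()
    # Extend to the whole year; days without any reading become 0.
    all_days = pd.date_range(f"{year}-01-01", f"{year}-12-31", freq="D")
    out = daily.reindex(all_days, fill_value=0).sort_index(ascending=False)
    return out.rename_axis("Date").rename("Value").reset_index()


df1 = daily_totals_for_year(df0, 2015)
print(df1.head(17))
```

Readings that fall outside the requested year are dropped by the reindex, so a file spanning several years would need one call per year or a wider date_range.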
