计算与使用pandas的专栏

我有excel

 ID digit count 03a63f1c5a89fb89fcc4d7cf60e2e6b1 131826356 1 1ea192ddd5c042d71910de18595553a5 100897602 5 1ea192ddd5c042d71910de18595553a5 123581809 7 1ea192ddd5c042d71910de18595553a5 137961455 1 1ea192ddd5c042d71910de18595553a5 163217715 4 1ea192ddd5c042d71910de18595553a5 164622895 3 206dcb0a5bfbdbedb77b6f25bbb4b54b 139156767 12 21a27ac254d8b2fe0a52d052bbbd14a5 267483648 1 

我需要将所有的sumcount在每个ID ,然后将每个count除以这个值。 我试试

 df = pd.read_excel("count_video.xlsx") group = df.groupby('ID')['count'].sum() 

但是我不知道,接下来怎么用呢?

您可以使用lambda来将每个计数值除以分组sum

 In [155]: df['count/sum'] = df.groupby('ID')['count'].transform(lambda x: x/x.sum()) df Out[155]: ID digit count count/sum 0 03a63f1c5a89fb89fcc4d7cf60e2e6b1 131826356 1 1.00 1 1ea192ddd5c042d71910de18595553a5 100897602 5 0.25 2 1ea192ddd5c042d71910de18595553a5 123581809 7 0.35 3 1ea192ddd5c042d71910de18595553a5 137961455 1 0.05 4 1ea192ddd5c042d71910de18595553a5 163217715 4 0.20 5 1ea192ddd5c042d71910de18595553a5 164622895 3 0.15 6 206dcb0a5bfbdbedb77b6f25bbb4b54b 139156767 12 1.00 7 21a27ac254d8b2fe0a52d052bbbd14a5 267483648 1 1.00