Problem processing data in a CSV table

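I have a table like this (the original screenshot is not reproduced here; the sample CSV in one of the answers below shows a similar layout).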

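I need to add a field that holds the score with the highest percentage for each county. For example, because 99.03833 is the largest value in its row, the top score for Anderson County is HAZ_7. The first row names the scores. The numbers in each row are the percentages for each score. I need the majority score for each county.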

Does anyone know how to do this in Excel or Python?

Excel solution for the column name:

 =INDEX(C$1:L$1,MATCH(MAX(C2:L2),C2:L2,0)) 

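Excel solution for the value (over the same range C2:L2 as the formula above, so that nothing outside the score columns is picked up):

 =MAX(C2:L2) 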

I'll assume this is a pandas DataFrame named df. If so, the following Python adds a column named max to your DataFrame containing the maximum value of each row.

 df['max'] = df.loc[:,'%HAZ_1':].max(axis=1) 
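
The line above gives the largest value, but the question also asks which score it belongs to. A minimal sketch of that, assuming the score columns run from %HAZ_1 to the last column as in the slice above: `idxmax(axis=1)` returns the label of the column holding each row's maximum. The inline sample data and the column name `max_field` are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical two-row sample in the same layout as the question's table.
sample = io.StringIO(
    "County,FIPS,%HAZ_1,%HAZ_2,%HAZ_3\n"
    "Anderson County,48001,0,99.03833,0.961668\n"
    "Aransas County,48007,100,0,0\n"
)
df = pd.read_csv(sample)

# Take the slice once, before adding new columns, so they are not included.
haz = df.loc[:, '%HAZ_1':]

df['max'] = haz.max(axis=1)           # largest percentage in each row
df['max_field'] = haz.idxmax(axis=1)  # label of the column holding that value

print(df[['County', 'max', 'max_field']])
```

If two columns tie for the maximum, `idxmax` reports only the first of them.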

Here is how to do it in Python using only the csv module.

    import csv

    filename = 'county_data.csv'
    output_filename = 'county_data2.csv'

    def maxelements(names, seq):
        """ Return corresponding names of the position(s) of the largest
            element in sequence.
        """
        max_value = max(seq)
        return [names[i] for i, v in enumerate(seq) if v == max_value]

    # newline='' (Python 3) lets the csv module handle line endings itself
    with open(filename, 'r', newline='') as infile, \
         open(output_filename, 'w', newline='') as outfile:
        reader = csv.reader(infile)
        writer = csv.writer(outfile)

        fieldnames = next(reader)  # assume first row contains field names
        writer.writerow(fieldnames + ['Max'])  # plus name of new field
        haz_fields = fieldnames[2:]

        for row in reader:
            row = row[:2] + [float(elem) for elem in row[2:]]  # convert haz fields to numbers
            maxfields = maxelements(haz_fields, row[2:])
            writer.writerow(row + maxfields)

Here is a small sample input CSV file:

    County,FIPS,%HAZ_1,%HAZ_2,%HAZ_3,%HAZ_4,%HAZ_5,%HAZ_6,%HAZ_7,%HAZ_8,%HAZ_9,%HAZ_10
    Anderson County,48001,0,0,0,0,0,0,99.03833,0.961668,0,0
    Andrews County,48003,0,0,0,0,0,0,26.08,73.92,0,0
    Angelina County,48005,0,0,0,0,0,62.41924,37.58076,0,0,0
    Aransas County,48007,0,0,100,0,0,0,0,0,0,0

Here is what gets written to the output file:

    County,FIPS,%HAZ_1,%HAZ_2,%HAZ_3,%HAZ_4,%HAZ_5,%HAZ_6,%HAZ_7,%HAZ_8,%HAZ_9,%HAZ_10,Max
    Anderson County,48001,0.0,0.0,0.0,0.0,0.0,0.0,99.03833,0.961668,0.0,0.0,%HAZ_7
    Andrews County,48003,0.0,0.0,0.0,0.0,0.0,0.0,26.08,73.92,0.0,0.0,%HAZ_8
    Angelina County,48005,0.0,0.0,0.0,0.0,0.0,62.41924,37.58076,0.0,0.0,0.0,%HAZ_6
    Aransas County,48007,0.0,0.0,100.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,%HAZ_3

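Note: The maxelements() function returns a list because two or more of the %HAZ# fields could share the same maximum value (although that doesn't happen in the sample input). In that case the row is written with more than one extra field, so the code doesn't necessarily handle ties gracefully, mainly because you didn't describe what you would want to happen then.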
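
One possible way to keep the output at a single extra column even when there is a tie is to join the tied names into one field. This is only a sketch of one option, since the question does not say what should happen on ties; the function name `max_field_label` and the `;` separator are made up for illustration.

```python
def max_field_label(names, seq, sep=';'):
    """ Return the name(s) of the largest element in seq as one string,
        joining tied names with sep (hypothetical tie policy).
    """
    max_value = max(seq)
    return sep.join(name for name, v in zip(names, seq) if v == max_value)


names = ['%HAZ_1', '%HAZ_2', '%HAZ_3']
print(max_field_label(names, [0.0, 99.03833, 0.961668]))  # a single winner
print(max_field_label(names, [50.0, 50.0, 0.0]))          # two tied fields
```

In the script above, the last line of the loop would then become `writer.writerow(row + [max_field_label(haz_fields, row[2:])])`.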

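If ties are not a concern, you can use the following version of maxelements(), essentially a one-liner, which just returns the name at the first index of the maximum: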

    def maxelements(names, seq):
        """ Return a one-element list with the name at the first position
            of the largest element in sequence.
        """
        return [names[seq.index(max(seq))]]