Conditional formatting of partial duplicates in Pandas and Excel

I have the following CSV data, in a file named reviews.csv:

    Movie,Reviewer,Sentence,Tag,Sentiment,Text
    Jaws,John,s1,Plot,Positive,The plot was great
    Jaws,Mary,s1,Plot,Positive,The plot was great
    Jaws,John,s2,Acting,Positive,The acting was OK
    Jaws,Mary,s2,Acting,Neutral,The acting was OK
    Jaws,John,s3,Scene,Positive,The visuals blew me away
    Jaws,Mary,s3,Effects,Positive,The visuals blew me away
    Vertigo,John,s1,Scene,Negative,The scenes were terrible
    Vertigo,Mary,s1,Acting,Negative,The scenes were terrible
    Vertigo,John,s2,Plot,Negative,The actors couldn't make the story believable
    Vertigo,Mary,s2,Acting,Positive,The actors couldn't make the story believable
    Vertigo,John,s3,Effects,Negative,The effects were awful
    Vertigo,Mary,s3,Effects,Negative,The effects were awful

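My goal is to convert this CSV file into an Excel spreadsheet with conditional formatting. Specifically, I would like to apply the following rules:

  1. If the "Movie", "Sentence", "Tag", and "Sentiment" values are the same, the whole row should be green.

  2. If the "Movie", "Sentence", and "Tag" values are the same but the "Sentiment" values differ, the row should be blue.

  3. If the "Movie" and "Sentence" values are the same but the "Tag" values differ, the row should be red.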

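So I would like to create an Excel spreadsheet (.xlsx) that looks like this:

[Image: spreadsheet with color-coded partial duplicates]

I have been looking at the Pandas styling documentation and at XlsxWriter's conditional formatting tutorial, but I can't seem to put the two together. Here is what I have so far. I can read the CSV into a Pandas dataframe, sort it (although I am not sure that is necessary), and then write it back out to an Excel spreadsheet. How do I do the conditional formatting, and where does that code go?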

    import pandas as pd

    def csv_to_xls(source_path, dest_path):
        """
        Convert a csv file to a formatted xlsx spreadsheet
        Input: path to review csv file
        Output: formatted xlsx spreadsheet
        """
        # Read the source file and convert to Pandas dataframe
        df = pd.read_csv(source_path)

        # Sort by movie, then by sentence number
        # (column names changed to match the header of reviews.csv)
        df.sort_values(['Movie', 'Sentence'], ascending=[True, True], inplace=True)

        # Create the xlsx file that we'll be writing to
        orig = pd.ExcelWriter(dest_path, engine='xlsxwriter')

        # Convert the dataframe to Excel, create the sheet
        df.to_excel(orig, index=False, sheet_name='report')

        # Variables for the workbook and worksheet
        workbook = orig.book
        worksheet = orig.sheets['report']

        # Formatting for exact, partial, mismatch
        exact = workbook.add_format({'bg_color': '#B7F985'})     # green
        partial = workbook.add_format({'bg_color': '#D3F6F4'})   # blue
        mismatch = workbook.add_format({'bg_color': '#F6D9D3'})  # red

        # Do the conditional formatting somehow

        # close() writes the file; ExcelWriter.save() was removed in pandas 2.0
        orig.close()
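The placeholder comment near the end of the function above is where XlsxWriter's `worksheet.conditional_format` calls would go. What follows is only a sketch of one possible approach, not a tested solution to this exact file: it expresses each rule as a `COUNTIFS` formula and assumes the column layout of reviews.csv (A = Movie, C = Sentence, D = Tag, E = Sentiment, data through column F, header in row 1). The helper name `add_partial_duplicate_rules` is hypothetical. Rules are added most specific first with `stop_if_true`, so a row that matches the green rule is not also evaluated against blue or red.

```python
def add_partial_duplicate_rules(worksheet, n_rows, exact, partial, mismatch):
    """Attach three formula-based conditional formats to the data rows.

    Assumes a header in row 1 and n_rows data rows below it, with
    A=Movie, C=Sentence, D=Tag, E=Sentiment (the reviews.csv layout).
    """
    first, last = 2, n_rows + 1

    def count_matches(columns):
        # Build COUNTIFS(...) counting the rows whose values in the
        # given columns equal those of the row being formatted.
        pairs = ",".join(
            f"${c}${first}:${c}${last},${c}{first}" for c in columns
        )
        return f"COUNTIFS({pairs})"

    # Most specific rule first: green, then blue, then red.
    rules = [
        (count_matches("ACDE"), exact),
        (count_matches("ACD"), partial),
        (count_matches("AC"), mismatch),
    ]
    cell_range = f"A{first}:F{last}"
    for counter, fmt in rules:
        worksheet.conditional_format(cell_range, {
            "type": "formula",
            "criteria": f"={counter}>1",
            "format": fmt,
            "stop_if_true": True,
        })
```

Inside the question's function this would be called as `add_partial_duplicate_rules(worksheet, len(df), exact, partial, mismatch)` before the writer is closed. Because the formulas live in the workbook, the colors update if someone later edits a tag or sentiment in Excel.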

Disclaimer: I am one of the authors of the library I am about to suggest.

This can be achieved quite easily using StyleFrame and DataFrame.duplicated:

    import pandas as pd
    from StyleFrame import StyleFrame, Styler

    df = pd.read_csv('reviews.csv')
    sf = StyleFrame(df)

    green = Styler(bg_color='#B7F985')
    blue = Styler(bg_color='#D3F6F4')
    red = Styler(bg_color='#F6D9D3')

    # Apply the broadest match first; each narrower match overwrites it
    sf.apply_style_by_indexes(sf[df.duplicated(subset=['Movie', 'Sentence'], keep=False)],
                              styler_obj=red)
    sf.apply_style_by_indexes(sf[df.duplicated(subset=['Movie', 'Sentence', 'Tag'], keep=False)],
                              styler_obj=blue)
    sf.apply_style_by_indexes(sf[df.duplicated(subset=['Movie', 'Sentence', 'Tag', 'Sentiment'], keep=False)],
                              styler_obj=green)

    sf.to_excel('test.xlsx').save()

This outputs the following:

[Image: the resulting spreadsheet]
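Since the question also mentions the Pandas styling documentation, the same `DataFrame.duplicated` idea can be sketched with the built-in Styler instead of an extra library. This is a minimal sketch under assumptions rather than a drop-in replacement: the names `row_colors` and `write_styled` are hypothetical, the column names are those of reviews.csv, and `Styler.to_excel` needs an Excel engine such as openpyxl installed.

```python
import pandas as pd

# Fill colors taken from the format definitions in the question.
GREEN, BLUE, RED = "#B7F985", "#D3F6F4", "#F6D9D3"


def row_colors(df):
    """Return a DataFrame of CSS strings, one per cell, shaped like df."""
    css = pd.DataFrame("", index=df.index, columns=df.columns)
    # Broadest match first, so each narrower match overwrites it.
    for subset, color in [
        (["Movie", "Sentence"], RED),
        (["Movie", "Sentence", "Tag"], BLUE),
        (["Movie", "Sentence", "Tag", "Sentiment"], GREEN),
    ]:
        mask = df.duplicated(subset=subset, keep=False)
        css.loc[mask, :] = f"background-color: {color}"
    return css


def write_styled(df, dest_path):
    # axis=None hands the whole frame to row_colors in one call.
    df.style.apply(row_colors, axis=None).to_excel(dest_path, index=False)
```

A call such as `write_styled(pd.read_csv('reviews.csv'), 'test.xlsx')` would then write the colored sheet. Unlike Excel conditional formatting, the fills produced this way are static cell formats, so they do not change if the values are edited later.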