Conditional formatting of partial duplicates in Pandas and Excel

I have the following csv data, in a file named reviews.csv:

    Movie,Reviewer,Sentence,Tag,Sentiment,Text,
    Jaws,John,s1,Plot,Positive,The plot was great,
    Jaws,Mary,s1,Plot,Positive,The plot was great,
    Jaws,John,s2,Acting,Positive,The acting was OK,
    Jaws,Mary,s2,Acting,Neutral,The acting was OK,
    Jaws,John,s3,Scene,Positive,The visuals blew me away,
    Jaws,Mary,s3,Effects,Positive,The visuals blew me away,
    Vertigo,John,s1,Scene,Negative,The scenes were terrible,
    Vertigo,Mary,s1,Acting,Negative,The scenes were terrible,
    Vertigo,John,s2,Plot,Negative,The actors couldn't make the story believable,
    Vertigo,Mary,s2,Acting,Positive,The actors couldn't make the story believable,
    Vertigo,John,s3,Effects,Negative,The effects were awful,
    Vertigo,Mary,s3,Effects,Negative,The effects were awful,

My goal is to convert this CSV file into an Excel spreadsheet with conditional formatting. Specifically, I want to apply the following rules:

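If the 'Movie', 'Sentence', 'Tag', and 'Sentiment' values are the same, the whole row should be green.
If the 'Movie', 'Sentence', and 'Tag' values are the same but the 'Sentiment' values differ, the row should be blue.
If the 'Movie' and 'Sentence' values are the same but the 'Tag' values differ, the row should be red.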
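To make the rules concrete, here is a minimal sketch in plain Python that assigns a color name to each row. The function name `classify_rows` and the `sample` list are hypothetical, and the sketch assumes that all rows sharing a 'Movie' and 'Sentence' are compared as one group, which matches the sample data where every sentence has exactly two reviewers.

```python
from collections import defaultdict


def classify_rows(rows):
    """Return one color name (or None) per row, following the three rules."""
    # Rows that share (Movie, Sentence) are compared with each other.
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        groups[(row["Movie"], row["Sentence"])].append(i)

    colors = [None] * len(rows)
    for members in groups.values():
        if len(members) < 2:
            continue  # nothing to compare against, so the row stays uncolored
        tags = {rows[i]["Tag"] for i in members}
        sentiments = {rows[i]["Sentiment"] for i in members}
        if len(tags) > 1:
            color = "red"    # same Movie and Sentence, different Tag
        elif len(sentiments) > 1:
            color = "blue"   # same Tag, different Sentiment
        else:
            color = "green"  # Tag and Sentiment both agree
        for i in members:
            colors[i] = color
    return colors


# The six Jaws rows from reviews.csv (Reviewer and Text omitted for brevity).
sample = [
    {"Movie": "Jaws", "Sentence": "s1", "Tag": "Plot", "Sentiment": "Positive"},
    {"Movie": "Jaws", "Sentence": "s1", "Tag": "Plot", "Sentiment": "Positive"},
    {"Movie": "Jaws", "Sentence": "s2", "Tag": "Acting", "Sentiment": "Positive"},
    {"Movie": "Jaws", "Sentence": "s2", "Tag": "Acting", "Sentiment": "Neutral"},
    {"Movie": "Jaws", "Sentence": "s3", "Tag": "Scene", "Sentiment": "Positive"},
    {"Movie": "Jaws", "Sentence": "s3", "Tag": "Effects", "Sentiment": "Positive"},
]
print(classify_rows(sample))
# ['green', 'green', 'blue', 'blue', 'red', 'red']
```

A row whose 'Movie' and 'Sentence' pair appears only once gets no color here; the rules as stated do not cover that case.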

So I would like to create an Excel spreadsheet (.xlsx) that looks like this:

[Image: spreadsheet with color-coded partial duplicates]

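I have been looking at the Pandas styling documentation and at the XlsxWriter conditional formatting tutorial, but I can't seem to put the two together. Here is what I have so far. I can read the csv into a pandas dataframe, sort it (although I'm not sure that is necessary), and write it back out to an Excel spreadsheet. How do I do the conditional formatting, and where does that code go?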

    import pandas as pd

    def csv_to_xls(source_path, dest_path):
        """
        Convert a csv file to a formatted xlsx spreadsheet
        Input: path to review csv file
        Output: formatted xlsx spreadsheet
        """
        # Read the source file and convert to Pandas dataframe
        df = pd.read_csv(source_path)

        # Sort by movie, then by sentence number
        # (column names changed to match the reviews.csv header shown above)
        df.sort_values(['Movie', 'Sentence'], ascending=[True, True], inplace=True)

        # Create the xlsx file that we'll be writing to
        orig = pd.ExcelWriter(dest_path, engine='xlsxwriter')

        # Convert the dataframe to Excel, create the sheet
        df.to_excel(orig, index=False, sheet_name='report')

        # Variables for the workbook and worksheet
        workbook = orig.book
        worksheet = orig.sheets['report']

        # Formatting for exact, partial, mismatch
        exact = workbook.add_format({'bg_color': '#B7F985'})     # green
        partial = workbook.add_format({'bg_color': '#D3F6F4'})   # blue
        mismatch = workbook.add_format({'bg_color': '#F6D9D3'})  # red

        # Do the conditional formatting somehow

        # close() writes the file; ExcelWriter.save() is no longer
        # available in recent pandas releases
        orig.close()

Disclaimer: I am one of the authors of the library I am about to suggest.

This can be achieved quite easily using StyleFrame and DataFrame.duplicated:

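    import pandas as pd
    # Newer StyleFrame releases may need the lowercase package name:
    # from styleframe import StyleFrame, Styler
    from StyleFrame import StyleFrame, Styler

    df = pd.read_csv('reviews.csv')
    sf = StyleFrame(df)

    green = Styler(bg_color='#B7F985')
    blue = Styler(bg_color='#D3F6F4')
    red = Styler(bg_color='#F6D9D3')

    # Apply the broadest match first; each narrower match then overrides it.
    # keep=False marks every member of a duplicate group, not just the repeats.
    sf.apply_style_by_indexes(sf[df.duplicated(subset=['Movie', 'Sentence'], keep=False)],
                              styler_obj=red)
    sf.apply_style_by_indexes(sf[df.duplicated(subset=['Movie', 'Sentence', 'Tag'], keep=False)],
                              styler_obj=blue)
    sf.apply_style_by_indexes(sf[df.duplicated(subset=['Movie', 'Sentence', 'Tag', 'Sentiment'],
                                               keep=False)],
                              styler_obj=green)

    # On recent pandas releases, call .close() instead of .save()
    sf.to_excel('test.xlsx').save()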

This produces the following output:

[Image: screenshot of the resulting spreadsheet]
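For readers who would rather stay with the XlsxWriter engine from the question, the same nested-duplicate idea can be sketched without StyleFrame. This is not the answer author's method, only an illustration: `row_colors`, `write_colored`, and `COLORS` are hypothetical names, the hex values are the ones from the question, and the sketch assumes the data contains no missing values (XlsxWriter rejects NaN by default).

```python
import pandas as pd

# Hex colors taken from the question's code.
COLORS = {"green": "#B7F985", "blue": "#D3F6F4", "red": "#F6D9D3"}


def row_colors(df):
    """Label each row 'red', 'blue', 'green' or None using nested duplicate masks."""
    labels = pd.Series([None] * len(df), index=df.index, dtype=object)
    # Broadest match first; each narrower match overwrites the previous label.
    labels[df.duplicated(subset=["Movie", "Sentence"], keep=False)] = "red"
    labels[df.duplicated(subset=["Movie", "Sentence", "Tag"], keep=False)] = "blue"
    labels[df.duplicated(subset=["Movie", "Sentence", "Tag", "Sentiment"], keep=False)] = "green"
    return labels


def write_colored(df, dest_path, sheet_name="report"):
    """Write df to dest_path and fill each labeled data row with its color."""
    labels = row_colors(df)
    with pd.ExcelWriter(dest_path, engine="xlsxwriter") as writer:
        df.to_excel(writer, index=False, sheet_name=sheet_name)
        workbook = writer.book
        worksheet = writer.sheets[sheet_name]
        formats = {name: workbook.add_format({"bg_color": code})
                   for name, code in COLORS.items()}
        # Worksheet row 0 holds the header, so data rows start at row 1.
        rows = zip(labels, df.itertuples(index=False))
        for excel_row, (label, values) in enumerate(rows, start=1):
            if label is not None:
                # Rewrite the row's cells with the matching background format.
                worksheet.write_row(excel_row, 0, list(values), formats[label])


# The six Jaws rows from reviews.csv (Text column omitted for brevity).
df = pd.DataFrame(
    [
        ("Jaws", "John", "s1", "Plot", "Positive"),
        ("Jaws", "Mary", "s1", "Plot", "Positive"),
        ("Jaws", "John", "s2", "Acting", "Positive"),
        ("Jaws", "Mary", "s2", "Acting", "Neutral"),
        ("Jaws", "John", "s3", "Scene", "Positive"),
        ("Jaws", "Mary", "s3", "Effects", "Positive"),
    ],
    columns=["Movie", "Reviewer", "Sentence", "Tag", "Sentiment"],
)
print(row_colors(df).tolist())
# ['green', 'green', 'blue', 'blue', 'red', 'red']

# write_colored(df, "report.xlsx")  # requires the xlsxwriter package
```

Writing static cell formats like this colors the cells once at export time. If the colors need to update when someone edits the spreadsheet, an Excel-side conditional formatting rule would be needed instead.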
