Python – Excel – Counting words in cells using two CSV dictionaries

I have Python code that counts word occurrences in text (.txt) files:

    import re

    # Matches alphabetic words, optionally hyphenated (e.g. "well-known")
    find_words = re.compile(r'(?:(?<=[^\w./-])|(?<=^))[A-Za-z]+(?:-[A-Za-z]+)*(?=\W|$)').findall

    wanted1 = set(find_words(open('word_list_1.csv').read().lower()))
    wanted2 = set(find_words(open('word_list_2.csv').read().lower()))
    negators = set(find_words(open('negators.csv').read().lower()))
    ignore = set(find_words(open('Ignore words.csv').read().lower()))

Then I process the text files as follows:

    with open(csvfile, "wb") as output:
        writer = csv.writer(output)
        for f in glob.glob("*.txt"):
            print "Processing file number : ", i, " out of :", len(glob.glob("*.txt"))
            i=i+1
            with open(f) as inputfile:
                wordNumber=0
                for line in inputfile:
                    if find_words(line.lower()) != []:
                        lineWords=find_words(line.lower())

So the question is: how can I do this for an Excel file instead of .txt files? I tried the following:

    for i in range(0, rows):
        for j in range(0,cols):
            write_sheet1.write(i,j,sheet.cell_value(i,j))
        if sheet.cell_value(i,4)!=0:
            for line in sheet.cell_value(i,4):
                print "Line is : ", line
                if find_words(line.lower()) != []:
                    lineWords=find_words(line.lower())

But it doesn't work: it returns only a single character at a time, rather than the whole line and/or the words…

So how can I make this work on Excel cells instead of text files?

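When you read a text file, Python lets you iterate over it as if it were a list of lines. By contrast, a spreadsheet cell value is (presumably) just a string, and iterating over a string yields one character at a time, which is what you are seeing. You don't need the inner loop at all: call find_words on the cell value directly.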

    for i in range(rows):
        for j in range(cols):
            write_sheet1.write(i, j, sheet.cell_value(i, j))
        # The cell value is one string, so search it directly (no loop over it)
        if find_words(sheet.cell_value(i, 4).lower()) != []:
            cell_words = find_words(sheet.cell_value(i, 4).lower())
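To see the difference between the two kinds of iteration, here is a small sketch in Python 3. The sample text is made up for illustration, and io.StringIO stands in for an open text file.

```python
import io
import re

find_words = re.compile(
    r'(?:(?<=[^\w./-])|(?<=^))[A-Za-z]+(?:-[A-Za-z]+)*(?=\W|$)'
).findall

# Made-up sample text; io.StringIO behaves like an open text file.
text = "Not good\nvery well-made\n"

# Iterating a file object yields one line per step.
file_lines = [line for line in io.StringIO(text)]

# Iterating a string (such as a cell value) yields single characters.
cell_value = "Not good"
cell_chars = [ch for ch in cell_value]

# Searching the whole string at once gives the words.
cell_words = find_words(cell_value.lower())

print(file_lines)
print(cell_chars)
print(cell_words)
```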

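If a cell may contain something other than a string (such as a number), convert it to a string with str() first. (I'm not sure which module you're using to read the Excel sheet.)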
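A minimal sketch of that conversion in Python 3 could look like the following. The helper name words_in_cell is hypothetical, and the None check is an extra assumption for empty cells, since str(None) would otherwise produce the word "none".

```python
import re

find_words = re.compile(
    r'(?:(?<=[^\w./-])|(?<=^))[A-Za-z]+(?:-[A-Za-z]+)*(?=\W|$)'
).findall


def words_in_cell(value):
    """Return the lowercase words found in a cell value of any type.

    Hypothetical helper: non-string values (numbers, for example) are
    converted with str() before searching.
    """
    if value is None:
        # Assumption: treat a missing cell as containing no words.
        return []
    return find_words(str(value).lower())


print(words_in_cell("Well-known FACT"))
print(words_in_cell(42.0))
```

With a helper like this, the loop can call words_in_cell(sheet.cell_value(i, 4)) without caring whether the cell holds text or a number.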

I would use pandas to import the Excel file and then iterate over all cells of the pandas DataFrame.

    import pandas as pd

    df = pd.read_excel(...)
    df_out = df.applymap(func)

func is a function that takes a cell's contents and returns the result. The result for each cell ends up in df_out.
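As a rough sketch of what func might be, the example below counts how many words in each cell belong to a word set. The function name count_wanted, the contents of wanted1, and the in-memory DataFrame (standing in for the result of pd.read_excel) are all assumptions for illustration. Note that pandas 2.1 and later offer DataFrame.map as the replacement for applymap, so the sketch uses whichever is available.

```python
import re

import pandas as pd

find_words = re.compile(
    r'(?:(?<=[^\w./-])|(?<=^))[A-Za-z]+(?:-[A-Za-z]+)*(?=\W|$)'
).findall

# Hypothetical word set; in the question it is loaded from word_list_1.csv.
wanted1 = {"good", "great"}


def count_wanted(cell):
    """Count the words in one cell that appear in wanted1."""
    words = find_words(str(cell).lower())
    return sum(1 for word in words if word in wanted1)


# Stand-in for the DataFrame returned by pd.read_excel.
df = pd.DataFrame({
    "id": [1, 2],
    "text": ["Good, really great", "Nothing here"],
})

# Use DataFrame.map where it exists (pandas 2.1+), else fall back to applymap.
apply_cells = getattr(df, "map", None) or df.applymap
df_out = apply_cells(count_wanted)

print(df_out)
```

If only one column holds text, applying the function to that column alone, for example df["text"].map(count_wanted), avoids scanning cells that cannot contain words.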
