Optimizing Excel data collection/reduction with xlrd/xlwt and loop iteration

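I have just started coding in Python and have a lot to learn. The goal of my code is to pull a string from a cell, check its character length, and replace words with specific abbreviations. I then write the new string to a different Excel sheet and save once all of the data has been reduced. I finally figured out how to get it working, but it takes a long time. I am working with 10,000+ string cells, and my loop iterations are probably far from optimized. Any information you have would be great.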

    import xlwt
    import xlrd

    book = xlrd.open_workbook()  # opens Excel file for data input
    reduc = xlwt.Workbook()      # creates the workbook that the reduced data will be saved in

    # Calls the sheets I will be working with
    Data = book.sheet_by_index(3)
    Table = book.sheet_by_index(5)
    sheet1 = reduc.add_sheet("sheet 1")

    # The initial loop pulls the string from Excel
    for x in xrange(30):  # I use a limited range for debugging
        From = str(Data.col(15)[x].value)
        To = str(Data.col(16)[x].value)
        print x  # I just print this to let me know that I'm not stuck

        if len(From) <= 30 and len(To) <= 30:
            sheet1.write(x, 3, From)
            sheet1.write(x, 4, To)
        else:
            while len(From) > 30 or len(To) > 30:
                for y in xrange(Table.nrows):
                    word = str(Table.col(0)[y].value)
                    abbrv = str(Table.col(1)[y].value)
                    if len(From) > 30:
                        From = From.replace(word, abbrv)
                    if len(To) > 30:
                        To = To.replace(word, abbrv)
                sheet1.write(x, 3, From)
                sheet1.write(x, 4, To)
                break

    reduc.save("newdoc.xls")
    print " DONE! "

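Below is my updated code. It is now almost instantaneous, which is what I had hoped for. I preload all of the columns I need and then run them through the same loop system. I then store the reduced data in lists instead of writing it to the new Excel file straight away. After all of the data has been reduced, I write each cell in a separate for loop. Thanks for the suggestions, guys.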

    import xlwt
    import xlrd

    # Workbook must be located in the Python27 folder in the C:/ directory
    book = xlrd.open_workbook()  # opens Excel file for data input

    # Calls the sheets I will be working with
    Data = book.sheet_by_index(0)
    Table = book.sheet_by_index(1)

    # Import column data from Excel
    From = Data.col_values(15)
    To = Data.col_values(16)
    word = Table.col_values(0)
    abbrv = Table.col_values(1)

    # Empty lists to be filled with the reduced strings
    From_r = []
    To_r = []

    # Notes to be added

    for x in xrange(Data.nrows):
        if len(From[x]) <= 28 and len(To[x]) <= 28:
            From_r.append(From[x])
            To_r.append(To[x])
        else:
            while len(From[x]) > 28 or len(To[x]) > 28:
                for y in xrange(Table.nrows):
                    if len(From[x]) > 28:
                        From[x] = From[x].replace(word[y], abbrv[y])
                    if len(To[x]) > 28:
                        To[x] = To[x].replace(word[y], abbrv[y])
                From_r.append(From[x])
                To_r.append(To[x])
                break

    # Create new Excel file to write reduced strings into
    reduc = xlwt.Workbook()
    sheet1 = reduc.add_sheet("sheet 1")

    # Iterate through the lists to write each object into Excel
    for i in xrange(Data.nrows):
        sheet1.write(i, 3, From_r[i])
        sheet1.write(i, 4, To_r[i])

    # Save reduced strings in new Excel file
    reduc.save("lucky.xls")
    print " DONE! "
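The reduction step in the updated code can also be pulled out into a small function that works on plain lists, so it can be checked without opening an Excel file. This is only a minimal sketch: the helper name `abbreviate` and the sample words are made up for illustration, and the 28-character limit is taken from the updated code. It applies the pairs in order and stops as soon as the string fits, which gives the same result as the single pass in the loop above.

```python
def abbreviate(text, pairs, limit=28):
    """Apply (word, abbreviation) pairs in order until text fits within limit."""
    for word, abbrv in pairs:
        if len(text) <= limit:
            break  # already short enough, leave the remaining words alone
        text = text.replace(word, abbrv)
    return text


# Made-up sample data standing in for the two columns of the Table sheet.
pairs = [("Building", "Bldg"), ("Distribution", "Distr"), ("Panel", "Pnl")]

short = abbreviate("Main Panel", pairs)
long_ = abbreviate("Building Distribution Panel Number 12", pairs)
```

With the preloaded columns, something like `From_r = [abbreviate(s, zip(word, abbrv)) for s in From]` would take the place of the nested loops, assuming every cell in the column holds text.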

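The slowness is probably due to the inefficient replacement code. You should try loading all of the words and their corresponding abbreviations up front, unless the list is so large that you would run out of memory. Then, to speed things up further, you can replace all of the words in one go.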

Do this, and move it out of the loop:

    words = [str(cell.value) for cell in Table.col(0)]  # list comprehension
    abbr = [str(cell.value) for cell in Table.col(1)]
    replacements = zip(words, abbr)

Here is a function that uses the regular expression module to replace all matches from a given list.

    import re

    def multiple_replacer(*key_values):
        replace_dict = dict(key_values)
        replacement_function = lambda match: replace_dict[match.group(0)]
        pattern = re.compile("|".join([re.escape(k) for k, v in key_values]))
        return lambda string: pattern.sub(replacement_function, string)

It can be used like this:

    # Constructs the function. Do this outside the loop, after the replacements have been gathered.
    replaceFunc = multiple_replacer(*replacements)
    myString = replaceFunc(myString)
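Putting the pieces together, here is a rough end-to-end sketch on in-memory lists. The names `make_replacer` and `reduce_all` and the sample words are hypothetical, and the 28-character limit is borrowed from the question. Two details differ from the snippets above and are my own assumptions: the keys are sorted longest first, so that a short word cannot shadow a longer one that starts with it, and only strings over the limit are abbreviated, as in the question. Unlike the original loop, a string that is too long gets all of its listed words replaced in one pass, even if fewer replacements would have been enough.

```python
import re


def make_replacer(pairs):
    """Build a function that replaces every listed word in a single regex pass."""
    mapping = dict(pairs)
    if not mapping:
        return lambda text: text  # nothing to replace
    # Longest words first, so "Northern" is tried before "North".
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return lambda text: pattern.sub(lambda m: mapping[m.group(0)], text)


def reduce_all(strings, replacer, limit=28):
    """Abbreviate only the strings that are longer than limit."""
    return [s if len(s) <= limit else replacer(s) for s in strings]


# Made-up sample data standing in for the two sheets.
pairs = [("North", "N"), ("Northern", "Nthn"), ("Substation", "Subst")]
replacer = make_replacer(pairs)
result = reduce_all(["North Gate", "Northern Regional Substation Feeder"], replacer)
```

The pattern is compiled once, outside any loop, so each string is scanned a single time no matter how many abbreviations the table holds.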