Optimizing or speeding up reading from .xy files into Excel

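I have several .xy files (two columns, x and y values). I have been trying to read all of them and paste the "y" values into a single Excel file (the "x" values are the same in all of these files). The code I have so far reads the files one by one, but it is extremely slow (about 20 seconds per file). I have quite a lot of .xy files, so the total time adds up considerably. The code I have so far is: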

    import os,fnmatch,linecache,csv
    from openpyxl import Workbook

    wb = Workbook()
    ws = wb.worksheets[0]
    ws.title = "Sheet1"

    def batch_processing(file_name):
        row_count = sum(1 for row in csv.reader(open(file_name)))
        try:
            for row in xrange(1,row_count):
                data = linecache.getline(file_name, row)
                print data.strip().split()[1]
                print data
                ws.cell("A"+str(row)).value = float(data.strip().split()[0])
                ws.cell("B"+str(row)).value = float(data.strip().split()[1])
                print file_name
                wb.save(filename = os.path.splitext(file_name)[0]+".xlsx")
        except IndexError:
            pass

    workingdir = "C:\Users\Mine\Desktop\P22_PC"
    os.chdir(workingdir)
    for root, dirnames, filenames in os.walk(workingdir):
        for file_name in fnmatch.filter(filenames, "*_Cs.xy"):
            batch_processing(file_name)

Any help is appreciated. Thanks.

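I think your main problem is that you are writing to Excel and saving on every single line of the file, for every single file in the directory. I am not sure how long it actually takes to write a value into Excel, but simply moving the save out of the loop and saving only once, after everything has been added, should cut some time. Also, how large are these files? If they are massive, then linecache may be a good idea, but assuming they are not overly large, you can probably do without it.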

    def batch_processing(file_name):
        # Using 'with' is a better way to open files - it ensures they are
        # properly closed, etc. when you leave the code block
        with open(file_name, 'rb') as f:
            reader = csv.reader(f)
            # row_count = sum(1 for row in csv.reader(open(file_name)))
            # ^^^You actually don't need to do this at all (though it is clever :)
            # You are using it now to govern the loop, but the more Pythonic way is
            # to do it as follows
            for line_no, line in enumerate(reader):
                # Split the line and create two variables that will hold val1 and val2
                # (csv.reader splits on commas by default; if the columns are
                # separated by whitespace, use line[0].split() here instead)
                val1, val2 = line
                print val1, val2 # You can also remove this - printing takes time too
                ws.cell("A"+str(line_no+1)).value = float(val1)
                ws.cell("B"+str(line_no+1)).value = float(val2)

        # Doing this here will save the file after you process an entire file.
        # You could save a bit more time and move this to after your walk statement -
        # that way, you are only saving once after everything has completed
        wb.save(filename = os.path.splitext(file_name)[0]+".xlsx")
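Since the stated goal is to put the "y" column of every file into one workbook, the "save only once" idea can be taken one step further: build the whole table in memory first, then write and save it a single time. The following is only a rough sketch under a few assumptions: it is written for Python 3 rather than the Python 2 used above, it assumes the columns are separated by whitespace (as the `split()` calls in the question suggest), and the helper names `read_xy`, `build_rows` and `save_rows` are made up for illustration.

```python
def read_xy(path):
    """Read a whitespace-delimited two-column file in a single pass."""
    xs, ys = [], []
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) < 2:  # skip blank or malformed lines
                continue
            xs.append(float(parts[0]))
            ys.append(float(parts[1]))
    return xs, ys


def build_rows(paths):
    """Return rows of [x, y_file1, y_file2, ...].

    The x values are taken from the first file only, on the assumption
    (stated in the question) that x is identical in every file.
    """
    rows = []
    for i, path in enumerate(paths):
        xs, ys = read_xy(path)
        if i == 0:
            rows = [[x] for x in xs]
        for row, y in zip(rows, ys):
            row.append(y)
    return rows


def save_rows(rows, out_path):
    """Write all rows to one worksheet and save exactly once."""
    # Imported here so the parsing helpers above work without openpyxl.
    from openpyxl import Workbook

    wb = Workbook()
    ws = wb.active
    for row in rows:
        ws.append(row)  # one worksheet row per call
    wb.save(out_path)   # single save, after everything is in memory
```

With the list of matching paths collected by the same `os.walk` / `fnmatch.filter` loop as in the question, a call along the lines of `save_rows(build_rows(paths), "combined.xlsx")` would then produce one file with x in column A and one y column per input file. The output filename here is just a placeholder.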