How can I run a Python script in batches?

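I am looking for a way to run a Python command over a set of data in batches. For example, I would like to run the code below for the first 10 rows, print the output, then run the next batch, and so on until the rows are exhausted. The reason is that processing 1000 rows currently takes a very long time.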

I tried using concurrent.futures.ProcessPoolExecutor, but it did not help. Is there a better way?

Here is the code:

 import os, sys
 import xlwt
 import numpy
 import tensorflow as tf
 import xlsxwriter
 import urllib

 filename = "/home/shri/Desktop/tf_files/test1"

 def getimg(count):
     # open file to read
     with open("{0}.csv".format(filename), 'r') as csvfile:
         # iterate on all lines
         i = 0
         for line in csvfile:
             splitted_line = line.split(',')
             # check if we have an image URL
             if splitted_line[1] != '' and splitted_line[1] != "\n":
                 urllib.urlretrieve(splitted_line[1], '/home/shri/Desktop/tf_files/images/{0}.jpg'.format(splitted_line[0]))
                 print "Image saved for {0}".format(splitted_line[0])
                 i += 1
             else:
                 print "No result for {0}".format(splitted_line[0])

 os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

 def run_inference(count):
     # Create a workbook and add a worksheet.
     workbook = xlsxwriter.Workbook('output.xlsx')
     worksheet = workbook.add_worksheet()

     # Start from the first cell. Rows and columns are zero indexed.
     row = 0
     col = 0

     # search for files in 'images' dir
     files_dir = os.getcwd() + '/images'
     files = os.listdir(files_dir)

     # loop over files, print prediction if it is an image
     for f in files:
         if f.lower().endswith(('.png', '.jpg', '.jpeg')):
             image_path = files_dir + '/' + f

             # Read in the image_data
             image_data = tf.gfile.FastGFile(image_path, 'rb').read()

             # Loads label file, strips off carriage return
             label_lines = [line.rstrip() for line in tf.gfile.GFile("retrained_labels.txt")]

             # Unpersists graph from file
             with tf.gfile.FastGFile("retrained_graph.pb", 'rb') as f:
                 graph_def = tf.GraphDef()
                 graph_def.ParseFromString(f.read())
                 tf.import_graph_def(graph_def, name='')

             with tf.Session() as sess:
                 # Feed the image_data as input to the graph and get first prediction
                 softmax_tensor = sess.graph.get_tensor_by_name('final_result:0')
                 predictions = sess.run(softmax_tensor, \
                     {'DecodeJpeg/contents:0': image_data})

                 # Sort to show labels of first highest prediction in order of confidence
                 top_k = predictions[0].argsort()[-len(predictions):][::-1]

                 for node_id in top_k:
                     human_string = label_lines[node_id]
                     score = predictions[0][node_id]
                     worksheet.write_string(row, 1, image_path)
                     worksheet.write(row, 2, human_string)
                     worksheet.write(row, 3, score)
                     print(row)
                     print(node_id)
                     print(image_path)
                     print('%s (score = %.5f)' % (human_string, score))
                     row += 1

     workbook.close()

 with concurrent.futures.ThreadPoolExecutor(max_workers=5) as e:
     for i in range(10):
         e.submit(run_inference, i)

Here is the data in the Excel sheet:

(image not preserved)

I suggest using GNU Parallel. Create a text file in which each line is a command you need to run, for example:

 python mycode.py someargs
 python mycode.py someotherargs
 ...

Then simply run

 parallel -j 8 < commands.txt

This works through the whole list of commands, running 8 script instances (or however many you choose) in parallel at a time.
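One way to produce such a command file is to generate it with a short script. The following is a minimal sketch, assuming Python 3 and assuming that `mycode.py` accepts a hypothetical `--start`/`--count` pair that selects a batch of rows. Neither option comes from the original code, so adapt them to whatever arguments your script really takes.

```python
def build_commands(total_rows, batch_size, script="mycode.py"):
    """Return one shell command per batch of rows.

    The --start/--count options are hypothetical; they stand in for
    whatever arguments the real script uses to select its rows.
    """
    commands = []
    for start in range(0, total_rows, batch_size):
        # The last batch may be shorter than batch_size.
        count = min(batch_size, total_rows - start)
        commands.append(
            "python {0} --start {1} --count {2}".format(script, start, count)
        )
    return commands


def write_commands(path, commands):
    """Write the commands to a text file, one per line."""
    with open(path, "w") as handle:
        for command in commands:
            handle.write(command + "\n")
```

Calling `write_commands("commands.txt", build_commands(1000, 10))` would write 100 lines, one for each batch of 10 rows.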

GNU Parallel cannot make a serial program run faster, nor can it turn a serial program into a parallel one.

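What GNU Parallel can do is run a serial program in parallel with different arguments. For that to work, however, your serial program must be able to run alongside other copies of itself, and the work must be divisible.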

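So you need to make your serial program capable of solving just one part of the problem. That may mean you eventually have to collect all the partial solutions into one complete solution.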

This technique is nowadays called Map-Reduce. GNU Parallel performs the map phase.
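The same split, map, and collect idea can also be sketched inside Python. This is only an illustration, assuming Python 3: `process_batch` is a hypothetical stand-in for the real per-batch work, and a thread pool is chosen on the assumption that the work is I/O bound (such as downloading).

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_batches(rows, batch_size):
    """Split a list of rows into consecutive batches of at most batch_size."""
    return [rows[i:i + batch_size] for i in range(0, len(rows), batch_size)]


def process_batch(batch):
    """Hypothetical stand-in for the real per-batch work (the map step)."""
    return [row.upper() for row in batch]


def run_in_batches(rows, batch_size, max_workers=4):
    """Map process_batch over the batches, then merge the partial results."""
    batches = split_into_batches(rows, batch_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input batches.
        partial_results = list(pool.map(process_batch, batches))
    # Reduce step: flatten the per-batch lists into one result list.
    return [item for partial in partial_results for item in partial]
```

Each task here receives a different batch. Submitting the same function over the same set of files several times, as the code in the question appears to do, repeats the work instead of dividing it.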

In your case, it is a good idea to identify which part is slow and then see how that part can be changed into something that runs as a partial solution.
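A simple way to find the slow part is to time each stage separately. The helper below is a minimal sketch, assuming Python 3; `timed` is a hypothetical name, not something from the original code.

```python
import time


def timed(label, func, *args, **kwargs):
    """Call func, print how long it took, and return (result, seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    print("{0}: {1:.3f} s".format(label, elapsed))
    return result, elapsed
```

Wrapping the download step and the inference step in separate `timed(...)` calls shows which of the two dominates the total run time.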

Let us assume it is fetching the URLs that is slow. Then you write a program that fetches URL number i, where i is given on the command line:

 seq 10000 | parallel -j30 python get_url_number.py {}

Here we run 30 jobs in parallel. That will normally not bring the web server down, and it may be enough to saturate your bandwidth.
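What `get_url_number.py` might contain is sketched below. This is an assumption-laden illustration rather than the answerer's actual script: it assumes Python 3, a CSV whose first column is an identifier and whose second column is an image URL (as in the question), and hypothetical defaults `test1.csv` and `images` for the input file and output directory.

```python
import csv
import os
import sys
from urllib.request import urlretrieve


def pick_row(csv_path, number):
    """Return (row_id, url) for the 1-based line `number` of the CSV file.

    Returns None if the line does not exist or has no URL in column 1.
    """
    with open(csv_path, newline="") as handle:
        for index, row in enumerate(csv.reader(handle), start=1):
            if index == number:
                if len(row) > 1 and row[1].strip():
                    return row[0], row[1].strip()
                return None
    return None


def main(argv, csv_path="test1.csv", out_dir="images"):
    """Download the image for the row whose number is given in argv[1]."""
    picked = pick_row(csv_path, int(argv[1]))
    if picked is None:
        print("No result for row {0}".format(argv[1]))
        return 1
    row_id, url = picked
    os.makedirs(out_dir, exist_ok=True)
    urlretrieve(url, os.path.join(out_dir, "{0}.jpg".format(row_id)))
    print("Image saved for {0}".format(row_id))
    return 0


# To use this as a command-line script, end the file with:
#     if __name__ == "__main__":
#         sys.exit(main(sys.argv))
```

Row numbers are 1-based here because `seq 10000` starts counting at 1. Since each invocation handles exactly one row, GNU Parallel can run many of them side by side.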
