Writing srt data to Excel with the Python xlsxwriter module

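This time I am trying to write the data from an .srt file into Excel using Python's xlsxwriter module.

The subtitle file looks like this in Sublime Text:

But I want to write the data into an Excel file so that it looks like this:

This is my first time writing Python for this, so I'm still at the trial-and-error stage... I tried writing code like the following

But I don't think it makes sense

I'll keep trying, but if you know how to do this, please let me know. I'll read your code and try to understand it! Thanks! 🙂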

The following breaks the problem down into a few parts:

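Parsing the input file. parse_subtitles is a generator that takes a source of lines and yields records of the form {'index': 'N', 'timestamp': 'NN:NN:NN,NNN --> NN:NN:NN,NNN', 'subtitles': ['TEXT', ...]}. The approach I took is to track which of three different states we are in:
1. seeking to next entry, when we are looking for the next index number, which should match the regular expression ^\d*$ (just a bunch of digits)
2. looking for timestamp, once an index has been found: we expect the timestamp on the next line, and it should match the regular expression ^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}$ (HH:MM:SS,mmm --> HH:MM:SS,mmm), and
3. reading subtitles, while consuming the actual subtitle text, with a blank line or EOF interpreted as the end of the subtitle.
Writing one of the above records to a row in the worksheet. write_dict_to_worksheet takes a row and a worksheet, plus a record and a dictionary that defines the 0-indexed Excel column number for each of the record's keys, and then writes the data accordingly.
Organizing the overall conversion. convert takes an input filename (e.g. 'Wildlife.srt'), which is opened and passed to the parse_subtitles function, and an output filename (e.g. 'Subtitle.xlsx'), which is created using xlsxwriter. It writes a header row and then, for each record parsed from the input file, writes that record to the XLSX file.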
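To make the three states more concrete, here is a minimal sketch of how the two regular expressions described above label individual lines. The helper name classify and the sample lines are made up for illustration and are not part of the answer's code.

```python
import re

# Patterns taken from the description above.
INDEX_RE = re.compile(r'^\d*$')
TIMESTAMP_RE = re.compile(
    r'^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}$'
)


def classify(line):
    """Hypothetical helper: label one line of an .srt file by pattern alone."""
    if TIMESTAMP_RE.match(line):
        return 'timestamp'
    # ^\d*$ also matches an empty string, so test for blank lines first.
    if line.strip() == '':
        return 'blank'
    if INDEX_RE.match(line):
        return 'index'
    return 'text'


# Invented sample entry, one list item per line of the file.
sample = ['1', '00:00:01,000 --> 00:00:04,000', 'Hello there', '']
labels = [classify(line) for line in sample]
print(labels)
```

Note that a subtitle line consisting only of digits (say, '42') would be labelled as an index by pattern alone. That is one reason to track the current state instead of classifying each line in isolation.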

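The logging was left in for self-checking purposes: when I was reproducing your input file, I mistyped a : as a ; in one of the timestamps, which made it unrecognizable, and having the error pop up was handy for debugging!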
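As a rough illustration of that kind of typo, the sketch below configures the logging module to show debug messages and checks one good and one mistyped timestamp. The helper check_timestamp and both sample strings are assumptions made up for this example.

```python
import logging
import re

# The default level (WARNING) hides debug/info messages, so lower it.
logging.basicConfig(level=logging.DEBUG, format='%(levelname)s: %(message)s')

TIMESTAMP_RE = re.compile(
    r'^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}$'
)


def check_timestamp(line):
    """Hypothetical helper: log and return whether a line is a valid timestamp."""
    ok = TIMESTAMP_RE.match(line) is not None
    if ok:
        logging.debug('Found timestamp: %s', line)
    else:
        logging.error('Expected to find a timestamp, but instead found: [%s]', line)
    return ok


good = check_timestamp('00:00:01,000 --> 00:00:04,000')
bad = check_timestamp('00:00;01,000 --> 00:00:04,000')  # ';' typed instead of ':'
```

Without the basicConfig call, only the error message would be shown, since the logging module's default threshold is WARNING.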

In this Gist, I've put together a text version of the source file and the code below

    import xlsxwriter
    import re
    import logging


    def parse_subtitles(lines):
        line_index = re.compile(r'^\d*$')
        line_timestamp = re.compile(r'^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}$')
        line_separator = re.compile(r'^\s*$')

        current_record = {'index': None, 'timestamp': None, 'subtitles': []}
        state = 'seeking to next entry'

        for line in lines:
            line = line.strip('\n')
            if state == 'seeking to next entry':
                if line_index.match(line):
                    logging.debug('Found index: {i}'.format(i=line))
                    current_record['index'] = line
                    state = 'looking for timestamp'
                else:
                    logging.error('HUH: Expected to find an index, but instead found: [{d}]'.format(d=line))

            elif state == 'looking for timestamp':
                if line_timestamp.match(line):
                    logging.debug('Found timestamp: {t}'.format(t=line))
                    current_record['timestamp'] = line
                    state = 'reading subtitles'
                else:
                    logging.error('HUH: Expected to find a timestamp, but instead found: [{d}]'.format(d=line))

            elif state == 'reading subtitles':
                if line_separator.match(line):
                    logging.info('Blank line reached, yielding record: {r}'.format(r=current_record))
                    yield current_record
                    state = 'seeking to next entry'
                    current_record = {'index': None, 'timestamp': None, 'subtitles': []}
                else:
                    logging.debug('Appending to subtitle: {s}'.format(s=line))
                    current_record['subtitles'].append(line)

            else:
                logging.error('HUH: Fell into an unknown state: `{s}`'.format(s=state))

        if state == 'reading subtitles':
            # We must have finished the file without encountering a blank line. Dump the last record
            yield current_record


    def write_dict_to_worksheet(columns_for_keys, keyed_data, worksheet, row):
        """
        Write a subtitle-record to a worksheet.
        Return the row number after those that were written (since this may write multiple rows)
        """
        current_row = row
        # First, horizontally write the entry and timecode
        for (colname, colindex) in columns_for_keys.items():
            if colname != 'subtitles':
                worksheet.write(current_row, colindex, keyed_data[colname])

        # Next, vertically write the subtitle data
        subtitle_column = columns_for_keys['subtitles']
        for morelines in keyed_data['subtitles']:
            worksheet.write(current_row, subtitle_column, morelines)
            current_row += 1

        return current_row


    def convert(input_filename, output_filename):
        workbook = xlsxwriter.Workbook(output_filename)
        worksheet = workbook.add_worksheet('subtitles')
        columns = {'index': 0, 'timestamp': 1, 'subtitles': 2}

        next_available_row = 0
        records_processed = 0
        headings = {'index': "Entries", 'timestamp': "Timecodes", 'subtitles': ["Subtitles"]}
        next_available_row = write_dict_to_worksheet(columns, headings, worksheet, next_available_row)

        with open(input_filename) as textfile:
            for record in parse_subtitles(textfile):
                next_available_row = write_dict_to_worksheet(columns, record, worksheet, next_available_row)
                records_processed += 1

        print('Done converting {inp} to {outp}. {n} subtitle entries found. {m} rows written'.format(
            inp=input_filename, outp=output_filename, n=records_processed, m=next_available_row))
        workbook.close()


    convert(input_filename='Wildlife.srt', output_filename='Subtitle.xlsx')

Edit: updated to split multi-line subtitles across multiple rows in the output
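To show what that row layout might look like without opening a workbook, the following sketch computes the (row, column, value) cells for a single record. The helper cells_for_record and the sample record are hypothetical; the column mapping follows the one used in convert.

```python
def cells_for_record(record, columns, start_row):
    """Hypothetical helper: return ([(row, col, value), ...], next_free_row)."""
    # Index and timestamp go on the record's first row only.
    cells = [
        (start_row, columns[key], record[key])
        for key in columns
        if key != 'subtitles'
    ]
    # Each subtitle line gets its own row in the subtitles column.
    row = start_row
    for text in record['subtitles']:
        cells.append((row, columns['subtitles'], text))
        row += 1
    return cells, row


columns = {'index': 0, 'timestamp': 1, 'subtitles': 2}
record = {
    'index': '1',
    'timestamp': '00:00:01,000 --> 00:00:04,000',
    'subtitles': ['First line', 'Second line'],
}
cells, next_row = cells_for_record(record, columns, start_row=1)
for cell in cells:
    print(cell)
```

With this layout, a record that has two subtitle lines occupies two rows, and the index and timestamp cells on the second row stay empty. A record with no subtitle lines would not advance the row counter at all, so that edge case may need separate handling.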
