Insert a row into an Excel spreadsheet using openpyxl in Python

I'm looking for the best way to insert rows into a spreadsheet using openpyxl.

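Effectively, I have a spreadsheet (Excel 2007) with a header row, followed by (at most) a few thousand rows of data. I'm looking to insert a row as the first row of actual data, so directly after the header. My understanding is that the append function is only suitable for adding content to the end of the file.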

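Reading the documentation for both openpyxl and xlrd (and xlwt), I can't find any clear way of doing this, other than manually looping through the content and inserting it into a new sheet (after inserting the required row).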

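Given my so far limited experience with Python, I'm trying to understand whether this is indeed the best option to take (the most pythonic!), and if so, whether someone could provide an explicit example. Specifically, can I read and write rows with openpyxl, or do I have to access cells? Additionally, can I (over)write the same file (name)?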
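On the last two questions: as far as I know, openpyxl lets you write a whole row at once with `ws.append()` and read whole rows with `ws.iter_rows()`, and calling `save()` with the name the workbook was loaded from simply overwrites that file. A minimal sketch, assuming a reasonably recent openpyxl (`values_only` needs 2.6 or later); the file name `demo.xlsx` and the helper `read_rows` are illustrative only:

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook


def read_rows(path):
    """Return every row of the active sheet as a tuple of cell values."""
    ws = load_workbook(path).active
    return [row for row in ws.iter_rows(values_only=True)]


path = os.path.join(tempfile.mkdtemp(), "demo.xlsx")

# Write whole rows at once; append() always adds after the last used row.
wb = Workbook()
ws = wb.active
ws.append(["name", "qty"])
ws.append(["apple", 3])
wb.save(path)

# Re-open, append another row and save over the same file name.
wb = load_workbook(path)
wb.active.append(["pear", 5])
wb.save(path)

rows = read_rows(path)
```

Note that `append()` only ever writes after the last row, so it does not by itself solve the "insert after the header" part of the question.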

== Updated to a fully functional version, based on feedback here: groups.google.com/forum/#!topic/openpyxl-users/wHGecdQg3Iw. ==

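As others have pointed out, openpyxl does not provide this functionality, but I have extended the Worksheet class as follows to implement inserting rows. I hope this proves useful to others.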

    # Targets the openpyxl 2.x internals of its time (_cells, formula_attributes,
    # merged_cell_ranges).
    import copy
    import re

    from openpyxl.cell import Cell
    from openpyxl.utils import get_column_letter
    from openpyxl.worksheet.worksheet import Worksheet


    def insert_rows(self, row_idx, cnt, above=False, copy_style=True, fill_formulae=True):
        """Inserts new (empty) rows into worksheet at specified row index.

        :param row_idx: Row index specifying where to insert new rows.
        :param cnt: Number of rows to insert.
        :param above: Set True to insert rows above specified row index.
        :param copy_style: Set True if new rows should copy style of immediately above row.
        :param fill_formulae: Set True if new rows should take on formula from immediately
            above row, filled with references to the new rows.

        Usage:

        * insert_rows(2, 10, above=True, copy_style=False)

        """
        CELL_RE = re.compile(r"(?P<col>\$?[A-Z]+)(?P<row>\$?\d+)")

        row_idx = row_idx - 1 if above else row_idx

        def replace(m):
            row = m.group('row')
            prefix = "$" if row.find("$") != -1 else ""
            row = int(row.replace("$", ""))
            row += cnt if row > row_idx else 0
            return m.group('col') + prefix + str(row)

        # First, we shift all cells down cnt rows...
        old_cells = set()
        old_fas = set()
        new_cells = dict()
        new_fas = dict()
        for c in self._cells.values():

            old_coor = c.coordinate

            # Shift all references to anything below row_idx
            if c.data_type == Cell.TYPE_FORMULA:
                c.value = CELL_RE.sub(replace, c.value)
                # Here, we need to properly update the formula references to reflect new row indices
                if old_coor in self.formula_attributes and 'ref' in self.formula_attributes[old_coor]:
                    self.formula_attributes[old_coor]['ref'] = CELL_RE.sub(
                        replace,
                        self.formula_attributes[old_coor]['ref']
                    )

            # Do the magic to set up our actual shift
            if c.row > row_idx:
                old_coor = c.coordinate
                old_cells.add((c.row, c.col_idx))
                c.row += cnt
                new_cells[(c.row, c.col_idx)] = c
                if old_coor in self.formula_attributes:
                    old_fas.add(old_coor)
                    fa = self.formula_attributes[old_coor].copy()
                    new_fas[c.coordinate] = fa

        for coor in old_cells:
            del self._cells[coor]
        self._cells.update(new_cells)

        for fa in old_fas:
            del self.formula_attributes[fa]
        self.formula_attributes.update(new_fas)

        # Next, we need to shift all the Row Dimensions below our new rows down by cnt...
        for row in range(len(self.row_dimensions) - 1 + cnt, row_idx + cnt, -1):
            new_rd = copy.copy(self.row_dimensions[row - cnt])
            new_rd.index = row
            self.row_dimensions[row] = new_rd
            del self.row_dimensions[row - cnt]

        # Now, create our new rows, with all the pretty cells
        row_idx += 1
        for row in range(row_idx, row_idx + cnt):
            # Create a Row Dimension for our new row
            new_rd = copy.copy(self.row_dimensions[row - 1])
            new_rd.index = row
            self.row_dimensions[row] = new_rd
            # max_column is inclusive, hence the + 1 (otherwise the last column is skipped)
            for col in range(1, self.max_column + 1):
                col = get_column_letter(col)
                cell = self.cell('%s%d' % (col, row))
                cell.value = None
                source = self.cell('%s%d' % (col, row - 1))
                if copy_style:
                    cell.number_format = source.number_format
                    cell.font = source.font.copy()
                    cell.alignment = source.alignment.copy()
                    cell.border = source.border.copy()
                    cell.fill = source.fill.copy()
                if fill_formulae and source.data_type == Cell.TYPE_FORMULA:
                    s_coor = source.coordinate
                    if s_coor in self.formula_attributes and 'ref' not in self.formula_attributes[s_coor]:
                        fa = self.formula_attributes[s_coor].copy()
                        self.formula_attributes[cell.coordinate] = fa
                    # print("Copying formula from cell %s%d to %s%d" % (col, row - 1, col, row))
                    cell.value = re.sub(
                        r"(\$?[A-Z]{1,3}\$?)%d" % (row - 1),
                        lambda m: m.group(1) + str(row),
                        source.value
                    )
                    cell.data_type = Cell.TYPE_FORMULA

        # Check for Merged Cell Ranges that need to be expanded to contain new cells
        for cr_idx, cr in enumerate(self.merged_cell_ranges):
            self.merged_cell_ranges[cr_idx] = CELL_RE.sub(replace, cr)

    Worksheet.insert_rows = insert_rows
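The heart of the function above is the `replace` callback passed to `CELL_RE.sub`: every row reference that points below the insertion point is bumped by `cnt`, and a `$` anchor on the row is preserved. A stand-alone sketch of just that step, using only the standard library (`shift_formula_rows` is a hypothetical name, and, like the original, it makes no attempt to skip sheet names or string literals inside the formula):

```python
import re

# Column letters (optionally $-anchored) followed by a row number (optionally $-anchored).
CELL_RE = re.compile(r"(?P<col>\$?[A-Z]+)(?P<row>\$?\d+)")


def shift_formula_rows(formula, row_idx, cnt):
    """Add cnt to every row reference in formula that is greater than row_idx."""
    def replace(m):
        row = m.group("row")
        prefix = "$" if row.startswith("$") else ""
        number = int(row.lstrip("$"))
        if number > row_idx:
            number += cnt
        return m.group("col") + prefix + str(number)

    return CELL_RE.sub(replace, formula)
```

For example, inserting 5 rows after row 2 turns `=SUM(A1:A10)+$B$3` into `=SUM(A1:A15)+$B$8`: `A1` lies above the insertion point and is left alone.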

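Answering this with the code I'm now using to achieve the desired result. Note that I am manually inserting the row at position 1, but that should be easy enough to adjust for specific needs. You could also easily tweak this to insert more than one row, and simply populate the rest of the data starting at the relevant position.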

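Also note that, due to downstream dependencies, we are manually specifying data from 'Sheet1', and the data gets copied to a new sheet which is inserted at the beginning of the workbook, while the original worksheet is renamed to 'Sheet1.5'.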

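EDIT: I've also added (later on) a change to the format_code to fix an issue where the default copy operation here removes all formatting: new_cell.style.number_format.format_code = 'mm/dd/yyyy'. I couldn't find any documentation saying this was settable; it was more a case of trial and error!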

Lastly, don't forget that this example saves over the original file. You can change the save path where applicable to avoid this.

    import openpyxl

    # Written for early openpyxl releases, in which cell(row=..., column=...) was 0-based
    # and get_highest_row()/get_highest_column() returned the sheet size.
    # `file` holds the path of the workbook being modified.
    wb = openpyxl.load_workbook(file)
    old_sheet = wb.get_sheet_by_name('Sheet1')
    old_sheet.title = 'Sheet1.5'
    max_row = old_sheet.get_highest_row()
    max_col = old_sheet.get_highest_column()
    wb.create_sheet(0, 'Sheet1')

    new_sheet = wb.get_sheet_by_name('Sheet1')

    # Do the header.
    for col_num in range(0, max_col):
        new_sheet.cell(row=0, column=col_num).value = old_sheet.cell(row=0, column=col_num).value

    # The row to be inserted. We're manually populating each cell.
    new_sheet.cell(row=1, column=0).value = 'DUMMY'
    new_sheet.cell(row=1, column=1).value = 'DUMMY'

    # Now do the rest of it. Note the row offset.
    for row_num in range(1, max_row):
        for col_num in range(0, max_col):
            new_sheet.cell(row=(row_num + 1), column=col_num).value = old_sheet.cell(row=row_num, column=col_num).value

    wb.save(file)
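Regarding the formatting being lost in the copy: in current openpyxl the number format is exposed directly as `cell.number_format`, so, if I read the API correctly, the trial-and-error `style.number_format.format_code` step can be replaced by copying the format along with the value. A minimal sketch using the modern 1-based API (`copy_cell` is a hypothetical helper, and only the value and number format are copied, not fonts, borders or fills):

```python
import datetime

from openpyxl import Workbook


def copy_cell(source, target):
    """Copy a cell's value and its number format (e.g. a date format)."""
    target.value = source.value
    target.number_format = source.number_format


wb = Workbook()
old_sheet = wb.active
new_sheet = wb.create_sheet("Copy")

src = old_sheet.cell(row=1, column=1, value=datetime.date(2015, 3, 1))
src.number_format = "mm/dd/yyyy"

dst = new_sheet.cell(row=1, column=1)
copy_cell(src, dst)
```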

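Openpyxl worksheets have limited functionality when it comes to row- or column-level operations. The only Worksheet properties that relate to rows/columns are row_dimensions and column_dimensions, which store a RowDimension and a ColumnDimension object for each row and column, respectively. These dictionaries are also used in functions like get_highest_row() and get_highest_column().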

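Everything else operates at the cell level, with Cell objects tracked in the dictionary _cells (and their styles tracked in the dictionary _styles). Most functions that look like they are doing something at the row or column level are actually operating on a range of cells (such as the aforementioned append()).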
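Later openpyxl releases (2.5 onward, as far as I know) do add row- and column-level methods to Worksheet: `insert_rows(idx, amount=1)`, `delete_rows`, `insert_cols` and `delete_cols`. According to the openpyxl documentation they move cells but do not update formulas, charts or other dependent references. A minimal sketch inserting one blank row directly after a header:

```python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active
ws.append(["id", "value"])  # header (row 1)
ws.append([1, "a"])         # row 2
ws.append([2, "b"])         # row 3

# Insert one empty row at row 2; the old rows 2 and 3 move down to 3 and 4.
ws.insert_rows(2)

# Fill in the new first data row.
ws.cell(row=2, column=1, value=0)
ws.cell(row=2, column=2, value="new")

values = [row for row in ws.iter_rows(values_only=True)]
```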

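The simplest thing to do would be what you suggested: create a new sheet, append the header row, append the new data rows, append the old data rows, delete the old sheet, and then rename the new sheet to the old sheet's name. The problem with this approach is that row/column dimension attributes and cell styles are lost, unless you specifically copy them as well.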
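The steps above can be sketched with the modern 1-based openpyxl API roughly as follows. This assumes a single header row; `insert_after_header` is a hypothetical helper, and, as the paragraph warns, it copies values only, so styles and row/column dimensions are not carried over:

```python
from openpyxl import Workbook


def insert_after_header(wb, sheet_name, new_rows):
    """Rebuild sheet_name with new_rows placed right after its header row."""
    old_ws = wb[sheet_name]
    position = wb.sheetnames.index(sheet_name)
    new_ws = wb.create_sheet(title=sheet_name + "_tmp", index=position)

    rows = old_ws.iter_rows(values_only=True)
    new_ws.append(next(rows))   # header row
    for row in new_rows:        # new data rows
        new_ws.append(row)
    for row in rows:            # remaining old data rows
        new_ws.append(row)

    wb.remove(old_ws)           # delete the old sheet...
    new_ws.title = sheet_name   # ...and take over its name
    return new_ws


wb = Workbook()
ws = wb.active
ws.title = "Sheet1"
ws.append(["header 1", "header 2"])
ws.append(["old", 1])

new_ws = insert_after_header(wb, "Sheet1", [["new", 0]])
result = list(new_ws.iter_rows(values_only=True))
```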

Alternatively, you could write your own functions for inserting rows or columns.
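As a minimal example of such a function, the sketch below shifts cell values down using only the public `ws.cell()` accessor, working from the bottom row upwards so that nothing is overwritten before it has been moved. `insert_blank_rows` is a hypothetical helper that handles values only (no styles, formulas or merged cells):

```python
from openpyxl import Workbook


def insert_blank_rows(ws, row_idx, cnt=1):
    """Insert cnt empty rows before row_idx (1-based) by moving values down."""
    max_row, max_col = ws.max_row, ws.max_column
    # Walk bottom-up so each source cell is read before it is overwritten.
    for row in range(max_row, row_idx - 1, -1):
        for col in range(1, max_col + 1):
            ws.cell(row=row + cnt, column=col).value = ws.cell(row=row, column=col).value
    # Clear the freshly opened rows.
    for row in range(row_idx, row_idx + cnt):
        for col in range(1, max_col + 1):
            ws.cell(row=row, column=col).value = None


wb = Workbook()
ws = wb.active
for value in ["header", "a", "b"]:
    ws.append([value])

insert_blank_rows(ws, 2)  # one empty row directly after the header
```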

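I had a lot of very simple worksheets that I needed to delete columns from. Since you asked for explicit examples, here is the function I quickly threw together to do this: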

    from openpyxl.cell import get_column_letter

    def ws_delete_column(sheet, del_column):

        for row_num in range(1, sheet.get_highest_row() + 1):
            for col_num in range(del_column, sheet.get_highest_column() + 1):

                coordinate = '%s%s' % (get_column_letter(col_num), row_num)
                adj_coordinate = '%s%s' % (get_column_letter(col_num + 1), row_num)

                # Handle Styles.
                # This is important to do if you have any differing
                # 'types' of data being stored, as you may otherwise get
                # an output Worksheet that's got improperly formatted cells.
                # Or worse, an error gets thrown because you tried to copy
                # a string value into a cell that's styled as a date.

                if adj_coordinate in sheet._styles:
                    sheet._styles[coordinate] = sheet._styles[adj_coordinate]
                    sheet._styles.pop(adj_coordinate, None)
                else:
                    sheet._styles.pop(coordinate, None)

                if adj_coordinate in sheet._cells:
                    sheet._cells[coordinate] = sheet._cells[adj_coordinate]
                    sheet._cells[coordinate].column = get_column_letter(col_num)
                    sheet._cells[coordinate].row = row_num
                    sheet._cells[coordinate].coordinate = coordinate

                    sheet._cells.pop(adj_coordinate, None)
                else:
                    sheet._cells.pop(coordinate, None)

        # sheet.garbage_collect()

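I pass it the worksheet I'm working with and the number of the column I want deleted, and away it goes. I know it isn't exactly what you wanted, but I hope this information helps!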
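With a newer openpyxl, the built-in `Worksheet.delete_cols(idx, amount=1)` should do the same job without reaching into the private `_cells` and `_styles` dictionaries. A minimal sketch:

```python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active
ws.append(["keep", "drop", "keep too"])
ws.append([1, 2, 3])

# Delete column B (index 2); column C shifts left into B.
ws.delete_cols(2)

values = [row for row in ws.iter_rows(values_only=True)]
```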

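EDIT: Noticed someone gave this another vote, and figured I should update it. The coordinate system in Openpyxl has gone through some changes over the past couple of years, introducing a coordinate attribute for items in _cells. This needs to be updated too, otherwise the rows will be left blank (instead of deleted) and Excel will throw an error about problems with the file. This works with Openpyxl 2.2.3 (untested with later versions).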

I took Dallas's solution and added support for merged cells:

    # Targets openpyxl 2.x (merged_cell_ranges, Cell.TYPE_FORMULA).
    # cells_from_range is a range-to-coordinates helper from the openpyxl version used
    # here; its import is not shown in the original and its location varies by version.
    import re

    from openpyxl.cell import Cell
    from openpyxl.utils import get_column_letter


    def insert_rows(self, row_idx, cnt, above=False, copy_style=True, fill_formulae=True):
        skip_list = []
        idx = row_idx - 1 if above else row_idx

        # Move existing rows down by cnt, starting from the bottom row.
        for (new, old) in zip(range(self.max_row + cnt, idx + cnt, -1), range(self.max_row, idx, -1)):
            for c_idx in range(1, self.max_column + 1):  # + 1: max_column is inclusive
                col = get_column_letter(c_idx)
                # print("Copying %s%d to %s%d." % (col, old, col, new))
                source = self["%s%d" % (col, old)]
                target = self["%s%d" % (col, new)]
                if source.coordinate in skip_list:
                    continue

                if source.coordinate in self.merged_cells:
                    # This is a merged cell
                    for _range in self.merged_cell_ranges:
                        merged_cells_list = [x for x in cells_from_range(_range)][0]
                        if source.coordinate in merged_cells_list:
                            skip_list = merged_cells_list
                            self.unmerge_cells(_range)
                            new_range = re.sub(str(old), str(new), _range)
                            self.merge_cells(new_range)
                            break

                if source.data_type == Cell.TYPE_FORMULA:
                    target.value = re.sub(
                        r"(\$?[A-Z]{1,3})%d" % (old),
                        lambda m: m.group(1) + str(new),
                        source.value
                    )
                else:
                    target.value = source.value
                target.number_format = source.number_format
                target.font = source.font.copy()
                target.alignment = source.alignment.copy()
                target.border = source.border.copy()
                target.fill = source.fill.copy()

        # Clear the newly opened rows, optionally copying style/formulae from the row above.
        idx = idx + 1
        for row in range(idx, idx + cnt):
            for c_idx in range(1, self.max_column + 1):
                col = get_column_letter(c_idx)
                # print("Clearing value in cell %s%d" % (col, row))
                cell = self["%s%d" % (col, row)]
                cell.value = None
                source = self["%s%d" % (col, row - 1)]
                if copy_style:
                    cell.number_format = source.number_format
                    cell.font = source.font.copy()
                    cell.alignment = source.alignment.copy()
                    cell.border = source.border.copy()
                    cell.fill = source.fill.copy()
                if fill_formulae and source.data_type == Cell.TYPE_FORMULA:
                    # print("Copying formula from cell %s%d to %s%d" % (col, row - 1, col, row))
                    cell.value = re.sub(
                        r"(\$?[A-Z]{1,3})%d" % (row - 1),
                        lambda m: m.group(1) + str(row),
                        source.value
                    )
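One caveat with the merged-cell handling above: `re.sub(str(old), str(new), _range)` replaces every occurrence of the digits, not just the row number (with `old = 1` and `new = 3`, `"A1:B10"` becomes `"A3:B30"`). A more targeted sketch that parses the range with openpyxl's `range_boundaries` and `get_column_letter` helpers; `shift_merged_range` is a hypothetical name, and it assumes a plain range string such as `"A5:C6"`. A range that straddles the insertion point is stretched so the new rows end up inside it, matching what the first answer's `CELL_RE` substitution does:

```python
from openpyxl.utils import get_column_letter
from openpyxl.utils.cell import range_boundaries


def shift_merged_range(range_string, row_idx, cnt):
    """Shift the parts of a merged range that lie below row_idx down by cnt rows."""
    min_col, min_row, max_col, max_row = range_boundaries(range_string)
    if min_row > row_idx:
        min_row += cnt
    if max_row > row_idx:
        max_row += cnt
    return "%s%d:%s%d" % (get_column_letter(min_col), min_row,
                          get_column_letter(max_col), max_row)
```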

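Unfortunately, there isn't really a better way than reading in the file and using a library like xlwt to write out a new Excel file (with your new row inserted at the top). Excel doesn't work like a database that you can read from and append to. You just have to read in the information, manipulate it in memory, and write out what is essentially a new file.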
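A minimal sketch of that read-modify-write cycle, using openpyxl for both reading and writing (xlwt only writes the older .xls format). `insert_row_into_file` is a hypothetical helper, and only cell values survive the round trip:

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook


def insert_row_into_file(src_path, dst_path, new_row, position=1):
    """Read all rows, insert new_row at a 0-based list position, write a new file."""
    rows = list(load_workbook(src_path).active.iter_rows(values_only=True))
    rows.insert(position, tuple(new_row))  # position 1 = right after the header

    out = Workbook()
    ws = out.active
    for row in rows:
        ws.append(row)
    out.save(dst_path)


folder = tempfile.mkdtemp()
src = os.path.join(folder, "in.xlsx")
dst = os.path.join(folder, "out.xlsx")

wb = Workbook()
wb.active.append(["header"])
wb.active.append(["old data"])
wb.save(src)

insert_row_into_file(src, dst, ["new data"])
result = list(load_workbook(dst).active.iter_rows(values_only=True))
```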

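An edit of Nick's solution: this version takes a starting row, the number of rows to insert, and a filename, and inserts the required number of blank rows.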

    #! python3
    # Usage: python <script> <start_row> <rows_to_insert> <workbook.xlsx>
    import sys

    import openpyxl

    my_start = int(sys.argv[1])  # first row of the blank block (1-based)
    my_rows = int(sys.argv[2])   # number of blank rows to insert
    str_wb = str(sys.argv[3])    # workbook file name

    wb = openpyxl.load_workbook(str_wb)
    old_sheet = wb['Sheet']
    mcol = old_sheet.max_column
    mrow = old_sheet.max_row
    old_sheet.title = 'Sheet1.5'  # the original sheet is kept under this name
    wb.create_sheet(index=0, title='Sheet')

    new_sheet = wb['Sheet']

    # Copy the rows above the insertion point unchanged.
    for row_num in range(1, my_start):
        for col_num in range(1, mcol + 1):
            new_sheet.cell(row=row_num, column=col_num).value = old_sheet.cell(row=row_num, column=col_num).value

    # Copy every remaining row shifted down by my_rows, leaving the gap blank.
    for row_num in range(my_start, mrow + 1):
        for col_num in range(1, mcol + 1):
            new_sheet.cell(row=(row_num + my_rows), column=col_num).value = old_sheet.cell(row=row_num, column=col_num).value

    wb.save(str_wb)