How to convert xls to xlsx

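I have some *.xls (Excel 2003) files, and I want to convert them to xlsx (Excel 2007).

I use the uno Python package. When I save a document I can set the filter name to MS Excel 97, but there is no filter name like "MS Excel 2007".

Please help: which filter name do I need to set to convert xls to xlsx?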
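For the uno route asked about here, a minimal sketch: LibreOffice Calc registers its Excel 2007+ export filter under the name `Calc MS Excel 2007 XML`, so passing that as `FilterName` to `storeToURL` should produce an xlsx file. The names `EXPORT_FILTERS`, `filter_for` and `store_as` are made up for illustration, and `document` is assumed to be a spreadsheet that was already loaded through a running office instance.

```python
import os

# Export filter names as registered by LibreOffice Calc.
EXPORT_FILTERS = {
    ".xls": "MS Excel 97",
    ".xlsx": "Calc MS Excel 2007 XML",
}


def filter_for(path):
    """Return the Calc export filter name matching the extension of `path`."""
    ext = os.path.splitext(path)[1].lower()
    try:
        return EXPORT_FILTERS[ext]
    except KeyError:
        raise ValueError("no export filter known for %r" % ext)


def store_as(document, path):
    """Save an already-loaded uno spreadsheet `document` to `path` (hypothetical helper)."""
    # These imports only work inside a Python that ships with the office suite.
    import uno
    from com.sun.star.beans import PropertyValue

    prop = PropertyValue()
    prop.Name = "FilterName"
    prop.Value = filter_for(path)
    url = uno.systemPathToFileUrl(os.path.abspath(path))
    document.storeToURL(url, (prop,))
```

Calling `store_as(document, "report.xlsx")` would then pick the 2007 filter from the file extension instead of hard-coding it.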

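I have had to do this before. The main idea is to open and parse the xls file with the xlrd module, and to write the contents to an xlsx file with the openpyxl module.

Here is my code. Note: it cannot handle complex xls files. If you are going to use it, you should add your own parsing logic.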

    import xlrd
    from openpyxl.workbook import Workbook


    def open_xls_as_xlsx(filename):
        # Updated for Python 3 and current openpyxl, whose cell indexes are
        # 1-based; the original used xrange() and 0-based indexes.
        # First open the file using xlrd.
        book = xlrd.open_workbook(filename)
        index = 0
        nrows, ncols = 0, 0
        # Pick the first sheet that actually contains data.
        while nrows * ncols == 0:
            sheet = book.sheet_by_index(index)
            nrows = sheet.nrows
            ncols = sheet.ncols
            index += 1

        # Prepare an xlsx sheet and copy the values across.
        book1 = Workbook()
        sheet1 = book1.active
        for row in range(nrows):
            for col in range(ncols):
                sheet1.cell(row=row + 1, column=col + 1).value = sheet.cell_value(row, col)

        return book1

You need to have win32com installed on your machine. Here is my code:

    import win32com.client as win32

    fname = "full+path+to+xls_file"
    excel = win32.gencache.EnsureDispatch('Excel.Application')
    wb = excel.Workbooks.Open(fname)

    # FileFormat = 51 is for the .xlsx extension
    # FileFormat = 56 is for the .xls extension
    wb.SaveAs(fname + "x", FileFormat=51)
    wb.Close()
    excel.Application.Quit()
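The numbers passed as `FileFormat` are values of Excel's `XlFileFormat` enumeration. A small lookup, sketched here with hypothetical helper names, makes the target path and format code explicit. It only computes the arguments for `SaveAs` and does not talk to Excel itself.

```python
import os

# Values from Excel's XlFileFormat enumeration.
XL_FILE_FORMATS = {
    ".xlsx": 51,  # xlOpenXMLWorkbook
    ".xlsm": 52,  # xlOpenXMLWorkbookMacroEnabled
    ".xls": 56,   # xlExcel8
}


def save_as_arguments(src_path, target_ext=".xlsx"):
    """Return (destination path, FileFormat code) for Workbook.SaveAs."""
    base, _ = os.path.splitext(src_path)
    return base + target_ext, XL_FILE_FORMATS[target_ext]
```

With this, the call above could become `dst, fmt = save_as_arguments(fname)` followed by `wb.SaveAs(dst, FileFormat=fmt)`.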

Ray's answer helped me a lot, but for those searching for a simple way to convert all sheets from an xls file to xlsx, I made this Gist:

    import xlrd
    from openpyxl.workbook import Workbook as openpyxlWorkbook

    # content holds the raw contents of the file, for example the result of an http.request(url).
    # You can also use a filepath by calling "xlrd.open_workbook(filepath)".
    xlsBook = xlrd.open_workbook(file_contents=content)
    workbook = openpyxlWorkbook()

    for i in range(xlsBook.nsheets):
        xlsSheet = xlsBook.sheet_by_index(i)
        sheet = workbook.active if i == 0 else workbook.create_sheet()
        sheet.title = xlsSheet.name

        for row in range(xlsSheet.nrows):
            for col in range(xlsSheet.ncols):
                # xlrd indexes are 0-based, current openpyxl indexes are 1-based.
                sheet.cell(row=row + 1, column=col + 1).value = xlsSheet.cell_value(row, col)

    # The new xlsx file is in "workbook", without iterators (iter_rows).
    # For iteration, use "for row in worksheet.rows:".
    # For range iteration, use 'for row in worksheet["{}:{}".format(startCell, endCell)]:'.

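You can find the xlrd library and openpyxl here (you have to download xlrd into your project, for example).

Here is my solution, which does not take fonts, charts, or images into account: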

 $ pip install pyexcel pyexcel-xls pyexcel-xlsx 

Then do this:

    import pyexcel as p

    p.save_book_as(file_name='your-file-in.xls',
                   dest_file_name='your-new-file-out.xlsx')

If you do not need a program, you can install an additional package, pyexcel-cli:

    $ pip install pyexcel-cli
    $ pyexcel transcode your-file-in.xls your-new-file-out.xlsx

The transcoding procedure above uses xlrd and openpyxl.

None of the answers here was 100% right for me, so I am posting my code here:

    import xlrd
    from openpyxl.workbook import Workbook


    def cvt_xls_to_xlsx(src_file_path, dst_file_path):
        book_xls = xlrd.open_workbook(src_file_path)
        book_xlsx = Workbook()

        sheet_names = book_xls.sheet_names()
        for sheet_index in range(0, len(sheet_names)):
            sheet_xls = book_xls.sheet_by_name(sheet_names[sheet_index])
            if sheet_index == 0:
                # "active" is a property, not a method; reuse the default sheet.
                sheet_xlsx = book_xlsx.active
                sheet_xlsx.title = sheet_names[sheet_index]
            else:
                sheet_xlsx = book_xlsx.create_sheet(title=sheet_names[sheet_index])

            # xlrd indexes are 0-based, openpyxl indexes are 1-based.
            for row in range(0, sheet_xls.nrows):
                for col in range(0, sheet_xls.ncols):
                    sheet_xlsx.cell(row=row + 1, column=col + 1).value = sheet_xls.cell_value(row, col)

        book_xlsx.save(dst_file_path)

I have improved the performance of @Jackypengyu's method.

Merged cells are converted as well.

Results

Converting the same 12 files in the same order:

Original

    0:00:01.958159
    0:00:02.115891
    0:00:02.018643
    0:00:02.057803
    0:00:01.267079
    0:00:01.308073
    0:00:01.245989
    0:00:01.289295
    0:00:01.273805
    0:00:01.276003
    0:00:01.293834
    0:00:01.261401

Improved

    0:00:00.774101
    0:00:00.734749
    0:00:00.741434
    0:00:00.744491
    0:00:00.320796
    0:00:00.279045
    0:00:00.315829
    0:00:00.280769
    0:00:00.316380
    0:00:00.289196
    0:00:00.347819
    0:00:00.284242
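To put a single number on the two timing lists, the following sketch sums each run and divides the totals. The timings are copied from the lists above; the helper name `parse_elapsed` is made up for illustration.

```python
from datetime import timedelta


def parse_elapsed(text):
    """Parse an 'H:MM:SS.ffffff' string into a number of seconds."""
    hours, minutes, seconds = text.split(":")
    delta = timedelta(hours=int(hours), minutes=int(minutes), seconds=float(seconds))
    return delta.total_seconds()


original = """
0:00:01.958159 0:00:02.115891 0:00:02.018643 0:00:02.057803
0:00:01.267079 0:00:01.308073 0:00:01.245989 0:00:01.289295
0:00:01.273805 0:00:01.276003 0:00:01.293834 0:00:01.261401
""".split()

improved = """
0:00:00.774101 0:00:00.734749 0:00:00.741434 0:00:00.744491
0:00:00.320796 0:00:00.279045 0:00:00.315829 0:00:00.280769
0:00:00.316380 0:00:00.289196 0:00:00.347819 0:00:00.284242
""".split()

original_total = sum(parse_elapsed(t) for t in original)
improved_total = sum(parse_elapsed(t) for t in improved)
speedup = original_total / improved_total

print("original: %.3f s, improved: %.3f s, speedup: %.2fx"
      % (original_total, improved_total, speedup))
```

For these 12 files the totals work out to a speedup of roughly 3.4x.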

    import datetime

    import openpyxl
    import xlrd


    def _get_xlrd_cell_value(cell):
        """Return the cell value, converting Excel dates to datetime objects."""
        value = cell.value
        if cell.ctype == xlrd.XL_CELL_DATE:
            value = datetime.datetime(*xlrd.xldate_as_tuple(value, 0))
        return value


    def cvt_xls_to_xlsx(*args, **kw):
        """Open and convert XLS file to openpyxl.workbook.Workbook object

        @param args: args for xlrd.open_workbook
        @param kw: kwargs for xlrd.open_workbook
        @return: openpyxl.workbook.Workbook
        """
        book_xls = xlrd.open_workbook(*args, formatting_info=True, ragged_rows=True, **kw)
        book_xlsx = openpyxl.workbook.Workbook()

        sheet_names = book_xls.sheet_names()
        for sheet_index in range(len(sheet_names)):
            sheet_xls = book_xls.sheet_by_name(sheet_names[sheet_index])

            if sheet_index == 0:
                sheet_xlsx = book_xlsx.active
                sheet_xlsx.title = sheet_names[sheet_index]
            else:
                sheet_xlsx = book_xlsx.create_sheet(title=sheet_names[sheet_index])

            # xlrd ranges are 0-based with exclusive upper bounds;
            # openpyxl ranges are 1-based with inclusive bounds.
            for crange in sheet_xls.merged_cells:
                rlo, rhi, clo, chi = crange
                sheet_xlsx.merge_cells(
                    start_row=rlo + 1, end_row=rhi,
                    start_column=clo + 1, end_column=chi,
                )

            # Appending whole rows is faster than setting cells one by one.
            for row in range(sheet_xls.nrows):
                sheet_xlsx.append((
                    _get_xlrd_cell_value(cell)
                    for cell in sheet_xls.row_slice(row, end_colx=sheet_xls.row_len(row))
                ))

        return book_xlsx

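Ray's answer, as originally posted with 0-based cell indexes, cropped the first row and the last column of the data. Here is my modified solution (for Python 3):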

    import xlrd
    from openpyxl.workbook import Workbook


    def open_xls_as_xlsx(filename):
        # first open using xlrd
        book = xlrd.open_workbook(filename)
        index = 0
        nrows, ncols = 0, 0
        while nrows * ncols == 0:
            sheet = book.sheet_by_index(index)
            nrows = sheet.nrows + 1   # bm added +1
            ncols = sheet.ncols + 1   # bm added +1
            index += 1

        # prepare a xlsx sheet
        book1 = Workbook()
        sheet1 = book1.active
        for row in range(1, nrows):
            for col in range(1, ncols):
                # bm added -1's: openpyxl is 1-based, xlrd is 0-based
                sheet1.cell(row=row, column=col).value = sheet.cell_value(row - 1, col - 1)

        return book1
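The `+1` and `-1` adjustments exist because xlrd addresses cells with 0-based row and column indexes, while current openpyxl releases use 1-based ones. A minimal, library-free sketch of that mapping follows; `xlrd_to_openpyxl` and `to_a1` are hypothetical helpers, not part of either library.

```python
def xlrd_to_openpyxl(row, col):
    """Map 0-based (row, col) indexes to the 1-based pair openpyxl expects."""
    return row + 1, col + 1


def to_a1(row, col):
    """Turn 0-based (row, col) indexes into an A1-style reference."""
    letters = ""
    n = col + 1
    # Column letters are a bijective base-26 number: A..Z, AA..AZ, ...
    while n:
        n, remainder = divmod(n - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return "%s%d" % (letters, row + 1)
```

The first cell xlrd reports, `(0, 0)`, is `A1` in the spreadsheet and `(1, 1)` for openpyxl, which is why a loop that passes xlrd indexes straight through ends up one row and one column off.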

I tried @Jon Anderson's solution and it worked well, but cells with a time-only format such as HH:mm:ss (no date) raised a "year is out of range" error. So I improved the algorithm again:

    import datetime

    import openpyxl
    import xlrd


    def _get_xlrd_cell_value(cell):
        """Return the cell value, converting Excel dates and times."""
        value = cell.value
        if cell.ctype == xlrd.XL_CELL_DATE:
            datetime_tup = xlrd.xldate_as_tuple(value, 0)
            if datetime_tup[0:3] == (0, 0, 0):  # time format without date
                value = datetime.time(*datetime_tup[3:])
            else:
                value = datetime.datetime(*datetime_tup)
        return value


    def xls_to_xlsx(*args, **kw):
        """Open and convert an XLS file to openpyxl.workbook.Workbook

        @param args: args for xlrd.open_workbook
        @param kw: kwargs for xlrd.open_workbook
        @return: openpyxl.workbook.Workbook object
        """
        book_xls = xlrd.open_workbook(*args, formatting_info=True, ragged_rows=True, **kw)
        book_xlsx = openpyxl.workbook.Workbook()

        sheet_names = book_xls.sheet_names()
        for sheet_index in range(len(sheet_names)):
            sheet_xls = book_xls.sheet_by_name(sheet_names[sheet_index])

            if sheet_index == 0:
                sheet_xlsx = book_xlsx.active
                sheet_xlsx.title = sheet_names[sheet_index]
            else:
                sheet_xlsx = book_xlsx.create_sheet(title=sheet_names[sheet_index])

            for crange in sheet_xls.merged_cells:
                rlo, rhi, clo, chi = crange
                sheet_xlsx.merge_cells(start_row=rlo + 1, end_row=rhi,
                                       start_column=clo + 1, end_column=chi)

            for row in range(sheet_xls.nrows):
                sheet_xlsx.append((
                    _get_xlrd_cell_value(cell)
                    for cell in sheet_xls.row_slice(row, end_colx=sheet_xls.row_len(row))
                ))

        return book_xlsx

After that it worked perfectly!
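The error comes from how Excel stores dates and times: as a serial number whose integer part counts days and whose fractional part is the time of day, so a time-only cell has no valid year. The library-free sketch below shows the same branching. `excel_serial_to_python` is a hypothetical name, and the code assumes the 1900 date system and, for dates, serial numbers from March 1900 onward.

```python
import datetime

# Day zero of the 1900 date system, valid for serials from 1900-03-01 on.
_EPOCH = datetime.datetime(1899, 12, 30)


def excel_serial_to_python(value):
    """Convert an Excel serial number to datetime.time or datetime.datetime."""
    if value < 0:
        raise ValueError("negative serial numbers are not supported")
    if value < 1:
        # No day component: treat the fraction as a time of day.
        seconds = min(int(round(value * 86400)), 86399)
        hours, rest = divmod(seconds, 3600)
        minutes, secs = divmod(rest, 60)
        return datetime.time(hours, minutes, secs)
    return _EPOCH + datetime.timedelta(days=value)
```

A cell holding `0.5` becomes `datetime.time(12, 0)` instead of triggering an attempt to build a `datetime` with year 0.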

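A simple solution

I needed a simple solution to convert xls to xlsx format. There are many answers here, but they do some "magic" that I do not fully understand.

chfw provided a simple solution, but it is not complete.

Install the dependencies

Use pip to install them: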

 pip install pyexcel-cli pyexcel-xls pyexcel-xlsx 

Run

All styles and macros will be lost, but the data stays intact.

For a single file

 pyexcel transcode your-file-in.xls your-new-file-out.xlsx 

For all files in a folder, a one-liner

    for file in *.xls; do echo "Transcoding $file"; pyexcel transcode "$file" "${file}x"; done
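The same folder loop can be sketched in Python. `convert_folder` is a hypothetical helper that only handles the file discovery and naming; the conversion itself is delegated to whatever callable is passed in, so the sketch stays independent of any particular library.

```python
from pathlib import Path


def convert_folder(folder, convert):
    """Call convert(src, dst) for every .xls file in `folder`.

    Returns the list of (source name, destination name) pairs processed.
    """
    pairs = []
    for src in sorted(Path(folder).glob("*.xls")):
        dst = src.with_suffix(".xlsx")
        print("Transcoding %s" % src.name)
        convert(str(src), str(dst))
        pairs.append((src.name, dst.name))
    return pairs
```

With pyexcel installed as described above, a call could look like `convert_folder(".", lambda src, dst: pyexcel.save_book_as(file_name=src, dest_file_name=dst))`.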