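Append a DataFrame to Excel with pandas
I would like to append a DataFrame to an Excel file.
This code almost works the way I want, except that it never appends. When I run it, it puts the DataFrame into Excel, but running it again does not add the data a second time. I have also heard that openpyxl is CPU-intensive, but I have not come across many workarounds.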
    import pandas
    from openpyxl import load_workbook

    book = load_workbook('C:\\OCC.xlsx')
    writer = pandas.ExcelWriter('C:\\OCC.xlsx', engine='openpyxl')
    writer.book = book
    writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
    df1.to_excel(writer, index = False)
    writer.save()
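I want the data to be appended every time I run the script, and that is not happening.
After one run, the output looks like the original data: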
ABC HHH
After the second run I want:
ABC HHH HHH
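Apologies if this is obvious. I am new to Python, and the examples I practiced with did not work as I needed.
The question is: how can I append the data on every run? I tried switching to xlsxwriter, but got AttributeError: 'Workbook' object has no attribute 'add_format'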
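First, this post covers the first part of the solution, which is to specify startrow=: "Append existing excel sheet with new dataframe using python pandas".
You might also consider header=False, so that the column names are not written again with each appended block. The call should then look like: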
    df1.to_excel(writer, startrow=2, index=False, header=False)
If you want it to find the end of the sheet automatically and append your df there, use:
startrow = writer.sheets['Sheet1'].max_row
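Why `max_row` works as `startrow` may not be obvious: openpyxl reports `max_row` as the 1-based number of the last used row, while `startrow` in `to_excel` is 0-based, so the same number points at the first free row. A minimal sketch of that off-by-one, assuming only openpyxl is installed (the workbook is built in memory and the variable names are illustrative):

```python
from openpyxl import Workbook

# Build a small in-memory worksheet with two used rows.
wb = Workbook()
ws = wb.active
ws.append(["A", "B", "C"])  # lands in 1-based row 1
ws.append(["H", "H", "H"])  # lands in 1-based row 2

# max_row is the 1-based number of the last used row.
last_used_row = ws.max_row

# pandas' startrow is 0-based, so passing last_used_row unchanged
# targets 1-based row last_used_row + 1, i.e. the first free row.
next_free_startrow = last_used_row
```

One caveat: openpyxl also reports `max_row` as 1 for a completely empty sheet, so a brand-new sheet may need `startrow=0` handled as a special case.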
If you want it to do this for every sheet in the workbook:
    for sheetname in writer.sheets:
        df1.to_excel(writer, sheet_name=sheetname,
                     startrow=writer.sheets[sheetname].max_row,
                     index=False, header=False)
By the way, for writer.sheets you could use a dict comprehension (I think it is cleaner, but that is up to you; it produces the same output):
writer.sheets = {ws.title: ws for ws in book.worksheets}
So the full code would be:
    import pandas
    from openpyxl import load_workbook

    # load the existing workbook and hand it to the writer
    book = load_workbook('test.xlsx')
    writer = pandas.ExcelWriter('test.xlsx', engine='openpyxl')
    writer.book = book
    writer.sheets = {ws.title: ws for ws in book.worksheets}

    # append df1 below the last used row of every sheet
    for sheetname in writer.sheets:
        df1.to_excel(writer, sheet_name=sheetname,
                     startrow=writer.sheets[sheetname].max_row,
                     index=False, header=False)

    writer.save()
Here is a helper function:
    import pandas as pd


    def append_df_to_excel(filename, df, sheet_name='Sheet1', startrow=None,
                           **to_excel_kwargs):
        """
        Append a DataFrame [df] to existing Excel file [filename]
        into [sheet_name] Sheet.
        If [filename] doesn't exist, then this function will create it.

        Parameters:
          filename : File path (Example: '/path/to/file.xlsx')
          df : dataframe to save to workbook
          sheet_name : Name of sheet which will contain DataFrame.
                       (default: 'Sheet1')
          startrow : upper left cell row to dump data frame.
                     Per default (startrow=None) calculate the last row
                     in the existing DF and write to the next row...
          to_excel_kwargs : arguments which will be passed to
                            `DataFrame.to_excel` [can be dictionary]

        Returns: None
        """
        from openpyxl import load_workbook

        # ignore [engine] parameter if it was passed
        to_excel_kwargs.pop('engine', None)

        # create a writer for the target workbook
        writer = pd.ExcelWriter(filename, engine='openpyxl')

        try:
            # try to open an existing workbook
            writer.book = load_workbook(filename)

            # get the last row in the existing Excel sheet
            # if it was not specified explicitly
            if startrow is None and sheet_name in writer.book.sheetnames:
                startrow = writer.book[sheet_name].max_row

            # copy existing sheets
            writer.sheets = {ws.title: ws for ws in writer.book.worksheets}
        except FileNotFoundError:
            # file does not exist yet, we will create it
            pass

        if startrow is None:
            startrow = 0

        # write out the new sheet
        df.to_excel(writer, sheet_name=sheet_name, startrow=startrow,
                    **to_excel_kwargs)

        # save the workbook
        writer.save()
Usage examples:
    filename = r'C:\OCC.xlsx'

    append_df_to_excel(filename, df)
    append_df_to_excel(filename, df, header=None, index=False)
    append_df_to_excel(filename, df, sheet_name='Sheet2', index=False)
    append_df_to_excel(filename, df, sheet_name='Sheet2', index=False, startrow=25)
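The snippets in this thread assign to `writer.book` and call `writer.save()`, which newer pandas releases no longer allow (`ExcelWriter.save` was removed in pandas 2.0). On pandas 1.4 or later, the same append can be sketched with `mode="a"` and `if_sheet_exists="overlay"`. This is only a sketch under those version assumptions, not the approach from the answers; `append_rows`, the `name` column and the temporary `demo.xlsx` file are hypothetical names used for illustration.

```python
import os
import tempfile

import pandas as pd


def append_rows(path, df, sheet_name="Sheet1"):
    """Hypothetical helper: write df below the existing rows of sheet_name."""
    if not os.path.exists(path):
        # First run: create the file, including the header row.
        df.to_excel(path, sheet_name=sheet_name, index=False, engine="openpyxl")
        return
    # mode="a" loads the existing workbook; "overlay" writes into the
    # existing sheet instead of raising an error or creating a new sheet.
    with pd.ExcelWriter(path, engine="openpyxl", mode="a",
                        if_sheet_exists="overlay") as writer:
        if sheet_name in writer.sheets:
            # 1-based last used row == 0-based index of the first free row
            startrow, header = writer.sheets[sheet_name].max_row, False
        else:
            startrow, header = 0, True
        df.to_excel(writer, sheet_name=sheet_name, startrow=startrow,
                    index=False, header=header)


# Demo in a temporary directory: two runs should yield the data twice.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.xlsx")
    df = pd.DataFrame({"name": ["ABC", "HHH"]})
    append_rows(path, df)
    append_rows(path, df)
    result = pd.read_excel(path, sheet_name="Sheet1")
```

Using the writer as a context manager saves and closes the file on exit, so no explicit save call is needed.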