Merging Excel files with pandas in Python overwrites the first column
I have many Excel files, and I want to append them into one file using the code below:
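```python
import glob
import os

import pandas as pd
import openpyxl  # not used directly; pandas loads it as the .xlsx engine

df = []
for f in glob.glob("*.xlsx"):
    data = pd.read_excel(f, sheet_name='Sheet1')
    data.index = [os.path.basename(f)] * len(data)
    df.append(data)
df = pd.concat(df)

# writer.save() was removed in pandas 2.0; the context manager saves the file on exit
with pd.ExcelWriter('output.xlsx') as writer:
    df.to_excel(writer, sheet_name='Sheet1')
```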
The Excel files have this structure:
The output looks like this:
Why does the first column change when I concatenate the Excel files?
I think you need:
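A minimal sketch of what the question's loop does, using two small in-memory DataFrames as hypothetical stand-ins for the Excel sheets (so no files are needed): assigning `data.index` replaces each frame's original 0..n-1 row labels with the file name, so after `pd.concat` the per-file row numbers are gone, and `to_excel` writes the file names as the first column.

```python
import pandas as pd

# Hypothetical data standing in for two Excel sheets.
sheets = {
    "a.xlsx": pd.DataFrame({"col": [10, 20]}),
    "b.xlsx": pd.DataFrame({"col": [30]}),
}

frames = []
for name, data in sheets.items():
    data = data.copy()
    # Same assignment as in the question: every row label becomes the file name,
    # so the original 0..n-1 labels are discarded.
    data.index = [name] * len(data)
    frames.append(data)

result = pd.concat(frames)
# The index now holds only file names; the original row numbers are lost.
print(result)
```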
```python
import glob
import os

import pandas as pd

df = []
for f in glob.glob("*.xlsx"):
    data = pd.read_excel(f, sheet_name='Sheet1')
    name = os.path.basename(f)
    # create a MultiIndex so the original index is not overwritten
    data.index = pd.MultiIndex.from_product([[name], data.index], names=('files', 'orig'))
    df.append(data)
# reset_index turns both MultiIndex levels into regular columns
df = pd.concat(df).reset_index()
```
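The same MultiIndex idea can be checked without any Excel files. This sketch (with hypothetical in-memory frames in place of `pd.read_excel`) shows that the file name goes to level `files`, the original row label is kept on level `orig`, and `reset_index()` exposes both as columns.

```python
import pandas as pd

# Hypothetical frames standing in for the sheets read from each file.
sheets = {
    "a.xlsx": pd.DataFrame({"col": [10, 20]}),
    "b.xlsx": pd.DataFrame({"col": [30]}),
}

frames = []
for name, data in sheets.items():
    data = data.copy()
    # Two-level index: file name on the outer level, original row label on the inner one.
    data.index = pd.MultiIndex.from_product([[name], data.index], names=("files", "orig"))
    frames.append(data)

# Both index levels become ordinary columns, so nothing is overwritten.
result = pd.concat(frames).reset_index()
print(result)
```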
Another solution is to use the `keys` parameter of `concat`:
```python
files = glob.glob("*.xlsx")
names = [os.path.basename(f) for f in files]
dfs = [pd.read_excel(f, sheet_name='Sheet1') for f in files]
df = pd.concat(dfs, keys=names).rename_axis(('files', 'orig')).reset_index()
```
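To illustrate what `keys=` does, here is a small sketch with hypothetical in-memory frames instead of files: `pd.concat` prepends one outer index level holding the key of each input frame, while each frame's own index stays as the inner level. `rename_axis` then names the two levels before `reset_index` moves them into columns.

```python
import pandas as pd

# Hypothetical file names and frames standing in for pd.read_excel results.
names = ["a.xlsx", "b.xlsx"]
dfs = [pd.DataFrame({"col": [10, 20]}), pd.DataFrame({"col": [30]})]

# keys= adds an outer index level with one key per input frame.
stacked = pd.concat(dfs, keys=names)
# Name both index levels, then turn them into regular columns.
result = stacked.rename_axis(("files", "orig")).reset_index()
print(result)
```

The result has the same `files`, `orig` and data columns as the MultiIndex loop, without building the index by hand.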
This is equivalent to:
```python
df = []
names = []
for f in glob.glob("*.xlsx"):
    df.append(pd.read_excel(f, sheet_name='Sheet1'))
    names.append(os.path.basename(f))
df = pd.concat(df, keys=names).rename_axis(('files', 'orig')).reset_index()
```
Finally, write to Excel without the index and without the column header row:
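```python
# writer.save() was removed in pandas 2.0; the context manager saves the file on exit
with pd.ExcelWriter('output.xlsx') as writer:
    df.to_excel(writer, sheet_name='Sheet1', index=False, header=False)
```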