KeyError when setting the index of a pandas DataFrame
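When I try to set the index of my DataFrame, I get a KeyError. I have set an index the same way before without hitting this, so I'm wondering what went wrong. The data has no column headers, so the DataFrame's column labels are 0, 1, 2, 4, 5, etc. The error occurs for every column label.
I get KeyError: '0' when I try to use the first column, which is the only one I want as the index.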
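For context: in the example below, I select macro-enabled Excel spreadsheets, squeeze the data (dropping leading empty cells), read it, and convert it to a DataFrame.
Then I want to add the filename as a column, set the index, and strip whitespace so that I can use index labels to extract the data I need. Not every worksheet has these index labels, so I use a try/except to skip the worksheets whose index does not contain them. Finally, I want to concatenate the results into a single DataFrame and drop the unused columns.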
import itertools
import glob
from openpyxl import load_workbook
from pandas import DataFrame
import pandas as pd
import os

def get_data(ws):
    for row in ws.values:
        row_it = iter(row)
        for cell in row_it:
            if cell is not None:
                yield itertools.chain((cell,), row_it)
                break

def read_workbook(file_):
    wb = load_workbook(file_, data_only=True)
    for sheet in wb.worksheets:
        ws = sheet
    return DataFrame(get_data(ws))

path =r'dir'
allFiles = glob.glob(path + "/*.xlsm")
frame = pd.DataFrame()
list_ = []
for file_ in allFiles:
    parsed_file = read_workbook(file_)
    parsed_file['filename'] = os.path.basename(file_)
    parsed_file.set_index(['0'], inplace = True)
    parsed_file.index.str.strip()
    try:
        parsed_file.loc["Staff" : "Total"].copy()
        list_.append(parsed_file)
    except KeyError:
        pass

frame = pd.concat(list_)
print(frame.dropna(axis='columns', thresh=2, inplace = True))
Sample DataFrame, showing the desired index position and the labels to extract between:
index  0       1  2
0      5       2  4
1      RTJHD   5  9
2      ABCD    4  6
3      Staff   9  3   --- extract from here
4      FHDHSK  3  2
5      IRRJWK  7  1
6      FJDDCN  1  8
7      67      4  7
8      Total   5  3   --- to here
Error:
Traceback (most recent call last):
  File "<ipython-input-29-d8fd24ca84ec>", line 1, in <module>
    runfile('dir.py', wdir='C:/dir/Documents')
  File "C:\ProgramData\Anaconda2\lib\site-packages\spyder\utils\site\sitecustomize.py", line 880, in runfile
    execfile(filename, namespace)
  File "C:\ProgramData\Anaconda2\lib\site-packages\spyder\utils\site\sitecustomize.py", line 87, in execfile
    exec(compile(scripttext, filename, 'exec'), glob, loc)
  File "dir.py", line 36, in <module>
    parsed_file.set_index(['0'], inplace = True)
  File "C:\ProgramData\Anaconda2\lib\site-packages\pandas\core\frame.py", line 2830, in set_index
    level = frame[col]._values
  File "C:\ProgramData\Anaconda2\lib\site-packages\pandas\core\frame.py", line 1964, in __getitem__
    return self._getitem_column(key)
  File "C:\ProgramData\Anaconda2\lib\site-packages\pandas\core\frame.py", line 1971, in _getitem_column
    return self._get_item_cache(key)
  File "C:\ProgramData\Anaconda2\lib\site-packages\pandas\core\generic.py", line 1645, in _get_item_cache
    values = self._data.get(item)
  File "C:\ProgramData\Anaconda2\lib\site-packages\pandas\core\internals.py", line 3590, in get
    loc = self.items.get_loc(item)
  File "C:\ProgramData\Anaconda2\lib\site-packages\pandas\core\indexes\base.py", line 2444, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas\_libs\index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5280)
  File "pandas\_libs\index.pyx", line 154, in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5126)
  File "pandas\_libs\hashtable_class_helper.pxi", line 1210, in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas\_libs\hashtable.c:20523)
  File "pandas\_libs\hashtable_class_helper.pxi", line 1218, in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas\_libs\hashtable.c:20477)
KeyError: '0'
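You are getting this error because your DataFrame was read in without any headers. This means your column labels are integers held in an Int64Index, so the first column is the integer 0, not the string '0':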
Int64Index([0, 1, 2, 3, ...], dtype='int64')
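To make the mismatch concrete, here is a minimal sketch using a small made-up frame (the rows are illustrative, not the asker's real data). It assumes only that the frame is built without column names, as in the question. Depending on the pandas version, the default columns may print as a RangeIndex rather than an Int64Index, but the labels are integers either way.

```python
import pandas as pd

# Hypothetical stand-in for one parsed worksheet: no column names supplied.
df = pd.DataFrame([["Staff", 9, 3], ["FHDHSK", 3, 2], ["Total", 5, 3]])

# The default column labels are integers, not strings.
print(df.columns.tolist())
print(0 in df.columns)      # integer label is present
print("0" in df.columns)    # string label is not

# Asking for the string '0' therefore fails with a KeyError.
caught = None
try:
    df.set_index(["0"])
except KeyError as exc:
    caught = exc
print(type(caught).__name__)
```

Passing the integer label instead, as in `df.set_index([0])`, would avoid the error.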
At this point, I'd recommend accessing df.columns by position wherever you are forced to deal with them:
parsed_file.set_index(parsed_file.columns[0], inplace = True)
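As a rough sketch of how this fits into the rest of the loop in the question, the example below sets the index by position and then slices between the two labels. The sample rows are invented for illustration. Note that `Index.str.strip()` and `.loc[...]` both return new objects, so their results are assigned here; in the question's code they are discarded, which may not be what was intended.

```python
import pandas as pd

# Hypothetical worksheet contents with stray whitespace around some labels.
df = pd.DataFrame([
    [" RTJHD", 5, 9],
    ["Staff ", 9, 3],
    ["FHDHSK", 3, 2],
    [" Total", 5, 3],
    ["AFTER", 1, 1],
])

# Use whatever label the first column happens to have.
df = df.set_index(df.columns[0])

# str.strip() returns a new Index, so assign it back.
df.index = df.index.str.strip()

# Label-based slicing with .loc includes both endpoints.
subset = df.loc["Staff":"Total"]
print(subset)
```

If either label is missing from a sheet, the `.loc` slice raises a KeyError, which is what the try/except in the question relies on.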
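If you are accessing columns by position, don't hard-code their names. Another option is to assign column names of your own and then refer to those.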
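The second option might look like the sketch below. The name "label" is a hypothetical choice, and the rows are made up. Only the first column is renamed, on the assumption that different worksheets may have different numbers of columns.

```python
import pandas as pd

# Hypothetical parsed worksheet with default integer column labels.
df = pd.DataFrame([["Staff", 9, 3], ["FHDHSK", 3, 2], ["Total", 5, 3]])

# Give the first column a name of our own ("label" is an illustrative choice).
df = df.rename(columns={df.columns[0]: "label"})

# The column can now be referenced by that name.
df = df.set_index("label")
print(df)
```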