Creating parameter arrays that correspond to the column headers in pandas

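I have a table in Excel whose header row holds numeric parameters. It looks like the table below. I only need the cells in columns A to E and want to ignore everything else. As you can see, column F also has an entry in the header row, but I need to select only the specific cells mentioned above to iterate over.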

       A              B    C          D    E                         F
    1  50             30   10         5    1                         String
    2  Oval, Round    NaN  Irregular  NaN  NaN                       String2
    3  Circumscribed  NaN  NaN        NaN  Obscured, Microlobulated
    4  High density   NaN  Equal      NaN  Fat-containing

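For each row I need to build two arrays that correspond to the column headers. For example, for the second row the output should be these two arrays: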

    prob_arr = [50, 50, 10]
    val_arr = ['Oval', 'Round', 'Irregular']

And for the third row it should be:

    prob_arr = [50, 1, 1]
    val_arr = ['Circumscribed', 'Obscured', 'Microlobulated']

This is the function I have right now:

    def concatvals(row, col, width, start, stop):
        prob_head = list(df)[start:stop]
        for i in range(width):
            value_temp = df.iloc[row, col]
            if isinstance(value_temp, float) is False:
                value = [x.strip() for x in value_temp.split(',')]
                len_val = len(value)
                prob_arr = [prob_head[i] for _ in range(len_val)]
                val_arr = [value[x] for x in range(len_val)]
            col += 1
        randparameter = random.choices(val_arr, prob_arr, k=1)
        return randparameter

But it does not build the arrays correctly. Any suggestions?

    import pandas as pd


    def concatvals(df, row_idx, col_start_idx, col_end_idx):
        """
        Input parameter `df` is table data as `pd.DataFrame`.
        Input parameter `row_idx` is index of requested dataframe row as `int`.
        Input parameter `col_start_idx` is index of first requested column as `int`.
        Input parameter `col_end_idx` is index of last requested column as `int`.
        """
        # Initialize return variables as empty lists
        prob_arr = []
        val_arr = []

        # Extract slice from a single dataframe row as Series object
        row = df.iloc[row_idx, col_start_idx:col_end_idx + 1]

        # Iterate through all header-value pairs of the row Series
        for header, value in row.items():
            # If value is a string (NaN cells are floats and are skipped)
            if isinstance(value, str):
                # Split string value upon commas
                subs = [x.strip() for x in value.split(',')]
                # Append current header to return list
                # (as many times as there are strings in `subs`)
                prob_arr += len(subs) * [header]
                # Append comma-delimited strings to return list
                val_arr += subs

        return prob_arr, val_arr


    if __name__ == '__main__':
        # Read excel worksheet into dataframe
        # (the first worksheet row becomes the column headers)
        df = pd.read_excel('test.xlsx')

        # Convert first data row (which has row index 0)
        prob_arr1, val_arr1 = concatvals(df, row_idx=0, col_start_idx=0, col_end_idx=4)
        print(prob_arr1)
        print(val_arr1)

        # Convert second data row (which has row index 1)
        prob_arr2, val_arr2 = concatvals(df, row_idx=1, col_start_idx=0, col_end_idx=4)
        print(prob_arr2)
        print(val_arr2)

This gives the output:

    [50, 50, 10]
    ['Oval', 'Round', 'Irregular']
    [50, 1, 1]
    ['Circumscribed', 'Obscured', 'Microlobulated']
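Since the function in the question ends with a `random.choices` call, here is a rough sketch of how the two arrays could feed a weighted draw. It makes a few assumptions: the worksheet is replaced by an in-memory DataFrame so it runs without `test.xlsx`, the numeric headers are assumed to be read as integers, and `row_to_arrays` is a hypothetical compact variant of `concatvals` defined only to keep the example self-contained.

```python
import random

import pandas as pd

nan = float("nan")

# In-memory stand-in for the worksheet
# (assumption: read_excel would yield these integer headers)
df = pd.DataFrame(
    [
        ["Oval, Round", nan, "Irregular", nan, nan, "String2"],
        ["Circumscribed", nan, nan, nan, "Obscured, Microlobulated", nan],
        ["High density", nan, "Equal", nan, "Fat-containing", nan],
    ],
    columns=[50, 30, 10, 5, 1, "String"],
)


def row_to_arrays(df, row_idx, col_start_idx, col_end_idx):
    """Hypothetical helper: return (weights, values) for one dataframe row."""
    prob_arr, val_arr = [], []
    row = df.iloc[row_idx, col_start_idx:col_end_idx + 1]
    for header, value in row.items():
        # NaN cells are floats, so only string cells contribute values
        if isinstance(value, str):
            subs = [s.strip() for s in value.split(",")]
            prob_arr += [header] * len(subs)
            val_arr += subs
    return prob_arr, val_arr


prob_arr, val_arr = row_to_arrays(df, row_idx=0, col_start_idx=0, col_end_idx=4)

# Weighted draw: each value's chance is proportional to its column header
rand_parameter = random.choices(val_arr, weights=prob_arr, k=1)[0]
print(prob_arr, val_arr, rand_parameter)
```

Note that values sharing a cell each receive the full header weight (both 'Oval' and 'Round' get 50), which matches the arrays requested in the question. If the weight should instead be split among them, the header would need to be divided by `len(subs)`.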