Concatenate the values of several columns into one and then edit them (pandas, python)

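I'm looking for a way to use pandas and Python to combine several columns of an Excel sheet, whose column names are known, into a single new column that keeps all the important information, as in the example below: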

Input:

    ID,tp_c,tp_b,tp_p
    0,transportation - cars,transportation - boats,transportation - planes
    1,checked,-,-
    2,-,checked,-
    3,checked,checked,-
    4,-,checked,checked
    5,checked,checked,checked

Desired output:

    ID,tp_all
    0,transportation
    1,cars
    2,boats
    3,cars+boats
    4,boats+planes
    5,cars+boats+planes

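The row with ID 0 contains a description of each column's contents. Ideally, the code would parse the descriptions in that second row, look at the part after the " - ", and concatenate those values in the new "tp_all" column.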
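To try the answers below without the original workbook, the sample can be loaded into a DataFrame. This is a minimal sketch, assuming the sheet has been exported to the CSV text shown above (with the real file, `pd.read_excel` would replace `pd.read_csv`); the names `csv_text` and `labels` are only illustrative. It also shows the parsing step the question asks for: splitting each row-0 description on `" - "`.

```python
import io

import pandas as pd

# Hypothetical CSV text reproducing the sample sheet from the question
csv_text = """ID,tp_c,tp_b,tp_p
0,transportation - cars,transportation - boats,transportation - planes
1,checked,-,-
2,-,checked,-
3,checked,checked,-
4,-,checked,checked
5,checked,checked,checked
"""

df = pd.read_csv(io.StringIO(csv_text))

# Row 0 holds "category - item" descriptions; keep the part after " - "
labels = {col: df.loc[0, col].split(" - ")[1] for col in df.columns[1:]}
print(labels)  # {'tp_c': 'cars', 'tp_b': 'boats', 'tp_p': 'planes'}
```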

OK, a more dynamic approach:

    import pandas as pd

    # df is the sheet from the question, e.g. loaded with pd.read_excel(...)
    # get a list of the columns and remove the 'ID' column
    col_list = list(df.columns)
    col_list.remove('ID')

    # create a dict as a lookup: column name -> text after ' - ' in row 0
    col_dict = dict(zip(col_list, [df.iloc[0][col].split(' - ')[1] for col in col_list]))
    # col_dict -> {'tp_c': 'cars', 'tp_b': 'boats', 'tp_p': 'planes'}

    # define a func that tests the values and uses the dict to build our string
    def func(x):
        temp = ''
        for col in col_list:
            if x[col] == 'checked':
                if len(temp) == 0:
                    temp = col_dict[col]
                else:
                    temp = temp + '+' + col_dict[col]
        return temp

    # apply to every row except the description row (row 0 gets NaN in 'combined')
    df['combined'] = df[1:].apply(func, axis=1)

    # keep the data rows and the two columns we need
    # (.ix has been removed from pandas; .loc does the same label-based selection here)
    df = df.loc[1:, ['ID', 'combined']]
    print(df)
    #    ID           combined
    # 1   1               cars
    # 2   2              boats
    # 3   3         cars+boats
    # 4   4       boats+planes
    # 5   5  cars+boats+planes

This is quite interesting, as it's a reverse get_dummies…

I think I'd munge the column names by hand so that you have a boolean DataFrame:

    In [11]: df1  # df == 'checked'
    Out[11]:
        cars  boats  planes
    0
    1   True  False   False
    2  False   True   False
    3   True   True   False
    4  False   True    True
    5   True   True    True
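The answer doesn't show how `df1` is built. One possible way, assuming the question's layout (descriptions in row 0, `'checked'` marking a hit, the sheet already exported to CSV), is to compare the data rows with `'checked'` and take the short labels from row 0 as the new column names:

```python
import io

import pandas as pd

# Sample from the question (assumed to be available as CSV text)
csv_text = """ID,tp_c,tp_b,tp_p
0,transportation - cars,transportation - boats,transportation - planes
1,checked,-,-
2,-,checked,-
3,checked,checked,-
4,-,checked,checked
5,checked,checked,checked
"""
df = pd.read_csv(io.StringIO(csv_text))

# Boolean frame: True where a cell says 'checked' (data rows only, ID column dropped)
df1 = df.iloc[1:, 1:] == "checked"
# Rename the columns using the text after " - " in the row-0 descriptions
df1.columns = [desc.split(" - ")[1] for desc in df.iloc[0, 1:]]
print(df1)
```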

Now you can apply with a zip:

    In [12]: df1.apply(lambda row: '+'.join([col for col, b in zip(df1.columns, row) if b]), axis=1)
    Out[12]:
    0
    1                 cars
    2                boats
    3           cars+boats
    4         boats+planes
    5    cars+boats+planes
    dtype: object

Now you just need to adjust the headers to get the desired CSV.
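A minimal sketch of that last step, assuming `df1` is the boolean frame shown above (rebuilt literally here so the snippet runs on its own) and that the `0,transportation` row from the desired output should be kept. It prints the CSV text; writing a file would be `out.to_csv(path, index=False)` with a path of your choosing.

```python
import pandas as pd

# Boolean frame as in the answer (index = ID, columns = short labels)
df1 = pd.DataFrame(
    {
        "cars": [True, False, True, False, True],
        "boats": [False, True, True, True, True],
        "planes": [False, False, False, True, True],
    },
    index=[1, 2, 3, 4, 5],
)

combined = df1.apply(
    lambda row: "+".join([col for col, b in zip(df1.columns, row) if b]), axis=1
)

# Put the category row back on top, then turn the index into an 'ID' column
out = (
    pd.concat([pd.Series(["transportation"], index=[0]), combined])
    .rename_axis("ID")
    .reset_index(name="tp_all")
)

print(out.to_csv(index=False))
```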

It would be nice if there were a less manual / faster way of doing a reverse get_dummies…
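One vectorized alternative that is often suggested for this kind of reverse get_dummies (it is not part of the original answers) is a matrix product of the boolean frame with the column labels. In an object-dtype dot product, `True * 'cars+'` is `'cars+'` and `False * 'cars+'` is `''`, so each row's sum is the concatenation of its checked labels. A sketch, assuming `df1` has the boolean layout shown above; it avoids calling a Python lambda per row, but whether it is actually faster depends on the data.

```python
import pandas as pd

# Boolean frame in the same layout as df1 above
df1 = pd.DataFrame(
    {
        "cars": [True, False, True, False, True],
        "boats": [False, True, True, True, True],
        "planes": [False, False, False, True, True],
    },
    index=[1, 2, 3, 4, 5],
)

# Row-wise sum of flag * "label+" (e.g. "cars+boats+"), then drop the trailing "+"
combined = df1.dot(df1.columns + "+").str.rstrip("+")
print(combined)
```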

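Here's one way:

    import pandas

    # assumes d has the descriptions ("transportation - cars", ...) as column headers
    newCol = pandas.Series('', index=d.index)
    for col in d.iloc[:, 1:]:  # .iloc replaces the removed .ix; skips the ID column
        name = '+' + col.split('-')[1].strip()
        newCol[d[col] == 'checked'] += name
    newCol = newCol.str.strip('+')

Then:

    >>> newCol
    0                 cars
    1                boats
    2           cars+boats
    3         boats+planes
    4    cars+boats+planes
    dtype: object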

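You can use this column to create a new DataFrame, or manipulate it however you like.

Edit: I see you've edited your question so that the names of the modes of transport are now in row 0 instead of in the column headers. It would be easier if they were in the column headers (as my answer assumes), and your new column headers don't seem to contain any other useful information, so you should probably start by setting the column names from the information in row 0 and then deleting row 0.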
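A sketch of that suggestion, assuming the question's layout is available as CSV text: copy the row-0 descriptions into the column headers (keeping `ID` as the first name, since row 0's ID cell is not a description), drop row 0, and the loop from this answer then runs unchanged.

```python
import io

import pandas as pd

# Sample from the question (assumed to be available as CSV text)
csv_text = """ID,tp_c,tp_b,tp_p
0,transportation - cars,transportation - boats,transportation - planes
1,checked,-,-
2,-,checked,-
3,checked,checked,-
4,-,checked,checked
5,checked,checked,checked
"""
d = pd.read_csv(io.StringIO(csv_text))

# Move the row-0 descriptions into the headers, then drop row 0
d.columns = ["ID"] + list(d.iloc[0, 1:])
d = d.iloc[1:].reset_index(drop=True)

# The loop from the answer above
newCol = pd.Series("", index=d.index)
for col in d.iloc[:, 1:]:
    name = "+" + col.split("-")[1].strip()
    newCol[d[col] == "checked"] += name
newCol = newCol.str.strip("+")
print(newCol)
```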
