Sorting xls file contents by multiple columns of different data types

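I have to sort the contents of an xls file in ascending order by 4 columns.

I converted the contents of the xls file into a list. The input is shown below.

Input: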

    data = """ABC, Do not Consider1, 101, Title and Subtitle, Do not Consider2, 30/12/2015
    ABC, Do not Consider1, 100, Title and Subtitle, Do not Consider2, 31/12/2015
    ABC, Do not Consider1, 99, BIC Codes, Do not Consider2, 31/12/2015
    ABC, Do not Consider1, 98, Title and Subtitle, Do not Consider2, 25/12/2015
    ABC, Do not Consider1, 100, ATitle and Subtitle, Do not Consider2, 30/12/2015
    XYZ, Do not Consider1, 100, ATitle and Subtitle, Do not Consider2, 30/12/2015
    XYZ, Do not Consider1, 100, ATitle and Subtitle, Do not Consider2, 30/12/2015
    ABC, Do not Consider1, 100, Title and Subtitle, Do not Consider2, 30/12/2015"""

Expected output, in string format:

    data = """ABC, Do not Consider1, 98, Title and Subtitle, Do not Consider2, 25/12/2015
    ABC, Do not Consider1, 99, BIC Codes, Do not Consider2, 31/12/2015
    ABC, Do not Consider1, 100, ATitle and Subtitle, Do not Consider2, 30/12/2015
    ABC, Do not Consider1, 100, Title and Subtitle, Do not Consider2, 30/12/2015
    ABC, Do not Consider1, 100, Title and Subtitle, Do not Consider2, 31/12/2015
    ABC, Do not Consider1, 101, Title and Subtitle, Do not Consider2, 30/12/2015
    XYZ, Do not Consider1, 100, ATitle and Subtitle, Do not Consider2, 30/12/2015
    XYZ, Do not Consider1, 100, ATitle and Subtitle, Do not Consider2, 30/12/2015
    """

First, I split the data into a list of lists:

    # Split data to list.
    >>> data_list = [i.split(", ") for i in data.split("\n")]
    >>> print "\n".join([", ".join(i) for i in data_list])
    ABC, Do not Consider1, 101, Title and Subtitle, Do not Consider2, 30/12/2015
    ABC, Do not Consider1, 100, Title and Subtitle, Do not Consider2, 31/12/2015
    ABC, Do not Consider1, 99, BIC Codes, Do not Consider2, 31/12/2015
    ABC, Do not Consider1, 98, Title and Subtitle, Do not Consider2, 25/12/2015
    ABC, Do not Consider1, 100, ATitle and Subtitle, Do not Consider2, 30/12/2015
    XYZ, Do not Consider1, 100, ATitle and Subtitle, Do not Consider2, 30/12/2015
    XYZ, Do not Consider1, 100, ATitle and Subtitle, Do not Consider2, 30/12/2015
    ABC, Do not Consider1, 100, Title and Subtitle, Do not Consider2, 30/12/2015

The sorting requirements are as follows:

 - Sort by index 0.
 - If index 0 has the same value for multiple items, then sort by index 2.
 - If index 0 and index 2 are the same for multiple items, then sort by index 3.
 - If index 0, index 2 and index 3 are the same for multiple items, then sort by index 5.

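My logic is:

  1. Create a string from index 0, index 2, index 3 and index 5.
  2. Build a dictionary using the string from step 1 as the key.
  3. Sort the list of keys with the sorted function.
  4. Create the xls file again.

Code: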

    >>> from collections import defaultdict
    >>> data_dict = defaultdict(list)
    >>> for i in data_list:
    ...     key = "%s%s%s%s" % (i[0].strip(), i[2].strip(), i[3].strip(), i[5].strip())
    ...     data_dict[key].append(i)
    ...
    >>> sorted_keys = sorted(data_dict.keys())
    >>>
    >>> for i in sorted_keys:
    ...     for j in data_dict[i]:
    ...         print j
    ...
    ['ABC', 'Do not Consider1', '100', 'ATitle and Subtitle', 'Do not Consider2', '30/12/2015']
    ['ABC', 'Do not Consider1', '100', 'Title and Subtitle', 'Do not Consider2', '30/12/2015']
    ['ABC', 'Do not Consider1', '100', 'Title and Subtitle', 'Do not Consider2', '31/12/2015']
    ['ABC', 'Do not Consider1', '101', 'Title and Subtitle', 'Do not Consider2', '30/12/2015']
    ['ABC', 'Do not Consider1', '98', 'Title and Subtitle', 'Do not Consider2', '25/12/2015 ']
    ['ABC', 'Do not Consider1', '99', 'BIC Codes', 'Do not Consider2', '31/12/2015']
    ['XYZ', 'Do not Consider1', '100', 'ATitle and Subtitle', 'Do not Consider2', '30/12/2015']
    ['XYZ', 'Do not Consider1', '100', 'ATitle and Subtitle', 'Do not Consider2', '30/12/2015']

But index 2 contains numbers and index 5 contains dates, so sorting on the concatenated string key does not give correctly sorted data.

Can you help me solve this?
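To illustrate the problem described in the question, here is a minimal sketch (written for Python 3) comparing string ordering with numeric and date ordering. The numbers come from the data above; the `01/01/2016` date is a hypothetical value added only to expose the date issue, and the dd/mm/yyyy format is assumed from the sample rows.

```python
from datetime import datetime

numbers = ["101", "100", "99", "98"]

# Strings compare character by character, so "100" sorts before "98".
as_text = sorted(numbers)
# Converting to int in the key gives numeric order.
as_numbers = sorted(numbers, key=int)

# "01/01/2016" is a hypothetical value, not part of the original data.
dates = ["31/12/2015", "25/12/2015", "01/01/2016"]
# As text, the day field dominates, so the 2016 date sorts first.
dates_as_text = sorted(dates)
# Parsing to datetime gives chronological order.
dates_parsed = sorted(dates, key=lambda d: datetime.strptime(d, "%d/%m/%Y"))

print(as_text)
print(as_numbers)
print(dates_as_text)
print(dates_parsed)
```

Any key built by concatenating the raw column strings inherits both problems, which is why the columns need to be converted before they are compared.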

You can use the sorted function to sort by multiple keys, as follows:

    sorted_list = sorted(data_list, key=lambda item: (item[0], int(item[2]), item[3]))
    print "\n".join([", ".join(i) for i in sorted_list])

This returns:

    ABC, Do not Consider1, 98, Title and Subtitle, Do not Consider2, 25/12/2015
    ABC, Do not Consider1, 99, BIC Codes, Do not Consider2, 31/12/2015
    ABC, Do not Consider1, 100, ATitle and Subtitle, Do not Consider2, 30/12/2015
    ABC, Do not Consider1, 100, Title and Subtitle, Do not Consider2, 31/12/2015
    ABC, Do not Consider1, 100, Title and Subtitle, Do not Consider2, 30/12/2015
    ABC, Do not Consider1, 101, Title and Subtitle, Do not Consider2, 30/12/2015
    XYZ, Do not Consider1, 100, ATitle and Subtitle, Do not Consider2, 30/12/2015
    XYZ, Do not Consider1, 100, ATitle and Subtitle, Do not Consider2, 30/12/2015

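The trick is to have your key lambda return a tuple containing all the values to sort by, and to convert the value at index 2 (the third column) to an integer with the int() function.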
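The key above stops at index 3, so rows that tie on the first three keys simply keep their input order (sorted() is stable). A minimal sketch of a key that also covers index 5, assuming the dates are always in dd/mm/yyyy format; the `sort_key` name and the shortened rows are illustrative only, and the code is written for Python 3:

```python
from datetime import datetime


def sort_key(row):
    """Build a tuple key: index 0, index 2 as int, index 3, index 5 as date."""
    return (
        row[0].strip(),
        int(row[2]),
        row[3].strip(),
        # strip() guards against stray whitespace around the date
        datetime.strptime(row[5].strip(), "%d/%m/%Y"),
    )


# Shortened rows with the same six-column layout as the question's data
rows = [
    ["ABC", "x", "100", "Title and Subtitle", "y", "31/12/2015"],
    ["ABC", "x", "100", "Title and Subtitle", "y", "30/12/2015"],
    ["ABC", "x", "98", "Title and Subtitle", "y", "25/12/2015 "],
    ["ABC", "x", "100", "ATitle and Subtitle", "y", "30/12/2015"],
]

sorted_rows = sorted(rows, key=sort_key)
for row in sorted_rows:
    print(", ".join(row))
```

Parsing the date instead of comparing it as text matters once the data spans more than one month or year, since dd/mm/yyyy strings do not sort chronologically.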

You should be able to do what you need with a single sorted() call. The csv module can be used to parse the data:

    import csv
    import StringIO

    data = """ABC, Do not Consider1, 101, Title and Subtitle, Do not Consider2, 30/12/2015
    ABC, Do not Consider1, 100, Title and Subtitle, Do not Consider2, 31/12/2015
    ABC, Do not Consider1, 99, BIC Codes, Do not Consider2, 31/12/2015
    ABC, Do not Consider1, 98, Title and Subtitle, Do not Consider2, 25/12/2015
    ABC, Do not Consider1, 100, ATitle and Subtitle, Do not Consider2, 30/12/2015
    XYZ, Do not Consider1, 100, ATitle and Subtitle, Do not Consider2, 30/12/2015
    XYZ, Do not Consider1, 100, ATitle and Subtitle, Do not Consider2, 30/12/2015
    ABC, Do not Consider1, 100, Title and Subtitle, Do not Consider2, 30/12/2015"""

    csv_input = csv.reader(StringIO.StringIO(data), skipinitialspace=True)
    rows = sorted(list(csv_input), key=lambda x: (x[0], int(x[2]), x[3], x[5]))

    for row in rows:
        print row

This gives you the following:

    ['ABC', 'Do not Consider1', '98', 'Title and Subtitle', 'Do not Consider2', '25/12/2015 ']
    ['ABC', 'Do not Consider1', '99', 'BIC Codes', 'Do not Consider2', '31/12/2015']
    ['ABC', 'Do not Consider1', '100', 'ATitle and Subtitle', 'Do not Consider2', '30/12/2015']
    ['ABC', 'Do not Consider1', '100', 'Title and Subtitle', 'Do not Consider2', '30/12/2015']
    ['ABC', 'Do not Consider1', '100', 'Title and Subtitle', 'Do not Consider2', '31/12/2015']
    ['ABC', 'Do not Consider1', '101', 'Title and Subtitle', 'Do not Consider2', '30/12/2015']
    ['XYZ', 'Do not Consider1', '100', 'ATitle and Subtitle', 'Do not Consider2', '30/12/2015']
    ['XYZ', 'Do not Consider1', '100', 'ATitle and Subtitle', 'Do not Consider2', '30/12/2015']
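The code in this answer is written for Python 2. On Python 3 the `StringIO` module no longer exists (the class lives in `io`) and `print` is a function. A rough port, using a shortened copy of the data for brevity, might look like this:

```python
import csv
import io

# Shortened sample; the rows follow the same layout as the question's data
data = """ABC, Do not Consider1, 101, Title and Subtitle, Do not Consider2, 30/12/2015
ABC, Do not Consider1, 99, BIC Codes, Do not Consider2, 31/12/2015
XYZ, Do not Consider1, 100, ATitle and Subtitle, Do not Consider2, 30/12/2015
ABC, Do not Consider1, 98, Title and Subtitle, Do not Consider2, 25/12/2015"""

# skipinitialspace=True drops the space that follows each comma
csv_input = csv.reader(io.StringIO(data), skipinitialspace=True)
# Same key as the answer: index 5 is still compared as plain text here
rows = sorted(csv_input, key=lambda x: (x[0], int(x[2]), x[3], x[5]))

for row in rows:
    print(row)
```

Since the last key element compares the date as text, this ordering is only chronological while the day field happens to decide it, as in the sample data.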