How do I convert rows into repeated column-based data?

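I'm trying to take a data set like this:

[Image: source format of the data]

and convert the records into the following format:

[Image: destination format]

The resulting format would have two columns: one for the old column names and one for the values. If there are 10,000 rows, the new format should have 10,000 groups of data.

I'm open to any approach: Excel formulas, SQL (MySQL), or straight Ruby code would all work for me. What is the best way to tackle this problem?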

Just for fun:

    # Input file format is tab-separated values:
    # name  search_term  address       code
    # Jim   jim          jim_address   123
    # Bob   bob          bob_address   124
    # Lisa  lisa         lisa_address  126
    # Mona  mona         mona_address  129
    infile = File.open("inputfile.tsv")
    headers = infile.readline.strip.split("\t")
    puts headers.inspect
    of = File.new("outputfile.tsv", "w")
    infile.each_line do |line|
      row = line.chomp.split("\t")   # chomp drops the trailing newline
      headers.each_with_index do |key, index|
        of.puts "#{key}\t#{row[index]}"
      end
    end
    of.close
    infile.close

    # A nicer way; on my machine it does 1.6M rows in about 17 sec
    File.open("inputfile.tsv") do |in_file|
      headers = in_file.readline.strip.split("\t")
      File.open("outputfile.tsv", "w") do |out_file|
        in_file.each_line do |line|
          row = line.chomp.split("\t")
          headers.each_with_index do |key, index|
            # << does not add a newline the way puts does, so append one
            out_file << key << "\t" << row[index] << "\n"
          end
        end
      end
    end
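If you want to try the same idea without touching files, here is a minimal sketch that works on a string held in memory. The method name `unpivot_tsv` and the sample data are my own placeholders, not part of the answer above, and it assumes plain tab-separated text with no quoted or escaped tabs.

```ruby
# Hypothetical helper: turns tab-separated text (header row first) into
# one "column_name<TAB>value" line per cell.
def unpivot_tsv(text)
  lines = text.lines.map(&:chomp).reject(&:empty?)
  return "" if lines.empty?

  headers = lines.shift.split("\t")
  out = []
  lines.each do |line|
    row = line.split("\t", -1) # -1 keeps trailing empty cells
    headers.each_with_index do |key, index|
      out << "#{key}\t#{row[index]}"
    end
  end
  out.empty? ? "" : out.join("\n") + "\n"
end

# Made-up sample in the same shape as the input described above
sample = "name\tsearch_term\taddress\tcode\n" \
         "Jim\tjim\tjim_address\t123\n" \
         "Bob\tbob\tbob_address\t124\n"
puts unpivot_tsv(sample)
```

Building the whole result in memory is fine for small inputs; for millions of rows the streaming version above is probably the better fit.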

You can add an ID column to the left of your data and use the reverse PivotTable method.

  • Press Alt + D, then P to open the PivotTable Wizard, and go through these steps:

     1.  Multiple Consolidation Ranges
     2a. I will create the page fields
     2b. Range: e.g. sheet1!A1:A4
         How Many Page Fields: 0
     3.  Existing Worksheet: H1

  • In the PivotTable:

     Uncheck Row and Column from the Field List
     Double-click the Grand Total cell

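For comparison, here is a rough Ruby sketch of what the reverse pivot produces, assuming the first column is the ID column you added. The method name `reverse_pivot` and the sample rows are hypothetical, and this only approximates the ID / column / value layout Excel generates.

```ruby
# Hypothetical helper mirroring the reverse pivot: the first column is
# treated as the ID, and every other cell becomes an [id, column, value] row.
def reverse_pivot(headers, rows)
  value_headers = headers.drop(1) # every header except the ID column
  rows.flat_map do |row|
    id = row.first
    value_headers.each_with_index.map do |column, index|
      [id, column, row[index + 1]]
    end
  end
end

# Made-up sample data with an ID column on the left
headers = ["id", "name", "code"]
rows = [["1", "Jim", "123"], ["2", "Bob", "124"]]
reverse_pivot(headers, rows).each { |triple| puts triple.join("\t") }
```

Keeping the ID in each output row is what lets you tell which source row a value came from once everything is stacked into one long list.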

    source_path = "source.tsv"            # placeholder: path of the file to read
    destination_path = "destination.tsv"  # placeholder: path of the file to append to

    File.open(destination_path, 'a') do |d|     # choose the destination file and open it
      File.open(source_path, 'r') do |s|        # choose the source file and open it
        headers = s.readline.strip.split("\t")  # grab the first row of the source file to use as headers
        s.each_line do |line|                   # iterate over each line from the source
          current_line = line.strip.split("\t") # create an array from the current line
          current_line.each_with_index do |cell, index| # iterate over each cell of the current line
            # build each new line as one big string: "header"<TAB>"value"
            final_new_line = "\"#{headers[index]}\"\t\"#{cell}\"\n"
            d.write(final_new_line)             # write the final line to the destination file
          end
        end
      end
    end