Match elements in multiple columns from two files, then update or merge them

File 1:

    77, 4, -3, A0080
    235, 5, -1, K0511

File 2:

    A0132, 77, -1, -2, 19.776
    00000, 77, 4, -3, 18.608,
    A0794, 235, -2, -2, 22.81
    A0796, 235, -2, -5, 12.27
    00000, 235, 5, -1, 18.992

Desired output:

    A0132, 77, -1, -2, 19.776
    A0080, 77, 4, -3, 18.608,
    A0794, 235, -2, -2, 22.81
    A0796, 235, -2, -5, 12.27
    K0511, 235, 5, -1, 18.992

Basically, match column 1, column 2 and column 3 of file1 against column 2, column 3 and column 4 of file2. If they match, replace column 1 of file2 with the value of column 4 from file1.

I used:

    awk 'FNR==NR {a[$1,$2,$3]++;next} a[$2,$3,$4] {print $0}' file1 file2

and got this output:

    00000, 77, 4, -3, 18.608,
    00000, 235, 5, -1, 18.992

Then I got stuck. Please help. By the way, this is for 2 files; how would this work in general for more than 2 files?

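Apparently there are some issues with the spaces around the fields: once the line is split on commas, each field keeps the space that followed the comma. This complicates things a bit, because you need the `$field+=0` trick to overcome it (forcing the field to a number strips the stray spaces).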
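To see why the `+= 0` trick matters, here is a minimal sketch. The sample line and the variable names `raw` and `normalized` are made up for illustration. It prints the second field in brackets before and after the numeric conversion, so the leading space is visible.

```shell
# Hypothetical one-line input, used only to illustrate the point.
line='00000, 77, 4, -3, 18.608'

# Without normalization, the second field keeps the space after the comma.
raw=$(printf '%s\n' "$line" | awk -F',' '{ print "[" $2 "]" }')

# Adding 0 forces a numeric conversion, which drops the surrounding blanks.
normalized=$(printf '%s\n' "$line" | awk -F',' '{ $2 += 0; print "[" $2 "]" }')

echo "raw:        $raw"
echo "normalized: $normalized"
```

As strings, ` 77` and `77` are different array subscripts, which is why the keys from the two files only line up after both sides have been converted.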

You can try this:

    awk -F"," -v OFS="," 'FNR==NR {$1+=0; $2+=0; $3+=0; a[$1,$2,$3]=$4; next} {$2+=0; $3+=0; $4+=0; if (($2,$3,$4) in a) {$1=a[$2,$3,$4]} print}' f1 f2

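Basically, while reading the first file it stores the value of the 4th column in an array indexed by the (1st, 2nd, 3rd) columns. Then, while reading the second file, it checks whether the 2nd, 3rd and 4th columns form an index that exists in that array; if so, it replaces the first field with the stored value.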

For your given input, it returns:

    $ awk -F"," -v OFS="," 'FNR==NR {$1+=0; $2+=0; $3+=0; a[$1,$2,$3]=$4; next} {$2+=0; $3+=0; $4+=0; if (($2,$3,$4) in a) {$1=a[$2,$3,$4]} print}' f1 f2
    A0132,77,-1,-2, 19.776
    A0080,77,4,-3, 18.608,
    A0794,235,-2,-2, 22.81
    A0796,235,-2,-5, 12.27
    K0511,235,5,-1, 18.992
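The same lookup idea can be written out as a commented, self-contained script. This is only a sketch under a few assumptions: the key columns are numeric, the helper name `merge_ids` and the scratch directory are hypothetical, and, unlike the one-liner above, it converts the keys only inside the lookup expression (so the original spacing of file2 is kept) and trims the blanks around the replacement ID.

```shell
# Recreate the sample files from the question in a scratch directory.
workdir=$(mktemp -d)
cat > "$workdir/file1" <<'EOF'
77, 4, -3, A0080
235, 5, -1, K0511
EOF
cat > "$workdir/file2" <<'EOF'
A0132, 77, -1, -2, 19.776
00000, 77, 4, -3, 18.608,
A0794, 235, -2, -2, 22.81
A0796, 235, -2, -5, 12.27
00000, 235, 5, -1, 18.992
EOF

# merge_ids is a hypothetical helper name, not part of the answer above.
merge_ids() {
    awk -F',' -v OFS=',' '
        FNR == NR {                           # first file: build the lookup table
            id = $4
            gsub(/^[ \t]+|[ \t]+$/, "", id)   # trim blanks around the ID
            ids[$1 + 0, $2 + 0, $3 + 0] = id  # numeric keys ignore stray spaces
            next
        }
        {                                     # second file: look up columns 2-4
            if (($2 + 0, $3 + 0, $4 + 0) in ids)
                $1 = ids[$2 + 0, $3 + 0, $4 + 0]
            print
        }
    ' "$1" "$2"
}

result=$(merge_ids "$workdir/file1" "$workdir/file2")
printf '%s\n' "$result"
rm -rf "$workdir"
```

Because only `$1` is ever assigned, unmatched lines are printed exactly as read, and matched lines are rebuilt with `,` as the output separator while the other fields keep their leading spaces.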

    awk 'FILENAME==ARGV[1]{max++;a1[FNR]=$1;a2[FNR]=$2;a3[FNR]=$3;a4[FNR]=$4;next} {done=0;for (i=1;i<=max;i++) {if ($2==a1[i] && $3==a2[i] && $4==a3[i]) {$1=""; print a4[i]","$0; done=1; break}}; if (done==0){ print}}' file1 file2

Or, easier to read:

    awk 'FILENAME==ARGV[1] {       ## process file 1
        max++                      ## keep track of how many entries are in file 1
        a1[FNR]=$1                 ## build separate arrays for each field we care about
        a2[FNR]=$2
        a3[FNR]=$3
        a4[FNR]=$4
        next                       ## skip to the next record
    }
    {
        done=0                     ## set a flag so we know when we have no match
        for (i=1; i<=max; i++) {   ## loop over all array entries from file 1
            if ($2==a1[i] && $3==a2[i] && $4==a3[i]) {   ## if columns match in our pairing
                $1=""              ## get rid of column 1
                print a4[i]","$0   ## print file 1 column 4, then column 2 onward of file 2
                done=1             ## set the flag so we know we had a match
                break              ## leave the loop, no need to waste time processing more
            }
        }
        if (done==0) {             ## if we did not match, print the existing file 2 line
            print
        }
    }' file1 file2

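If you want to extend this to more files, you can add more clauses that test FILENAME against the corresponding ARGV entry (this will of course change depending on the logic you want). Likewise, if you want it automated and flexible, you could build the command in a shell loop and execute it with eval: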

    awk 'FILENAME==ARGV[1]{a[FNR]=$0;a1[FNR]=$1;a2[FNR]=$2;a3[FNR]=$3;a4[FNR]=$4;next}
         FILENAME==ARGV[2]{b[FNR]=$0;b2[FNR]=$2;b3[FNR]=$3;b4[FNR]=$4;next}
         FILENAME==ARGV[3]{print "hi" a1[FNR] b2[FNR]}' file1 file2 file3
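One possible way to generalize without writing a clause per file is sketched below. It assumes that every argument except the last is a lookup file with the file1 layout and that the last argument is the file to update; the helper name `merge_many` and the file names `lookup1`, `lookup2` and `target` are hypothetical. It uses the array-lookup technique rather than the per-line loop.

```shell
# Small sample inputs, split across two lookup files for illustration.
workdir=$(mktemp -d)
printf '%s\n' '77, 4, -3, A0080'  > "$workdir/lookup1"
printf '%s\n' '235, 5, -1, K0511' > "$workdir/lookup2"
cat > "$workdir/target" <<'EOF'
A0132, 77, -1, -2, 19.776
00000, 77, 4, -3, 18.608,
00000, 235, 5, -1, 18.992
EOF

# merge_many is a hypothetical helper: all arguments but the last are lookups.
merge_many() {
    awk -F',' -v OFS=',' '
        FILENAME != ARGV[ARGC - 1] {          # any file that is not the last one
            id = $4
            gsub(/^[ \t]+|[ \t]+$/, "", id)   # trim blanks around the ID
            ids[$1 + 0, $2 + 0, $3 + 0] = id
            next
        }
        {                                     # last file: apply the replacements
            if (($2 + 0, $3 + 0, $4 + 0) in ids)
                $1 = ids[$2 + 0, $3 + 0, $4 + 0]
            print
        }
    ' "$@"
}

many=$(merge_many "$workdir/lookup1" "$workdir/lookup2" "$workdir/target")
printf '%s\n' "$many"
rm -rf "$workdir"
```

If the same key appears in several lookup files, the file read last wins, which may or may not be the behaviour you want.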

Updated to handle the data listed in the comments:

    awk 'FILENAME==ARGV[1]{max++;a[FNR]=$0;a1[FNR]=$1;a2[FNR]=$2;a3[FNR]=$3;a4[FNR]=$4;$1="";$2="";$3="";$4="";a[FNR]=$0;gsub(",+$","",a[FNR]);next} {done=0;for (i=1;i<=max;i++) {if ($2==a1[i] && $3==a2[i] && $4==a3[i]) {$1=""; gsub(",+$","",$0);gsub(" +","",a[i]);print " "a4[i]$0","a[i]; done=1; break}}; if (done==0){ print}}' file1 file2

The changes append the trailing fields from file 1 and tidy up some cosmetics:

    awk 'FILENAME==ARGV[1] {
        ## save $0 in new array
        max++; a[FNR]=$0; a1[FNR]=$1; a2[FNR]=$2; a3[FNR]=$3; a4[FNR]=$4
        ## skip the first fields of new array up to field 4 and rid the trailing comma
        $1=""; $2=""; $3=""; $4=""; a[FNR]=$0; gsub(",+$","",a[FNR])
        next
    }
    {
        done=0
        for (i=1; i<=max; i++) {
            if ($2==a1[i] && $3==a2[i] && $4==a3[i]) {
                $1=""
                gsub(",+$","",$0); gsub(" +","",a[i])   ## rid unnecessary whitespace
                ## print the rest of file 1 line entry
                print " "a4[i]$0","a[i]
                done=1
                break
            }
        }
        if (done==0) { print }
    }' file1 file2

This might work for you (GNU sed):

    sed -r 's|^(.*,)\s*(.*)|s/^(.*,) \1/\2, \1/|' file1 | sed -rf - file2

It builds a sed script from file1 and runs it against file2.
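To make the two-stage idea concrete, the sketch below prints only the intermediate script that the first sed generates from the sample file1 lines. It assumes GNU sed (for `-r` and `\s`), and the variable name `generated` is hypothetical.

```shell
# Feed the two sample file1 lines to the generator stage only.
generated=$(printf '%s\n' '77, 4, -3, A0080' '235, 5, -1, K0511' |
    sed -r 's|^(.*,)\s*(.*)|s/^(.*,) \1/\2, \1/|')

# Each input line becomes one s/// command for the second sed.
printf '%s\n' "$generated"
```

Each generated command looks for a line whose first field is followed by the three key columns and rewrites it so that the ID from file1 comes first.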

    cat file1 file2 \
    | sed -n 'H;${x
    :cycle
    # \n
    #:11
    # 77, 4, -3, A0080
    #^2222222222 44444
    #^33
    # A0132, 77, -1, -2, 19.776
    #^55555555555555555555555555
    # 00000, 77, 4, -3, 18.608,
    #^ 2222222222
    s/\(\n\)\(\([^,]*,\)\{3\}\) \([A-Z0-9]*\)\(.*\)00000, \2/\1\2 \4\5\4, \2/
    t cycle
    :clean
    s/\(\n\)\([^,]*,\)\{3\} [A-Z0-9]*\1/\1/g
    t clean
    s/^\n//
    p
    }'

POSIX sed (so use `--posix` on GNU sed). The `#^` comment lines mark the group indices, so `2222222222` is the content of `\2`, which is used again later in the pattern.

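Load all lines into the working buffer.
Find each triplet [`\([^,]*,\)\{3\}`] that appears again on a later line [the `\2`] prefixed with `00000,`, and replace that prefix with the "name" that follows the triplet [`\4`]; this is the `s///` line after `:cycle`.
If a replacement was made, retry via `t cycle`, which means: if the `s///` succeeded, jump to the label `cycle`; otherwise continue with the next line of the script. (One change affects how the next triplet matches, so a `g` flag would still make at most one change per pass.)
Clean up the triplet lines: any line that consists of a triplet plus a name, matched as a pattern that starts at a newline, is replaced by the newline alone [`\1`]. Then remove the leading newline, which is there because `H` appends each line to the buffer after a newline (so the buffer starts with an empty line).
Print the result.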
