How do I print fields from an Excel spreadsheet using awk under cygwin?

We seem to be seeing more and more questions about running awk on Excel spreadsheets, so here is a Q/A on how to do that.

I have this information in an Excel spreadsheet "$D/staff.xlsx" (where "$D" is the path to my Desktop):

    Name   Position
    Sue    Manager
    Bill   Secretary
    Pat    Engineer

I want to print the Position field for a given Name, e.g. output Secretary given the input Bill.

I can currently save as CSV from Excel to get:

    $ cat "$D/staff.csv"
    Name,Position
    Sue,Manager
    Bill,Secretary
    Pat,Engineer

and then run:

    $ awk -F, -v name="Bill" '$1==name{print $2}' "$D/staff.csv"
    Secretary

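But this is just one small part of a larger task, so I need to be able to do it automatically from a shell script, without manually opening Excel to export the CSV file. How do I do that from a Windows PC running cygwin?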

The following combination of a VBS script and a shell script creates one CSV file for each sheet in the Excel spreadsheet:

    $ cat xls2csv.vbs
    csv_format = 6

    Dim strFilename
    Dim objFSO
    Set objFSO = CreateObject("scripting.filesystemobject")
    strFilename = objFSO.GetAbsolutePathName(WScript.Arguments(0))
    If objFSO.fileexists(strFilename) Then
      Call Writefile(strFilename)
    Else
      wscript.echo "no such file!"
    End If
    Set objFSO = Nothing

    Sub Writefile(ByVal strFilename)
      Dim objExcel
      Dim objWB
      Dim objws

      Set objExcel = CreateObject("Excel.Application")
      Set objWB = objExcel.Workbooks.Open(strFilename)

      For Each objws In objWB.Sheets
        objws.Copy
        objExcel.ActiveWorkbook.SaveAs objWB.Path & "\" & objws.Name & ".csv", csv_format
        objExcel.ActiveWorkbook.Close False
      Next

      objWB.Close False
      objExcel.Quit
      Set objExcel = Nothing
    End Sub

    $ cat xls2csv
    PATH="$HOME:$PATH"

    # the original XLS input file path components
    inXlsPath="$1"
    inXlsDir=$(dirname "$inXlsPath")
    xlsFile=$(basename "$inXlsPath")
    xlsBase="${xlsFile%.*}"

    # The tmp dir we'll copy the XLS to and run the tool on
    # to get the CSVs generated
    tmpXlsDir="/usr/tmp/${xlsBase}.$$"
    tmpXlsPath="${tmpXlsDir}/${xlsFile}"
    absXlsPath="C:/cygwin64/${tmpXlsPath}" # need an absolute path for VBS to work

    mkdir -p "$tmpXlsDir"

    # on exit, remove the copied XLS file and the (by then empty) tmp dir
    trap 'rm -f "${tmpXlsDir}/${xlsFile}"; rmdir "$tmpXlsDir"; exit' 0

    cp "$inXlsPath" "$tmpXlsDir"

    cygstart "$HOME/xls2csv.vbs" "$absXlsPath"

    printf "Waiting for \"${tmpXlsDir}/~\$${xlsFile}\" to be created:\n" >&2
    while [ ! -f "${tmpXlsDir}/~\$${xlsFile}" ]
    do
        # VBS is done when this tmp file is created and later removed
        printf "." >&2
        sleep 1
    done
    printf " Done.\n" >&2

    printf "Waiting for \"${tmpXlsDir}/~\$${xlsFile}\" to be removed:\n" >&2
    while [ -f "${tmpXlsDir}/~\$${xlsFile}" ]
    do
        # VBS is done when this tmp file is removed
        printf "." >&2
        sleep 1
    done
    printf " Done.\n" >&2

    numFiles=0
    for file in "$tmpXlsDir"/*.csv
    do
        # the glob stays as a literal string when no CSV was generated
        [ -e "$file" ] || continue
        numFiles=$(( numFiles + 1 ))
    done

    if (( numFiles >= 1 ))
    then
        outCsvDir="${inXlsDir}/${xlsBase}.csvs"
        mkdir -p "$outCsvDir"
        mv "$tmpXlsDir"/*.csv "$outCsvDir"
    fi
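The two polling loops in the script wait indefinitely, so if Excel never creates (or never removes) its `~$` lock file the script hangs. As a rough sketch only, the same polling idea can be wrapped in a helper that gives up after a timeout. The function name `wait_for_state`, its arguments, and the 60-second default are assumptions made for illustration; none of them appear in the original script.

```shell
# wait_for_state: hypothetical helper, not part of the original script.
# Usage: wait_for_state present|absent FILE [TIMEOUT_SECONDS]
# Returns 0 once FILE is in the requested state, 1 if the timeout expires first.
wait_for_state() {
    want=$1
    file=$2
    timeout=${3:-60}    # assumed default; tune to how long Excel takes to start
    waited=0
    while :
    do
        if [ -f "$file" ]; then state=present; else state=absent; fi
        [ "$state" = "$want" ] && return 0
        [ "$waited" -ge "$timeout" ] && return 1
        printf "." >&2  # same progress dots as the original loops
        sleep 1
        waited=$(( waited + 1 ))
    done
}
```

Under those assumptions, the first loop could become `wait_for_state present "${tmpXlsDir}/~\$${xlsFile}" 60 || exit 1`, and the second the same call with `absent`.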

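Now we run the shell script, which internally calls cygstart to run the VBS script. It generates the CSV files (one per sheet) in a subdirectory of the directory that holds the Excel file, named after the Excel file (e.g. the Excel file staff.xlsx produces the CSV directory staff.csvs):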

    $ ./xls2csv "$D/staff.xlsx"
    Waiting for "/usr/tmp/staff.2700/~$staff.xlsx" to be created:
    .. Done.
    Waiting for "/usr/tmp/staff.2700/~$staff.xlsx" to be removed:
    . Done.

The target Excel file "$D/staff.xlsx" contains only one sheet, with the default name Sheet1, so the output of the above is a single file, "$D/staff.csvs/Sheet1.csv":

    $ cat "$D/staff.csvs/Sheet1.csv"
    Name,Position
    Sue,Manager
    Bill,Secretary
    Pat,Engineer

    $ awk -F, -v name="Bill" '$1==name{print $2}' "$D/staff.csvs/Sheet1.csv"
    Secretary
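When a workbook has several sheets, the same lookup can be run over every generated CSV at once. The sketch below is one possible way to do that. The function name `lookup_position` and its output format are assumptions made for illustration. It also strips a trailing carriage return from each line, since CSV files written by Excel on Windows may use CRLF line endings, and skips each file's header row. Like the original command, it splits on bare commas and so assumes that no field contains a quoted comma.

```shell
# lookup_position: hypothetical helper, not part of the original answer.
# Usage: lookup_position NAME CSV_FILE...
# Prints "<csv file>: <position>" for each row whose first field equals NAME.
lookup_position() {
    name=$1
    shift
    awk -F, -v name="$name" '
        { sub(/\r$/, "") }          # drop a trailing CR left by CRLF line endings
        FNR > 1 && $1 == name {     # FNR > 1 skips the header row of each file
            print FILENAME ": " $2
        }
    ' "$@"
}
```

For example, `lookup_position Bill "$D/staff.csvs"/*.csv` would search every sheet exported from staff.xlsx.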