Trying to convert .log to .xml

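I have been looking for ways to convert six 20 MB .log files to .xml.

People have pointed me to online converters, but they don't convert the files.

Others have suggested using the Developer tools in Excel, but

  1. the data ends up in the wrong format

  2. copying and pasting 20 MB of data into Excel takes a very long time.

I am considering two options:

  1. In PowerShell, store each line as an array element, then build another array to get at the fields within each line. However, the line format is inconsistent, as you will see.

  2. Research how to convert to XML using PHP or Python.

Here is the format of the data: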

    06/01 01:25:58 [2024:2588] 10.4.10.10<AgentInfo DomainID="8CB49C910AFB16720044B53CD014E7D9" AgentType="105" UserDomain="MYAWESOMEDOMAIN.ORG" LoginUser="admIN" ComputerDomain="myAWESOMEDOMAIN.org" ComputerName="AWESOMES001" PreferredGroup="My%20Company%5cServers%5cGUP" PreferredMode="1" KnownClientID="C35A5B1E0AFB16760019AE74888EA38A" HardwareKey="46E04E5469DC41949F33E73FDC0C5FCF" IsNPVDIClient="0" SiteDomainName=""/>
    06/01 01:26:07 [2024:3280] 10.24.10.97<AgentInfo DomainID="8CB49C910AFB16720044B53CD014E7D9" AgentType="105" UserDomain="LocalComputer" LoginUser="Student%208" ComputerDomain="WORKGROUP" ComputerName="DC9Spartan" PreferredGroup="My%20Company%5cDefault%20Group" PreferredMode="1" HardwareKey="208B60B45CE2D02192B2FBB30CA1470A" SiteDomainName=""/>
    06/01 01:26:07 [2024:3280] 10.24.10.97<AgentInfo DomainID="8CB49C910AFB16720044B53CD014E7D9" AgentType="105" UserDomain="LocalComputer" LoginUser="Student%208" ComputerDomain="WORKGROUP" ComputerName="DC9Spartan" PreferredGroup="My%20Company%5cDefault%20Group" PreferredMode="1" HardwareKey="208B60B45CE2D02192B2FBB30CA1470A" SiteDomainName=""/> AgentID=2C26221A0AFB167201AE7F6B29E365AD AgentType=105 ComputerID=1E6BFEF50AFB167201AE7F6BBA576A0C Hash Key=69C6250108E5B7FBB6ACF8294B6564FE
    06/01 01:26:19 [2024:2748] 10.21.36.6<AgentInfo DomainID="8CB49C910AFB16720044B53CD014E7D9" AgentType="105" UserDomain="LocalComputer" LoginUser="student" ComputerDomain="WORKGROUP" ComputerName="DingDing9461JZ6" PreferredGroup="My%20Company%5cDefault%20Group" PreferredMode="1" HardwareKey="C6D4F00C9C2952182D8DAB03045C6E30" SiteDomainName=""/>
    06/01 01:26:19 [2024:2748] 10.21.36.6<AgentInfo DomainID="8CB49C910AFB16720044B53CD014E7D9" AgentType="105" UserDomain="LocalComputer" LoginUser="student" ComputerDomain="WORKGROUP" ComputerName="DingDing9461JZ6" PreferredGroup="My%20Company%5cDefault%20Group" PreferredMode="1" HardwareKey="C6D4F00C9C2952182D8DAB03045C6E30" SiteDomainName=""/> AgentID=BBF24AB00AFB167200D94A8E46E57D3C AgentType=105 ComputerID=1B4EF5B30AFB167200D94A8E8EBB8E65 Hash Key=6BAC96603C7495DE08E5F305EEF310EE
    06/01 01:26:33 [2024:3376] 5 Server returned: 500 Internal Server Error
    06/01 01:26:33 [2024:3376] 10.16.64.16<AgentInfo DomainID="ACD6E7230AFB160401B335F917AFF5BE" AgentType="105" UserDomain="LocalComputer" LoginUser="admin" ComputerDomain="myAWESOMEDOMAIN.org" ComputerName="LLR0MGVY" PreferredGroup="My%20Company%5cDefault%20Group" PreferredMode="1" HardwareKey="EFDBD800D66488B08936A51F19B5496A" IsNPVDIClient="0" SiteDomainName=""/>--FAILED

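I am trying to correlate the IP address with the other data, such as the DomainID. It gets complicated because some entries are server error messages, and the IP address is then listed on the NEXT line.

I figure that if I can get this into XML format, querying the data will be easier. Or is there another way to accomplish what I am trying to do?

Thanks

OUTPUT

I am not very familiar with XML, but I guess the output I am looking for is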

    <Date>06/01 01:25:58
      <ID>[2024:2588]
        <IP>10.4.10.10</IP>
        <AgentInfo>
          <DomainID></DomainID>
          <AgentType></AgentType>
          <UserDomain></UserDomain>
          <LoginUser></LoginUser>
          etc, etc, the other fields within AgentInfo
        </AgentInfo>
      </ID>
    </Date>
    <Date>06/01 01:26:33
      <ID>[2024:3376]
        <IP>10.16.64.16</IP>
        <Msg>5 Server returned: 500 Internal Server Error</Msg>
      </ID>
    </Date>

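I'll offer a PowerShell solution using RegEx and ConvertFrom-StringData (AgentInfo could probably be converted with the XML type instead, but this was simpler for my solution). It collects the objects into a hashtable keyed by ID, converts them to XML, and then cleans up the result, because PowerShell's ConvertTo-Xml cmdlet is overly verbose IMHO.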

    $InputData = Get-Content 'C:\Path\To\File.log'
    $Records = @{}

    # Keep only lines that are either "date [id] ip<AgentInfo .../>" or "date [id] message"
    $InputData | ?{$_ -match "^(?<Date>\d{2}\/\d{2} \d{2}:\d{2}:\d{2}) (?<ID>\[.+?\]) (?<IP>\S+)\<AgentInfo (?<AgentInfo>.+?)\/\>" -or
                   $_ -match "^(?<Date>\d{2}\/\d{2} \d{2}:\d{2}:\d{2}) (?<ID>\[.+?\]) (?<Msg>.+)"}|
        %{$Record = [pscustomobject]@{
                [string]'Date'=$Matches['Date']
                [string]'ID'=$Matches['ID']
                [string]'IP'=$Matches['IP']
                [string]'Msg'=$Matches['Msg']
                # Put each Name="Value" pair on its own line so ConvertFrom-StringData can read it
                'AgentInfo'=New-Object PSObject -Prop ($Matches['AgentInfo'] -replace '(?<=") ',"`r`n" | ConvertFrom-StringData)
            }
            $record
            If($Matches['ID'] -notin $Records.Keys){
                # First record seen for this ID
                $Records.Add($Matches['ID'], $Record)
            }Else{
                # ID already seen: copy the non-empty properties onto the stored record
                $Record|Get-Member -MemberType Properties | Where{![string]::IsNullOrEmpty($Record.($_.Name))} | ForEach{$Records."$($Matches['ID'])"|Add-Member "$($_.Name)" $Record.$($_.Name) -Force}
            }
        }

    # Convert to XML, then rewrite the verbose <Property Name="..."> elements as plain element names
    $records.Values|select Date,ID,IP,Msg,AgentInfo|convertto-xml -Depth 2 -NoTypeInformation -as Stream|
        %{$_ -replace 'Property Name="(.+?)(?=">)"(.*)Property(?=>)','$1$2$1' `
             -replace 'Property Name="(.+?)"(?= />)','$1' `
             -replace '<Property Name="(.+?)">','<$1>' `
             -replace '</Property>','</AgentInfo>'} |
        Set-Content C:\Path\To\OutFile.xml

This will output:

    <?xml version="1.0"?>
    <Objects>
      <Object>
        <Date>06/01 01:26:19</Date>
        <ID>[2024:2748]</ID>
        <IP>10.21.36.6</IP>
        <Msg />
        <AgentInfo>
          <LoginUser>"student"</LoginUser>
          <ComputerDomain>"WORKGROUP"</ComputerDomain>
          <ComputerName>"DingDing9461JZ6"</ComputerName>
          <DomainID>"8CB49C910AFB16720044B53CD014E7D9"</DomainID>
          <HardwareKey>"C6D4F00C9C2952182D8DAB03045C6E30"</HardwareKey>
          <SiteDomainName>""</SiteDomainName>
          <PreferredGroup>"My%20Company%5cDefault%20Group"</PreferredGroup>
          <AgentType>"105"</AgentType>
          <PreferredMode>"1"</PreferredMode>
          <UserDomain>"LocalComputer"</UserDomain>
        </AgentInfo>
      </Object>
      <Object>
        <Date>06/01 01:25:58</Date>
        <ID>[2024:2588]</ID>
        <IP>10.4.10.10</IP>
        <Msg />
        <AgentInfo>
          <LoginUser>"admIN"</LoginUser>
          <IsNPVDIClient>"0"</IsNPVDIClient>
          <ComputerDomain>"myAWESOMEDOMAIN.org"</ComputerDomain>
          <ComputerName>"AWESOMES001"</ComputerName>
          <DomainID>"8CB49C910AFB16720044B53CD014E7D9"</DomainID>
          <HardwareKey>"46E04E5469DC41949F33E73FDC0C5FCF"</HardwareKey>
          <SiteDomainName>""</SiteDomainName>
          <PreferredGroup>"My%20Company%5cServers%5cGUP"</PreferredGroup>
          <AgentType>"105"</AgentType>
          <KnownClientID>"C35A5B1E0AFB16760019AE74888EA38A"</KnownClientID>
          <PreferredMode>"1"</PreferredMode>
          <UserDomain>"MYAWESOMEDOMAIN.ORG"</UserDomain>
        </AgentInfo>
      </Object>
      <Object>
        <Date>06/01 01:26:07</Date>
        <ID>[2024:3280]</ID>
        <IP>10.24.10.97</IP>
        <Msg />
        <AgentInfo>
          <LoginUser>"Student%208"</LoginUser>
          <ComputerDomain>"WORKGROUP"</ComputerDomain>
          <ComputerName>"DC9Spartan"</ComputerName>
          <DomainID>"8CB49C910AFB16720044B53CD014E7D9"</DomainID>
          <HardwareKey>"208B60B45CE2D02192B2FBB30CA1470A"</HardwareKey>
          <SiteDomainName>""</SiteDomainName>
          <PreferredGroup>"My%20Company%5cDefault%20Group"</PreferredGroup>
          <AgentType>"105"</AgentType>
          <PreferredMode>"1"</PreferredMode>
          <UserDomain>"LocalComputer"</UserDomain>
        </AgentInfo>
      </Object>
      <Object>
        <Date>06/01 01:26:33</Date>
        <ID>[2024:3376]</ID>
        <IP>10.16.64.16</IP>
        <Msg>5 Server returned: 500 Internal Server Error</Msg>
        <AgentInfo>
          <LoginUser>"admin"</LoginUser>
          <IsNPVDIClient>"0"</IsNPVDIClient>
          <ComputerDomain>"myAWESOMEDOMAIN.org"</ComputerDomain>
          <ComputerName>"LLR0MGVY"</ComputerName>
          <DomainID>"ACD6E7230AFB160401B335F917AFF5BE"</DomainID>
          <HardwareKey>"EFDBD800D66488B08936A51F19B5496A"</HardwareKey>
          <SiteDomainName>""</SiteDomainName>
          <PreferredGroup>"My%20Company%5cDefault%20Group"</PreferredGroup>
          <AgentType>"105"</AgentType>
          <PreferredMode>"1"</PreferredMode>
          <UserDomain>"LocalComputer"</UserDomain>
        </AgentInfo>
      </Object>
    </Objects>

That is pretty close to your desired output.
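The merge-by-ID step is what ties a server error message to the IP address on the following line: both lines carry the same bracketed ID, so the later line fills in whatever the earlier one lacked. A rough Python equivalent of just that step might look like the sketch below. The function name `merge_by_id` and the sample dictionaries are hypothetical, and it assumes (as the PowerShell code does) that an ID identifies a single event; in a long log the same ID may recur, in which case keying on date plus ID could be safer.

```python
def merge_by_id(records):
    """Combine records sharing an 'id'; later non-empty fields overwrite earlier ones."""
    merged = {}
    for rec in records:
        target = merged.setdefault(rec['id'], {})
        for key, value in rec.items():
            if value:  # skip None and empty strings so they never erase real data
                target[key] = value
    return merged


# Two parsed lines for the same event: the error message first, then the IP.
records = [
    {'id': '[2024:3376]', 'date': '06/01 01:26:33',
     'msg': '5 Server returned: 500 Internal Server Error', 'ip': None},
    {'id': '[2024:3376]', 'date': '06/01 01:26:33',
     'msg': None, 'ip': '10.16.64.16'},
]
result = merge_by_id(records)
print(result['[2024:3376]'])
```

After the merge, the single entry for `[2024:3376]` holds both the message and the IP address.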

I would definitely use Python.

Something like this:

    import sys

    # The path to the log file is taken from the first command-line argument.
    in_path = sys.argv[1]

    with open(in_path, 'r') as in_file, \
            open('[your_path]\\converted.xml', 'w') as out_file:
        for line in in_file:
            # Split each line at spaces and do stuff with each piece.
            pieces = line.split(' ')
            if len(pieces) < 2:
                continue  # skip blank or malformed lines
            out_file.write("<date>" + pieces[0] + "</date>" + '\n')
            out_file.write("<time>" + pieces[1] + "</time>" + '\n')
            # ... and so on for the remaining fields
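Since the message lines contain spaces of their own, a regular expression may be easier to work with than `split(' ')`. Here is a minimal sketch of that variation, based only on the sample lines in the question. The names `LINE_RE`, `parse_line`, and `to_xml`, and the `<entry>` layout, are assumptions chosen for illustration.

```python
import re
from xml.sax.saxutils import escape

# Timestamp, bracketed ID, then either "IP<AgentInfo .../>" or a free-text message.
LINE_RE = re.compile(
    r'^(?P<date>\d{2}/\d{2} \d{2}:\d{2}:\d{2}) '
    r'(?P<id>\[[^\]]+\]) '
    r'(?:(?P<ip>[^\s<]+)(?P<agent><AgentInfo .*?/>)|(?P<msg>.+))'
)


def parse_line(line):
    """Return a dict of the fields found in one log line, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return {k: v for k, v in m.groupdict().items() if v is not None}


def to_xml(record):
    """Render one parsed record as a small <entry> element (hypothetical layout)."""
    parts = ['<entry>']
    for key in ('date', 'id', 'ip', 'msg'):
        if key in record:
            parts.append('  <%s>%s</%s>' % (key, escape(record[key]), key))
    if 'agent' in record:
        # The AgentInfo fragment is already XML, so it is copied through unescaped.
        parts.append('  ' + record['agent'])
    parts.append('</entry>')
    return '\n'.join(parts)


sample = '06/01 01:26:33 [2024:3376] 5 Server returned: 500 Internal Server Error'
print(to_xml(parse_line(sample)))
```

Anything after the closing `/>` of an AgentInfo line (such as `--FAILED` or the `AgentID=...` fields) is ignored by this pattern and would need its own group if you want to keep it.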

XMLWriter is an XML API designed for this kind of job. Here is an example to get you started:

    // $input and $output are assumed to hold the source log path and the target XML path.
    $xml = new XMLWriter();
    $xml->openUri($output);
    $xml->startDocument();
    $xml->setIndent(2);
    $xml->startElement('log');

    $file = fopen($input, 'r');
    while (FALSE !== ($line = fgets($file))) {
      // Only lines that contain an XML fragment are converted.
      if (FALSE !== ($p = strpos($line, '<'))) {
        $xml->startElement('line');
        // Everything before the '<' is the date/ID/IP prefix. The IP is directly
        // followed by '<', so cut at $p (not $p - 1) to keep its last character.
        $xml->writeElement('date', substr($line, 0, $p));
        // The rest of the line is already XML, so it is written unescaped.
        $xml->writeRaw(substr($line, $p));
        $xml->endElement();
      }
    }
    fclose($file);

    $xml->endElement();
    $xml->endDocument();
    $xml->flush();

Output:

    <?xml version="1.0"?>
    <log>
     <line>
      <date>06/01 01:25:58 [2024:2588] 10.4.10.10</date>
      <AgentInfo DomainID="8CB49C910AFB16720044B53CD014E7D9" AgentType="105" UserDomain="MYAWESOMEDOMAIN.ORG" LoginUser="admIN" ComputerDomain="myAWESOMEDOMAIN.org" ComputerName="AWESOMES001" PreferredGroup="My%20Company%5cServers%5cGUP" PreferredMode="1" KnownClientID="C35A5B1E0AFB16760019AE74888EA38A" HardwareKey="46E04E5469DC41949F33E73FDC0C5FCF" IsNPVDIClient="0" SiteDomainName=""/>
     </line>
     ...

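You will need a document element, and an element for each entry/line is a good idea. I only split each line into two basic parts. You will need to add more logic (maybe regular expressions), depending on your target XML format. The agent info is already an XML element/document, so it can be copied (raw) into the target XML.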
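Because each `<AgentInfo .../>` fragment is well-formed XML on its own, a parser can also read it directly instead of copying it raw, which gives access to the individual attributes. The sketch below uses Python's `xml.etree.ElementTree` to turn the attributes into child elements, roughly in the shape the question asks for. The helper name `agent_to_elements` is hypothetical, the sample fragment is shortened from the log in the question, and decoding the percent-escapes with `urllib.parse.unquote` is an assumption about how those values are encoded.

```python
import xml.etree.ElementTree as ET
from urllib.parse import unquote


def agent_to_elements(fragment):
    """Parse an <AgentInfo .../> fragment and return it with one child element per attribute."""
    source = ET.fromstring(fragment)
    result = ET.Element('AgentInfo')
    for name, value in source.attrib.items():
        # unquote turns %20 into a space and %5c into a backslash
        ET.SubElement(result, name).text = unquote(value)
    return result


fragment = ('<AgentInfo DomainID="8CB49C910AFB16720044B53CD014E7D9" AgentType="105" '
            'LoginUser="Student%208" PreferredGroup="My%20Company%5cDefault%20Group" '
            'SiteDomainName=""/>')

line = ET.Element('line')
ET.SubElement(line, 'IP').text = '10.24.10.97'
line.append(agent_to_elements(fragment))
xml_text = ET.tostring(line, encoding='unicode')
print(xml_text)
```

Once the data is in this form, looking up the DomainID for a given IP becomes an ordinary element query rather than string slicing.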

XMLWriter is not a PHP-specific API; you can find implementations for many languages.
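As one illustration in another language: Python's standard library has no class called XMLWriter, but `xml.sax.saxutils.XMLGenerator` offers a comparable streaming writer. The sketch below is not a port of the PHP example; the function name `write_log` and the one-`<line>`-per-entry layout are assumptions, and each entry is written as escaped text rather than split into fields.

```python
import io
from xml.sax.saxutils import XMLGenerator


def write_log(lines, out):
    """Stream log lines into a <log> document, one <line> element per entry."""
    gen = XMLGenerator(out, encoding='utf-8')
    gen.startDocument()
    gen.startElement('log', {})
    for text in lines:
        gen.startElement('line', {})
        gen.characters(text)  # escapes &, < and > automatically
        gen.endElement('line')
    gen.endElement('log')
    gen.endDocument()


buffer = io.StringIO()
write_log(['06/01 01:26:33 [2024:3376] 5 Server returned: 500 Internal Server Error'], buffer)
print(buffer.getvalue())
```

Because the writer emits output as it goes, the whole 20 MB file never has to be held in memory as a document tree.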