Excel VBA使用正则expression式查找和屏蔽PAN数据,以实现PCI DSS合规性

由于发现文件系统中的信用卡数据的大多数工具不再列出可疑文件,因此需要使用工具来屏蔽必须保留的文件中的所有数据。

对于可能存在大量信用卡数据的excel文件,我使用正则expression式来检测所选列/行中的信用卡数据,并用X代替中间的6-8位数字的macros将对许多用户有用。 可悲的是,我不是正则expression式macros观空间中的大师。

以下基本上只适用于3卡品牌的正则expression式,并且如果PAN在与其他数据(例如注释字段)的单元格中工作,

下面的代码工作,但可以改善。 通过包含LUHNalgorithm检查,改进正则expression式可以使其适用于更多/所有卡品牌并减less误报。

剩余的改进/问题:

  • 将所有卡品牌的PAN与扩展的正则expression式匹配
  • 包括Luhnalgorithm检查(FIXED – 好主意Ron)
  • 改进Do While逻辑(由stribizhev修复)
  • 更好地处理不含PAN的细胞(FIXED)

这是我迄今为止对于美国运通,维萨和万事达似乎正在工作的东西:

Sub PCI_mask_card_numbers() ' Written to mask credit card numbers in excel files in accordance with PCI DSS. ' Highlight the credit card data in the Excel sheet, then run this macro. Dim strPattern As String: strPattern = "([4][0-9]{3})([^a-zA-Z0-9_]?[0-9]{4})([^a-zA-Z0-9_]?[0-9]{4})([^a-zA-Z0-9_]?[0-9]{4})|" & _ "([5][0-9]{3})([^a-zA-Z0-9_]?[0-9]{4})([^a-zA-Z0-9_]?[0-9]{4})([^a-zA-Z0-9_]?[0-9]{4})|" & _ "([3][0-9]{2})([^a-zA-Z0-9_]?[0-9]{4})([^a-zA-Z0-9_]?[0-9]{4})([^a-zA-Z0-9_]?[0-9]{4})|" & _ "([3][0-9]{3})([^a-zA-Z0-9_]?[0-9]{3})([^a-zA-Z0-9_]?[0-9]{4})([^a-zA-Z0-9_]?[0-9]{4})|" & _ "([3][0-9]{3})([^a-zA-Z0-9_]?[0-9]{4})([^a-zA-Z0-9_]?[0-9]{3})([^a-zA-Z0-9_]?[0-9]{4})|" & _ "([3][0-9]{3})([^a-zA-Z0-9_]?[0-9]{4})([^a-zA-Z0-9_]?[0-9]{4})([^a-zA-Z0-9_]?[0-9]{3})|" & _ "([3][0-9]{3})([^a-zA-Z0-9_]?[0-9]{6})([^a-zA-Z0-9_]?[0-9]{5})" ' Regex patterns for PANs above are broken into multiple parts (between the brackets) ' As such the when regex matches the first part of a PAN will fit into one of rMatch(k).SubMatches(#) where # is 0, 4, 8, 12, 16, 20 or 24. ' Visa start with a 4 and is 16 digits long. Typically the data entry pattern is four groups of four digits ' MasterCard start with a 5 and is 16 digits long. Typically the data entry pattern is four groups of four digits ' AmEx start with a 3 and is 15 digits long. Typically the pattern is 4-6-5, but data entry seems inconsistent Dim strReplace As String: strReplace = "" ' Dim regEx As New RegExp ' if this line is used instead of the next 2, the MS VBS RegEx v5.5 needs to be enabled manually. The next 2 lines seem to do it from within the script Dim regEx As Object Set regEx = CreateObject("VBScript.RegExp") Dim regEx As New RegExp Dim strInput As String Dim Myrange As Range Dim NewPAN As String Dim Aproblem As String Dim Masked As Long Dim Problems As Long Dim Total As Long With regEx .Global = True .MultiLine = True .IgnoreCase = False .Pattern = strPattern ' sets the regex pattern to match the pattern above End With Set Myrange = Selection MsgBox ("The macro will now start masking credit card numbers identified in the selected cells only. If entire columns are selected, each column will take 10-30 seconds to complete. Ditto for Rows.") For Each cell In Myrange Total = Total + 1 ' Check that the cell is a likely candidate for holding a PAN, not just a long number If strPattern <> "" _ And cell.HasFormula = False _ And Left(cell.NumberFormat, 1) <> "$" _ And Mid(cell.NumberFormat, 3, 1) <> "$" Then ' cell.NumberFormat = "@" strInput = cell.Value ' Depending on the data matching the regex pattern, fix it If regEx.Test(strInput) Then Set rMatch = regEx.Execute(strInput) For k = 0 To rMatch.Count - 1 toReplace = rMatch(k).Value ' If the regex matched, replace the PAN based on its regex segment Select Case 2 Case Is < Len(rMatch(k).SubMatches(0)) strReplace = rMatch(k).SubMatches(0) & "xxxxxxxx" & Trim(rMatch(k).SubMatches(3)) Masked = Masked + 1 Case Is < Len(rMatch(k).SubMatches(4)) strReplace = rMatch(k).SubMatches(4) & "xxxxxxxx" & Trim(rMatch(k).SubMatches(7)) Masked = Masked + 1 Case Is < Len(rMatch(k).SubMatches(8)) strReplace = rMatch(k).SubMatches(8) & "xxxxxxxx" & Trim(rMatch(k).SubMatches(11)) Masked = Masked + 1 Case Is < Len(rMatch(k).SubMatches(12)) strReplace = rMatch(k).SubMatches(12) & "xxxxxxxx" & Trim(rMatch(k).SubMatches(13)) Masked = Masked + 1 Case Is < Len(rMatch(k).SubMatches(16)) strReplace = rMatch(k).SubMatches(16) & "xxxxxxxx" & Trim(rMatch(k).SubMatches(19)) Masked = Masked + 1 Case Is < Len(rMatch(k).SubMatches(20)) strReplace = rMatch(k).SubMatches(20) & "xxxxxxxx" & Trim(rMatch(k).SubMatches(23)) Masked = Masked + 1 Case Is < Len(rMatch(k).SubMatches(24)) strReplace = rMatch(k).SubMatches(24) & "xxxxxxxx" & Trim(rMatch(k).SubMatches(26)) Masked = Masked + 1 Case Else Aproblem = cell.Value Problems = Problems + 1 ' MsgBox (Aproblem) ' only needed when curios End Select If cell.Value <> Aproblem Then cell.Value = Replace(strInput, toReplace, strReplace) End If Next k Else ' Adds the cell value to a variable to allow the macro to move past the cell ' Once the macro is trusted not to loop forever, the message box can be removed ' MsgBox ("Problem. Regex fail? Bad data = " & Aproblem) End If End If Next cell ' All done, tell the user MsgBox ("Cardholder data is now masked" & vbCr & vbCr & "Total cells highlighted (including blanks) = " & Total & vbCr & "Cells masked = " & Masked & vbCr & "Possible problem cells = " & Problems & vbCr & "All other cells were ignored") End Sub 

从度假返回。 这是一个简单的VBA函数,可以testingLUHNalgorithm。 参数是一串数字; 结果是布尔值。

它会生成一个校验和数字,并将该数字与您input的数字string中的数字进行比较。


 Option Explicit Function Luhn(sNum As String) As Boolean 'modulus 10 algorithm for various numbers Dim X As Long, I As Long, J As Long For I = Len(sNum) - 1 To 1 Step -2 X = X + DoubleSumDigits(Mid(sNum, I, 1)) If I > 1 Then X = X + Mid(sNum, I - 1, 1) Next I If Right(sNum, 1) = (X * 9) Mod 10 Then Luhn = True Else Luhn = False End If End Function Function DoubleSumDigits(L As Long) As Long Dim X As Long X = L * 2 If X > 9 Then X = Val(Left(X, 1)) + Val(Right(X, 1)) DoubleSumDigits = X End Function