从导出到excel的扫描的pdf文件中提取数据

以下是我正在使用的Excel工作簿的数据结构的一些屏幕截图:

在这里输入图像说明

在这里输入图像说明

在这里输入图像说明

在这里输入图像说明

在这里输入图像说明

在这里输入图像说明

好的,我已经根据大家所说的来编辑代码了。 它仍然需要很多工作。

我现在卡住的是error handling。 显然,如果没有find关键词 – Last FirstMiddle或者Rank – 这会给我一个错误。

我最终试图做的是输出一个空白,如果关键词没有任何价值(单词),价值单词,如果有一个。 如果关键字丢失,我想输出一个空白。 值字也可以在关键字下面的行中。 我也想在这种情况下输出这个值。

我正在尝试使用IfElse语句来执行此操作。 但是,我认为他们可能写错了,因为如果没有find关键词,我得到一个错误。

 Option Explicit Sub find2() Dim lrd As Long Dim lrdWS1 As Long Dim iRow As Integer Dim celltosplit As String Dim result As String '-------------------------------------------------------------------------------------------------------------------------------------- lrdWS1 = Sheets("Table 1").Cells(Sheets("Table 1").Rows.count, 1).End(xlUp)(2).Row Sheets.Add(After:=Sheets(Sheets.count)).name = "FieldValues" lrd = Sheets("FieldValues").Cells(Sheets("FieldValues").Rows.count, 1).End(xlUp)(1).Row Worksheets("Table 1").Activate '-------------------------------------------------------------------------------------------------------------------------------------- Do While Worksheets("Table 1").Activate And Cells.find(What:="Last", After:=ActiveCell, LookIn:=xlFormulas, LookAt _ :=xlPart, SearchOrder:=xlByRows, SearchDirection:=xlNext, MatchCase:= _ False, SearchFormat:=False).Activate Application.Goto (Cells(1, 1)) '-------------------------------------------------------------------------------------------------------------------------------------- Worksheets("Table 1").Activate Application.Goto (Cells(1, 1)) If Cells.find(What:="Last", After:=ActiveCell, LookIn:=xlFormulas, LookAt _ :=xlPart, SearchOrder:=xlByRows, SearchDirection:=xlNext, MatchCase:= _ False, SearchFormat:=False).Activate Then Selection.Copy Sheets("FieldValues").Activate Range("A" & lrd).Activate ActiveSheet.Paste Columns("A:A").EntireColumn.AUTOFIT Cells.Replace What:="Last", Replacement:="", LookAt:=xlPart, SearchOrder _ :=xlByRows, MatchCase:=False, SearchFormat:=False, ReplaceFormat:=False lrd = Sheets("FieldValues").Cells(Sheets("FieldValues").Rows.count, 1).End(xlUp)(2).Row Worksheets("Table 1").Activate ActiveCell.UnMerge Selection.Replace What:="Last", Replacement:="", LookAt:=xlPart, SearchOrder _ :=xlByRows, MatchCase:=False, SearchFormat:=False, ReplaceFormat:=False Else Cells(1, lrd) = "" lrd = Sheets("FieldValues").Cells(Sheets("FieldValues").Rows.count, 1).End(xlUp)(2).Row End If '------------------------------------------------------------------------------------------------------------------------------------- Worksheets("Table 1").Activate Application.Goto (Cells(1, 1)) If Cells.find(What:="First", After:=ActiveCell, LookIn:=xlFormulas, LookAt _ :=xlPart, SearchOrder:=xlByRows, SearchDirection:=xlNext, MatchCase:= _ False, SearchFormat:=False).Activate Then Selection.Copy Sheets("FieldValues").Activate Range("A" & lrd).Activate ActiveSheet.Paste Columns("A:A").EntireColumn.AUTOFIT Cells.Replace What:="First", Replacement:="", LookAt:=xlPart, SearchOrder _ :=xlByRows, MatchCase:=False, SearchFormat:=False, ReplaceFormat:=False lrd = Sheets("FieldValues").Cells(Sheets("FieldValues").Rows.count, 1).End(xlUp)(2).Row Worksheets("Table 1").Activate ActiveCell.UnMerge Selection.Replace What:="First", Replacement:="", LookAt:=xlPart, SearchOrder _ :=xlByRows, MatchCase:=False, SearchFormat:=False, ReplaceFormat:=False Else Cells("1", lrd) = "" lrd = Sheets("FieldValues").Cells(Sheets("FieldValues").Rows.count, 1).End(xlUp)(2).Row End If '------------------------------------------------------------------------------------------------------------------------------------- Worksheets("Table 1").Activate Application.Goto (Cells(1, 1)) If Cells.find(What:="Middle", After:=ActiveCell, LookIn:=xlFormulas, LookAt _ :=xlPart, SearchOrder:=xlByRows, SearchDirection:=xlNext, MatchCase:= _ False, SearchFormat:=False).Activate Then Selection.Copy Sheets("FieldValues").Activate Range("A" & lrd).Activate ActiveSheet.Paste Columns("A:A").EntireColumn.AUTOFIT Cells.Replace What:="Middle", Replacement:="", LookAt:=xlPart, SearchOrder _ :=xlByRows, MatchCase:=False, SearchFormat:=False, ReplaceFormat:=False lrd = Sheets("FieldValues").Cells(Sheets("FieldValues").Rows.count, 1).End(xlUp)(2).Row Worksheets("Table 1").Activate ActiveCell.UnMerge Selection.Replace What:="Middle", Replacement:="", LookAt:=xlPart, SearchOrder _ :=xlByRows, MatchCase:=False, SearchFormat:=False, ReplaceFormat:=False Else Cells("A", lrd) = "" lrd = Sheets("FieldValues").Cells(Sheets("FieldValues").Rows.count, 1).End(xlUp)(2).Row End If '----------------------------------------------------------------------------------------------------------------------------------------------------------------- Worksheets("Table 1").Activate Application.Goto (Cells(1, 1)) If Cells.find(What:="Rank", After:=ActiveCell, LookIn:=xlFormulas, LookAt _ :=xlPart, SearchOrder:=xlByRows, SearchDirection:=xlNext, MatchCase:= _ False, SearchFormat:=False).Activate Then Selection.Copy Sheets("FieldValues").Activate Range("A" & lrd).Activate ActiveSheet.Paste Columns("A:A").EntireColumn.AUTOFIT Cells.Replace What:="Rank", Replacement:="", LookAt:=xlPart, SearchOrder _ :=xlByRows, MatchCase:=False, SearchFormat:=False, ReplaceFormat:=False lrd = ActiveCell.Row + 2 Worksheets("Table 1").Activate ActiveCell.UnMerge Selection.Replace What:="Rank", Replacement:="", LookAt:=xlPart, SearchOrder _ :=xlByRows, MatchCase:=False, SearchFormat:=False, ReplaceFormat:=False Else Cells("A", lrd) = "" lrd = ActiveCell.Row + 2 End If Loop 

我很抱歉,但我必须说:你的代码是一个完整的混乱! 每个人在评论中所说的一切都适用。 和更多。

另外,你说:

在代码中我有一个循环设置运行,它运行良好的第一次

不。 不对。 尝试使用多于一个单词的FirstMiddleRank字段值,看看你得到什么!

您发布的特定问题出现了,因为在将字段值复制到FieldValues表之后,而不是仅从所find的字段中删除字段名称,而是从Table 1表中的所有单元格中删除该字段名称! 您正在使用Cells.Replace而不是Selection.Replace

但是,使用Replace()函数而不是使用<Range>.Replace方法远远好于以下情况,例如:

 Selection.value = replace(Selection.value2,"Last","") 

请注意,我绝不主张使用Selection 。 正确的方法是有一个范围对象variables,例如, rngFoundField ,并像这样使用它:

 rngFoundField.value = replace(rngFoundField.value2,"Last","") 

编辑: v0.2 – 增加了基本的ID提取

基于提供的屏幕上限,我已经设法编写一个程序,将正确提取四个字段Last First Middle的值,并Rank和输出到一个新的工作表:

 '============================================================================================ ' Module : <in any standard module> ' Version : 0.2 ' Part : 1 of 1 ' References : Microsoft Scripting Runtime ' Source : https://stackoverflow.com/a/46166984/1961728 '============================================================================================ Private Enum i_ ž__NONE = 0 ID Last First Middle Rank ž__ ž__FIRST = ž__NONE + 1 ž__LAST = ž__ - 1 End Enum Public Sub ExtractFieldValues() Const l_Table_1 As String = "Table 1" Const l_FieldValues As String = "FieldValues" Const l_last_first_middle As String = "last first middle" Const s_FieldNames As String = "id " & l_last_first_middle & " rank" Const n_OutputRowsPerRecord As Long = 6 Dim ƒ As Excel.WorksheetFunction: Set ƒ = Excel.WorksheetFunction Dim ¡ As Long With Worksheets On Error Resume Next .Add(After:=.Item(.Count)).Name = l_FieldValues On Error GoTo 0 Application.DisplayAlerts = False If .Item(.Count).Name <> l_FieldValues Then .Item(.Count).Delete .Item(l_FieldValues).UsedRange.Clear End If .Item(l_FieldValues).Columns(1).NumberFormat = "@" Application.DisplayAlerts = True .Item(l_Table_1).Activate End With Dim astrFieldNames() As String astrFieldNames = Split(" " & s_FieldNames, " ") ' Force index zero to a blank -> treat as base 1 Dim dictFields As Scripting.Dictionary '##Late Binding: CreateObject("Scripting.Dictionary") Set dictFields = New Scripting.Dictionary '##Late Binding: As Object With dictFields .CompareMode = TextCompare For ¡ = i_.ž__FIRST To i_.ž__LAST dictFields.Add astrFieldNames(¡), "" Next ¡ End With Dim lngLastUsedRow As Long lngLastUsedRow _ = Cells _ .Find _ ( _ What:="*" _ , After:=Cells(1) _ , LookIn:=xlFormulas _ , Lookat:=xlPart _ , SearchOrder:=xlByRows _ , SearchDirection:=xlPrevious _ ) _ .Row With Range(Rows(1), Rows(lngLastUsedRow)) Dim arngFoundCells(i_.ž__FIRST To i_.ž__LAST) As Range For ¡ = i_.ž__FIRST To i_.ž__LAST Set arngFoundCells(¡) = .Find(What:=astrFieldNames(¡), After:=Cells(1)) Next ¡ Dim lngFirstFoundRow As Long lngFirstFoundRow _ = ƒ.Min _ ( _ arngFoundCells(i_.Last).Row _ , arngFoundCells(i_.First).Row _ , arngFoundCells(i_.Middle).Row _ ) Dim lngOuputSheetNextRow As Long lngOuputSheetNextRow = 1 Dim varFoundCell As Variant Dim lngNextFoundRow As Long Dim rngNextFindStart As Range Dim astrSplitValues() As String Dim strFoundValue As String Dim lngFieldCount As Long Do For ¡ = i_.ž__FIRST To i_.ž__LAST ' Debug.Print arngFoundCells(¡).Address; " "; dictFields.Item(astrFieldNames(¡)) = "" Next ¡ ' Debug.Print Select Case True Case arngFoundCells(i_.First).Row = arngFoundCells(i_.Middle).Row: ' Edge case: missing rank (found rank is for next employee) -> copy first to rank (simplifies following code) If arngFoundCells(i_.Rank).Row <> arngFoundCells(i_.First).Row Then Set arngFoundCells(i_.Rank) = arngFoundCells(i_.First) End If For Each varFoundCell In arngFoundCells strFoundValue = ƒ.Trim(Replace(Replace(varFoundCell.Value2, vbLf, " "), ":", "")) & " " If strFoundValue Like "[']*" Then strFoundValue = Mid$(strFoundValue, 2) ' ID field: only retain the first word of value If LCase$(strFoundValue) Like astrFieldNames(i_.ID) & "*" Then strFoundValue = Left$(strFoundValue, InStr(InStr(strFoundValue, " ") + 1, strFoundValue, " ")) End If ' Edge case: no last name value in merged cell -> assume value is in first cell of following row If LCase$(strFoundValue) Like astrFieldNames(i_.Last) & " " Then strFoundValue = ƒ.Trim(strFoundValue & Rows(varFoundCell.Row + 1).Cells(1).Value2) & " " End If ' Edge case: Field names only in row -> assume field values are on the following row If LCase$(strFoundValue) Like l_last_first_middle & "*" _ And Len(strFoundValue) - Len(Replace(strFoundValue, " ", "")) < 5 _ Then strFoundValue = ƒ.Trim(strFoundValue & Rows(varFoundCell.Row + 1).Cells(1).Value2) & " " End If astrSplitValues = Split(" " & strFoundValue, " ") ' Force index zero to a blank -> treat as base 1 ' Array contains one/some/all field names first and then the values (with a possible extra blank value) lngFieldCount = Int(UBound(astrSplitValues) / 2) For ¡ = 1 To lngFieldCount dictFields.Item(astrSplitValues(¡)) = astrSplitValues(¡ + lngFieldCount) Next ¡ Next varFoundCell ' Only allow the id to be on the previous row If arngFoundCells(i_.ID).Row <> arngFoundCells(i_.First).Row - 1 Then dictFields.Item(astrFieldNames(i_.ID)) = 0 End If Case Else Debug.Print " SKIPPED: "; For ¡ = i_.ž__FIRST To i_.ž__LAST Debug.Print arngFoundCells(¡).Address; " "; Next ¡ Debug.Print For ¡ = i_.ž__FIRST To i_.ž__LAST Debug.Print " "; ƒ.Trim(arngFoundCells(¡).Value2) Next ¡ Debug.Print End Select Sheets(l_FieldValues).Columns(1).Cells(lngOuputSheetNextRow).Resize(n_OutputRowsPerRecord - 1).Value _ = ƒ.Transpose(dictFields.Items) lngOuputSheetNextRow = lngOuputSheetNextRow + n_OutputRowsPerRecord Set rngNextFindStart = Rows(arngFoundCells(i_.First).Row + 2).Cells(1) For ¡ = i_.ž__FIRST To i_.ž__LAST Set arngFoundCells(¡) = .Find(What:=astrFieldNames(¡), After:=rngNextFindStart) Next ¡ lngNextFoundRow _ = ƒ.Min _ ( _ arngFoundCells(i_.Last).Row _ , arngFoundCells(i_.First).Row _ , arngFoundCells(i_.Middle).Row _ ) Loop While lngNextFoundRow <> lngFirstFoundRow End With End Sub 

我预计会有一些边缘案例被遗漏。 希望这些将显示在VBE的直接窗口。