在包含大量文本的Excel列中计算最常用的单词?

我有一个很大的电子表格,我想对特定的列进行字数统计,找出最常用的字词。 此列包含大量的数据和文本。

例如,“员工爬上梯子,从顶架上取货”,梯子开始晃动,员工失去平衡,跌倒,右腿受伤。 有大约1000个这样的不同logging。 我希望使用一个数据透视表来找出在这个列中的所有单元格中最常用的单词是什么。

我不知道如何做到这一点。 任何人都可以协助如何做到这一点?

目前使用下面的代码:

Option Explicit Sub MakeWordList() Dim InputSheet As Worksheet Dim WordListSheet As Worksheet Dim PuncChars As Variant, x As Variant Dim i As Long, r As Long Dim txt As String Dim wordCnt As Long Dim AllWords As Range Dim PC As PivotCache Dim PT As PivotTable Application.ScreenUpdating = False Set InputSheet = ActiveSheet Set WordListSheet = Worksheets.Add(after:=Worksheets(Sheets.Count)) WordListSheet.Range("A1") = "All Words" WordListSheet.Range("A1").Font.Bold = True InputSheet.Activate wordCnt = 2 PuncChars = Array(".", ",", ";", ":", "'", "!", "#", _ "$", "%", "&", "(", ")", " - ", "_", "--", "+", _ "=", "~", "/", "\", "{", "}", "[", "]", """", "?", "*") r = 1 ' Loop until blank cell is encountered Do While Cells(r, 1) <> "" ' covert to UPPERCASE txt = UCase(Cells(r, 1)) ' Remove punctuation For i = 0 To UBound(PuncChars) txt = Replace(txt, PuncChars(i), "") Next i ' Remove excess spaces txt = WorksheetFunction.Trim(txt) ' Extract the words x = Split(txt) For i = 0 To UBound(x) WordListSheet.Cells(wordCnt, 1) = x(i) wordCnt = wordCnt + 1 Next i r = r + 1 Loop ' Create pivot table WordListSheet.Activate Set AllWords = Range("A1").CurrentRegion Set PC = ActiveWorkbook.PivotCaches.Add _ (SourceType:=xlDatabase, _ SourceData:=AllWords) Set PT = PC.CreatePivotTable _ (TableDestination:=Range("C1"), _ TableName:="PivotTable1") With PT .AddDataField .PivotFields("All Words") .PivotFields("All Words").Orientation = xlRowField End With End Sub 

这是一个快速和肮脏的macros(今天我感觉更多的帮助)。 把它放在你的工作簿模块中。 注意:我假设你将要激活的工作表是A列中所有文本的工作表

 Sub Test() Dim lastRow&, i&, tempLastRow& Dim rawWS As Worksheet, tempWS As Worksheet Set rawWS = ActiveSheet Set tempWS = Sheets.Add tempWS.Name = "Temp" rawWS.Activate 'tempWS.Columns(1).Value = rawWS.Columns(1).Value tempLastRow = 1 With rawWS .Columns("A:A").TextToColumns Destination:=Range("A1"), DataType:=xlDelimited, _ TextQualifier:=xlDoubleQuote, ConsecutiveDelimiter:=True, Tab:=True, _ Semicolon:=False, Comma:=False, Space:=True, Other:=False, TrailingMinusNumbers:=True lastRow = .Cells(.Rows.Count, 1).End(xlUp).Row For i = lastRow To 1 Step -1 .Rows(i).EntireRow.Copy tempWS.Range("A" & tempLastRow).PasteSpecial Paste:=xlPasteAll, Transpose:=True ' tempWS.Range ("A" & tempLastRow) tempLastRow = tempWS.Cells(tempWS.Rows.Count, 1).End(xlUp).Row + 1 Next i Application.CutCopyMode = False End With With tempWS ' Now, let's get unique words and run a count .Range("A:A").Copy .Range("C:C") .Range("C:C").RemoveDuplicates Columns:=1, Header:=xlNo tempLastRow = .Cells(.Rows.Count, 3).End(xlUp).Row .Range(.Cells(1, 4), .Cells(tempLastRow, 4)).FormulaR1C1 = "=COUNTIF(C[-3],RC[-1])" .Sort.SortFields.Clear .Sort.SortFields.Add Key:=Range("D1:D1048576") _ , SortOn:=xlSortOnValues, Order:=xlDescending, DataOption:=xlSortNormal With .Sort .SetRange Range("C1:D1048576") .Header = xlGuess .MatchCase = False .Orientation = xlTopToBottom .SortMethod = xlPinYin .Apply End With End With End Sub 

基本上,它会创build一个新的表格,统计所有的单词,并将单词(和计数)放在一个列中,按照最常用的顺序sorting。 你可以根据需要调整。

注意:在添加代码之前我做了这个。 它不创build一个数据透视表,但是从我所了解的你需要的数据透视表来看,如果你只需要最常用的单词,那么这个数据透视表就太过分了。 但是,让我知道,如果你需要任何编辑或更改!

这是一个例程来显示每个单词和它出现的次数(使用SplitCollection

用法: CountTheWordsInRange Range("A1:A4")

 Sub CountTheWordsInRange(RangeToCheck As Range) Dim wordList As New Collection Dim keyList As New Collection Dim c For Each c In RangeToCheck Dim words As Variant words = Split(c, " ") 'Pick a delimiter For Each w In words Dim temp temp = -1 On Error Resume Next temp = wordList(w) On Error GoTo 0 If temp = -1 Then wordList.Add 1, Key:=w keyList.Add w, Key:=w Else wordList.Remove (w) keyList.Remove (w) wordList.Add temp + 1, w keyList.Add w, Key:=w End If Next w Next c 'Here we can display the word counts 'KeyList is a collection that contains each word 'WordList is a collection that contains each amount Dim x For x = 1 To wordList.Count With Sheets("Sheet1") .Cells(x, "E").Value = keyList(x) 'Display Word in column "E" .Cells(x, "F").Value = wordList(x) 'Display Count in column "F" End With Next x End Sub 

结果:

结果