列表中最常用的单词

我在Excel中有一个列表,其中的一个子集是这样的:

Food and Human Nutrition Food and Human Nutrition with Placement Food and Nutrition with Professional Experience Food Marketing and Nutrition Food Marketing and Nutrition with Placement Food, Nutrition and Health 

我想find这个列表中最常用的n单词。 我试着用这个公式find最常见的词:

 =INDEX(rng,MODE(MATCH(rng,rng,0))) 

与此相关的问题是,它将每个单元视为一个单独的string,并且由于6行中的每一行都不相同,所以找不到最常见的单词 。 我想要做的是输出“食物”,“营养”和“和”作为最常见的词,其次是“营销”,“安置”,“与”等。

如果你知道并且想要使用VBA,那么这将是一个相互交织的任务。 因此,一些像这样=MostCommonWords(Range;Optional WordsNumber)自定义公式会给你这个结果:

在这里输入图像描述

这是公式背后的代码:

 Public Function MostCommonWords(inputRange As Range, _ Optional NumberOfWords As Long = 1) As String Dim myCell As Range Dim inputString As String, tempString As String, myResult As String Dim myArr As Variant, myKey As Variant Dim cnt As Long, topNumber As Long Dim myColl As Object Set myColl = CreateObject("Scripting.Dictionary") For Each myCell In inputRange tempString = LCase(Replace(myCell, ",", "")) inputString = inputString & " " & tempString Next myCell myArr = Split(inputString) For cnt = LBound(myArr) To UBound(myArr) If myColl.exists(myArr(cnt)) Then myColl(myArr(cnt)) = myColl(myArr(cnt)) + 1 Else myColl.Add myArr(cnt), 1 End If Next cnt For cnt = 1 To NumberOfWords topNumber = 0 myResult = vbNullString For Each myKey In myColl If topNumber < myColl(myKey) Then topNumber = myColl(myKey) myResult = myKey End If Next myKey MostCommonWords = MostCommonWords & " " & myResult myColl.Remove myResult Next cnt End Function 

它是如何工作的?

  • 它将input范围内的所有单元格添加到名为inputString的string中。
  • 逗号被删除,所有的单词都被转换成小写。
  • 使用字典结构,每个单词都被添加为一个关键字,并将其用作一个值。
  • 根据需要显示的单词数量,检查字典中最大的值,并显示其密钥。
  • 一旦find最大值,键将从字典中删除,因此可以find第二大值 – myColl.Remove myResult

最简单的方法可能是使用一个和声程序(比如用Word),但也很简单,就是在Word中转换为一个单独的列表,然后在Excel中进行转换。 Food,Food将显示为不同的单词,如果表格被拆分为空格,所以build议首先删除标点符号(查找/replace)。

这是一个VBAmacros,它提供了你似乎想要的东西。

  • 唯一性通过使用字典对象进行testing
  • 计数在字典中完成
  • 结果然后sorting

在代码中仔细阅读意见,了解需要做出的假设。 并需要设置的参考

另外请注意,标点符号可能导致同一个词被计入不同的类别。 如果这可能是一个问题,我们只需要分割源数据,或者在分割空格之前删除所有的标点符号,或者使用正则expression式来分割。

 'Set Reference to Microsoft Scripting Runtime Option Explicit Sub UniqueWordCounts() Dim wsSrc As Worksheet, wsRes As Worksheet Dim rSrc As Range, rRes As Range Dim vSrc As Variant, vRes As Variant Dim vWords As Variant Dim dWords As Dictionary Dim I As Long, J As Long Dim V As Variant, vKey As Variant 'Assume source data is in column 1, starting at A1 ' Could easily be anyplace Set wsSrc = Worksheets("sheet2") With wsSrc Set rSrc = .Range(.Cells(1, 1), .Cells(.Rows.Count, 1).End(xlUp)) End With 'Results to go a few columns over Set wsRes = Worksheets("sheet2") Set rRes = rSrc(1, 1).Offset(0, 2) 'Read source data into vba array (for processing speed) vSrc = rSrc 'Collect individual words and counts into dictionary Set dWords = New Dictionary dWords.CompareMode = TextCompare For I = 1 To UBound(vSrc, 1) 'Split the sentence into individual words For Each vKey In Split(vSrc(I, 1)) If Not dWords.Exists(vKey) Then dWords.Add Key:=vKey, Item:=1 Else dWords(vKey) = dWords(vKey) + 1 End If Next vKey Next I 'Size results array ReDim vRes(0 To dWords.Count, 1 To 2) 'Column headers vRes(0, 1) = "Word" vRes(0, 2) = "Count" 'Populate the columns I = 0 For Each V In dWords.Keys I = I + 1 vRes(I, 1) = V vRes(I, 2) = dWords(V) Next V 'Size results range Set rRes = rRes.Resize(UBound(vRes, 1) + 1, UBound(vRes, 2)) 'Populate, format and sort the Results range With rRes .EntireColumn.Clear .Value = vRes With .Rows(1) .Font.Bold = True .HorizontalAlignment = xlCenter End With .EntireColumn.AutoFit .Sort key1:=.Columns(2), order1:=xlDescending, key2:=.Columns(1), order2:=xlAscending, MatchCase:=False, Header:=xlYes End With End Sub 

在这里输入图像说明