Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Count the most frequently used words in an Excel column containing A LOT of text?

Tags:

excel

vba

I have a large spreadsheet and I'd like to perform a word count on a specific column to figure out the most frequently used words. This column contains a very large amount of data and text.

For example, "Employee was climbing a ladder to retrieve merchandise off the top shelf. The ladder began to sway and the employee lost his balance and fell. Injury to the right leg". There are about 1000 different records like this. I was hoping use a pivot table to figure out what the most frequently used words are throughout all the cells in this column.

I'm not sure how to do this. Can anyone assist in how to do this?

Currently using the following code:

Option Explicit

Sub MakeWordList()
    Dim InputSheet As Worksheet
    Dim WordListSheet As Worksheet
    Dim PuncChars As Variant, x As Variant
    Dim i As Long, r As Long
    Dim txt As String
    Dim wordCnt As Long
    Dim AllWords As Range
    Dim PC As PivotCache
    Dim PT As PivotTable

    Application.ScreenUpdating = False
    Set InputSheet = ActiveSheet
    Set WordListSheet = Worksheets.Add(after:=Worksheets(Sheets.Count))
    WordListSheet.Range("A1") = "All Words"
    WordListSheet.Range("A1").Font.Bold = True
    InputSheet.Activate
    wordCnt = 2
    PuncChars = Array(".", ",", ";", ":", "'", "!", "#", _
        "$", "%", "&", "(", ")", " - ", "_", "--", "+", _
        "=", "~", "/", "\", "{", "}", "[", "]", """", "?", "*")
    r = 1

'   Loop until blank cell is encountered
    Do While Cells(r, 1) <> ""
'       covert to UPPERCASE
        txt = UCase(Cells(r, 1))
'       Remove punctuation
        For i = 0 To UBound(PuncChars)
            txt = Replace(txt, PuncChars(i), "")
        Next i
'       Remove excess spaces
        txt = WorksheetFunction.Trim(txt)
'       Extract the words
        x = Split(txt)
        For i = 0 To UBound(x)
            WordListSheet.Cells(wordCnt, 1) = x(i)
            wordCnt = wordCnt + 1
        Next i
    r = r + 1
    Loop

'   Create pivot table
    WordListSheet.Activate
    Set AllWords = Range("A1").CurrentRegion
    Set PC = ActiveWorkbook.PivotCaches.Add _
        (SourceType:=xlDatabase, _
        SourceData:=AllWords)
    Set PT = PC.CreatePivotTable _
        (TableDestination:=Range("C1"), _
        TableName:="PivotTable1")
    With PT
        .AddDataField .PivotFields("All Words")
        .PivotFields("All Words").Orientation = xlRowField
    End With
End Sub
like image 711
ranopano Avatar asked Oct 23 '25 23:10

ranopano


1 Answers

Here's a quick and dirty macro (I'm feeling extra helpful today). Put this in your workbook module. Note: I'm assuming the sheet you will have active is the one with all the text in column A.

Sub Test()
Dim lastRow&, i&, tempLastRow&
Dim rawWS As Worksheet, tempWS As Worksheet

Set rawWS = ActiveSheet
Set tempWS = Sheets.Add
tempWS.Name = "Temp"
rawWS.Activate

'tempWS.Columns(1).Value = rawWS.Columns(1).Value
tempLastRow = 1

With rawWS
    .Columns("A:A").TextToColumns Destination:=Range("A1"), DataType:=xlDelimited, _
                                  TextQualifier:=xlDoubleQuote, ConsecutiveDelimiter:=True, Tab:=True, _
                                  Semicolon:=False, Comma:=False, Space:=True, Other:=False, TrailingMinusNumbers:=True

    lastRow = .Cells(.Rows.Count, 1).End(xlUp).Row

    For i = lastRow To 1 Step -1
        .Rows(i).EntireRow.Copy
        tempWS.Range("A" & tempLastRow).PasteSpecial Paste:=xlPasteAll, Transpose:=True
        ' tempWS.Range ("A" & tempLastRow)
        tempLastRow = tempWS.Cells(tempWS.Rows.Count, 1).End(xlUp).Row + 1
    Next i
    Application.CutCopyMode = False
End With

With tempWS
    ' Now, let's get unique words and run a count
    .Range("A:A").Copy .Range("C:C")
    .Range("C:C").RemoveDuplicates Columns:=1, Header:=xlNo
    tempLastRow = .Cells(.Rows.Count, 3).End(xlUp).Row

    .Range(.Cells(1, 4), .Cells(tempLastRow, 4)).FormulaR1C1 = "=COUNTIF(C[-3],RC[-1])"
    .Sort.SortFields.Clear
    .Sort.SortFields.Add Key:=Range("D1:D1048576") _
                              , SortOn:=xlSortOnValues, Order:=xlDescending, DataOption:=xlSortNormal
    With .Sort
        .SetRange Range("C1:D1048576")
        .Header = xlGuess
        .MatchCase = False
        .Orientation = xlTopToBottom
        .SortMethod = xlPinYin
        .Apply
    End With

End With

End Sub

Basically, it creates a new sheet, counts all the individual words, and puts the words (and count) in a column, sorted by most used. You can tweak as needed.

Note: I made this before you added your code. It doesn't create a pivot table, but from what I understand you need, a Pivot Table would be overkill if you just need the most used words. But, let me know if you need any edits or changes!

like image 92
BruceWayne Avatar answered Oct 26 '25 18:10

BruceWayne



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!