Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Cleaning bad data in excel, splitting words by capital letters

I'm using excel 2011 on Mac OSX. I have a data set with about 3000 entries. In the fields that contain names, many of the names are not separated. First and last names are separated by a space, but separate names are bunched together.

Here's what I have, (one cell):

Grant MorrisonSholly Fisch
Ben OliverCarlos Alberto Fernandez Urbano
Ben OliverCarlos Alberto Fernandez Urbano
Ben Oliver
Ben Oliver

Here's what I want to accomplish, (one cell, comma separated with one space after comma):

Grant Morrison, Sholly Fisch, Ben Oliver, Carlos Alberto, Fernandez Urbano, Ben Oliver, Carlos Alberto, Fernandez Urbano, Ben Oliver, Ben Oliver

I have found a few VBA scripts that will split words by capital letters, but the ones I've tried will add spaces where I don't need them like this one...

Function splitbycaps(inputstr As String) As String

Dim i As Long
Dim temp As String

If inputstr = vbNullString Then
    splitbycaps = temp
    Exit Function
Else
    temp = inputstr
    For i = 1 To Len(temp)
        If Mid(temp, i, 1) = UCase(Mid(temp, i, 1)) Then
            If i <> 1 Then
                temp = Left(temp, i - 1) + " " + Right(temp, Len(temp) - i + 1)
                i = i + 1
            End If
        End If
    Next i
    splitbycaps = temp

End If
End Function

There was another one that I found here that used RegEx, (forgive me, I'm just learning all of this so I may sound a little dumb) but when I tried that one, it wouldn't work at all, and my research pointed me to a way to add references to the library that would add the necessary tools so I could use it. Unfortunately, I cannot, for the life of me, find how to add a reference to the library on my mac version of excel... I may be doing something wrong, but this is the answer that I could not get to work...

Function SplitCaps(strIn As String) As String
Dim objRegex As Object
Set objRegex = CreateObject("vbscript.regexp")
With objRegex
    .Global = True
    .Pattern = "([a-z])([A-Z])"
    SplitCaps = .Replace(strIn, "$1 $2")
End With
End Function

I am basically brand new at adding custom functions via VBA through excel, and there may even be a better way to do this, but it seems like every answer that I come to just doesn't quite get the data right. Thanks for any answers!

like image 601
LVPoker Avatar asked Nov 09 '22 06:11

LVPoker


1 Answers

My function from Split Uppercase words in Excel needs udpdating for your additional string matching.

You would use this function in cell B1 for text in A1 as follows enter image description here

One assumption your cleansing does make is people have only two names, so

Ben OliverCarlos Alberto

is broken to

Ben Oliver
Carlos Alberto

is that actually what should happen? (needs a minor tweak if so)

code

Function SplitCaps(strIn As String) As String
Dim objRegex As Object
Set objRegex = CreateObject("vbscript.regexp")
With objRegex
    .Global = True
    .Pattern = "([a-z])([A-Z])"
    SplitCaps = Replace(.Replace(strIn, "$1, $2"), "<br>", ", ")
End With
End Function
like image 187
brettdj Avatar answered Nov 15 '22 07:11

brettdj