Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Converting cell values to snake_case

I have some data cleansing task. I have a column which start from H6 and further down from their. The column contains data which was supposed to be in snake_case, but is not the case. The cell values are of the form:

  • With camel case: "CamelCase"
  • With spaces: "Spaced Value"
  • With some initial call caps: ALLCAPSPREFIX_rest
  • Combination of above

I know there mat not be concrete algo to bring this all to snake_case, but I want to come up with code which will at least bring most cells to snake_case.

I tried VBA code to replace spaces with underscores and gets the index of underscores. Now I was thinking to make all character just after underscores to be lower case. Further I was thinking to replace sequence of two characters: first lower case and next in uppercase, say lC to l_c as I dont want CCC to get converted to c_c_c, but to ccc. But before moving further, I want to know if there can be simpler approach to this.

like image 820
anir Avatar asked Oct 27 '25 04:10

anir


2 Answers

Here's one method that might do what you want:

Option Explicit
Function Snake_case(s As String) As String
    Dim RE As Object
    Const sPat As String = "([A-Za-z0-9]+)(?=[ _A-Z])[ _]?(\S+)"
    Const sRepl As String = "$1_$2"
    Dim v As Variant

Set RE = CreateObject("vbscript.regexp")
With RE
    .Global = True
    .ignorecase = False
    .Pattern = sPat
    v = Split(.Replace(s, sRepl), "_")
End With

v(0) = WorksheetFunction.Proper(v(0))
v(1) = LCase(v(1))
Snake_case = Join(v, "_")

End Function

enter image description here

And here is an explanation of the regex and replacement strings:

Snake_case conversion

([A-Za-z0-9]+)(?=[ _A-Z])[ _]?(\S+)

Options: Case sensitive; ^$ match at line breaks

  • Match the regex below and capture its match into backreference number 1 ([A-Za-z0-9]+)
    • Match a single character present in the list below [A-Za-z0-9]+
      • Between one and unlimited times, as many times as possible, giving back as needed (greedy) +
      • A character in the range between “A” and “Z” A-Z
      • A character in the range between “a” and “z” a-z
      • A character in the range between “0” and “9” 0-9
  • Assert that the regex below can be matched starting at this position (positive lookahead) (?=[ _A-Z])
    • Match a single character present in the list below [ _A-Z]
      • A single character from the list “ _” _
      • A character in the range between “A” and “Z” A-Z
  • Match a single character from the list “ _” [ _]?
    • Between zero and one times, as many times as possible, giving back as needed (greedy) ?
  • Match the regex below and capture its match into backreference number 2 (\S+)
    • Match a single character that is NOT a “whitespace character” \S+
      • Between one and unlimited times, as many times as possible, giving back as needed (greedy) +

$1_$2

  • Insert the text that was last matched by capturing group number 1 $1
  • Insert the character “_” literally _
  • Insert the text that was last matched by capturing group number 2 $2

Created with RegexBuddy

like image 159
Ron Rosenfeld Avatar answered Oct 29 '25 19:10

Ron Rosenfeld


Ron Rosenfeld solution didn't work for me. I used this one instead. Took similar on forum https://www.mrexcel.com/board/threads/camelcase-to-proper-case.381040/

Option Explicit
Function ToSnakeCase(txt As String) As String
'
' If cell A1 contains: SanFrancisco
' cell B1: =ToSnakeCase(A1)
' cell B1: San_Francisco
'
Dim Hold As String, i As Long
Hold = Left(txt, 1)
For i = 2 To Len(txt) Step 1
  If Asc(Mid(txt, i, 1)) > 96 Then
    Hold = Hold & Mid(txt, i, 1)
  Else
    Hold = Hold & "_" & Mid(txt, i, 1)
  End If
Next i
ToSnakeCase = Hold
End Function
like image 41
Alex A Avatar answered Oct 29 '25 20:10

Alex A



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!