Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to normalize filenames listed in a range

I have a list of filenames in a spreadsheet in the form of "Smith, J. 010112.pdf". However, they're in the varying formats of "010112.pdf", "01.01.12.pdf", and "1.01.2012.pdf". How could I change these to one format of "010112.pdf"?

like image 324
CountDerp Avatar asked Jun 27 '12 19:06

CountDerp


3 Answers

Personally I hate using VBA where worksheet functions will work, so I've worked out a way to do this with worksheet functions. Although you could cram this all into one cell, I've broken it out into a lot of independent steps in separate columns so you can see how it's working, step by step.

For simplicity I'm assuming your file name is in A1

B1 =LEN(A1)
determine the length of the filename

C1 =SUBSTITUTE(A1," ","")
replace spaces with nothing

D1 =LEN(C1)
see how long the string is if you replace spaces with nothing

E1 =B1-D1
determine how many spaces there are

F1 =SUBSTITUTE(A1," ",CHAR(8),E1)
replace the last space with a special character that can't occur in a file name

G1 =SEARCH(CHAR(8), F1)
find the special character. Now we know where the last space is

H1 =LEFT(A1,G1-1)
peel off everything before the last space

I1 =MID(A1,G1+1,255)
peel off everything after the last space

J1 =FIND(".",I1)
find the first dot

K1 =FIND(".",I1,J1+1)
find the second dot

L1 =FIND(".",I1,K1+1)
find the third dot

M1 =MID(I1,1,J1-1)
find the first number

N1 =MID(I1,J1+1,K1-J1-1)
find the second number

O1 =MID(I1,K1+1,L1-K1-1)
find the third number

P1 =TEXT(M1,"00")
pad the first number

Q1 =TEXT(N1,"00")
pad the second number

R1 =TEXT(O1,"00")
pad the third number

S1 =IF(ISERR(K1),M1,P1&Q1&R1)
put the numbers together

T1 =H1&" "&S1&".pdf"
put it all together

It's kind of a mess because Excel hasn't added a single new string manipulation function in over 20 years, so things that should be easy (like "find last space") require severe trickery.

like image 108
Joel Spolsky Avatar answered Nov 06 '22 06:11

Joel Spolsky


Here's a screenshot of a simple four-step method based on Excel commands and formulas, as suggested in a comment to the answered post (with a few changes)...

enter image description here

like image 30
lori_m Avatar answered Nov 06 '22 06:11

lori_m


This function below works. I've assumed that the date is in ddmmyy format, but adjust as appropriate if it's mmddyy -- I can't tell from your example.

Function FormatThis(str As String) As String

    Dim strDate As String
    Dim iDateStart As Long
    Dim iDateEnd As Long
    Dim temp As Variant

    ' Pick out the date part
    iDateStart = GetFirstNumPosition(str, False)
    iDateEnd = GetFirstNumPosition(str, True)
    strDate = Mid(str, iDateStart, iDateEnd - iDateStart + 1)

    If InStr(strDate, ".") <> 0 Then
        ' Deal with the dot delimiters in the date
        temp = Split(strDate, ".")
        strDate = Format(DateSerial( _
            CInt(temp(2)), CInt(temp(1)), CInt(temp(0))), "ddmmyy")
    Else
        ' No dot delimiters... assume date is already formatted as ddmmyy
        ' Do nothing
    End If

    ' Piece it together
    FormatThis = Left(str, iDateStart - 1) _
        & strDate & Right(str, Len(str) - iDateEnd)
End Function

This uses the following helper function:

Function GetFirstNumPosition(str As String, startFromRight As Boolean) As Long
    Dim i As Long
    Dim startIndex As Long
    Dim endIndex As Long
    Dim indexStep As Integer

    If startFromRight Then
        startIndex = Len(str)
        endIndex = 1
        indexStep = -1
    Else
        startIndex = 1
        endIndex = Len(str)
        indexStep = 1
    End If

    For i = startIndex To endIndex Step indexStep
        If Mid(str, i, 1) Like "[0-9]" Then
            GetFirstNumPosition = i
            Exit For
        End If
    Next i
End Function

To test:

Sub tester()

    MsgBox FormatThis("Smith, J. 01.03.12.pdf")
    MsgBox FormatThis("Smith, J. 010312.pdf")
    MsgBox FormatThis("Smith, J. 1.03.12.pdf")
    MsgBox FormatThis("Smith, J. 1.3.12.pdf")

End Sub

They all return "Smith, J. 010312.pdf".

like image 6
Jean-François Corbett Avatar answered Nov 06 '22 06:11

Jean-François Corbett