Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Excel to remove duplicates one column at a time for many columns

Tags:

excel

vba

I have an Excel workbook with many sheets(40+) which have many columns in each(30+).

My goal is to remove duplicates in each column but not based on any other columns. I would like to repeat this for all columns in all sheets.

I tried to create a macro but upon execution the macro will only select the column that I had selected when I created the macro.

like image 200
Kevin Avatar asked Jan 07 '23 21:01

Kevin


2 Answers

This code will remove the duplicates from each column in the workbook - treating each column as a separate entity.

Sub RemoveDups()

    Dim wrkSht As Worksheet
    Dim lLastCol As Long
    Dim lLastRow As Long
    Dim i As Long

    'Work through each sheet in the workbook.
    For Each wrkSht In ThisWorkbook.Worksheets

        'Find the last column on the sheet.
        lLastCol = LastCell(wrkSht).Column

        'Work through each column on the sheet.
        For i = 1 To lLastCol

            'Find the last row for each column.
            lLastRow = LastCell(wrkSht, i).Row

            'Remove the duplicates.
            With wrkSht
                .Range(.Cells(1, i), .Cells(lLastRow, i)).RemoveDuplicates Columns:=1, Header:=xlNo
            End With
        Next i

    Next wrkSht

End Sub

'This function will return a reference to the last cell in either the sheet, or specified column on the sheet.
Public Function LastCell(wrkSht As Worksheet, Optional Col As Long = 0) As Range

    Dim lLastCol As Long, lLastRow As Long

    On Error Resume Next

    With wrkSht
        If Col = 0 Then
            lLastCol = .Cells.Find("*", , , , xlByColumns, xlPrevious).Column
            lLastRow = .Cells.Find("*", , , , xlByRows, xlPrevious).Row
        Else
            lLastCol = .Cells.Find("*", , , , xlByColumns, xlPrevious).Column
            lLastRow = .Columns(Col).Find("*", , , , xlByColumns, xlPrevious).Row
        End If

        If lLastCol = 0 Then lLastCol = 1
        If lLastRow = 0 Then lLastRow = 1

        Set LastCell = wrkSht.Cells(lLastRow, lLastCol)
    End With
    On Error GoTo 0

End Function

As Joshua has said - RemoveDuplicates won't work in earlier version. Providing you have two spare columns at the end of each sheet, this version will work on Excel 2003. It takes advantage of the Advanced Filter to copy the unique values to the end column, clears the original column and pastes the data back again.

Sub RemoveDups()

    Dim wrkSht As Worksheet
    Dim lLastCol As Long
    Dim lLastRow As Long
    Dim i As Long

    'Work through each sheet in the workbook.
    For Each wrkSht In ThisWorkbook.Worksheets

            'Find the last column on the sheet.
            lLastCol = LastCell(wrkSht).Column

            'Work through each column on the sheet.
            For i = 1 To lLastCol

                'Find the last row for each column.
                lLastRow = LastCell(wrkSht, i).Row

                'Only continue if there's more than 1 row of data.
                If lLastRow > 1 Then
                    With wrkSht
                        FilterToUnique .Range(.Cells(1, i), .Cells(lLastRow, i)), .Cells(1, i)
                    End With
                End If
            Next i
    Next wrkSht

End Sub

'This function will return a reference to the last cell in either the sheet, or specified column on the sheet.
Public Function LastCell(wrkSht As Worksheet, Optional Col As Long = 0) As Range

    Dim lLastCol As Long, lLastRow As Long

    On Error Resume Next

    With wrkSht
        If Col = 0 Then
            lLastCol = .Cells.Find("*", , , , xlByColumns, xlPrevious).Column
            lLastRow = .Cells.Find("*", , , , xlByRows, xlPrevious).Row
        Else
            lLastCol = .Cells.Find("*", , , , xlByColumns, xlPrevious).Column
            lLastRow = .Columns(Col).Find("*", , , , xlByColumns, xlPrevious).Row
        End If

        If lLastCol = 0 Then lLastCol = 1
        If lLastRow = 0 Then lLastRow = 1

        Set LastCell = wrkSht.Cells(lLastRow, lLastCol)
    End With
    On Error GoTo 0

End Function

Public Sub FilterToUnique(rSourceRange As Range, rSourceTarget As Range)

    Dim rLastCell As Range
    Dim rNewRange As Range

    '''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
    'Find the last cell and copy the unique values to the last column + 2 '
    '''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
    Set rLastCell = LastCell(rSourceRange.Parent)
    rSourceRange.AdvancedFilter Action:=xlFilterCopy, CopyToRange:=rLastCell.Parent.Cells(rSourceRange.Row, rLastCell.Column + 2), Unique:=True

    ''''''''''''''''''''''''''''''''''''''''
    'Get a reference to the filtered data. '
    ''''''''''''''''''''''''''''''''''''''''
    Set rLastCell = LastCell(rSourceRange.Parent, rLastCell.Column + 2)
    With rSourceRange.Parent
        Set rNewRange = .Range(.Cells(rSourceRange.Row, rLastCell.Column), rLastCell)
    End With

    '''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
    'Clear the column where the data is going to be moved to. '
    '''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
    rSourceRange.ClearContents

    ''''''''''''''''''''''''''''''''''''''''''''''
    'Move the filtered data to its new location. '
    ''''''''''''''''''''''''''''''''''''''''''''''
    rNewRange.Cut Destination:=rSourceTarget

End Sub
like image 68
Darren Bartrup-Cook Avatar answered May 09 '23 22:05

Darren Bartrup-Cook


Here is some code to get you started.

What I did was first created a simple list with some duplicates. I used the macro recorder (Developer --> Record Macro).

I selected the list and then went to Data --> Remove Duplicates.

I stopped recording to see this code:

Range("A1:A11").Select
ActiveSheet.Range("$A$1:$A$11").RemoveDuplicates Columns:=1, Header:=xlNo

I adapted the .RemoveDuplicates method to loop through worksheets as such:

Sub RemoveDups()
        Dim ws As Worksheet
        Dim col As Range

        For Each ws In ActiveWorkbook.Sheets
                For Each col In ws.UsedRange.Columns
                        ws.Range(col.Address).RemoveDuplicates Columns:=1, Header:=xlNo
                Next col
        Next ws

End Sub

I did notice that this would throw a run-time error if you have an empty worksheet in your workbook, so I added some logic to test for an empty worksheet. The test consists of checking used rows, used columns, and the value of cell A1 on the sheet. If the row and column count are both 1 and nothing is in cell A1, I consider the sheet empty and the code will skip it. This is totally optional if you're sure that your workbook won't have an empty sheet. I just included it for completeness.

Sub RemoveDups()
        Dim ws As Worksheet
        Dim col As Range
        Dim IsSheetEmpty As Boolean

        IsSheetEmpty = False

        For Each ws In ActiveWorkbook.Sheets
                IsSheetEmpty = ws.UsedRange.Rows.Count = 1 _
                        And ws.UsedRange.Columns.Count = 1 _
                        And ws.Cells(1, 1).Value = ""
                If IsSheetEmpty = False Then
                        For Each col In ws.UsedRange.Columns
                                ws.Range(col.Address).RemoveDuplicates Columns:=1, Header:=xlNo
                        Next col
                End If
        Next ws

End Sub

The .RemoveDuplicates method was added in Office 2007, if you're using an earlier version that will require a different approach.

like image 25
Soulfire Avatar answered May 09 '23 22:05

Soulfire