Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Sort alphanumeric data where there are hyphens

I have two sheets in a workbook, each with its own column of e-mail addresses, among other data. I will reference Column1 in Sheet1 and Column2 in Sheet2, where only Column1 may have duplicate e-mail addresses listed.

I need to identify whether the e-mail addresses in Column1 are found within Column2, and each time this is true certain code must be run.

I had solved this with two nested Do While loops, where the external loop went through each cell in Column1, named Cell1, from top to bottom and the internal loop compared Cell1 with each cell in Column2, named Cell2, also from top to bottom, exiting the internal loop early if identical values were found.

To make this more efficient, I thought to sort each column in ascending order, and have each Cell1 only look through Column2 until the value of the string in Cell2 is larger than the value of the string in Cell1, and when the next Cell1 is iterated it will continue from Cell2 where the last loop stopped, since earlier Cell2 values are all smaller than Cell1 and cannot have an equal value.

The code I've come up with is an external loop going through each cell in Column1, and an internal loop as below:

'x1 is the row number of Cell1
'x2 is the row number of Cell2
'below is the code for the internal loop looking through Column2

Do While Sheets(2).Cells(x2, 1).Value <> 0
    If LCase(Sheets(1).Cells(x1, 1).Value) < LCase(Sheets(2).Cells(x2, 1).Value) Then
        Exit Do
    ElseIf LCase(Sheets(1).Cells(x1, 1).Value) = LCase(Sheets(2).Cells(x2, 1).Value) Then

        '... code is run

        Exit Do
    End If
    x2 = x2 + 1
Loop

The issue is e-mail addresses can have hyphens (-) and apostrophes ('). While Excel ignores them when sorting columns, VBA doesn't ignore them when comparing alphanumeric values.

If I have:

     A           B
1  Noemi      Noemi
2  no-reply   no-reply
3  notify     notify

The code will compare A1 with B1 and see A1=B1, then A2 with B1 and see that A2<B1 and then skip to A3.

My first question is, can I force Excel to sort alphanumeric text including hyphens and apostrophes?

If not, so far I've only thought of a workaround by looking at Cell1 and Cell2 whether they have - or ' in them, if TRUE for any of them then using new variables to extract the text from Cell1 and Cell2 without the hyphens and apostrophes, and continuing to use these new values in the internal loop.

My second question is, how can I work around this in a more efficient way?

EDIT:

Microsoft recognizes that Excel ignores dashes and apostrophes when sorting:

http://office.microsoft.com/en-001/excel-help/default-sort-orders-HP005199669.aspx http://support.microsoft.com/kb/322067

like image 825
Dirk101 Avatar asked May 22 '14 12:05

Dirk101


2 Answers

If I have been asked yesterday, I would have agreed with David's view on the expected result of an Excel sort. However, after experimentation, I am forced to agree with Dirk. This is important to note:

Apostrophes (') and hyphens (-) are ignored, with one exception: If two text strings are the same except for a hyphen, the text with the hyphen is sorted last. source

Sort examples

Column A contains the unsorted values I used for testing Dirk's claim.

Column B has been subjected to a regular Excel sort. As you can see, the column is not in ASCII/Unicode sequence because "single quote" should come before "hyphen" should come before "letter a".

Excel uses tilde (~) as an escape character for Find so I wondered if it would do the same for Sort. AdjustedSort1 replaces "single quote" by "tilde single quote" and "hyphen" by "tilde hyphen", sorts and then restores "single quote" and "hyphen". The result is shown in column C. The sequence is better but not ASCII/Unicode because "aa-b" comes before "aa'c".

Column D uses a VBA Shell Sort routine I wrote years ago. You would probably be better to search the web for "VBA Quick Sort" if your lists are very large but my sort should give acceptable performance for reasonably sized lists.

Sub AdjustedSort1()

  With Worksheets("Sheet2").Columns("C")

    .Replace What:="'", Replacement:="~'", LookAt:=xlPart, _
             SearchOrder:=xlByRows, MatchCase:=False, _
             SearchFormat:=False, ReplaceFormat:=False
    .Replace What:="-", Replacement:="~-", LookAt:=xlPart, _
             SearchOrder:=xlByRows, MatchCase:=False, _
             SearchFormat:=False, ReplaceFormat:=False
    .Sort Key1:=Range("C2"), Order1:=xlAscending, Header:=xlYes, _
          OrderCustom:=1, MatchCase:=False, _
          Orientation:=xlTopToBottom, DataOption1:=xlSortNormal
    .Replace What:="~~-", Replacement:="-", LookAt:=xlPart, _
          SearchOrder:=xlByRows, MatchCase:=False, _
          SearchFormat:=False, ReplaceFormat:=False
    .Replace What:="~~'", Replacement:="'", LookAt:=xlPart, _
          SearchOrder:=xlByRows, MatchCase:=False, _
          SearchFormat:=False, ReplaceFormat:=False

  End With

End Sub
Sub AdjustedSort2()

  Dim Inx As Long
  Dim RngValue As Variant
  Dim RowLast As Long
  Dim ColValue() As String

  With Worksheets("Sheet2")

    RowLast = .Cells(Rows.Count, "D").End(xlUp).Row

    ' Load values from column D excluding header
    RngValue = .Range(.Cells(2, "D"), .Cells(RowLast, "D")).Value

    ' Copy values from 2D array to 1D array
    ReDim ColValue(LBound(RngValue, 1) To UBound(RngValue, 1))
    For Inx = LBound(RngValue, 1) To UBound(RngValue, 1)
      ColValue(Inx) = RngValue(Inx, 1)
    Next

    ' Sort array
    Call ShellSort(ColValue, UBound(ColValue))

    ' Copy values back to 2D array
    For Inx = LBound(ColValue) To UBound(ColValue)
      RngValue(Inx, 1) = ColValue(Inx)
    Next

    ' Copy values back to column D
    .Range(.Cells(2, "D"), .Cells(RowLast, "D")).Value = RngValue

  End With

End Sub
Public Sub ShellSort(arrstgTgt() As String, inxLastToSort As Integer)

  ' Coded 2 March 07
  ' Algorithm and text from Algorithms (Second edition) by Robert Sedgewick

  '   The most basic sort is the insertion sort in which adjacent elements are compared
  ' and swapped as necessary.  This can be very slow if the smallest elements are at
  ' end.  ShellSort is a simple extension which gains speed by allowing exchange of
  ' elements that are far apart.
  '   The idea is to rearrange the file to give it the property that taking every h-th
  ' element (starting anywhere) yields a sorted file.  Such a file is said to be
  ' h-sorted.  Put another way, an h-sorted file is h independent sorted files,
  ' interleaved together.  By h-sorting for large value of H, we can move elements
  ' in the array long distances and thus make it easier to h-sort for smaller values of
  ' h.  Using such a procedure for any sequence of values of h which ends in 1 will
  ' produce a sorted file.
  '   This program uses the increment sequence: ..., 1093, 364, 121, 40, 13, 4, 1.  This
  ' is known to be a good sequence but cannot be proved to be the best.
  '   The code looks faulty but it is not.  The inner loop compares an
  ' entry with the previous in the sequence and if necessary moves it back down the
  ' sequence to its correct position.  It does not continue with the rest of the sequence
  ' giving the impression it only partially sorts a sequence.  However, the code is not
  ' sorting one sequence then the next and so on.  It examines the entries in element
  ' number order.  Having compared an entry against the previous in its sequence, it will
  ' be intH loops before the next entry in the sequence in compared against it.

  Dim intNumRowsToSort          As Integer
  Dim intLBoundAdjust           As Integer
  Dim intH                      As Integer
  Dim inxRowA                   As Integer
  Dim inxRowB                   As Integer
  Dim inxRowC                   As Integer
  Dim stgTemp                   As String

  intNumRowsToSort = inxLastToSort - LBound(arrstgTgt) + 1
  intLBoundAdjust = LBound(arrstgTgt) - 1

  ' Set intH to 1, 4, 13, 40, 121, ..., 3n+1, ... until intH > intNumRowsToSort
  intH = 1
  Do While intH <= intNumRowsToSort
    intH = 3 * intH + 1
  Loop

  Do While True
    If intH = 1 Then Exit Do
    ' The minimum value on entry to this do-loop will be 4 so there is at least
    ' one repeat of the loop.
    intH = intH \ 3
    For inxRowA = intH + 1 To intNumRowsToSort
      stgTemp = arrstgTgt(inxRowA + intLBoundAdjust)
      inxRowB = inxRowA
      Do While True
        ' The value of element inxRowA has been saved.  Now move the element intH back
        ' from row inxRowA into this row if it is smaller than the saved value.  Repeat
        ' this for earlier elements until one is found that is larger than the saved
        ' value which is placed in the gap.
        inxRowC = inxRowB - intH
        If arrstgTgt(inxRowC + intLBoundAdjust) <= stgTemp Then Exit Do
        arrstgTgt(inxRowB + intLBoundAdjust) = arrstgTgt(inxRowC + intLBoundAdjust)
        inxRowB = inxRowC
        If inxRowB <= intH Then Exit Do
      Loop
      arrstgTgt(inxRowB + intLBoundAdjust) = stgTemp
    Next
  Loop

End Sub
like image 86
Tony Dallimore Avatar answered Oct 23 '22 05:10

Tony Dallimore


Replace all the "-" with a "^" and then Excel will not ignore the "^" in the sort as it does the "-".

Then can replace the "^" back to "-" if you want to.

like image 32
Steve M Avatar answered Oct 23 '22 05:10

Steve M