Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Subtle Quicksort Stability Issue

I'm trying to create a quicksort implementation for a school project that sorts a CSV file and am having a very hard time with it. According to the project specification, sorting each column of a CSV file in order will force an unstable sort to become stable, that is: "./sort --algorithm quicksort -k 1,2,3,4,5 input.csv" should produce the same results as "./sort --algorithm insertion -k 1,2,3,4,5 input.csv"

In order to preserve previous sorts the sorts are performed in reverse, like this:

for (int current_key = config.sort_columns.size()-1; current_key >= 0; current_key--){
    sorting_algorithm(records, config, config.sort_columns[current_key]-1);
}

Where config.sort_columns is vector of all of the sort columns specified by the -k argument.

Here is my input:

name,breed,date of birth,date of death,avg eggs per week 
Marilyn,Isa Red,2011-04-24,N/A,6 
Brian,Derp,2010-01-15,2011-12-01,4 
Chrissy,Ent,2012-02-08,N/A,3 
Mildred,Araucana,2011-05-01,N/A,3 
Jimmy,Ent,2006-02-30,N/A,15 
Mabel,Isa Red,2011-04-26,N/A,5 
Myrtle,Araucana,2011-08-01,N/A,0 
Myrtle,Araucana,2011-05-01,2011-07-13,0 
Adam,Ent,2010-01-01,N/A,10 
Isabel,Ent,2009-04-01,N/A,2 
Jimmy,Ent,2006-02-30,2011-03-24,1 
Jimmy,Derp,2003-02-30,2010-03-24,8 
Myrtle,Herp,2011-08-01,N/A,0 

Here is the output of my insertion sort (should be and appears to be correct):

name,breed,date of birth,date of death,avg eggs per week 
Adam,Ent,2010-01-01,N/A,10 
Brian,Derp,2010-01-15,2011-12-01,4 
Chrissy,Ent,2012-02-08,N/A,3 
Isabel,Ent,2009-04-01,N/A,2 
Jimmy,Derp,2003-02-30,2010-03-24,8 
Jimmy,Ent,2006-02-30,2011-03-24,1 
Jimmy,Ent,2006-02-30,N/A,15 
Mabel,Isa Red,2011-04-26,N/A,5 
Marilyn,Isa Red,2011-04-24,N/A,6 
Mildred,Araucana,2011-05-01,N/A,3 
Myrtle,Araucana,2011-05-01,2011-07-13,0 
Myrtle,Araucana,2011-08-01,N/A,0 
Myrtle,Herp,2011-08-01,N/A,0

And here is the output of my quicksort:

name,breed,date of birth,date of death,avg eggs per week
Adam,Ent,2010-01-01,N/A,10
Brian,Derp,2010-01-15,2011-12-01,4
Chrissy,Ent,2012-02-08,N/A,3
Isabel,Ent,2009-04-01,N/A,2
Jimmy,Ent,2006-02-30,2011-03-24,1
Jimmy,Ent,2006-02-30,N/A,15
Jimmy,Derp,2003-02-30,2010-03-24,8
Mabel,Isa Red,2011-04-26,N/A,5
Marilyn,Isa Red,2011-04-24,N/A,6
Mildred,Araucana,2011-05-01,N/A,3
Myrtle,Herp,2011-08-01,N/A,0
Myrtle,Araucana,2011-08-01,N/A,0
Myrtle,Araucana,2011-05-01,2011-07-13,0

As you can see, this is almost correct except that the second column is wrong when the first column matches (e.g. "Derp" should come before both "Ents").

And finally, here is my quicksort implementation:

int sort_quick_partition(std::vector<Record> &records, bool (*comparison)(string, string), int sort_key, int left, int right){
    /*
    Partition the vector and return the address of the new pivot.

    @param less - Vector of elements less than pivot.
    @param greater - Vector of elements greater than pivot.
    */

    // Setup 
    int store_location;
    Record pivot = records[right];
    Record temp_record;

    // Loop through and partition the vector within the provided bounds
    store_location = left - 1;
    for (int j = left; j < right; j++){
        if (comparison(records[j].fields[sort_key],pivot.fields[sort_key])){
            store_location += 1;
            std::swap(records[store_location], records[j]);
        }
    }

    std::swap(records[store_location+1], records[right]);

    return store_location+1;
}

void sort_quick_helper(std::vector<Record> &records, bool (*comparison)(string, string), int sort_key, int left, int right){
    /*
    Actually performs the quick sort.

    @param sub_list - Vector to sort.
    @param *comparison - Comparison to perform.
    @param sort_key - Which column to sort by.
    @param left - Left side of active sort zone.
    @param right - Right side of active sort zone.
    */

    // Setup
    int new_pivot;

    // Make sure the list has 2 or more items
    if (left < right){
        // Partition the vector and get the new pivot value
        new_pivot = sort_quick_partition(records, comparison, sort_key, left, right);

        // Sort elements less than the pivot
        sort_quick_helper(records, comparison, sort_key, left, new_pivot-1);

        // Sort elements greater than the pivot
        sort_quick_helper(records, comparison, sort_key, new_pivot+1, right);
    }
}

void sort_quick(std::vector<Record> &records, Configuration &config, int sort_key){
    /*
    Perform a quick sort on the records.

    @param &records - Vector of Record structures representing the file.
    @param &config - Reference to a global configuration structure.
    @param sort_key - Which column to sort by.
    */

    // Decide if it needs to be reversed
    bool (*comparison)(string, string);
    if (config.reverse){
        comparison = &comparison_gte;
    } else {
        comparison = &comparison_lte;
    }

    // Call the sort
    sort_quick_helper(records, comparison, sort_key, 0, records.size()-1);
}

Note that "sorting_algorithm" is a function pointer that points to the active sort, "sort_quick" in this case.

Does anyone see what could possible be wrong? I've been trying to fix this for days and I'm pulling my hair out at this point. Thanks everyone!

like image 508
jjb123 Avatar asked Feb 02 '26 06:02

jjb123


1 Answers

It's not true that you can make a stable sort out of an unstable one by sorting repeatedly. Consider the final sort: it isn't guaranteed to preserve the previous orderings when it sees equal keys (that's what unstable means).

Instead, what you need to do is sort on a key which unambiguously orders the input - so you need to do one sort, where the key you sort on is all the columns rather than an individual column.

So when you compare the records "Myrtle,Araucana,2011-08-01,N/A,0" and "Myrtle,Araucana,2011-05-01,2011-07-13,0" you need to compare fields in order until you find a pair that are not equal. (This is known as lexicographic ordering.) You may even need to incorporate the original position, if you need to preserve the order of completely equal records.

Of course if this wasn't homework you'd probably be looking at std::stable_sort. (A sequence of stable sorts on the columns in reverse order would be fine.)

like image 84
Alan Stokes Avatar answered Feb 03 '26 21:02

Alan Stokes