I need a c language code to sort some strings and it should be case sensitive and for the same letter in upper- and lower-cases, the lower-case must come first. For example the result of the sort for the following strings:
eggs
bacon
cheese
Milk
spinach
potatoes
milk
spaghetti
should be:
bacon
cheese
eggs
milk
Milk
potatoes
spaghetti
spinach
I have written a code but the result that I am getting is:
Milk
bacon
cheese
eggs
milk
potatoes
spaghetti
spinach
I have no idea how to improve this and I have searched a lot. Could anyone help me with this?
#include <stdio.h>
#include <string.h>
int main(){
char c;
char name[20][10], temp[10];
int count_name = 0;
int name_index = 0;
int i, j;
while ((c = getchar()) != EOF){
if (c == 10){
name[count_name][name_index] = '\0';
count_name++;
name_index = 0;
} else {
name[count_name][name_index] = c;
name_index++;
}
}
for(i=0; i < count_name-1 ; i++){
for(j=i+1; j< count_name; j++)
{
if(strcmp(name[i],name[j]) > 0)
{
strcpy(temp,name[i]);
strcpy(name[i],name[j]);
strcpy(name[j],temp);
}
}
}
for (i = 0; i < count_name; i++){
printf("%s\n", name[i]);
}
}
For lists of words, it is often more useful to group the "same" words together (even though they differ in case). For example:
Keeping things together: Simple "M after m":
------------------------ -------------------
mars mars
mars bar mars bar
Mars bar milk
milk milk-duds
Milk milky-way
milk-duds Mars bar
milky-way Milk
Milky-way Milky-way
If you want words arranged like the first column, I present three ways of doing so:
strcasecmp()
combined with strcmp()
.isalpha()
, tolower()
, and isupper()
.In the end I discuss two alternatives:
If it is possible to do so, avoid reinventing the wheel. In this case, we can do so by using the POSIX function strcasecmp()
to see if they are equal with a case-insensitive comparison, and falling back on strcmp()
when they are.
int alphaBetize (const char *a, const char *b) {
int r = strcasecmp(a, b);
if (r) return r;
/* if equal ignoring case, use opposite of strcmp() result to get
* lower before upper */
return -strcmp(a, b); /* aka: return strcmp(b, a); */
}
(On some systems, the case-insensitive comparison function is called stricmp()
or _stricmp()
. If one is not available to you, an implementation is provided below.)
#ifdef I_DONT_HAVE_STRCASECMP
int strcasecmp (const char *a, const char *b) {
while (*a && *b) {
if (tolower(*a) != tolower(*b)) {
break;
}
++a;
++b;
}
return tolower(*a) - tolower(*b);
}
#endif
Sometimes, existing functions do not perform well enough, and you have to do something else to make things faster. The following function does the comparison in roughly the same way in a single pass, and without using either strcasecmp()
or strcmp()
. But, it treats all non-alphabetical characters as being less than letters.
int alphaBetize (const char *a, const char *b) {
int weight = 0;
do {
if (*a != *b) {
if (!(isalpha(*a) && isalpha(*b))) {
if (isalpha(*a) || isalpha(*b)) {
return isalpha(*a) - isalpha(*b);
}
return *a - *b;
}
if (tolower(*a) != tolower(*b)) {
return tolower(*a) - tolower(*b);
}
/* treat as equal, but mark the weight if not set */
if (weight == 0) {
weight = isupper(*a) - isupper(*b);
}
}
++a;
++b;
} while (*a && *b);
/* if the words compared equal, use the weight as tie breaker */
if (*a == *b) {
return weight;
}
return !*b - !*a;
}
Using this comparison for sorting will keep milk
and Milk
next to each other even if the list includes milk-duds
.
Here is a way to dynamically create a collating table from a "configuration". It serves to illustrate a contrastive technique to change how strings get compared.
You can map how the letters of the alphabet are compared with a kind of simple table that describes the relative order you want letters (or any character except the NUL byte) to have:
const char * alphaBetical =
"aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ";
From this ordering, we can create a look-up table to see how two letters are supposed to compare to each other. The following function initializes the table if it has not already been done first, and otherwise performs the table look-up.
int alphaBeta_lookup (int c) {
static int initialized;
static char table[CHAR_MAX+1];
if (!initialized) {
/* leave all non-alphaBeticals in their relative order, but below
alphaBeticals */
int i, j;
for (i = j = 1; i < CHAR_MAX+1; ++i) {
if (strchr(alphaBetical, i)) continue;
table[i] = j++;
}
/* now run through the alphaBeticals */
for (i = 0; alphaBetical[i]; ++i) {
table[(int)alphaBetical[i]] = j++;
}
initialized = 1;
}
/* return the computed ordinal of the provided character */
if (c < 0 || c > CHAR_MAX) return c;
return table[c];
}
With this look-up table, we can now simplify the loop body of the alphaBetize()
comparison function:
int alphaBetize (const char *a, const char *b) {
int ax = alphaBeta_lookup(*a);
int bx = alphaBeta_lookup(*b);
int weight = 0;
do {
char al = tolower(*a);
char bl = tolower(*b);
if (ax != bx) {
if (al != bl) {
return alphaBeta_lookup(al) - alphaBeta_lookup(bl);
}
if (weight == 0) {
weight = ax - bx;
}
}
ax = alphaBeta_lookup(*++a);
bx = alphaBeta_lookup(*++b);
} while (ax && bx);
/* if the words compared equal, use the weight as tie breaker */
return (ax != bx) ? !bx - !ax : weight;
}
Using the collating table, you can create many different orderings with a simplified comparison function, like:
int simple_collating (const char *a, const char *b) {
while (alphaBeta_lookup(*a) == alphaBeta_lookup(*b)) {
if (*a == '\0') break;
++a, ++b;
}
return alphaBeta_lookup(*a) - alphaBeta_lookup(*b);
}
Using this same function and through modifying the alphaBetical
string, you can achieve nearly any ordering you want (alphabetical, reverse alphabetical, vowels before consonants, etc.). However, the arrangement of keeping alike words together requires interspersing capitalized words with words in lowercase, and this can only be done by doing a comparison that ignores case.
Note that with the simple_collating()
function above and the alphaBetical
string I provided, Bacon
will come before milk
, but Mars
will go after milk
and before Milk
.
If you want to use a collating sequence that is already defined for your locale, you can set the locale and call the collating comparison function:
/*
* To change the collating locale, use (for example):
setlocale(LC_COLLATE, "en.US");
*/
int iso_collating (const char *a, const char *b) {
return strcoll(a, b);
}
Now, by changing the locale, the sorting order will be based on a standardized collating sequence.
You can write a custom comparison function for sort.
First, look at the default strcmp sort order:
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
#include <string.h>
const char *tgt[]={
"bacon", "Bacon", "mIlk", "Milk", "spinach", "MILK", "milk", "eggs"
};
int tgt_size=8;
static int cmp(const void *p1, const void *p2){
return strcmp(* (char * const *) p1, * (char * const *) p2);
}
int main(int argc, char *argv[]) {
printf("Before sort:\n\t");
for(int n=0; n<tgt_size; n++)
printf("%s ", tgt[n]);
qsort(tgt, tgt_size, sizeof(char *), cmp);
printf("\nAfter sort:\n\t");
for(int n=0; n<tgt_size; n++)
printf("%s ", tgt[n]);
return 0;
}
strcmp
sorts by ASCII character code; i.e., it sorts A-Z
then a-z
so all capital A-Z come before any word with a lowercase letter:
Before sort:
bacon Bacon mIlk Milk spinach MILK milk eggs
After sort:
Bacon MILK Milk bacon eggs mIlk milk spinach
We can write our own comparison function used in cmp
used in qsort
that ignores case. That looks like this:
int mycmp(const char *a, const char *b) {
const char *cp1 = a, *cp2 = b;
for (; toupper(*cp1) == toupper(*cp2); cp1++, cp2++)
if (*cp1 == '\0')
return 0;
return ((toupper(*cp1) < toupper(*cp2)) ? -1 : +1);
}
Be sure to also change cmp
to:
static int cmp(const void *p1, const void *p2){
return mycmp(* (char * const *) p1, * (char * const *) p2);
}
The case ignoring version now prints:
Before sort:
bacon Bacon mIlk Milk spinach MILK milk eggs
After sort:
bacon Bacon eggs Milk MILK milk mIlk spinach
This is the same output you would get with the POSIX function strcasecmp.
The function mycmp
first compares lexicographically in normal order [a|A]-[z|Z]
. This mean you will get like letter words together but you may get bacon, Bacon
as likely as Bacon, bacon
. This is because qsort is not a stable sort and 'Bacon' compares equal to 'bacon'.
Now what we want is if the comparison is 0 while ignoring case (i.e., same word like 'MILK' and 'milk) now compare including case and reverse the order:
int mycmp(const char *a, const char *b) {
const char *cp1 = a, *cp2 = b;
int sccmp=1;
for (; toupper(*cp1) == toupper(*cp2); cp1++, cp2++)
if (*cp1 == '\0')
sccmp = 0;
if (sccmp) return ((toupper(*cp1) < toupper(*cp2)) ? -1 : +1);
for (; *a == *b; a++, b++)
if (*a == '\0')
return 0;
return ((*a < *b) ? +1 : -1);
}
Final version prints:
Before sort:
bacon Bacon mIlk Milk spinach MILK milk eggs
After sort:
bacon Bacon eggs milk mIlk Milk MILK spinach
Unfortunately, this approach becomes unwieldy for UNICODE. For complex sorts, consider using a mapping or a multistep sort with a stable sort.
For complex and location aware alphabetical collations, consider Unicode collations. As an example, in different locations, letters alphabetize differently:
Swedish z < ö
y == w
German ö < z
Danish Z < Å
Lithuanian i < y < k
Tr German ä == æ
Tr Spanish c < ch < d
German Dictionary Sort: of < öf
German Phonebook Sort: öf < of
The default values for these distinctions are captured in in the Default Unicode Collation Element Table (DUCET) that provides a default mapping for UNICODE collations and string comparisons. You can modify the defaults to capture the distinction between dictionary sorting and phonebook sorting, different locations or different treatment for case. Individual location variations are actively tracked in the Unicode Common Locale Data Repository (CLDR).
The reccomendation for multi level sorting is tiered:
Level Description Examples
L1 Base characters role < roles < rule
L2 Accents role < rôle < roles
L3 Case/Variants role < Role < rôle
L4 Punctuation role < “role” < Role
Ln Identical role < ro□le < “role”
A widely used implementation of Unicode collations is in the ICU Library. The default DUCET collation for several examples would be:
b < B < bad < Bad < bäd
c < C < cote < coté < côte < côté
You can explore the ICU library and change the locations and targets with the ICU Explorer
If you wanted to implement your own version of the DUCET for giggles, you can follow the general method used in this Python script. It is not overwhelming, but not trivial.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With