I have a directory listing for which I want to retrieve the filenames and put them in a vector of strings such that they are sorted in a "natural" manner. e.g. { "10.txt" "0.txt" "2.txt" "1.m" "Jan12" "July13.txt" "Nov25.txt" "Jane" "John" }
should be {"0.txt" "1.m" "2.txt" "10.txt" "Jan12" "July13.txt" "Nov25.txt" "Jane" "John" }
. What is the easiest way to do this?
Elaborating on "natural" we assume for a string made up from parts of numbers (N) and text (T) such that ...(N)(T)...
, then for ...(N1)(T1)...
and ...(N2)(T2)...
will be (N1<N2) (<) (T1<T2)
where (<)
implies left term precedence over right term. In this case, numbers take precedence over text fields if they are in the same position in the string, i.e. 1.z (<) 1_t.txt
.
Is there already a library function to do that kind of sort for alphanumeric strings, or directory entries?
DESIRED order in which files should come. The file names will be stored in a vector of strings.
Abhinav@Abhinav-PC /cygdrive/c/AbhinavSamples/shell
$ ls -lv
total 8
-rw-r--r--+ 1 Abhinav None 2 Mar 17 00:51 1.txt
-rw-r--r--+ 1 Abhinav None 2 Mar 17 00:55 1_t.txt
-rw-r--r--+ 1 Abhinav None 2 Mar 17 00:50 3.txt
-rw-r--r--+ 1 Abhinav None 2 Mar 17 00:51 4.txt
-rw-r--r--+ 1 Abhinav None 2 Mar 17 00:53 10.txt
-rw-r--r--+ 1 Abhinav None 2 Mar 17 00:56 10_t.txt
-rw-r--r--+ 1 Abhinav None 2 Mar 17 00:56 13.txt
-rw-r--r--+ 1 Abhinav None 2 Mar 17 00:53 20.txt
**Simple Sort**
Abhi@Abhi-PC /cygdrive/c/AbhinavSamples/shell
$ ls -l
total 8
-rw-r--r--+ 1 Abhinav None 2 Mar 17 00:51 1.txt
-rw-r--r--+ 1 Abhinav None 2 Mar 17 00:53 10.txt
-rw-r--r--+ 1 Abhinav None 2 Mar 17 00:56 10_t.txt
-rw-r--r--+ 1 Abhinav None 2 Mar 17 00:56 13.txt
-rw-r--r--+ 1 Abhinav None 2 Mar 17 00:55 1_t.txt
-rw-r--r--+ 1 Abhinav None 2 Mar 17 00:53 20.txt
-rw-r--r--+ 1 Abhinav None 2 Mar 17 00:50 3.txt
-rw-r--r--+ 1 Abhinav None 2 Mar 17 00:51 4.txt
You need a function which makes the natural comparison between two strings. After that you can use std::sort
with the comparison function as third argument (as @chac already pointed out). In the following I tried to implement such a function in a recursive way. Note, that it can handle arbitrary filenames which don't need to begin with a number part and end with a string part:
bool compareNat(const std::string& a, const std::string& b)
{
if (a.empty())
return true;
if (b.empty())
return false;
if (std::isdigit(a[0]) && !std::isdigit(b[0]))
return true;
if (!std::isdigit(a[0]) && std::isdigit(b[0]))
return false;
if (!std::isdigit(a[0]) && !std::isdigit(b[0]))
{
if (std::toupper(a[0]) == std::toupper(b[0]))
return compareNat(a.substr(1), b.substr(1));
return (std::toupper(a[0]) < std::toupper(b[0]));
}
// Both strings begin with digit --> parse both numbers
std::istringstream issa(a);
std::istringstream issb(b);
int ia, ib;
issa >> ia;
issb >> ib;
if (ia != ib)
return ia < ib;
// Numbers are the same --> remove numbers and recurse
std::string anew, bnew;
std::getline(issa, anew);
std::getline(issb, bnew);
return (compareNat(anew, bnew));
}
Here's a simple test case:
#include <iostream> // std::cout
#include <string>
#include <algorithm> // std::sort, std::copy
#include <iterator> // std::ostream_iterator
#include <sstream> // std::istringstream
#include <vector>
#include <cctype> // std::isdigit
int main()
{
std::vector<std::string> str;
str.push_back("20.txt");
str.push_back("10.txt");
str.push_back("1.txt");
str.push_back("z2.txt");
str.push_back("z10.txt");
str.push_back("z100.txt");
str.push_back("1_t.txt");
std::sort(str.begin(), str.end(), compareNat);
std::copy(str.begin(), str.end(),
std::ostream_iterator<std::string>(std::cout, "\n"));
}
The result is:
1.txt
1_t.txt
10.txt
20.txt
z2.txt
z10.txt
z100.txt
There is a function that does exactly what you want in glibc. Unfortunately it is C, not C++, so if you can live with that here is the simplest possible solution "out of the box", without reimplementing anything and reinventing the wheel. BTW: this is exactly as ls -lv
is implemented. The most important part of it is the versionsort
function which does the natural sort for you. It is used here as a comparison function for scandir
.
The simple example below prints all files/directories in current directory sorted as you wish.
#define _GNU_SOURCE
#include <dirent.h>
#include <stdlib.h>
#include <stdio.h>
int main(void)
{
struct dirent **namelist;
int n,i;
n = scandir(".", &namelist, 0, versionsort);
if (n < 0)
perror("scandir");
else
{
for(i =0 ; i < n; ++i)
{
printf("%s\n", namelist[i]->d_name);
free(namelist[i]);
}
free(namelist);
}
return 0;
}
I have come across an algorithm which works pretty nicely: http://sourcefrog.net/projects/natsort/
I have modified the source a little to meet my needs:
std::string
as parameter.std::sort
and std::vector
.My version of source is here.
A use case example:
#include <iostream>
#include <vector>
#include <algorithm>
#include "strnatcmp.hpp"
int main(){
std::vector<std::string> files;
files.push_back("20.txt");
files.push_back("10.txt");
files.push_back("1.txt");
files.push_back("z2.txt");
files.push_back("z10.txt");
files.push_back("z100.txt");
files.push_back("1_t.txt ");
files.push_back("ABc");
files.push_back("aBCd");
files.push_back("aBc");
files.push_back("aaa");
files.push_back("aBcd");
files.push_back("aaA");
std::sort(files.begin(),files.end(),compareNat);
for(int i=0;i<(int)files.size();i++)std::cout<< files[i]+"\n";
return 0;
}
Output:
1.txt
1_t.txt
10.txt
20.txt
aaa
aaA
ABc
aBc
aBCd
aBcd
z2.txt
z10.txt
z100.txt
This is the strnatcmp.hpp header.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With