
How to Naturally Sort Pathlib objects in Python?

I am trying to create a sorted list of files in the ./pages directory. This is what I have so far:

import numpy as np
from PIL import Image
import glob
from pathlib import Path


# sorted( l, key=lambda a: int(a.split("-")[1]) )
image_list = []

for filename in Path('./pages').glob('*.jpg'):
#     sorted( i, key=lambda a: int(a.split("_")[1]) )
#     im=Image.open(filename)
    image_list.append(filename)

print(*image_list, sep = "\n")

Current output:

pages/page_1.jpg  
pages/page_10.jpg  
pages/page_11.jpg  
pages/page_12.jpg  
pages/page_2.jpg  
pages/page_3.jpg  
pages/page_4.jpg  
pages/page_5.jpg  
pages/page_6.jpg  
pages/page_7.jpg  
pages/page_8.jpg  
pages/page_9.jpg  

Expected Output:

pages/page_1.jpg   
pages/page_2.jpg  
pages/page_3.jpg  
pages/page_4.jpg  
pages/page_5.jpg  
pages/page_6.jpg  
pages/page_7.jpg  
pages/page_8.jpg  
pages/page_9.jpg  
pages/page_10.jpg  
pages/page_11.jpg  
pages/page_12.jpg

I've tried the solutions found in the duplicate, but they don't work because the entries returned by pathlib are Path objects, not strings. They only look like filenames when I print them.

For example:

print(filename) # pages/page_1.jpg  
print(type(filename)) # <class 'pathlib.PosixPath'>
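
Since a Path is an object, string methods such as split are not available on it directly, but a string form is one call away. A minimal sketch, using PurePosixPath so that it behaves the same on any OS (the file name is only an example):

```python
from pathlib import PurePosixPath

p = PurePosixPath("pages/page_10.jpg")

# The path itself is not a string, so p.split("_") would raise AttributeError.
as_text = str(p)   # full path as a plain string
stem = p.stem      # final component without the extension

# Both are plain strings, so the usual string-based sort keys apply.
page_number = int(stem.split("_")[1])
print(as_text, stem, page_number)
```

The same attributes exist on the PosixPath objects that Path('./pages').glob('*.jpg') yields, so a key function can work on path.stem or str(path).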

Finally, this is working code. Thanks to all.

from pathlib import Path
import numpy as np
from PIL import Image
import natsort

def merge_to_single_image():
    image_list1 = []
    image_list2 = []
    image_list3 = []
    image_list4 = []

    for filename in Path('./pages').glob('*.jpg'):
        image_list1.append(filename)

    for i in image_list1:
        image_list2.append(i.stem)
    #     print(type(i.stem))

    image_list3 = natsort.natsorted(image_list2, reverse=False)

    for i in image_list3:
        i = str(i)+ ".jpg"
        image_list4.append(Path('./pages', i))

    images = [Image.open(i) for i in image_list4]
    # for a vertical stacking it is simple: use vstack
    images_combined = np.vstack(images)
    images_combined = Image.fromarray(images_combined)
    images_combined.save('Single_image.jpg')

PrasadHeeramani asked Oct 09 '19 20:10



3 Answers

Just for posterity, maybe this is more succinct?

from natsort import natsorted

natsorted(list_of_pathlib_objects, key=str)

OlleNordesjo answered Oct 30 '22 03:10


Note that sorted doesn't sort your data in place, but returns a new list, so you have to iterate on its output.

In order to get your sorting key, which is the integer value at the end of your filename:

  • You can first take the stem of your path, which is its final component without extension (so, for example, 'page_13').

  • Then, it is better to split it once from the right, in order to be safe in case your filename contains other underscores in the first part, like 'some_page_33.jpg'.

  • Once converted to int, you have the key you want for sorting.

So, your code could look like:

from pathlib import Path

for filename in sorted(Path('./pages').glob('*.jpg'),
                       key=lambda path: int(path.stem.rsplit("_", 1)[1])):
    print(filename)

Sample output:

pages/ma_page_2.jpg
pages/ma_page_11.jpg
pages/ma_page_13.jpg
pages/ma_page_20.jpg
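
One caveat: the key above assumes every stem ends in an underscore followed by digits, and int() raises ValueError otherwise. Below is a hedged sketch of a more forgiving key. The helper name page_number and the choice to sort unnumbered files last are assumptions made for illustration, not part of the original answer:

```python
from pathlib import PurePosixPath

def page_number(path):
    """Sort key: the integer after the last underscore of the stem, if there is one."""
    suffix = path.stem.rsplit("_", 1)[-1]
    # Files without a numeric suffix get a key that sorts after all numbered files.
    return (0, int(suffix)) if suffix.isdecimal() else (1, 0)

# Example names only; PurePosixPath stands in for the results of Path.glob().
paths = [PurePosixPath(p) for p in
         ["pages/ma_page_20.jpg", "pages/cover.jpg",
          "pages/ma_page_2.jpg", "pages/ma_page_11.jpg"]]

ordered = sorted(paths, key=page_number)
print(*ordered, sep="\n")
```

The same key can be passed to sorted(Path('./pages').glob('*.jpg'), key=page_number).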

Thierry Lathuille answered Oct 30 '22 01:10


One can use the natsort library (pip install natsort). It keeps the code simple too.
This works; tested with natsort versions 5.5 and 7.1 (current).

from pathlib import Path
from natsort import natsorted

image_list = Path('./pages').glob('*.jpg')
# Convert the paths to strings, sort them naturally, then convert back to Path objects
image_list = [Path(p) for p in natsorted([str(p) for p in image_list])]
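
If installing natsort is not an option, a similar ordering can be approximated with the standard library alone. This is a simplified sketch, not a reimplementation of natsort: the helper name natural_key is hypothetical, and it only handles plain runs of digits (no signs, decimals, or locale rules):

```python
import re
from pathlib import PurePosixPath

def natural_key(path):
    """Split the path text into digit and non-digit runs; digit runs compare as ints."""
    # With a capturing group, re.split alternates text, digits, text, ...
    # so strings and ints always line up at the same positions in every key.
    parts = re.split(r"(\d+)", str(path))
    return [int(part) if part.isdecimal() else part.lower() for part in parts]

# Example names only; PurePosixPath stands in for the results of Path.glob().
paths = [PurePosixPath(f"pages/page_{n}.jpg") for n in (1, 10, 11, 12, 2, 3)]

ordered = sorted(paths, key=natural_key)
print(*ordered, sep="\n")
```

Because the key works on str(path), the sorted result still contains the original path objects, so no conversion back from strings is needed.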

Jaja answered Oct 30 '22 01:10