Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Finding the 3 most recently modified files in a long list of files

I have a file list which I want to sort and extract the top 3 last modified.

Constraint: I cannot use Java 7 due to compatibility issues on downstream apps

My current options

Solution 1

File[] files = directory.listFiles();    
Arrays.sort(files, new Comparator<File>(){
    public int compare(File f1, File f2)
    {
        return Long.valueOf(f1.lastModified()).compareTo(f2.lastModified());
    } });

Solution 2

public static void sortFilesDesc(File[] files) {
  Arrays.sort(files, new Comparator() {
    public int compare(Object o1, Object o2) {
      if ((File)o1).lastModified().compareTo((File)o2).lastModified()) {
        return -1;
      } else if (((File) o1).lastModified() < ((File) o2).lastModified()) {
        return +1;
      } else {
        return 0;
      }
    }
  });
}

Problem

The above two solution takes more time to execute & memory. My file list consists of some 300 tar files with 200MB size each. so it is consuming more time & memory.

Is there is any way to efficiently handle this?

Every compare operation uses a file object which is of high memory is there is any way to release the memory and handle this effectively?

like image 923
Wills Avatar asked Jan 17 '13 05:01

Wills


2 Answers

You can do it much faster.

Arrays.sort(...) uses "quick sort", which takes ~ n * ln(n) operations.

This example takes only one iteration trough the whole array, which is ~ n operations.

public static void sortFilesDesc(File[] files) {        
    File firstMostRecent = null;
    File secondMostRecent = null;
    File thirdMostRecent = null;
    for (File file : files) {
        if ((firstMostRecent == null)
                || (firstMostRecent.lastModified() < file.lastModified())) {
            thirdMostRecent = secondMostRecent;
            secondMostRecent = firstMostRecent;             
            firstMostRecent = file;
        } else if ((secondMostRecent == null)
                || (secondMostRecent.lastModified() < file.lastModified())) {
            thirdMostRecent = secondMostRecent;
            secondMostRecent = file;
        } else if ((thirdMostRecent == null)
                || (thirdMostRecent.lastModified() < file.lastModified())) {
            thirdMostRecent = file;
        }
    }
} 

On small numbers of files you won't see much difference, but even for tens of files the difference will be significant, for bigger numbers - dramatic.

The code to check the algorithm (please put in a correct files structure):

package com.hk.basicjava.clasload.tests2;

import java.io.File;
import java.util.Date;


class MyFile extends File {

    private long time = 0; 

    public MyFile(String name, long timeMills) {
        super(name);
        time = timeMills;
    }
    @Override
    public long lastModified() {
        return time;
    }
}

public class Files {

    /**
     * @param args
     */
    public static void main(String[] args) {

        File[] files = new File[5]; 
        files[0] = new MyFile("File1", new Date(2013,1,15, 7,0).getTime());
        files[1] = new MyFile("File2", new Date(2013,1,15, 7,40).getTime());
        files[2] = new MyFile("File3", new Date(2013,1,15, 5,0).getTime());
        files[3] = new MyFile("File4", new Date(2013,1,15, 10,0).getTime());
        files[4] = new MyFile("File5", new Date(2013,1,15, 4,0).getTime());
        sortFilesDesc(files);
    }

    public static void sortFilesDesc(File[] files) {        
        File firstMostRecent = null;
        File secondMostRecent = null;
        File thirdMostRecent = null;
        for (File file : files) {
            if ((firstMostRecent == null)
                    || (firstMostRecent.lastModified() < file.lastModified())) {
                thirdMostRecent = secondMostRecent;
                secondMostRecent = firstMostRecent;             
                firstMostRecent = file;
            } else if ((secondMostRecent == null)
                    || (secondMostRecent.lastModified() < file.lastModified())) {
                thirdMostRecent = secondMostRecent;
                secondMostRecent = file;
            } else if ((thirdMostRecent == null)
                    || (thirdMostRecent.lastModified() < file.lastModified())) {
                thirdMostRecent = file;
            }
        }
        System.out.println("firstMostRecent : " + firstMostRecent.getName());
        System.out.println("secondMostRecent : " + secondMostRecent.getName());
        System.out.println("thirdMostRecent : " + thirdMostRecent.getName());
    } 

}
like image 157
Alex Kreutznaer Avatar answered Sep 27 '22 20:09

Alex Kreutznaer


You have to check the lastModified of every file, you cannot change that. What you don't need to do is to sort all the elements just to get the top 3. If you can use Guava, you could use Ordering.greatestOf (which uses a good algorithm):

Ordering<File> ordering = Ordering.from( new Comparator(){
        public int compare(File f1, File f2)
        {
            return Long.valueOf(f1.lastModified()).compareTo(f2.lastModified());
        });

List<File> max3 = ordering.greatestOf(Arrays.asList(directory.listFiles()), 3);
like image 29
Aleksander Blomskøld Avatar answered Sep 27 '22 22:09

Aleksander Blomskøld