I am using a mapper that converts binary files (JPEGs) to a Hadoop SequenceFile (HSF):
public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
    String uri = value.toString().replace(" ", "%20");
    Configuration conf = new Configuration();
    FSDataInputStream in = null;
    try {
        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        in = fs.open(new Path(uri));
        ByteArrayOutputStream bout = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024 * 1024];
        int bytesRead;
        // Only write the bytes actually read; writing the whole buffer
        // on every pass would pad the image with stale data.
        while ((bytesRead = in.read(buffer)) != -1) {
            bout.write(buffer, 0, bytesRead);
        }
        context.write(value, new BytesWritable(bout.toByteArray()));
    } finally {
        if (in != null) {
            in.close();
        }
    }
}
I then have a second mapper that reads the HSF, thus:
public class ImagePHashMapper extends Mapper<Text, BytesWritable, Text, Text> {

    public void map(Text key, BytesWritable value, Context context)
            throws IOException, InterruptedException {
        // Get the pHash for this specific file. BytesWritable.getBytes()
        // returns the backing array, which can be longer than the valid
        // data, so trim it to getLength() bytes first.
        String pHashStr;
        try {
            pHashStr = calculatePhash(Arrays.copyOf(value.getBytes(), value.getLength()));
and calculatePhash is:
static String calculatePhash(byte[] imageData) throws NoSuchAlgorithmException {
    // Get the pHash for this specific data.
    // ImagePHash requires an InputStream rather than a byte array.
    InputStream is = new ByteArrayInputStream(imageData);
    String ph;
    try {
        ImagePHash ih = new ImagePHash();
        ph = ih.getHash(is);
        System.out.println("phash: " + ph);
    } catch (Exception e) {
        e.printStackTrace();
        return "Internal error with ImagePHash.getHash";
    }
    return ph;
}
This all works fine, but I want calculatePhash to write out each JPEG's last modified date. I know I can use file.lastModified()
to get the last modified date of a file, but is there any way to get this in either map or calculatePhash? I'm a noob at Java. TIA!
To read a SequenceFile using the Java API in Hadoop, create an instance of SequenceFile.Reader. Using that reader you can iterate over the (key, value) pairs in the SequenceFile with the next() method, as in the sketch below.
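A minimal sketch, assuming Hadoop 2.x or later and a SequenceFile with Text keys and BytesWritable values (as produced by the mapper above); the file path is a placeholder:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SequenceFileReadDemo {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        // Hypothetical path to the SequenceFile written by the first mapper.
        Path path = new Path("/data/images.seq");
        SequenceFile.Reader reader = null;
        try {
            reader = new SequenceFile.Reader(conf, SequenceFile.Reader.file(path));
            Text key = new Text();
            BytesWritable value = new BytesWritable();
            // next() fills key and value, and returns false at end of file.
            while (reader.next(key, value)) {
                System.out.println(key + "\t" + value.getLength() + " bytes");
            }
        } finally {
            if (reader != null) {
                reader.close();
            }
        }
    }
}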
A SequenceFile is a flat, binary file type that serves as a container for data to be used in Apache Hadoop distributed computing projects. SequenceFiles are used extensively with MapReduce.
You cannot modify data once it is stored in HDFS, because HDFS follows a write-once, read-many model; you can only append to a file once it is stored in HDFS.
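For illustration, a hedged sketch of appending to an existing HDFS file; the path is a placeholder, append must be enabled on the cluster, and note that appending raw bytes to a SequenceFile would corrupt it:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
// Hypothetical plain-text file for illustration only.
FSDataOutputStream out = fs.append(new Path("/data/log.txt"));
try {
    out.write("one more line\n".getBytes("UTF-8"));
} finally {
    out.close();
}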
In Hadoop, each file is made up of blocks. The Hadoop FileSystem API lives in the org.apache.hadoop.fs package; if your input files are in HDFS, you need to import that package and can then read the file's metadata:
FileSystem fs = FileSystem.get(URI.create(uri), conf);
in = fs.open(new Path(uri));
// FileStatus carries the HDFS metadata, including the modification
// time in milliseconds since the epoch.
FileStatus fileStatus = fs.getFileStatus(new Path(uri));
long modificationTime = fileStatus.getModificationTime();
Date date = new Date(modificationTime);
SimpleDateFormat df2 = new SimpleDateFormat("dd/MM/yy HH:mm:ss");
String dateText = df2.format(date);
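Since the second mapper only ever sees the SequenceFile, not the original JPEGs, the natural place to capture the date is the first mapper, which still knows each file's URI. A sketch, assuming you are willing to pack the date into the Text key alongside the URI (the tab separator is an arbitrary choice):

// Inside the first mapper's map(), after opening the file:
FileStatus fileStatus = fs.getFileStatus(new Path(uri));
String dateText = new SimpleDateFormat("dd/MM/yy HH:mm:ss")
        .format(new Date(fileStatus.getModificationTime()));
// Emit "uri<TAB>date" as the key so the second mapper can split it out:
context.write(new Text(value.toString() + "\t" + dateText),
        new BytesWritable(bout.toByteArray()));

In the second mapper, key.toString().split("\t") then recovers the URI and the date, which you can pass into calculatePhash or write straight to the output.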
I hope this will help you.