I have files like this in S3:
1-2013-08-22-22-something
2-2013-08-22-22-something
etc
Without srcPattern I can easily get all of the files from the bucket, but I want to copy only the files with a specific prefix, for example all of the 1's. I've tried using srcPattern, but for some reason it's not picking up any of the files.
My current command is:
elastic-mapreduce --jobflow $JOBFLOW --jar /home/hadoop/lib/emr-s3distcp-1.0.jar \
--args '--src,s3n://some-bucket/,--dest,hdfs:///hdfs-input,--srcPattern,[0-9]-.*' \
--step-name "copying over s3 files"
It turns out you need .* in front of the regex.
For example, I needed:
.*[0-9]-.*
I'm guessing that's because the source pattern is matched against the whole path, which also includes the bucket name?
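
To check a pattern locally before submitting a step, here is a minimal sketch. It assumes, as guessed above, that srcPattern has to match the entire source path (scheme and bucket included) rather than just the file name. The helper name full_match is made up for illustration, and grep -x is only a stand-in for whole-string matching; it is not what S3DistCp itself runs.

```shell
#!/bin/sh
# Hypothetical helper (not part of S3DistCp): reports whether a regex matches
# the WHOLE string. grep -x requires the pattern to cover the entire line,
# which mimics the assumed full-path matching of --srcPattern.
full_match() {
    pattern=$1
    path=$2
    if printf '%s\n' "$path" | grep -Eqx -- "$pattern"; then
        echo "match"
    else
        echo "no match"
    fi
}

# Full path as the pattern would see it under the assumption above.
path='s3n://some-bucket/1-2013-08-22-22-something'

full_match '[0-9]-.*' "$path"     # prints "no match": the path starts with "s3n://", not a digit
full_match '.*[0-9]-.*' "$path"   # prints "match": the leading .* absorbs the scheme and bucket name
full_match '.*/1-.*' "$path"      # prints "match": anchored on "/1-", so only the 1's qualify
```

Note that .*[0-9]-.* accepts any path with a digit followed by a hyphen anywhere in it, so it also picks up the 2's. If only the 1's are wanted, something like .*/1-.* should be narrower, under the same full-path assumption.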