This is a question regarding Unix shell scripting (any shell), but any other "standard" scripting language solution would also be appreciated:
I have a directory full of files where the filenames are hash values like this:
fd73d0cf8ee68073dce270cf7e770b97
fec8047a9186fdcc98fdbfc0ea6075ee
These files have different original file types such as png, zip, doc, pdf etc.
Can anybody provide a script that would rename the files so they get their appropriate file extension, probably based on the output of the file
command?
J.F. Sebastian's script will work for both ouput of the filenames as well as the actual renaming.
Here's mimetypes' version:
#!/usr/bin/env python
"""It is a `filename -> filename.ext` filter.
`ext` is mime-based.
"""
import fileinput
import mimetypes
import os
import sys
from subprocess import Popen, PIPE
if len(sys.argv) > 1 and sys.argv[1] == '--rename':
do_rename = True
del sys.argv[1]
else:
do_rename = False
for filename in (line.rstrip() for line in fileinput.input()):
output, _ = Popen(['file', '-bi', filename], stdout=PIPE).communicate()
mime = output.split(';', 1)[0].lower().strip()
ext = mimetypes.guess_extension(mime, strict=False)
if ext is None:
ext = os.path.extsep + 'undefined'
filename_ext = filename + ext
print filename_ext
if do_rename:
os.rename(filename, filename_ext)
Example:
$ ls *.file? | python add-ext.py --rename avi.file.avi djvu.file.undefined doc.file.dot gif.file.gif html.file.html ico.file.obj jpg.file.jpe m3u.file.ksh mp3.file.mp3 mpg.file.m1v pdf.file.pdf pdf.file2.pdf pdf.file3.pdf png.file.png tar.bz2.file.undefined
Following @Phil H's response that follows @csl' response:
#!/usr/bin/env python
"""It is a `filename -> filename.ext` filter.
`ext` is mime-based.
"""
# Mapping of mime-types to extensions is taken form here:
# http://as3corelib.googlecode.com/svn/trunk/src/com/adobe/net/MimeTypeMap.as
mime2exts_list = [
["application/andrew-inset","ez"],
["application/atom+xml","atom"],
["application/mac-binhex40","hqx"],
["application/mac-compactpro","cpt"],
["application/mathml+xml","mathml"],
["application/msword","doc"],
["application/octet-stream","bin","dms","lha","lzh","exe","class","so","dll","dmg"],
["application/oda","oda"],
["application/ogg","ogg"],
["application/pdf","pdf"],
["application/postscript","ai","eps","ps"],
["application/rdf+xml","rdf"],
["application/smil","smi","smil"],
["application/srgs","gram"],
["application/srgs+xml","grxml"],
["application/vnd.adobe.apollo-application-installer-package+zip","air"],
["application/vnd.mif","mif"],
["application/vnd.mozilla.xul+xml","xul"],
["application/vnd.ms-excel","xls"],
["application/vnd.ms-powerpoint","ppt"],
["application/vnd.rn-realmedia","rm"],
["application/vnd.wap.wbxml","wbxml"],
["application/vnd.wap.wmlc","wmlc"],
["application/vnd.wap.wmlscriptc","wmlsc"],
["application/voicexml+xml","vxml"],
["application/x-bcpio","bcpio"],
["application/x-cdlink","vcd"],
["application/x-chess-pgn","pgn"],
["application/x-cpio","cpio"],
["application/x-csh","csh"],
["application/x-director","dcr","dir","dxr"],
["application/x-dvi","dvi"],
["application/x-futuresplash","spl"],
["application/x-gtar","gtar"],
["application/x-hdf","hdf"],
["application/x-javascript","js"],
["application/x-koan","skp","skd","skt","skm"],
["application/x-latex","latex"],
["application/x-netcdf","nc","cdf"],
["application/x-sh","sh"],
["application/x-shar","shar"],
["application/x-shockwave-flash","swf"],
["application/x-stuffit","sit"],
["application/x-sv4cpio","sv4cpio"],
["application/x-sv4crc","sv4crc"],
["application/x-tar","tar"],
["application/x-tcl","tcl"],
["application/x-tex","tex"],
["application/x-texinfo","texinfo","texi"],
["application/x-troff","t","tr","roff"],
["application/x-troff-man","man"],
["application/x-troff-me","me"],
["application/x-troff-ms","ms"],
["application/x-ustar","ustar"],
["application/x-wais-source","src"],
["application/xhtml+xml","xhtml","xht"],
["application/xml","xml","xsl"],
["application/xml-dtd","dtd"],
["application/xslt+xml","xslt"],
["application/zip","zip"],
["audio/basic","au","snd"],
["audio/midi","mid","midi","kar"],
["audio/mpeg","mp3","mpga","mp2"],
["audio/x-aiff","aif","aiff","aifc"],
["audio/x-mpegurl","m3u"],
["audio/x-pn-realaudio","ram","ra"],
["audio/x-wav","wav"],
["chemical/x-pdb","pdb"],
["chemical/x-xyz","xyz"],
["image/bmp","bmp"],
["image/cgm","cgm"],
["image/gif","gif"],
["image/ief","ief"],
["image/jpeg","jpg","jpeg","jpe"],
["image/png","png"],
["image/svg+xml","svg"],
["image/tiff","tiff","tif"],
["image/vnd.djvu","djvu","djv"],
["image/vnd.wap.wbmp","wbmp"],
["image/x-cmu-raster","ras"],
["image/x-icon","ico"],
["image/x-portable-anymap","pnm"],
["image/x-portable-bitmap","pbm"],
["image/x-portable-graymap","pgm"],
["image/x-portable-pixmap","ppm"],
["image/x-rgb","rgb"],
["image/x-xbitmap","xbm"],
["image/x-xpixmap","xpm"],
["image/x-xwindowdump","xwd"],
["model/iges","igs","iges"],
["model/mesh","msh","mesh","silo"],
["model/vrml","wrl","vrml"],
["text/calendar","ics","ifb"],
["text/css","css"],
["text/html","html","htm"],
["text/plain","txt","asc"],
["text/richtext","rtx"],
["text/rtf","rtf"],
["text/sgml","sgml","sgm"],
["text/tab-separated-values","tsv"],
["text/vnd.wap.wml","wml"],
["text/vnd.wap.wmlscript","wmls"],
["text/x-setext","etx"],
["video/mpeg","mpg","mpeg","mpe"],
["video/quicktime","mov","qt"],
["video/vnd.mpegurl","m4u","mxu"],
["video/x-flv","flv"],
["video/x-msvideo","avi"],
["video/x-sgi-movie","movie"],
["x-conference/x-cooltalk","ice"]]
#NOTE: take only the first extension
mime2ext = dict(x[:2] for x in mime2exts_list)
if __name__ == '__main__':
import fileinput, os.path
from subprocess import Popen, PIPE
for filename in (line.rstrip() for line in fileinput.input()):
output, _ = Popen(['file', '-bi', filename], stdout=PIPE).communicate()
mime = output.split(';', 1)[0].lower().strip()
print filename + os.path.extsep + mime2ext.get(mime, 'undefined')
Here's a snippet for old python's versions (not tested):
#NOTE: take only the first extension
mime2ext = {}
for x in mime2exts_list:
mime2ext[x[0]] = x[1]
if __name__ == '__main__':
import os
import sys
# this version supports only stdin (part of fileinput.input() functionality)
lines = sys.stdin.read().split('\n')
for line in lines:
filename = line.rstrip()
output = os.popen('file -bi ' + filename).read()
mime = output.split(';')[0].lower().strip()
try: ext = mime2ext[mime]
except KeyError:
ext = 'undefined'
print filename + '.' + ext
It should work on Python 2.3.5 (I guess).
You can use
file -i filename
to get a MIME-type. You could potentially lookup the type in a list and then append an extension. You can find a list of MIME-types and example file extensions on the net.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With