I have a column FileName in a pandas DataFrame that holds filenames as strings. The filenames can themselves contain dots ('.'). For example, a.b.c.d.txt is a txt file. I just want to add another column, FileType, containing only the file extensions.
Sample DataFrame:
FileName
a.b.c.d.txt
j.k.l.exe
After processing:
FileName    FileType
a.b.c.d.txt txt
j.k.l.exe   exe
I tried the following:
X['FileType'] = X.FileName.str.split(pat='.')
This helps me split the string on '.'. But how do I get the last element, i.e. the file extension?
Something like
X['FileType'] = X.FileName.str.split(pat='.')[-1]
X['FileType'] = X.FileName.str.split(pat='.').pop(-1)
did not give the desired output.
Use the os.path.splitext() method to split a filename into the name and extension, e.g. filename, extension = os.
We can use Python os module splitext() function to get the file extension. This function splits the file path into a tuple having two values - root and extension.
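A minimal sketch of the splitext behaviour described above, using the sample filenames from the question. The variable names (filenames, pairs, extensions) are only illustrative and not part of any answer.

```python
import os

# Sample filenames from the question.
filenames = ["a.b.c.d.txt", "j.k.l.exe"]

# os.path.splitext splits at the last dot and returns (root, ext);
# the extension keeps its leading dot.
pairs = [os.path.splitext(name) for name in filenames]

# Strip the leading dot if only the bare extension is wanted.
extensions = [ext.lstrip(".") for _, ext in pairs]

print(pairs)
print(extensions)
```

Note that splitext keeps the dot in the extension, so it has to be stripped separately if the goal is a column like the FileType one in the question.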
If you need to create or unpack lists in your DataFrames, you can make use of the Series.str.split() and df.explode() methods respectively.
ext is the extension of the file.
Option 1
Use apply
df['FileType'] = df.FileName.apply(lambda x: x.split('.')[-1])
Option 2
Use str twice
df['FileType'] = df.FileName.str.split('.').str[-1]
Option 2b
Use rsplit (thanks @cᴏʟᴅsᴘᴇᴇᴅ)
df['FileType'] = df.FileName.str.rsplit('.', n=1).str[-1]
All result in:
      FileName FileType
0  a.b.c.d.txt      txt
1    j.k.l.exe      exe
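For anyone who wants to check this themselves, here is a self-contained sketch that rebuilds the sample frame from the question and compares the three options. The names opt1, opt2 and opt2b are just illustrative. It also hints at why the attempts in the question failed: a plain [-1] indexes the Series itself, whereas .str[-1] indexes into each list.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({"FileName": ["a.b.c.d.txt", "j.k.l.exe"]})

# Option 1: plain Python string methods applied element by element.
opt1 = df.FileName.apply(lambda x: x.split(".")[-1])

# Option 2: .str.split returns a Series of lists; the second .str
# indexes into each list rather than into the Series itself.
opt2 = df.FileName.str.split(".").str[-1]

# Option 2b: split only once, starting from the right
# (n is passed by keyword here).
opt2b = df.FileName.str.rsplit(".", n=1).str[-1]

df["FileType"] = opt2b
print(df)
```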
Python 3.6.4, Pandas 0.22.0 
If you only want the extension and don't need to keep the rest of the filename as a separate column, then I would recommend a list comprehension:
str.rsplit
df['FileType'] = [f.rsplit('.', 1)[-1] for f in df.FileName.tolist()]
df
      FileName FileType
0  a.b.c.d.txt      txt
1    j.k.l.exe      exe
If you want to split the filename into its name and extension, there are a couple of options.
os.path.splitext
import os
pd.DataFrame(
    [os.path.splitext(f) for f in df.FileName], 
    columns=['Name', 'Type']
)
 
      Name  Type
0  a.b.c.d  .txt
1    j.k.l  .exe
str.extract
df.FileName.str.extract(r'(?P<Name>.*)(?P<Type>\..*)', expand=True)
      Name  Type
0  a.b.c.d  .txt
1    j.k.l  .exe
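If only the bare extension (without the leading dot) is needed, a variation on the str.extract approach could look like the sketch below. The pattern and the extra "README" row are my own assumptions, added to show what happens when a name has no dot at all; they are not from the original answers.

```python
import pandas as pd

# "README" is a made-up entry with no dot, added to show the no-match case.
df = pd.DataFrame({"FileName": ["a.b.c.d.txt", "j.k.l.exe", "README"]})

# Capture everything after the last dot; with a single group and
# expand=False, str.extract returns a Series. Rows that do not match
# the pattern come back as missing values.
df["FileType"] = df.FileName.str.extract(r"\.([^.]+)$", expand=False)
print(df)
```

By contrast, the split-based options return the whole name when there is no dot, so which behaviour is preferable depends on how such files should be treated.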