You can use str.rstrip to remove the endings, but note that it strips any trailing ., t, or x characters rather than the exact suffix, so it is only safe when the base names do not end in those characters:

df['filename'] = df['filename'].str.rstrip('.txt')

Use a list comprehension:

df['filename'] = [x[:-4] for x in df['filename']]

Remove ends of string entries in pandas DataFrame column

Tags:

python

string-matching

pandas

dataframe

I have a pandas DataFrame in which one column is a list of file names:

import pandas as pd
df = pd.read_csv('fname.csv')

df.head()

filename    A    B    C
fn1.txt     2    4    5
fn2.txt     1    2    1
fn3.txt     ....
....

I would like to delete the file extension .txt from each entry in filename. How do I accomplish this?

I tried:

df['filename'] = df['filename'].map(lambda x: str(x)[:-4])

but when I look at the column entries afterwards with df.head(), nothing has changed.

How does one do this?

483

asked May 03 '16 10:05

5 Answers

I think you can use str.replace with the regex \.txt$ ($ matches the end of the string; the dot is escaped so that it matches a literal period):

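import pandas as pd

df = pd.DataFrame({'A': {0: 2, 1: 1},
                   'C': {0: 5, 1: 1},
                   'B': {0: 4, 1: 2},
                   'filename': {0: "txt.txt", 1: "x.txt"}},
                columns=['filename','A','B', 'C'])

print(df)
  filename  A  B  C
0  txt.txt  2  4  5
1    x.txt  1  2  1

# Anchored regex; regex=True is required in pandas 2.0+, where the default is a literal match
df['filename'] = df['filename'].str.replace(r'\.txt$', '', regex=True)
print(df)
  filename  A  B  C
0      txt  2  4  5
1        x  1  2  1

# Alternative, applied to the original df above: drop the last four characters with map
df['filename'] = df['filename'].map(lambda x: str(x)[:-4])
print(df)
  filename  A  B  C
0      txt  2  4  5
1        x  1  2  1

# Alternative, applied to the original df above: drop the last four characters with str slicing
df['filename'] = df['filename'].str[:-4]
print(df)
  filename  A  B  C
0      txt  2  4  5
1        x  1  2  1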
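
To see why the dot in a pattern such as .txt$ should be escaped, here is a small sketch using Python's re module directly. The file names are made up for illustration; the point is only that an unescaped dot matches any character.

```python
import re

# Hypothetical names; "mytxt" has no extension at all
names = ["fn1.txt", "x.txt", "mytxt"]

# Unescaped dot: "." matches any character, so the trailing "ytxt" of "mytxt" is removed too
loose = [re.sub(r".txt$", "", n) for n in names]

# Escaped dot: only a literal ".txt" at the end of the string is removed
strict = [re.sub(r"\.txt$", "", n) for n in names]

print(loose)   # ['fn1', 'x', 'm']
print(strict)  # ['fn1', 'x', 'mytxt']
```

The same escaping applies to the pattern passed to str.replace in pandas.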

EDIT:

rstrip can remove more characters than intended: its argument is treated as a set of characters (in this case ., t, x), and every trailing character in that set is stripped:

Example:

print(df)
  filename  A  B  C
0  txt.txt  2  4  5
1    x.txt  1  2  1

df['filename'] = df['filename'].str.rstrip('.txt')

print(df)
  filename  A  B  C
0           2  4  5
1           1  2  1
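
The same behaviour can be reproduced with plain Python strings, which may make it easier to see what is going on. This is a minimal sketch; drop_suffix is a hypothetical helper written for this example, not something from pandas or the standard library.

```python
names = ["txt.txt", "x.txt", "report.txt", "fn1.txt"]

# rstrip treats ".txt" as the character set {".", "t", "x"} and strips
# any of those characters from the right until it meets a different one
stripped = [n.rstrip(".txt") for n in names]


def drop_suffix(name: str, suffix: str = ".txt") -> str:
    """Remove suffix only when name actually ends with it (hypothetical helper)."""
    if suffix and name.endswith(suffix):
        return name[: -len(suffix)]
    return name


safe = [drop_suffix(n) for n in names]

print(stripped)  # ['', '', 'repor', 'fn1']
print(safe)      # ['txt', 'x', 'report', 'fn1']
```

Note how "report.txt" loses its final "t" with rstrip, because "t" is also in the character set.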

answered Oct 12 '22 15:10

EdChum

You may want:

df['filename'] = df.apply(lambda x: x['filename'][:-4], axis = 1)
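
Slicing with [:-4] removes the last four characters of every entry, whether or not they are .txt. If some names might lack the extension, one possible sketch is to slice only where the name ends with .txt. The sample data below is made up, and has_ext is just an illustrative variable name.

```python
import pandas as pd

# Made-up data: the last name has no ".txt" extension
df = pd.DataFrame({"filename": ["fn1.txt", "fn2.txt", "readme"], "A": [2, 1, 3]})

# True only for rows whose name really ends with ".txt"
has_ext = df["filename"].str.endswith(".txt")

# Keep the sliced value where has_ext is True, otherwise fall back to the original name
df["filename"] = df["filename"].str[:-4].where(has_ext, df["filename"])

print(df["filename"].tolist())  # ['fn1', 'fn2', 'readme']
```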

answered Oct 12 '22 17:10

update 2021 + speedtest

Starting from pandas 1.4, pandas.Series.str.removesuffix, the equivalent of Python's str.removesuffix, is available, so one can use

df['filename'] = df['filename'].str.removesuffix('.txt')
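
If the code may also run on a pandas older than 1.4, a possible sketch is to try removesuffix first and fall back to an anchored, escaped regex. strip_suffix is a hypothetical helper name, and the sample names are made up.

```python
import re

import pandas as pd


def strip_suffix(names: pd.Series, suffix: str) -> pd.Series:
    """Remove suffix from the end of each string; other entries stay unchanged."""
    try:
        # Available from pandas 1.4
        return names.str.removesuffix(suffix)
    except AttributeError:
        # Older pandas: anchored regex, with the suffix escaped so "." is literal
        return names.str.replace(re.escape(suffix) + "$", "", regex=True)


names = pd.Series(["fn1.txt", "fn2.txt", "txt.txt", "notes"])
result = strip_suffix(names, ".txt")

print(result.tolist())  # ['fn1', 'fn2', 'txt', 'notes']
```

Either branch leaves names without the suffix untouched, unlike unconditional slicing.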

speed test

tl;dr: the fastest is

dat["fname"].map(lambda x: x[:-4] if x[-4:] == ".txt" else x)

In the speed test, I wanted to compare the different methods collected on this SO page. I excluded rstrip, because it would strip endings other than .txt too. Because the regexp only removes a trailing .txt, it is fair to modify the other functions as well, so that they remove the last 4 characters only if they are .txt.

The testing code is

import pandas as pd
import time

ITER = 10


def rm_re(dat: pd.DataFrame) -> pd.Series:
    """Use regular expression."""
    return dat["fname"].str.replace(r'\.txt$', '', regex=True)


def rm_map(dat: pd.DataFrame) -> pd.Series:
    """Use pandas map, find occurrences and remove with []"""
    where = dat["fname"].str.endswith(".txt")
    dat.loc[where, "fname"] = dat["fname"].map(lambda x: x[:-4])
    return dat["fname"]


def rm_map2(dat: pd.DataFrame) -> pd.Series:
    """Use pandas map with lambda conditional."""
    return dat["fname"].map(lambda x: x[:-4] if x[-4:] == ".txt" else x)


def rm_apply_str_suffix(dat: pd.DataFrame) -> pd.Series:
    """Use str method suffix with pandas apply"""
    return dat["fname"].apply(str.removesuffix, args=(".txt",))


def rm_suffix(dat: pd.DataFrame) -> pd.Series:
    """Use pandas removesuffix from version 1.4"""
    return dat["fname"].str.removesuffix(".txt")


functions = [rm_map2, rm_apply_str_suffix, rm_map, rm_suffix, rm_re]
for base in range(12, 23):
    size = 2**base
    data = pd.DataFrame({"fname": ["fn"+str(i) for i in range(size)]})
    data.update(data.sample(frac=.5)["fname"]+".txt")
    for func in functions:
        diff = 0
        for _ in range(ITER):
            data_copy = data.copy()
            start = time.process_time()
            func(data_copy)
            diff += time.process_time() - start

        print(diff, end="\t")
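
Before comparing timings, it can be worth checking that the candidates actually return the same result on a few tricky names. The sketch below is not part of the original test; the function names and the sample series are made up, and only three of the approaches are included.

```python
import pandas as pd


def by_regex(s: pd.Series) -> pd.Series:
    """Anchored regex with an escaped dot."""
    return s.str.replace(r"\.txt$", "", regex=True)


def by_map(s: pd.Series) -> pd.Series:
    """Element-wise map with a conditional."""
    return s.map(lambda x: x[:-4] if x.endswith(".txt") else x)


def by_listcomp(s: pd.Series) -> pd.Series:
    """List comprehension, rebuilt into a Series with the same index."""
    return pd.Series([x[:-4] if x.endswith(".txt") else x for x in s], index=s.index)


# Edge cases: no extension, a name that is itself "txt", and ".txt" in the middle
sample = pd.Series(["fn1.txt", "fn2", "txt.txt", "a.txt.bak"])
expected = ["fn1", "fn2", "txt", "a.txt.bak"]

results = {f.__name__: f(sample).tolist() for f in (by_regex, by_map, by_listcomp)}
for name, out in results.items():
    assert out == expected, f"{name} disagrees: {out}"

print("all candidates agree")
```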

The output is plotted below:

[Figure: runtime for different functions]

It can be seen from the plot that the slowest solution is the regexp, and the fastest is pandas.Series.map with a conditional. In later versions of pandas this may change, and I'd expect an improvement in pandas.Series.str.removesuffix, as it has greater potential for vectorization.

As of 2021-11-30, pandas had to be installed from source, because version 1.4 was still in development. I installed it by following the instructions in the pandas dev repo: cloning the project and installing with python setup.py install.

My machine:

AMD Ryzen 5 2400G with Radeon Vega Graphics, 3.60 GHz
Windows 10 20H2
Python 3.10.0, pandas.__version__ '1.4.0.dev0+1267.gaee662a7e3', numpy.__version__ '1.21.4'

answered Oct 12 '22 17:10

DanielTuzes
