Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to Remove a Substring of String in a Dataframe Column?

I have this simplified dataframe:

ID, Date
1 8/24/1995
2 8/1/1899 :00

How can I use the power of pandas to recognize any date in the dataframe which has extra :00 and removes it.

Any idea how to solve this problem?

I have tried this syntax but did not help:

df[df["Date"].str.replace(to_replace="\s:00", value="")]

The Output Should Be Like:

ID, Date
1 8/24/1995
2 8/1/1899
like image 889
MEhsan Avatar asked Aug 01 '16 19:08

MEhsan


People also ask

How do I remove unwanted parts from strings in a column?

To remove unwanted parts from strings in a column with Python Pandas, we can use the map method. to call map with a lambda function that returns the original string with the unwanted parts removed with lstrip and rstrip .


2 Answers

You need to assign the trimmed column back to the original column instead of doing subsetting, and also the str.replace method doesn't seem to have the to_replace and value parameter. It has pat and repl parameter instead:

df["Date"] = df["Date"].str.replace("\s:00", "")

df
#   ID       Date
#0   1  8/24/1995
#1   2   8/1/1899
like image 109
Psidom Avatar answered Sep 22 '22 13:09

Psidom


To apply this to an entire dataframe, I'd stack then unstack

df.stack().str.replace(r'\s:00', '').unstack()

enter image description here

functionalized

def dfreplace(df, *args, **kwargs):
    s = pd.Series(df.values.flatten())
    s = s.str.replace(*args, **kwargs)
    return pd.DataFrame(s.values.reshape(df.shape), df.index, df.columns)

Examples

df = pd.DataFrame(['8/24/1995', '8/1/1899 :00'], pd.Index([1, 2], name='ID'), ['Date'])

dfreplace(df, '\s:00', '')

enter image description here


rng = range(5)
df2 = pd.concat([pd.concat([df for _ in rng]) for _ in rng], axis=1)

df2

enter image description here

dfreplace(df2, '\s:00', '')

enter image description here

like image 42
piRSquared Avatar answered Sep 20 '22 13:09

piRSquared