
Only keep df column values that contain a string from a list of strings

I have a list of strings like this:

stringlist = ["JAN", "jan", "FEB", "feb", "mar"]

And I have a dataframe that looks like this:

**date**            **value**
01MAR16                1
05FEB16                12
10jan17                5
10mar15                9
03jan05                7
04APR12                3

I only want to keep the dates that contain at least one string from stringlist; every other date should become NA. The result should look like this:

**date**            **value**
NA                     1
05FEB16                12
10jan17                5
10mar15                9
03jan05                7
NA                     3

I'm new to regular expressions and am having some trouble wrapping my head around them, so I would appreciate some help.

ljourney asked May 02 '21 20:05


3 Answers

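import numpy as np

stringlist = ["JAN", "jan", "FEB", "feb", "mar"]

# df is the DataFrame from the question.
# True where the date contains any of the strings (joined into one regex alternation)
m = df["date"].str.contains("|".join(stringlist))
# Replace the dates that did not match with NaN
df.loc[~m, "date"] = np.nan
print(df)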

Prints:

      date  value
0      NaN      1
1  05FEB16     12
2  10jan17      5
3  10mar15      9
4  03jan05      7
5      NaN      3
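
A self-contained variant of the same idea may help if you are new to regular expressions. This is only a sketch: it rebuilds the sample DataFrame from the question, and the names `pattern` and `mask` are illustrative. Joining the list with "|" produces the single pattern JAN|jan|FEB|feb|mar, which means "match any one of these". Passing each string through re.escape is a precaution for lists that might contain regex metacharacters, and na=False is an assumption that missing dates should count as non-matches.

```python
import re

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "date": ["01MAR16", "05FEB16", "10jan17", "10mar15", "03jan05", "04APR12"],
    "value": [1, 12, 5, 9, 7, 3],
})
stringlist = ["JAN", "jan", "FEB", "feb", "mar"]

# Escape each string so it is matched literally, then join into one alternation.
pattern = "|".join(re.escape(s) for s in stringlist)

# na=False treats missing dates as non-matches instead of propagating NaN.
mask = df["date"].str.contains(pattern, na=False)

# Series.where keeps values where the mask is True and puts NaN elsewhere.
df["date"] = df["date"].where(mask)
print(df)
```

The match is case-sensitive by default, which is why 01MAR16 is dropped while 10mar15 is kept.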
Andrej Kesely answered Oct 29 '22 06:10


You can use the Series.str.contains method, as demonstrated in "Select by partial string from a pandas DataFrame". Note that this filters out the non-matching rows rather than replacing their dates with NA:

import pandas as pd

df = pd.DataFrame({'date': ['01MAR16', '05FEB16', '10jan17', '10mar15', '03jan05', '04APR12'],
                   'value': [1, 12, 5, 9, 7, 3]})

stringlist = ['JAN', 'jan', 'FEB', 'feb', 'mar']

print(df[df['date'].str.contains('|'.join(stringlist))])

Output:

      date  value
1  05FEB16     12
2  10jan17      5
3  10mar15      9
4  03jan05      7
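
To make the difference between filtering rows and blanking values concrete, here is a minimal sketch that applies one boolean mask both ways. It assumes the sample data from the question; the names `mask`, `filtered`, and `masked` are illustrative only.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "date": ["01MAR16", "05FEB16", "10jan17", "10mar15", "03jan05", "04APR12"],
    "value": [1, 12, 5, 9, 7, 3],
})
stringlist = ["JAN", "jan", "FEB", "feb", "mar"]

# One boolean mask, reused for both approaches.
mask = df["date"].str.contains("|".join(stringlist))

# Boolean indexing drops the rows whose date did not match.
filtered = df[mask]

# Series.where keeps every row and puts NaN in the dates that did not match.
masked = df.assign(date=df["date"].where(mask))

print(filtered)
print(masked)
```

The second form is the one that corresponds to the output requested in the question, since the value column is kept for every row.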
Ann Zen answered Oct 29 '22 05:10


Another play on regular expressions is to extract the alphabetic characters (the assumption here is that the month will always be sandwiched between the day and the year), then check whether each extracted month can be found in stringlist:

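# expand=False makes str.extract return a Series instead of a one-column DataFrame
(df.assign(months = df.date.str.extract(r'([a-zA-Z]+)', expand=False),
           # keep the date only where the extracted month is in stringlist
           date = lambda d: d.date.where(d.months.isin(stringlist))
          )
   .drop(columns='months')  # the helper column is no longer needed
)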

      date  value
0      NaN      1
1  05FEB16     12
2  10jan17      5
3  10mar15      9
4  03jan05      7
5      NaN      3
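
If the dates really do always follow a fixed DDMMMYY layout, the month can also be taken by position, with no regular expression at all. The sketch below rests on that assumption, which the question does not confirm, and uses the question's sample data; the name `months` is illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "date": ["01MAR16", "05FEB16", "10jan17", "10mar15", "03jan05", "04APR12"],
    "value": [1, 12, 5, 9, 7, 3],
})
stringlist = ["JAN", "jan", "FEB", "feb", "mar"]

# Assumption: every date is laid out as DDMMMYY, so characters 2 to 4 are the month.
months = df["date"].str[2:5]

# isin does an exact, case-sensitive comparison against the list.
df["date"] = df["date"].where(months.isin(stringlist))
print(df)
```

Unlike str.contains, isin requires the whole extracted month to equal one of the list entries, so a partial match elsewhere in the string cannot slip through.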
sammywemmy answered Oct 29 '22 05:10