 

Removing lists from each cell in pandas dataframe

I have a dataframe that contains lists in many of its cells. Other cells hold plain strings, and some hold integers or other numbers.

I would like to get rid of all the lists in the dataframe, keeping the value or string that was inside each list. How would I go about this?

Below are two dataframes: the first is the "raw data", with lists, numbers and strings throughout; the second is the clean data I am hoping to create.

What is the simplest and most efficient way to do this?

import pandas as pd

# Create two dataframes: the raw data and the desired end result
# Raw data
raw_data = {'Name': [['W1'], ['W3'], ['W2'], ['W1'], ['W2'], ['W3'], ['G1']],
            'EVENT': ['E1', 'E2', 'E3', 'E4', 'E5', 'E6', 'E1'],
            'DrillDate': [['01/01/2000'], 23, '04/01/2000', ['05/15/2000'], [''], [''], '02/02/2000']}
dfRaw = pd.DataFrame(raw_data, columns=['Name', 'EVENT', 'DrillDate'])
dfRaw


# Cleaned data
clean_data = {'Name': ['W1', 'W3', 'W2', 'W1', 'W2', 'W3', 'G1'],
              'EVENT': ['E1', 'E2', 'E3', 'E4', 'E5', 'E6', 'E1'],
              'DrillDate': ['01/01/2000', 23, '04/01/2000', '05/15/2000', '', '', '02/02/2000']}
dfEndResult = pd.DataFrame(clean_data, columns=['Name', 'EVENT', 'DrillDate'])
dfEndResult
asked Jul 13 '17 at 17:07 by brandog





2 Answers

Use applymap and check each cell's type with isinstance:

In [666]: dfRaw.applymap(lambda x: x[0] if isinstance(x, list) else x)
Out[666]:
  Name EVENT   DrillDate
0   W1    E1  01/01/2000
1   W3    E2          23
2   W2    E3  04/01/2000
3   W1    E4  05/15/2000
4   W2    E5
5   W3    E6
6   G1    E1  02/02/2000
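In pandas 2.1 and later, DataFrame.applymap is deprecated in favor of DataFrame.map, which does the same element-wise job. A minimal sketch of the answer above that runs on either side of that change, assuming nothing beyond pandas itself; first_if_list and dfClean are illustrative names, not part of the answer:

```python
import pandas as pd

raw_data = {'Name': [['W1'], ['W3'], ['W2'], ['W1'], ['W2'], ['W3'], ['G1']],
            'EVENT': ['E1', 'E2', 'E3', 'E4', 'E5', 'E6', 'E1'],
            'DrillDate': [['01/01/2000'], 23, '04/01/2000', ['05/15/2000'], [''], [''], '02/02/2000']}
dfRaw = pd.DataFrame(raw_data, columns=['Name', 'EVENT', 'DrillDate'])


def first_if_list(x):
    """Return the first element of a list; leave any other value unchanged."""
    return x[0] if isinstance(x, list) else x


# DataFrame.map replaced DataFrame.applymap in pandas 2.1; fall back on older versions.
if hasattr(pd.DataFrame, "map"):
    dfClean = dfRaw.map(first_if_list)
else:
    dfClean = dfRaw.applymap(first_if_list)

print(dfClean)
```

The function is applied to every cell independently, so the column mix of lists, strings and integers needs no special handling.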

Update: if you have empty lists and want a blank string in their place:

In [689]: dfRaw.applymap(lambda x: x if not isinstance(x, list) else x[0] if len(x) else '')
Out[689]:
  Name EVENT   DrillDate
0   W1    E1  01/01/2000
1   W3    E2          23
2   W2    E3  04/01/2000
3   W1    E4  05/15/2000
4   W2    E5
5   W3    E6
6   G1    E1  02/02/2000
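In the question's sample data, [''] is a one-element list holding an empty string, so plain x[0] already yields ''. The length check only matters for a truly empty list [], where x[0] raises IndexError. A small sketch with a hypothetical column that contains [] (unwrap is an illustrative name):

```python
import pandas as pd

# Hypothetical data: row 1 holds a truly empty list, unlike [''] in the question.
df = pd.DataFrame({'DrillDate': [['01/01/2000'], [], [''], 23]})


def unwrap(x):
    """First element of a non-empty list, '' for an empty list, else x unchanged."""
    if not isinstance(x, list):
        return x
    return x[0] if len(x) else ''


# Without the length check, indexing the empty list fails.
try:
    df['DrillDate'].map(lambda x: x[0] if isinstance(x, list) else x)
except IndexError as exc:
    print("x[0] failed:", exc)

df['DrillDate'] = df['DrillDate'].map(unwrap)
print(df['DrillDate'].tolist())
```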
answered Oct 23 '22 at 12:10 by Zero



I like @JohnGalt's answer better, but here is an alternative using update:

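# Take the first element only where DrillDate holds a list (update() edits dfRaw in place)
dfRaw.update(dfRaw.DrillDate[dfRaw.DrillDate.apply(type) == list].str[0])
# Every Name cell is a list, so .str[0] can be applied to the whole column
dfRaw.update(dfRaw.Name.str[0])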

dfRaw

  Name EVENT   DrillDate
0   W1    E1  01/01/2000
1   W3    E2          23
2   W2    E3  04/01/2000
3   W1    E4  05/15/2000
4   W2    E5            
5   W3    E6            
6   G1    E1  02/02/2000
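The answer masks DrillDate to its list cells because .str[0] on a plain string returns the first character (for example '0' from '04/01/2000'), while Name can be unwrapped directly since every cell there is a list. A sketch that generalizes the same mask-then-.str[0] idea to every column and works on a copy instead of updating dfRaw in place; unwrap_lists is a hypothetical helper name:

```python
import pandas as pd


def unwrap_lists(df):
    """Return a copy of df where every list cell is replaced by its first element."""
    out = df.copy()
    for col in out.columns:
        # Boolean mask of the cells in this column that hold lists.
        is_list = out[col].apply(lambda v: isinstance(v, list))
        if is_list.any():
            # .str[0] on a list picks its first element (NaN for an empty list).
            out.loc[is_list, col] = out.loc[is_list, col].str[0]
    return out


raw_data = {'Name': [['W1'], ['W3'], ['W2'], ['W1'], ['W2'], ['W3'], ['G1']],
            'EVENT': ['E1', 'E2', 'E3', 'E4', 'E5', 'E6', 'E1'],
            'DrillDate': [['01/01/2000'], 23, '04/01/2000', ['05/15/2000'], [''], [''], '02/02/2000']}
dfRaw = pd.DataFrame(raw_data, columns=['Name', 'EVENT', 'DrillDate'])

dfClean = unwrap_lists(dfRaw)
print(dfClean)
```

Because DataFrame.update modifies dfRaw in place and returns None, the answer's version cannot be chained; returning a new frame keeps the raw data available for comparison.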
answered Oct 23 '22 at 12:10 by piRSquared
