Removing lists from each cell in pandas dataframe

Tags: python, list, pandas, dataframe

I have a dataframe in which many of the individual cells contain lists. Other cells hold plain strings, and some hold integers or other numbers.

I would like to remove all the lists from the dataframe while keeping the value or string that each list contained. How would I go about this?

Below are two dataframes. The first is the "raw data", which has lists, numbers, and strings throughout. The second is the clean data that I am hoping to create.

What is the simplest and most efficient way to do this?

import pandas as pd

# Create two dataframes: the raw data and the desired end result.

# Raw data
raw_data = {'Name': [['W1'], ['W3'], ['W2'], ['W1'], ['W2'], ['W3'], ['G1']],
            'EVENT': ['E1', 'E2', 'E3', 'E4', 'E5', 'E6', 'E1'],
            'DrillDate': [['01/01/2000'], 23, '04/01/2000', ['05/15/2000'], [''], [''], '02/02/2000']}
dfRaw = pd.DataFrame(raw_data, columns=['Name', 'EVENT', 'DrillDate'])
dfRaw

# Cleaned data
clean_data = {'Name': ['W1', 'W3', 'W2', 'W1', 'W2', 'W3', 'G1'],
              'EVENT': ['E1', 'E2', 'E3', 'E4', 'E5', 'E6', 'E1'],
              'DrillDate': ['01/01/2000', 23, '04/01/2000', '05/15/2000', '', '', '02/02/2000']}
dfEndResult = pd.DataFrame(clean_data, columns=['Name', 'EVENT', 'DrillDate'])
dfEndResult

asked Jul 13 '17 17:07

brandog

2 Answers

Use applymap and check the type of each cell value with isinstance.

In [666]: dfRaw.applymap(lambda x: x[0] if isinstance(x, list) else x)
Out[666]:
  Name EVENT   DrillDate
0   W1    E1  01/01/2000
1   W3    E2          23
2   W2    E3  04/01/2000
3   W1    E4  05/15/2000
4   W2    E5
5   W3    E6
6   G1    E1  02/02/2000

Update: if you have empty lists and want a blank string in the output:

In [689]: dfRaw.applymap(lambda x: x if not isinstance(x, list) else x[0] if len(x) else '')
Out[689]:
  Name EVENT   DrillDate
0   W1    E1  01/01/2000
1   W3    E2          23
2   W2    E3  04/01/2000
3   W1    E4  05/15/2000
4   W2    E5
5   W3    E6
6   G1    E1  02/02/2000
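
For reference, here is a self-contained sketch of the same element-wise idea written as a named function. The names `unwrap`, `df_raw`, and `df_clean` are made up for this illustration. It also assumes you may be on a newer pandas: `applymap` was deprecated in pandas 2.1 in favor of `DataFrame.map`, so the sketch picks whichever is available.

```python
import pandas as pd

raw_data = {'Name': [['W1'], ['W3'], ['W2'], ['W1'], ['W2'], ['W3'], ['G1']],
            'EVENT': ['E1', 'E2', 'E3', 'E4', 'E5', 'E6', 'E1'],
            'DrillDate': [['01/01/2000'], 23, '04/01/2000', ['05/15/2000'],
                          [''], [''], '02/02/2000']}
df_raw = pd.DataFrame(raw_data, columns=['Name', 'EVENT', 'DrillDate'])


def unwrap(cell):
    """Hypothetical helper: unwrap a list cell, leave anything else alone."""
    if isinstance(cell, list):
        # First element of a non-empty list, blank string for an empty one.
        return cell[0] if cell else ''
    return cell


# DataFrame.map is the newer name for applymap (pandas 2.1+);
# fall back to applymap on older versions.
if hasattr(pd.DataFrame, 'map'):
    df_clean = df_raw.map(unwrap)
else:
    df_clean = df_raw.applymap(unwrap)

print(df_clean)
```

This returns a new dataframe and leaves the original untouched. Only the first element of each list is kept, so it is only suitable when the lists hold a single value, as in the question.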

answered Oct 23 '22 12:10

Zero

I like @JohnGalt's answer better, but here is an alternative that uses update and the .str accessor:

dfRaw.update(dfRaw.DrillDate[dfRaw.DrillDate.apply(type) == list].str[0])
dfRaw.update(dfRaw.Name.str[0])

dfRaw

  Name EVENT   DrillDate
0   W1    E1  01/01/2000
1   W3    E2          23
2   W2    E3  04/01/2000
3   W1    E4  05/15/2000
4   W2    E5            
5   W3    E6            
6   G1    E1  02/02/2000
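
The two `update` calls above are specific to the `Name` and `DrillDate` columns. A rough generalization, assuming you do not know in advance which columns hold lists, is to build a boolean mask per column and apply `.str[0]` only to the list cells. The names `df` and `is_list` are illustrative, not part of the original answer.

```python
import pandas as pd

raw_data = {'Name': [['W1'], ['W3'], ['W2'], ['W1'], ['W2'], ['W3'], ['G1']],
            'EVENT': ['E1', 'E2', 'E3', 'E4', 'E5', 'E6', 'E1'],
            'DrillDate': [['01/01/2000'], 23, '04/01/2000', ['05/15/2000'],
                          [''], [''], '02/02/2000']}
df = pd.DataFrame(raw_data, columns=['Name', 'EVENT', 'DrillDate'])

for col in df.columns:
    # True where the cell is a list, False for strings, numbers, etc.
    is_list = df[col].map(lambda v: isinstance(v, list))
    if is_list.any():
        # .str[0] indexes into each list and returns its first element.
        df.loc[is_list, col] = df.loc[is_list, col].str[0]

print(df)
```

Unlike the `applymap` version, this modifies `df` in place. Note that `.str[0]` yields NaN for an empty list rather than a blank string, so empty lists would need separate handling.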

answered Oct 23 '22 12:10

piRSquared
