
Using pandas .map to change values

I'm trying to change the strings in my data to numerical values using the map function.

This is the data:

    label   sms_message
0   ham     Go until jurong point, crazy.. Available only ...
1   ham     Ok lar... Joking wif u oni...
2   spam    Free entry in 2 a wkly comp to win FA Cup fina...
3   ham     U dun say so early hor... U c already then say...
4   ham     Nah I don't think he goes to usf, he lives aro...

I'm trying to change 'spam' to 1 and 'ham' to 0 using this:

df['label'] = df.label.map({'ham':0, 'spam':1})

But the result is:

    label   sms_message
0   NaN     Go until jurong point, crazy.. Available only ...
1   NaN     Ok lar... Joking wif u oni...
2   NaN     Free entry in 2 a wkly comp to win FA Cup fina...
3   NaN     U dun say so early hor... U c already then say...
4   NaN     Nah I don't think he goes to usf, he lives aro...

Can anyone identify the problem?

CAB asked Dec 18 '25 08:12


2 Answers

Your statement is correct. I think you executed it twice, one run directly after the other. The following session in the Python interactive terminal demonstrates this.

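Note: If you pass a dictionary, map() replaces with NaN every value in the Series that does not match one of the dictionary's keys. After the first run the labels are already 0 and 1, so a second run finds no 'ham' or 'spam' to match and every row becomes NaN, which is what I think happened here. See the pandas documentation for map() and apply().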

Pandas documentation note: when arg is a dictionary, values in Series that are not in the dictionary (as keys) are converted to NaN.

>>> import pandas as pd
>>>
>>> d = {
...     "label": ["ham", "ham", "spam", "ham", "ham"],
...     "sms_messsage": [
...     "Go until jurong point, crazy.. Available only ...",
...     "Ok lar... Joking wif u oni...",
...     "Free entry in 2 a wkly comp to win FA Cup fina...",
...     "U dun say so early hor... U c already then say...",
...     "Nah I don't think he goes to usf, he lives aro..."
...    ]
... }
>>>
>>> df = pd.DataFrame(d)
>>> df
  label                                       sms_messsage
0   ham  Go until jurong point, crazy.. Available only ...
1   ham                      Ok lar... Joking wif u oni...
2  spam  Free entry in 2 a wkly comp to win FA Cup fina...
3   ham  U dun say so early hor... U c already then say...
4   ham  Nah I don't think he goes to usf, he lives aro...
>>>
>>> df['label'] = df.label.map({'ham':0, 'spam':1})
>>> df
   label                                       sms_messsage
0      0  Go until jurong point, crazy.. Available only ...
1      0                      Ok lar... Joking wif u oni...
2      1  Free entry in 2 a wkly comp to win FA Cup fina...
3      0  U dun say so early hor... U c already then say...
4      0  Nah I don't think he goes to usf, he lives aro...
>>>
>>> df['label'] = df.label.map({'ham':0, 'spam':1})
>>> df
   label                                       sms_messsage
0    NaN  Go until jurong point, crazy.. Available only ...
1    NaN                      Ok lar... Joking wif u oni...
2    NaN  Free entry in 2 a wkly comp to win FA Cup fina...
3    NaN  U dun say so early hor... U c already then say...
4    NaN  Nah I don't think he goes to usf, he lives aro...
>>>
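A possible guard against this, offered as a minimal sketch rather than as part of the session above: check what the column currently holds before mapping it. The helper name encode_labels is hypothetical, and the sketch assumes the column only ever contains 'ham'/'spam' or the already-encoded 0/1.

```python
import pandas as pd

LABELS = {"ham": 0, "spam": 1}


def encode_labels(series, mapping=LABELS):
    """Map string labels to integers without silently producing NaN.

    Hypothetical helper, for illustration only.
    """
    # If every value is already one of the targets, the mapping ran before.
    if series.isin(list(mapping.values())).all():
        return series
    # Anything that is not a key of the mapping would become NaN, so stop early.
    unknown = set(series.unique()) - set(mapping)
    if unknown:
        raise ValueError(f"unmapped label values: {unknown}")
    return series.map(mapping)


df = pd.DataFrame({"label": ["ham", "ham", "spam", "ham", "ham"]})
df["label"] = encode_labels(df["label"])
df["label"] = encode_labels(df["label"])  # second run leaves the column unchanged
print(df["label"].tolist())  # [0, 0, 1, 0, 0]
```

Running the statement twice is then harmless, and an unexpected value raises an error instead of turning into NaN.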

Other ways to obtain the same result. Each way below starts again from the freshly created df shown here:

>>> import pandas as pd
>>>
>>> d = {
...     "label": ['spam', 'ham', 'ham', 'ham', 'spam'],
...     "sms_message": ["M1", "M2", "M3", "M4", "M5"]
... }
>>>
>>> df = pd.DataFrame(d)
>>> df
  label sms_message
0  spam          M1
1   ham          M2
2   ham          M3
3   ham          M4
4  spam          M5
>>>

1st way - using map() with dictionary parameter

>>> new_values = {'spam': 1, 'ham': 0}
>>>
>>> df
  label sms_message
0  spam          M1
1   ham          M2
2   ham          M3
3   ham          M4
4  spam          M5
>>>
>>> df.label = df.label.map(new_values)
>>> df
   label sms_message
0      1          M1
1      0          M2
2      0          M3
3      0          M4
4      1          M5
>>>

2nd way - using map() with function parameter

>>> df.label = df.label.map(lambda v: 0 if v == 'ham' else 1)
>>> df
   label sms_message
0      1          M1
1      0          M2
2      0          M3
3      0          M4
4      1          M5
>>>

3rd way - using apply() with function parameter

>>> df.label = df.label.apply(lambda v: 0 if v == "ham" else 1)
>>>
>>> df
   label sms_message
0      1          M1
1      0          M2
2      0          M3
3      0          M4
4      1          M5
>>>
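One difference between these ways may be worth keeping in mind (an aside, not something the session above shows): the dictionary form turns any value missing from the dictionary into NaN, while the lambda form sends every value other than 'ham' to 1. The sketch below uses a made-up malformed label, 'HAM ' with the wrong case and a trailing space, to show both behaviours.

```python
import pandas as pd

# The last value is deliberately malformed (made up for illustration).
labels = pd.Series(["ham", "spam", "HAM "])

by_dict = labels.map({"ham": 0, "spam": 1})
by_func = labels.map(lambda v: 0 if v == "ham" else 1)

print(by_dict.isna().tolist())  # [False, False, True]
print(by_func.tolist())         # [0, 1, 1]

# Normalising the text first lets the dictionary match all three values.
cleaned = labels.str.strip().str.lower().map({"ham": 0, "spam": 1})
print(cleaned.tolist())         # [0, 1, 0]
```

With the dictionary, bad input shows up as NaN and is easy to spot; with the lambda, it is quietly counted as spam.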

Thank you.

hygull answered Dec 19 '25 22:12


Maybe the issue is in how you call the read_table function.

Try this:

import pandas as pd

df = pd.read_table('smsspamcollection/SMSSpamCollection',
                   sep='\t',
                   header=None,
                   names=['label', 'sms_message'])
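
To see what these arguments do without the real file, here is a minimal sketch that reads a made-up two-row sample from memory. It assumes, as the call above implies, that the file is tab-separated and has no header row; the sample text is invented for illustration.

```python
import io

import pandas as pd

# Made-up two-row sample standing in for the tab-separated file.
sample = "ham\tOk lar... Joking wif u oni...\nspam\tFree entry in 2 a wkly comp\n"

df = pd.read_table(
    io.StringIO(sample),
    sep="\t",
    header=None,                     # treat the first line as data, not column names
    names=["label", "sms_message"],  # supply the column names ourselves
)

print(df["label"].unique())  # inspect the raw labels before mapping
df["label"] = df["label"].map({"ham": 0, "spam": 1})
print(df["label"].tolist())  # [0, 1]
```

Printing the unique labels before mapping is a quick way to confirm that the column really contains 'ham' and 'spam'.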

Lucas André da Silva answered Dec 19 '25 23:12


