Pandas DataFrame ApplyMap method

Tags: python, pandas

I wanted to try out the applymap method of the pandas DataFrame object. Here is the use case:

Let's say my DataFrame df1 is as follows:

   Age   ID   Name
0   27  101   John
1   22  102    Bob
2   19  103   Alok
3   27  104    Tom
4   32  105   Matt
5   19  106  Steve
6    5  107    Tom
7   55  108   Dick
8   67  109  Harry

Now I want to create a flag variable with this logic: if the length of an element is less than 2, then flag=1, else flag=0.

To run this element-wise, I wanted to use the applymap method, so I created a user-defined function as follows:

def f(x):
    if len(str(x)) > 2:
        df1['Flag'] = 1
    else:
        df1['Flag'] = 0

Then I ran df1.applymap(f) which gave:

    Age    ID  Name
0  None  None  None
1  None  None  None
2  None  None  None
3  None  None  None
4  None  None  None
5  None  None  None
6  None  None  None
7  None  None  None
8  None  None  None

instead of creating a flag variable with the flag value. How can I achieve the desired functionality using applymap?

Can't we use the DataFrame variable name or a pandas statement inside the user-defined function? That is, is df1['Flag'] valid inside the definition of f()?

Baktaawar asked Feb 12 '14 11:02



1 Answer

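The function f(x) is not special to pandas; it is just a regular Python function. applymap calls it once per element and builds the result from whatever f returns. Your f has no return statement, so it returns None for every element, which is why the output is all None. The name df1 is reachable inside f as a global, but df1['Flag'] = 1 assigns a scalar to the entire Flag column on every call, so it cannot record a per-element result.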
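
The all-None output can be reproduced without pandas, since a Python function that has no return statement evaluates to None. A minimal sketch; f_without_return, f_with_return and the sample row are hypothetical names used only for illustration, and the list comprehension merely stands in for the per-element calls that applymap makes:

```python
def f_without_return(x):
    # Computes a flag but never returns it, so every call evaluates to None.
    flag = 1 if len(str(x)) > 2 else 0


def f_with_return(x):
    # Same test, but the flag is handed back to the caller.
    return 1 if len(str(x)) > 2 else 0


# Hypothetical stand-in for the first row of df1.
row = [27, 101, "John"]

without_return = [f_without_return(v) for v in row]
with_return = [f_with_return(v) for v in row]

print(without_return)  # [None, None, None]
print(with_return)     # [0, 1, 1]
```

Whatever the function returns is what ends up in the corresponding cell of the result.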

From applymap docs:

func : function

Python function, returns a single value from a single value

So you could try this:

def f(x):
    # Return 1 when the element's string form has at most 3 characters, else 0.
    return 1 if len(str(x)) <= 3 else 0

When applied, this outputs 1 or 0 for each element in the frame:

>>> df1.applymap(f)
   Age  ID  Name
0    1   1     0
1    1   1     1
2    1   1     0
3    1   1     1
4    1   1     0
5    1   1     0
6    1   1     1
7    1   1     0
8    1   1     0

To use the result to add another variable to each row, you need one value per row, e.g.,

df1['Flag'] = df1.applymap(f).all(axis=1).astype(bool)

>>> df1

   Age   ID   Name   Flag
0   27  101   John  False
1   22  102    Bob   True
2   19  103   Alok  False
3   27  104    Tom   True
4   32  105   Matt  False
5   19  106  Steve  False
6    5  107    Tom   True
7   55  108   Dick  False
8   67  109  Harry  False
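
For reference, the steps above can be put together in one runnable sketch. It assumes the sample data from the question. The elementwise helper is a hypothetical name introduced here: pandas 2.1 deprecated DataFrame.applymap in favour of DataFrame.map, so the sketch picks whichever method the installed version provides.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({
    "Age": [27, 22, 19, 27, 32, 19, 5, 55, 67],
    "ID": [101, 102, 103, 104, 105, 106, 107, 108, 109],
    "Name": ["John", "Bob", "Alok", "Tom", "Matt", "Steve", "Tom", "Dick", "Harry"],
})


def f(x):
    # 1 when the element's string form has at most 3 characters, else 0.
    return 1 if len(str(x)) <= 3 else 0


# Hypothetical helper: use DataFrame.map where it exists (pandas >= 2.1),
# otherwise fall back to the older applymap.
elementwise = getattr(df1, "map", None) or df1.applymap

flags = elementwise(f)           # one 1/0 value per element
df1["Flag"] = flags.all(axis=1)  # one boolean per row: True only if every element passed

print(df1)
```

Here all(axis=1) already returns booleans, so no further cast is needed to get the True/False column.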

Also check out https://stackoverflow.com/a/19798528/1643946, which covers apply and map as well as applymap.

Bonlenfum answered Nov 03 '22 09:11
