
Applying a function to pandas dataframe

Tags: python, pandas

I'm trying to perform some text analysis on a pandas dataframe, but am having some trouble with the flow. Alternatively, maybe I'm just not getting it... PS: I'm a Python beginner-ish.

Dataframe example:

df = pd.DataFrame({'Document' : ['a','1','a', '6','7','N'], 'Type' : ['7', 'E', 'Y', '6', 'C', '9']})


     Document   Type
0    a          7
1    1          E
2    a          Y
3    6          6
4    7          C
5    N          9

I'm trying to build a flow that does one thing or another depending on whether 'Document' or 'Type' is a number.

Here is a simple function to return whether 'Document' is a number (edited to show how I am trying some if/then flow on the field):

def fn(dfname):
    if dfname['Document'].apply(str.isdigit):
        dfname['Check'] = 'Y'
    else:
        dfname['Check'] = 'N'

Now, I apply it to the dataframe:

df.apply(fn(df), axis=0)

I get this error back:

TypeError: ("'NoneType' object is not callable", u'occurred at index Document')

From the error message, it looks like I am not handling the index correctly. Can anyone see where I am going wrong?

Lastly - this may or may not be related to the issue, but I am really struggling with how indexes work in pandas. I think I have run into more issues with the index than any other issue.

mikebmassey asked Jun 12 '26 13:06


2 Answers

You're close.

The thing you have to realize about apply on a single column (a Series) is that your function should take one scalar value and return the result you want for that value. With that in mind:

import pandas as pd

df = pd.DataFrame({'Document' : ['a','1','a', '6','7','N'], 'Type' : ['7', 'E', 'Y', '6', 'C', '9']})

def fn(val):
    if str(val).isdigit():
        return 'Y'
    else:
        return 'N'

df['check'] = df['Document'].apply(fn)

gives me:

  Document Type check
0        a    7     N
1        1    E     Y
2        a    Y     N
3        6    6     Y
4        7    C     Y
5        N    9     N
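
The question mentions both 'Document' and 'Type', so here is a minimal sketch of reusing the same scalar function on each column. The `Document_check` and `Type_check` column names are made up for illustration and are not from the original question.

```python
import pandas as pd

df = pd.DataFrame({'Document': ['a', '1', 'a', '6', '7', 'N'],
                   'Type': ['7', 'E', 'Y', '6', 'C', '9']})

def fn(val):
    # Scalar in, scalar out: 'Y' if the value is all digits, else 'N'.
    return 'Y' if str(val).isdigit() else 'N'

# Series.apply calls fn once per element of the selected column.
# The '<column>_check' names are hypothetical, chosen for this example.
for col in ['Document', 'Type']:
    df[col + '_check'] = df[col].apply(fn)

print(df)
```

Each new column lines up with the original rows because Series.apply keeps the index of the column it was called on.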

Edit:

Just want to clarify that when using apply on a Series, you should write functions that accept scalar values. When using apply on a DataFrame, however, the function should accept either a full column (when axis=0, the default) or a full row (when axis=1).
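
To make the three cases concrete, here is a small sketch on the same example frame. The variable names (`scalar_types`, `digits_per_column`, `both_digits`) are just illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Document': ['a', '1', 'a', '6', '7', 'N'],
                   'Type': ['7', 'E', 'Y', '6', 'C', '9']})

# Series.apply: the function receives one scalar (here a str) at a time.
scalar_types = df['Document'].apply(lambda v: type(v).__name__)

# DataFrame.apply with axis=0 (the default): the function receives each
# column as a Series, so vectorized Series methods are available inside it.
digits_per_column = df.apply(lambda col: col.str.isdigit().sum(), axis=0)

# DataFrame.apply with axis=1: the function receives each row as a Series
# indexed by column name.
both_digits = df.apply(
    lambda row: row['Document'].isdigit() and row['Type'].isdigit(),
    axis=1,
)

print(scalar_types.tolist())
print(digits_per_column)
print(both_digits.tolist())
```

This is also why the original fn(df) fails: df.apply needs to be given the function itself, while fn(df) calls the function straight away and passes on whatever it returns.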

Paul H answered Jun 14 '26 03:06


It's worth noting that you can do this (without using apply, so more efficiently) using str.contains:

In [11]: df['Document'].str.contains(r'^\d+$')
Out[11]: 
0    False
1     True
2    False
3     True
4     True
5    False
Name: Document, dtype: bool

Here the regex anchors ^ and $ match the start and end of the string respectively, so the whole value has to consist of one or more digits.
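
If you still want the 'Y'/'N' labels from the question, one possible way is to feed the boolean mask to numpy.where. This is a sketch, not the only option. The `is_number` name is illustrative, and the comparison with Series.str.isdigit is included only to show that both agree on this particular data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Document': ['a', '1', 'a', '6', '7', 'N'],
                   'Type': ['7', 'E', 'Y', '6', 'C', '9']})

# Boolean mask: True where the whole string is one or more digits.
is_number = df['Document'].str.contains(r'^\d+$')

# Map the mask to labels in one vectorized step (no per-row Python function).
df['check'] = np.where(is_number, 'Y', 'N')

# The vectorized string method gives the same mask for this data.
same_as_isdigit = is_number.equals(df['Document'].str.isdigit())

print(df)
print(same_as_isdigit)
```

Note that the regex and str.isdigit can disagree on some non-ASCII characters (for example, superscript digits), so pick whichever definition of "number" matches your data.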

Andy Hayden answered Jun 14 '26 01:06


