Data Conversion Error while applying a function to each row in pandas Python

I have a pandas DataFrame in Python that looks something like this -

    contest_login_count  contest_participation_count  ipn_ratio
0                    1                            1   0.000000
1                    3                            3   0.083333
2                    3                            3   0.000000
3                    3                            3   0.066667
4                    5                           13   0.102804
5                    2                            3   0.407407
6                    1                            3   0.000000
7                    1                            2   0.000000
8                   53                           91   0.264151
9                    1                            2   0.000000

Now I want to apply a function to each row of this DataFrame. The function is written like this -

def findCluster(clusterModel,data):
    return clusterModel.predict(data)

I apply this function to each row in this manner -

df_fil.apply(lambda x : findCluster(cluster_all,x.reshape(1,-1)),axis=1)

When I run this code, I get a warning saying -

DataConversionWarning: Data with input dtype object was converted to float64.

warnings.warn(msg, DataConversionWarning)

This warning is printed once for each row. Since I have around 450K rows in my DataFrame, my computer hangs while printing all these warning messages, and in an IPython notebook at that.

To test my function, I created a dummy DataFrame and applied the same function to it, and it works well. Here is the code for that -

t = pd.DataFrame([[10.35,100.93,0.15],[10.35,100.93,0.15]])
t.apply(lambda x:findCluster(cluster_all,x.reshape(1,-1)),axis=1)

The output to this is -

   0  1  2
0  4  4  4
1  4  4  4

Can anyone suggest what I am doing wrong, or what I can change to make this warning go away?

dragster asked Aug 29 '16 19:08



1 Answer

I think the problem is that the dtype of some column is not float.

You need to cast it with astype:

df['colname'] = df['colname'].astype(float)
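
If it is not obvious which column is affected, a minimal sketch along these lines may help to find it first. The small frame below is a hypothetical stand-in for df_fil (the real data is not shown in the question), and it assumes the stray column holds numbers stored as strings; df_num and object_cols are names made up for this illustration.

```python
import pandas as pd

# Hypothetical stand-in for df_fil: ipn_ratio is deliberately stored with
# dtype object (numbers kept as strings), which is an assumption about
# what the real data looks like.
df_fil = pd.DataFrame({
    "contest_login_count": [1, 3, 5],
    "contest_participation_count": [1, 3, 13],
    "ipn_ratio": pd.Series(["0.000000", "0.083333", "0.102804"], dtype=object),
})

# Step 1: list the columns whose dtype is object.
object_cols = [col for col in df_fil.columns if df_fil[col].dtype == object]
print(object_cols)

# Step 2: cast only those columns to float, leaving the others untouched.
# astype raises ValueError if a value cannot be parsed as a number; in that
# case pd.to_numeric(df_fil[col], errors="coerce") turns such values into NaN.
df_num = df_fil.astype({col: float for col in object_cols})
print(df_num.dtypes)
```

Once every column is numeric, the per-row apply may not be needed at all: assuming cluster_all is a scikit-learn estimator, its predict method accepts a whole 2-D array, so something like cluster_all.predict(df_num) should return one label per row in a single call.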
jezrael answered Sep 19 '22 00:09