Correct use of map for mapping a function onto a df, python pandas

Tags:

python

pandas

I've been searching for a while now and can't find anything concrete on this, so I'm looking for a best-practice answer. My code works, but I'm not sure if I'm introducing problems.

# df['Action'] = list(map(my_function, df.param1))  # Works, but older style, I think?
df['Action'] = df['param1'].map(my_function)

Both of these produce the same VISIBLE result. I'm not entirely sure how the first, commented-out line works; it's an example I found online, and when I applied it here it worked. Most other uses of map I've found look like the second line, where map is called on the Series object.
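For reference, the commented-out line uses Python's built-in `map()`, not the pandas method: iterating over a Series yields its values, `map()` lazily calls the function on each one, and `list()` collects the results into a plain list. A minimal sketch, assuming a made-up DataFrame and a hypothetical one-argument `my_function` that just doubles its input:

```python
import pandas as pd

def my_function(x):
    # Hypothetical stand-in for the real function
    return x * 2

df = pd.DataFrame({"param1": [1, 2, 3]})

# Built-in map: iterates over the Series values; list() consumes the lazy iterator
builtin_result = list(map(my_function, df.param1))

# The same thing written as a list comprehension
comprehension_result = [my_function(v) for v in df.param1]

# pandas Series.map: returns a new Series that keeps df's index
series_result = df["param1"].map(my_function)

print(series_result.tolist())                    # [2, 4, 6]
print(builtin_result == series_result.tolist())  # True
```

The values match; the difference is that `series_result` is a labelled Series while `builtin_result` is an unlabelled list.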

So first question, which of these is better practice and what exactly is the first one doing?

2nd and final question, and the more important of the two. Map, apply or applymap: I'm not sure which to use here. The commented-out attempts below do NOT work, while the last line gives me exactly what I want.

def my_function(param1, param2, param3):
    return param1 * param2 * param3 # example

# Can't get this df.map function to work?
# Error: map is not an attribute of DataFrame
# df['New_Col'] = df.map(my_function, df.param1, df.param1.shift(1), 
#    df.param2.shift(1))

# TypeError: my_function takes 3 positional args, but 4 were given
# df['New_Col'] = df.apply(my_function, args=(df.param1, df.param1.shift(1), 
#    df.param2.shift(1)))

# This works, not sure why
df['New_Col'] = list(map(my_function, df.param1, df.param1.shift(1), 
     df.param2.shift(1)))
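The `list(map(...))` version works because the built-in `map()` accepts several iterables and calls the function with one item from each, in lockstep, exactly like zipping them together. A sketch with made-up numbers and the example `my_function` from the question; note that `shift(1)` leaves NaN in the first row, so the first result is NaN:

```python
import math
import pandas as pd

def my_function(param1, param2, param3):
    return param1 * param2 * param3  # example from the question

df = pd.DataFrame({"param1": [1.0, 2.0, 3.0], "param2": [4.0, 5.0, 6.0]})

# map(f, a, b, c) calls f(a[i], b[i], c[i]) for each position i
mapped = list(map(my_function, df.param1, df.param1.shift(1), df.param2.shift(1)))

# The same thing spelled out with zip()
zipped = [my_function(a, b, c)
          for a, b, c in zip(df.param1, df.param1.shift(1), df.param2.shift(1))]

print(mapped)  # [nan, 8.0, 30.0]
```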

I'm trying to compute a result based on two columns of the df, using values from the current row and the previous row. I've tried variations of map and apply called on the df directly (df.map, df.apply) without success, but the list(map(...)) notation works great.
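For reference, `shift(1)` is what brings the previous row's value onto the current row: every value moves down one position, and the first row becomes NaN because it has no predecessor (which also turns an integer column into floats). A small sketch with made-up data; `param1_prev` is a hypothetical column name:

```python
import pandas as pd

df = pd.DataFrame({"param1": [10, 20, 30]})

# Line up each row with the previous row's value
df["param1_prev"] = df["param1"].shift(1)

print(df)
```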

Is list(map(...)) acceptable? Which is best practice? Is there a correct way to use apply or map directly from the df object?

Thanks guys, appreciated.

EDIT: MaxU's response below also works. As it stands, both of these work:

df['New_Col'] = list(map(my_function, df.param1, df.param1.shift(1),
        df.param2.shift(1)))
df['New_Col'] = my_function(df.param1, df.param1.shift(1), df.param2.shift(1))

# This does NOT work
df['New_Col'] = df.apply(my_function, axis=1, args=(df.param1,
        df.param1.shift(1), df.param2.shift(1)))
# Also does not work
# AttributeError: ("'float' object has no attribute 'shift'",
#     'occurred at index 2000-01-04 00:00:00')
# Will work if I remove the shift(), but not what I need.
df['New_Col'] = df.apply(lambda x: my_function(x.param1, x.param1.shift(1),
    x.param2.shift(1)), axis=1)

I'm still unclear on the proper syntax for apply here, and on whether any of these 3 methods is superior to the others. (I'm guessing list(map(...)) is the "worst" of the 3, since it iterates and isn't vectorized.)

Asked by RaceFrog, Aug 18 '17 at 15:08



1 Answer

So first question, which of these is better practice and what exactly is the first one doing?

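df['Action'] = df['param1'].map(my_function)

is more idiomatic and more reliable: it returns a Series that keeps the DataFrame's index, so the assignment aligns by label. It is not vectorized, though; Series.map still calls my_function once per element. The first (commented-out) line uses Python's built-in map(), which iterates over the values of df.param1 and calls my_function on each; list() collects the results, and that list is assigned to the column by position.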
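One concrete case where the two lines stop behaving the same: assigning results computed on a subset of rows. A Series returned by `Series.map` carries its index, so pandas aligns it by label and fills missing rows with NaN; a plain list has no labels, so it must have exactly one entry per row. A sketch, assuming made-up data and a hypothetical doubling `my_function`:

```python
import pandas as pd

def my_function(x):
    return x * 2  # hypothetical stand-in

df = pd.DataFrame({"param1": [1, 2, 3]}, index=["a", "b", "c"])
subset = df[df.param1 > 1]  # rows "b" and "c" only

# Series.map keeps the labels "b" and "c"; row "a" gets NaN on assignment
df["via_series_map"] = subset["param1"].map(my_function)

# A plain list has no labels, so pandas rejects the length mismatch
try:
    df["via_list"] = list(map(my_function, subset.param1))
except ValueError as exc:
    print("ValueError:", exc)
```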

2nd and final question, and the more important of the two. Map, apply or applymap: I'm not sure which to use here. The commented-out attempts below do NOT work, while the last line gives me exactly what I want.

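Pandas does NOT have DataFrame.map(), only Series.map(), so if your mapping function needs access to multiple columns, you can use DataFrame.apply() with axis=1. (pandas 2.1+ does add DataFrame.map() as the new name for applymap(), but it works element by element and still can't see other columns.)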
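Why the `apply` attempts in the question fail: with `axis=1`, `DataFrame.apply` calls the function once per row and passes that row as a Series, so `x.param1` is a single scalar with no `.shift()`. Anything in `args=` is appended after the row, which is where "3 positional args, but 4 were given" comes from (the row plus three extra arguments). A sketch with made-up data; `describe_row` and `scaled_product` are hypothetical helpers:

```python
import pandas as pd

df = pd.DataFrame({"param1": [1.0, 2.0], "param2": [3.0, 4.0]})

def describe_row(row):
    # row is a Series holding one row; row.param1 is a scalar, not a Series
    return isinstance(row, pd.Series) and not isinstance(row.param1, pd.Series)

print(df.apply(describe_row, axis=1).all())  # True

def scaled_product(row, factor):
    # args=(10,) arrives here as `factor`, after the row itself
    return row.param1 * row.param2 * factor

print(df.apply(scaled_product, axis=1, args=(10,)).tolist())  # [30.0, 80.0]

# Passing one extra arg too many reproduces the question's TypeError
raised = False
try:
    df.apply(scaled_product, axis=1, args=(10, 20))
except TypeError as exc:
    raised = True
    print("TypeError:", exc)
```

So `args=` is meant for a few fixed extra values, not for whole columns.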

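Demo (shift first: inside a row-wise apply, x.param1 is a single value and has no .shift(); prev_param1 and prev_param2 are just temporary column names):

df['New_Col'] = (df.assign(prev_param1=df.param1.shift(1),
                           prev_param2=df.param2.shift(1))
                   .apply(lambda x: my_function(x.param1,
                                                x.prev_param1,
                                                x.prev_param2),
                          axis=1))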

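or just call the function on the whole columns:

df['New_Col'] = my_function(df.param1, df.param1.shift(1), df.param2.shift(1))

This is the vectorized option and usually the fastest, because my_function only uses *, which pandas applies element-wise to entire Series. A function with scalar logic (for example, an if/else on a single value) can't be called on whole columns this way.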
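A runnable side-by-side check of the three working approaches, assuming made-up data and the question's example `my_function` (`prev_param1` and `prev_param2` are hypothetical temporary column names). All three give the same column, NaN in the first row included:

```python
import numpy as np
import pandas as pd

def my_function(param1, param2, param3):
    return param1 * param2 * param3  # example from the question

df = pd.DataFrame({"param1": [1.0, 2.0, 3.0, 4.0],
                   "param2": [5.0, 6.0, 7.0, 8.0]})

# 1. Built-in map: a Python-level loop; the list is assigned by position
loop = list(map(my_function, df.param1, df.param1.shift(1), df.param2.shift(1)))

# 2. Row-wise apply on pre-shifted columns: also a Python-level loop, but index-aware
rowwise = (df.assign(prev_param1=df.param1.shift(1),
                     prev_param2=df.param2.shift(1))
             .apply(lambda r: my_function(r.param1, r.prev_param1, r.prev_param2),
                    axis=1))

# 3. Whole-Series call: each * operates on entire columns at once
vectorized = my_function(df.param1, df.param1.shift(1), df.param2.shift(1))

print(np.allclose(loop, vectorized.to_numpy(), equal_nan=True))     # True
print(np.allclose(rowwise.to_numpy(), vectorized.to_numpy(), equal_nan=True))  # True
```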
Answered by MaxU - stop WAR against UA, Oct 01 '22 at 03:10