Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas list comprehension in a dataframe

I would like to pull out the price at the next day's open currently stored in (row + 1) and store it in a new column, if some condition is met.

df['b']=''

df['shift']=''

df['shift']=df['open'].shift(-1)

df['b']=df[x for x in df['shift'] if df["MA10"]>df["MA100"]]
like image 891
Michele Reilly Avatar asked Apr 11 '13 02:04

Michele Reilly


People also ask

Can you put a list in a Pandas DataFrame?

You can insert a list of values into a cell in Pandas DataFrame using DataFrame.at() , DataFrame. iat() , and DataFrame. loc() methods.

Are list comprehensions better than for loops?

List comprehensions are the right tool to create lists — it is nevertheless better to use list(range()). For loops are the right tool to perform computations or run functions. In any case, avoid using for loops and list comprehensions altogether: use array computations instead.

How much faster are list comprehensions?

From the timed cells below, you can see that the list comprehension runs almost twice as fast as the for loop for this calculation. This is one of the primary benefits of using list comprehension.


1 Answers

There are a few approaches. Using apply:

>>> df = pd.read_csv("bondstack.csv")
>>> df["shift"] = df["open"].shift(-1)
>>> df["b"] = df.apply(lambda row: row["shift"] if row["MA10"] > row["MA100"] else np.nan, axis=1)

which produces

>>> df[["MA10", "MA100", "shift", "b"]][:10]
        MA10      MA100      shift          b
0  16.915625  17.405625  16.734375        NaN
1  16.871875  17.358750  17.171875        NaN
2  16.893750  17.317187  17.359375        NaN
3  16.950000  17.279062  17.359375        NaN
4  17.137500  17.254062  18.640625        NaN
5  17.365625  17.229063  18.921875  18.921875
6  17.550000  17.200312  18.296875  18.296875
7  17.681250  17.177500  18.640625  18.640625
8  17.812500  17.159375  18.609375  18.609375
9  17.943750  17.142813  18.234375  18.234375

For a more vectorized approach, you could use

>>> df = pd.read_csv("bondstack.csv")
>>> df["b"] = np.nan
>>> df["b"][df["MA10"] > df["MA100"]] = df["open"].shift(-1)

or my preferred approach:

>>> df = pd.read_csv("bondstack.csv")
>>> df["b"] = df["open"].shift(-1).where(df["MA10"] > df["MA100"])
like image 121
DSM Avatar answered Sep 18 '22 11:09

DSM