I have an OHLC price data set, that I have parsed from CSV into a Pandas dataframe and resampled to 15 min bars: <pre class="prettyprint"><code><class 'pandas.core.frame.DataFrame'> DatetimeIndex: 500047 entries, 1998-05-04 04:45:00 to 2012-08-07 00:15:00 Freq: 15T Data columns: Close 363152 non-null values High 363152 non-null values Low 363152 non-null values Open 363152 non-null values dtypes: float64(4) </code></pre> I would like to add various calculated columns, starting with simple ones such as period Range (H-L) and then booleans to indicate the occurrence of price patterns that I will define - e.g. a hammer candle pattern, for which a sample definition: <pre class="prettyprint"><code>def closed_in_top_half_of_range(h,l,c): return c > l + (h-l)/2 def lower_wick(o,l,c): return min(o,c)-l def real_body(o,c): return abs(c-o) def lower_wick_at_least_twice_real_body(o,l,c): return lower_wick(o,l,c) >= 2 * real_body(o,c) def is_hammer(row): return lower_wick_at_least_twice_real_body(row["Open"],row["Low"],row["Close"]) \ and closed_in_top_half_of_range(row["High"],row["Low"],row["Close"]) </code></pre> Basic problem: how do I map the function to the column, specifically where I would like to reference more than one other column or the whole row or whatever? This post deals with adding two calculated columns off of a single source column, which is close, but not quite it. And slightly more advanced: for price patterns that are determined with reference to more than a single bar (T), how can I reference different rows (e.g. T-1, T-2 etc.) from within the function definition?

The exact code will vary for each of the columns you want to do, but it's likely you'll want to use the <code>map</code> and <code>apply</code> functions. In some cases you can just compute using the existing columns directly, since the columns are Pandas Series objects, which also work as Numpy arrays, which automatically work element-wise for usual mathematical operations. <pre class="prettyprint"><code>>>> d A B C 0 11 13 5 1 6 7 4 2 8 3 6 3 4 8 7 4 0 1 7 >>> (d.A + d.B) / d.C 0 4.800000 1 3.250000 2 1.833333 3 1.714286 4 0.142857 >>> d.A > d.C 0 True 1 True 2 True 3 False 4 False </code></pre> If you need to use operations like max and min within a row, you can use <code>apply</code> with <code>axis=1</code> to apply any function you like to each row. Here's an example that computes <code>min(A, B)-C</code>, which seems to be like your "lower wick": <pre class="prettyprint"><code>>>> d.apply(lambda row: min([row['A'], row['B']])-row['C'], axis=1) 0 6 1 2 2 -3 3 -3 4 -7 </code></pre> Hopefully that gives you some idea of how to proceed. Edit: to compare rows against neighboring rows, the simplest approach is to slice the columns you want to compare, leaving off the beginning/end, and then compare the resulting slices. For instance, this will tell you for which rows the element in column A is less than the next row's element in column C: <pre class="prettyprint"><code>d['A'][:-1] < d['C'][1:] </code></pre> and this does it the other way, telling you which rows have A less than the preceding row's C: <pre class="prettyprint"><code>d['A'][1:] < d['C'][:-1] </code></pre> Doing <code>['A"][:-1]</code> slices off the last element of column A, and doing <code>['C'][1:]</code> slices off the first element of column C, so when you line these two up and compare them, you're comparing each element in A with the C from the following row.

Adding calculated column(s) to a dataframe in pandas

Tags:

python

pandas

I have an OHLC price data set, that I have parsed from CSV into a Pandas dataframe and resampled to 15 min bars:

<class 'pandas.core.frame.DataFrame'> DatetimeIndex: 500047 entries, 1998-05-04 04:45:00 to 2012-08-07 00:15:00 Freq: 15T Data columns: Close    363152  non-null values High     363152  non-null values Low      363152  non-null values Open     363152  non-null values dtypes: float64(4)

I would like to add various calculated columns, starting with simple ones such as period Range (H-L) and then booleans to indicate the occurrence of price patterns that I will define - e.g. a hammer candle pattern, for which a sample definition:

def closed_in_top_half_of_range(h,l,c):     return c > l + (h-l)/2  def lower_wick(o,l,c):     return min(o,c)-l  def real_body(o,c):     return abs(c-o)  def lower_wick_at_least_twice_real_body(o,l,c):     return lower_wick(o,l,c) >= 2 * real_body(o,c)  def is_hammer(row):     return lower_wick_at_least_twice_real_body(row["Open"],row["Low"],row["Close"]) \     and closed_in_top_half_of_range(row["High"],row["Low"],row["Close"])

Basic problem: how do I map the function to the column, specifically where I would like to reference more than one other column or the whole row or whatever?

This post deals with adding two calculated columns off of a single source column, which is close, but not quite it.

And slightly more advanced: for price patterns that are determined with reference to more than a single bar (T), how can I reference different rows (e.g. T-1, T-2 etc.) from within the function definition?

285

asked Sep 11 '12 19:09

ultra909

2 Answers

The exact code will vary for each of the columns you want to do, but it's likely you'll want to use the map and apply functions. In some cases you can just compute using the existing columns directly, since the columns are Pandas Series objects, which also work as Numpy arrays, which automatically work element-wise for usual mathematical operations.

>>> d     A   B  C 0  11  13  5 1   6   7  4 2   8   3  6 3   4   8  7 4   0   1  7 >>> (d.A + d.B) / d.C 0    4.800000 1    3.250000 2    1.833333 3    1.714286 4    0.142857 >>> d.A > d.C 0     True 1     True 2     True 3    False 4    False

If you need to use operations like max and min within a row, you can use apply with axis=1 to apply any function you like to each row. Here's an example that computes min(A, B)-C, which seems to be like your "lower wick":

>>> d.apply(lambda row: min([row['A'], row['B']])-row['C'], axis=1) 0    6 1    2 2   -3 3   -3 4   -7

Hopefully that gives you some idea of how to proceed.

Edit: to compare rows against neighboring rows, the simplest approach is to slice the columns you want to compare, leaving off the beginning/end, and then compare the resulting slices. For instance, this will tell you for which rows the element in column A is less than the next row's element in column C:

d['A'][:-1] < d['C'][1:]

and this does it the other way, telling you which rows have A less than the preceding row's C:

d['A'][1:] < d['C'][:-1]

Doing ['A"][:-1] slices off the last element of column A, and doing ['C'][1:] slices off the first element of column C, so when you line these two up and compare them, you're comparing each element in A with the C from the following row.

answered Oct 04 '22 12:10

BrenBarn

You could have is_hammer in terms of row["Open"] etc. as follows

def is_hammer(rOpen,rLow,rClose,rHigh):     return lower_wick_at_least_twice_real_body(rOpen,rLow,rClose) \        and closed_in_top_half_of_range(rHigh,rLow,rClose)

Then you can use map:

df["isHammer"] = map(is_hammer, df["Open"], df["Low"], df["Close"], df["High"])

answered Oct 04 '22 13:10

Andy Hayden

Related questions
                            
                                Python MySQLdb: connection.close() VS. cursor.close()
                            
                                for x in y(): how does this work? [duplicate]
                            
                                Celery difference between concurrency, workers and autoscaling
                            
                                Passing all arguments of a function to another function
                            
                                How to properly create and run concurrent tasks using python's asyncio module?
                            
                                Conda uninstall one package and one package only
                            
                                Concatenate Numpy arrays without copying
                            
                                Python Virtualenv - No module named virtualenvwrapper.hook_loader
                            
                                How do I write to a Python subprocess' stdin?
                            
                                Python: check if an object is a sequence
                            
                                Custom PyCharm docstring stubs (i.e. for google docstring or numpydoc formats)
                            
                                Python: why are * and ** faster than / and sqrt()?
                            
                                Gunicorn, no module named 'myproject
                            
                                Merge pandas dataframes where one value is between two others [duplicate]
                            
                                What is the difference between chain and chain.from_iterable in itertools?
                            
                                Class with too many parameters: better design strategy?
                            
                                Types that define `__eq__` are unhashable?
                            
                                Capturing group with findall?
                            
                                python: naming a module that has a two-word name
                            
                                Matplotlib - label each bin

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With