I've worked with the h2o R package for quite a while now, but I recently had to move to the Python package.
For the most part, an H2OFrame is designed to work like a pandas DataFrame object. However, there are several hurdles I haven't managed to get over. In pandas, if I want to drop some rows:
df.drop([0,1,2], axis=0, inplace=True)
However, I cannot figure out how to do the same with an H2OFrame:
frame.drop([0,1,2], axis=0)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-30-0eff75c48e35> in <module>()
----> frame.drop([0,1,2], axis=0)
TypeError: drop() got an unexpected keyword argument 'axis'
Their GitHub source documents that the drop method only handles columns, so the obvious approach doesn't work:
def drop(self, i):
    """Drop a column from the current H2OFrame.
Is there a way to drop rows from an H2OFrame?
To drop a row or column in a pandas DataFrame, use the DataFrame's drop() method (the pandas documentation describes it in detail). By default, rows are labelled with integer index values starting at 0, and columns are labelled by name.
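A minimal pandas sketch of both cases, using a small illustrative frame (the column names "a" and "b" are arbitrary). Note that pandas' drop() defaults to axis=0 (rows), which is the opposite of the H2OFrame default discussed further down.

```python
import pandas as pd

# Illustrative frame; the default RangeIndex labels the rows 0, 1, 2, 3
df = pd.DataFrame({"a": [1, 4, 3, 5], "b": [3, 5, 0, 5]})

# Drop rows by index label (axis=0 is pandas' default for drop)
rows_dropped = df.drop([0, 1])

# Drop a column by name; columns=[...] is equivalent to passing axis=1
cols_dropped = df.drop(columns=["b"])

print(rows_dropped)
print(cols_dropped)
```

Both calls return new DataFrames; df itself still has four rows and two columns.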
To drop a specific row from a DataFrame, pass its index label to the pandas drop() function. A more meaningful index can also help with selection and aggregation: for sample data about countries, a “name” column would make a good index and would make it easier to select country rows for deletion.
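As a sketch of the "meaningful index" idea, the frame below is hypothetical data (country names with a placeholder "value" column). After set_index("name"), rows can be dropped by a readable label rather than a number.

```python
import pandas as pd

# Hypothetical sample data: the "name" column holds country names
df = pd.DataFrame({"name": ["France", "Japan", "Brazil"], "value": [1, 2, 3]})

# Use the "name" column as the row index
indexed = df.set_index("name")

# Drop a row by its label instead of its integer position
remaining = indexed.drop(["Japan"])
print(remaining)
```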
To drop rows by index, call drop() on the DataFrame you are working with (df here), passing the index number or label of each row you want to remove.
Deleting multiple rows by index position: because df.drop() accepts only index labels, to delete rows by position you first build a list of the labels at those positions and then pass it to drop(). Because inplace defaults to False, the original DataFrame is not modified; drop() returns a new one.
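A short sketch of the position-to-label step, assuming a frame with non-integer labels so that positions and labels clearly differ. Indexing df.index with a list of positions returns the matching labels, which drop() then accepts.

```python
import pandas as pd

# Illustrative frame whose labels ("a".."d") differ from its positions (0..3)
df = pd.DataFrame({"x": [10, 20, 30, 40]}, index=["a", "b", "c", "d"])

positions = [0, 2]
# Translate positions into labels, since drop() expects labels
labels = df.index[positions]
result = df.drop(labels)

# inplace defaults to False, so df itself keeps all four rows
print(result)
print(len(df))
```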
Currently, the H2OFrame.drop method does not support this, but we have added a ticket to add support for dropping multiple rows (and multiple columns).
In the meantime, you can subset rows by an index:
import h2o
h2o.init(nthreads = -1)
hf = h2o.H2OFrame([[1,3],[4,5],[3,0],[5,5]]) # 4 rows x 2 columns
hf2 = hf[[1,3],:] # Keep some of the rows by passing an index
Note that the index list, [1,3], is ordered. If you pass [3,1] instead, you will get an error: H2O will not reorder the rows, and this is its way of telling you so. If you have a list of out-of-order indexes, just wrap it in sorted() first.
hf2 = hf[sorted([3,1]),:]  # sorted() turns [3,1] into [1,3]
Lastly, if you prefer, it's also okay to reassign the new subsetted frame to the original frame name, as follows:
hf = hf[[1,3],:]
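The subsetting above keeps rows rather than dropping them. To drop a known set of rows, one option is to compute the complement and subset with that. The helper below is hypothetical (it is not part of the h2o API); it is plain Python that only builds the list of row indices to keep, already in ascending order so H2O will accept it.

```python
def rows_to_keep(nrows, rows_to_drop):
    """Return the ascending row indices left after removing rows_to_drop.

    Hypothetical helper for illustration; not part of the h2o package.
    """
    drop = set(rows_to_drop)
    # Reject indices that do not exist in a frame with nrows rows
    bad = [i for i in drop if not 0 <= i < nrows]
    if bad:
        raise ValueError(f"row indices out of range: {sorted(bad)}")
    # range() yields indices in ascending order, so the result is sorted
    return [i for i in range(nrows) if i not in drop]


keep = rows_to_keep(4, [0, 2])
print(keep)  # [1, 3]
```

With an H2OFrame, the result would be passed as the row selector, for example hf = hf[rows_to_keep(hf.nrows, [0, 2]), :], which keeps rows 1 and 3 of the four-row frame above.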
Since this is now supported, I wanted to highlight the comment that explains how to drop rows by index:
df = df.drop([0,1,2], axis=0)
where axis=1 (the default) drops columns and axis=0 drops rows. The signature is:
drop(index, axis=1)
where index is a list of column indices, column names, or row indices to drop; or a string to drop a single column by name; or an int to drop a single column by index.