
How to drop rows in an H2OFrame?

Tags: python, h2o

I've worked in the h2o R package for quite a while, now, but have recently had to move to the python package.

For the most part, an H2OFrame is designed to work like a pandas DataFrame. However, there are a few hurdles I haven't managed to get over. In pandas, if I want to drop some rows, I can write:

df.drop([0,1,2], axis=0, inplace=True)

However, I cannot figure out how to do the same with an H2OFrame:

frame.drop([0,1,2], axis=0)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-30-0eff75c48e35> in <module>()
----> frame.drop([0,1,2], axis=0)

TypeError: drop() got an unexpected keyword argument 'axis'

The source on GitHub documents that the drop method is only for columns, so the obvious approach doesn't work:

def drop(self, i):
    """Drop a column from the current H2OFrame.

Is there a way to drop rows from an H2OFrame?

TayTay asked Jul 12 '16 17:07




2 Answers

Currently, the H2OFrame.drop method does not support this, but we have added a ticket to add support for dropping multiple rows (and multiple columns).

In the meantime, you can subset rows by an index:

import h2o
h2o.init(nthreads=-1)

hf = h2o.H2OFrame([[1,3],[4,5],[3,0],[5,5]])  # 4 rows x 2 columns
hf2 = hf[[1,3],:]  # Keep some of the rows by passing an index

Note that the index list, [1,3], is ordered. If you try to pass [3,1] instead, you will get an error. H2O will not reorder the rows, and this is its way of telling you that. If you have a list of out-of-order indexes, just wrap the sorted function around it first.

hf2 = hf[sorted([3,1]),:]

Lastly, if you prefer, it's also okay to reassign the new subsetted frame to the original frame name, as follows:

hf = hf[[1,3],:]
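
Since subsetting keeps rows rather than dropping them, dropping rows 0 and 2 means computing the complement of the index list first. Here is a minimal sketch in plain Python. Note that rows_to_keep is a hypothetical helper name, not part of h2o, and the final subsetting call appears only as a comment:

```python
def rows_to_keep(n_rows, rows_to_drop):
    """Return the ascending row indexes left after removing rows_to_drop."""
    drop = set(rows_to_drop)
    # Fail early on indexes that do not exist in the frame.
    bad = sorted(i for i in drop if not 0 <= i < n_rows)
    if bad:
        raise IndexError(f"row indexes out of range: {bad}")
    # range() is ascending, so the result is already in sorted order.
    return [i for i in range(n_rows) if i not in drop]


keep = rows_to_keep(4, [2, 0])
print(keep)  # [1, 3]
# With the 4-row frame from above, the subset would then be: hf2 = hf[keep, :]
```

This builds a Python list of every remaining index, so it is probably only sensible for small frames.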
Erin LeDell answered Oct 06 '22 18:10


Since this is now supported, I wanted to highlight the comment that shows how to drop rows by index:

df = df.drop([0,1,2], axis=0)

With axis=1 (the default), drop removes columns; with axis=0, it removes rows. The signature is:

drop(index, axis=1)

Here index is a list of column indices, column names, or row indices to drop; or a string to drop a single column by name; or an int to drop a single column by index.
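
If the same code has to run against both older and newer h2o releases, one option is to try the axis=0 form first and fall back to subsetting. This is only a sketch. It assumes the older drop() raises TypeError for the axis keyword (as in the traceback in the question) and that the frame exposes shape as (rows, columns). The name drop_rows is a hypothetical helper, not part of h2o:

```python
def drop_rows(frame, rows):
    """Drop rows by index from an H2OFrame-like object (sketch, see assumptions)."""
    rows = list(rows)
    try:
        # Newer releases: drop() accepts axis=0 for rows.
        return frame.drop(rows, axis=0)
    except TypeError:
        # Assumed older behaviour: drop() has no axis argument,
        # so keep the complement of the rows instead.
        drop = set(rows)
        keep = [i for i in range(frame.shape[0]) if i not in drop]
        return frame[keep, :]
```

Catching TypeError this broadly could hide an unrelated error raised inside drop(), so checking the installed version explicitly may be the safer choice.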

Lauren answered Oct 06 '22 20:10