How to convert a pandas DataFrame subset of columns AND rows into a numpy array?

I'm wondering if there is a simpler, memory efficient way to select a subset of rows and columns from a pandas DataFrame.

For instance, given this dataframe:

    import numpy as np
    from pandas import DataFrame

    df = DataFrame(np.random.rand(4,5), columns = list('abcde'))
    print(df)
              a         b         c         d         e
    0  0.945686  0.000710  0.909158  0.892892  0.326670
    1  0.919359  0.667057  0.462478  0.008204  0.473096
    2  0.976163  0.621712  0.208423  0.980471  0.048334
    3  0.459039  0.788318  0.309892  0.100539  0.753992

I want only those rows in which the value for column 'c' is greater than 0.5, but I only need columns 'b' and 'e' for those rows.

This is the method that I've come up with - perhaps there is a better "pandas" way?

    locs = [df.columns.get_loc(_) for _ in ['a', 'd']]
    # integer column positions need .iloc on current pandas
    print(df[df.c > 0.5].iloc[:, locs])
              a         d
    0  0.945686  0.892892

My final goal is to convert the result to a numpy array to pass into an sklearn regression algorithm, so I will use the code above like this:

    training_set = np.array(df[df.c > 0.5].iloc[:, locs])

... and that peeves me since I end up with a huge array copy in memory. Perhaps there's a better way for that too?

766

asked Jul 16 '13 16:07

John Prior

2 Answers

Use the .values attribute directly:

    In [79]: df[df.c > 0.5][['b', 'e']].values
    Out[79]:
    array([[ 0.98836259,  0.82403141],
           [ 0.337358  ,  0.02054435],
           [ 0.29271728,  0.37813099],
           [ 0.70033513,  0.69919695]])
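On recent pandas versions the same selection can also be written as a single .loc call, which should skip the intermediate full-width frame that the chained form builds, and .to_numpy() is what the pandas documentation now recommends over .values. A minimal sketch, assuming a seeded random frame (the seed and the variable names mask and training_set are illustrative, not part of the answer above):

```python
import numpy as np
import pandas as pd

# Illustrative data: the seed is an assumption, chosen only for repeatability.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((4, 5)), columns=list("abcde"))

# Boolean row mask: rows where column 'c' exceeds 0.5.
mask = df["c"] > 0.5

# Select rows and columns in one step, then convert to a NumPy array.
training_set = df.loc[mask, ["b", "e"]].to_numpy()

print(training_set.shape)
```

The array has one row per matching row of df and two columns, in the order the column labels were given.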

116

answered Oct 07 '22 06:10

waitingkuo

For the first problem, perhaps something like this; you can simply access the columns by their names:

    >>> df = pd.DataFrame(np.random.rand(4,5), columns = list('abcde'))
    >>> df[df['c']>.5][['b','e']]
              b         e
    1  0.071146  0.132145
    2  0.495152  0.420219

For the second problem:

    >>> df[df['c']>.5][['b','e']].values
    array([[ 0.07114556,  0.13214495],
           [ 0.49515157,  0.42021946]])
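On the memory concern raised in the question: selecting rows with a boolean mask is advanced indexing in NumPy, which always returns a copy, so some copy is unavoidable. What can be controlled is its size, since the copy only holds the selected rows and columns. A rough sketch, assuming a larger random frame (the seed, the sizes, and the names subset, full and picked are illustrative):

```python
import numpy as np
import pandas as pd

# Illustrative data: 1000 rows so the size difference is visible.
rng = np.random.default_rng(1)
df = pd.DataFrame(rng.random((1000, 5)), columns=list("abcde"))

mask = df["c"] > 0.5
subset = df.loc[mask, ["b", "e"]].to_numpy()

# The selected array holds only the filtered rows and two columns.
full = df.to_numpy()
print(subset.nbytes, "bytes selected vs", full.nbytes, "bytes in the frame")

# The same selection on the plain NumPy array: boolean and list indexing
# are advanced indexing, so the result is a copy, not a view of `full`.
picked = full[mask.to_numpy()][:, [1, 4]]
print(np.shares_memory(picked, full))
```

If the original frame is no longer needed after the selection, dropping the reference to it (for example with del df) lets Python reclaim the larger allocation.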

answered Oct 07 '22 06:10

Daniel


Tags: python, arrays, pandas, numpy, scikit-learn
