Build a DataFrame with columns from a tuple of arrays

Tags: python, pandas, dataframe, numpy

I am struggling with the basic task of constructing a DataFrame of counts by value from a tuple produced by np.unique(arr, return_counts=True), such as:

import numpy as np
import pandas as pd

np.random.seed(123)
birds = np.random.choice(['African Swallow', 'Dead Parrot', 'Exploding Penguin'], size=int(5e4))
someTuple = np.unique(birds, return_counts=True)
someTuple
#(array(['African Swallow', 'Dead Parrot', 'Exploding Penguin'], 
#       dtype='<U17'), array([16510, 16570, 16920], dtype=int64))

First I tried

pd.DataFrame(list(someTuple))
# Returns this:
#                  0            1                  2
# 0  African Swallow  Dead Parrot  Exploding Penguin
# 1            16510        16570              16920

I also tried pd.DataFrame.from_records(someTuple), which returns the same thing.

But what I'm looking for is this:

#              birdType      birdCount
# 0     African Swallow          16510  
# 1         Dead Parrot          16570  
# 2   Exploding Penguin          16920

What's the right syntax?


asked Aug 22 '16 19:08

C8H10N4O2

1 Answer

Here's one NumPy-based solution with np.column_stack:

pd.DataFrame(np.column_stack(someTuple),columns=['birdType','birdCount'])

Or with np.vstack:

pd.DataFrame(np.vstack(someTuple).T,columns=['birdType','birdCount'])
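One thing worth checking with both versions: stacking a string array together with an integer array gives a single string array, so birdCount would come back as strings rather than integers. Below is a minimal sketch of one way around that, assuming a dict-based constructor is acceptable (this variant and the `names`, `stacked`, and `df` variables are my own illustration, not part of the answer above). It reuses the setup from the question.

```python
import numpy as np
import pandas as pd

np.random.seed(123)
birds = np.random.choice(
    ['African Swallow', 'Dead Parrot', 'Exploding Penguin'], size=int(5e4))
someTuple = np.unique(birds, return_counts=True)

# Stacking mixes str and int, so NumPy upcasts every value to a string.
stacked = pd.DataFrame(np.column_stack(someTuple),
                       columns=['birdType', 'birdCount'])

# Assumption: build from a dict of columns so each array keeps its own dtype.
names = ['birdType', 'birdCount']
df = pd.DataFrame(dict(zip(names, someTuple)))

print(stacked.dtypes)  # birdCount is not an integer column here
print(df.dtypes)       # birdCount stays an integer column here
```

If the stacked version is preferred, converting the column afterwards with `astype(int)` should also work.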

Benchmarking np.transpose, np.column_stack and np.vstack for stacking 1D arrays into columns to form a 2D array:

In [54]: tup1 = (np.random.rand(1000),np.random.rand(1000))

In [55]: %timeit np.transpose(tup1)
100000 loops, best of 3: 15.9 µs per loop

In [56]: %timeit np.column_stack(tup1)
100000 loops, best of 3: 11 µs per loop

In [57]: %timeit np.vstack(tup1).T
100000 loops, best of 3: 14.1 µs per loop
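These timings will vary with the machine and the NumPy version. Here is a rough sketch for re-running the comparison outside IPython, assuming the standard-library `timeit` module; the loop count `n` and the `candidates` dict are arbitrary choices of mine, not taken from the benchmark above.

```python
import timeit

import numpy as np

tup1 = (np.random.rand(1000), np.random.rand(1000))

# The three ways of turning a tuple of 1D arrays into a 2-column array.
candidates = {
    'np.transpose(tup1)': lambda: np.transpose(tup1),
    'np.column_stack(tup1)': lambda: np.column_stack(tup1),
    'np.vstack(tup1).T': lambda: np.vstack(tup1).T,
}

# Sanity check: each candidate should yield the same (1000, 2) array.
results = {name: fn() for name, fn in candidates.items()}

n = 1000  # hypothetical loop count, kept small so the script finishes quickly
for name, fn in candidates.items():
    best = min(timeit.repeat(fn, number=n, repeat=3))
    print(f'{name}: {best / n * 1e6:.1f} us per loop')
```

Taking the minimum over the repeats mirrors the "best of 3" figure that `%timeit` reported above.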

answered Oct 25 '22 22:10

Divakar
