 

Build a DataFrame with columns from a tuple of arrays

I am struggling with the basic task of constructing a DataFrame of counts by value from a tuple produced by np.unique(arr, return_counts=True), such as:

import numpy as np
import pandas as pd

np.random.seed(123)  
birds=np.random.choice(['African Swallow','Dead Parrot','Exploding Penguin'], size=int(5e4))
someTuple=np.unique(birds, return_counts=True)
someTuple
#(array(['African Swallow', 'Dead Parrot', 'Exploding Penguin'], 
#       dtype='<U17'), array([16510, 16570, 16920], dtype=int64))

First I tried

pd.DataFrame(list(someTuple))
# Returns this:
#                  0            1                  2
# 0  African Swallow  Dead Parrot  Exploding Penguin
# 1            16510        16570              16920

I also tried pd.DataFrame.from_records(someTuple), which returns the same thing.
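Both calls give that layout because pandas reads each element of the outer sequence as one row, so the two-element tuple (names, counts) becomes two rows. A minimal sketch of transposing the tuple first with the built-in `zip(*...)`, so that each record is a (birdType, birdCount) pair. The small hard-coded `birds` array here is a made-up stand-in so the counts are easy to check by eye:

```python
import numpy as np
import pandas as pd

# Small made-up sample (stand-in for the 50,000 random draws above)
birds = np.array(['Dead Parrot', 'African Swallow', 'Dead Parrot',
                  'Exploding Penguin', 'Dead Parrot', 'African Swallow'])
someTuple = np.unique(birds, return_counts=True)

# zip(*someTuple) pairs the i-th name with the i-th count, turning
# two "row" arrays into one (name, count) record per bird type
records = list(zip(*someTuple))
df = pd.DataFrame(records, columns=['birdType', 'birdCount'])
print(df)
```

Because each record is a tuple of one name and one count, the count column is inferred as integers rather than mixed with the strings.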

But what I'm looking for is this:

#              birdType      birdCount
# 0     African Swallow          16510  
# 1         Dead Parrot          16570  
# 2   Exploding Penguin          16920

What's the right syntax?

C8H10N4O2 asked Aug 22 '16 19:08




1 Answer

Here's one NumPy-based solution with np.column_stack -

pd.DataFrame(np.column_stack(someTuple),columns=['birdType','birdCount'])

Or with np.vstack -

pd.DataFrame(np.vstack(someTuple).T,columns=['birdType','birdCount'])
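One thing to watch with both stacking approaches: the tuple mixes a string array and an integer array, so NumPy has to pick a single common dtype for the stacked 2D array, which ends up being a string dtype. The birdCount column then holds strings such as '16510' rather than integers. A sketch that avoids this by passing a dict of columns, so each array keeps its own dtype. The second part is an alternative that skips np.unique entirely and counts with pandas' `Series.value_counts`; that is a swapped-in technique, not part of the answer above:

```python
import numpy as np
import pandas as pd

np.random.seed(123)
birds = np.random.choice(['African Swallow', 'Dead Parrot', 'Exploding Penguin'],
                         size=int(5e4))
names, counts = np.unique(birds, return_counts=True)

# Each array becomes its own column, so strings stay strings
# and counts stay integers (no common-dtype upcast)
df = pd.DataFrame({'birdType': names, 'birdCount': counts})

# Alternative: count directly with pandas. value_counts sorts by
# frequency, so sort_index restores the alphabetical order np.unique uses.
df2 = (pd.Series(birds)
         .value_counts()
         .sort_index()
         .rename_axis('birdType')
         .reset_index(name='birdCount'))

print(df.dtypes)
print(df2)
```

Both frames should list the same bird types and counts; the dict form is the closer fit if you already have the np.unique tuple.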

Benchmarking np.transpose, np.column_stack and np.vstack for stacking 1D arrays into columns to form a 2D array -

In [54]: tup1 = (np.random.rand(1000),np.random.rand(1000))

In [55]: %timeit np.transpose(tup1)
100000 loops, best of 3: 15.9 µs per loop

In [56]: %timeit np.column_stack(tup1)
100000 loops, best of 3: 11 µs per loop

In [57]: %timeit np.vstack(tup1).T
100000 loops, best of 3: 14.1 µs per loop
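The timings above come from IPython's `%timeit`. A rough sketch for rerunning the same comparison as a plain script with the standard-library `timeit` module; the repeat and loop counts are arbitrary choices, and absolute numbers will vary with hardware and NumPy version, so treat the output as a relative comparison only:

```python
import timeit

import numpy as np

tup1 = (np.random.rand(1000), np.random.rand(1000))

# Three ways to turn a tuple of equal-length 1D arrays into a
# 2D array with one column per input array
candidates = {
    'np.transpose': lambda: np.transpose(tup1),
    'np.column_stack': lambda: np.column_stack(tup1),
    'np.vstack(...).T': lambda: np.vstack(tup1).T,
}

# Sanity check: all three build the same (1000, 2) array
reference = candidates['np.column_stack']()
for name, fn in candidates.items():
    assert np.array_equal(fn(), reference), name

LOOPS = 2000
for name, fn in candidates.items():
    # Best of 3 repeats, each timing LOOPS calls, reported per call
    best = min(timeit.repeat(fn, number=LOOPS, repeat=3)) / LOOPS
    print(f'{name:>18}: {best * 1e6:.1f} us per loop')
```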
Divakar answered Oct 25 '22 22:10