I am struggling with the basic task of constructing a DataFrame of counts by value from the tuple produced by np.unique(arr, return_counts=True), such as:
import numpy as np
import pandas as pd
np.random.seed(123)
birds=np.random.choice(['African Swallow','Dead Parrot','Exploding Penguin'], size=int(5e4))
someTuple=np.unique(birds, return_counts = True)
someTuple
#(array(['African Swallow', 'Dead Parrot', 'Exploding Penguin'],
# dtype='<U17'), array([16510, 16570, 16920], dtype=int64))
First I tried
pd.DataFrame(list(someTuple))
# Returns this:
#                  0            1                  2
# 0  African Swallow  Dead Parrot  Exploding Penguin
# 1            16510        16570              16920
I also tried pd.DataFrame.from_records(someTuple), which returns the same thing.
But what I'm looking for is this:
#             birdType  birdCount
# 0    African Swallow      16510
# 1        Dead Parrot      16570
# 2  Exploding Penguin      16920
What's the right syntax?
To create a DataFrame from a list of tuples, pass the list to pd.DataFrame() together with a columns= argument holding the list of column headers. Each tuple in the list becomes one row, so a tuple of column arrays such as the one returned by np.unique has to be turned into row tuples first.
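As a minimal sketch of the list-of-tuples route applied to the question's data: np.unique returns one array of values and one array of counts, so zip is used here to pair them into (value, count) rows before handing them to pd.DataFrame. The names values, counts and rows are illustrative and not part of the original post.

```python
import numpy as np
import pandas as pd

np.random.seed(123)
birds = np.random.choice(
    ['African Swallow', 'Dead Parrot', 'Exploding Penguin'], size=int(5e4)
)

# np.unique returns (unique_values, counts) when return_counts=True
values, counts = np.unique(birds, return_counts=True)

# Pair each unique value with its count: one (value, count) tuple per row
rows = list(zip(values, counts))

# columns= supplies the headers for the two positions in each tuple
df = pd.DataFrame(rows, columns=['birdType', 'birdCount'])
print(df)
```

Because the counts never pass through a shared NumPy array with the strings, this route should leave birdCount as an integer column.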
Create a DataFrame from a NumPy ndarray
Since a DataFrame is similar to a 2D NumPy array, we can create one directly from an ndarray. The input array must have at most two dimensions (a 1D array becomes a single column); passing a 3D or higher-dimensional array raises a ValueError. If you pass a raw ndarray without labels, the index and column names default to integers starting at 0.
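A small sketch of the behaviour described above, using a made-up 3x2 array. The column names 'a' and 'b' are placeholders chosen for illustration only.

```python
import numpy as np
import pandas as pd

arr = np.arange(6).reshape(3, 2)  # 2D array: 3 rows, 2 columns

# No labels supplied: index and columns both default to 0, 1, 2, ...
df_default = pd.DataFrame(arr)

# Same data with explicit (hypothetical) column names
df_named = pd.DataFrame(arr, columns=['a', 'b'])

# A 3D array is rejected with a ValueError
try:
    pd.DataFrame(np.zeros((2, 2, 2)))
    raised = False
except ValueError:
    raised = True

print(df_default)
print(df_named)
print("3D input raised ValueError:", raised)
```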
Here's one NumPy-based solution with np.column_stack:
pd.DataFrame(np.column_stack(someTuple),columns=['birdType','birdCount'])
Or with np.vstack:
pd.DataFrame(np.vstack(someTuple).T,columns=['birdType','birdCount'])
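One caveat worth checking with these stacking approaches: a NumPy array has a single dtype, so stacking a string array with an integer array converts the counts to strings, and birdCount will not come out as an integer column. The sketch below is not from the original answer; it uses a tiny hand-made sample to show the effect and two possible ways around it, either converting afterwards with pd.to_numeric or building the DataFrame from a dict so each column keeps its own dtype.

```python
import numpy as np
import pandas as pd

# Small deterministic sample so the counts are easy to verify by eye
birds = np.array([
    'African Swallow',
    'Dead Parrot', 'Dead Parrot',
    'Exploding Penguin', 'Exploding Penguin', 'Exploding Penguin',
])
values, counts = np.unique(birds, return_counts=True)

# Stacking forces a common dtype, so the counts become strings here
stacked = pd.DataFrame(
    np.column_stack((values, counts)), columns=['birdType', 'birdCount']
)
counts_are_strings = isinstance(stacked['birdCount'].iloc[0], str)

# Option 1: convert the column back to numbers after stacking
stacked['birdCount'] = pd.to_numeric(stacked['birdCount'])

# Option 2: build from a dict so each column keeps its own dtype
df = pd.DataFrame({'birdType': values, 'birdCount': counts})
print(df)
```

The dict form avoids the round trip through strings entirely, which may matter more than the few microseconds separating the stacking functions.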
Benchmarking np.transpose, np.column_stack and np.vstack for stacking 1D arrays into columns to form a 2D array:
In [54]: tup1 = (np.random.rand(1000),np.random.rand(1000))
In [55]: %timeit np.transpose(tup1)
100000 loops, best of 3: 15.9 µs per loop
In [56]: %timeit np.column_stack(tup1)
100000 loops, best of 3: 11 µs per loop
In [57]: %timeit np.vstack(tup1).T
100000 loops, best of 3: 14.1 µs per loop
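The timings above come from IPython's %timeit magic. For anyone wanting to repeat the comparison in a plain script, here is a rough sketch using the standard-library timeit module. The candidates dict, the seed, and the repeat count are arbitrary choices for illustration, and the numbers will differ by machine and NumPy version.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
tup1 = (rng.random(1000), rng.random(1000))

# The three ways of turning a tuple of 1D arrays into a (1000, 2) array
candidates = {
    'np.transpose': lambda: np.transpose(tup1),
    'np.column_stack': lambda: np.column_stack(tup1),
    'np.vstack(...).T': lambda: np.vstack(tup1).T,
}

# Check that all three produce the same result before timing them
results = {name: fn() for name, fn in candidates.items()}

n_calls = 2000
for name, fn in candidates.items():
    seconds = timeit.timeit(fn, number=n_calls)
    print(f"{name}: {seconds / n_calls * 1e6:.2f} µs per call")
```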