I want to create a pandas DataFrame with default values of zero, but with one column of integers and the other of floats. I am able to create a NumPy array with the correct types (see the values variable below). However, when I pass that into the DataFrame constructor, it only returns NaN values (see df below). I have included the untyped code that returns an array of floats (see df2).
import pandas as pd
import numpy as np

# Structured array: each element holds an int32 field and a float32 field
values = np.zeros((2,3), dtype='int32,float32')
index = ['x', 'y']
columns = ['a','b','c']
df = pd.DataFrame(data=values, index=index, columns=columns)
df.values.dtype

# Untyped version for comparison; np.zeros defaults to float64
values2 = np.zeros((2,3))
df2 = pd.DataFrame(data=values2, index=index, columns=columns)
df2.values.dtype
Any suggestions on how to construct the dataframe?
You can also create a NumPy array with specific dtypes and then convert it to a DataFrame. As an alternative, you can specify the dtype for each column by creating the Series objects first (a short sketch of that approach appears after the options below).
A DataFrame is a 2-dimensional data structure that can store data of different types (including characters, integers, floating point values, categorical data and more) in columns.
Here are a few options you could choose from:
import numpy as np
import pandas as pd
index = ['x', 'y']
columns = ['a','b','c']
# Option 1: Set the column names in the structured array's dtype
dtype = [('a','int32'), ('b','float32'), ('c','float32')]
values = np.zeros(2, dtype=dtype)
df = pd.DataFrame(values, index=index)
# Option 2: Alter the structured array's column names after it has been created
values = np.zeros(2, dtype='int32, float32, float32')
values.dtype.names = columns
df2 = pd.DataFrame(values, index=index, columns=columns)
# Option 3: Alter the DataFrame's column names after it has been created
values = np.zeros(2, dtype='int32, float32, float32')
df3 = pd.DataFrame(values, index=index)
df3.columns = columns
# Option 4: Use a dict of arrays, each of the right dtype:
df4 = pd.DataFrame(
    {'a': np.zeros(2, dtype='int32'),
     'b': np.zeros(2, dtype='float32'),
     'c': np.zeros(2, dtype='float32')},
    index=index, columns=columns)
# Option 5: Concatenate DataFrames of the simple dtypes:
df5 = pd.concat([
    pd.DataFrame(np.zeros((2,), dtype='int32'), index=index, columns=['a']),
    pd.DataFrame(np.zeros((2,2), dtype='float32'), index=index, columns=['b','c'])],
    axis=1)
# Option 6: Alter the dtypes after the DataFrame has been formed. (This is not very efficient)
values2 = np.zeros((2, 3))
df6 = pd.DataFrame(values2, index=index, columns=columns)
for col, dtype in zip(df6.columns, 'int32 float32 float32'.split()):
    df6[col] = df6[col].astype(dtype)
Each of the options above produces the same result:
a b c
x 0 0 0
y 0 0 0
with dtypes:
a int32
b float32
c float32
dtype: object
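For completeness, the Series-per-column alternative mentioned near the top might look roughly like this (a minimal sketch, not part of the original options; each column is built as a typed Series and the DataFrame constructor aligns them on the shared index):
import numpy as np
import pandas as pd
index = ['x', 'y']
# Each column gets its dtype from its own Series
df7 = pd.DataFrame({
    'a': pd.Series(np.zeros(2), index=index, dtype='int32'),
    'b': pd.Series(np.zeros(2), index=index, dtype='float32'),
    'c': pd.Series(np.zeros(2), index=index, dtype='float32')})
df7.dtypes   # a int32, b float32, c float32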
Why pd.DataFrame(values, index=index, columns=columns) produces a DataFrame with NaNs: values is a structured array with column names f0, f1, f2:
In [171]: values
Out[171]:
array([(0, 0.0, 0.0), (0, 0.0, 0.0)],
      dtype=[('f0', '<i4'), ('f1', '<f4'), ('f2', '<f4')])
If you pass the argument columns=['a', 'b', 'c'] to pd.DataFrame, then Pandas will look for columns with those names in the structured array values. When those columns are not found, Pandas places NaNs in the DataFrame to represent missing values.
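To see the field-name matching in action, compare passing the structured array's actual field names with passing new names (a quick illustrative check, assuming the same values array as above; df_ok and df_nan are just placeholder names):
import numpy as np
import pandas as pd
values = np.zeros(2, dtype='int32, float32, float32')
# Field names match ('f0', 'f1', 'f2'), so the zeros come through with their dtypes
df_ok = pd.DataFrame(values, index=['x', 'y'], columns=['f0', 'f1', 'f2'])
# No field is named 'a', 'b' or 'c', so every value is treated as missing and becomes NaN
df_nan = pd.DataFrame(values, index=['x', 'y'], columns=['a', 'b', 'c'])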