Python: Create a NumPy structured array from two columns in a DataFrame

How do you create a structured array from two columns in a DataFrame? I tried this:

import numpy as np
import pandas as pd

df = pd.DataFrame(data=[[1,2],[10,20]], columns=['a','b'])
df

    a   b
0   1   2
1   10  20

x = np.array([([val for val in list(df['a'])],
               [val for val in list(df['b'])])])

But this gives me this:

array([[[ 1, 10],
        [ 2, 20]]])

But I wanted this:

[(1,2),(10,20)]

Thanks!

asked Jul 11 '18 07:07

Kim O

2 Answers

There are a couple of ways to do this. Note that structured and record arrays may be slower and support fewer operations than regular NumPy arrays.

record array

You can use pd.DataFrame.to_records with index=False. Technically, the result is a record array rather than a plain structured array, but for many purposes this is sufficient.

res1 = df.to_records(index=False)

res1

rec.array([(1, 2), (10, 20)],
          dtype=[('a', '<i8'), ('b', '<i8')])
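If the DataFrame has more columns than the two you need, one option is to select them before calling to_records. The following is a minimal sketch; the extra column 'c' is an assumption added for illustration and is not part of the original question.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one extra column ('c') that we do not want in the result.
df = pd.DataFrame(data=[[1, 2, 3], [10, 20, 30]], columns=['a', 'b', 'c'])

# Select the two columns first, then convert; index=False drops the index field.
rec = df[['a', 'b']].to_records(index=False)

print(rec.dtype.names)  # field names taken from the selected columns
print(rec.tolist())     # rows as a plain list of tuples
```

Calling tolist() on the result gives the list-of-tuples form shown in the question.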

structured array

Alternatively, you can build a structured array manually: convert each row to a tuple, then pass a list of (name, dtype) tuples as the dtype parameter.

s = df.dtypes
res2 = np.array([tuple(x) for x in df.values], dtype=list(zip(s.index, s)))

res2

array([(1, 2), (10, 20)],
      dtype=[('a', '<i8'), ('b', '<i8')])
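When the columns have different dtypes, df.values first upcasts everything to a common type, which may not be what you want. A possible variant, sketched here under the assumption of one integer and one float column, iterates rows with itertuples instead so each value keeps its column's type.

```python
import numpy as np
import pandas as pd

# Hypothetical mixed-dtype frame: 'a' is int64, 'b' is float64.
df = pd.DataFrame({'a': [1, 10], 'b': [2.5, 20.5]})

# One (name, dtype) pair per column, taken from the frame itself.
dtype = [(name, dt) for name, dt in df.dtypes.items()]

# name=None makes itertuples yield plain tuples; index=False omits the index.
rows = list(df.itertuples(index=False, name=None))
res = np.array(rows, dtype=dtype)

print(res.dtype)
print(res['a'], res['b'])
```

This assumes the column names are strings, since NumPy field names must be.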

What's the difference?

Very little. recarray is a subclass of ndarray, the regular NumPy array type. On the other hand, the structured array in the second example is of type ndarray.

type(res1)                    # numpy.recarray
isinstance(res1, np.ndarray)  # True
type(res2)                    # numpy.ndarray

The main difference is that record arrays support field access by attribute, while structured arrays raise AttributeError:

res1.a
array([ 1, 10], dtype=int64)

res2.a
AttributeError: 'numpy.ndarray' object has no attribute 'a'
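Structured arrays still allow field access by name with square brackets, and they can be viewed as a record array if attribute access is wanted. A short sketch, rebuilding the structured array from the example above so it runs on its own:

```python
import numpy as np

# Same structured array as res2 above, built directly for a standalone example.
res2 = np.array([(1, 2), (10, 20)], dtype=[('a', '<i8'), ('b', '<i8')])

# Field access by name works on any structured array.
print(res2['a'])

# Viewing as np.recarray shares the same data and enables attribute access.
as_rec = res2.view(np.recarray)
print(as_rec.a)
```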

Related: NumPy “record array” or “structured array” or “recarray”

answered Nov 14 '22 21:11

jpp

Use a list comprehension to convert the nested lists to tuples:

print ([tuple(x) for x in df.values.tolist()])
[(1, 2), (10, 20)]

Detail:

print (df.values.tolist())
[[1, 2], [10, 20]]

EDIT: You can also convert with to_records and then pass the result to np.asarray:

df = pd.DataFrame(data=[[True, 1,2],[False, 10,20]], columns=['a','b','c'])
print (df)
       a   b   c
0   True   1   2
1  False  10  20

print (np.asarray(df.to_records(index=False)))
[( True,  1,  2) (False, 10, 20)]
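To see what np.asarray changes here, the sketch below compares the types before and after the conversion. It reuses the same example frame; the variable names rec and arr are introduced only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data=[[True, 1, 2], [False, 10, 20]], columns=['a', 'b', 'c'])

rec = df.to_records(index=False)  # record array (np.recarray)
arr = np.asarray(rec)             # base-class ndarray; fields are kept

print(type(rec).__name__)
print(type(arr).__name__)
print(arr['b'])                   # fields remain accessible by name
```

np.asarray does not pass subclasses through, so the result is a plain ndarray that still carries the named fields.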

answered Nov 14 '22 21:11

jezrael

Tags:

python

arrays

pandas

dataframe

numpy