Pandas DataFrame to multidimensional NumPy Array

Tags: python, arrays, pandas, numpy, transform

I have a DataFrame that I want to transform into a multidimensional array, using one of the columns as the 3rd dimension.
As an example:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 2, 3, 3, 3],
    'date': np.random.randint(1, 6, 6),
    'value1': [11, 12, 13, 14, 15, 16],
    'value2': [21, 22, 23, 24, 25, 26]
})

I would like to transform it into a 3D array with dimensions (id, date, values), padding with NaN where an id has fewer rows than the others.
The problem is that the ids do not all have the same number of occurrences, so I cannot use np.reshape().

For this simplified example, I was able to use:

ra = np.full((3, 3, 3), np.nan)

for i, value in enumerate(df['id'].unique()):
    rows = df.loc[df['id'] == value].shape[0]
    ra[i, :rows, :] = df.loc[df['id'] == value, 'date':'value2']

This produces the needed result, but the original DataFrame contains millions of rows.

Is there a vectorized way to accomplish the same result?

asked Oct 08 '18 09:10

Yannis

1 Answer

Approach #1

Here's one vectorized approach. It requires the rows to be sorted by the id column first, for example with df.sort_values('id', inplace=True), as suggested by @Yannis in the comments:

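# Number of rows per id, in sorted id order
count_id = df.id.value_counts().sort_index().values
# mask[i, j] is True where id i has a j-th row
mask = count_id[:, None] > np.arange(count_id.max())
vals = df.loc[:, 'date':'value2'].values
out_shp = mask.shape + (vals.shape[1],)
# NaN-filled output; the True slots of mask are filled in row order
out = np.full(out_shp, np.nan)
out[mask] = vals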
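
To make the masking step concrete, here is a minimal sketch of Approach #1 run on the question's example. The fixed 'date' values are an assumption made for reproducibility (the question draws them with np.random.randint); everything else follows the snippet above.

```python
import numpy as np
import pandas as pd

# Same layout as the question's example; the dates are fixed here
# (an assumption) so that the result is reproducible.
df = pd.DataFrame({
    'id': [1, 2, 2, 3, 3, 3],
    'date': [5, 1, 4, 2, 3, 1],
    'value1': [11, 12, 13, 14, 15, 16],
    'value2': [21, 22, 23, 24, 25, 26],
})

# Rows per id, ordered by id: 1, 2 and 3 rows respectively.
count_id = df['id'].value_counts().sort_index().values

# Broadcasting a (3, 1) column against a length-3 range gives a (3, 3)
# boolean grid: row i is True in its first count_id[i] positions.
mask = count_id[:, None] > np.arange(count_id.max())

vals = df.loc[:, 'date':'value2'].values
out = np.full(mask.shape + (vals.shape[1],), np.nan)

# Boolean indexing visits the True slots in row-major order. That order
# matches the row order of vals only because df is sorted by id.
out[mask] = vals

print(mask)
print(out)
```

The id with a single row ends up with two NaN-padded rows, the id with two rows gets one, and the id with three rows fills its slice completely. If the frame is not sorted by id, the values land in the wrong slices without raising an error, which is why the sort is a precondition here.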

Approach #2

Another approach uses factorize and doesn't require any pre-sorting:

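# Integer group label per row, numbered in order of first appearance
x = df.id.factorize()[0]
# Position of each row within its group
y = df.groupby(x).cumcount().values
vals = df.loc[:, 'date':'value2'].values
out_shp = (x.max() + 1, y.max() + 1, vals.shape[1])
out = np.full(out_shp, np.nan)
out[x, y] = vals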
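
As a sanity check, the sketch below wraps Approach #2 in a helper and compares it against the loop from the question on deliberately unsorted ids. The helper name to_padded_3d, its parameters, and the sample data are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd


def to_padded_3d(df, key='id', first='date', last='value2'):
    """Hypothetical helper wrapping the factorize approach."""
    x = df[key].factorize()[0]            # group number per row
    y = df.groupby(x).cumcount().values   # row position within its group
    vals = df.loc[:, first:last].values
    out = np.full((x.max() + 1, y.max() + 1, vals.shape[1]), np.nan)
    out[x, y] = vals                      # scatter each row into its slot
    return out


# Illustrative data: ids are intentionally not sorted.
df = pd.DataFrame({
    'id': [3, 1, 3, 2, 3, 2],
    'date': [2, 5, 3, 1, 1, 4],
    'value1': [14, 11, 15, 12, 16, 13],
    'value2': [24, 21, 25, 22, 26, 23],
})

out = to_padded_3d(df)

# Reference result: the loop from the question, with the output shape
# taken from the vectorized result instead of being hard-coded.
ref = np.full(out.shape, np.nan)
for i, value in enumerate(df['id'].unique()):
    block = df.loc[df['id'] == value, 'date':'value2'].values
    ref[i, :len(block), :] = block

# equal_nan=True treats the NaN padding in both arrays as matching.
print(np.allclose(out, ref, equal_nan=True))
```

Both factorize and unique number the ids in order of first appearance, so the first axis follows that order rather than the sorted id order. Sort the frame by id beforehand if the slices need to be in ascending id order.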

answered Sep 20 '22 16:09

Divakar
