Is it possible to append to an xarray.Dataset?

I've been using the .append() method to concatenate two tables (with the same fields) in pandas. Unfortunately, this method does not exist in xarray. Is there another way to do it?

Itay Lieder asked Oct 30 '15 12:10


2 Answers

Xarray's data structures are built on top of NumPy arrays, which cannot be resized, so new elements cannot be appended without copying the entire array. For that reason we don't implement an append method. Instead, you should use xarray.concat.

A common pattern is to accumulate Dataset/DataArray objects in a list, and concatenate once at the end:

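import xarray

datasets = []
for example in examples:
    # create_an_xarray_dataset stands for your own code that builds one Dataset
    ds = create_an_xarray_dataset(example)
    datasets.append(ds)
# Concatenate once, after the loop, along a new 'example' dimension
combined = xarray.concat(datasets, dim='example')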

You don't want to concatenate inside the loop: every call copies all of the data accumulated so far, so your code would run in quadratic time.
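A self-contained sketch of the list-then-concatenate pattern is shown here. The body of create_an_xarray_dataset, the variable name my_variable, and the array shapes are assumptions made up for illustration; substitute your own loading code.

```python
import numpy as np
import xarray as xr


def create_an_xarray_dataset(example):
    """Hypothetical stand-in: build a small Dataset for one example."""
    data = np.full((2, 3), float(example))
    return xr.Dataset({"my_variable": (("x", "y"), data)})


examples = [0, 1, 2]

# Collect the pieces in a plain Python list (cheap), then copy the data once.
datasets = [create_an_xarray_dataset(example) for example in examples]
combined = xr.concat(datasets, dim="example")

# Label the new dimension so that each slice can be selected by name.
combined = combined.assign_coords(example=examples)

print(combined.sizes)
```

Labelling the new dimension is optional, but it lets you write combined.sel(example=2) later instead of relying on positions.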

Alternatively, you could allocate a single Dataset/DataArray for the result, and fill in the values with indexing, e.g.,

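import numpy as np
import xarray

dims = ('example', 'x', 'y')
# Allocate the full result up front; each example fills one 100 x 200 slice
combined = xarray.Dataset(
    data_vars={'my_variable': (dims, np.zeros((len(examples), 100, 200)))},
    coords={'example': examples})
for example in examples:
    # Write this example's values into its preallocated slice
    combined.loc[dict(example=example)] = create_an_xarray_dataset(example)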

(Note that assignment always requires indexing with square brackets, either [] or .loc[]. Assigning with sel() and isel() doesn't work.)
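As a runnable sketch of the preallocate-and-fill approach, the version below uses a single DataArray rather than a Dataset. The helper compute_values, the labels, and the small shapes are hypothetical and chosen only to keep the example short.

```python
import numpy as np
import xarray as xr

examples = ["a", "b", "c"]


def compute_values(example):
    """Hypothetical stand-in: produce a (2, 3) array for one example."""
    return np.full((2, 3), float(ord(example)))


# Allocate the full result once, with a labelled 'example' dimension.
combined = xr.DataArray(
    np.zeros((len(examples), 2, 3)),
    dims=("example", "x", "y"),
    coords={"example": examples},
    name="my_variable",
)

for example in examples:
    # Label-based assignment through .loc[] writes into the existing array.
    combined.loc[dict(example=example)] = compute_values(example)

# Positional assignment through [] works as well.
combined[dict(example=0)] = -1.0
```

Both assignments modify combined in place, so no intermediate copies of the full array are made.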

These two approaches are equally efficient, so it's really a matter of taste which one looks better to you or works better for your application.

For what it's worth, pandas has the same limitation: the append method does indeed copy entire dataframes each time it is used. This is a perpetual surprise and source of performance issues for new users. So I do think that we made the right design decision in not including it in xarray.
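The same accumulate-then-concatenate pattern applies on the pandas side, and it is the only option in recent releases, since DataFrame.append was deprecated in pandas 1.4 and removed in 2.0. A minimal sketch, with made-up column names and values:

```python
import pandas as pd

frames = []
for i in range(3):
    # Each iteration produces a small table with the same columns.
    frames.append(pd.DataFrame({"id": [i, i], "value": [i * 10, i * 10 + 1]}))

# One concatenation at the end copies each row once.
combined = pd.concat(frames, ignore_index=True)

print(combined)
```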

shoyer answered Oct 09 '22 13:10


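You can use either xarray.concat() or xarray.merge(): concat joins objects along a dimension, while merge combines objects that hold different variables. See the xarray documentation on combining data.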
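To illustrate the difference between the two functions, here is a small sketch; the variable names, coordinates, and values are invented for the example.

```python
import xarray as xr

# Two datasets holding the same variable over different time steps.
temp_first = xr.Dataset(
    {"temperature": ("time", [1.0, 2.0])}, coords={"time": [0, 1]}
)
temp_second = xr.Dataset(
    {"temperature": ("time", [3.0, 4.0])}, coords={"time": [2, 3]}
)

# A dataset holding a different variable over all time steps.
rain = xr.Dataset(
    {"rain": ("time", [0.1, 0.0, 0.3, 0.2])}, coords={"time": [0, 1, 2, 3]}
)

# concat: same variables, joined end to end along an existing dimension.
temperature = xr.concat([temp_first, temp_second], dim="time")

# merge: different variables, combined into one Dataset on shared coordinates.
both = xr.merge([temperature, rain])

print(both)
```

For the pandas-style "append rows with the same fields" use case in the question, concat is the closer match.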

bkaf answered Oct 09 '22 13:10