Retrieving the order of key-word arguments passed via **kwargs would be extremely useful in the particular project I am working on. It is about making a kind of n-d numpy array with meaningful dimensions (right now called dimarray), particularly useful for geophysical data handling.
For now say we have:
import numpy as np
from dimarray import Dimarray # the handy class I am programming
def make_data(nlat, nlon):
    """ generate some example data
    """
    values = np.random.randn(nlat, nlon)
    lon = np.linspace(-180,180,nlon)
    lat = np.linspace(-90,90,nlat)
    return lon, lat, values
What works:
>>> lon, lat, values = make_data(180,360)
>>> a = Dimarray(values, lat=lat, lon=lon)
>>> print a.lon[0], a.lat[0]
-180.0 -90.0
What does not:
>>> lon, lat, values = make_data(180,180) # square, no shape checking possible !
>>> a = Dimarray(values, lat=lat, lon=lon)
>>> print a.lon[0], a.lat[0] # is random
-90.0, -180.0 # could be (actually I raise an error in such ambiguous cases)
The reason is that the signature of Dimarray's __init__ method is (values, **kwargs), and since kwargs is an unordered dictionary (dict), the best it can do is check against the shape of values.
Of course, I want it to work for any kind of dimensions:
a = Dimarray(values, x1=.., x2=...,x3=...)
so it has to be hard-coded with **kwargs.
The chance of hitting an ambiguous case increases with the number of dimensions.
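To make the ambiguity concrete, here is a minimal sketch of shape-based matching. The helper name order_axes_by_shape is hypothetical (it is not part of dimarray), and it takes a plain shape tuple so that it runs without numpy. It never relies on the order of the keyword arguments, only on axis lengths, and it raises an error when two axes have the same length.

```python
def order_axes_by_shape(shape, **axes):
    """Return [(name, axis), ...] ordered to match `shape`.

    Hypothetical helper: each keyword axis is matched to a dimension by its
    length; a ValueError is raised when lengths cannot disambiguate.
    """
    if len(axes) != len(shape):
        raise ValueError("expected {} axes, got {}".format(len(shape), len(axes)))
    remaining = dict(axes)
    ordered = []
    for size in shape:
        # all not-yet-assigned axes whose length fits this dimension
        candidates = sorted(n for n, ax in remaining.items() if len(ax) == size)
        if not candidates:
            raise ValueError("no axis of length {}".format(size))
        if len(candidates) > 1:
            raise ValueError("ambiguous axes {}: all have length {}".format(candidates, size))
        name = candidates[0]
        ordered.append((name, remaining.pop(name)))
    return ordered


lat = [-90, 90]
lon = [-180, 0, 180]
print([name for name, _ in order_axes_by_shape((2, 3), lon=lon, lat=lat)])  # ['lat', 'lon']
```

With a square shape such as (2, 2) and two axes of length 2, the same call raises ValueError, which is the ambiguous case described above.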
There are ways around that. For instance, with a signature (values, axes, names, **kwargs) it is possible to do:
a = Dimarray(values, [lat, lon], ["lat","lon"])
but this syntax is cumbersome for interactive use (ipython), since I would like this package to really be a part of my (and others !!) daily use of python, as an actual replacement of numpy arrays in geophysics.
I would be VERY interested in a way around that. The best I can think of right now is to use inspect module's stack method to parse the caller's statement:
import inspect
def f(**kwargs):
    print inspect.stack()[1][4]
    return tuple([kwargs[k] for k in kwargs])
>>> print f(lon=360, lat=180)
[u'print f(lon=360, lat=180)\n']
(180, 360)
>>> print f(lat=180, lon=360)
[u'print f(lat=180, lon=360)\n']
(180, 360)
One could work something out from that, but there are unsolvable issues since stack() catches everything on the line:
>>> print (f(lon=360, lat=180), f(lat=180, lon=360))
[u'print (f(lon=360, lat=180), f(lat=180, lon=360))\n']
[u'print (f(lon=360, lat=180), f(lat=180, lon=360))\n']
((180, 360), (180, 360))
Is there any other inspect trick I am not aware of that could solve this problem? (I am not familiar with this module.) I would imagine that getting the piece of code right between the brackets, lon=360, lat=180, should be feasible, no?
So, for the first time in Python, I have the feeling of hitting a hard wall in terms of doing something that is theoretically feasible based on all available information (the ordering provided by the user IS valuable information!).
I read interesting suggestions by Nick there: https://mail.python.org/pipermail/python-ideas/2011-January/009054.html and was wondering whether this idea has moved forward somehow?
I see why it is not desirable to have an ordered **kwargs in general, but a patch for these rare cases would be neat. Anyone aware of a reliable hack?
NOTE: this is not about pandas. I am actually trying to develop a lightweight alternative to it, whose usage remains very close to numpy. I will soon post the GitHub link.
EDIT: Note that this is relevant for interactive use of dimarray. The dual syntax is needed anyway.
EDIT2: I also see the counter-argument that knowing the data is not ordered could itself be valuable information, since it leaves Dimarray the freedom to check the shape of values and adjust the order automatically. It could even be that not remembering the dimension order of the data occurs more often than having the same size for two dimensions. So right now, I guess it is fine to raise an error for ambiguous cases, asking the user to provide the names argument. Nevertheless, it would be neat to have the freedom to make that kind of choice (how the Dimarray class should behave), instead of being constrained by a missing feature of Python.
EDIT 3, SOLUTIONS: after the suggestion of kazagistar:
I did not mention that there are other optional attribute parameters such as name="" and units="", and a couple of other parameters related to slicing, so the *args construct would need to come with keyword name testing on kwargs.
In summary, there are many possibilities:
*Choice a: keep current syntax
a = Dimarray(values, lon=mylon, lat=mylat, name="myarray")
a = Dimarray(values, [mylat, mylon], ["lat", "lon"], name="myarray")
*Choice b: kazagistar's 2nd suggestion, dropping axis definition via **kwargs
a = Dimarray(values, ("lat", mylat), ("lon",mylon), name="myarray")
*Choice c: kazagistar's 2nd suggestion, with optional axis definition via **kwargs
(note this involves names= to be extracted from **kwargs, see background below)
a = Dimarray(values, lon=mylon, lat=mylat, name="myarray")
a = Dimarray(values, ("lat", mylat), ("lon",mylon), name="myarray")
*Choice d: kazagistar's 3rd suggestion, with optional axis definition via **kwargs
a = Dimarray(values, lon=mylon, lat=mylat, name="myarray")
a = Dimarray(values, [("lat", mylat), ("lon",mylon)], name="myarray")
Hmm, it comes down to aesthetics, and to some design questions (Is lazy ordering an important feature in interactive mode?). I am hesitating between b) and c). I am not sure the **kwargs really brings something. Ironically enough, what I started to criticize became a feature when thinking more about it...
Thanks very much for the answers. I will mark the question as answered, but you are most welcome to vote for a), b) c) or d) !
=====================
EDIT 4 : better solution: choice a) !!, but adding a from_tuples class method. The reason for that is to allow one more degree of freedom: if the axis names are not provided, they will be generated automatically as "x0", "x1", etc. It can then be used just like pandas, but with axis naming. This also avoids mixing up axes and attributes in **kwargs, leaving it only for the axes. There will be more as soon as I am done with the doc.
a = Dimarray(values, lon=mylon, lat=mylat, name="myarray")
a = Dimarray(values, [mylat, mylon], ["lat", "lon"], name="myarray")
a = Dimarray.from_tuples(values, ("lat", mylat), ("lon",mylon), name="myarray")
EDIT 5 : more pythonic solution? Similar to EDIT 4 above in terms of the user API, but via a wrapper function dimarray, while being very strict about how Dimarray is instantiated. This is also in the spirit of what kazagistar proposed.
from dimarray import dimarray, Dimarray
a = dimarray(values, lon=mylon, lat=mylat, name="myarray") # error if lon and lat have same size
b = dimarray(values, [("lat", mylat), ("lon",mylon)], name="myarray")
c = dimarray(values, [mylat, mylon, ...], ['lat','lon',...], name="myarray")
d = dimarray(values, [mylat, mylon, ...], name="myarray2")
And from the class itself:
e = Dimarray.from_dict(values, lon=mylon, lat=mylat) # error if lon and lat have same size
e.set(name="myarray", inplace=True)
f = Dimarray.from_tuples(values, ("lat", mylat), ("lon",mylon), name="myarray")
g = Dimarray.from_list(values, [mylat, mylon, ...], ['lat','lon',...], name="myarray")
h = Dimarray.from_list(values, [mylat, mylon, ...], name="myarray")
In the cases d) and h) axes are automatically named "x0", "x1", and so on, unless mylat, mylon actually belong to the Axis class (which I do not mention in this post, but Axes and Axis do their job, to build axes and deal with indexing).
Explanations:
class Dimarray(object):
    """ ndarray with meaningful dimensions and clean interface
    """
    def __init__(self, values, axes, **kwargs):
        assert isinstance(axes, Axes), "axes must be an instance of Axes"
        self.values = values
        self.axes = axes
        self.__dict__.update(kwargs)

    @classmethod
    def from_tuples(cls, values, *args, **kwargs):
        axes = Axes.from_tuples(*args)
        # pass attributes such as name= on to __init__
        return cls(values, axes, **kwargs)

    @classmethod
    def from_list(cls, values, axes, names=None, **kwargs):
        if names is None:
            names = ["x{}".format(i) for i in range(len(axes))]
        # from_tuples expects (name, axis) pairs
        return cls.from_tuples(values, *zip(names, axes), **kwargs)

    @classmethod
    def from_dict(cls, values, names=None,**kwargs):
        axes = Axes.from_dict(shape=values.shape, names=names, **kwargs)
        # with necessary assert statements in the above
        return cls(values, axes)
Here is the trick (schematically):
def dimarray(values, axes=None, names=None, name=..,units=..., **kwargs):
    """ my wrapper with all fancy options
    """
    if len(kwargs) > 0:
        new = Dimarray.from_dict(values, axes, **kwargs)
    elif isinstance(axes[0], tuple): # list of (name, axis) tuples
        new = Dimarray.from_tuples(values, *axes, **kwargs)
    else:
        new = Dimarray.from_list(values, axes, names=names, **kwargs)
    # reserved attributes
    new.set(name=name, units=units, ..., inplace=True)
    return new
The only thing we lose is indeed the *args syntax, which could not accommodate so many options. But that's fine.
And it makes sub-classing easy, too. How does it sound to the Python experts here?
(this whole discussion could be split in two parts really)
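For readers who want to try the dispatch idea without the real package, here is a minimal self-contained sketch. It is not the actual dimarray code: the Axes class is replaced by a plain list of (name, axis) tuples, the keyword-axes branch (which needs the shape check) is left out, and the `names` property is an assumption added for the demo.

```python
class Dimarray(object):
    """Toy stand-in: values plus an ordered list of (name, axis) pairs."""

    def __init__(self, values, axes, **attrs):
        self.values = values
        self.axes = list(axes)        # ordered [(name, axis), ...]
        self.__dict__.update(attrs)   # e.g. name, units

    @property
    def names(self):
        return [name for name, _ in self.axes]

    @classmethod
    def from_tuples(cls, values, *pairs, **attrs):
        return cls(values, pairs, **attrs)

    @classmethod
    def from_list(cls, values, axes, names=None, **attrs):
        if names is None:
            # automatic naming: "x0", "x1", ...
            names = ["x{}".format(i) for i in range(len(axes))]
        return cls.from_tuples(values, *zip(names, axes), **attrs)


def dimarray(values, axes, names=None, **attrs):
    """Dispatch on the form of `axes`: list of (name, axis) tuples, or plain list."""
    if len(axes) > 0 and isinstance(axes[0], tuple):
        return Dimarray.from_tuples(values, *axes, **attrs)
    return Dimarray.from_list(values, axes, names=names, **attrs)


values = [[0, 1, 2], [3, 4, 5]]
mylat, mylon = [-90, 90], [-180, 0, 180]
b = dimarray(values, [("lat", mylat), ("lon", mylon)], name="myarray")
d = dimarray(values, [mylat, mylon])
print(b.names, b.name)  # ['lat', 'lon'] myarray
print(d.names)          # ['x0', 'x1']
```

One caveat of this sketch: testing isinstance(axes[0], tuple) misfires if an axis is itself given as a tuple of values, so a real implementation would probably need a stricter check.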
=====================
A bit of background (EDIT: in part outdated, for cases a), b), c), d) only), just in case you are interested:
*Choice a involves:
def __init__(self, values, axes=None, names=None, units="",name="",..., **kwargs):
    """ schematic representation of Dimarray's init method
    """
    # automatic ordering according to values' shape (unless names is also provided)
    # the user is allowed to forget about the exact shape of the array
    if len(kwargs) > 0:
        axes = Axes.from_dict(shape=values.shape, names=names, **kwargs)
    # otherwise initialize from list
    # exact ordering + more freedom in axis naming
    else:
        axes = Axes.from_list(axes, names)
    ... # check consistency
    self.values = values
    self.axes = axes
    self.name = name
    self.units = units
*Choices b) and c) impose:
def __init__(self, values, *args, **kwargs):
    ...
b) all attributes are naturally passed via kwargs, with self.__dict__.update(kwargs). This is clean.
c) Need to filter key-word arguments:
def __init__(self, values, *args, **kwargs):
    """ most flexible for interactive use
    """
    # filter out known attributes, falling back to their defaults
    default_attrs = {'name':'', 'units':'', ...}
    for k in default_attrs:
        setattr(self, k, kwargs.pop(k, default_attrs[k]))
    # names= also has to be extracted from kwargs
    names = kwargs.pop('names', None)
    # same as before
    if len(kwargs) > 0:
        axes = Axes.from_dict(shape=values.shape, names=names, **kwargs)
    # same, just unzip
    else:
        names, numpy_axes = zip(*args)
        axes = Axes.from_list(numpy_axes, names)
This is actually quite nice and handy, the only (minor) drawback is that default parameters for name="", units="" and some other more relevant parameters are not accessible by inspection or completion.
*Choice d: clear __init__
def __init__(self, values, axes, name="", units="", ..., **kwaxes):
But it is a bit verbose indeed.
==========
EDIT, FYI: I ended up using a list of tuples for the axes parameter, or alternatively the parameters dims= and labels= for axis name and axis values, respectively. The related project dimarray is on GitHub. Thanks again to kazagistar.
Embrace keyword arguments in Python. Consider using the * operator to require that those arguments be specified as keyword arguments. And remember that you can accept arbitrary keyword arguments in the functions you define, and pass arbitrary keyword arguments to the functions you call, by using the ** operator.
With keyword arguments in python, we can change the order of passing the arguments without any consequences. Let's take a function to divide two numbers, and return the quotient. We can call this function with arguments in any order, as long as we specify which value goes into what.
When you use keyword arguments in a function call, the caller identifies the arguments by the parameter name. This allows you to skip arguments or place them out of order because the Python interpreter is able to use the keywords provided to match the values with parameters.
Assign keyword arguments to remaining parameters in any order. If one of the keyword arguments matches an argument already assigned positionally (or the same keyword argument is passed twice), it's an error.
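The rules above can be illustrated with a short sketch in Python 3 syntax (the bare * in a signature is not available in Python 2). The divide function and its rounding parameter are made up for this example.

```python
def divide(numerator, denominator, *, rounding=None):
    """Divide two numbers; `rounding` can only be passed by keyword."""
    quotient = numerator / denominator
    return quotient if rounding is None else round(quotient, rounding)


# keyword arguments can be given in any order
print(divide(denominator=4, numerator=1))  # 0.25
print(divide(1, 3, rounding=2))            # 0.33

try:
    divide(1, 3, 2)            # rounding passed positionally
except TypeError as exc:
    print("TypeError:", exc)

try:
    divide(1, numerator=2)     # numerator assigned twice
except TypeError as exc:
    print("TypeError:", exc)
```

Both failing calls raise TypeError: the first because parameters after the bare * are keyword-only, the second because numerator is assigned both positionally and by keyword.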
No, you cannot know the order in which items were added to a plain dictionary (at least in Python versions before 3.6, where dict and **kwargs are unordered), since doing this increases the complexity of implementing the dictionary significantly. (For when you really, really need this, collections.OrderedDict has you covered.)
However, have you considered some basic alternative syntax? For example:
a = Dimarray(values, 'lat', lat, 'lon', lon)
or (probably the best option)
a = Dimarray(values, ('lat', lat), ('lon', lon))
or (most explicit)
a = Dimarray(values, [('lat', lat), ('lon', lon)])
At some level though, arguments that need ordering are inherently positional. **kwargs is often abused for labeling, but argument names generally shouldn't be "data", since they are a pain to set programmatically. Just make the two associated parts of the data clear with a tuple, use a list to preserve the ordering, and provide strong assertions plus error messages to make it clear when the input is invalid and why.
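As a rough illustration of the "strong assertions + error messages" advice, here is a minimal sketch of a validator for the list-of-tuples form. The function name check_axes and the exact messages are assumptions for this example, not part of any library, and it is written for Python 3 (where names are str).

```python
def check_axes(axes, shape):
    """Validate a list of (name, values) tuples against an array shape.

    Hypothetical helper: returns the axis names in order, or raises with a
    message that says which entry is wrong and why.
    """
    if len(axes) != len(shape):
        raise ValueError("got {} axes for an array with {} dimensions"
                         .format(len(axes), len(shape)))
    names = []
    for i, item in enumerate(axes):
        if not (isinstance(item, tuple) and len(item) == 2):
            raise TypeError("axes[{}] must be a (name, values) tuple, got {!r}"
                            .format(i, item))
        name, values = item
        if not isinstance(name, str):
            raise TypeError("axes[{}]: name must be a string, got {!r}".format(i, name))
        if name in names:
            raise ValueError("duplicate axis name {!r}".format(name))
        if len(values) != shape[i]:
            raise ValueError("axis {!r} has length {} but dimension {} has size {}"
                             .format(name, len(values), i, shape[i]))
        names.append(name)
    return names


lat, lon = [-90, 0, 90], [-180, 0, 180]
print(check_axes([("lat", lat), ("lon", lon)], (3, 3)))  # ['lat', 'lon']
```

Because the order is given explicitly by the list, a square array is no longer ambiguous, and a swapped or mistyped entry fails early with a specific message.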
There is a module made especially to handle this:
https://github.com/claylabs/ordered-keyword-args
def multiple_kwarguments(first , **lotsofothers):
    print first
    for i,other in lotsofothers.items():
        print other
    return True
multiple_kwarguments("first", second="second", third="third" ,fourth="fourth" ,fifth="fifth")
output:
first
second
fifth
fourth
third
from orderedkwargs import orderedkwargs
@orderedkwargs
def multiple_kwarguments(first , *lotsofothers):
    print first
    for i, other in lotsofothers:
        print other
    return True
multiple_kwarguments("first", second="second", third="third" ,fourth="fourth" ,fifth="fifth")
Output:
first
second
third
fourth
fifth
Note: A single asterisk is required in the function signature when using this module's decorator above the function.
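On Python 3.6 or later, the same ordering is available without any decorator, because PEP 468 guarantees that **kwargs preserves the order in which keyword arguments were passed. A minimal sketch of the example above in Python 3 syntax (returning a list instead of printing, so the order is easy to inspect):

```python
def multiple_kwarguments(first, **lotsofothers):
    # On Python 3.6+, lotsofothers iterates in the order used at the call site.
    return [first] + list(lotsofothers.values())


def kwarg_names(**kwargs):
    # The keys come back in call order as well.
    return list(kwargs)


print(multiple_kwarguments("first", second="second", third="third",
                           fourth="fourth", fifth="fifth"))
# ['first', 'second', 'third', 'fourth', 'fifth']
print(kwarg_names(lon=360, lat=180))  # ['lon', 'lat']
print(kwarg_names(lat=180, lon=360))  # ['lat', 'lon']
```

On older interpreters (Python 2, or Python 3 before 3.6) this order is not guaranteed, which is the situation the original question describes.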