
How to make "keyword-only" fields with dataclasses?

Since Python 3.0 there is support for making an argument keyword-only:

class S3Obj:
    def __init__(self, bucket, key, *, storage_class='Standard'):
        self.bucket = bucket
        self.key = key
        self.storage_class = storage_class

How to get that kind of signature using dataclasses? Something like this, but preferably without the SyntaxError:

@dataclass
class S3Obj:
    bucket: str
    key: str
    *
    storage_class: str = 'Standard'

Ideally declarative, but using the __post_init__ hook and/or a replacement class decorator is fine too - as long as the code is reusable.

Edit: maybe something like this syntax, using an ellipsis literal

@mydataclass
class S3Obj:
    bucket: str
    key: str
    ...
    storage_class: str = 'Standard'
Asked by wim, Apr 18 '18 at 20:04





2 Answers

Update: as of Python 3.10, there's a new dataclasses.KW_ONLY sentinel that works like this:

@dataclasses.dataclass
class Example:
    a: int
    b: int
    _: dataclasses.KW_ONLY
    c: int
    d: int

Any fields after the KW_ONLY pseudo-field are keyword-only.

There's also a kw_only parameter to the dataclasses.dataclass decorator, which makes all fields keyword-only:

@dataclasses.dataclass(kw_only=True)
class Example:
    a: int
    b: int

It's also possible to pass kw_only=True to dataclasses.field to mark individual fields as keyword-only.

If keyword-only fields come after non-keyword-only fields (possible with inheritance, or by individually marking fields keyword-only), keyword-only fields will be reordered after other fields, specifically for the purpose of __init__. Other dataclass functionality will keep the declared order. This reordering is confusing and should probably be avoided.


Pre-Python 3.10 answer:

You're not going to get much help from dataclasses when doing this. There's no way to say that a field should be initialized by keyword-only argument, and the __post_init__ hook doesn't know whether the original constructor arguments were passed by keyword. Also, there's no good way to introspect InitVars, let alone mark InitVars as keyword-only.

At minimum, you'll have to replace the generated __init__. Probably the simplest way is to just define __init__ by hand. If you don't want to do that, probably the most robust way is to create field objects and mark them kwonly in the metadata, then inspect the metadata in your own decorator. This is even more complicated than it sounds:

import dataclasses
import functools
import inspect

# Helper to make calling field() less verbose
def kwonly(default=dataclasses.MISSING, **kwargs):
    kwargs.setdefault('metadata', {})
    kwargs['metadata']['kwonly'] = True
    return dataclasses.field(default=default, **kwargs)

def mydataclass(_cls=None, *, init=True, **kwargs):
    if _cls is None:
        # Called with arguments, e.g. @mydataclass(frozen=True).
        return functools.partial(mydataclass, init=init, **kwargs)

    no_generated_init = (not init or '__init__' in _cls.__dict__)
    _cls = dataclasses.dataclass(_cls, init=init, **kwargs)
    if no_generated_init:
        # No generated __init__. The user will have to provide __init__,
        # and they probably already have. We assume their __init__ does
        # what they want.
        return _cls

    fields = dataclasses.fields(_cls)
    if any(field.metadata.get('kwonly') and not field.init for field in fields):
        raise TypeError('Non-init field marked kwonly')

    # From this point on, ignore non-init fields - but we don't know
    # about InitVars yet.
    init_fields = [field for field in fields if field.init]
    for i, field in enumerate(init_fields):
        if field.metadata.get('kwonly'):
            first_kwonly = field.name
            num_kwonly = len(init_fields) - i
            break
    else:
        # No kwonly fields. Why were we called? Assume there was a reason.
        return _cls

    if not all(field.metadata.get('kwonly') for field in init_fields[-num_kwonly:]):
        raise TypeError('non-kwonly init fields following kwonly fields')

    required_kwonly = [field.name for field in init_fields[-num_kwonly:]
                       if field.default is field.default_factory is dataclasses.MISSING]

    original_init = _cls.__init__

    # Time to handle InitVars. This is going to get ugly.
    # InitVars don't show up in fields(). They show up in __annotations__,
    # but the current dataclasses implementation doesn't understand string
    # annotations, and we want an implementation that's robust against
    # changes in string annotation handling.
    # We could inspect __post_init__, except there doesn't have to be a
    # __post_init__. (It'd be weird to use InitVars with no __post_init__,
    # but it's allowed.)
    # As far as I can tell, that leaves inspecting __init__ parameters as
    # the only option.

    init_params = tuple(inspect.signature(original_init).parameters)
    if init_params[-num_kwonly] != first_kwonly:
        # InitVars following kwonly fields. We could adopt a convention like
        # "InitVars after kwonly are kwonly" - in fact, we could have adopted
        # "all fields after kwonly are kwonly" too - but it seems too likely
        # to cause confusion with inheritance.
        raise TypeError('InitVars after kwonly fields.')
    # -1 to exclude self from this count.
    max_positional = len(init_params) - num_kwonly - 1

    @functools.wraps(original_init)
    def __init__(self, *args, **kwargs):
        if len(args) > max_positional:
            raise TypeError('Too many positional arguments')
        check_required_kwargs(kwargs, required_kwonly)
        return original_init(self, *args, **kwargs)
    _cls.__init__ = __init__

    return _cls

def check_required_kwargs(kwargs, required):
    # Not strictly necessary, but if we don't do this, error messages for
    # required kwonly args will list them as positional instead of
    # keyword-only.
    missing = [name for name in required if name not in kwargs]
    if not missing:
        return
    # We don't bother to exactly match the built-in logic's exception
    raise TypeError(f"__init__ missing required keyword-only argument(s): {missing}")

Usage example:

@mydataclass
class S3Obj:
    bucket: str
    key: str
    storage_class: str = kwonly('Standard')

This is somewhat tested, but not as thoroughly as I would like.
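For comparison, the "define __init__ by hand" option mentioned above could look roughly like this. It is only a sketch: it relies on dataclass leaving an `__init__` that is already defined in the class body untouched, and it means the defaults have to be repeated in two places.

```python
from dataclasses import dataclass

@dataclass
class S3Obj:
    bucket: str
    key: str
    storage_class: str = 'Standard'

    # dataclass does not replace an __init__ defined in the class body,
    # but it still generates __repr__ and __eq__ from the fields above.
    def __init__(self, bucket, key, *, storage_class='Standard'):
        self.bucket = bucket
        self.key = key
        self.storage_class = storage_class

obj = S3Obj('my-bucket', 'path/to/key', storage_class='Glacier')
```

This works on any Python version that has dataclasses, at the cost of writing the signature yourself.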


You can't get the syntax you propose with ..., because ... doesn't do anything a metaclass or decorator can see. You can get something pretty close with something that actually triggers name lookup or assignment, like kwonly_start = True, so a metaclass can see it happen. However, a robust implementation of this is complicated to write, because there are a lot of things that need dedicated handling. Inheritance, typing.ClassVar, dataclasses.InitVar, forward references in annotations, etc. will all cause problems if not handled carefully. Inheritance probably causes the most problems.

A proof-of-concept that doesn't handle all the fiddly bits might look like this:

# Does not handle inheritance, InitVar, ClassVar, or anything else
# I'm forgetting.

import functools
from dataclasses import dataclass

class POCMetaDict(dict):
    def __setitem__(self, key, item):
        # __setitem__ instead of __getitem__ because __getitem__ is
        # easier to trigger by accident.
        if key == 'kwonly_start':
            # Assumes __annotations__ is filled in as the class body runs.
            self['__non_kwonly'] = len(self.get('__annotations__', {}))
        super().__setitem__(key, item)

class POCMeta(type):
    @classmethod
    def __prepare__(cls, name, bases, **kwargs):
        return POCMetaDict()
    def __new__(cls, name, bases, classdict, **kwargs):
        # Default to None so a class without a kwonly_start marker still works.
        classdict.pop('kwonly_start', None)
        non_kwonly = classdict.pop('__non_kwonly', None)

        newcls = super().__new__(cls, name, bases, classdict, **kwargs)
        newcls = dataclass(newcls)

        if non_kwonly is None:
            return newcls

        original_init = newcls.__init__

        @functools.wraps(original_init)
        def __init__(self, *args, **kwargs):
            if len(args) > non_kwonly:
                raise TypeError('Too many positional arguments')
            return original_init(self, *args, **kwargs)

        newcls.__init__ = __init__
        return newcls

You'd use it like

class S3Obj(metaclass=POCMeta):
    bucket: str
    key: str

    kwonly_start = True

    storage_class: str = 'Standard'

This is untested.

Answered by user2357112 supports Monica, Sep 27 '22 at 21:09



I wonder why this is not part of the dataclass API; it seems important to me.

If all arguments are keyword arguments, maybe it's a bit simpler, and the following could suffice?

from dataclasses import dataclass
from functools import wraps

def kwargs_only(cls):

    @wraps(cls)
    def call(**kwargs):
        return cls(**kwargs)

    return call

@kwargs_only
@dataclass
class Coordinates:
    latitude: float = 0
    longitude: float = 0

That's not perfect because the error when using positional argument refers to call:

--------------------------------------------------------
TypeError              Traceback (most recent call last)
<ipython-input-24-fb588c816ecf> in <module>
----> 1 c = Coordinates(1, longitude=2)
      2 help(c)

TypeError: call() takes 0 positional arguments but 1 was given

Similarly, the dataclass's constructor documentation is out of date and doesn't reflect the new constraint.

If there are only some keyword fields, maybe this?

def kwargs(*keywords):

    def decorator(cls):
        @wraps(cls)
        def call(*args, **kwargs):
            if any(kw not in kwargs for kw in keywords):
                raise TypeError(f"{cls.__name__}.__init__() requires {keywords} as keyword arguments")
            return cls(*args, **kwargs)

        return call

    return decorator


@kwargs('longitude')
@dataclass(frozen=True)
class Coordinates:
    latitude: float
    longitude: float = 0
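Note that the decorator above requires longitude to be passed on every call, even though it has a default, and that Coordinates becomes a function rather than a class. One possible variation, sketched here under assumptions, wraps `__init__` in place and caps the number of positional arguments instead. `keyword_only` is a hypothetical name, at least one field name must be given, and InitVar fields are not handled.

```python
import functools
from dataclasses import dataclass, fields

def keyword_only(*names):
    """Hypothetical decorator: forbid passing the named fields positionally."""
    def decorator(cls):
        original_init = cls.__init__
        init_names = [f.name for f in fields(cls) if f.init]
        # Positional arguments may only fill fields declared before the
        # first keyword-only one.
        max_positional = min(init_names.index(name) for name in names)

        @functools.wraps(original_init)
        def __init__(self, *args, **kwargs):
            if len(args) > max_positional:
                raise TypeError(
                    f"{cls.__name__}() takes at most {max_positional} "
                    f"positional argument(s)"
                )
            original_init(self, *args, **kwargs)

        cls.__init__ = __init__
        return cls  # still a class, so isinstance() keeps working
    return decorator

@keyword_only('longitude')
@dataclass(frozen=True)
class Coordinates:
    latitude: float
    longitude: float = 0

a = Coordinates(1.0)                 # longitude falls back to its default
b = Coordinates(1.0, longitude=2.0)  # keyword use is accepted
```

As with the original, `help()` still shows the signature dataclass generated, so the documentation does not reflect the keyword-only constraint.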
Answered by cglacet, Sep 27 '22 at 21:09
