What is the accepted Python style for data objects? [duplicate]

Question

What is the normal style for data objects in Python? Let's say I have a method that gets a customer from somewhere (net, DB, ...). What type of object should I return? I see several choices:

a tuple
a dictionary
a class instance (are data classes 'normal')

I am sure there are others. This is my first big Python project, so I would like to use best practices from the start.

Wow, surprised at the negative reaction to the question. Maybe I was not being clear.

I have lots of different data items I want to pass around my code: user, product, customer, order, ... (in fact they are nothing like that, but it's simpler with obvious types of thing). So I have

def get_user():
  return x

What should x be? An instance of a class called user, a dict, a tuple...?

There are no methods associated with the objects; they are pure data.

It seems like namedtuples are the way to go.

Edit: how about this as a style?

class product:
   pass

...

def get_product():
   # ... db read stuff
   pr = product()
   pr.name = dbthing[0]
   pr.price = dbthing[1]
   return pr

Is that barf-inducing, or well-established style, or odd, or what? It works. From the consumer side it makes for readable code:

def xxx():
  pr = get_product()
  total = amount * pr.price

Steve Jessop · Accepted Answer

For simple data records you should generally think about collections.namedtuple first, and use one of the other options if that's not suitable for any reason. Once you know why it's not suitable that generally suggests which of the others to use. You can think of collections.namedtuple as being a shortcut for quickly defining immutable "data classes".

Taking a database as an example, if you're using an ORM then even the simplest records will be represented as objects[*], and with good reason: they will all have in common some actions you can perform on them, such as storing changes back to the database. The tutorial/documentation for your ORM will guide you. If you're using the Python DB-API directly then rows from SQL queries will come back (initially) as tuples, but of course you can do what you like with those once you have them. The database connector can also provide ways to manipulate rows before the cursor returns them to you, for example by setting row_factory in sqlite3.
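As a rough sketch of the sqlite3 row factory idea, the following turns each row into a namedtuple as it is fetched. The Product type, the product table and the product_factory function are made up for illustration; only connect, execute, fetchone and the row_factory attribute come from sqlite3 itself.

```python
import sqlite3
from collections import namedtuple

# Hypothetical record type and table, used only for illustration.
Product = namedtuple("Product", "name price")

def product_factory(cursor, row):
    # sqlite3 calls the row factory with the cursor and the raw row tuple.
    return Product(*row)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (name TEXT, price REAL)")
conn.execute("INSERT INTO product VALUES (?, ?)", ("widget", 2.5))

# Cursors created after this point build rows with product_factory.
conn.row_factory = product_factory

row = conn.execute("SELECT name, price FROM product").fetchone()
print(row.name, row.price)
conn.close()
```

Without the row factory, the same query would return a plain tuple and the caller would have to index it by position.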

Taking "the net" as an example -- well, there are many means of data interchange, but one common example is accessing an API that returns JSON data. In that case there's not much choice but to initially represent this data in your program the same way that it was structured as JSON: as lists and dictionaries containing lists, dictionaries, strings and numbers. Again, you can do what you like with this once you have it. It's fairly normal to work with it as it is, and it's also fairly normal to restructure it into something else straight away. Both personal preference and the particular circumstances affect which you actually choose.
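To make the "restructure it straight away" option concrete, here is a minimal sketch. The payload and the Customer type are invented for the example; a real API response would have its own shape.

```python
import json
from collections import namedtuple

# Hypothetical record type for the example.
Customer = namedtuple("Customer", "name email")

# Stand-in for the body of an API response.
payload = '{"name": "Ann", "email": "ann@example.com"}'

# json.loads gives back plain dicts, lists, strings and numbers.
raw = json.loads(payload)

# Option 1: keep working with the dict as it is.
print(raw["name"])

# Option 2: convert to a fixed-shape record right away.
customer = Customer(name=raw["name"], email=raw["email"])
print(customer.email)
```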

You should certainly think of all of these things as available options. Generally speaking you would use:

a tuple when each position has its own meaning. That's why Python returns "multiple values" from a function by returning a tuple, because the different things might be completely different types with different meaning.
a dictionary when the available keys vary by record.
a data class when the keys are the same for every record. You can use namedtuple for immutable records.
a list or tuple when the order is important but the positions are all equivalent. Observe that there's a little tension here between "tuples are immutable, lists are mutable" vs. "tuples are for heterogeneous data and lists are for homogeneous data". Personally I lean towards the former but I've seen sensible arguments for the latter, so if you're asking how people in general make the choice you can't ignore that.
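The four cases in the list above might look like this in practice. All names here (min_max, the preference keys, Order) are illustrative only.

```python
from collections import namedtuple

# Tuple: each position has its own meaning (lowest, highest).
def min_max(values):
    return min(values), max(values)

low, high = min_max([3, 1, 2])

# Dictionary: the available keys vary from record to record.
user_prefs = {"theme": "dark"}
other_prefs = {"theme": "light", "font_size": 12}

# Data class / namedtuple: every record has the same fields.
Order = namedtuple("Order", "order_id total")
order = Order(order_id=1, total=9.99)

# List: order matters, but every position holds the same kind of thing.
order_ids = [1, 2, 3]
```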

[*] well, tuples and dictionaries are objects too of course, I mean objects other than these data structures ;-)

RemcoGerlich · Answer

I interpret "data object" as an immutable object usually with a few fields.

One option you see a lot is to just use standard dictionaries with the fields as keys. But personally I don't like that: as software grows bigger, it can be hard to see exactly what keys exist and where they come from. People start writing functions that add new keys to existing dictionaries, and it all turns into a mess.

Your empty product class looks a bit bizarre to me -- if you're going to do that, pass the values into the constructor and let that set the attributes. Then it's about the most normal way to do it -- a simple class with some attributes and nothing else.
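A minimal sketch of that suggestion, with the values passed into the constructor. The row argument stands in for the question's dbthing and is an assumption of this example:

```python
class Product:
    def __init__(self, name, price):
        # The constructor sets every attribute, so the fields are
        # visible in one place.
        self.name = name
        self.price = price

def get_product(row):
    # `row` is a stand-in for whatever the database read returns.
    return Product(row[0], row[1])

pr = get_product(("widget", 2.5))
total = 4 * pr.price
```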

But namedtuples are cooler because they're immutable, so as you read the code you don't have to worry that some field changes somewhere:

from collections import namedtuple

Product = namedtuple('Product', 'name price')

p = Product("some product", 10)

But now you want to add functionality to it, say a __unicode__ method (__str__ in Python 3) that returns a description of the product and its price. You can now turn it into a normal class again, with the constructor taking these same arguments. But you can also subclass a namedtuple:

class Product(namedtuple('Product', 'name price')):
    def __unicode__(self):
        return "{} (${})".format(self.name, self.price)

And it's still immutable. That's what I do when I need a pure data object. If you ever need it to become a mutable class, make one of the attributes something mutable, or turn it into a normal class with the same interface after all.
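A small check of the immutability claim, written for Python 3 (so with __str__ rather than __unicode__). The empty __slots__ is an addition of this sketch, not part of the answer's code; it stops instances of the subclass from getting a per-instance __dict__.

```python
from collections import namedtuple

class Product(namedtuple("Product", "name price")):
    __slots__ = ()  # assumption of this sketch: no per-instance __dict__

    def __str__(self):
        return "{} (${})".format(self.name, self.price)

p = Product("some product", 10)

# Assigning to a field fails, because namedtuple fields are read-only.
try:
    p.price = 12
except AttributeError:
    changed = False
else:
    changed = True

# _replace builds a new record instead of modifying the old one.
discounted = p._replace(price=8)
print(p, discounted)
```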
