I am going to build a program (in Scala or Python; I have not yet decided) that is intensive on data manipulation. I see two major approaches:

1. Write standalone functions that take data structures as arguments and return new ones.
2. Create classes that hold the data and provide methods to operate on it.
I am not sure, but the first approach seems more like functional programming and the second more like OOP; is that right? By the way, I love both functional programming and OOP (some say they are opposites of each other, but Odersky tried his best to disprove that with Scala).
I prefer the second approach.
However, I worry that because I have a lot of data (and I do), memory consumption will be high, since the methods might have to be instantiated once for every object.
Which leads me to my question: which approach should I choose?
Here is a crude DataObject class (the element type and method body are elided):

```scala
class DataObject {
  val datavalues: List[...] = ...

  def mymethod(): Unit = {
    ...
  }
}
```
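To make the comparison concrete, here is a minimal Python sketch of both approaches; the `normalize` operation is just a made-up placeholder:

```python
# Approach 1: free functions operating on plain data structures.
def normalize(values):
    """Return a new list scaled so the values sum to 1."""
    total = sum(values)
    return [v / total for v in values]


# Approach 2: a class bundling the data with its operations.
class DataObject:
    def __init__(self, values):
        self.values = list(values)

    def normalize(self):
        """Return a new DataObject scaled so the values sum to 1."""
        total = sum(self.values)
        return DataObject(v / total for v in self.values)


print(normalize([1, 3]))                      # [0.25, 0.75]
print(DataObject([1, 3]).normalize().values)  # [0.25, 0.75]
```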
Which approach is best depends entirely on your problem. If you have only a few operations, functions are simpler. If you have many operations that depend on the type or features of the data, classes are the better fit.
Personally, I prefer having classes for the same type of data to improve abstraction and modularity. Basically, using classes requires you to think about what your data is like, what is allowed on it and what is appropriate. It enforces that you separate, compartmentalize and understand what you are doing. Once you've done that, you can treat them like black boxes that just work.
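As an illustration of that enforcement, a class can validate its data once in the constructor, so every method afterwards can rely on the invariant. The `Measurements` class and its rules here are made up for the sketch:

```python
class Measurements:
    """Holds a non-empty list of non-negative readings."""

    def __init__(self, values):
        values = [float(v) for v in values]
        # Validate once, up front: every method can now trust the data.
        if not values:
            raise ValueError("need at least one reading")
        if any(v < 0 for v in values):
            raise ValueError("readings must be non-negative")
        self._values = values

    def mean(self):
        # Safe: the constructor guarantees the list is non-empty.
        return sum(self._values) / len(self._values)


m = Measurements([2, 4, 6])
print(m.mean())  # 4.0
```

With free functions on raw lists, each function would have to re-check (or silently assume) those preconditions.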
I've seen many data-analysis programs fail because they just had functions working on arbitrary data. At first, it was simple computations. Then state needed to be preserved/cached, so data got appended to or modified directly. Then someone realized that if you did x before, you shouldn't do y later, so all sorts of flags, fields and other things got tacked on, which only functions a, b and d understood. Then someone added function f which extended that, while someone else added function k which extended it differently. That creates a cluster-foo that's impossible to understand, maintain, or trust to produce correct results.
So if you are unsure, do classes. You'll be happier in the end.
Concerning your second question, I can only answer for Python. However, many languages handle it similarly.
Regular methods in Python are defined on the class and created together with it. That means the actual function behind a method is shared by all instances, with no per-instance memory overhead. A bare instance is essentially just a wrapped reference to its class, from which methods are fetched. Only things exclusive to an instance, such as its data, add notably to memory.
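You can verify the sharing directly in CPython: the bound methods created on attribute access all wrap the same class-level function object, and instances hold no copy of the method in their own `__dict__`:

```python
class Foo:
    def p(self):
        pass


a, b = Foo(), Foo()

# Both instances share the single function object defined on the class.
assert a.p.__func__ is b.p.__func__ is Foo.p

# The instances themselves store no copy of the method.
assert 'p' not in a.__dict__
```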
Calling a method does add some overhead, because the method gets bound to the instance: the function is fetched from the class and its first parameter, self, gets bound. Creating that bound method on every call incurs a small cost.
```shell
# Method call
$ python -m timeit -s 'class Foo():' -s ' def p(self):' -s '  pass' -s 'foo = Foo()' 'foo.p()'
10000000 loops, best of 3: 0.158 usec per loop

# Method call of cached (pre-bound) method
$ python -m timeit -s 'class Foo():' -s ' def p(self):' -s '  pass' -s 'foo = Foo()' -s 'p=foo.p' 'p()'
10000000 loops, best of 3: 0.0984 usec per loop

# Function call
$ python -m timeit -s 'def p():' -s ' pass' 'p()'
10000000 loops, best of 3: 0.0846 usec per loop
```
However, practically every operation does this; you'll only notice the added overhead if your application does nothing but call your method, and the method itself does nothing.
I've also seen people write data-analysis applications with so many levels of abstraction that, in effect, they mostly just called methods/functions. That is a smell of how the code is written in general, though, not an argument about methods versus functions.
So if you are unsure, do classes. You'll be happier in the end.