Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Lists are for homogeneous data and tuples are for heterogeneous data... why?

Tags:

python

I feel like this must have been asked before (probably more than once), so potential apologies in advance, but I can't find it anywhere (here or through Google).

Anyway, when explaining the difference between lists and tuples in Python, the second thing mentioned, after tuples being immutable, is that lists are best for homogeneous data and tuples are best for heterogeneous data. But nobody seems to think to explain why that's the case. So why is that the case?

like image 888
MQDuck Avatar asked Jul 20 '14 19:07

MQDuck


People also ask

Is tuple is homogeneous or heterogeneous?

Tuples are immutable, and usually contain a heterogeneous sequence of elements that are accessed via unpacking or indexing (or even by attribute in the case of namedtuples ). Lists are mutable, and their elements are usually homogeneous and are accessed by iterating over the list.

Is list is homogeneous or heterogeneous?

List is one of the most commonly used built-in data structures, for storing a collection of values. List can contain heterogeneous values such as integers, floats, strings, tuples, lists, and dictionaries but they are commonly used to store collections of homogeneous objects. Python lists are mutable sequences.

Why lists are heterogeneous in Python?

The items stored can be of any type numeric, string, boolean, objects, etc which makes it heterogeneous. This means that a list can have any type of data and we can iterate over this list using any type of loop.

What is the difference between homogeneous data and heterogeneous data?

Homogeneous data structures are ones that can only store a single type of data (numeric, integer, character, etc.). Heterogeneous data structures are ones that can store more than one type of data at the same time.


3 Answers

First of all, that guideline is only sort of true. You're free to use tuples for homogenous data and lists for heterogenous data, and there may be cases where that's a fine thing to do. One important case is if you need the collection to the hashable so you can use it as a dictionary key; in this case you must use a tuple, even if all the elements are homogenous in nature.

Also note that the homogenous/heterogenous distinction is really about the semantics of the data, not just the types. A sequence of a name, occupation, and address would probably be considered heterogenous, even though all three might be represented as strings. So it's more important to think about what you're going to do with the data (i.e., will you actually treat the elements the same) than about what types they are.

That said, I think one reason lists are preferred for homogenous data is because they're mutable. If you have a list of several things of the same kind, it may make sense to add another one to the list, or take one away; when you do that, you're still left with a list of things of the same kind.

By contrast, if you have a collection of things of heterogenous kinds, it's usually because you have a fixed structure or "schema" to them (e.g., the first one is an ID number, the second is a name, the third is an address, or whatever). In this case, it doesn't make sense to add or remove an element from the collection, because the collection is an integrated whole with specified roles for each element. You can't add an element without changing your whole schema for what the elements represent.

In short, changes in size are more natural for homogenous collections than for heterogenous collections, so mutable types are more natural for homogenous collections.

like image 88
BrenBarn Avatar answered Oct 07 '22 05:10

BrenBarn


The difference is philosophical more than anything.

A tuple is meant to be a shorthand for fixed and predetermined data meanings. For example:

person = ("John", "Doe")

So, this example is a person, who has a first name and last name. The fixed nature of this is the critical factor. Not the data type. Both "John" and "Doe" are strings, but that is not the point. The benefit of this is unchangeable nature:

  1. You are never surprised to find a value missing. person always has two values. Always.

  2. You are never surprised to find something added. Unlike a dictionary, another bit of code can't "add a new key" or attribute

This predictability is called immutability It is just a fancy way of saying it has a fixed structure.

One of the direct benefits is that it can be used as a dictionary key. So:

some_dict = {person: "blah blah"}

works. But:

da_list = ["Larry", "Smith"]
some_dict = {da_list: "blah blah"}

does not work.

Don't let the fact that element reference is similar (person[0] vs da_list[0]) throw you off. person[0] is a first name. da_list[0] is simply the first item in a list at this moment in time.

like image 28
JohnAD Avatar answered Oct 07 '22 05:10

JohnAD


It's not a rule, it's just a tradition.

In many languages, lists must be homogenous and tuples must be fixed-length. This is true of C++, C#, Haskell, Rust, etc. Tuples are used as anonymous structures. It is the same way in mathematics.

Python's type system, however, does not allow you to make these distinctions: you can make tuples of dynamic length, and you can make lists with heterogeneous data. So you are allowed to do whatever you want with lists and tuples in Python, it just might be surprising to other people reading your code. This is especially true if the people reading your code have a background in mathematics or are more familiar with other languages.

like image 4
Dietrich Epp Avatar answered Oct 07 '22 03:10

Dietrich Epp