Merging dataframes on an index is more efficient in Pandas

Question

Why is merging dataframes in Pandas on an index more efficient (faster) than on a column?

import pandas as pd

# Dataframes share the ID column
df = pd.DataFrame({'ID': [0, 1, 2, 3, 4],
                   'Job': ['teacher', 'scientist', 'manager', 'teacher', 'nurse']})

df2 = pd.DataFrame({'ID': [2, 3, 4, 5, 6, 7, 8],
                    'Level': [12, 15, 14, 20, 21, 11, 15], 
                    'Age': [33, 41, 42, 50, 45, 28, 32]})

[Image not preserved: timing of the merge on the ID column]

df = df.set_index('ID')
df2 = df2.set_index('ID')

[Image not preserved: timing of the merge on the index]

Merging on the index is about 3.5 times faster than merging on the column! (Using Pandas 0.23.0)
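The timing screenshots did not survive in this copy, so the exact calls are not shown. As a rough sketch of the comparison, assuming the column merge was `df.merge(df2, on='ID')` and the index merge used `left_index=True, right_index=True`, something like the following could reproduce it. The names `df_i` and `df2_i` and the repeat count of 200 are illustrative, and the measured ratio will vary with machine, data size, and Pandas version.

```python
import timeit

import pandas as pd

df = pd.DataFrame({'ID': [0, 1, 2, 3, 4],
                   'Job': ['teacher', 'scientist', 'manager', 'teacher', 'nurse']})
df2 = pd.DataFrame({'ID': [2, 3, 4, 5, 6, 7, 8],
                    'Level': [12, 15, 14, 20, 21, 11, 15],
                    'Age': [33, 41, 42, 50, 45, 28, 32]})

# Merge on the shared column (inner join by default).
on_column = df.merge(df2, on='ID')

# Merge on the index after moving 'ID' into it (assumed form of the second test).
df_i = df.set_index('ID')
df2_i = df2.set_index('ID')
on_index = df_i.merge(df2_i, left_index=True, right_index=True)

# Both approaches should return the same rows.
same_rows = on_column.set_index('ID').equals(on_index)

# Rough timings in seconds for 200 repetitions of each merge.
t_column = timeit.timeit(lambda: df.merge(df2, on='ID'), number=200)
t_index = timeit.timeit(
    lambda: df_i.merge(df2_i, left_index=True, right_index=True), number=200)

print(same_rows, t_column, t_index)
```

With frames this small, most of the measured time is fixed overhead, so it is worth repeating the test on larger data before drawing conclusions.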

The Pandas internals page says that an Index "Populates a dict of label to location in Cython to do O(1) lookups." Does this mean that operations on an index are more efficient than operations on columns? Is it best practice to always use the index for operations such as merges?
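For reference, the label-to-location lookup that sentence describes is exposed through `Index.get_loc`. A minimal sketch, reusing the second dataframe from above: the snippet only shows the lookup itself, and does not by itself demonstrate how the lookup is implemented internally.

```python
import pandas as pd

df2 = pd.DataFrame({'ID': [2, 3, 4, 5, 6, 7, 8],
                    'Level': [12, 15, 14, 20, 21, 11, 15],
                    'Age': [33, 41, 42, 50, 45, 28, 32]}).set_index('ID')

# Map the label 5 to its integer position in the index.
pos = df2.index.get_loc(5)

# Use that position to fetch the row; equivalent to df2.loc[5].
row = df2.iloc[pos]

print(pos, row['Level'], row['Age'])
```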

I read through the documentation for joining and merging and it doesn't explicitly mention any benefits to using the index.

ntg · Accepted Answer

The reason for this is that the DataFrame's index is backed by a hash table.

To merge two sets, we need to find, for each element of the first, the corresponding element in the second (if it exists). Searching is significantly faster when backed by a hash table: a lookup in an unsorted list is O(N), while a lookup in a hash table is ~O(1).
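To make the difference concrete, here is a small pure-Python sketch of the two lookup styles, using the ID values from the question. The function names `match_linear` and `match_hashed` are made up for illustration; this is not how Pandas implements merging.

```python
left_ids = [0, 1, 2, 3, 4]
right_ids = [2, 3, 4, 5, 6, 7, 8]


def match_linear(left, right):
    """Find each left value in right by scanning the list: O(N) per lookup."""
    matches = []
    for value in left:
        if value in right:                       # scans the list
            matches.append((value, right.index(value)))  # scans again for the position
    return matches


def match_hashed(left, right_positions):
    """Find each left value via a dict of value -> position: ~O(1) per lookup."""
    return [(value, right_positions[value])
            for value in left if value in right_positions]


# Building the dict costs one pass over right_ids.
right_positions = {value: pos for pos, value in enumerate(right_ids)}

linear = match_linear(left_ids, right_ids)
hashed = match_hashed(left_ids, right_positions)
print(linear == hashed, hashed)
```

Both functions return the same pairs of (value, position in the right list); only the cost per lookup differs.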

One strategy that could speed up merging on columns would be to first create a hash table for the smaller of the two. Even so, the merge would still be slower by the time it takes to build that dict.
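That strategy is essentially a hash join. A minimal pure-Python sketch, assuming rows are represented as dicts and that `hash_join` is a hypothetical helper written for this illustration (it is not a Pandas function and does not claim to mirror Pandas internals):

```python
def hash_join(left_rows, right_rows, key):
    """Inner-join two lists of dicts on `key`, hashing the smaller list."""
    # Build the hash table from the smaller input, probe with the larger one.
    if len(left_rows) <= len(right_rows):
        build, probe, build_is_left = left_rows, right_rows, True
    else:
        build, probe, build_is_left = right_rows, left_rows, False

    # Build step: one pass over the smaller input (the extra cost noted above).
    table = {}
    for row in build:
        table.setdefault(row[key], []).append(row)

    # Probe step: one ~O(1) lookup per row of the larger input.
    joined = []
    for row in probe:
        for match in table.get(row[key], []):
            left, right = (match, row) if build_is_left else (row, match)
            joined.append({**left, **right})
    return joined


jobs = [{'ID': i, 'Job': j} for i, j in
        enumerate(['teacher', 'scientist', 'manager', 'teacher', 'nurse'])]
levels = [{'ID': i, 'Level': lv, 'Age': a} for i, lv, a in
          zip([2, 3, 4, 5, 6, 7, 8],
              [12, 15, 14, 20, 21, 11, 15],
              [33, 41, 42, 50, 45, 28, 32])]

result = hash_join(jobs, levels, 'ID')
print(result)
```

If the index already holds such a table, the build step is skipped, which is the saving the answer attributes to merging on the index.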

Tags: python, merge, pandas, dataframe

willk
