
ValueError: array is too big

Tags:

python

I am trying to merge two Excel files with the following code, and I get this error: ValueError: array is too big; arr.size * arr.dtype.itemsize is larger than the maximum possible size.

import pandas as pd

file1 = pd.read_excel("file1.xlsx")
file2 = pd.read_excel("file2.xlsx")

file3 = file1.merge(file2, on="Input E-mail", how="outer")

file3.to_excel("merged1.xlsx")

Each file is about 100 MB, and 9 GB of RAM (out of 16 GB) is available.

Nivas asked Apr 04 '17 19:04


1 Answer

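The merged DataFrame can be much larger than the two inputs combined. When the join key contains duplicates, the merge returns every combination of matching rows, so a key that appears m times in one frame and n times in the other produces m × n rows. Simple example: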

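import pandas as pd

# "id" is duplicated in both frames: 4 rows in values, 3 rows in users
values = pd.DataFrame({"id": [1,1,1,1], "value": ["a", "b", "c", "d"]})

users = pd.DataFrame({"id": [1,1,1], "users": ["Amy", "Bob", "Dan"]})

# no "on" argument, so the merge joins on the shared column "id"
big_table = pd.merge(users, values, how="outer")

print(big_table)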

Result:

     id  users    value
0     1   Amy       a
1     1   Amy       b
2     1   Amy       c
3     1   Amy       d
4     1   Bob       a
5     1   Bob       b
6     1   Bob       c
7     1   Bob       d
8     1   Dan       a
9     1   Dan       b
10    1   Dan       c
11    1   Dan       d
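
To see whether this is what happens with the Excel files, it may help to estimate the size of the merge result before running it. The following is only a rough sketch: the helper name estimate_outer_merge_rows is hypothetical (it is not part of pandas), and it ignores rows whose key is missing, because value_counts drops NaN by default. It counts how often each key occurs on each side, multiplies the counts for keys present in both frames, and adds the rows for keys present in only one.

```python
import pandas as pd


def estimate_outer_merge_rows(left, right, key):
    """Estimate the row count of left.merge(right, on=key, how="outer").

    Hypothetical helper for illustration; rows with a missing key are ignored.
    """
    left_counts = left[key].value_counts()
    right_counts = right[key].value_counts()
    # Align both count series on the union of keys; absent keys count as 0.
    l, r = left_counts.align(right_counts, join="outer", fill_value=0)
    # Keys present on both sides contribute count_left * count_right rows.
    matched = (l * r).sum()
    # Keys present on one side only keep that side's rows in an outer merge.
    one_sided = (l + r).where((l == 0) | (r == 0), 0).sum()
    return int(matched + one_sided)


values = pd.DataFrame({"id": [1, 1, 1, 1], "value": ["a", "b", "c", "d"]})
users = pd.DataFrame({"id": [1, 1, 1], "users": ["Amy", "Bob", "Dan"]})

estimated = estimate_outer_merge_rows(users, values, "id")
actual = len(pd.merge(users, values, on="id", how="outer"))
print(estimated, actual)
```

For the two small frames above, both numbers are 12. Calling the same helper on file1 and file2 with "Input E-mail" as the key would show whether duplicated e-mail addresses blow up the result; if they do, dropping or aggregating the duplicates before the merge is one possible way out.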
Akavall answered Nov 12 '22 00:11