 

hadoop fs -ls out of memory error

Tags: hadoop, hdfs

I have 300,000+ files in an HDFS data directory.

When I run hadoop fs -ls on that directory, I get an out of memory error saying the GC overhead limit has been exceeded. The cluster nodes have 256 GB of RAM each. How do I fix this?

asked Dec 26 '22 by Sambit Tripathy

2 Answers

You can make more memory available to the hdfs command by setting the HADOOP_CLIENT_OPTS environment variable:

HADOOP_CLIENT_OPTS="-Xmx4g" hdfs dfs -ls /

Found here: http://lecluster.delaurent.com/hdfs-ls-and-out-of-memory-gc-overhead-limit/

This fixed the problem for me; I had over 400k files in one directory and needed to delete most, but not all, of them.
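If you end up scripting that kind of cleanup instead of running the command by hand, the same override can be passed through the environment. A minimal Python sketch, assuming the hdfs CLI is on the PATH; the -Xmx4g value and the / path are just the example from above:

import os
import subprocess

# Pass a larger client heap to the hdfs CLI via HADOOP_CLIENT_OPTS,
# exactly as in the one-liner above, then capture the listing.
env = dict(os.environ, HADOOP_CLIENT_OPTS="-Xmx4g")
listing = subprocess.run(
    ["hdfs", "dfs", "-ls", "/"],
    env=env, capture_output=True, text=True, check=True,
)
print(listing.stdout)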

answered Jan 13 '23 by Jack Davidson


Write a Python script to split the files into multiple directories and run through them (see the sketch below). First of all, what are you trying to achieve when you know you have 300,000+ files in a directory? If you want to concatenate them, it is better to arrange them into sub-directories first.
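As a rough illustration of that split, here is a minimal Python sketch, not the answerer's actual script: the source directory, batch size, and sub-directory naming are placeholders, and it assumes the hdfs CLI is on the PATH and supports ls -C:

import subprocess

SRC_DIR = "/data/big_dir"   # placeholder: the overloaded HDFS directory
BATCH_SIZE = 1000           # placeholder: files per sub-directory

# List the directory once; -C prints paths only.
# (May need HADOOP_CLIENT_OPTS="-Xmx4g" as in the other answer.)
paths = subprocess.run(
    ["hdfs", "dfs", "-ls", "-C", SRC_DIR],
    capture_output=True, text=True, check=True,
).stdout.splitlines()

for i in range(0, len(paths), BATCH_SIZE):
    batch = paths[i:i + BATCH_SIZE]
    dest = f"{SRC_DIR}/batch_{i // BATCH_SIZE:04d}"
    # Create the sub-directory and move this batch of files into it.
    subprocess.run(["hdfs", "dfs", "-mkdir", "-p", dest], check=True)
    subprocess.run(["hdfs", "dfs", "-mv", *batch, dest], check=True)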

answered Jan 13 '23 by Anay T