
Spark on Windows - What exactly is winutils and why do we need it?

I'm curious! To my knowledge, HDFS needs DataNode processes to run, which is why it only works on servers. Spark can run locally, but it needs winutils.exe, which is a component of Hadoop. What exactly does winutils.exe do? And how is it that I cannot run Hadoop on Windows, but I can run Spark, which is built on Hadoop?

lte__ asked Jul 06 '16 20:07



1 Answer

Max's answer covers the actual place in the code where winutils is referenced. Let me give some brief background on why it is needed on Windows.

From Hadoop's Confluence page itself:

Hadoop requires native libraries on Windows to work properly. That includes accessing the file:// filesystem, where Hadoop uses some Windows APIs to implement POSIX-like file access permissions.

This is implemented in HADOOP.DLL and WINUTILS.EXE.

In particular, %HADOOP_HOME%\BIN\WINUTILS.EXE must be locatable.
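To make "locatable" a bit more concrete, here is a minimal sketch of a check you could run before starting Spark. It only looks at the HADOOP_HOME environment variable, as in the quote above; the function name locate_winutils is my own invention for illustration and is not part of Hadoop or Spark.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def locate_winutils(env: Mapping[str, str] = os.environ) -> Optional[Path]:
    """Return the path to winutils.exe under HADOOP_HOME/bin, or None.

    Hypothetical helper: it mirrors the %HADOOP_HOME%\\BIN\\WINUTILS.EXE
    convention from the quote, not Hadoop's actual lookup code.
    """
    hadoop_home = env.get("HADOOP_HOME")
    if not hadoop_home:
        # HADOOP_HOME is unset or empty, so there is nowhere to look.
        return None
    candidate = Path(hadoop_home) / "bin" / "winutils.exe"
    # Only report the path if the file really exists on disk.
    return candidate if candidate.is_file() else None


if __name__ == "__main__":
    found = locate_winutils()
    if found is None:
        print("winutils.exe not found under HADOOP_HOME/bin")
    else:
        print(f"winutils.exe found at {found}")
```

If this prints "not found", the usual fix is to put a winutils.exe that matches your Hadoop version into a bin folder and point HADOOP_HOME at the folder that contains bin.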

Also, I think you should be able to run both Spark and Hadoop on Windows.

saurzcode answered Oct 16 '22 17:10
