My environment is a Spark standalone cluster. I need the Hive Thrift server running to allow JDBC access to a Parquet file. At the same time, while the Thrift server is running, I need to launch a Java application (which uses HiveContext) to load some data into the same Parquet file. I have experimented both with the standalone Derby-based metastore (the default) and with a metastore backed by a MySQL database, and the problem is almost the same in both cases. If the Thrift server is running, the Java application gets 0 cores, so it waits until the Thrift server is shut down (and only then completes its processing correctly); conversely, if the application is running, the Thrift server cannot even start. The Thrift server grabs as many cores, and spawns as many worker threads, as are available, preventing other applications from getting any resources. Is it possible to reduce the number of worker processes allocated to the Thrift server? Apparently there is no specific configuration option to manage this parameter.
I don't think it is a matter of the number of cores, because adding more cores gives an identical result.
Can you please advise me on this topic? Thanks a lot.
Put these two parameters in spark-defaults.conf in the conf folder of Spark (the file is spark-defaults.conf, not spark-defaults.sh); they cap the maximum resources a single application can take:

    spark.cores.max = maximum number of cores (e.g. 2)
    spark.executor.memory = maximum memory allowed (e.g. 2024M)
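As a concrete sketch, the file could look like this (the values 2 cores and 2g are assumptions for illustration; tune them so the Thrift server leaves enough cores free for your Java application):

```
# conf/spark-defaults.conf
# In standalone mode, spark.cores.max limits the total cores an
# application may claim across the cluster; without it an app
# takes every available core and starves other applications.
spark.cores.max        2
spark.executor.memory  2g
```

With this in place, a Thrift server submitted with default settings takes at most 2 cores, so the remaining cores stay available for the concurrently running Java application.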
Or you can try running Spark in YARN mode, where the resource manager arbitrates cores and memory between applications.
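Since start-thriftserver.sh forwards its command-line options to spark-submit, you can also cap the Thrift server explicitly at launch time instead of changing the cluster-wide defaults (the master URL and values below are placeholders for your own setup):

```
# Start the Thrift server with a hard cap on resources.
# --total-executor-cores is the standalone-mode counterpart of spark.cores.max.
./sbin/start-thriftserver.sh \
  --master spark://master-host:7077 \
  --total-executor-cores 2 \
  --executor-memory 2g
```

This keeps the cap local to the Thrift server, so other applications (such as the Java loader) can still be given their own, independent limits when they are submitted.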