I am running multiple queries on Hive. I have a Hadoop cluster with 6 nodes and 21 vCores in total.
I need only 2 cores allocated to a Python process, so that the remaining cores stay available for another main process.
Code
from pyhive import hive
hive_host_name = "subdomain.domain.com"
hive_port = 20000
hive_user = "user"
hive_password = "password"
hive_database = "database"
conn = hive.Connection(host=hive_host_name, port=hive_port, username=hive_user, database=hive_database, configuration={})
cursor = conn.cursor()
cursor.execute('select count(distinct field) from somedata')
Try passing the following setting in the configuration map:
yarn.nodemanager.resource.cpu-vcores=2
The default value for this setting is 8.
Description: the number of CPU cores that can be allocated for containers.
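For context, yarn.nodemanager.resource.cpu-vcores is normally a NodeManager-level property defined cluster-side in yarn-site.xml; a typical entry looks like this (the value 2 here mirrors the override above):

<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>2</value>
</property>

Passing it in the session configuration map asks Hive to apply it as a per-session override instead of editing the cluster files.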
Your updated code will look like this:
from pyhive import hive
hive_host_name = "subdomain.domain.com"
hive_port = 20000
hive_user = "user"
hive_password = "password"
hive_database = "database"
configuration = {
    # Session configuration is sent over Thrift as map<string, string>,
    # so pass the value as a string, not an int.
    "yarn.nodemanager.resource.cpu-vcores": "2"
}
conn = hive.Connection(
    host=hive_host_name,
    port=hive_port,
    username=hive_user,
    database=hive_database,
    configuration=configuration
)
cursor = conn.cursor()
cursor.execute('select count(distinct field) from somedata')
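Once the query has run, the result can be read back through the standard DB-API cursor methods that PyHive exposes (fetchone, fetchall). A minimal sketch, assuming the same query as above; the wrapper function name is my own:

def count_distinct(cursor):
    """Run the aggregate and return the single count value."""
    cursor.execute('select count(distinct field) from somedata')
    (count,) = cursor.fetchone()  # the result is one row with one column
    return count

This keeps the query and the fetch together, so the caller only deals with a plain Python integer.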