
How to monitor and control DPU usage in AWS Glue Crawlers

The docs say that AWS allocates 10 DPUs per ETL job and 5 DPUs per development endpoint by default, even though both can be configured with a minimum of 2 DPUs.

They also mention that crawling is priced in per-second increments with a 10-minute minimum per run, but nowhere do they specify how many DPUs are allocated. Jobs and development endpoints can be configured in the Glue console to consume fewer DPUs, but I haven't seen any such setting for crawlers.

Is there a fixed amount of DPUs per crawler? Can we control that amount?

asked Mar 07 '18 21:03 by villasv




2 Answers

This is my conversation with AWS Support about this subject:

Hello, I'd like to know how many DPUs a crawler uses in order to calculate my costs with crawlers.

Their answer:

Dear AWS Customer,

Thank you for reaching out today. My name is Safari, I will assist with your case.

I understand that while compiling the cost of your Glue crawlers, you'd like to know the amount of DPUs a particular crawler uses.

Unfortunately, there is no direct way to find out the DPU consumption by a given crawler. I apologize for the inconvenience. However, you may see the total DPU consumption across all crawlers in your detailed bill under the section AWS Service Charges > Glue > {region} > AWS Glue CrawlerRun. Additionally, you can add tags to your crawlers and then enable "Cost Allocation Tags" from your AWS Billing and Cost Management console. This would allow AWS to generate a cost allocation report grouped by the predefined tags. For more on this, please see the documentation link below [1].

I hope this helps. Please let me know if I can provide you with any other assistance.

References [1]: https://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/cost-alloc-tags.html
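Since the bill only exposes the total under AWS Glue CrawlerRun, one rough way to reason about DPUs is to work backwards from that charge. The sketch below is only an illustration: the price per DPU-hour is an assumed value (check the pricing page for your region), the function names are hypothetical, and the 10-minute minimum is taken from the pricing note quoted in the question. It yields an implied average, not the actual number of DPUs any single crawler used.

```python
# Rough back-of-the-envelope estimate, not an official AWS formula.

# ASSUMPTION: list price in USD per DPU-hour; verify for your region.
ASSUMED_USD_PER_DPU_HOUR = 0.44

# Minimum billed duration per crawler run (10 minutes, as quoted in the question).
MIN_BILLED_SECONDS = 10 * 60


def billed_seconds(run_seconds: float) -> float:
    """Apply the 10-minute minimum to a single crawler run."""
    return max(run_seconds, MIN_BILLED_SECONDS)


def dpu_hours_from_cost(cost_usd: float,
                        usd_per_dpu_hour: float = ASSUMED_USD_PER_DPU_HOUR) -> float:
    """Convert a CrawlerRun charge from the bill into DPU-hours."""
    return cost_usd / usd_per_dpu_hour


def implied_average_dpus(cost_usd: float,
                         run_durations_seconds: list[float],
                         usd_per_dpu_hour: float = ASSUMED_USD_PER_DPU_HOUR) -> float:
    """Average DPUs implied by the charge and the known run durations."""
    total_billed_hours = sum(billed_seconds(s) for s in run_durations_seconds) / 3600
    if total_billed_hours == 0:
        raise ValueError("at least one crawler run is required")
    return dpu_hours_from_cost(cost_usd, usd_per_dpu_hour) / total_billed_hours


if __name__ == "__main__":
    # Hypothetical numbers: a 0.44 USD charge for two runs of 2 and 30 minutes.
    print(round(implied_average_dpus(0.44, [120, 1800]), 2))
```

With cost allocation tags enabled, the same calculation could be applied per tag instead of per region, using the run durations of only the crawlers that carry that tag.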

answered Nov 09 '22 23:11 by Torrico


I also discussed this with the AWS support team, and it is currently not possible to view or modify the DPU configuration for Glue crawlers. But do crawlers use DPUs at all?

answered Nov 09 '22 22:11 by Yuva