 

How to measure exactly what counts as "data out" in an Azure web app?

I have a web app in Azure which gets roughly 100k visitors a month, with fewer than 2 page views per session (almost purely SEO visitors).

I just studied our Azure bills and was shocked to find out that during the last month we used 3.41 TB of data out.

Terabyte.

This makes absolutely no sense. Our average page size is less than 3 MB (a lot, but nowhere near what the math would require). The total data out should in practice be:

3,431,000 MB / 150,000 sessions ≈ 23 MB per session, which is absolutely bogus.
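
As a quick sanity check of the arithmetic, here is a minimal sketch in Python using the figures above (the 150,000 sessions come from ~100k visitors at just under 2 page views each):

    # Sanity check: billed data out vs. what the page size predicts
    billed_out_mb = 3_431_000      # 3.41 TB of data out, in MB
    sessions = 150_000             # ~100k visitors, <2 page views each
    avg_page_mb = 3                # measured average page size
    pages_per_session = 2          # generous upper bound

    per_session_mb = billed_out_mb / sessions          # ~22.9 MB
    expected_mb = avg_page_mb * pages_per_session      # ~6 MB

    print(f"billed per session:   {per_session_mb:.1f} MB")
    print(f"expected per session: {expected_mb:.1f} MB")
    print(f"unexplained factor:   {per_session_mb / expected_mb:.1f}x")

Even with a generous 2 full page views per session, roughly three quarters of the billed traffic is unaccounted for.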

A result from a service such as Pingdom confirms the page size:

[Pingdom result (Stack.Imgur seems down; temp link: http://prntscr.com/gvzoaz )]

My graph looks like this, and it's not something that just came up. I haven't analyzed our bills for a while, so this could easily have been going on for some time:

[Azure data-out graph (Stack.Imgur seems down; temp link: http://prntscr.com/gvzohm )]

The pages we get the most visits on are autogenerated SEO pages which read from a database with over 3 million records, but they're quite optimized and our databases are not that expensive. The main challenge is the data out, which costs a lot.

However, how do I go about testing this? Where do I start?

My architecture:

I honestly believe that all my resources are in the same region. Here is a screenshot of my main usage killers - my app and database:

App:

[screenshots of App Service usage]

Database:

[screenshot of database usage]

All my resources:

[screenshot of all resources]

asked Oct 11 '17 09:10 by Lars Holdgaard

1 Answer

After some very good help from a Ukrainian developer I found on Upwork, we've finally solved the issue.

The challenge was in our robots.txt.

It turned out that we had so many crawler requests on our pages - and we have 3.6 million address pages - that it simply added up to a huge amount of traffic. That's why the data out was so big.
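
One way to confirm this is to aggregate outbound bytes per user agent from the web server logs. A minimal sketch in Python, assuming W3C/IIS-format App Service logs with the cs(User-Agent) and sc-bytes fields enabled (the log file name is hypothetical):

    # Sum outbound bytes per user agent from a W3C/IIS web server log.
    # Field positions are taken from the '#Fields:' directive in the log.
    from collections import Counter

    bytes_by_agent = Counter()
    fields = []

    with open("u_ex171011.log") as log:   # hypothetical log file name
        for line in log:
            if line.startswith("#Fields:"):
                fields = line.split()[1:]
                ua_i = fields.index("cs(User-Agent)")
                sb_i = fields.index("sc-bytes")
                continue
            if line.startswith("#") or not fields:
                continue
            parts = line.split()
            if len(parts) == len(fields) and parts[sb_i].isdigit():
                bytes_by_agent[parts[ua_i]] += int(parts[sb_i])

    # Print the top 10 user agents by data out
    for agent, total in bytes_by_agent.most_common(10):
        print(f"{total / 1_000_000:10.1f} MB  {agent}")

If crawlers dominate that list, the bill is explained.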

We have now solved it by:

  • Adding a robots.txt which disallows all bots except Google and Bing (example below)
  • Adjusting the Google crawl rate in Webmaster Tools
  • Changing the changefreq in our sitemap from monthly to yearly for our address pages to avoid constant re-crawling (example below)
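
For reference, a minimal robots.txt along the lines of what we deployed (a sketch; Googlebot and Bingbot are the crawlers' real user-agent tokens, and an empty Disallow means "allow everything"):

    User-agent: Googlebot
    Disallow:

    User-agent: Bingbot
    Disallow:

    User-agent: *
    Disallow: /

And the corresponding sitemap entry for an address page (the URL is a placeholder):

    <url>
      <loc>https://example.com/address/12345</loc>
      <changefreq>yearly</changefreq>
    </url>

Note that changefreq is only a hint to crawlers, not a directive, so the robots.txt and the crawl-rate setting do most of the heavy lifting.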

I'm happy!

answered Oct 15 '22 00:10 by Lars Holdgaard