I have several crawlers that crawl multiple sites and store the contents in a database. The logs from the program are stored in CloudWatch Logs.
If a crawler successfully pulls back content, the log lines look similar to the ones below:
HTTP GET: 200 - https://www.thecheyennepost.com/news/national/r
HTTP GET: 200 - https://www.thecheyennepost.com/news/f-e-warren-hous
The issue I'm dealing with is identifying when 4xx errors pop up. Below is an example:
HTTP GET: 429 - https://www.livingstonparishnews.com/search/?l=25&sort=
HTTP GET: 429 - https://www.livingstonparishnews.com/search/?l=25&sort=rele
HTTP GET: 429 - https://www.ktbs.com/search/?l=25&s=start_time&sd=desc&f=
I tried using status_code=4* as the filter,
but that didn't return anything.
I just want to be able to filter any and all 4xx errors.
Any help that can be provided would be greatly appreciated.
Yes! You can do this with Logs Insights :)
First, you need to open Logs Insights, either through the new console UI or by going to the "Logs Insights" page directly:
CloudWatch -> CloudWatch Logs -> Log groups -> [your service logs]
With the new UI, the log group page has a button that opens Logs Insights (alternatively, search for "Logs Insights" in the AWS console search bar).
In your case, you need this query (tell me if you need to filter on anything else):
fields @message
| sort @timestamp desc
| filter @message like /4[0-9]{2}/
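Looking at your logs, the status code has a space on each side, so I think this version is better, because it will not match a three-digit number starting with 4 that happens to appear inside a URL: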
fields @message
| sort @timestamp desc
| filter @message like / 4[0-9]{2} /
And that's all.
Now run the query and you will see only the log lines that contain 4xx status codes. I hope that solves your problem.
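If you want to see the difference between the two patterns before running anything in the console, here is a small local check. It is only a sketch: it uses Python's re module as a stand-in for the Logs Insights regex engine, the first three lines are copied from the question (truncated URLs included), and the last line with example.com is a hypothetical one I made up to show a false positive.

```python
import re

LOG_LINES = [
    # Copied from the question; the URLs are truncated there as well.
    "HTTP GET: 200 - https://www.thecheyennepost.com/news/national/r",
    "HTTP GET: 429 - https://www.livingstonparishnews.com/search/?l=25&sort=",
    "HTTP GET: 429 - https://www.ktbs.com/search/?l=25&s=start_time&sd=desc&f=",
    # Hypothetical line: a successful request whose URL contains "404".
    "HTTP GET: 200 - https://example.com/article/4041",
]

# Pattern without surrounding spaces: matches a 4xx-looking number anywhere.
LOOSE = re.compile(r"4[0-9]{2}")
# Pattern with a space on each side: matches only the status code field.
SPACED = re.compile(r" 4[0-9]{2} ")


def matching(pattern, lines):
    """Return the lines in which the pattern occurs at least once."""
    return [line for line in lines if pattern.search(line)]


loose_hits = matching(LOOSE, LOG_LINES)
spaced_hits = matching(SPACED, LOG_LINES)

for line in spaced_hits:
    print(line)
```

The loose pattern also picks up the hypothetical 200 line because of the "404" in its URL, while the spaced pattern returns only the two 429 lines.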
NOTE: if you go to Logs Insights directly from the console search bar, you need to select the log group(s) that the query should scan, using the selector above the query box.
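If you would rather run the same query from a script than from the console, something along these lines should work with boto3's CloudWatch Logs client (start_query and get_query_results). Treat it as a rough sketch, not a tested setup: the function name run_4xx_query, the log group name, and the one-hour time window in the usage comment are all placeholders I picked, and the client is passed in so you can create it with whatever credentials and region you use.

```python
import time

# Equivalent to the spaced pattern above: a 4xx code with a space on each side.
QUERY = """fields @message
| sort @timestamp desc
| filter @message like / 4[0-9]{2} /"""


def run_4xx_query(client, log_group, start, end, poll_seconds=1.0):
    """Run QUERY on one log group and return the matching @message values.

    client is a boto3 CloudWatch Logs client; start and end are epoch seconds.
    """
    query_id = client.start_query(
        logGroupName=log_group,
        startTime=start,
        endTime=end,
        queryString=QUERY,
    )["queryId"]

    # Logs Insights queries run asynchronously, so poll until one finishes.
    while True:
        response = client.get_query_results(queryId=query_id)
        if response["status"] in ("Complete", "Failed", "Cancelled", "Timeout"):
            break
        time.sleep(poll_seconds)

    if response["status"] != "Complete":
        raise RuntimeError("query ended with status " + response["status"])

    # Each result row is a list of {"field": ..., "value": ...} dicts.
    return [
        field["value"]
        for row in response["results"]
        for field in row
        if field["field"] == "@message"
    ]


# Usage (placeholders: log group name and a one-hour window):
#   import boto3
#   now = int(time.time())
#   for line in run_4xx_query(boto3.client("logs"), "my-crawler-logs", now - 3600, now):
#       print(line)
```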