I'm working on a site that has mixed-case URLs, similar to YouTube. We generate IDs on the server, and I chose base 62 (digits plus lowercase and uppercase letters) so they would be shorter. So the URLs might look something like example.com/user/123AbCaBc
The Facebook robot seems to be hitting my site regularly with an all-lowercase version: example.com/user/123abcabc
This causes a 404 error as the all-lowercase ID isn't in the database.
According to the logs, there aren't other user agents creating 404s, so this is for sure a robot and not a human. Here's the user agent I'm seeing:
facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)
This happens about once every 4 minutes. I'm not currently logging non-404 hits, so I'm not sure if there are others to the non-lowercase version.
The server tech here is Node.js / MongoDB, but I don't see how that is relevant to the issue at hand.
Is there something I can do to fix this on Facebook's side? Is there a real problem here, or should I just squelch these log errors? Has anyone else had a similar problem?
It's possible that your Node web server application (are you using Express?) currently doesn't support byte ranges. The Facebook crawler apparently falls back to lowercasing the URL in that case, as described here:
Have a look at
on how to fix this.
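If missing byte-range support is indeed the cause, a minimal sketch of honoring a single `Range: bytes=...` header in plain Node.js might look like the following. The helper names `parseByteRange` and `sendWithRange` are made up for illustration and are not part of Node or Express. The sketch assumes the whole response body is already in memory as a `Buffer`, and it simply sends the full body for multi-range or unsatisfiable requests rather than answering 416.

```javascript
// Hypothetical helper: parse a single-range "Range: bytes=start-end" header.
// Returns { start, end } (inclusive) or null when the header is absent,
// malformed, multi-range, or unsatisfiable for a body of `size` bytes.
function parseByteRange(header, size) {
  const match = /^bytes=(\d*)-(\d*)$/.exec(header || "");
  if (!match || (match[1] === "" && match[2] === "")) return null;

  let start;
  let end;
  if (match[1] === "") {
    // Suffix form "bytes=-N": the last N bytes of the body.
    const suffixLength = Number(match[2]);
    if (suffixLength === 0) return null;
    start = Math.max(size - suffixLength, 0);
    end = size - 1;
  } else {
    start = Number(match[1]);
    // An open-ended or oversized end is clamped to the last byte.
    end = match[2] === "" ? size - 1 : Math.min(Number(match[2]), size - 1);
  }
  if (start > end || start >= size) return null;
  return { start, end };
}

// Hypothetical helper: send `body` (a Buffer), honoring the request's Range
// header. `req` and `res` are assumed to be Node http request/response objects.
function sendWithRange(req, res, body, contentType) {
  const range = parseByteRange(req.headers.range, body.length);
  res.setHeader("Accept-Ranges", "bytes");
  res.setHeader("Content-Type", contentType);

  if (!range) {
    // No usable range: fall back to the full body.
    res.statusCode = 200;
    res.setHeader("Content-Length", body.length);
    res.end(body);
    return;
  }

  const chunk = body.subarray(range.start, range.end + 1);
  res.statusCode = 206; // Partial Content
  res.setHeader("Content-Range", `bytes ${range.start}-${range.end}/${body.length}`);
  res.setHeader("Content-Length", chunk.length);
  res.end(chunk);
}
```

In a route handler this could be called as `sendWithRange(req, res, Buffer.from(html), "text/html")`, where `html` stands for whatever page the handler has rendered. Logging the `Range` header on requests from the facebookexternalhit user agent first would confirm whether the crawler is sending one at all.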