This is why I block bots in an .htaccess file:
# Bot Agent Block Rule
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (BOTNAME|BOTNAME2|BOTNAME3) [NC]
RewriteRule (.*) - [F,L]
This is still relying on the bot being nice enough to tell you that it’s a bot; it could just not.
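To make that point concrete, here is a minimal Python sketch of what the rule above effectively does: an unanchored, case-insensitive match against the User-Agent header. The names `BLOCKED_AGENTS`, `is_blocked`, `honest_bot`, and `spoofing_bot` are made up for illustration, and the bot names are the same placeholders as in the rule. This is not how Apache implements it internally, just the same matching logic.

```python
import re

# Placeholder bot names, mirroring the RewriteCond pattern above.
# re.IGNORECASE plays the role of the [NC] flag.
BLOCKED_AGENTS = re.compile(r"(BOTNAME|BOTNAME2|BOTNAME3)", re.IGNORECASE)


def is_blocked(user_agent: str) -> bool:
    """Return True if the User-Agent string contains a blocked name."""
    return BLOCKED_AGENTS.search(user_agent) is not None


# A bot that identifies itself honestly is caught by the pattern.
honest_bot = "Mozilla/5.0 (compatible; BOTNAME/2.1)"

# The same bot sending a browser-like User-Agent is not, because the
# header is entirely under the client's control.
spoofing_bot = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)

if __name__ == "__main__":
    print(is_blocked(honest_bot))
    print(is_blocked(spoofing_bot))
```

The check only ever sees what the client chooses to send, so it filters out well-behaved bots and nothing else.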
Exactly. The only truly effective way I’ve ever found to block bots is to use a service like Akamai. They have an add-on called Bot Manager that identifies requests as bots in real time. They have a library of over 1,000 known bots and can also identify unknown bots built on different frameworks, bots that impersonate well-known bots like Googlebot, etc. This service is expensive, but effective…
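For the impersonation case specifically, one check that does not depend on the User-Agent is a forward-confirmed reverse DNS lookup. The sketch below is not Akamai's method (which isn't described here); it is a generic illustration, and the function name `is_verified_googlebot`, the injectable `reverse`/`forward` resolvers, and the suffix list are my own assumptions. Check the suffixes against Google's current published guidance before relying on them.

```python
import socket
from typing import Callable, List

# Assumed hostname suffixes for genuine Googlebot hosts; verify against
# Google's own documentation before relying on this list.
GOOGLEBOT_SUFFIXES = (".googlebot.com", ".google.com")


def _reverse(ip: str) -> str:
    """Reverse DNS: IP address -> primary hostname."""
    return socket.gethostbyaddr(ip)[0]


def _forward(host: str) -> List[str]:
    """Forward DNS: hostname -> list of IPv4 addresses."""
    return socket.gethostbyname_ex(host)[2]


def is_verified_googlebot(
    ip: str,
    reverse: Callable[[str], str] = _reverse,
    forward: Callable[[str], List[str]] = _forward,
) -> bool:
    """Return True only if the IP reverse-resolves to an expected
    domain AND that hostname resolves back to the same IP."""
    try:
        host = reverse(ip).rstrip(".").lower()
        if not host.endswith(GOOGLEBOT_SUFFIXES):
            return False
        return ip in forward(host)
    except OSError:
        # Lookup failures (socket.herror, socket.gaierror) count as unverified.
        return False
```

The resolvers are passed in as parameters so the logic can be exercised without network access. The default forward lookup only covers IPv4, and this only helps against bots claiming to be a crawler with published hostnames; it says nothing about a bot posing as an ordinary browser.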
How does this differentiate between a user and a bot if the User Agent doesn’t say it’s a bot?