Right now, the robots.txt on lemmy.ca is configured this way:

User-Agent: *
  Disallow: /login
  Disallow: /login_reset
  Disallow: /settings
  Disallow: /create_community
  Disallow: /create_post
  Disallow: /create_private_message
  Disallow: /inbox
  Disallow: /setup
  Disallow: /admin
  Disallow: /password_change
  Disallow: /search/
  Disallow: /modlog

Would it be a good idea, privacy-wise, to block GPTBot from scraping content from the server by adding this?

User-agent: GPTBot
Disallow: /
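The effect of the proposed rules can be sanity-checked with Python's standard-library urllib.robotparser. This is a minimal sketch: it parses the current rules plus the proposed GPTBot group and asks whether a given crawler may fetch a given path. The Googlebot agent and the /post/1 path are hypothetical examples chosen for illustration. Note that robots.txt is advisory, so it only keeps out crawlers that choose to honour it.

```python
from urllib.robotparser import RobotFileParser

# The current lemmy.ca rules from the post, plus the proposed GPTBot group.
ROBOTS_TXT = """\
User-Agent: *
  Disallow: /login
  Disallow: /login_reset
  Disallow: /settings
  Disallow: /create_community
  Disallow: /create_post
  Disallow: /create_private_message
  Disallow: /inbox
  Disallow: /setup
  Disallow: /admin
  Disallow: /password_change
  Disallow: /search/
  Disallow: /modlog

User-agent: GPTBot
Disallow: /
"""


def can_fetch(agent: str, path: str) -> bool:
    """Return True if `agent` may fetch `path` under ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    # can_fetch() takes a full URL but only checks its path part.
    return parser.can_fetch(agent, "https://lemmy.ca" + path)


if __name__ == "__main__":
    # "Googlebot" and "/post/1" are hypothetical examples.
    for agent in ("GPTBot", "Googlebot"):
        for path in ("/", "/post/1", "/login"):
            print(f"{agent:10} {path:8} {can_fetch(agent, path)}")
```

With the extra group, GPTBot is refused everywhere, while other crawlers still fall through to the * group and are only kept out of the listed paths.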

Thanks!

You probably want == instead, or else we will all be forbidden.

I would have thought so too, but == failed the syntax check:

2023/08/07 15:36:59 [emerg] 2315181#2315181: unexpected "==" in condition in /etc/nginx/sites-enabled/lemmy.ca.conf:50

You actually want ~, though, because GPTBot is only part of the user agent string, not the whole string.
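To see why = never fires here: in nginx's if directive, = compares the whole string exactly, while ~ (case-sensitive) and ~* (case-insensitive) are regular-expression matches that succeed if the pattern appears anywhere in the string. The sketch below is a rough Python model of just those three operators, not nginx's actual implementation. The sample user-agent string is an assumed, illustrative value (check OpenAI's GPTBot documentation for the real one), and the nginx rule in the comment, including return 403, is a hypothetical example rather than the actual lemmy.ca config.

```python
import re

# Illustrative user-agent string (assumed format); GPTBot is only one token in it.
SAMPLE_UA = (
    "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; "
    "GPTBot/1.0; +https://openai.com/gptbot)"
)


def nginx_if_matches(value: str, op: str, operand: str) -> bool:
    """Rough Python model of nginx `if ($var OP operand)` for =, ~ and ~*."""
    if op == "=":
        # Exact comparison of the whole string.
        return value == operand
    if op == "~":
        # Case-sensitive regex, unanchored: a match anywhere counts.
        return re.search(operand, value) is not None
    if op == "~*":
        # Case-insensitive regex, also unanchored.
        return re.search(operand, value, re.IGNORECASE) is not None
    # nginx itself rejects "==" when loading the config, as the error above shows.
    raise ValueError(f"unsupported operator: {op!r}")


if __name__ == "__main__":
    # Hypothetical server-block rule this models:
    #   if ($http_user_agent ~ GPTBot) { return 403; }
    for op in ("=", "~", "~*"):
        print(op, nginx_if_matches(SAMPLE_UA, op, "GPTBot"))
```

Running it prints False for = and True for both regex forms: an exact comparison against "GPTBot" only matches a client whose entire user agent is that one word.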

Strangely, nginx uses = for string comparison, where most languages would use ==. It's a very strange config format…

https://nginx.org/en/docs/http/ngx_http_rewrite_module.html#if

Look at me! I’m the GPTBot now!

Lemmy.ca Support / Questions

!lemmy_ca_support@lemmy.ca

Support / Questions specific to lemmy.ca.

For support / questions related to the lemmy software itself, go to !lemmy_support@lemmy.ml