still kind of a stealthy love ninja ([personal profile] zvi) wrote in [site community profile] dw_suggestions, 2009-08-17 12:49 pm

Finer grained control of robots.txt

Title:
Finer grained control of robots.txt

Area:
privacy, account settings

Summary:
Allow users to upload their own robots.txt file

Description:
The current robots.txt file generated when you check "Minimize my journal's inclusion in search engine results" forbids all automated robots from indexing your journal.

However, not all robots originate with search engines. For instance, someone who did not want their journal to be googlable might still want the backup provided by archive.org. (Or perhaps it's the other way around: you do want Google to index your journal, but you don't want a permanent record.) Browsershots.org uses a robot to display a web page in a variety of different browsers. I'm sure there are other web services out there (translation? found art?) which use robots and obey robots.txt directives.

I suggest permitting users to upload a robots.txt file for their journal, so they can have finer-grained control over which robots visit it.

If storing a separate robots.txt file for every user is too expensive, let people list the individual robots they want to allow, and have the service automatically generate a robots.txt that allows those user-agents on that person's journal.
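
As a rough illustration of that fallback, here is a minimal Python sketch of generating a robots.txt that admits only the listed robots and keeps everyone else out. The function name `build_robots_txt` is hypothetical, and `ia_archiver` is used only as an example token (it is commonly cited as the one archive.org's crawler honors); none of this reflects how Dreamwidth actually generates the file.

```python
def build_robots_txt(allowed_agents):
    """Build robots.txt text that blocks all robots except those listed."""
    lines = []
    for agent in allowed_agents:
        agent = agent.strip()
        # Reject empty names and embedded line breaks, which could smuggle
        # extra directives into the generated file.
        if not agent or "\n" in agent or "\r" in agent:
            raise ValueError(f"invalid user-agent name: {agent!r}")
        # An empty Disallow value means nothing is off-limits for this robot.
        lines += [f"User-agent: {agent}", "Disallow:", ""]
    # Catch-all group: every other robot is barred from the whole journal.
    lines += ["User-agent: *", "Disallow: /"]
    return "\n".join(lines) + "\n"


# Example: keep search engines out, but let one archiving robot in.
print(build_robots_txt(["ia_archiver"]))
```

Generating the file from a short list means the service only has to store the robot names per journal, not a whole uploaded file, and users never have to write the syntax by hand.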

Poll #1022 Finer grained control of robots.txt
Open to: Registered Users, detailed results viewable to: All, participants: 40


This suggestion:

Should be implemented as-is.
24 (60.0%)

Should be implemented with changes.
6 (15.0%)

Shouldn't be implemented.
1 (2.5%)

(I have no opinion)
8 (20.0%)

(Other: please comment)
1 (2.5%)

[personal profile] ratcreature 2009-08-17 06:27 pm (UTC)
I wouldn't mind more control, but I'm not really familiar with the syntax. On my website I just allow all robots, so I wouldn't be sure how to write one that does anything advanced. Maybe if there was some sort of wizard or something to help you to set it up...

[personal profile] turlough 2009-08-17 06:48 pm (UTC)
Yeah, pretty much what I was going to say.

[personal profile] ratcreature 2009-08-17 09:15 pm (UTC)
I could probably figure it out there, and in any case for me it wouldn't matter so much, because I'd go from allowing everything to maybe blocking some services I'm not fond of. I'm more thinking that if someone had previously chosen to keep robots away (at least the ones that follow the file) and then, say, wants to allow robot X for something useful, but makes some unfortunate syntax error, their journal might get indexed in lots of places before they notice the error, since it's not as if an incorrect robots.txt breaks anything visible.

[personal profile] azurelunatic 2009-08-18 10:44 am (UTC)
Hmm.

On the one hand, it would be good to at least have some sort of parser so you could see what would get blocked and what would not.

On the other hand, there are a lot of things that are Deep Magic That One Oughtn't Mess With Unless One Knows. (Though in any case there should be the ability to quickly turn back on the default.)

On the gripping hand, impolite robots do not even abide by robots.txt files, and it is all public content unless said bot has an account that has been granted access.
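
The "parser so you could see what would get blocked" idea could be sketched with Python's standard-library `urllib.robotparser`. The helper name `preview` and the two sample files are assumptions made up for illustration; this only models robots that actually obey the file, and a real crawler may read a malformed file differently than this parser does.

```python
from urllib.robotparser import RobotFileParser


def preview(robots_txt, agents, path="/"):
    """Report, per user-agent, whether robots_txt lets it fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, path) for agent in agents}


# A file that allows one archiving robot and blocks everyone else.
intended = "User-agent: ia_archiver\nDisallow:\n\nUser-agent: *\nDisallow: /\n"
# A file meant to block everyone, with the directive misspelled.
typo = "User-agent: *\nDisalow: /\n"

print(preview(intended, ["ia_archiver", "Googlebot"]))
print(preview(typo, ["ia_archiver", "Googlebot"]))
```

With this parser, the first file reports only ia_archiver as allowed, while the misspelled one reports every robot as allowed, because the unrecognized line is silently skipped. That is the kind of quiet failure a preview step before saving could catch.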

[personal profile] 7rin 2009-08-18 12:04 am (UTC)
I think - at the very least - if this is going to happen, then the link needs to be included. I've voted for "with changes" because of this, but personally, I can't see me ever using it anyway.

I dunno, I presume it works the same for community accounts as it does for personal accounts, at least on the robot side? Would it be the same for both on the DW side too?

[personal profile] triadruid 2009-08-20 09:59 pm (UTC)
Agreed. The option for those of us who do speak Robot to customize it might aid some users, while keeping your general Clueless Human away from the shiny red candy-like buttons. :)

[staff profile] mark 2009-08-17 09:48 pm (UTC)
I like this. It wouldn't be a heavy hit on the server and I could see this being pretty easy to implement.

[personal profile] msilverstar 2009-08-22 06:50 pm (UTC)
I think this would give people an unwarranted feeling of safety. If it's public on the internet, it's OUT, and irresponsible or malicious search engines may well ignore robots.txt. The more control people have over robots.txt, the more they might think it matters.

To get a semblance of privacy, journal entries have to have controlled access. And even then, it's not totally reliable.