dw_suggestions | Sitewide antispam capability: comment CAPTCHAs for inactive accounts

Title:
Sitewide antispam capability: comment CAPTCHAs for inactive accounts

Area:
comments, anonymous users, inactive users, antispam

Summary:
When an account becomes inactive (discussion of what constitutes "inactive" for the purposes of this concept to follow), require anyone commenting anonymously in its journal to fill out a CAPTCHA. If/when the account becomes active again, revert to the user's settings. This would not delete anything already in the journal, would not stop logged-in users from commenting, and would allow anonymous users who could solve the CAPTCHA to comment.

Description:
Spun off the comments on http://dw-suggestions.dreamwidth.org/1374810.html --

Spam comments are a woe that should be discouraged, while not discouraging comments of legitimate discourse from real sentient beings.

Most comment spam on Dreamwidth is anonymous. One of the places that spammers strike is the accounts of people who have become inactive. If someone's not around for any particular reason, they generally can't get rid of any spam that shows up in their journal. Emboldened by the way their first overtures have not been repelled or cleaned up, the spammer strikes again, and again, and again.

Anonymous spam is present until cleaned up by the journal owner (or someone logged in as them). Registered user/OpenID spam is only present until someone (someone else hit by the spammer, or a good neighbor) reports the spammer and the spammer is suspended; all of the spam comments left by that user will then go away across the site.

For this reason, it is more important to attempt to repel anonymous spam in the event that the journal owner is not around and therefore not able to take action.

When the journal owner becomes inactive, and the journal allows anonymous comments, and the journal owner does not already present anonymous commenters with a CAPTCHA, and anonymous comments are not screened by default, then there should be a sitewide setting to put up a CAPTCHA on attempted anonymous comments to those journals. (Screened journals are excluded because screening leaves the anonymous comments invisible to search engines unless the journal owner comes through and unscreens them; by that time the owner is active again, and the owner is hardly going to unscreen spam on purpose unless there's a bigger problem.)
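To make the combination of conditions concrete, here is a minimal sketch in Python. It is not Dreamwidth code; `JournalSettings`, its field names, and `sitewide_captcha_applies` are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class JournalSettings:
    """Hypothetical snapshot of the journal state the proposal cares about."""
    owner_inactive: bool               # per whatever definition of "inactive" is adopted
    allows_anonymous_comments: bool
    owner_requires_anon_captcha: bool  # owner already shows a CAPTCHA to anonymous commenters
    screens_anonymous_comments: bool   # anonymous comments are screened by default


def sitewide_captcha_applies(journal: JournalSettings,
                             commenter_is_anonymous: bool,
                             sitewide_enabled: bool = True) -> bool:
    """Return True when the sitewide setting would add a CAPTCHA.

    Logged-in commenters are never affected, and journals that already
    use a CAPTCHA or screening for anonymous comments are left alone.
    """
    if not sitewide_enabled or not commenter_is_anonymous:
        return False
    return (journal.owner_inactive
            and journal.allows_anonymous_comments
            and not journal.owner_requires_anon_captcha
            and not journal.screens_anonymous_comments)
```

When the owner becomes active again, `owner_inactive` goes back to false and the function returns False, so the journal falls back to the owner's own settings without anything having been overwritten.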

Now for the definition of "inactive", which here means the owner is probably not actively gardening journal comments. This should be something that can be adjusted on the administrative end of things should it not be got right on the first try. As a first attempt:

No new or edited entries in personal journal
No new or edited community entries? (Can we track this?)
No new or edited comments (from the journal, not to the journal, either in their own journal or abroad) (can we track this?)
No active login sessions

... for at least 60 days? Doing anything that touches one of the above things would start the clock over again. If someone logs in, leaves a comment in a community, deletes the comment, and logs out, that would restart the clock. If someone leaves themselves logged in after doing that, they would have until that login automatically expires before the clock starts.
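The clock described above could be sketched roughly as follows, again in Python and not as Dreamwidth code. `AccountActivity`, its fields, and both functions are hypothetical. The sketch assumes each kind of activity is available as a timestamp, which the list above flags as an open question, and it assumes an account with no recorded activity at all counts as inactive.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Brainstorming number from the suggestion; meant to be adjustable by admins.
INACTIVITY_THRESHOLD = timedelta(days=60)


@dataclass
class AccountActivity:
    """Hypothetical per-account timestamps; None means 'never'."""
    last_entry_activity: Optional[datetime]            # new or edited personal entry
    last_community_entry_activity: Optional[datetime]  # new or edited community entry
    last_comment_activity: Optional[datetime]          # comment posted, edited, or deleted by the account
    last_session_expiry: Optional[datetime]            # when the latest login session expires or expired


def clock_start(activity: AccountActivity) -> Optional[datetime]:
    """The clock starts at the most recent of all tracked events."""
    stamps = [t for t in (activity.last_entry_activity,
                          activity.last_community_entry_activity,
                          activity.last_comment_activity,
                          activity.last_session_expiry) if t is not None]
    return max(stamps) if stamps else None


def is_inactive(activity: AccountActivity, now: datetime,
                threshold: timedelta = INACTIVITY_THRESHOLD) -> bool:
    start = clock_start(activity)
    if start is None:
        return True  # assumption: nothing on record counts as inactive
    # A session that has not expired yet puts the start in the future,
    # so the difference is negative and the account still counts as active.
    return now - start >= threshold
```

Because the session expiry is treated as one more timestamp, someone who stays logged in gets until that login expires before the clock starts, as described above.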

Poll #11570 Sitewide antispam capability: comment CAPTCHAs for inactive accounts

Open to: Registered Users, detailed results viewable to: All, participants: 54

This suggestion:

Should be implemented as-is.
39 (72.2%)

Should be implemented with changes. (please comment)
8 (14.8%)

Shouldn't be implemented.
0 (0.0%)

(I have no opinion)
7 (13.0%)

(Other: please comment)
0 (0.0%)

From /stats

Total Accounts: 1712622
That are active in some way: 75715
That have ever posted an entry: 160079
That have posted an entry in last 30 days: 33277
That have posted an entry in the last 7 days: 18548
That have posted an entry in the last 24 hours: 7767

For the sake of simplicity, whatever the definition is of "active in some way" should be the definition for this as well: any account not active in some way counts as inactive.

So, IIRC, I count as active because I check my reading page and comment even though I've not posted for nearly a year, but someone who hasn't logged in in a period of time gets set inactive.

FWIW, that's what I had in mind when I said "inactive" in the comment thread you link to.

Seems like a good idea in principle. I would be in favour of notifying the user somehow (email? no point in an inbox notification if they're not using DW) when they become "inactive", on the principle of not giving people captchas when they have asked not to have them without explaining why. But, the notification might be perceived as spammy...

If we already have the last login date easily accessible in the database, would this be an acceptable way to determine whether somebody is active? I have little idea what I'm talking about here, but I imagine that it would be easier to implement (but a counter such as you describe might also be easy. I'll leave that to the people who have a clue ;-))

matgb is correct above: there's already an activity check for determining "active" for a bunch of things. :)

I said "with changes" only because 60 days seems light, and IME this type of comment spam tends toward entries that are at least six months old, if not older. 120 days seems a more reasonable length of time.

This. I would have said three months at least.
Also +1 to notifying the user that apparent inactivity status triggered the turning on of CAPTCHA tests for anonymous comments (if they're allowed).

My reason for specifying 60 days from the time of last active login as the initial brainstorming number is because the last information I had on leave-me-logged-in timeouts was that they expire 60 days from last account activity.

YES.

Spam comments are a woe that should be discouraged, while not discouraging comments of legitimate discourse from real sentient beings.

I don't agree that captchae serve the purpose outlined above. Specifically, there is some evidence that these tests do discourage comments from actual humans. Furthermore, the original poster appears to be claiming that humans who cannot solve such tests are either unreal or not thinking. Are the visually-impaired to be forever disadvantaged?

From the tenor of this and other recent suggestions, Dreamwidth seems to treat captchae as a magic bullet, something that will prevent spam in all its forms. Are there no other tools in the arsenal?

There's a question of social expectations. Dreamwidth has presented itself as a company that honours its contracts, and one that will not change policies without evidence. It would be wrong to present a captcha in any circumstance where the account holder has chosen against them.

Can the original poster provide evidence that this is a widespread problem? That the harm from spam is greater than the harm from overturning deliberate journal decisions? Absent such evidence, I could not support the proposal in any form.

The original suggester is the head of the antispam team, just for the record. :)

Spam is definitely a medium problem now, and if you look at similar services, it's a huge problem there: unless a service is diligent and vigilant about the issue as aggressively as possible, it becomes a major spam target and once that happens, it's a lot harder to address. (Look at InsaneJournal, for instance; I don't know what percentage of their activity is spam, but judging by comments to their news posts, the results of their random journal search, and the fact that whenever you load their stats page, you have pretty good chances of hitting over half spammers in the "recently created" and "recently updated" sections, I'd venture a guess of anywhere from 50-75% of their activity is spam no matter how hard they try to squash it.) The only way to keep a service from being a major spam magnet is to diligently and vehemently address even the smallest bits of spam and prove to spam networks that spending time trying to spam the service is not a good return on investment.

DW's default CAPTCHA implementation is no longer image-based, by the way; individual journal owners can choose to switch back, but by default site-wide, we use text-based captchas with a significantly lower false-block rate and a much, much higher standard of accessibility. They also have the advantage of being more resistant to the proxying attack that's the standard way of breaking captchas these days.

My preference is always to respond to the suggestion as it is presented. It wasn't at all obvious that Azure Lunatic was anything other than an interested observer.

Thank you also for clarifying the change in captcha format. I very much doubt that data exists to show how many (or few) humans are deterred from interacting by this form of test, or how many humans are deterred by the presence of any test.

There is constant research going into accessible captcha alternatives (as you know), and I would definitely like us to be looking into them at least once a year or so, whether we implement this suggestion or not. Just periodic checkups to make sure that there isn't something usable which is better than what we have. I keep a folder of them but tend not to spend the resources looking into them -- I suppose that is kind of my job. ;-)

Oh, absolutely. Now that we have the framework for alternate captcha implementations (and soon, hopefully, we will have a viewer preference as well as a by-journal preference) it should be easy to toss in more if somebody comes up with something better.

I should also add, we have been very clear and up-front from the very beginning that there will be times when we need to place restrictions or make alterations to the site as a whole for the benefit of the service as a whole, and preventing the site from being overrun by spammers is definitely something I think qualifies.

Also: do you really think that someone who hasn't been active on the site in a year or whatever is going to be that upset over the service requiring commenters to solve a captcha in order to comment, when on multiple other services of similar reach and remit, not being active for a year is enough to get your entire account deleted?

That statement was meant as an ideal case, and meant to include all humans of every ability type and level (excluding those posting spam as piecework), and any nonhuman sentients who are also not spammers. I think CAPTCHAs are a tool that is far from ideal. I'm sorry I was not more clear about my feelings on this in the original entry.

The best way of telling spam from real comments is always going to be educating the journal owner about the many forms spam can take, and allowing the journal owner to report to the antispam team the comments that are spam. One of the more insidious forms of "test spam" is the irrelevant comment, which is often a comment that was probably written as part of legitimate discourse elsewhere, and then copied into a spammer's arsenal and blasted out to see where it would stick. It of course makes little to no sense when commented to the entry under attack, and the journal owner can generally tell, and tends to report or ask about it. An irrelevant comment could possibly fool spamwhackers who don't have the same knowledge about the normal flow of commenting on that particular journal, and a comment that makes no sense to the average spamwhacker might make perfect sense to the journal owner. (For example, if I were to yell "GET ON MY HORSE" at zarhooie in an entry that has nothing to do with horses, she and I would understand what we were talking about, but we wouldn't expect anyone else to understand.)

In the absence of the journal owner actively weeding spam comments from legitimate comments, a journal that allows all anonymous comments is a spam magnet. And currently Dreamwidth's methods of fighting this are inadequate, and I don't know exactly how inadequate because that spam is not being reported to me until the owners come back. Every now and then there's a lump of incoming reports that include comments made quite some time ago, often including spam sources that were already addressed quite some time ago, but usually including some that we've never seen before. This pattern seems to be what happens when someone comes back from a hiatus and cleans up spam that was left in their absence. Cleaning up a spammed-up journal is an onerous task that I would not wish on anyone.

There are some things that I considered (however briefly) and then discarded; there are other things that I'll be discussing further with staff.

Allowing a site administrator to delete anonymous comments off an inactive journal, even comments that are obvious spam, horrifies me in a visceral way that I'm not entirely sure how to describe. It would be a very direct way of dealing with already left spam, but it would break every expectation of privacy and autonomy, would be a crappy task to carry out and would not scale well, would run a very real risk of deleting legitimate comments (and deleting legitimate comments already in place is something I consider as worse than leaving spam in place or frustrating a non-site-member or logged-out site member trying to leave an anonymous comment), and I also don't know whether having an administrator doing that would expose Dreamwidth to additional legal risk from spam that was not spotted and removed.

Turning off anonymous comments entirely would certainly block anonymous spammers, but would also block any and all legitimate anonymous comments.

Automatically screening all future anonymous comments would screen anonymous spam, but would screen any and all legitimate anonymous comments. This fails what I have started to call "the Jumpy test", for journals that still may receive comments, but whose owners will never return.

LiveJournal has a dedicated system that scans every incoming comment against a number of criteria to determine if any given comment is bad enough to be actively blocked, or merely suspicious enough to be automatically screened. Implementing a similar system would need to first be discussed with staff before I would put it through suggestions.

Presenting CAPTCHAs to logged-in users commenting identified just because the journal they are commenting to has become inactive would not serve any useful purpose that I could think of, and would (at minimum) annoy the logged-in user. This proposal is only about anonymous comments.

Boiled down to basics, the original poster wants to test that the person making the comment is a person, and not a robospammer. Current technology does not permit a direct test, so proxy tests have to be used, one of which is the captcha. Proxy tests are poor and imperfect.

I don't disagree that Dreamwidth should make reasonable efforts to prevent spam posts from appearing on its servers. The original poster makes good points that the specific circumstances (anonymous comments on inactive journals) warrant some action.

It remains unclear that the circumstances outlined in the suggestion are particularly common. I'm not familiar enough with the Dreamwidth defaults to know if no-captcha-for-anon-comments can arise by accepting defaults. If that is the default case, I'm somewhat easier about Dreamwidth making minor changes to site behaviour than if the owner has consciously chosen not to show a test.

On further reflection, and acknowledging the evil nature of captchae, I would not raise tremendous objections to this as an interim resolution, pending a better (and fully automated) internal spam-check process.

There would need to be publicity about this, possibly including a message to the effect of "You're seeing this test because the journal owner hasn't logged in for some months." Such a message might also prod the owner to log on again.

I believe the default is in fact no anonymous comments allowed at all.

I agree with using the existing definition of "active in some way", and emailing the user to notify them of the change and that logging in or posting a comment or entry will turn it back off.
