Elementary Spam Detection with Django
I leave my site alone for a few weeks and take a short holiday in the UK. When I come back, I find 20 spam comments on my blog. I'd better do something about that. Here's some code I've implemented to fight the problem.
The first step is to have the system email me whenever I receive a new comment. So I define a new ThePostModerator class that subclasses the existing CommentModerator class, like this. I also have to register ThePostModerator with ThePost, the class I use for blog posts.
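from django.contrib.comments.moderation import CommentModerator, moderator, AlreadyModerated

class ThePostModerator(CommentModerator):
    # Email site staff whenever a new comment survives moderation.
    email_notification = True
    # Name of the BooleanField on ThePost that switches comments on or off.
    enable_field = 'comments_permitted'

try:
    moderator.register(ThePost, ThePostModerator)
except AlreadyModerated:
    # ThePost was already registered, e.g. because this module was imported twice.
    pass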
However, there is one problem. Email notifications depend on a comment_notification_email.txt file in the templates/comments directory, which is not provided by default, so you may get a missing-template error like I did. This is a bug in Django, and is the subject of Bug #14646. So what do you do? Write your own comment_notification_email.txt file:
New comment on {{ content_object }} submitted on {{ comment.submit_date }}.
Commenter's name: {{ comment.name }}
Commenter's email: {{ comment.email }}
Commenter's URL: {{ comment.user_url }}
------------------------------------------
{{ comment.comment }}
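If you would rather script that step than create the file by hand, here is a minimal sketch. It assumes your project keeps its templates in a single directory that you pass in; the helper name write_notification_template and the TEMPLATE_BODY constant are my own inventions for illustration, not part of Django.

```python
from pathlib import Path

# Body of the notification template, as listed above.
TEMPLATE_BODY = """New comment on {{ content_object }} submitted on {{ comment.submit_date }}.
Commenter's name: {{ comment.name }}
Commenter's email: {{ comment.email }}
Commenter's URL: {{ comment.user_url }}
------------------------------------------
{{ comment.comment }}
"""


def write_notification_template(templates_root):
    """Create comments/comment_notification_email.txt under templates_root.

    Hypothetical helper: templates_root is assumed to be a directory that
    your Django template settings already search. An existing file is left
    untouched so that local edits are not overwritten.
    """
    target = Path(templates_root) / "comments" / "comment_notification_email.txt"
    target.parent.mkdir(parents=True, exist_ok=True)
    if not target.exists():
        target.write_text(TEMPLATE_BODY, encoding="utf-8")
    return target
```

Calling write_notification_template("templates") from the project root would then create templates/comments/comment_notification_email.txt if it is missing.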
The notification lets me know instantly when new comments arrive, but it does nothing about moderating them. So I write a moderate_comments routine like this, and attach a receiver decorator to hook it up to the Comment class's pre_save signal:
import re

from django.contrib.comments.models import Comment
from django.db.models.signals import pre_save
from django.dispatch import receiver

# Matches an opening HTML anchor tag such as <a href=...>, in any letter case.
ahrefmatch = re.compile(r"<\s*a\s+href", re.IGNORECASE)

emailshitlist = (
    "jksper6666@gmail.com",
    # Other email addresses...
)
urlshitlist = (
    "http://cheap-link-building.com/",
    # Other URLs...
)

@receiver(pre_save, sender=Comment)
def moderate_comments(sender, instance, **kwargs):
    """Extra code. If any of the following happens:
    1. HTML anchor code in comment.
    2. BBCode equivalent "[url]".
    3. URLs in shitlist.
    4. Person in shitlist.
    Then comment is automatically placed in moderation.
    """
    if not instance.id:  # Only check when the comment is first saved.
        if ahrefmatch.search(instance.comment):
            instance.is_public = False
        elif "[url]" in instance.comment or "[URL]" in instance.comment:
            instance.is_public = False
        elif instance.user_url in urlshitlist:
            instance.is_public = False
        elif instance.user_email in emailshitlist:
            instance.is_public = False
Unlike most other blogs, my comments are pure text: I do not desire comments with HTML anchors, or BBCode equivalents. So anyone trying to pop them into my blog is likely to be a spammer.
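The checks above are tied to Django's signal machinery, which makes them awkward to try out on their own. Here is a minimal sketch that pulls the same rules into a plain function you can run without Django. The names looks_like_spam, ANCHOR_RE and BBCODE_URL_RE, and the reason strings it returns, are mine for illustration and not anything Django provides. One deliberate difference, which is my assumption about what is wanted: the BBCode pattern also accepts the [url=...] form, which a plain "[url]" substring test does not match.

```python
import re

# Opening HTML anchor tag such as <a href=...>, in any letter case.
ANCHOR_RE = re.compile(r"<\s*a\s+href", re.IGNORECASE)
# BBCode link tag: both "[url]" and "[url=http://...]" forms (assumption:
# the second form should be caught too).
BBCODE_URL_RE = re.compile(r"\[url(\]|=)", re.IGNORECASE)


def looks_like_spam(comment, user_url="", user_email="",
                    blocked_urls=(), blocked_emails=()):
    """Return a short reason string if the comment should be held, else None.

    Hypothetical helper mirroring the four rules in moderate_comments.
    """
    if ANCHOR_RE.search(comment):
        return "html-anchor"
    if BBCODE_URL_RE.search(comment):
        return "bbcode-url"
    if user_url in blocked_urls:
        return "blocked-url"
    if user_email in blocked_emails:
        return "blocked-email"
    return None
```

Inside the pre_save handler, any result other than None would translate to instance.is_public = False.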
This is my idiosyncratic method for fighting spam. For Django programmers looking for more consistent spam-fighting mechanisms, see Patrick Beeson on how Akismet can make your blog less cluttered with rubbish. Have fun.
Comments
1 SealTaicheHal
Thursday 13 October 2011 at 7:09 a.m.
[url=http://wvrbyldp.com]Hello :)[/url]
2
Tuesday 18 October 2011 at 8:46 p.m.
Normally I would delete or edit spam such as the first comment above. But in this case, I'll take it as test data for my spam-fighting algorithm.