Elementary Spam Detection with Django

Author: daoosaigon. Published on Thursday 29 September 2011 at 12:48 p.m. Categories: Django. Tags: spam, moderation, filter, Django. Comments: 2

I leave my site alone for a few weeks and take a short holiday in the UK. When I come back, I find 20 spam comments on my blog. I'd better do something about that. Here's some code I've implemented to fight the problem.

The first step is to have the system email me whenever I receive a new comment. So I define a new ThePostModerator class that subclasses the existing CommentModerator class, like this. I also have to register ThePostModerator with ThePost, the class I use for blog posts.

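from django.contrib.comments.moderation import CommentModerator, moderator, AlreadyModerated

# ThePost is my blog post model; import it from wherever it is defined.

class ThePostModerator(CommentModerator):
    # Email the addresses in the MANAGERS setting when a new comment is posted.
    email_notification = True
    # Name of a boolean field on ThePost; comments are refused when it is False.
    enable_field = 'comments_permitted'

try:
    moderator.register(ThePost, ThePostModerator)
except AlreadyModerated:
    # This module was imported more than once; the model is already registered.
    pass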

However, there is one problem. Email notifications depend on a comment_notification_email.txt file in the templates/comments directory, and Django does not provide one by default. So you may get a missing template error like I did. This is a bug in Django, and is the subject of Bug #14646. So what do you do? Write your own comment_notification_email.txt file.

New comment on {{ content_object }} submitted on {{ comment.submit_date }}.

Commenter's name: {{ comment.name }}
Commenter's email: {{ comment.email }}
Commenter's URL: {{ comment.user_url }}

------------------------------------------
{{ comment.comment }}
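Django still has to be able to find that file. Here is a minimal sketch of one way to arrange it, assuming the filesystem template loader and the TEMPLATE_DIRS setting used by Django versions of this era. PROJECT_ROOT and the candidate_paths helper are hypothetical names for illustration only; the template name is the one CommentModerator asks the loader for.

```python
# settings.py fragment (a sketch, not a complete settings file)
import os

# Hypothetical project location; a real settings.py often derives this
# from the location of the settings file instead of hard-coding it.
PROJECT_ROOT = "/home/me/myblog"

# Directories the filesystem template loader searches, in order.
TEMPLATE_DIRS = (
    os.path.join(PROJECT_ROOT, "templates"),
)

# The template name CommentModerator requests when sending notifications.
NOTIFICATION_TEMPLATE = "comments/comment_notification_email.txt"


def candidate_paths(template_dirs, template_name=NOTIFICATION_TEMPLATE):
    """Hypothetical helper: list the files the loader would try, in order."""
    return [os.path.join(d, *template_name.split("/")) for d in template_dirs]
```

Calling candidate_paths(TEMPLATE_DIRS) lists where the file should be saved; placing it at the first path is enough for the missing template error to go away.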

The setup above lets me know instantly when new comments arrive, but does nothing to moderate them. So I write a moderate_comments routine like this, and use the receiver decorator to hook it up to the pre_save signal of the Comment class:

import re

from django.contrib.comments.models import Comment
from django.db.models.signals import pre_save
from django.dispatch import receiver

# Matches an opening anchor tag such as <a href=...>, in any letter case.
ahrefmatch = re.compile(r"<\s*a\s+[^>]*href", re.IGNORECASE)

emailshitlist = (
    "jksper6666@gmail.com",
    # Other email addresses...
)

urlshitlist = (
    "http://cheap-link-building.com/",
    # Other URLs...
)


@receiver(pre_save, sender=Comment)
def moderate_comments(sender, instance, **kwargs):
    """ Extra code. If any of the following happens:
        1. HTML anchor code in comment.
        2. BBCode equivalent "[url]".
        3. URLs in shitlist.
        4. Person in shitlist.

        Then comment is automatically placed in moderation.
    """
    if not instance.id:  # Only check when the comment is first saved.
        if ahrefmatch.search(instance.comment):
            instance.is_public = False
        elif "[url]" in instance.comment or "[URL]" in instance.comment:
            instance.is_public = False
        elif instance.user_url in urlshitlist:
            instance.is_public = False
        elif instance.user_email in emailshitlist:
            instance.is_public = False

Unlike most other blogs, my comments are pure text: I do not desire comments with HTML anchors, or BBCode equivalents. So anyone trying to pop them into my blog is likely to be a spammer.
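The same checks can be tried out without a running Django site. The sketch below is a plain-Python restatement under my own assumptions, not the code running on this blog: the needs_moderation function, its parameter names and both regular expressions are hypothetical. It also widens the BBCode test, because a literal "[url]" check does not match the [url=...] form used by the first comment on this post.

```python
import re

# Opening anchor tag with an href attribute, in any letter case.
ANCHOR_RE = re.compile(r"<\s*a\s+[^>]*href", re.IGNORECASE)

# BBCode link openers: both "[url]" and "[url=...]", in any letter case.
BBCODE_URL_RE = re.compile(r"\[url\s*[=\]]", re.IGNORECASE)


def needs_moderation(comment, user_url="", user_email="",
                     blocked_urls=(), blocked_emails=()):
    """Return True if a plain-text comment should be held for moderation."""
    if ANCHOR_RE.search(comment):
        return True
    if BBCODE_URL_RE.search(comment):
        return True
    if user_url in blocked_urls:
        return True
    if user_email in blocked_emails:
        return True
    return False


# The spam left as the first comment on this post is flagged by the BBCode rule.
print(needs_moderation("[url=http://wvrbyldp.com]Hello :)[/url]"))
```

Keeping the decision in a function that takes plain strings makes it easy to test new rules against real spam samples before wiring them into the signal handler.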

This is my idiosyncratic method for fighting spam. For Django programmers looking for more consistent spam-fighting mechanisms, see Patrick Beeson on how Akismet can make your blog less cluttered with rubbish. Have fun.

Comments

http://wvrbyldp.com/

1 SealTaicheHal
Thursday 13 October 2011 at 7:09 a.m.

[url=http://wvrbyldp.com]Hello :)[/url]

http://www.pkmurphy.com.au/blog/

2 Peter Murphy
Tuesday 18 October 2011 at 8:46 p.m.

Normally I would delete or edit spam such as the first comment above. But in this case, I'll take it as test data for my spam-fighting algorithm.
