mentors.debian.net / debexpo / Merge requests / !157

Spam detection

Merged Baptiste Beauplat requested to merge lyknode/debexpo:feature/registration-throttling into live Sep 19, 2020

Relates to #99 (closed).

Spam detection

This MR implements spam detection in two ways.

Time between when the registration form is requested and submitted

The following command shows the time spent on the registration page, in seconds between the GET and the POST from the same IP:

declare -A hash; (grep '/accounts/register' /var/log/apache2/access-debexpo.log /var/log/apache2/access-debexpo.log.1; zgrep '/accounts/register' /var/log/apache2/access-debexpo.log.*.gz) | sort -k 4 | cut -d ' ' -f 1,4,6 | sed -e 's/^[^:]*://g' | tr -d '["' | while read ip time method; do if [[ $method == "GET" ]]; then hash[${ip}]="${time}"; else echo $ip ${hash[${ip}]} $(( $(date +%s -d "$(echo "$time" | sed -e 's,/,-,g' -e 's,:, ,')") - $(date +%s -d "$(echo "${hash[${ip}]}" | sed -e 's,/,-,g' -e 's,:, ,')") )); fi; done | cut -d ' ' -f 3 | sort | uniq -c | sort -n

Output:

  COUNT SECONDS
      1 10
      1 11
      1 12
      1 2662
      1 27
      1 290
      1 34
      1 4
      1 48
      1 83
      1 95
      2 23
      2 3
      2 6
      2 7
     10 2
    154 1
   2256 0

We can see that most spammers spend less than 2 seconds before submitting the form: 2410 of the 2439 submissions above (about 99%) took 0 or 1 second. I manually compared these times with the activated accounts in the db, and it appears that a human spends at least 6-7 seconds (for the fastest), and usually closer to 10 to 20 seconds.
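As a rough check of how a time threshold would behave on this data, the sketch below copies the histogram from the output above and counts the submissions that fall at or under a given number of seconds. The function name `rejected_share` is made up for this illustration, and the count does not distinguish spammers from humans; it only restates the log figures.

```python
# Histogram copied from the output above: seconds on page -> number of submissions.
histogram = {
    0: 2256, 1: 154, 2: 10, 3: 2, 4: 1, 6: 2, 7: 2, 10: 1, 11: 1, 12: 1,
    23: 2, 27: 1, 34: 1, 48: 1, 83: 1, 95: 1, 290: 1, 2662: 1,
}


def rejected_share(histogram, min_elapsed):
    """Return (rejected, total) for submissions made within min_elapsed seconds."""
    total = sum(histogram.values())
    rejected = sum(
        count for seconds, count in histogram.items() if seconds <= min_elapsed
    )
    return rejected, total


rejected, total = rejected_share(histogram, 5)
print(f"{rejected}/{total} submissions rejected ({rejected / total:.1%})")
# prints: 2423/2439 submissions rejected (99.3%)
```

With a 5 second threshold, the 16 remaining submissions all took 6 seconds or more, which matches the manual review of human registrations.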

In this MR, when the form is requested, a timestamp is stored in the session (server-side, stored in db). On submission, the time elapsed since that timestamp must be greater than REGISTRATION_MIN_ELAPSED, which I set to 5 seconds by default.
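The check described above could look roughly like the following. This is a minimal sketch and not the code from the MR: a plain dict stands in for the server-side session, the key `registration_requested_at` and both function names are hypothetical, and rejecting a submission that has no stored timestamp is an assumption.

```python
import time

REGISTRATION_MIN_ELAPSED = 5  # seconds, the default given above


def on_form_request(session, now=None):
    """Record when the registration form was served (hypothetical helper)."""
    session["registration_requested_at"] = time.time() if now is None else now


def is_too_fast(session, now=None):
    """Return True if the form comes back too soon, or was never requested."""
    requested_at = session.get("registration_requested_at")
    if requested_at is None:
        # Assumption: a POST without a prior GET is treated as spam.
        return True
    now = time.time() if now is None else now
    # The elapsed time must be greater than the minimum to pass.
    return now - requested_at <= REGISTRATION_MIN_ELAPSED


session = {}
on_form_request(session, now=1000.0)
print(is_too_fast(session, now=1001.0))  # submitted after 1 second
print(is_too_fast(session, now=1012.0))  # submitted after 12 seconds
```

Keeping the timestamp server-side matters here: a value sent in a hidden form field could simply be forged by the client.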

Max number of registrations per IP

The following command shows the number of registration requests (POST) per IP and per day:

declare -A hash; (grep '/accounts/register' /var/log/apache2/access-debexpo.log /var/log/apache2/access-debexpo.log.1; zgrep '/accounts/register' /var/log/apache2/access-debexpo.log.*.gz) | grep POST | sort -k 4 |cut -d ' ' -f 1,4 | sed -e 's/^[^:]*://g' -e 's/:.*//g' |tr -d '["' | while read ip day; do echo $(echo $ip | sha256sum | cut -c 1-7) $day; done | sort | uniq -c | sort -n

Output (IP is hashed for privacy):

  COUNT    IP      DAY
      2 e194673 12/Sep/2020
      2 e194673 14/Sep/2020
      2 e194673 15/Sep/2020
      2 e194673 18/Sep/2020
      2 e85186a 06/Sep/2020
      2 ea292cf 06/Sep/2020
      2 ea292cf 09/Sep/2020
     39 a2f40e8 19/Sep/2020
     40 082b225 15/Sep/2020
     49 082b225 07/Sep/2020
     59 bea6192 19/Sep/2020
     62 a2f40e8 12/Sep/2020
    105 a2f40e8 18/Sep/2020
    112 bea6192 12/Sep/2020
    143 bea6192 18/Sep/2020
    163 a2f40e8 11/Sep/2020
    226 a2f40e8 07/Sep/2020
    247 a2f40e8 15/Sep/2020
    298 bea6192 11/Sep/2020
    363 bea6192 15/Sep/2020
    409 bea6192 07/Sep/2020

In this MR, each IP is stored (hashed for privacy) for REGISTRATION_CACHE_TIMEOUT (defaults to 1 day). Each time a registration is processed, the cached counter for that IP is incremented; once it reaches the limit defined by REGISTRATION_PER_IP, further registrations from that IP are rejected.

Both measures should prevent most spammers from using mentors while having no effect on human users. If a single IP legitimately needs to register more than 5 accounts a day, contacting support would be enough to have the settings tweaked.

Spam detection can be disabled by setting REGISTRATION_SPAM_DETECTION = False.

Source branch: feature/registration-throttling