Added a setting to .htaccess to block access from bot traffic | そう備忘録 (Sou's Memorandum)


bot traffic

In a previous article, I wrote about a day when my site received a large amount of bot traffic.

I didn’t take action right away, but after that the same source sent several more rounds of access, so I added a setting to my .htaccess to block the bot traffic.

The reason I didn’t act immediately was that I had already read that this traffic arrives with a changing referrer address.

I thought that blocking specific referrers one by one would just turn into a game of cat and mouse, so I left it alone for a while.

However, now that I understand the pattern the referrers follow to some extent, I decided to exclude them with a regular expression in “.htaccess”.

How to check the referrer (referrer source)

I checked the referrers in Google Analytics.

  • Select Behavior > Site Content > All Pages from the menu.
  • Narrow the date range to the period in question.
  • Add “Source/Medium” as a secondary dimension.

It turns out that “bot-traffic.icu” is the referrer.

Checking the referrer

Main referrers

The referrers that I am aware of are as follows.

They all lead to the same site (I deliberately won’t post a screenshot of it).

  • bot-traffic.icu
  • bottraffic999.xyz
  • bottraffic143.xyz

Given this pattern, they will probably keep changing the number part (999, 143) when accessing the site.

They also seem to change the top-level domain (the icu and xyz parts).

In some cases there is a “-” (hyphen) between “bot” and “traffic”, and in other cases there is not.

Checking the referrers every time and adding each new one by hand would be tedious, so I decided to use a regular expression in “.htaccess” to specify the referrers to block.

How to specify exclusion

I added the following lines to the “.htaccess” file.

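# Block requests whose referrer looks like bot-traffic.icu, bottraffic999.xyz, bottraffic143.xyz, etc.
RewriteEngine on
RewriteCond %{HTTP_REFERER} bot(|-)traffic(|[0-9]{3})\.... [NC]
RewriteRule .* - [F,L]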

The RewriteEngine on line was already in my file, so I only added the RewriteCond and RewriteRule lines.
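To check which referrers the condition catches before uploading the file, the pattern can be tried out locally. The following is a minimal sketch in Python, assuming that Python's re module treats this pattern the same way as the PCRE engine Apache uses (true for the simple constructs here, but not guaranteed in general). The referrer strings, including the “https://” prefix, are illustrative values modelled on the ones listed above.

```python
import re

# The RewriteCond pattern from the .htaccess, compiled with Python's re.
# Assumption: re handles these simple constructs the same way as Apache's PCRE.
# re.IGNORECASE stands in for the [NC] flag.
BOT_REFERER = re.compile(r"bot(|-)traffic(|[0-9]{3})\....", re.IGNORECASE)

# Illustrative referrer values (the https:// prefix and trailing slash are assumed).
referrers = [
    "https://bot-traffic.icu/",
    "https://bottraffic999.xyz/",
    "https://bottraffic143.xyz/",
    "https://www.google.com/",
]

for ref in referrers:
    # search() rather than match(): the RewriteCond pattern is not anchored,
    # so it may match anywhere inside the referrer string.
    verdict = "blocked" if BOT_REFERER.search(ref) else "allowed"
    print(ref, "->", verdict)
```

The first three referrers are reported as blocked and the Google referrer as allowed.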

Adding RewriteCond and RewriteRule

Meaning of each parameter

RewriteCond

Specifies the condition that must match for the RewriteRule that follows it to be applied.

%{HTTP_REFERER}

The referrer of the request, that is, the value of the HTTP “Referer” header.

(|-)

A regular expression group that matches either nothing (an empty string) or “-” (hyphen).

“|” (pipe) separates alternatives (an OR condition).

“()” (parentheses) groups the alternatives together.

I chose this specification because there were both patterns, “bot-traffic” and “bottraffic”.
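A quick way to confirm what “(|-)” does is to compare it with “-?” (an optional hyphen), which is an equivalent and more common way of writing it. This Python sketch is only an illustration; the “-?” form is my alternative, not what the .htaccess above uses.

```python
import re

# "(|-)" is a group with two alternatives: the empty string, or "-".
# "-?" (an optional hyphen) expresses the same thing.
with_group = re.compile(r"bot(|-)traffic")
with_optional = re.compile(r"bot-?traffic")

for s in ["bot-traffic", "bottraffic", "bot_traffic"]:
    # fullmatch() requires the whole string to match the pattern.
    print(s, bool(with_group.fullmatch(s)), bool(with_optional.fullmatch(s)))
```

Both patterns accept “bot-traffic” and “bottraffic” and reject “bot_traffic”.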

(|[0-9]{3})

A regular expression that matches either nothing or a three-digit number.

  • [0-9]: one digit
  • {3}: exactly three of the preceding element

This covers the cases with no number, 999, and 143, but if the number of digits changes, the pattern will need to be adjusted.
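The sketch below shows which host names the number part accepts when it is followed by the dot, again using Python's re as a stand-in for Apache's regex engine. The two-digit and four-digit hosts are hypothetical examples of the “number of digits changes” case.

```python
import re

# "(|[0-9]{3})": either nothing, or exactly three digits,
# here followed by an escaped dot so the digits must end right before "."
number_part = re.compile(r"bottraffic(|[0-9]{3})\.")

hosts = [
    "bottraffic.xyz",      # no number
    "bottraffic999.xyz",   # three digits
    "bottraffic143.xyz",   # three digits
    "bottraffic12.xyz",    # hypothetical: two digits
    "bottraffic1234.xyz",  # hypothetical: four digits
]

for host in hosts:
    print(host, "match" if number_part.search(host) else "no match")
```

Only the first three match. If the digit count does start to vary, one possible relaxation (not tested against real traffic) is “[0-9]*”, which accepts any number of digits, including none.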

\.

“\” (backslash) escapes the character that follows it, so that character is treated literally rather than as a regular expression metacharacter.

“.” (dot) means any single character in a regular expression.

Here I want “.” (dot) to match only a literal dot, so I escape it.

The three dots that follow (“...”) match any three characters.

I used this because the top-level domain is icu in some cases and xyz in others.

If no new top-level domains appear in the future, (icu|xyz) would also work and is more specific.
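To see why the escape matters, and how the stricter “(icu|xyz)” variant would behave, the three patterns can be compared side by side. This is a Python sketch under the same assumption as above (Python's re standing in for Apache's PCRE); “bot-traffic.com” and “bot-trafficXicu” are made-up inputs chosen to show the differences.

```python
import re

escaped = re.compile(r"bot(|-)traffic(|[0-9]{3})\....")        # as in the .htaccess
unescaped = re.compile(r"bot(|-)traffic(|[0-9]{3})....")        # dot NOT escaped
tld_list = re.compile(r"bot(|-)traffic(|[0-9]{3})\.(icu|xyz)")  # stricter alternative

samples = ["bot-traffic.icu", "bottraffic999.xyz", "bot-traffic.com", "bot-trafficXicu"]

print("referrer", "escaped", "unescaped", "icu|xyz")
for ref in samples:
    print(ref,
          escaped.search(ref) is not None,
          unescaped.search(ref) is not None,
          tld_list.search(ref) is not None)
```

Without the backslash, the pattern also accepts “bot-trafficXicu”, and the “(icu|xyz)” version rejects a new top-level domain such as “.com”, which is the trade-off described above.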

[NC]

Makes the match case-insensitive (NC stands for “nocase”).
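In Python, the closest equivalent of [NC] is the re.IGNORECASE flag. A minimal sketch, with a made-up upper-case referrer:

```python
import re

pattern = r"bot(|-)traffic(|[0-9]{3})\...."
case_sensitive = re.compile(pattern)
nocase = re.compile(pattern, re.IGNORECASE)  # rough equivalent of [NC]

ref = "https://BOT-Traffic.ICU/"  # made-up upper-case referrer
print("without [NC]:", case_sensitive.search(ref) is not None)
print("with [NC]:", nocase.search(ref) is not None)
```

Without the flag, the upper-case referrer slips through; with it, the referrer is caught.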

RewriteRule

Specifies the URL rewriting rule (applied only when the RewriteCond above it matches).

.*

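The RewriteRule pattern is matched against the requested URL path, not the referrer. Since I want to block every request whose referrer matched the RewriteCond above, no matter which page is requested, I use a pattern that matches any string (zero or more characters).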

  • “.” (dot): any single character
  • “*” (asterisk): zero or more repetitions of the previous character

You could also write ^(.*)$, but it basically means the same thing.

  • “^” (caret): anchors the match to the beginning of the string
  • “$” (dollar): anchors the match to the end of the string
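The following Python sketch checks that “.*” and “^(.*)$” both match any path, including an empty one. The sample paths are hypothetical; in a .htaccess file, Apache roughly compares the pattern against the requested path relative to the directory containing the file, so the paths are written without a leading slash here.

```python
import re

# Hypothetical request paths as seen from a .htaccess file (no leading slash).
paths = ["", "index.html", "2020/05/some-article/", "wp-login.php"]

anything = re.compile(r".*")      # the form used in the .htaccess
anchored = re.compile(r"^(.*)$")  # the alternative spelling

for p in paths:
    # Both patterns succeed on every path; neither can fail to match.
    print(repr(p), anything.search(p) is not None, anchored.search(p) is not None)
```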

A “-” (hyphen) as the substitution means that no rewrite (URL conversion) is performed.

Since the goal is simply to refuse the bot traffic, there is no need to rewrite the URL, so “-” is specified.

[F]

Returns “403 Forbidden”.

This is what actually denies access to the bot traffic.

[L]

Stops processing, so no rules after this one are applied (L stands for “last”).

I specify it so that, even if I add more RewriteCond/RewriteRule lines below this rule in the future, they will not be applied to the bot traffic.

According to the Apache documentation, [F] already implies [L] (no further rules are evaluated), so L is not strictly necessary, but I included it to make the intent explicit.
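Putting the flags together, the behaviour of the rule can be modelled as a small function: if the referrer matches, the request ends with 403 and nothing after it is evaluated. This is a hypothetical Python model (handle_request is a name made up for illustration), not how Apache is implemented internally.

```python
import re

# Same pattern as the RewriteCond; re.IGNORECASE plays the role of [NC].
BOT_REFERER = re.compile(r"bot(|-)traffic(|[0-9]{3})\....", re.IGNORECASE)


def handle_request(path: str, referrer: str) -> int:
    """Hypothetical model of the rule set; returns the resulting HTTP status."""
    # RewriteRule .* - [F,L]: ".*" matches every path, so only the
    # RewriteCond on the referrer decides whether the rule fires.
    if BOT_REFERER.search(referrer) and re.search(r".*", path):
        # "-": the URL is not rewritten.
        # [F]: respond with 403 Forbidden.
        # [L]: stop here, so rules added below are never evaluated.
        return 403
    # Hypothetical: later rules would be evaluated here; normal response.
    return 200


print(handle_request("index.html", "https://bottraffic143.xyz/"))
print(handle_request("index.html", "https://www.google.com/"))
```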

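Restarting Apache

Apache reads .htaccess files on every request, so changes normally take effect without a restart, but after modifying and saving the .htaccess I restarted Apache anyway with the following command (Debian/Ubuntu).

sudo service apache2 restart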

That concludes this article.

Finally

I hope this article will be useful to someone somewhere.

souichirou kikuchi

I'm Japanese. I keep this blog as a memo of what I've done, in the hope that it will also help others who want to do similar things. I mainly write about LEGO, AWS (Amazon Web Services), WordPress, Deep Learning and Raspberry Pi. At work, I'm introducing collaborative robots and IoT in factories. I passed the JDLA (Japan Deep Learning Association) Deep Learning for GENERAL exam in July 2019. If you have any questions, please leave them in the comments at the bottom of the article.
