Practical Methods for Blocking the YisouSpider Crawler on Nginx and Apache

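This evening a reader came to Lao Jiang with a problem: his newly built blog had very little traffic, yet pages loaded extremely slowly and needed repeated refreshing before they would open. The site runs on an Alibaba Cloud (Aliyun) VPS, which should not be that slow, so Lao Jiang offered to look into it. We first checked the VPS load with the top command, and CPU usage was indeed high: php-fpm was taking more than 60% of the CPU. The next step was to check the access logs.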


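What did the logs show? A large number of requests from YisouSpider. This spider is of little use to most sites: it is the crawler for Yisou video search (一搜视频) and Shenma Search (神马搜索), both under Alibaba, and it accounts for only a tiny share of search traffic, so we can block it. The post published two days ago, "通过设定.htaccess和nginx.conf禁止恶意User Agent网页爬虫" (blocking malicious User Agent crawlers via .htaccess and nginx.conf), covers the general approach. Here Lao Jiang deals with YisouSpider specifically.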
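To confirm which client is behind the load, it can help to tally requests per User-Agent. The following is a minimal Python sketch, assuming the access log uses the common "combined" format, where the User-Agent is the last double-quoted field on each line. The function name count_user_agents and the sample lines are made up for illustration and do not come from the original logs.

```python
import re
from collections import Counter

# Assumption: "combined" log format, where the User-Agent is the last
# double-quoted field on each line.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')


def count_user_agents(lines):
    """Return a Counter mapping each User-Agent string to its request count."""
    counts = Counter()
    for line in lines:
        match = UA_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Made-up sample lines using documentation-range IPs, not real log data.
sample = [
    '192.0.2.10 - - [01/Jan/2016:20:00:01 +0800] "GET / HTTP/1.1" 200 512 "-" "YisouSpider"',
    '192.0.2.11 - - [01/Jan/2016:20:00:02 +0800] "GET /a HTTP/1.1" 200 512 "-" "YisouSpider"',
    '198.51.100.7 - - [01/Jan/2016:20:00:03 +0800] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]

# Print the busiest User-Agents first.
for agent, hits in count_user_agents(sample).most_common():
    print(hits, agent)
```

On a real server, an open log file can be passed in place of sample, since the function only needs an iterable of lines.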

First: Nginx sites

On Nginx this is fairly simple: add the following snippet to the current site's .conf file, inside its server block.

if ($http_user_agent ~* "YisouSpider") {
    return 403;
}

After adding it, restart Nginx. From then on, any request whose User-Agent contains YisouSpider gets a 403 response.
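One way to check that the rule is active is to send a request with a spoofed User-Agent and look at the status code. The sketch below uses only the Python standard library. The helper name status_for_user_agent is hypothetical, and the URL in the usage comment is a placeholder to replace with your own site.

```python
import urllib.error
import urllib.request


def status_for_user_agent(url, user_agent, timeout=10):
    """Request url with the given User-Agent and return the HTTP status code."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises HTTPError for 4xx/5xx responses; the code is what we want.
        return error.code


# Usage (placeholder URL, replace with your own site):
#   status_for_user_agent("http://example.com/", "YisouSpider")
#   status_for_user_agent("http://example.com/", "Mozilla/5.0")
```

If the rule is working, the first call should return 403 while the second returns whatever a normal browser request gets.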

Second: Apache sites

This is also simple: in the .htaccess file in the site's root directory, deny the IP addresses that YisouSpider visits from.

Order allow,deny
Allow from all
Deny from 42.156.139.54

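The IP address on the Deny line is a YisouSpider address to block. Lao Jiang has collected most of the YisouSpider addresses seen so far, and we add all of them in the same way. Note that Order, Allow, and Deny are Apache 2.2 directives; on Apache 2.4 they work only when mod_access_compat is loaded.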

Deny from 42.156.136.95
Deny from 42.156.137.67
Deny from 42.156.137.62
Deny from 42.156.137.63
Deny from 42.156.137.60
Deny from 42.156.138.113
Deny from 42.120.161.37
Deny from 42.120.161.32
Deny from 42.120.161.65
Deny from 42.156.139.95
Deny from 42.156.136.113
Deny from 42.156.136.44
Deny from 42.156.136.42
Deny from 42.156.136.40
Deny from 42.156.139.51
Deny from 42.156.139.53
Deny from 42.156.139.55
Deny from 42.156.139.54
Deny from 42.156.139.57
Deny from 42.156.138.63
Deny from 42.156.138.62
Deny from 42.156.138.61
Deny from 42.156.138.60
Deny from 42.156.138.67
Deny from 42.120.161.62
Deny from 42.156.138.65
Deny from 42.120.160.67
Deny from 42.120.161.95
Deny from 42.120.160.113
Deny from 42.120.160.63
Deny from 42.120.160.62
Deny from 42.120.160.61
Deny from 42.120.160.60
Deny from 42.156.138.81
Deny from 42.156.138.80
Deny from 42.156.138.83
Deny from 42.156.138.82
Deny from 42.156.136.23
Deny from 42.156.137.53
Deny from 42.156.137.51
Deny from 42.156.137.57
Deny from 42.156.138.104
Deny from 42.156.139.5
Deny from 42.156.139.4
Deny from 42.120.161.23
Deny from 42.120.161.24
Deny from 42.156.136.107
Deny from 42.156.137.61
Deny from 42.156.136.50
Deny from 42.156.136.51
Deny from 42.156.136.53
Deny from 42.156.136.54
Deny from 42.156.136.55
Deny from 42.156.136.57
Deny from 42.156.139.42
Deny from 42.156.138.75
Deny from 42.156.139.40
Deny from 42.156.138.77
Deny from 42.156.138.74
Deny from 42.120.160.74
Deny from 42.120.160.75
Deny from 42.120.160.109
Deny from 42.120.160.77
Deny from 42.156.137.40
Deny from 42.156.138.95
Deny from 42.120.160.16
Deny from 42.120.160.17
Deny from 42.120.160.15
Deny from 42.120.160.13
Deny from 42.120.160.10
Deny from 42.120.160.28
Deny from 42.156.139.77
Deny from 42.120.161.57
Deny from 42.156.139.74
Deny from 42.120.161.51
Deny from 42.120.161.53
Deny from 42.120.161.18
Deny from 42.156.136.83
Deny from 42.156.136.27
Deny from 42.120.160.81
Deny from 42.120.160.80
Deny from 42.120.160.83
Deny from 42.120.160.82
Deny from 42.156.136.28
Deny from 42.120.161.63
Deny from 42.156.137.32
Deny from 42.156.137.37
Deny from 42.120.160.23
Deny from 42.156.136.82
Deny from 42.156.136.81
Deny from 42.156.136.80
Deny from 42.120.160.24
Deny from 42.156.136.4
Deny from 42.156.136.5
Deny from 42.156.136.2
Deny from 42.120.160.9
Deny from 42.120.160.4
Deny from 42.120.160.5
Deny from 42.120.160.2
Deny from 42.156.139.60
Deny from 42.156.139.61
Deny from 42.156.139.62
Deny from 42.156.139.63
Deny from 42.120.161.42
Deny from 42.120.161.40
Deny from 42.156.139.67
Deny from 42.156.138.20
Deny from 42.156.138.16
Deny from 42.156.138.17
Deny from 42.156.138.15
Deny from 42.156.136.37
Deny from 42.120.160.95
Deny from 42.156.136.32
Deny from 42.156.136.31
Deny from 42.156.136.24
Deny from 42.156.137.23
Deny from 42.156.137.20
Deny from 42.120.160.37
Deny from 42.120.160.32
Deny from 42.156.137.28
Deny from 42.156.139.82
Deny from 42.156.139.83
Deny from 42.156.139.80
Deny from 42.156.139.81
Deny from 42.120.161.4
Deny from 42.156.138.107
Deny from 42.156.138.24
Deny from 42.156.138.23
Deny from 42.120.161.74
Deny from 42.120.160.19
Deny from 42.156.138.28
Deny from 42.156.139.15
Deny from 42.156.139.17
Deny from 42.156.139.19
Deny from 42.156.139.18
Deny from 42.120.160.40
Deny from 42.120.160.42
Deny from 42.120.161.20
Deny from 42.156.138.19
Deny from 42.120.161.68
Deny from 42.156.137.2
Deny from 42.120.161.66
Deny from 42.120.161.67
Deny from 42.120.161.60
Deny from 42.120.161.61
Deny from 42.156.137.5
Deny from 42.156.137.4
Deny from 42.156.138.32
Deny from 42.156.138.68
Deny from 42.156.136.18
Deny from 42.156.136.19
Deny from 42.156.138.37
Deny from 42.156.136.15
Deny from 42.156.136.16
Deny from 42.156.136.17
Deny from 42.156.136.13
Deny from 42.156.139.28
Deny from 42.120.160.53
Deny from 42.120.160.50
Deny from 42.120.160.51
Deny from 42.120.161.5
Deny from 42.120.160.57
Deny from 42.120.160.55
Deny from 42.120.161.9
Deny from 42.120.161.19
Deny from 42.156.139.107
Deny from 42.156.139.22
Deny from 42.120.161.15
Deny from 42.120.161.17
Deny from 42.156.137.24
Deny from 42.120.161.83
Deny from 42.156.136.61
Deny from 42.156.136.60
Deny from 42.156.136.63
Deny from 42.156.136.62
Deny from 42.156.136.67
Deny from 42.156.137.107
Deny from 42.156.139.32
Deny from 42.156.139.31
Deny from 42.120.161.81
Deny from 42.156.139.37
Deny from 42.156.138.40
Deny from 42.156.137.19
Deny from 42.156.138.42
Deny from 42.156.137.17
Deny from 42.156.137.15
Deny from 42.156.137.13
Deny from 42.156.137.10
Deny from 42.120.161.55
Deny from 42.120.161.113
Deny from 42.120.161.77
Deny from 42.156.137.74
Deny from 42.156.137.77
Deny from 42.156.137.95
Deny from 42.156.138.2
Deny from 42.156.138.4
Deny from 42.156.138.5
Deny from 42.156.139.113
Deny from 42.120.160.20
Deny from 42.156.137.80
Deny from 42.156.137.81
Deny from 42.156.137.82
Deny from 42.156.137.83
Deny from 42.156.137.113
Deny from 42.156.139.71
Deny from 42.156.139.24
Deny from 42.156.139.20
Deny from 42.156.136.77
Deny from 42.156.136.74
Deny from 42.156.139.23
Deny from 42.120.161.82
Deny from 42.156.138.57
Deny from 42.120.161.80
Deny from 42.156.138.55
Deny from 42.156.138.53
Deny from 42.156.138.50
Deny from 42.156.138.51

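This solves most of the problem. If any addresses slip through, find them in the log files and add them to the list. Note that if the site sits behind a CDN, the address recorded in the logs is the CDN node's address, not the spider's, so the real client IP has to be obtained first, for example from logs taken while the CDN is bypassed.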
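Hunting for stragglers in the logs can be partly automated. The following Python sketch assumes a "combined" format log in which the first field is the client address and the last quoted field is the User-Agent. Behind a CDN that first field would be the CDN node, so the sketch only makes sense on logs that record the real client IP. The function name deny_lines_for_spider and the sample lines are invented for illustration.

```python
import re

# Assumption: first field is the client IP, last quoted field is the User-Agent.
LOG_PATTERN = re.compile(r'^(\S+) .*"([^"]*)"\s*$')


def deny_lines_for_spider(log_lines, already_blocked=(), marker="yisouspider"):
    """Return 'Deny from' lines for spider IPs that are not blocked yet."""
    blocked = set(already_blocked)
    new_ips = set()
    for line in log_lines:
        match = LOG_PATTERN.match(line)
        if not match:
            continue
        ip, user_agent = match.groups()
        # Case-insensitive substring match on the User-Agent.
        if marker in user_agent.lower() and ip not in blocked:
            new_ips.add(ip)
    # Plain string sort: stable output, not numeric IP order.
    return ["Deny from " + ip for ip in sorted(new_ips)]


# Made-up sample lines, not real log data.
sample = [
    '42.156.139.54 - - [01/Jan/2016:20:00:01 +0800] "GET / HTTP/1.1" 200 512 "-" "YisouSpider"',
    '192.0.2.10 - - [01/Jan/2016:20:00:02 +0800] "GET /a HTTP/1.1" 200 512 "-" "Mozilla/5.0 YisouSpider"',
    '198.51.100.7 - - [01/Jan/2016:20:00:03 +0800] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 Firefox"',
]

for line in deny_lines_for_spider(sample, already_blocked=["42.156.139.54"]):
    print(line)
```

The printed lines can be appended to the .htaccess list after a quick manual review, since a User-Agent string can be forged by any client.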

To sum up, Lao Jiang has shown how to block YisouSpider on both Nginx and Apache, which fixes the slowdowns and hangs caused by its crawling.

Source: 老蒋部落 » Practical Methods for Blocking the YisouSpider Crawler on Nginx and Apache
