APISIX Gateway Deployment and Usage Tutorial

APISIX is a cloud-native API gateway that is dynamic, real-time, and high-performance. You can use APISIX as the traffic entry point for all of your services: it provides dynamic routing, dynamic upstreams, dynamic certificates, A/B testing, canary (gray) releases, blue-green deployment, rate limiting, attack protection, metrics collection, monitoring and alerting, observability, service governance, and more.

Installing APISIX

The simplest way to deploy APISIX is with Docker, using the official example configuration directly.

git clone https://github.com/apache/apisix-docker.git
cd apisix-docker/example

In the example directory, docker-compose.yml and docker-compose-arm64.yml target x86 and ARM servers respectively.

Looking at the yml file, you will see that it starts several Docker services:

  • apisix: the main program, the API gateway;
  • etcd: APISIX uses etcd as its configuration center for storing and synchronizing configuration;
  • prometheus: the open-source monitoring system;
  • grafana: the open-source dashboard application;

web1 and web2 are test services and can simply be removed.

Additionally, if we want to use APISIX's web dashboard, we can add an apisix-dashboard service to docker-compose.yml:

  apisix-dashboard:
    image: apache/apisix-dashboard:3.0.1-alpine
    restart: always
    volumes:
    - ./dashboard_conf/conf.yaml:/usr/local/apisix-dashboard/conf/conf.yaml
    ports:
    - "9000:9000"
    networks:
      apisix:

The final docker-compose.yml is:


version: "3"

services:
  apisix-dashboard:
    image: apache/apisix-dashboard:3.0.1-alpine
    restart: always
    volumes:
    - ./dashboard_conf/conf.yaml:/usr/local/apisix-dashboard/conf/conf.yaml
    ports:
    - "9000:9000"
    networks:
      apisix:

  apisix:
    image: apache/apisix:${APISIX_IMAGE_TAG:-3.6.0-debian}
    restart: always
    volumes:
      - ./apisix_conf/config.yaml:/usr/local/apisix/conf/config.yaml:ro
    depends_on:
      - etcd
    ##network_mode: host
    ports:
      - "9180:9180/tcp"
      - "9080:9080/tcp"
      - "9091:9091/tcp"
      - "9443:9443/tcp"
      - "9092:9092/tcp"
    networks:
      apisix:

  etcd:
    image: bitnami/etcd:3.4.15
    restart: always
    volumes:
      - etcd_data:/bitnami/etcd
    environment:
      ETCD_ENABLE_V2: "true"
      ALLOW_NONE_AUTHENTICATION: "yes"
      ETCD_ADVERTISE_CLIENT_URLS: "http://etcd:2379"
      ETCD_LISTEN_CLIENT_URLS: "http://0.0.0.0:2379"
    ports:
      - "2379:2379/tcp"
    networks:
      apisix:

  prometheus:
    image: prom/prometheus:v2.25.0
    restart: always
    volumes:
      - ./prometheus_conf/prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"
    networks:
      apisix:

  grafana:
    image: grafana/grafana:7.3.7
    restart: always
    ports:
      - "3000:3000"
    volumes:
      - "./grafana_conf/provisioning:/etc/grafana/provisioning"
      - "./grafana_conf/dashboards:/var/lib/grafana/dashboards"
      - "./grafana_conf/config/grafana.ini:/etc/grafana/grafana.ini"
    networks:
      apisix:

networks:
  apisix:
    driver: bridge

volumes:
  etcd_data:
    driver: local

Modify the local example/dashboard_conf/conf.yaml file to change the dashboard's configuration. You need to change the username and password under users, which are used to log in to the dashboard.


conf:
  listen:
    host: 0.0.0.0     # `manager api` listening ip or host name
    port: 9000          # `manager api` listening port
  allow_list:           # If we don't set any IP list, then any IP access is allowed by default.
    - 0.0.0.0/0
  etcd:
    endpoints:          # supports defining multiple etcd host addresses for an etcd cluster
      - "http://etcd:2379"
                          # yamllint disable rule:comments-indentation
                          # etcd basic auth info
    # username: "root"    # ignore etcd username if not enable etcd auth
    # password: "123456"  # ignore etcd password if not enable etcd auth
    mtls:
      key_file: ""          # Path of your self-signed client side key
      cert_file: ""         # Path of your self-signed client side cert
      ca_file: ""           # Path of your self-signed ca cert, the CA is used to sign callers' certificates
    # prefix: /apisix     # apisix config's prefix in etcd, /apisix by default
  log:
    error_log:
      level: warn       # supports levels, lower to higher: debug, info, warn, error, panic, fatal
      file_path:
        logs/error.log  # supports relative path, absolute path, standard output
                        # such as: logs/error.log, /tmp/logs/error.log, /dev/stdout, /dev/stderr
    access_log:
      file_path:
        logs/access.log  # supports relative path, absolute path, standard output
                         # such as: logs/access.log, /tmp/logs/access.log, /dev/stdout, /dev/stderr
                         # log example: 2020-12-09T16:38:09.039+0800	INFO	filter/logging.go:46	/apisix/admin/routes/r1	{"status": 401, "host": "127.0.0.1:9000", "query": "asdfsafd=adf&a=a", "requestId": "3d50ecb8-758c-46d1-af5b-cd9d1c820156", "latency": 0, "remoteIP": "127.0.0.1", "method": "PUT", "errs": []}
  security:
      # access_control_allow_origin: "http://httpbin.org"
      # access_control_allow_credentials: true          # support using custom cors configuration
      # access_control_allow_headers: "Authorization"
      # access_control-allow_methods: "*"
      # x_frame_options: "deny"
      content_security_policy: "default-src 'self'; script-src 'self' 'unsafe-eval' 'unsafe-inline'; style-src 'self' 'unsafe-inline'; frame-src *"  # You can set frame-src to provide content for your grafana panel.

authentication:
  secret:
    secret              # secret for jwt token generation.
                        # NOTE: Highly recommended to modify this value to protect `manager api`.
                        # if it's default value, when `manager api` start, it will generate a random string to replace it.
  expire_time: 3600     # jwt token expire time, in second
  users:                # yamllint enable rule:comments-indentation
    - username: username   # username and password for login `manager api`
      password: password

plugins:                          # plugin list (sorted in alphabetical order)
  - api-breaker
  - authz-keycloak
  - basic-auth
  - batch-requests
  - consumer-restriction
  - cors
  # - dubbo-proxy
  - echo
  # - error-log-logger
  # - example-plugin
  - fault-injection
  - grpc-transcode
  - hmac-auth
  - http-logger
  - ip-restriction
  - jwt-auth
  - kafka-logger
  - key-auth
  - limit-conn
  - limit-count
  - limit-req
  # - log-rotate
  # - node-status
  - openid-connect
  - prometheus
  - proxy-cache
  - proxy-mirror
  - proxy-rewrite
  - redirect
  - referer-restriction
  - request-id
  - request-validation
  - response-rewrite
  - serverless-post-function
  - serverless-pre-function
  # - skywalking
  - sls-logger
  - syslog
  - tcp-logger
  - udp-logger
  - uri-blocker
  - wolf-rbac
  - zipkin
  - server-info
  - traffic-split

Run docker compose to start the containers:

docker-compose -p docker-apisix up -d
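
Once the containers are up, a quick sanity check against the gateway port (no routes are configured yet, so APISIX itself should answer):

# With no routes configured, APISIX should return its own 404
# (for example {"error_msg":"404 Route Not Found"}), proving the gateway is answering
curl -i http://127.0.0.1:9080/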

Configuring APISIX

Modify the local ./conf/config.yaml file (mounted as ./apisix_conf/config.yaml in the example) to complete the basic configuration of the APISIX service itself. It is recommended to change the Admin API key to protect APISIX; you can also restrict which IPs may access the Admin API for extra security:

deployment:
  admin:
    allow_admin:               # https://nginx.org/en/docs/http/ngx_http_access_module.html#allow
      - 127.0.0.0/24              # We need to restrict ip access rules for security. 0.0.0.0/0 is for test.
      - <your host IP>

    admin_key:
      - name: "admin"
        key: newpassword
        role: admin                 # admin: manage all configuration data

The IP restriction above mainly assumes that the Admin API is accessed by the apisix-dashboard; adjust it to fit your own scenario.
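
To confirm the new key works, you can list the (still empty) routes through the Admin API; 127.0.0.1 and the newpassword key are the example values used above:

curl http://127.0.0.1:9180/apisix/admin/routes -H 'X-API-KEY: newpassword'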

Using APISIX

Visit http://ip:9000 and log in to the dashboard with the username and password set above. Day-to-day usage is fairly straightforward and not covered in detail here; I will only go over the places where it is easy to stumble.

The most common use of APISIX is rate limiting, which mainly involves three plugins (a configuration sketch follows the list):

  • limit-conn: concurrent connections
  • limit-count: number of requests within a given time window
  • limit-req: request rate
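
As a minimal sketch, this is how limit-count might be attached to a route via the Admin API. The route ID 1, the /* URI, the web1 upstream, and the newpassword key are assumptions based on the example setup above:

# Allow at most 100 requests per client IP per 60-second window on this route
curl -X PUT http://127.0.0.1:9180/apisix/admin/routes/1 \
  -H 'X-API-KEY: newpassword' \
  -d '{
    "uri": "/*",
    "plugins": {
      "limit-count": {
        "count": 100,
        "time_window": 60,
        "key": "remote_addr",
        "rejected_code": 429
      }
    },
    "upstream": {
      "type": "roundrobin",
      "nodes": { "web1:80": 1 }
    }
  }'

Note that key is set to remote_addr, which matters for pitfall 3 below.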

Rate limiting in turn depends on the visitor's IP: you can only limit effectively once you have the user's real IP, which is where the real-ip plugin comes in.

  1. The official APISIX documentation says the real-ip plugin requires rebuilding on APISIX-Base, but in practice recent APISIX Docker images already run on APISIX-Base.
  2. APISIX does not pass the real IP to the upstream on its own; you must configure it yourself by adding the following to the route's plugins field:
    "real-ip": {
      "source": "http_x_forwarded_for",
      "trusted_addresses": [
        "0.0.0.0/0"
      ]
    }

source must be http_x_forwarded_for to take effect; the $remote_addr and $remote_port values from the official documentation did not work in my tests. This pit cost me a lot of time.

trusted_addresses is the set of trusted source IPs and can be adjusted as needed. For example, if nginx reverse-proxies the APISIX gateway, the trusted IP is the reverse proxy server's IP:

    "real-ip": {
      "source": "http_x_forwarded_for",
      "trusted_addresses": [
        "反代服务器IP"
      ]
    }

3. In the configuration of the three rate-limit plugins above, key must be remote_addr to obtain the visitor's real IP; the http_x_real_ip and http_x_forwarded_for values from the official documentation did not work for me, and I am not sure why.

4. If another nginx runs in front of your APISIX, or there is also a Cloudflare layer on top (visitor – Cloudflare CDN – nginx – APISIX), the user's real IP has to be passed along recursively. Note in particular that because APISIX runs in Docker, set_real_ip_from must also include the Docker internal subnet:

docker inspect <container-id>    # look up the container's network section to find the Docker subnet
set_real_ip_from 172.18.0.0/24

See my earlier article: https://www.168itw.com/web-server/cloudflare-cdn-real-ip/
See also: https://github.com/apache/apisix/discussions/4793

I have not yet tested the many other features; this tutorial will be extended over time.

FFmpeg Loop-Streaming Script

Reposted here for future use; see the end of the post for the original link.

#!/bin/bash
PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin:~/bin
export PATH
#=================================================================#
#   System Required: CentOS7 X86_64                               #
#   Description: FFmpeg Stream Media Server                       #
#   Author: LALA                                    #
#   Website: https://www.lala.im                                  #
#=================================================================#

# Color definitions
red='\033[0;31m'
green='\033[0;32m'
yellow='\033[0;33m'
font="\033[0m"

ffmpeg_install(){
# Install FFmpeg
read -p "Is FFmpeg 4.x already installed on this machine? FFmpeg is required for streaming. Install FFmpeg now? (yes/no):" Choose
if [ "$Choose" = "yes" ];then
	yum -y install wget
	wget --no-check-certificate https://www.johnvansickle.com/ffmpeg/old-releases/ffmpeg-4.0.3-64bit-static.tar.xz
	tar -xJf ffmpeg-4.0.3-64bit-static.tar.xz
	cd ffmpeg-4.0.3-64bit-static
	mv ffmpeg /usr/bin && mv ffprobe /usr/bin && mv qt-faststart /usr/bin && mv ffmpeg-10bit /usr/bin
fi
if [ "$Choose" = "no" ]
then
    echo -e "${yellow} You chose not to install FFmpeg. Make sure FFmpeg is already installed on this machine, otherwise the script will not work! ${font}"
    sleep 2
fi
}

stream_start(){
# Prompt for the streaming URL and stream key
read -p "Enter your streaming URL and stream key (rtmp protocol):" rtmp

# Check that the address entered looks valid
if [[ $rtmp =~ "rtmp://" ]];then
	echo -e "${green} Streaming URL accepted; moving on to the next step. ${font}"
  	sleep 2
	else
  	echo -e "${red} The URL you entered is invalid; please rerun the script and enter it again! ${font}"
  	exit 1
fi

# Prompt for the directory that holds the videos
read -p "Enter your video directory (mp4 only, absolute path, e.g. /opt/video):" folder

# Ask whether to add a watermark
read -p "Add a watermark to the videos? It is placed in the top-right corner by default and needs a reasonably fast CPU (yes/no):" watermark
if [ "$watermark" = "yes" ];then
	read -p "Enter the absolute path of the watermark image, e.g. /opt/image/watermark.jpg (jpg/png/bmp):" image
	echo -e "${yellow} Watermark configured; the script will start streaming. ${font}"
	# Loop forever over the videos
	while true
	do
		cd "$folder"
		for video in *.mp4
		do
		ffmpeg -re -i "$video" -i "$image" -filter_complex overlay=W-w-5:5 -c:v libx264 -c:a aac -b:a 192k -strict -2 -f flv "${rtmp}"
		done
	done
fi
if [ "$watermark" = "no" ]
then
    echo -e "${yellow} You chose no watermark; the script will start streaming. ${font}"
    # Loop forever over the videos
	while true
	do
		cd "$folder"
		for video in *.mp4
		do
		ffmpeg -re -i "$video" -c:v copy -c:a aac -b:a 192k -strict -2 -f flv "${rtmp}"
		done
	done
fi
}

# Stop streaming
stream_stop(){
	screen -S stream -X quit
	killall ffmpeg
}

# Start menu
echo -e "${yellow} CentOS7 X86_64 FFmpeg unattended loop streaming, for LALA.IM ${font}"
echo -e "${red} Make sure this script is currently running inside a screen session! ${font}"
echo -e "${green} 1. Install FFmpeg (required for streaming) ${font}"
echo -e "${green} 2. Start unattended loop streaming ${font}"
echo -e "${green} 3. Stop streaming ${font}"
start_menu(){
    read -p "Enter a number (1-3) to choose an action:" num
    case "$num" in
        1)
        ffmpeg_install
        ;;
        2)
        stream_start
        ;;
        3)
        stream_stop
        ;;
        *)
        echo -e "${red} Please enter a valid number (1-3) ${font}"
        ;;
    esac
}

# Run the start menu
start_menu

Prometheus+Grafana Ops-Grade Monitoring Deployment

The Prometheus+Grafana combination was mentioned in my earlier post on website uptime monitoring; I have since found time to deploy it myself. It is powerful and genuinely ops-grade.

Prometheus is the main controller, the server side of the monitoring stack.

Node exporter is the client side, deployed on each monitored VPS, responsible for shipping data to Prometheus.

Grafana is an open-source dashboard/panel application that provides the visual interface.

Together, the three provide a complete ops-grade monitoring system. If you also need alerting, add Alertmanager.

Prometheus, Node exporter, and Alertmanager source repositories:

https://github.com/prometheus

Grafana开源地址:https://github.com/grafana/grafana

The deployment process is recorded below.

1. Deploy Prometheus with Docker

I chose Docker for deployment because it is quick and simple. Be sure to map port 9090 as well as the directory holding prometheus.yml. Here I use /opt/config as an example:

-v /opt/config:/etc/prometheus

This mounts the host's /opt/config directory onto /etc/prometheus inside the container.
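
Putting it together, a minimal sketch of the full command, assuming the container is named prometheus and uses the prom/prometheus image seen earlier:

docker run -d --name prometheus \
  -p 9090:9090 \
  -v /opt/config:/etc/prometheus \
  prom/prometheus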

For a binary installation instead, the steps are:

tar zxf prometheus-2.25.0.linux-amd64.tar.gz -C /opt
mv /opt/prometheus-2.25.0.linux-amd64 /opt/prometheus
vim /usr/lib/systemd/system/prometheus.service
[Unit]
Description=prometheus service
 
[Service]
User=root
ExecStart=/opt/prometheus/prometheus --config.file=/opt/prometheus/prometheus.yml --storage.tsdb.path=/opt/prometheus/data
 
TimeoutStopSec=10
Restart=on-failure
RestartSec=5
 
[Install]
WantedBy=multi-user.target
Save the unit file, then reload systemd and start the service:

systemctl daemon-reload
systemctl enable prometheus
systemctl start prometheus
systemctl status prometheus

Every time the Prometheus configuration file changes, it must be restarted:

systemctl restart prometheus

2. Configure the prometheus.yml file

Below is the simplest default configuration file:

global:
  scrape_interval:     15s # By default, scrape targets every 15 seconds.

  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    monitor: 'codelab-monitor'

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s

    static_configs:
      - targets: ['localhost:9100']
        labels:
          instance: prometheus

The Docker container will only start properly once this configuration file is in place.
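
Before starting, you can sanity-check the file with promtool, which ships inside the prom/prometheus image; a sketch using the /opt/config mount from above:

docker run --rm -v /opt/config:/etc/prometheus \
  --entrypoint promtool prom/prometheus \
  check config /etc/prometheus/prometheus.yml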

Now visit http://ip:9090 to reach the Prometheus web UI. It is quite bare-bones, which is why we go on to deploy Grafana.

3. Deploy the Grafana dashboard

Taking Debian as an example:

sudo apt-get install -y apt-transport-https
sudo apt-get install -y software-properties-common wget
sudo wget -q -O /usr/share/keyrings/grafana.key https://apt.grafana.com/gpg.key

Add the repository:

echo "deb [signed-by=/usr/share/keyrings/grafana.key] https://apt.grafana.com stable main" | sudo tee -a /etc/apt/sources.list.d/grafana.list

Install Grafana:

sudo apt-get update
sudo apt-get install grafana

Start Grafana:

sudo systemctl daemon-reload
sudo systemctl start grafana-server
sudo systemctl status grafana-server

Enable Grafana to start at boot:

sudo systemctl enable grafana-server.service

Open IP:3000 in a browser to reach the Grafana page. The default username and password are both admin; on first login you will be asked to change the default password.

4. Add Prometheus as a Grafana data source

On the home screen, click "Add your first data source", select Prometheus, import, and save.

You can of course add more targets to see more detailed data. Grafana also provides many official dashboard templates:

https://grafana.com/grafana/dashboards/

I recommend the top-ranked Node Exporter Full. Installing a template is also simple: just copy the template ID (Copy ID to clipboard).

In the Grafana backend: Dashboards – Browse – New – Import – paste the ID – Load – then select Prometheus as the data source. Done.

5. Deploy Node exporter

Deploy node exporter on each VPS you want to monitor; you can also deploy it on the Prometheus VPS itself to monitor that machine.

Download the appropriate release and extract it:

wget https://github.com/prometheus/node_exporter/releases/download/v1.5.0/node_exporter-1.5.0.linux-amd64.tar.gz
tar xvfz node_exporter-1.5.0.linux-amd64.tar.gz
mv /root/node_exporter-1.5.0.linux-amd64 /opt/node_exporter

Configure it to start at boot:

vi /usr/lib/systemd/system/node_exporter.service
[Unit]
Description=node_exporter service
 
[Service]
User=root
ExecStart=/opt/node_exporter/node_exporter
 
TimeoutStopSec=10
Restart=on-failure
RestartSec=5
 
[Install]
WantedBy=multi-user.target

If you get an error, you can instead use vi /etc/systemd/system/node_exporter.service

For background on Linux service unit files, see: https://wizardforcel.gitbooks.io/vbird-linux-basic-4e/content/150.html

systemctl daemon-reload
systemctl enable node_exporter

Start the service:

systemctl start node_exporter
systemctl status node_exporter

Node exporter listens on port 9100 by default; visit ip:9100 to check it.

Note: open port 9100 in your VPS provider's firewall panel. If the VPS also runs a panel such as BaoTa (宝塔), you must allow port 9100 under its "Security" settings as well.

iptables must also allow port 9100:

iptables -I INPUT -s 0.0.0.0/0 -p tcp --dport 9100 -j ACCEPT

6. Connect Node exporter to Prometheus and start shipping data

Edit Prometheus's prometheus.yml and add the following block, paying attention to the indentation:

  - job_name: 'Linux'
    static_configs:
      - targets: ['xx.xx.xx.xx:9100']
        labels:
          instance: 168itw.com

Restart the Prometheus Docker container.
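
A sketch of that restart, assuming the container is named prometheus; alternatively, if Prometheus was started with --web.enable-lifecycle, the config can be hot-reloaded without a restart:

docker restart prometheus
# or, when --web.enable-lifecycle is enabled:
curl -X POST http://localhost:9090/-/reload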


Installing ffmpeg and streamlink on Debian/Ubuntu

First, update the system.

Install ffmpeg:

apt-get install ffmpeg

Install streamlink

1. Install pip3

apt-get install python3-pip

2. Install the latest streamlink via pip

pip3 install --user --upgrade streamlink

3. Add streamlink to your PATH.

If you installed as root, add the following line to /root/.bashrc;

if you installed as the ubuntu user on Ubuntu, add it to /home/ubuntu/.bashrc:

export PATH="${HOME}/.local/bin:${PATH}"

4. Reconnect your SSH session and check whether streamlink installed successfully:

streamlink -V
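
With both tools installed, a common pattern is to pull a live stream with streamlink and hand it to ffmpeg for restreaming. A minimal sketch, where the source URL and the RTMP target are placeholders:

# -O writes the stream to stdout; ffmpeg restreams it without re-encoding
streamlink -O "https://www.twitch.tv/CHANNEL" best | \
  ffmpeg -i pipe:0 -c copy -f flv "rtmp://live.example.com/app/STREAM_KEY"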

Commonly Used Free and Open-Source SVG Icon Libraries

A collection of commonly used free SVG icon libraries that you can reference directly on your own site.

1. Font Awesome

Website: https://fontawesome.com/

Probably the most widely referenced icon library; it has both free and paid tiers.

2. Google Material Icons

Website: http://google.github.io/material-design-icons/

An official Google product.

3. iconfont

Website: https://www.iconfont.cn/collections/index

Made by Alibaba.

4. flaticon

Website: https://www.flaticon.com/uicons

5. feather icons

Repo: https://github.com/feathericons/feather

6. iconoir

Repo: https://github.com/lucaburgio/iconoir

7. ionicons

Website: https://ionic.io/ionicons

Note that ionicons is used somewhat differently from the six libraries above.

MySQL LAST_INSERT_ID() with Multiple Threads

When MySQL is used from multiple threads, LAST_INSERT_ID() is more efficient than max(id) and cannot be disturbed by other threads (connections).

INSERT INTO wp_posts (post_title,post_content,post_author,post_name,post_date,post_excerpt,to_ping,pinged,post_content_filtered) VALUES ('[标签:标题]','[标签:内容]','1','[标签:slug]','[系统时间转化:yyyy-MM-dd HH:mm:ss]','','','','');
INSERT INTO wp_term_relationships (object_id,term_taxonomy_id) VALUES (LAST_INSERT_ID(),1);

But there are a few things to note about the LAST_INSERT_ID function, copied here verbatim:

  • If the most recent INSERT or UPDATE statement set more than one AUTO_INCREMENT value, the LAST_INSERT_ID function returns only the first AUTO_INCREMENT value.
  • The LAST_INSERT_ID function returns the last AUTO_INCREMENT value on a client-by-client basis, so it will only return the last AUTO_INCREMENT value for your client. The value can not be affected by other clients.
  • Executing the LAST_INSERT_ID function does not affect the value that LAST_INSERT_ID returns.

A few caveats from my own experience:

1. If a single INSERT inserts multiple VALUES rows, LAST_INSERT_ID() returns the auto-increment ID of the first inserted row;

2. Across multiple INSERTs into the same table, LAST_INSERT_ID() returns the auto-increment ID of the last INSERT;

3. If the INSERTs switch to a different table, the value of LAST_INSERT_ID() changes accordingly: it returns the auto-increment ID of the most recent INSERT.
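
A quick illustration of caveat 1, using a hypothetical table t with an AUTO_INCREMENT id column:

INSERT INTO t (name) VALUES ('a'), ('b'), ('c');
SELECT LAST_INSERT_ID();  -- returns the id generated for 'a', the first of the three rows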


hreflang Tags for Multilingual Sites

Add the following code to the page's head:

<link rel="alternate" hreflang="zh-CN" href="https://www168itw.com/seo/hreflang/" />

Note that pages in different languages must point to each other in both directions; otherwise hreflang has no effect and Google ignores it.

So you need to list every language version and its link in the head of each page.

You can also specify the page's default language version:

<link rel="alternate" href="https://example.com/" hreflang="x-default" />

For Simplified and Traditional Chinese, you can also use zh-Hans and zh-Hant.

Combined with the method for outputting post slugs in WordPress described earlier, this lets you set up a multilingual WordPress site.

Google's detailed documentation on hreflang for multilingual sites:

https://developers.google.com/search/docs/advanced/crawling/localized-versions

https://www.w3schools.com/tags/ref_language_codes.asp

Commonly Used C# Regular Expressions

Basic symbols

^ matches the start of the string (exception: inside square brackets [ ] it negates, matching any character not in the set)
$ matches the end of the string
* matches zero or more occurrences
+ matches one or more occurrences (at least one)
? matches zero or one occurrence
. matches a single character
| alternation; matches either of two alternatives
( ) parentheses group and match all the characters inside
[ ] square brackets match one character from the range described inside, e.g. [0-9a-zA-Z]
{ } braces bound the repeat count: {n} matches exactly n occurrences, {n,} at least n, {n,m} at least n and at most m
\ escape character; the special symbols above must be escaped to be matched literally, e.g. \* matches a literal *
\w matches letters and digits; \W matches non-letters/digits
\d matches digits; \D matches non-digits

Common regular expressions

Match Chinese characters: [\u4e00-\u9fa5]

Match double-byte characters (including Chinese): [^\x00-\xff]

Match blank lines: \n[\s| ]*\r

Match HTML tags: <(.*)>.*<\/\1>|<(.*) \/>

Match leading/trailing whitespace: (^\s*)|(\s*$)

Match IP addresses: (\d+)\.(\d+)\.(\d+)\.(\d+)

Match email addresses: \w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*

Match URLs: http://([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)?

SQL statements: ^(select|drop|delete|create|update|insert).*$

1. Non-negative integer: ^\d+$

2. Positive integer: ^[0-9]*[1-9][0-9]*$

3. Non-positive integer: ^((-\d+)|(0+))$

4. Negative integer: ^-[0-9]*[1-9][0-9]*$

5. Integer: ^-?\d+$

6. Non-negative float: ^\d+(\.\d+)?$

7. Positive float: ^(([0-9]+\.[0-9]*[1-9][0-9]*)|([0-9]*[1-9][0-9]*\.[0-9]+)|([0-9]*[1-9][0-9]*))$

8. Non-positive float: ^((-\d+(\.\d+)?)|(0+(\.0+)?))$

9. Negative float: ^(-(<positive-float regex>))$

10. English letters: ^[A-Za-z]+$

11. Uppercase English letters: ^[A-Z]+$

12. Lowercase English letters: ^[a-z]+$

13. English letters and digits: ^[A-Za-z0-9]+$

14. Letters, digits, and underscore: ^\w+$

15. E-mail address: ^[\w-]+(\.[\w-]+)*@[\w-]+(\.[\w-]+)+$

16. URL: ^[a-zA-Z]+://(\w+(-\w+)*)(\.(\w+(-\w+)*))*(\?\S*)?$
or: ^http:\/\/[A-Za-z0-9]+\.[A-Za-z0-9]+[\/=\?%\-&_~`@\[\]\':+!]*([^<>\"])*$

17. Postal code: ^[1-9]\d{5}$

18. Chinese characters: ^[\u0391-\uFFE5]+$

19. Phone number: ^((\(\d{2,3}\))|(\d{3}\-))?(\(0\d{2,3}\)|0\d{2,3}\-)?[1-9]\d{6,7}(\-\d{1,4})?$

20. Mobile number: ^((\(\d{2,3}\))|(\d{3}\-))?13\d{9}$

21. Double-byte characters (including Chinese): [^\x00-\xff]

22. Match leading/trailing whitespace: (^\s*)|(\s*$) (like VBScript's trim function)

23. Match HTML tags: <(.*)>.*<\/\1>|<(.*) \/>

24. Match blank lines: \n[\s| ]*\r

25. Extract hyperlinks from text: (h|H)(r|R)(e|E)(f|F) *= *('|")?(\w|\\|\/|\.)+('|"| *|>)?

26. Extract email addresses from text: \w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*

27. Extract image links from text: (s|S)(r|R)(c|C) *= *('|")?(\w|\\|\/|\.)+('|"| *|>)?

28. Extract IP addresses from text: (\d+)\.(\d+)\.(\d+)\.(\d+)

29. Extract Chinese mobile numbers from text: (86)*0*13\d{9}

30. Extract Chinese landline numbers from text: (\(\d{3,4}\)|\d{3,4}\-|\s)?\d{8}

31. Extract Chinese phone numbers (mobile and landline): (\(\d{3,4}\)|\d{3,4}\-|\s)?\d{7,14}

32. Extract Chinese postal codes from text: [1-9]{1}(\d+){5}

33. Extract floats (decimals) from text: (-?\d*)\.?\d+

34. Extract any number from text: (-?\d*)(\.\d+)?

35. IP: (\d+)\.(\d+)\.(\d+)\.(\d+)

36. Phone area code: /^0\d{2,3}$/

37. Tencent QQ number: ^[1-9]*[1-9][0-9]*$

38. Account name (starts with a letter, 5-16 characters, letters/digits/underscore): ^[a-zA-Z][a-zA-Z0-9_]{4,15}$

39. Chinese, English letters, digits, and underscore: ^[\u4e00-\u9fa5_a-zA-Z0-9]+$

Cloudflare Firewall Rules

1. Allow: Good Bot & Whitelist

(cf.client.bot) or (ip.src in {132.x.x.x 43.x.x.x}) or (http.user_agent contains "YisouSpider") or (http.request.uri.path contains "/.well-known/acme-challenge")

Add your own server's IP to the whitelist; otherwise things like WordPress cron tasks (certificate renewal and the like) may get blocked by the IDC blacklist in rule 2 below.

2022.5.7 update: this rule also allows robots such as Semrush and AhrefsBot that are on Cloudflare's verified-bot whitelist. Because rule 1 has the highest priority, they are allowed even if later rules block them, so it is best to additionally block such whitelisted robots via robots.txt. Cloudflare Verified Bots list: https://radar.cloudflare.com/verified-bots

2022.7.30 update: the path /.well-known/acme-challenge was added to the whitelist; otherwise Let's Encrypt certificate-renewal validation gets blocked.

2. JS Challenge: IDC ASN

(ip.geoip.asnum in {174 195 209 577 792 793 794 1215 1216 1217 2497 2914 3223 3255 3326 3329 3457 3598 4184 4190 4637 4694 4755 4785 4788 4816 4826 4835 5056 5610 5617 6471 6584 6830 6876 6877 6939 7029 7224 7489 7552 7684 8068 8069 8070 8071 8074 8075 8100 8220 8560 8881 8987 9009 9299 9312 9370 9678 9952 9984 10026 10453 11351 11426 11691 12076 12271 12334 12367 12874 12876 12989 14061 14117 14140 14576 14618 15169 16276 16509 16591 16629 17043 17428 17707 17788 17789 17790 17791 18013 18228 18403 18450 18599 18734 18978 19527 19740 20207 20473 20552 20554 20860 21704 21769 21859 21887 22773 22884 23468 23724 23885 23959 23969 24088 24192 24424 24429 24940 25429 25697 25820 25935 25961 26160 26496 26818 27715 28429 28431 28438 28725 29066 29286 29287 29802 30083 30823 31122 31235 31400 31898 32097 32098 32505 32613 34081 34248 34549 34947 35070 35212 35320 35540 35593 35804 35816 35908 35916 36351 36352 36384 36385 36444 36492 36806 37963 37969 38001 38197 38283 38365 38538 38587 38588 38627 39284 40065 40676 40788 41009 41096 41264 41378 42652 42905 43289 43624 43989 45011 45012 45062 45076 45085 45090 45102 45102 45102 45103 45104 45139 45458 45566 45576 45629 45753 45899 45932 46484 46844 47232 47285 47927 48024 48337 48905 49327 49588 49981 50297 50340 50837 51852 52000 52228 52341 54463 54538 54574 54600 54854 54994 55158 55330 55720 55799 55924 55933 55960 55967 55990 55992 56005 56011 56109 56222 57613 58073 58199 58461 58466 58519 58543 58563 58593 58772 58773 58774 58775 58776 58844 58854 58862 58879 59019 59028 59048 59050 59051 59052 59053 59054 59055 59067 59077 59374 60068 60592 60631 60798 61154 61317 61348 61577 61853 62044 62240 62468 62785 62904 63018 63023 63075 63288 63314 63545 63612 63620 63631 63655 63677 63678 63679 63727 63728 63729 63835 63838 63888 63916 63949 64050 131090 131106 131138 131139 131140 131141 131293 131428 131444 131477 131486 131495 132196 132203 132509 132510 132513 132591 132839 133024 133199 133380 133478 133492 133746 133752 133774 133775 133776 133905 133929 134238 134327 134760 134761 134763 134764 134769 134770 134771 134835 134963 135061 135290 135300 135330 135377 135629 137693 137697 137699 137753 137784 137785 137787 137788 137876 137969 138366 138407 138607 138915 138949 138950 138952 138982 138994 139007 139018 139124 139144 139201 139203 139220 139316 139327 139726 139887 140096 140596 140701 140716 140717 140720 140723 140979 141157 141180 142570 149167 177453 177549 197099 197540 198047 198651 199490 199506 199524 199883 200756 201094 201978 202053 202675 203087 204601 204720 206092 206204 206791 206798 207319 207400 207590 208425 208556 211914 212708 213251 213375 262187 263022 263196 263639 263693 264344 264509 265443 265537 266706 267784 269939 270110 328608 394699 395003 395936 395954 395973 398101 63951 45050 209366 40021 47583 46606 211252 51167 202054 35913 35425 33182 141004 135917 141995 21499 34989 24806 48282 58061 46475 34665 396982 55020 53667 140224 136907 398478 13335 132892 202422 398324})

Note: 13335 and 132892 are Cloudflare's own ASNs; they are included in the firewall to catch IPs coming through Cloudflare WARP.

3. Managed Challenge: Bad Crawler

(http.user_agent contains "80legs") or (http.user_agent contains "Abonti") or (http.user_agent contains "admantx") or (http.user_agent contains "aipbot") or (http.user_agent contains "AllSubmitter") or (http.user_agent contains "Backlink") or (http.user_agent contains "backlink") or (http.user_agent contains "Badass") or (http.user_agent contains "Bigfoot") or (http.user_agent contains "blexbot") or (http.user_agent contains "Buddy") or (http.user_agent contains "CherryPicker") or (http.user_agent contains "cloudsystemnetwork") or (http.user_agent contains "cognitiveseo") or (http.user_agent contains "Collector") or (http.user_agent contains "cosmos") or (http.user_agent contains "CrazyWebCrawler") or (http.user_agent contains "Crescent") or (http.user_agent contains "Devil") or (http.user_agent contains "domain" and http.user_agent contains "spider") or (http.user_agent contains "domain" and http.user_agent contains "stat") or (http.user_agent contains "domain" and http.user_agent contains "Appender") or (http.user_agent contains "domain" and http.user_agent contains "Crawler") or (http.user_agent contains "DittoSpyder") or (http.user_agent contains "Konqueror") or (http.user_agent contains "Easou") or (http.user_agent contains "Yisou") or (http.user_agent contains "Etao") or (http.user_agent contains "mail" and http.user_agent contains "olf") or (http.user_agent contains "mail" and http.user_agent contains "spider") or (http.user_agent contains "exabot.com") or (http.user_agent contains "getintent") or (http.user_agent contains "Grabber") or (http.user_agent contains "GrabNet") or (http.user_agent contains "HEADMasterSEO") or (http.user_agent contains "heritrix") or (http.user_agent contains "htmlparser") or (http.user_agent contains "hubspot") or (http.user_agent contains "Jyxobot") or (http.user_agent contains "kraken") or (http.user_agent contains "larbin") or (http.user_agent contains "ltx71") or (http.user_agent contains "leiki") or (http.user_agent contains "LinkScan") or (http.user_agent contains "Magnet") or (http.user_agent contains "Mag-Net") or (http.user_agent contains "Mechanize") or (http.user_agent contains "MegaIndex") or (http.user_agent contains "Metasearch") or (http.user_agent contains "MJ12bot") or (http.user_agent contains "moz.com") or (http.user_agent contains "Navroad") or (http.user_agent contains "Netcraft") or (http.user_agent contains "niki-bot") or (http.user_agent contains "NimbleCrawler") or (http.user_agent contains "Nimbostratus") or (http.user_agent contains "Ninja") or (http.user_agent contains "Openfind") or (http.user_agent contains "Page" and http.user_agent contains "Analyzer") or (http.user_agent contains "Pixray") or (http.user_agent contains "probethenet") or (http.user_agent contains "proximic") or (http.user_agent contains "psbot") or (http.user_agent contains "RankActive") or (http.user_agent contains "RankingBot") or (http.user_agent contains "RankurBot") or (http.user_agent contains "Reaper") or (http.user_agent contains "SalesIntelligent") or (http.user_agent contains "Semrush") or (http.user_agent contains "SEOkicks") or (http.user_agent contains "spbot") or (http.user_agent contains "SEOstats") or (http.user_agent contains "Snapbot") or (http.user_agent contains "Stripper") or (http.user_agent contains "Siteimprove") or (http.user_agent contains "sitesell") or (http.user_agent contains "Siphon") or (http.user_agent contains "Sucker") or (http.user_agent contains "TenFourFox") or (http.user_agent contains "TurnitinBot") or 
(http.user_agent contains "trendiction") or (http.user_agent contains "twingly") or (http.user_agent contains "VidibleScraper") or (http.user_agent contains "WebLeacher") or (http.user_agent contains "WebmasterWorldForum") or (http.user_agent contains "webmeup") or (http.user_agent contains "Webster") or (http.user_agent contains "Widow") or (http.user_agent contains "Xaldon") or (http.user_agent contains "Xenu") or (http.user_agent contains "xtractor") or (http.user_agent contains "Zermelo")

4. Managed Challenge: Basic Crawler & Bad Crawler 2

(http.user_agent contains "Scrapy") or (http.user_agent contains "fuck") or (http.user_agent contains "lient" and http.user_agent contains "ttp") or (http.user_agent contains "java") or (http.user_agent contains "Joomla") or (http.user_agent contains "libweb") or (http.user_agent contains "libwww") or (http.user_agent contains "PHPCrawl") or (http.user_agent contains "PyCurl") or (http.user_agent contains "python") or (http.user_agent contains "wrk") or (http.user_agent contains "hey/") or (http.user_agent contains "Acunetix") or (http.user_agent contains "apache") or (http.user_agent contains "BackDoorBot") or (http.user_agent contains "cobion") or (http.user_agent contains "masscan") or (http.user_agent contains "FHscan") or (http.user_agent contains "scanbot") or (http.user_agent contains "Gscan") or (http.user_agent contains "Researchscan") or (http.user_agent contains "WPScan") or (http.user_agent contains "ScanAlert") or (http.user_agent contains "Wprecon") or (http.user_agent contains "virusdie") or (http.user_agent contains "VoidEYE") or (http.user_agent contains "WebShag") or (http.user_agent contains "Zeus") or (http.user_agent contains "zgrab") or (http.user_agent contains "zmap") or (http.user_agent contains "nmap") or (http.user_agent contains "fimap") or (http.user_agent contains "ZmEu") or (http.user_agent contains "ZumBot") or (http.user_agent contains "Zyborg") or (http.user_agent contains "attachment") or (http.user_agent eq "undefined") or (http.user_agent eq "") or (http.user_agent contains "Octopus") or (http.user_agent contains "AhrefsBot") or (http.user_agent contains "BLEXBot") or (http.user_agent contains "DataForSeoBot") or (http.user_agent contains "Adsbot") or (http.user_agent contains "Barkrowler") or (http.user_agent contains "Nmap Scripting Engine") or (http.user_agent contains "VelenPublicWebCrawler") or (http.user_agent contains "axios")

5. Path Check

This applies to WordPress sites: the back-end login URL and xmlrpc.php are scanned constantly, so they get a challenge.

Pagination is challenged to deter content scraping.

The search path /?s= is challenged to prevent search abuse.

(http.request.uri contains "/wp-login.php") or (http.request.uri contains "/xmlrpc.php") or (http.request.uri.path contains "/page/") or (http.request.uri contains "/?s=")