Monthly Archive for December, 2012

route, nat, iptables: networking, routing, and firewalls

#The "GFW motivates learning" series

Route

View the routing table:

#linux:
$route -n
$ip route list

#Windows:
>route PRINT

A typical routing table for a machine on a LAN (the LAN subnet is 192.168.0.0/24):

# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.168.0.1     0.0.0.0         UG    0      0        0 eth0
169.254.0.0     0.0.0.0         255.255.0.0     U     1000   0        0 eth0
192.168.0.0     0.0.0.0         255.255.255.0   U     1      0        0 eth0

169.254.0.0/16 is the special link-local address range defined in RFC 3927.

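The routing policy works like this: when data needs to be sent to some IP, the destination IP is ANDed with the mask (Genmask) of each route entry; if the result equals that entry's Destination, the entry is a match. If several entries match, the best one is chosen (the more specific mask wins; between equal masks, the smaller Metric wins). The data is then sent via the best entry's interface or Gateway: if the entry has no G flag (Gateway == 0.0.0.0), the data is sent directly out of the corresponding interface; otherwise the Gateway address becomes the lookup target and the routing table is searched again, until a directly connected entry is found and the data is sent out of its interface.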
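The lookup just described can be sketched in a few lines of Python. This is only an illustration of the matching rule, not how the kernel implements it; the `ROUTES` list and the `lookup` helper are made-up names, and the table is a hand-copied version of the `route -n` output above.

```python
import ipaddress

# Hypothetical in-memory copy of the routing table shown above:
# (destination network, gateway or None if directly connected, metric, interface)
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "192.168.0.1", 0, "eth0"),
    (ipaddress.ip_network("169.254.0.0/16"), None, 1000, "eth0"),
    (ipaddress.ip_network("192.168.0.0/24"), None, 1, "eth0"),
]

def lookup(dst, routes=ROUTES, max_depth=8):
    """Return (next_hop, interface) for dst, or None if nothing matches."""
    addr = ipaddress.ip_address(dst)
    # "addr in network" is the (dst AND Genmask) == Destination test.
    matches = [r for r in routes if addr in r[0]]
    if not matches or max_depth == 0:
        return None
    # Most specific mask first, then the smaller metric.
    net, gateway, metric, iface = min(matches, key=lambda r: (-r[0].prefixlen, r[2]))
    if gateway is None:
        # No G flag: dst is on-link, send straight out of the interface.
        return dst, iface
    # G flag: look the gateway up again until a direct route is found.
    resolved = lookup(gateway, routes, max_depth - 1)
    return None if resolved is None else (gateway, resolved[1])
```

With this table, `lookup("8.8.8.8")` falls through to the default route and resolves the gateway via the 192.168.0.0/24 entry, while `lookup("192.168.0.7")` is delivered on-link.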

Route entries added after connecting to OpenVPN with "redirect-gateway def1":

0.0.0.0         192.168.X.1     128.0.0.0       UG    0      0        0 tun0
128.0.0.0       192.168.X.1     128.0.0.0       UG    0      0        0 tun0
192.168.X.1     0.0.0.0         255.255.255.255 UH    0      0        0 tun0
A.B.C.D         192.168.0.1     255.255.255.255 UGH   0      0        0 eth0

192.168.X.1 is the gateway address on the virtual subnet added by OpenVPN; A.B.C.D is the OpenVPN server's IP address.
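Why do two routes with Genmask 128.0.0.0 take over all traffic without touching the original default route? A small sketch with Python's `ipaddress` module shows it: the two /1 networks together cover the whole IPv4 space, and each is more specific than 0.0.0.0/0. The helper name `most_specific` is made up for this illustration.

```python
import ipaddress

default = ipaddress.ip_network("0.0.0.0/0")       # the original default route
lower_half = ipaddress.ip_network("0.0.0.0/1")    # Genmask 128.0.0.0
upper_half = ipaddress.ip_network("128.0.0.0/1")  # Genmask 128.0.0.0
NETWORKS = [default, lower_half, upper_half]

def most_specific(dst, networks=NETWORKS):
    """Pick the matching network with the longest prefix."""
    addr = ipaddress.ip_address(dst)
    return max((n for n in networks if addr in n), key=lambda n: n.prefixlen)

# The two halves together cover every IPv4 address.
covers_everything = (
    lower_half.num_addresses + upper_half.num_addresses == default.num_addresses
)
```

Every destination therefore matches one of the /1 routes through tun0 before the /0 default. The host route for A.B.C.D (a /32) is more specific still, which keeps the VPN's own encrypted traffic on eth0.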

Also, when a local app opens a socket to initiate outbound traffic, it can bind the socket to a specific interface (on Linux, for example, with SO_BINDTODEVICE); the packets then leave through that interface instead of whichever one the routing table would otherwise pick. (Windows behaves differently, especially before Vista.) (source: On Windows, a call to bind() affects card selection only for incoming traffic, not outgoing traffic. Thus, on a client running in a multi-homed system (i.e., more than one interface card), it’s the network stack that selects the card to use, and it makes its selection based solely on the destination IP, which in turn is based on the routing table. A call to bind() will not affect the choice of the card in any way.

It’s got something to do with something called a “Weak End System” (“Weak E/S”) model. Vista changed to a strong E/S model, so the issue might not arise under Vista. But all prior versions of Windows used the weak E/S model.

With a weak E/S model, it’s the routing table that decides which card is used for outgoing traffic in a multihomed system. )

NAT

NAT is commonly used to translate between private addresses and public addresses.

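For example: the local machine's IP is 192.168.0.2 and it reaches the internet through the router 192.168.0.1, which has the public IP A.B.C.D. When the local machine sends a DNS query to 8.8.8.8: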

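192.168.0.2: looks up the routing table and finds the interface to use, eth0.
192.168.0.2: sends a UDP packet through eth0: source ip is 192.168.0.2, source port is a randomly chosen high port (ephemeral port; Linux: 32768-61000, Windows: 49152-65535), destination ip is 8.8.8.8, destination port is 53.
192.168.0.1: receives the packet from 192.168.0.2, applies SNAT, and sends the translated packet out of its internet interface. It also records the mapping in its translation table.
8.8.8.8: receives the packet from A.B.C.D and sends a response.
192.168.0.1: receives the packet from 8.8.8.8 and applies Un SNAT: it finds the target machine 192.168.0.2 in the in-memory translation table, rewrites the packet's destination, and sends it out of the router's lan interface.
192.168.0.2: eth0 receives the response packet, just as if it were talking to 8.8.8.8 directly.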

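SNAT replaces the packet's source address 192.168.0.2 with the router's public IP A.B.C.D, and may also replace the packet's source port.
Un SNAT replaces the packet's destination address A.B.C.D with 192.168.0.2.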

Un SNAT happens automatically.

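The router keeps a "translation table" (ip_conntrack) of the currently established connections. Each entry in the table has several elements (this is not the real format!! it is only meant to help understand SNAT):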
(source ip, source port, router port, destination ip, destination port, proto)

(192.168.0.2, 32002, 32002, 8.8.8.8, 53, udp)
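For SNAT, the last four elements of two entries must never all be identical (otherwise the router would not know which internal machine a packet received from the internet should be forwarded to).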

Un SNAT uses the translation table to find the internal IP that a packet belongs to.
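To make the translation table concrete, here is a toy model in Python. It is only a sketch of the idea, not how ip_conntrack really stores entries: the `SnatRouter` class and its method names are invented, 203.0.113.1 stands in for A.B.C.D, and port exhaustion is not handled.

```python
class SnatRouter:
    """Toy SNAT box keyed on the 'last four elements' of each entry."""

    def __init__(self, public_ip):
        self.public_ip = public_ip
        # (router_port, dst_ip, dst_port, proto) -> (lan_ip, lan_port)
        self.table = {}

    def outbound(self, src_ip, src_port, dst_ip, dst_port, proto):
        """SNAT: rewrite the source to the public IP and record the mapping."""
        owner = (src_ip, src_port)
        router_port = src_port
        # Keep the original port unless another LAN host already owns this key.
        while self.table.get((router_port, dst_ip, dst_port, proto), owner) != owner:
            router_port += 1  # simplification: no wrap-around or exhaustion check
        self.table[(router_port, dst_ip, dst_port, proto)] = owner
        return (self.public_ip, router_port, dst_ip, dst_port, proto)

    def inbound(self, src_ip, src_port, dst_port, proto):
        """Un SNAT: map a reply sent to public_ip:dst_port back to the LAN host."""
        entry = self.table.get((dst_port, src_ip, src_port, proto))
        if entry is None:
            return None  # no matching connection, nothing to translate
        lan_ip, lan_port = entry
        return (src_ip, src_port, lan_ip, lan_port, proto)
```

If 192.168.0.2 and 192.168.0.3 both query 8.8.8.8:53 from source port 32002, the second one gets router port 32003, so the last four elements stay unique and each reply finds its way back.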

iptables

View the iptables rules:

iptables -vxnL
iptables -t nat -vxnL

For how a packet travels through the iptables chains, see the linked reference, which explains it in great detail. An overview of the flow:

Receive:

Some interface receives a packet -> PREROUTING (mangle, nat) -> route ->
    Is the packet addressed to this machine? -> INPUT (mangle, filter) -> app
    If not -> FORWARD (mangle, filter) -> POSTROUTING (mangle, nat) -> sent out of some interface

Send:

app sends a packet -> route -> OUTPUT (mangle, nat, filter) -> POSTROUTING (mangle, nat) -> sent out of some interface

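* On Receive, a packet counts as addressed to this machine if its destination IP equals the IP of one of this machine's interfaces.
* On Send, the OUTPUT chain exists in the nat table as well as in mangle and filter. The nat OUTPUT chain comes before the filter one and is used to DNAT or REDIRECT requests that originate from this machine (the app), as opposed to packets that were received and FORWARDed.
* The route stage of Send settles the packet's source ip and source port. Normally these are chosen by the app when it sends. If the app has not bound to an interface and has not set a source ip/port, the source ip is the IP of the interface that the route selects.
* The INPUT, OUTPUT and FORWARD chains of the filter table are used to filter packets: -j ACCEPT / DROP
* PREROUTING in the nat table is used for DNAT (or REDIRECT); POSTROUTING is used for SNAT (or MASQUERADE)
* FORWARD (and every chain after it) requires net.ipv4.ip_forward=1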
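The flow above can be written down as a small decision function. This is a simplified model of the overview only (it ignores the raw table and anything the rules themselves might do); `TABLES` and `chains_traversed` are names made up for this sketch.

```python
# Tables consulted in each chain, as listed in the overview above.
TABLES = {
    "PREROUTING": ("mangle", "nat"),
    "INPUT": ("mangle", "filter"),
    "FORWARD": ("mangle", "filter"),
    "OUTPUT": ("mangle", "nat", "filter"),
    "POSTROUTING": ("mangle", "nat"),
}

def chains_traversed(dst_ip, local_ips, locally_generated=False, ip_forward=True):
    """Return the chains a packet passes through, in order."""
    if locally_generated:
        # Send: app -> route -> OUTPUT -> POSTROUTING -> out
        return ["OUTPUT", "POSTROUTING"]
    path = ["PREROUTING"]
    if dst_ip in local_ips:
        # Destination matches one of our own interface addresses.
        path.append("INPUT")
    elif ip_forward:
        path += ["FORWARD", "POSTROUTING"]
    # With ip_forward=0 a non-local packet gets no further than PREROUTING.
    return path
```

For a router whose addresses are 192.168.0.1 and its public IP, a packet for 8.8.8.8 goes PREROUTING, FORWARD, POSTROUTING, while a packet for 192.168.0.1 itself goes PREROUTING, INPUT.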

iptables rules for NAT on a typical router:

#SNAT
#br0 is the router's internet interface
#lo0 is the router's lan interface
iptables -t nat -A POSTROUTING -o br0 -j MASQUERADE

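#DNAT, used for things like port forwarding on the router
#A.B.C.D is the router's internet ip
#--dport is only valid together with a protocol match such as -p tcp
iptables -t nat -A PREROUTING -p tcp -d A.B.C.D --dport 3000:4000 -j DNAT --to 192.168.0.2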

PREROUTING cannot use -o (the output interface has not been decided yet)
POSTROUTING cannot use -i

About DNAT:

LAN - firewall - WAN
    -> SNAT ->
    <- Un SNAT <-

    <- DNAT
    -> Un DNAT

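Using the diagram above:

DNAT changes the packet's dst to the LAN address
Un DNAT restores the src of the response packet to the firewall address

SNAT changes the packet's src to the firewall address
Un SNAT restores the dst of the response packet to the LAN address

The initial packet is DNATed or SNATed first; only once the connection is established can the returning packets be Un DNATed or Un SNATed.

That came out a bit convoluted = =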
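Maybe code says it better. Below is a toy port-forwarding firewall in Python showing DNAT on the way in and Un DNAT on the way back. It is a sketch under made-up assumptions: the `Packet` tuple and `DnatFirewall` class are invented, 203.0.113.1 plays the firewall's public address, and 198.51.100.7 plays an outside client.

```python
from collections import namedtuple

Packet = namedtuple("Packet", "src sport dst dport")

class DnatFirewall:
    """Forward public_ip:<ports> to one LAN host, like a port-mapping rule."""

    def __init__(self, public_ip, lan_host, ports):
        self.public_ip = public_ip
        self.lan_host = lan_host
        self.ports = ports
        self.connections = set()  # (client_ip, client_port, service_port)

    def from_wan(self, pkt):
        """DNAT: change dst to the LAN address and remember the connection."""
        if pkt.dst == self.public_ip and pkt.dport in self.ports:
            self.connections.add((pkt.src, pkt.sport, pkt.dport))
            return pkt._replace(dst=self.lan_host)
        return pkt

    def from_lan(self, pkt):
        """Un DNAT: restore src to the firewall address on tracked replies."""
        if pkt.src == self.lan_host and (pkt.dst, pkt.dport, pkt.sport) in self.connections:
            return pkt._replace(src=self.public_ip)
        return pkt  # not a reply to a DNATed connection, left untouched
```

The reply only gets its src rewritten because the first packet created an entry in `connections`; a packet from the LAN host to some unrelated peer passes through unchanged.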

About REDIRECT:
A special form of DNAT
The REDIRECT target is used to redirect packets and streams to the (firewall) machine itself. It redirects the packet to the machine itself by changing the destination IP to the primary address of the incoming interface (locally-generated packets are mapped to the 127.0.0.1 address)

An option can change the port:

-p tcp --to-port[s] 5409
-p tcp --to-port[s] 4000-5409

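DNAT, SNAT, Un DNAT, Un SNAT:
DNAT also automatically applies Un DNAT to packets in the reverse direction. Un DNAT and SNAT are unrelated to each other. Packets being Un DNATed do not pass through POSTROUTING, and likewise packets being Un SNATed do not pass through PREROUTING (but Un DNAT is performed at the stage that is **equivalent to** the POSTROUTING chain, and the same goes for Un SNAT; see the explanation below).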

In short, the ‘nat’ table chains only see the
first packet of a “connection”, and only if it has the state NEW (not
RELATED). All the subsequent valid packets belonging or related to that
connection (state NEW, ESTABLISHED, or RELATED) don’t go through these
chains. The action taken by these packets is automatically determined by
the NAT operation applied to the first packet and the direction of the
packet.

For instance, with this rule :
iptables -t nat -A POSTROUTING -o eth0 -j SNAT --to 1.2.3.4

The first ‘direct’ packet of an outgoing connection on eth0 goes through
the nat POSTROUTING chains and matches this rule, so the SNAT operation
is applied. Instead of going through the POSTROUTING chain, the
subsequent direct packets (in the same direction) of the connection will
automatically be applied the same SNAT operation. The return packets (in
the opposite direction) of the connection will automatically be applied
the de-SNAT operation instead of going through the nat PREROUTING chain.
By the way, the subsequent packets of the connection don’t need to go in
or out eth0 (funny, huh ?) to be properly NATed.

De-MASQ and de-SNAT both are destination address rewrite operations, so
it is consistent that they take place in the same place as the nat
PREROUTING chain which performs DNAT. But keep in mind that they take
place *instead* of traversing the nat PREROUTING chain, so you will never
see packets being de-MASQ-ed or de-SNAT-ed in any nat chain.
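The point of the quote, that the nat rule is consulted once per connection and everything afterwards is replayed from the connection entry, can be shown with a counter. This is a toy model, not netfilter's data structures; the `Conntrack` class and `rule_hits` counter are made up, and 1.2.3.4 is the SNAT address from the rule quoted above.

```python
class Conntrack:
    """Toy model: the nat POSTROUTING rule runs only for a NEW connection."""

    def __init__(self, snat_to):
        self.snat_to = snat_to
        self.rule_hits = 0     # how many times the nat rule was evaluated
        self.connections = {}  # (src, sport, dst, dport) -> translated src

    def _postrouting_rule(self):
        self.rule_hits += 1
        return self.snat_to

    def outgoing(self, src, sport, dst, dport):
        conn = (src, sport, dst, dport)
        if conn not in self.connections:  # first packet, state NEW
            self.connections[conn] = self._postrouting_rule()
        # Later packets reuse the stored decision without touching the rule.
        return (self.connections[conn], sport, dst, dport)

    def returning(self, src, sport, dst, dport):
        """de-SNAT a reply; no nat chain is consulted here either."""
        for (o_src, o_sport, o_dst, o_dport), new_src in self.connections.items():
            if (o_dst, o_dport, new_src, o_sport) == (src, sport, dst, dport):
                return (src, sport, o_src, dport)
        return None
```

Sending three packets on the same connection leaves `rule_hits` at 1, and the reply is de-SNATed without raising it. That matches what the packet counters from `iptables -t nat -vxnL` show in practice: they grow per connection, not per packet.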

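Some examples.

Using NAT to set up a twitter proxy on an overseas VPS:

199.59.150.39 is the IP of twitter.com (blocked by the GFW), and A.B.C.D is the VPS's IP. With the rules below in place, point twitter.com at A.B.C.D in the local hosts file and https://twitter.com/ becomes directly reachable. This approach is far too limited for real use; it is for demonstration only.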

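# DNAT's --to-destination and SNAT's --to-source can both be abbreviated to --to
# Use DNAT to forward requests received on port 443 to twitter
iptables -t nat -A PREROUTING -p tcp --dport 443 -j DNAT --to-destination 199.59.150.39:443
# Also change src to the VPS's IP, otherwise twitter would still see the original client IP as the source of the request
# and, since twitter's response would then not pass through the VPS, the reverse DNAT could not be applied automatically;
# the original client only knows it is talking to the VPS's IP, so it would discard twitter's response packets (if it received them at all)
# In addition, without SNAT the packets the VPS sends to twitter may be dropped by routers along the way as ip spoofing; many VPS providers do this, Linode for example.
iptables -t nat -A POSTROUTING -p tcp --dport 443 -j SNAT --to-source A.B.C.D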

NAT rule on the OpenVPN server side:

# 192.168.0.0/24 here is the OpenVPN subnet
iptables -t nat -A POSTROUTING -s 192.168.0.0/24 -o eth0 -j MASQUERADE

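NAT rule for running an OpenVPN client on a gateway (such as a router) so that every machine on the LAN gets transparent access through the VPN:

# 192.168.0.0/24 here is the LAN subnet
# "redirect-gateway def1" is needed so that the default route goes through the VPN.
# The -o tun0 can be left out, depending on the situation.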
iptables -t nat -A POSTROUTING -s 192.168.0.0/24 -o tun0 -j MASQUERADE

If the machine running the OpenVPN client only uses the VPN for itself (it does not act as a gateway), it needs neither SNAT nor ip_forward; the routes added by "redirect-gateway def1" are enough.

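sysctl parameters

# net.ipv4.ip_forward: defaults to 0 on non-routers
# net.ipv4.conf.all.accept_redirects: defaults to 1 on non-routers
# net.ipv4.conf.all.send_redirects: defaults to 1
# the same applies to ipv6
# rp_filter: validates the source address of incoming packets. Should be set to 0 on the openvpn (server-side) interface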

#Read:
sysctl -n net.ipv4.ip_forward

#Write:
sysctl -w net.ipv4.ip_forward=1

#Another way to set it
echo 1 > /proc/sys/net/ipv4/ip_forward
#/proc/sys/net/ipv4/conf/all/accept_redirects
#/proc/sys/net/ipv4/conf/all/send_redirects
# or per interface config
#/proc/sys/net/ipv4/conf/eth0/accept_redirects
#/proc/sys/net/ipv4/conf/eth0/send_redirects

#To make the setting permanent
# edit /etc/sysctl.conf, add (or edit) the following line
#net.ipv4.ip_forward=1
# after changing, run "sysctl -p" to make changes take effect
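The sysctl names and the /proc/sys paths above are the same thing written two ways: dots become slashes. A minimal Python sketch of that mapping follows; `sysctl_path` and `read_sysctl` are helper names made up here, and the plain split on "." assumes no component (such as a VLAN interface name like eth0.100) contains a dot itself.

```python
from pathlib import Path

def sysctl_path(name, root="/proc/sys"):
    """Map a dotted sysctl name to its file under /proc/sys."""
    # Assumption: no path component contains a literal dot.
    return Path(root).joinpath(*name.split("."))

def read_sysctl(name, root="/proc/sys"):
    """Read a value the way `sysctl -n name` would print it."""
    return sysctl_path(name, root).read_text().strip()
```

On a Linux machine `read_sysctl("net.ipv4.ip_forward")` returns "0" or "1". The `root` parameter exists only so the helper can be pointed at a test directory.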

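Writing net.ipv4.ip_forward also resets net.ipv4.conf.all.accept_redirects to the opposite value: enabling forwarding turns accept_redirects off, and disabling forwarding turns it back on. So after changing ip_forward the two values are opposite, until accept_redirects is set again by hand.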

About ICMP Redirect: introduction

With ICMP Redirect enabled, ping output looks something like this:

# ping 114.114.114.114
PING 114.114.114.114 (114.114.114.114) 56(84) bytes of data.
64 bytes from 114.114.114.114: icmp_seq=1 ttl=82 time=22.2 ms
From 192.168.0.18: icmp_seq=2 Redirect Host(New nexthop: 192.168.0.1)
64 bytes from 114.114.114.114: icmp_seq=2 ttl=93 time=7.08 ms
64 bytes from 114.114.114.114: icmp_seq=3 ttl=74 time=29.1 ms
64 bytes from 114.114.114.114: icmp_seq=4 ttl=71 time=4.26 ms

192.168.0.18 and 192.168.0.1 are two gateways.