path: root/net/netfilter
AgeCommit message (Collapse)AuthorLines
2019-08-29netfilter: nf_flow_table: clear skb tstamp before xmitFlorian Westphal-1/+2
If 'fq' qdisc is used and a program has requested timestamps, skb->tstamp needs to be cleared, else fq will treat these as 'transmit time'. Signed-off-by: Florian Westphal <> Signed-off-by: Pablo Neira Ayuso <>
2019-08-27netfilter: conntrack: make sysctls per-namespace againFlorian Westphal-0/+5
When I merged the extension sysctl tables with the main one I forgot to reset them on netns creation. They currently read/write init_net settings. Fixes: d912dec12428 ("netfilter: conntrack: merge acct and helper sysctl table with main one") Fixes: cb2833ed0044 ("netfilter: conntrack: merge ecache and timestamp sysctl tables with main one") Reported-by: Shmulik Ladkani <> Signed-off-by: Florian Westphal <> Signed-off-by: Pablo Neira Ayuso <>
2019-08-27netfilter: nf_conntrack_ftp: Fix debug outputThomas Jarosch-1/+1
The find_pattern() debug output was printing the 'skip' character. This can be a NULL-byte and messes up further pr_debug() output. Output without the fix: kernel: nf_conntrack_ftp: Pattern matches! kernel: nf_conntrack_ftp: Skipped up to `<7>nf_conntrack_ftp: find_pattern `PORT': dlen = 8 kernel: nf_conntrack_ftp: find_pattern `EPRT': dlen = 8 Output with the fix: kernel: nf_conntrack_ftp: Pattern matches! kernel: nf_conntrack_ftp: Skipped up to 0x0 delimiter! kernel: nf_conntrack_ftp: Match succeeded! kernel: nf_conntrack_ftp: conntrack_ftp: match `172,17,0,100,200,207' (20 bytes at 4150681645) kernel: nf_conntrack_ftp: find_pattern `PORT': dlen = 8 Signed-off-by: Thomas Jarosch <> Signed-off-by: Pablo Neira Ayuso <>
2019-08-27netfilter: xt_physdev: Fix spurious error message in physdev_mt_checkTodd Seidelmann-4/+2
Simplify the check in physdev_mt_check() to emit an error message only when passed an invalid chain (ie, NF_INET_LOCAL_OUT). This avoids cluttering up the log with errors against valid rules. For large/heavily modified rulesets, current behavior can quickly overwhelm the ring buffer, because this function gets called on every change, regardless of the rule that was changed. Signed-off-by: Todd Seidelmann <> Acked-by: Florian Westphal <> Signed-off-by: Pablo Neira Ayuso <>
2019-08-19netfilter: xt_nfacct: Fix alignment mismatch in xt_nfacct_match_infoJuliana Rodrigueiro-11/+25
When running a 64-bit kernel with a 32-bit iptables binary, the size of the xt_nfacct_match_info struct diverges. kernel: sizeof(struct xt_nfacct_match_info) : 40 iptables: sizeof(struct xt_nfacct_match_info)) : 36 Trying to append nfacct related rules results in an unhelpful message. Although it is suggested to look for more information in dmesg, nothing can be found there. # iptables -A <chain> -m nfacct --nfacct-name <acct-object> iptables: Invalid argument. Run `dmesg' for more information. This patch fixes the memory misalignment by enforcing 8-byte alignment within the struct's first revision. This solution is often used in many other uapi netfilter headers. Signed-off-by: Juliana Rodrigueiro <> Acked-by: Florian Westphal <> Signed-off-by: Pablo Neira Ayuso <>
2019-08-19netfilter: nft_flow_offload: missing netlink attribute policyPablo Neira Ayuso-0/+6
The netlink attribute policy for NFTA_FLOW_TABLE_NAME is missing. Fixes: a3c90f7a2323 ("netfilter: nf_tables: flow offload expression") Signed-off-by: Pablo Neira Ayuso <>
2019-08-18netfilter: nf_tables: map basechain priority to hardware priorityPablo Neira Ayuso-3/+18
This patch adds initial support for offloading basechains using the priority range from 1 to 65535. This is restricting the netfilter priority range to 16-bit integer since this is what most drivers assume so far from tc. It should be possible to extend this range of supported priorities later on once drivers are updated to support for 32-bit integer priorities. Signed-off-by: Pablo Neira Ayuso <> Signed-off-by: David S. Miller <>
2019-08-14netfilter: nft_flow_offload: skip tcp rst and fin packetsPablo Neira Ayuso-3/+6
TCP rst and fin packets do not qualify to place a flow into the flowtable. Most likely there will be no more packets after connection closure. Without this patch, this flow entry expires and connection tracking picks up the entry in ESTABLISHED state using the fixup timeout, which makes this look inconsistent to the user for a connection that is actually already closed. Signed-off-by: Pablo Neira Ayuso <>
2019-08-13netfilter: conntrack: Use consistent ct id hash calculationDirk Morris-8/+8
Change ct id hash calculation to only use invariants. Currently the ct id hash calculation is based on some fields that can change in the lifetime on a conntrack entry in some corner cases. The current hash uses the whole tuple which contains an hlist pointer which will change when the conntrack is placed on the dying list resulting in a ct id change. This patch also removes the reply-side tuple and extension pointer from the hash calculation so that the ct id will will not change from initialization until confirmation. Fixes: 3c79107631db1f7 ("netfilter: ctnetlink: don't use conntrack/expect object addresses as id") Signed-off-by: Dirk Morris <> Acked-by: Florian Westphal <> Signed-off-by: Pablo Neira Ayuso <>
2019-08-09netfilter: nf_flow_table: teardown flow timeout racePablo Neira Ayuso-9/+25
Flows that are in teardown state (due to RST / FIN TCP packet) still have their offload flag set on. Hence, the conntrack garbage collector may race to undo the timeout adjustment that the fixup routine performs, leaving the conntrack entry in place with the internal offload timeout (one day). Update teardown flow state to ESTABLISHED and set tracking to liberal, then once the offload bit is cleared, adjust timeout if it is more than the default fixup timeout (conntrack might already have set a lower timeout from the packet path). Fixes: da5984e51063 ("netfilter: nf_flow_table: add support for sending flows back to the slow path") Signed-off-by: Pablo Neira Ayuso <>
2019-08-09netfilter: nf_flow_table: conntrack picks up expired flowsPablo Neira Ayuso-7/+10
Update conntrack entry to pick up expired flows, otherwise the conntrack entry gets stuck with the internal offload timeout (one day). The TCP state also needs to be adjusted to ESTABLISHED state and tracking is set to liberal mode in order to give conntrack a chance to pick up the expired flow. Fixes: ac2a66665e23 ("netfilter: add generic flow table infrastructure") Signed-off-by: Pablo Neira Ayuso <>
2019-08-09netfilter: nf_tables: use-after-free in failing rule with bound setPablo Neira Ayuso-5/+10
If a rule that has already a bound anonymous set fails to be added, the preparation phase releases the rule and the bound set. However, the transaction object from the abort path still has a reference to the set object that is stale, leading to a use-after-free when checking for the set->bound field. Add a new field to the transaction that specifies if the set is bound, so the abort path can skip releasing it since the rule command owns it and it takes care of releasing it. After this update, the set->bound field is removed. [ 24.649883] Unable to handle kernel paging request at virtual address 0000000000040434 [ 24.657858] Mem abort info: [ 24.660686] ESR = 0x96000004 [ 24.663769] Exception class = DABT (current EL), IL = 32 bits [ 24.669725] SET = 0, FnV = 0 [ 24.672804] EA = 0, S1PTW = 0 [ 24.675975] Data abort info: [ 24.678880] ISV = 0, ISS = 0x00000004 [ 24.682743] CM = 0, WnR = 0 [ 24.685723] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000428952000 [ 24.692207] [0000000000040434] pgd=0000000000000000 [ 24.697119] Internal error: Oops: 96000004 [#1] SMP [...] [ 24.889414] Call trace: [ 24.891870] __nf_tables_abort+0x3f0/0x7a0 [ 24.895984] nf_tables_abort+0x20/0x40 [ 24.899750] nfnetlink_rcv_batch+0x17c/0x588 [ 24.904037] nfnetlink_rcv+0x13c/0x190 [ 24.907803] netlink_unicast+0x18c/0x208 [ 24.911742] netlink_sendmsg+0x1b0/0x350 [ 24.915682] sock_sendmsg+0x4c/0x68 [ 24.919185] ___sys_sendmsg+0x288/0x2c8 [ 24.923037] __sys_sendmsg+0x7c/0xd0 [ 24.926628] __arm64_sys_sendmsg+0x2c/0x38 [ 24.930744] el0_svc_common.constprop.0+0x94/0x158 [ 24.935556] el0_svc_handler+0x34/0x90 [ 24.939322] el0_svc+0x8/0xc [ 24.942216] Code: 37280300 f9404023 91014262 aa1703e0 (f9401863) [ 24.948336] ---[ end trace cebbb9dcbed3b56f ]--- Fixes: f6ac85858976 ("netfilter: nf_tables: unbind set in rule from commit path") Signed-off-by: Pablo Neira Ayuso <>
2019-08-05netfilter: nf_flow_table: fix offload for flows that are subject to xfrmFlorian Westphal-0/+43
This makes the previously added 'encap test' pass. Because its possible that the xfrm dst entry becomes stale while such a flow is offloaded, we need to call dst_check() -- the notifier that handles this for non-tunneled traffic isn't sufficient, because SA or or policies might have changed. If dst becomes stale the flow offload entry will be tagged for teardown and packets will be passed to 'classic' forwarding path. Removing the entry right away is problematic, as this would introduce a race condition with the gc worker. In case flow is long-lived, it could eventually be offloaded again once the gc worker removes the entry from the flow table. Signed-off-by: Florian Westphal <> Signed-off-by: Pablo Neira Ayuso <>
2019-07-29netfilter: ipset: Fix rename concurrency with listingJozsef Kadlecsik-1/+1
Shijie Luo reported that when stress-testing ipset with multiple concurrent create, rename, flush, list, destroy commands, it can result ipset <version>: Broken LIST kernel message: missing DATA part! error messages and broken list results. The problem was the rename operation was not properly handled with respect of listing. The patch fixes the issue. Reported-by: Shijie Luo <> Signed-off-by: Jozsef Kadlecsik <>
2019-07-29netfilter: ipset: Copy the right MAC address in bitmap:ip,mac and ↵Stefano Brivio-2/+2
hash:ip,mac sets In commit 8cc4ccf58379 ("ipset: Allow matching on destination MAC address for mac and ipmac sets"), ipset.git commit 1543514c46a7, I added to the KADT functions for sets matching on MAC addreses the copy of source or destination MAC address depending on the configured match. This was done correctly for hash:mac, but for hash:ip,mac and bitmap:ip,mac, copying and pasting the same code block presents an obvious problem: in these two set types, the MAC address is the second dimension, not the first one, and we are actually selecting the MAC address depending on whether the first dimension (IP address) specifies source or destination. Fix this by checking for the IPSET_DIM_TWO_SRC flag in option flags. This way, mixing source and destination matches for the two dimensions of ip,mac set types works as expected. With this setup: ip netns add A ip link add veth1 type veth peer name veth2 netns A ip addr add dev veth1 ip -net A addr add dev veth2 ip link set veth1 up ip -net A link set veth2 up dst=$(ip netns exec A cat /sys/class/net/veth2/address) ip netns exec A ipset create test_bitmap bitmap:ip,mac range ip netns exec A ipset add test_bitmap,${dst} ip netns exec A iptables -A INPUT -m set ! --match-set test_bitmap src,dst -j DROP ip netns exec A ipset create test_hash hash:ip,mac ip netns exec A ipset add test_hash,${dst} ip netns exec A iptables -A INPUT -m set ! --match-set test_hash src,dst -j DROP ipset correctly matches a test packet: # ping -c1 >/dev/null # echo $? 0 Reported-by: Chen Yi <> Fixes: 8cc4ccf58379 ("ipset: Allow matching on destination MAC address for mac and ipmac sets") Signed-off-by: Stefano Brivio <> Signed-off-by: Jozsef Kadlecsik <>
2019-07-29netfilter: ipset: Actually allow destination MAC address for hash:ip,mac ↵Stefano Brivio-4/+0
sets too In commit 8cc4ccf58379 ("ipset: Allow matching on destination MAC address for mac and ipmac sets"), ipset.git commit 1543514c46a7, I removed the KADT check that prevents matching on destination MAC addresses for hash:mac sets, but forgot to remove the same check for hash:ip,mac set. Drop this check: functionality is now commented in man pages and there's no reason to restrict to source MAC address matching anymore. Reported-by: Chen Yi <> Fixes: 8cc4ccf58379 ("ipset: Allow matching on destination MAC address for mac and ipmac sets") Signed-off-by: Stefano Brivio <> Signed-off-by: Jozsef Kadlecsik <>
2019-07-25netfilter: nf_tables: Make nft_meta expression more robustPhil Sutter-12/+4
nft_meta_get_eval()'s tendency to bail out setting NFT_BREAK verdict in situations where required data is missing leads to unexpected behaviour with inverted checks like so: | meta iifname != eth0 accept This rule will never match if there is no input interface (or it is not known) which is not intuitive and, what's worse, breaks consistency of iptables-nft with iptables-legacy. Fix this by falling back to placing a value in dreg which never matches (avoiding accidental matches), i.e. zero for interface index and an empty string for interface name. Signed-off-by: Phil Sutter <> Signed-off-by: Pablo Neira Ayuso <>
2019-07-19net: flow_offload: add flow_block structure and use itPablo Neira Ayuso-3/+4
This object stores the flow block callbacks that are attached to this block. Update flow_block_cb_lookup() to take this new object. This patch restores the block sharing feature. Fixes: da3eeb904ff4 ("net: flow_offload: add list handling functions") Signed-off-by: Pablo Neira Ayuso <> Acked-by: Jiri Pirko <> Signed-off-by: David S. Miller <>
2019-07-19Merge git:// S. Miller-64/+84
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Fix a deadlock when module is requested via netlink_bind() in nfnetlink, from Florian Westphal. 2) Fix ipt_rpfilter and ip6t_rpfilter with VRF, from Miaohe Lin. 3) Skip master comparison in SIP helper to fix expectation clash under two valid scenarios, from xiao ruizhu. 4) Remove obsolete comments in nf_conntrack codebase, from Yonatan Goldschmidt. 5) Fix redirect extension module autoload, from Christian Hesse. 6) Fix incorrect mssg option sent to client in synproxy, from Fernando Fernandez. 7) Fix incorrect window calculations in TCP conntrack, from Florian Westphal. 8) Don't bail out when updating basechain policy due to recent offload works, also from Florian. 9) Allow symhash to use modulus 1 as other hash extensions do, from Laura.Garcia. 10) Missing NAT chain module autoload for the inet family, from Phil Sutter. 11) Fix missing adjustment of TCP RST packet in synproxy, from Fernando Fernandez. 12) Skip EAGAIN path when nft_meta_bridge is built-in or not selected. 13) Conntrack bridge does not depend on nf_tables_bridge. 14) Turn NF_TABLES_BRIDGE into tristate to fix possible link break of nft_meta_bridge, from Arnd Bergmann. ==================== Signed-off-by: David S. Miller <>
2019-07-19netfilter: bridge: make NF_TABLES_BRIDGE tristateArnd Bergmann-2/+2
The new nft_meta_bridge code fails to link as built-in when NF_TABLES is a loadable module. net/bridge/netfilter/nft_meta_bridge.o: In function `nft_meta_bridge_get_eval': nft_meta_bridge.c:(.text+0x1e8): undefined reference to `nft_meta_get_eval' net/bridge/netfilter/nft_meta_bridge.o: In function `nft_meta_bridge_get_init': nft_meta_bridge.c:(.text+0x468): undefined reference to `nft_meta_get_init' nft_meta_bridge.c:(.text+0x49c): undefined reference to `nft_parse_register' nft_meta_bridge.c:(.text+0x4cc): undefined reference to `nft_validate_register_store' net/bridge/netfilter/nft_meta_bridge.o: In function `nft_meta_bridge_module_exit': nft_meta_bridge.c:(.exit.text+0x14): undefined reference to `nft_unregister_expr' net/bridge/netfilter/nft_meta_bridge.o: In function `nft_meta_bridge_module_init': nft_meta_bridge.c:(.init.text+0x14): undefined reference to `nft_register_expr' net/bridge/netfilter/nft_meta_bridge.o:(.rodata+0x60): undefined reference to `nft_meta_get_dump' net/bridge/netfilter/nft_meta_bridge.o:(.rodata+0x88): undefined reference to `nft_meta_set_eval' This can happen because the NF_TABLES_BRIDGE dependency itself is just a 'bool'. Make the symbol a 'tristate' instead so Kconfig can propagate the dependencies correctly. Fixes: 30e103fe24de ("netfilter: nft_meta: move bridge meta keys into nft_meta_bridge") Signed-off-by: Arnd Bergmann <> Signed-off-by: Pablo Neira Ayuso <>
2019-07-18proc/sysctl: add shared variables for range checkMatteo Croce-2/+1
In the sysctl code the proc_dointvec_minmax() function is often used to validate the user supplied value between an allowed range. This function uses the extra1 and extra2 members from struct ctl_table as minimum and maximum allowed value. On sysctl handler declaration, in every source file there are some readonly variables containing just an integer which address is assigned to the extra1 and extra2 members, so the sysctl range is enforced. The special values 0, 1 and INT_MAX are very often used as range boundary, leading duplication of variables like zero=0, one=1, int_max=INT_MAX in different source files: $ git grep -E '\.extra[12].*&(zero|one|int_max)' |wc -l 248 Add a const int array containing the most commonly used values, some macros to refer more easily to the correct array member, and use them instead of creating a local one for every object file. This is the bloat-o-meter output comparing the old and new binary compiled with the default Fedora config: # scripts/bloat-o-meter -d vmlinux.o.old vmlinux.o add/remove: 2/2 grow/shrink: 0/2 up/down: 24/-188 (-164) Data old new delta sysctl_vals - 12 +12 __kstrtab_sysctl_vals - 12 +12 max 14 10 -4 int_max 16 - -16 one 68 - -68 zero 128 28 -100 Total: Before=20583249, After=20583085, chg -0.00% [ tipc: remove two unused variables] Link: [ fix net/ipv6/sysctl_net_ipv6.c] [ proc/sysctl: make firmware loader table conditional] Link: [ fix fs/eventpoll.c] Link: Signed-off-by: Matteo Croce <> Signed-off-by: Arnd Bergmann <> Acked-by: Kees Cook <> Reviewed-by: Aaron Tomlin <> Cc: Matthew Wilcox <> Cc: Stephen Rothwell <> Signed-off-by: Andrew Morton <> Signed-off-by: Linus Torvalds <>
2019-07-18netfilter: nft_meta: skip EAGAIN if nft_meta_bridge is not a modulePablo Neira Ayuso-1/+1
If it is a module, request this module. Otherwise, if it is compiled built-in or not selected, skip this. Fixes: 0ef1efd1354d ("netfilter: nf_tables: force module load in case select_ops() returns -EAGAIN") Signed-off-by: Pablo Neira Ayuso <>
2019-07-18netfilter: synproxy: fix rst sequence number mismatchFernando Fernandez Mancera-2/+2
14:51:00.024418 IP > netfilter.90: Flags [S], seq 4023580551, 14:51:00.024454 IP netfilter.90 > Flags [S.], seq 727560212, ack 4023580552, 14:51:00.024524 IP > netfilter.90: Flags [.], ack 1, Note: here, synproxy will send a SYN to the real server, as the 3whs was completed sucessfully. Instead of a syn/ack that we can intercept, we instead received a reset packet from the real backend, that we forward to the original client. However, we don't use the correct sequence number, so the reset is not effective in closing the connection coming from the client. 14:51:00.024550 IP netfilter.90 > Flags [R.], seq 3567407084, 14:51:00.231196 IP > netfilter.90: Flags [.], ack 1, 14:51:00.647911 IP > netfilter.90: Flags [.], ack 1, 14:51:01.474395 IP > netfilter.90: Flags [.], ack 1, Fixes: 48b1de4c110a ("netfilter: add SYNPROXY core/target") Signed-off-by: Fernando Fernandez Mancera <> Signed-off-by: Pablo Neira Ayuso <>
2019-07-18netfilter: nf_tables: Support auto-loading for inet natPhil Sutter-0/+3
Trying to create an inet family nat chain would not cause nft_chain_nat.ko module to auto-load due to missing module alias. Add a proper one with hard-coded family value 1 for the pseudo-family NFPROTO_INET. Fixes: d164385ec572 ("netfilter: nat: add inet family nat support") Signed-off-by: Phil Sutter <> Signed-off-by: Pablo Neira Ayuso <>
2019-07-16netfilter: nft_hash: fix symhash with modulus oneLaura Garcia Liebana-1/+1
The rule below doesn't work as the kernel raises -ERANGE. nft add rule netdev nftlb lb01 ip daddr set \ symhash mod 1 map { 0 : } fwd to "eth0" This patch allows to use the symhash modulus with one element, in the same way that the other types of hashes and algorithms that uses the modulus parameter. Signed-off-by: Laura Garcia Liebana <> Signed-off-by: Pablo Neira Ayuso <>
2019-07-16netfilter: nf_tables: don't fail when updating base chain policyFlorian Westphal-0/+2
The following nftables test case fails on nf-next: tests/shell/ tests/shell/testcases/transactions/0011chain_0 The test case contains: add chain x y { type filter hook input priority 0; } add chain x y { policy drop; }" The new test if (chain->flags ^ flags) return -EOPNOTSUPP; triggers here, because chain->flags has NFT_BASE_CHAIN set, but flags is 0 because no flag attribute was present in the policy update. Just fetch the current flag settings of a pre-existing chain in case userspace did not provide any. Fixes: c9626a2cbdb20 ("netfilter: nf_tables: add hardware offload support") Signed-off-by: Florian Westphal <> Signed-off-by: Pablo Neira Ayuso <>
2019-07-16netfilter: conntrack: always store window size un-scaledFlorian Westphal-3/+5
Jakub Jankowski reported following oddity: After 3 way handshake completes, timeout of new connection is set to max_retrans (300s) instead of established (5 days). shortened excerpt from pcap provided: 25.070622 IP (flags [DF], proto TCP (6), length 52) > Flags [S], seq 11, win 64240, [wscale 8] 26.070462 IP (flags [DF], proto TCP (6), length 48) > Flags [S.], seq 82, ack 12, win 65535, [wscale 3] 27.070449 IP (flags [DF], proto TCP (6), length 40) > Flags [.], ack 83, win 512, length 0 Turns out the last_win is of u16 type, but we store the scaled value: 512 << 8 (== 0x20000) becomes 0 window. The Fixes tag is not correct, as the bug has existed forever, but without that change all that this causes might cause is to mistake a window update (to-nonzero-from-zero) for a retransmit. Fixes: fbcd253d2448b8 ("netfilter: conntrack: lower timeout to RETRANS seconds if window is 0") Reported-by: Jakub Jankowski <> Tested-by: Jakub Jankowski <> Signed-off-by: Florian Westphal <> Acked-by: Jozsef Kadlecsik <> Signed-off-by: Pablo Neira Ayuso <>
2019-07-16netfilter: synproxy: fix erroneous tcp mss optionFernando Fernandez Mancera-2/+4
Now synproxy sends the mss value set by the user on client syn-ack packet instead of the mss value that client announced. Fixes: 48b1de4c110a ("netfilter: add SYNPROXY core/target") Signed-off-by: Fernando Fernandez Mancera <> Signed-off-by: Pablo Neira Ayuso <>
2019-07-16netfilter: nf_tables: fix module autoload for redirChristian Hesse-1/+1
Fix expression for autoloading. Fixes: 5142967ab524 ("netfilter: nf_tables: fix module autoload with inet family") Signed-off-by: Christian Hesse <> Signed-off-by: Pablo Neira Ayuso <>
2019-07-16netfilter: Update obsolete comments referring to ip_conntrackYonatan Goldschmidt-14/+7
In 9fb9cbb1082d ("[NETFILTER]: Add nf_conntrack subsystem.") the new generic nf_conntrack was introduced, and it came to supersede the old ip_conntrack. This change updates (some) of the obsolete comments referring to old file/function names of the ip_conntrack mechanism, as well as removes a few self-referencing comments that we shouldn't maintain anymore. I did not update any comments referring to historical actions (e.g, comments like "this file was derived from ..." were left untouched, even if the referenced file is no longer here). Signed-off-by: Yonatan Goldschmidt <> Signed-off-by: Pablo Neira Ayuso <>
2019-07-16netfilter: nf_conntrack_sip: fix expectation clashxiao ruizhu-38/+56
When conntracks change during a dialog, SDP messages may be sent from different conntracks to establish expects with identical tuples. In this case expects conflict may be detected for the 2nd SDP message and end up with a process failure. The fixing here is to reuse an existing expect who has the same tuple for a different conntrack if any. Here are two scenarios for the case. 1) SERVER CPE | INVITE SDP | 5060 |<----------------------|5060 | 100 Trying | 5060 |---------------------->|5060 | 183 SDP | 5060 |---------------------->|5060 ===> Conntrack 1 | PRACK | 50601 |<----------------------|5060 | 200 OK (PRACK) | 50601 |---------------------->|5060 | 200 OK (INVITE) | 5060 |---------------------->|5060 | ACK | 50601 |<----------------------|5060 | | |<--- RTP stream ------>| | | | INVITE SDP (t38) | 50601 |---------------------->|5060 ===> Conntrack 2 With a certain configuration in the CPE, SIP messages "183 with SDP" and "re-INVITE with SDP t38" will go through the sip helper to create expects for RTP and RTCP. It is okay to create RTP and RTCP expects for "183", whose master connection source port is 5060, and destination port is 5060. In the "183" message, port in Contact header changes to 50601 (from the original 5060). So the following requests e.g. PRACK and ACK are sent to port 50601. It is a different conntrack (let call Conntrack 2) from the original INVITE (let call Conntrack 1) due to the port difference. In this example, after the call is established, there is RTP stream but no RTCP stream for Conntrack 1, so the RTP expect created upon "183" is cleared, and RTCP expect created for Conntrack 1 retains. When "re-INVITE with SDP t38" arrives to create RTP&RTCP expects, current ALG implementation will call nf_ct_expect_related() for RTP and RTCP. The expects tuples are identical to those for Conntrack 1. RTP expect for Conntrack 2 succeeds in creation as the one for Conntrack 1 has been removed. RTCP expect for Conntrack 2 fails in creation because it has idential tuples and 'conflict' with the one retained for Conntrack 1. And then result in a failure in processing of the re-INVITE. 2) SERVER A CPE | REGISTER | 5060 |<------------------| 5060 ==> CT1 | 200 | 5060 |------------------>| 5060 | | | INVITE SDP(1) | 5060 |<------------------| 5060 | 300(multi choice) | 5060 |------------------>| 5060 SERVER B | ACK | 5060 |<------------------| 5060 | INVITE SDP(2) | 5060 |-------------------->| 5060 ==> CT2 | 100 | 5060 |<--------------------| 5060 | 200(contact changes)| 5060 |<--------------------| 5060 | ACK | 5060 |-------------------->| 50601 ==> CT3 | | |<--- RTP stream ---->| | | | BYE | 5060 |<--------------------| 50601 | 200 | 5060 |-------------------->| 50601 | INVITE SDP(3) | 5060 |<------------------| 5060 ==> CT1 CPE sends an INVITE request(1) to Server A, and creates a RTP&RTCP expect pair for this Conntrack 1 (CT1). Server A responds 300 to redirect to Server B. The RTP&RTCP expect pairs created on CT1 are removed upon 300 response. CPE sends the INVITE request(2) to Server B, and creates an expect pair for the new conntrack (due to destination address difference), let call CT2. Server B changes the port to 50601 in 200 OK response, and the following requests ACK and BYE from CPE are sent to 50601. The call is established. There is RTP stream and no RTCP stream. So RTP expect is removed and RTCP expect for CT2 retains. As BYE request is sent from port 50601, it is another conntrack, let call CT3, different from CT2 due to the port difference. So the BYE request will not remove the RTCP expect for CT2. Then another outgoing call is made, with the same RTP port being used (not definitely but possibly). CPE firstly sends the INVITE request(3) to Server A, and tries to create a RTP&RTCP expect pairs for this CT1. In current ALG implementation, the RTCP expect for CT1 fails in creation because it 'conflicts' with the residual one for CT2. As a result the INVITE request fails to send. Signed-off-by: xiao ruizhu <> Signed-off-by: Pablo Neira Ayuso <>
2019-07-15netfilter: nfnetlink: avoid deadlock due to synchronous request_moduleFlorian Westphal-1/+1
Thomas and Juliana report a deadlock when running: (rmmod nf_conntrack_netlink/xfrm_user) conntrack -e NEW -E & modprobe -v xfrm_user They provided following analysis: conntrack -e NEW -E netlink_bind() netlink_lock_table() -> increases "nl_table_users" nfnetlink_bind() # does not unlock the table as it's locked by netlink_bind() __request_module() call_usermodehelper_exec() This triggers "modprobe nf_conntrack_netlink" from kernel, netlink_bind() won't return until modprobe process is done. "modprobe xfrm_user": xfrm_user_init() register_pernet_subsys() -> grab pernet_ops_rwsem .. netlink_table_grab() calls schedule() as "nl_table_users" is non-zero so modprobe is blocked because netlink_bind() increased nl_table_users while also holding pernet_ops_rwsem. "modprobe nf_conntrack_netlink" runs and inits nf_conntrack_netlink: ctnetlink_init() register_pernet_subsys() -> blocks on "pernet_ops_rwsem" thanks to xfrm_user module both modprobe processes wait on one another -- neither can make progress. Switch netlink_bind() to "nowait" modprobe -- this releases the netlink table lock, which then allows both modprobe instances to complete. Reported-by: Thomas Jarosch <> Reported-by: Juliana Rodrigueiro <> Signed-off-by: Florian Westphal <> Signed-off-by: Pablo Neira Ayuso <>
2019-07-11Merge git:// Torvalds-512/+2909
Pull networking updates from David Miller: "Some highlights from this development cycle: 1) Big refactoring of ipv6 route and neigh handling to support nexthop objects configurable as units from userspace. From David Ahern. 2) Convert explored_states in BPF verifier into a hash table, significantly decreased state held for programs with bpf2bpf calls, from Alexei Starovoitov. 3) Implement bpf_send_signal() helper, from Yonghong Song. 4) Various classifier enhancements to mvpp2 driver, from Maxime Chevallier. 5) Add aRFS support to hns3 driver, from Jian Shen. 6) Fix use after free in inet frags by allocating fqdirs dynamically and reworking how rhashtable dismantle occurs, from Eric Dumazet. 7) Add act_ctinfo packet classifier action, from Kevin Darbyshire-Bryant. 8) Add TFO key backup infrastructure, from Jason Baron. 9) Remove several old and unused ISDN drivers, from Arnd Bergmann. 10) Add devlink notifications for flash update status to mlxsw driver, from Jiri Pirko. 11) Lots of kTLS offload infrastructure fixes, from Jakub Kicinski. 12) Add support for mv88e6250 DSA chips, from Rasmus Villemoes. 13) Various enhancements to ipv6 flow label handling, from Eric Dumazet and Willem de Bruijn. 14) Support TLS offload in nfp driver, from Jakub Kicinski, Dirk van der Merwe, and others. 15) Various improvements to axienet driver including converting it to phylink, from Robert Hancock. 16) Add PTP support to sja1105 DSA driver, from Vladimir Oltean. 17) Add mqprio qdisc offload support to dpaa2-eth, from Ioana Radulescu. 18) Add devlink health reporting to mlx5, from Moshe Shemesh. 19) Convert stmmac over to phylink, from Jose Abreu. 20) Add PTP PHC (Physical Hardware Clock) support to mlxsw, from Shalom Toledo. 21) Add nftables SYNPROXY support, from Fernando Fernandez Mancera. 22) Convert tcp_fastopen over to use SipHash, from Ard Biesheuvel. 23) Track spill/fill of constants in BPF verifier, from Alexei Starovoitov. 24) Support bounded loops in BPF, from Alexei Starovoitov. 25) Various page_pool API fixes and improvements, from Jesper Dangaard Brouer. 26) Just like ipv4, support ref-countless ipv6 route handling. From Wei Wang. 27) Support VLAN offloading in aquantia driver, from Igor Russkikh. 28) Add AF_XDP zero-copy support to mlx5, from Maxim Mikityanskiy. 29) Add flower GRE encap/decap support to nfp driver, from Pieter Jansen van Vuuren. 30) Protect against stack overflow when using act_mirred, from John Hurley. 31) Allow devmap map lookups from eBPF, from Toke Høiland-Jørgensen. 32) Use page_pool API in netsec driver, Ilias Apalodimas. 33) Add Google gve network driver, from Catherine Sullivan. 34) More indirect call avoidance, from Paolo Abeni. 35) Add kTLS TX HW offload support to mlx5, from Tariq Toukan. 36) Add XDP_REDIRECT support to bnxt_en, from Andy Gospodarek. 37) Add MPLS manipulation actions to TC, from John Hurley. 38) Add sending a packet to connection tracking from TC actions, and then allow flower classifier matching on conntrack state. From Paul Blakey. 39) Netfilter hw offload support, from Pablo Neira Ayuso" * git:// (2080 commits) net/mlx5e: Return in default case statement in tx_post_resync_params mlx5: Return -EINVAL when WARN_ON_ONCE triggers in mlx5e_tls_resync(). net: dsa: add support for BRIDGE_MROUTER attribute pkt_sched: Include const.h net: netsec: remove static declaration for netsec_set_tx_de() net: netsec: remove superfluous if statement netfilter: nf_tables: add hardware offload support net: flow_offload: rename tc_cls_flower_offload to flow_cls_offload net: flow_offload: add flow_block_cb_is_busy() and use it net: sched: remove tcf block API drivers: net: use flow block API net: sched: use flow block API net: flow_offload: add flow_block_cb_{priv, incref, decref}() net: flow_offload: add list handling functions net: flow_offload: add flow_block_cb_alloc() and flow_block_cb_free() net: flow_offload: rename TCF_BLOCK_BINDER_TYPE_* to FLOW_BLOCK_BINDER_TYPE_* net: flow_offload: rename TC_BLOCK_{UN}BIND to FLOW_BLOCK_{UN}BIND net: flow_offload: add flow_block_cb_setup_simple() net: hisilicon: Add an tx_desc to adapt HI13X1_GMAC net: hisilicon: Add an rx_desc to adapt HI13X1_GMAC ...
2019-07-09netfilter: nf_tables: add hardware offload supportPablo Neira Ayuso-7/+599
This patch adds hardware offload support for nftables through the existing netdev_ops->ndo_setup_tc() interface, the TC_SETUP_CLSFLOWER classifier and the flow rule API. This hardware offload support is available for the NFPROTO_NETDEV family and the ingress hook. Each nftables expression has a new ->offload interface, that is used to populate the flow rule object that is attached to the transaction object. There is a new per-table NFT_TABLE_F_HW flag, that is set on to offload an entire table, including all of its chains. This patch supports for basic metadata (layer 3 and 4 protocol numbers), 5-tuple payload matching and the accept/drop actions; this also includes basechain hardware offload only. Signed-off-by: Pablo Neira Ayuso <> Signed-off-by: David S. Miller <>
2019-07-09Merge tag 'docs-5.3' of git:// Torvalds-8/+8
Pull Documentation updates from Jonathan Corbet: "It's been a relatively busy cycle for docs: - A fair pile of RST conversions, many from Mauro. These create more than the usual number of simple but annoying merge conflicts with other trees, unfortunately. He has a lot more of these waiting on the wings that, I think, will go to you directly later on. - A new document on how to use merges and rebases in kernel repos, and one on Spectre vulnerabilities. - Various improvements to the build system, including automatic markup of function() references because some people, for reasons I will never understand, were of the opinion that :c:func:``function()`` is unattractive and not fun to type. - We now recommend using sphinx 1.7, but still support back to 1.4. - Lots of smaller improvements, warning fixes, typo fixes, etc" * tag 'docs-5.3' of git:// (129 commits) docs: ignore exceptions when seeking for xrefs docs: Move binderfs to admin-guide Disable Sphinx SmartyPants in HTML output doc: RCU callback locks need only _bh, not necessarily _irq docs: format kernel-parameters -- as code Doc : doc-guide : Fix a typo platform: x86: get rid of a non-existent document Add the RCU docs to the core-api manual Documentation: RCU: Add TOC tree hooks Documentation: RCU: Rename txt files to rst Documentation: RCU: Convert RCU UP systems to reST Documentation: RCU: Convert RCU linked list to reST Documentation: RCU: Convert RCU basic concepts to reST docs: filesystems: Remove uneeded .rst extension on toctables scripts/sphinx-pre-install: fix out-of-tree build docs: zh_CN: submitting-drivers.rst: Remove a duplicated Documentation/ Documentation: PGP: update for newer HW devices Documentation: Add section about CPU vulnerabilities for Spectre Documentation: platform: Delete x86-laptop-drivers.txt docs: Note that :c:func: should no longer be used ...
2019-07-09Merge tag 'leds-for-5.3-rc1' of ↵Linus Torvalds-1/+1
git:// Pull LED updates from Jacek Anaszewski: - Add a new LED common module for ti-lmu driver family - Modify MFD ti-lmu bindings - add ti,brightness-resolution - add the ramp up/down property - Add regulator support for LM36274 driver to lm363x-regulator.c - New LED class drivers with DT bindings: - leds-spi-byte - leds-lm36274 - leds-lm3697 (move the support from MFD to LED subsystem) - Simplify getting the I2C adapter of a client: - leds-tca6507 - leds-pca955x - Convert LED documentation to ReST * tag 'leds-for-5.3-rc1' of git:// dt: leds-lm36274.txt: fix a broken reference to ti-lmu.txt docs: leds: convert to ReST leds: leds-tca6507: simplify getting the adapter of a client leds: leds-pca955x: simplify getting the adapter of a client leds: lm36274: Introduce the TI LM36274 LED driver dt-bindings: leds: Add LED bindings for the LM36274 regulator: lm363x: Add support for LM36274 mfd: ti-lmu: Add LM36274 support to the ti-lmu dt-bindings: mfd: Add lm36274 bindings to ti-lmu leds: max77650: Remove set but not used variable 'parent' leds: avoid flush_work in atomic context leds: lm3697: Introduce the lm3697 driver mfd: ti-lmu: Remove support for LM3697 dt-bindings: ti-lmu: Modify dt bindings for the LM3697 leds: TI LMU: Add common code for TI LMU devices leds: spi-byte: add single byte SPI LED driver dt-bindings: leds: Add binding for spi-byte LED. dt-bindings: mfd: LMU: Add ti,brightness-resolution dt-bindings: mfd: LMU: Add the ramp up/down property
2019-07-08Merge git:// S. Miller-81/+100
Two cases of overlapping changes, nothing fancy. Signed-off-by: David S. Miller <>
2019-07-08Merge git:// S. Miller-76/+472
Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following patchset contains Netfilter/IPVS updates for net-next: 1) Move bridge keys in nft_meta to nft_meta_bridge, from wenxu. 2) Support for bridge pvid matching, from wenxu. 3) Support for bridge vlan protocol matching, also from wenxu. 4) Add br_vlan_get_pvid_rcu(), to fetch the bridge port pvid from packet path. 5) Prefer specific family extension in nf_tables. 6) Autoload specific family extension in case it is missing. 7) Add synproxy support to nf_tables, from Fernando Fernandez Mancera. 8) Support for GRE encapsulation in IPVS, from Vadim Fedorenko. 9) ICMP handling for GRE encapsulation, from Julian Anastasov. 10) Remove unused parameter in nf_queue, from Florian Westphal. 11) Replace seq_printf() by seq_puts() in nf_log, from Markus Elfring. 12) Rename nf_SYNPROXY.h => nf_synproxy.h before this header becomes public. ==================== Signed-off-by: David S. Miller <>
2019-07-06netfilter: nf_tables: force module load in case select_ops() returns -EAGAINPablo Neira Ayuso-0/+10
nft_meta needs to pull in the nft_meta_bridge module in case that this is a bridge family rule from the select_ops() path. Signed-off-by: Pablo Neira Ayuso <>
2019-07-05netfilter: nf_tables: __nft_expr_type_get() selects specific family typePablo Neira Ayuso-5/+8
In case that there are two types, prefer the family specify extension. Signed-off-by: Pablo Neira Ayuso <>
2019-07-05netfilter: nf_tables: add nft_expr_type_request_module()Pablo Neira Ayuso-3/+14
This helper function makes sure the family specific extension is loaded. Signed-off-by: Pablo Neira Ayuso <>
2019-07-05netfilter: nft_meta: move bridge meta keys into nft_meta_bridgewenxu-53/+29
Separate bridge meta key from nft_meta to meta_bridge to avoid a dependency between the bridge module and nft_meta when using the bridge API available through include/linux/if_bridge.h Signed-off-by: wenxu <> Reviewed-by: Nikolay Aleksandrov <> Signed-off-by: Pablo Neira Ayuso <>
2019-07-05ipvs: strip gre tunnel headers from icmp errorsJulian Anastasov-4/+42
Recognize GRE tunnels in received ICMP errors and properly strip the tunnel headers. Signed-off-by: Julian Anastasov <> Signed-off-by: Pablo Neira Ayuso <>
2019-07-05netfilter: nf_tables: Add synproxy supportFernando Fernandez Mancera-0/+299
Add synproxy support for nf_tables. This behaves like the iptables synproxy target but it is structured in a way that allows us to propose improvements in the future. Signed-off-by: Fernando Fernandez Mancera <> Signed-off-by: Pablo Neira Ayuso <>
2019-07-04ipvs: allow tunneling with gre encapsulationVadim Fedorenko-3/+64
windows real servers can handle gre tunnels, this patch allows gre encapsulation with the tunneling method, thereby letting ipvs be load balancer for windows-based services Signed-off-by: Vadim Fedorenko <> Acked-by: Julian Anastasov <> Signed-off-by: Simon Horman <> Signed-off-by: Pablo Neira Ayuso <>
2019-07-04netfilter: nf_queue: remove unused hook entries pointerFlorian Westphal-6/+4
Its not used anywhere, so remove this. Signed-off-by: Florian Westphal <> Signed-off-by: Pablo Neira Ayuso <>
2019-07-04netfilter: nf_log: Replace a seq_printf() call by seq_puts() in seq_show()Markus Elfring-1/+1
A string which did not contain a data format specification should be put into a sequence. Thus use the corresponding function “seq_puts”. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <> Signed-off-by: Pablo Neira Ayuso <>
2019-07-04netfilter: rename nf_SYNPROXY.h to nf_synproxy.hPablo Neira Ayuso-1/+1
Uppercase is a reminiscence from the iptables infrastructure, rename this header before this is included in stable kernels. Signed-off-by: Pablo Neira Ayuso <>
2019-06-29net: make skb_dst_force return true when dst is refcountedFlorian Westphal-1/+5
netfilter did not expect that skb_dst_force() can cause skb to lose its dst entry. I got a bug report with a skb->dst NULL dereference in netfilter output path. The backtrace contains nf_reinject(), so the dst might have been cleared when skb got queued to userspace. Other users were fixed via if (skb_dst(skb)) { skb_dst_force(skb); if (!skb_dst(skb)) goto handle_err; } But I think its preferable to make the 'dst might be cleared' part of the function explicit. In netfilter case, skb with a null dst is expected when queueing in prerouting hook, so drop skb for the other hooks. v2: v1 of this patch returned true in case skb had no dst entry. Eric said: Say if we have two skb_dst_force() calls for some reason on the same skb, only the first one will return false. This now returns false even when skb had no dst, as per Erics suggestion, so callers might need to check skb_dst() first before skb_dst_force(). Signed-off-by: Florian Westphal <> Signed-off-by: David S. Miller <>
2019-06-28Merge git:// S. Miller-80/+95
Pablo Neira Ayuso says: ==================== Netfilter/IPVS fixes for net The following patchset contains Netfilter fixes for net: 1) Fix memleak reported by syzkaller when registering IPVS hooks, patch from Julian Anastasov. 2) Fix memory leak in start_sync_thread, also from Julian. 3) Fix conntrack deletion via ctnetlink, from Felix Kaechele. 4) Fix reject for ICMP due to incorrect checksum handling, from He Zhe. ==================== Signed-off-by: David S. Miller <>