path: root/net
Age  Commit message  Author  Lines
2011-10-10  gro: refetch inet6_protos[] after pulling ext headers  Yan, Zheng  -0/+1
ipv6_gro_receive() doesn't update the protocol ops after pulling the ext headers. It looks like a typo. Signed-off-by: Zheng Yan <> Acked-by: Eric Dumazet <> Signed-off-by: David S. Miller <>
2011-10-06  bridge: leave carrier on for empty bridge  stephen hemminger  -3/+0
This resolves a regression seen by some users of bridging. Some users use the bridge like a dummy device. They expect to be able to put an IPv6 address on the device with no ports attached. Although there are better ways of doing this, there is no reason to not allow it. Note: the bridge still will reflect the state of ports in the bridge if there are any added. Signed-off-by: Stephen Hemminger <> Signed-off-by: David S. Miller <>
2011-10-05  netfilter: Use proper rwlock init function  Thomas Gleixner  -1/+1
Replace the open coded initialization with the init function. Signed-off-by: Thomas Gleixner <> Acked-by: Hans Schillstrom <> Signed-off-by: David S. Miller <>
2011-10-04  tcp: properly update lost_cnt_hint during shifting  Yan, Zheng  -3/+1
lost_skb_hint is used by tcp_mark_head_lost() to mark the first unhandled skb. lost_cnt_hint is the number of packets or sacked packets before the lost_skb_hint. When shifting a skb that is before the lost_skb_hint, if tcp_is_fack() is true, the skb has already been counted in the lost_cnt_hint; if tcp_is_fack() is false, tcp_sacktag_one() will increase the lost_cnt_hint. So tcp_shifted_skb() does not need to adjust the lost_cnt_hint by itself. When shifting a skb that is equal to lost_skb_hint, the shifted packets will not be counted by tcp_mark_head_lost(). So tcp_shifted_skb() should adjust the lost_cnt_hint even if tcp_is_fack(tp) is true. Signed-off-by: Zheng Yan <> Signed-off-by: David S. Miller <>
2011-10-04  tcp: properly handle md5sig_pool references  Yan, Zheng  -7/+12
tcp_v4_clear_md5_list() assumes that multiple tcp md5sig peers only hold one reference to md5sig_pool, but tcp_v4_md5_do_add() increases the use count of md5sig_pool for each peer. This patch makes tcp_v4_md5_do_add() increase the use count only for the first tcp md5sig peer. Signed-off-by: Zheng Yan <> Signed-off-by: David S. Miller <>
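The reference-counting rule described in the md5sig_pool message above can be sketched in plain C. This is a minimal userspace model, not the kernel code: `struct md5_pool`, `struct md5_peer_list`, `peer_add()` and `peer_clear()` are hypothetical stand-ins for md5sig_pool, the per-socket peer list, tcp_v4_md5_do_add() and tcp_v4_clear_md5_list().

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel objects named in the commit. */
struct md5_pool {
    int users;              /* models the md5sig_pool use count */
};

struct md5_peer_list {
    size_t entries;         /* number of md5sig peers on one socket */
};

/* Add a peer. Only the first peer on the list takes a pool reference. */
static void peer_add(struct md5_peer_list *list, struct md5_pool *pool)
{
    if (list->entries == 0)
        pool->users++;
    list->entries++;
}

/* Drop all peers and release the single reference the list holds. */
static void peer_clear(struct md5_peer_list *list, struct md5_pool *pool)
{
    if (list->entries > 0) {
        list->entries = 0;
        pool->users--;
    }
}
```

With this pairing, the add and clear paths agree that one peer list holds exactly one pool reference, however many peers it contains.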
2011-10-04  Merge git:// Torvalds  -34/+40
* git://
  pch_gbe: Fixed the issue on which a network freezes
  pch_gbe: Fixed the issue on which PC was frozen when link was downed.
  make PACKET_STATISTICS getsockopt report consistently between ring and non-ring
  net: xen-netback: correctly restart Tx after a VM restore/migrate
  bonding: properly stop queuing work when requested
  can bcm: fix incomplete tx_setup fix
  RDSRDMA: Fix cleanup of rds_iw_mr_pool
  net: Documentation: Fix type of variables
  ibmveth: Fix oops on request_irq failure
  ipv6: nullify ipv6_ac_list and ipv6_fl_list when creating new socket
  cxgb4: Fix EEH on IBM P7IOC
  can bcm: fix tx_setup off-by-one errors
  MAINTAINERS: tehuti: Alexander Indenbaum's address bounces
  dp83640: reduce driver noise
  ptp: fix L2 event message recognition
2011-10-03  make PACKET_STATISTICS getsockopt report consistently between ring and non-ring  Willem de Bruijn  -1/+4
This is a minor change.

Up until kernel 2.6.32, getsockopt(fd, SOL_PACKET, PACKET_STATISTICS, ...) would return total and dropped packets since its last invocation. The introduction of socket queue overflow reporting [1] changed drop rate calculation in the normal packet socket path, but not when using a packet ring. As a result, the getsockopt now returns different statistics depending on the reception method used. With a ring, it still returns the count since the last call, as counts are incremented in tpacket_rcv and reset in getsockopt. Without a ring, it returns 0 if no drops occurred since the last getsockopt and the total drops over the lifespan of the socket otherwise. The culprit is this line in packet_rcv, executed on a drop:

drop_n_acct:
        po->stats.tp_drops = atomic_inc_return(&sk->sk_drops);

As it shows, the new drop number is taken from the socket drop counter, which is not reset at getsockopt. I put together a small example that demonstrates the issue [2]. It runs for 10 seconds and overflows the queue/ring on every odd second. The reported drop rates are:

ring: 16, 0, 16, 0, 16, ...
non-ring: 0, 15, 0, 30, 0, 46, 0, 60, 0, 74.

Note how the even non-ring counts monotonically increase. Because the getsockopt adds tp_drops to tp_packets, total counts are similarly reported cumulatively.

Long story short, reinstating the original code, as the below patch does, fixes the issue at the cost of additional per-packet cycles. Another solution that does not introduce per-packet overhead is to keep the current data path, record the value of sk_drops at getsockopt() at call N in a new field in struct packetsock and subtract that when reporting at call N+1. I'll be happy to code that instead; it's just more messy.

[1] [2]
Signed-off-by: Willem de Bruijn <> Signed-off-by: David S. Miller <>
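The alternative mentioned at the end of the message above (snapshot sk_drops at each getsockopt() call and report the difference) is not what the patch implements, but it might look roughly like the following userspace sketch. `struct pkt_sock_model`, its `drops_reported` field, `on_drop()` and `report_drops()` are hypothetical names used only for illustration.

```c
#include <stdint.h>

/* Userspace model of a packet socket's drop accounting (names assumed). */
struct pkt_sock_model {
    uint32_t sk_drops;        /* cumulative drops, never reset */
    uint32_t drops_reported;  /* hypothetical field: sk_drops at last report */
};

/* Data path: only the cumulative counter is touched per dropped packet. */
static void on_drop(struct pkt_sock_model *s)
{
    s->sk_drops++;
}

/* getsockopt path: report drops since the previous call, then snapshot. */
static uint32_t report_drops(struct pkt_sock_model *s)
{
    uint32_t now = s->sk_drops;
    uint32_t delta = now - s->drops_reported; /* unsigned math survives wraparound */

    s->drops_reported = now;
    return delta;
}
```

The extra work moves from the per-packet path to the comparatively rare statistics query, which is the trade-off the message describes.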
2011-09-29  Merge branch 'for-linus' of git:// Torvalds  -42/+48
* 'for-linus' of git://
  libceph: fix pg_temp mapping update
  libceph: fix pg_temp mapping calculation
  libceph: fix linger request requeuing
  libceph: fix parse options memory leak
  libceph: initialize ack_stamp to avoid unnecessary connection reset
2011-09-29  can bcm: fix incomplete tx_setup fix  Oliver Hartkopp  -27/+21
The commit aabdcb0b553b9c9547b1a506b34d55a764745870 ("can bcm: fix tx_setup off-by-one errors") fixed only a part of the original problem reported by Andre Naujoks. It turned out that the original code needed to be re-ordered to reduce complexity and to finally fix the reported frame counting issues. Signed-off-by: Oliver Hartkopp <> Signed-off-by: David S. Miller <>
2011-09-29  RDSRDMA: Fix cleanup of rds_iw_mr_pool  Jonathan Lallinger  -4/+9
In the rds_iw_mr_pool struct the free_pinned field keeps track of memory pinned by free MRs. While this field is incremented properly upon allocation, it is never decremented upon unmapping. This would cause the rds_rdma module to crash the kernel upon unloading, by triggering the BUG_ON in the rds_iw_destroy_mr_pool function. This change keeps track of the MRs that become unpinned, so that free_pinned can be decremented appropriately. Signed-off-by: Jonathan Lallinger <> Signed-off-by: Steve Wise <> Signed-off-by: David S. Miller <>
2011-09-29  ipv6: nullify ipv6_ac_list and ipv6_fl_list when creating new socket  Yan, Zheng  -0/+3
ipv6_ac_list and ipv6_fl_list from listening socket are inadvertently shared with new socket created for connection. Signed-off-by: Zheng Yan <> Signed-off-by: David S. Miller <>
2011-09-29  can bcm: fix tx_setup off-by-one errors  Oliver Hartkopp  -6/+7
This patch fixes two off-by-one errors that canceled each other out. Checking for the same condition two times in bcm_tx_timeout_tsklet() reduced the count of frames to be sent by one. This did not show up the first time tx_setup is invoked as an additional frame is sent due to TX_ANNOUNCE. Invoking a second tx_setup on the same item led to a reduced (by 1) number of sent frames. Reported-by: Andre Naujoks <> Signed-off-by: Oliver Hartkopp <> Signed-off-by: David S. Miller <>
2011-09-28  libceph: fix pg_temp mapping update  Sage Weil  -26/+24
The incremental map updates have a record for each pg_temp mapping that is to be added/updated (len > 0) or removed (len == 0). The old code was written as if the updates were a complete enumeration; that was just wrong. Update the code to remove 0-length entries and drop the rbtree traversal. This avoids misdirected (and hung) requests that manifest as server errors like
[WRN] client4104 misdirected client4104.1:129 0.1 to osd0 not [1,0] in e11/11
Signed-off-by: Sage Weil <>
2011-09-28  libceph: fix pg_temp mapping calculation  Sage Weil  -13/+21
We need to apply the modulo pg_num calculation before looking up a pgid in the pg_temp mapping rbtree. This fixes pg_temp mappings, and fixes (some) misdirected requests that result in messages like
[WRN] client4104 misdirected client4104.1:129 0.1 to osd0 not [1,0] in e11/11
on the server and make the client block without getting a reply (at least until the pg_temp mapping goes away, but that can take a long long time). Reorder calc_pg_raw() a bit to make more sense. Signed-off-by: Sage Weil <>
2011-09-27  Merge branch 'for-davem' of git:// S. Miller  -1/+4
2011-09-27  Merge branch 'master' of git:// into for-davem  John W. Linville  -1/+4
2011-09-27  ipv6-multicast: Fix memory leak in IPv6 multicast.  Ben Greear  -1/+3
If reg_vif_xmit cannot find a routing entry, be sure to free the skb before returning the error. Signed-off-by: Ben Greear <> Signed-off-by: David S. Miller <>
2011-09-27  ipv6: check return value for dst_alloc  Madalin Bucur  -1/+3
return value of dst_alloc must be checked before use Signed-off-by: Madalin Bucur <> Signed-off-by: David S. Miller <>
2011-09-27  net: check return value for dst_alloc  Madalin Bucur  -4/+6
return value of dst_alloc must be checked before use Signed-off-by: Madalin Bucur <> Signed-off-by: David S. Miller <>
2011-09-27  ipv6-multicast: Fix memory leak in input path.  Ben Greear  -1/+3
Have to free the skb before returning if we fail the fib lookup. Signed-off-by: Ben Greear <> Signed-off-by: David S. Miller <>
2011-09-27  Merge branch 'batman-adv/maint' of git:// S. Miller  -5/+5
2011-09-22  batman-adv: do_bcast has to be true for broadcast packets only  Antonio Quartulli  -5/+5
Corrects a critical bug of the GW feature. This bug caused all the unicast packets destined to a GW to be sent as broadcast. The bug is present even if the sender GW feature is configured as OFF. It's an urgent bug fix and should be committed as soon as possible. This was a regression introduced by 43676ab590c3f8686fd047d34c3e33803eef71f0. Signed-off-by: Antonio Quartulli <> Signed-off-by: Marek Lindner <>
2011-09-21  cfg80211: Fix validation of AKM suites  Jouni Malinen  -1/+4
Incorrect variable was used in validating the akm_suites array from NL80211_ATTR_AKM_SUITES. In addition, there was no explicit validation of the array length (we only have room for NL80211_MAX_NR_AKM_SUITES). This can result in a buffer write overflow for stack variables with arbitrary data from user space. The nl80211 commands using the affected functionality require GENL_ADMIN_PERM, so this is only exposed to admin users. Cc: Signed-off-by: Jouni Malinen <> Signed-off-by: John W. Linville <>
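The kind of length check the AKM suites message above calls for can be illustrated with a small userspace sketch. It is not the nl80211 code: `struct crypto_settings_model` and `parse_akm_suites()` are hypothetical, and the limit of 2 is an assumed value standing in for NL80211_MAX_NR_AKM_SUITES.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for NL80211_MAX_NR_AKM_SUITES; the value here is an assumption. */
#define MAX_NR_AKM_SUITES 2

struct crypto_settings_model {
    int n_akm_suites;
    uint32_t akm_suites[MAX_NR_AKM_SUITES];
};

/*
 * Copy a user-supplied array of 32-bit suite selectors.
 * Returns 0 on success, -1 if the attribute length is malformed or
 * would overflow the fixed-size destination array.
 */
static int parse_akm_suites(struct crypto_settings_model *s,
                            const void *data, size_t len)
{
    size_t n = len / sizeof(uint32_t);

    if (len % sizeof(uint32_t) != 0)
        return -1;              /* not a whole number of selectors */
    if (n > MAX_NR_AKM_SUITES)
        return -1;              /* explicit bound check before the copy */

    memcpy(s->akm_suites, data, len);
    s->n_akm_suites = (int)n;
    return 0;
}
```

The point of the sketch is ordering: the element count derived from the untrusted length is validated against the destination capacity before any bytes are copied.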
2011-09-21  xfrm: Perform a replay check after return from async codepaths  Steffen Klassert  -0/+5
When asynchronous crypto algorithms are used, there might be many packets that passed the xfrm replay check, but the replay advance function is not called yet for these packets. So the replay check function would accept a replay of all of these packets. Also the system might crash if there are more packets in async processing than the size of the anti replay window, because the replay advance function would try to update the replay window beyond the bounds. This patch adds a second replay check after resuming from the async processing to fix these issues. Signed-off-by: Steffen Klassert <> Acked-by: Herbert Xu <> Signed-off-by: David S. Miller <>
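To see why the xfrm fix above needs a second check after async processing, consider this simplified sliding-window model. It is a sketch under stated assumptions, not the xfrm implementation: the 32-entry window, `struct replay_state`, `replay_check()`, `replay_advance()` and `input_resume()` are all hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define REPLAY_WINDOW 32u   /* assumed window size for this model */

struct replay_state {
    uint32_t top;     /* highest sequence number accepted so far */
    uint32_t bitmap;  /* bit i set: sequence (top - i) was already seen */
};

/* Return true if seq is acceptable given the current window. */
static bool replay_check(const struct replay_state *r, uint32_t seq)
{
    uint32_t diff;

    if (seq == 0)
        return false;
    if (seq > r->top)
        return true;
    diff = r->top - seq;
    if (diff >= REPLAY_WINDOW)
        return false;                       /* too old, outside the window */
    return !(r->bitmap & (1u << diff));     /* reject if already seen */
}

/* Record seq as seen. Callers must have passed replay_check() first. */
static void replay_advance(struct replay_state *r, uint32_t seq)
{
    if (seq > r->top) {
        uint32_t shift = seq - r->top;

        r->bitmap = shift < REPLAY_WINDOW ? (r->bitmap << shift) | 1u : 1u;
        r->top = seq;
    } else {
        r->bitmap |= 1u << (r->top - seq);
    }
}

/* Resume after async crypto: re-check right before advancing. */
static bool input_resume(struct replay_state *r, uint32_t seq)
{
    if (!replay_check(r, seq))
        return false;
    replay_advance(r, seq);
    return true;
}
```

Two copies of the same packet can both pass `replay_check()` while neither has advanced the window yet; checking again in `input_resume()` lets only the first one through and keeps `replay_advance()` from shifting outside the window.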
2011-09-21  fib:fix BUG_ON in fib_nl_newrule when add new fib rule  Gao feng  -2/+2
Adding a new fib rule can trigger a BUG_ON. The shell commands to reproduce it are:
  ip rule add pref 38
  ip rule add pref 38
  ip rule add to goto 38
  ip rule del pref 38
  ip rule add to goto 38
  ip rule add pref 38
Then the BUG_ON will trigger. Delete the BUG_ON and use (ctarget == NULL) to identify whether this rule is unresolved. Signed-off-by: Gao feng <> Signed-off-by: Eric Dumazet <> Signed-off-by: David S. Miller <>
2011-09-20  ipv6: fix a possible double free  Roy Li  -2/+2
When calling snmp6_alloc_dev fails, the snmp6-related memory is freed by snmp6_alloc_dev itself. Calling in6_dev_finish_destroy will then free this memory a second time. A double free leads to undefined behavior. Signed-off-by: Roy Li <> Acked-by: Eric Dumazet <> Signed-off-by: David S. Miller <>
2011-09-20  Merge branch 'master' of ssh://infradead/~/public_git/wireless into for-davem  John W. Linville  -9/+11
2011-09-19  Merge branch 'for-3.1' of git:// W. Linville  -9/+8
2011-09-18  tcp: fix validation of D-SACK  Zheng Yan  -1/+1
D-SACK is allowed to reside below snd_una. But the corresponding check in tcp_is_sackblock_valid() is the exact opposite. It looks like a typo. Signed-off-by: Zheng Yan <> Acked-by: Eric Dumazet <> Signed-off-by: David S. Miller <>
2011-09-18  Merge git:// Torvalds  -142/+187
* git:// (62 commits)
  ipv6: don't use inetpeer to store metrics for routes.
  can: ti_hecc: include linux/io.h
  IRDA: Fix global type conflicts in net/irda/irsysctl.c v2
  net: Handle different key sizes between address families in flow cache
  net: Align AF-specific flowi structs to long
  ipv4: Fix fib_info->fib_metrics leak
  caif: fix a potential NULL dereference
  sctp: deal with multiple COOKIE_ECHO chunks
  ibmveth: Fix checksum offload failure handling
  ibmveth: Checksum offload is always disabled
  ibmveth: Fix issue with DMA mapping failure
  ibmveth: Fix DMA unmap error
  pch_gbe: support ML7831 IOH
  pch_gbe: added the process of FIFO over run error
  pch_gbe: fixed the issue which receives an unnecessary packet.
  sfc: Use 64-bit writes for TX push where possible
  Revert "sfc: Use write-combining to reduce TX latency" and follow-ups
  bnx2x: Fix ethtool advertisement
  bnx2x: Fix 578xx link LED
  bnx2x: Fix XMAC loopback test
  ...
2011-09-17  ipv6: don't use inetpeer to store metrics for routes.  Yan, Zheng  -11/+22
The current IPv6 implementation uses inetpeer to store metrics for routes. The problem with inetpeer is that it doesn't take subnet prefix length into consideration. If two routes have the same address but different prefix lengths, they share the same inetpeer. So changing the metrics of one route also affects the other. The fix is to allocate separate metrics storage for each route. Signed-off-by: Zheng Yan <> Signed-off-by: David S. Miller <>
2011-09-16  IRDA: Fix global type conflicts in net/irda/irsysctl.c v2  Andi Kleen  -6/+6
The externs here didn't agree with the declarations in qos.c. Better would be probably to move this into a header, but since it's common practice to have naked externs with sysctls I left it for now. Cc: Signed-off-by: Andi Kleen <> Signed-off-by: David S. Miller <>
2011-09-16  net: Handle different key sizes between address families in flow cache  dpward  -14/+17
With the conversion of struct flowi to a union of AF-specific structs, some operations on the flow cache need to account for the exact size of the key. Signed-off-by: David Ward <> Signed-off-by: David S. Miller <>
2011-09-16  ipv4: Fix fib_info->fib_metrics leak  Yan, Zheng  -1/+9
Commit 4670994d (net,rcu: convert call_rcu(fc_rport_free_rcu) to kfree_rcu()) introduced a memory leak. This patch reverts it. Signed-off-by: Zheng Yan <> Signed-off-by: David S. Miller <>
2011-09-16  caif: fix a potential NULL dereference  Eric Dumazet  -1/+5
Commit bd30ce4bc0b7 (caif: Use RCU instead of spin-lock in caif_dev.c) added a potential NULL dereference in case alloc_percpu() fails. caif_device_alloc() can also use GFP_KERNEL instead of GFP_ATOMIC. Signed-off-by: Eric Dumazet <> CC: Sjur Brændeland <> Acked-by: Sjur Brændeland <> Signed-off-by: David S. Miller <>
2011-09-16  sctp: deal with multiple COOKIE_ECHO chunks  Max Matveev  -0/+11
An attempt to reduce the number of IP packets emitted in response to a single SCTP packet (2e3216cd) introduced a complication: if a packet contains two COOKIE_ECHO chunks and nothing else, then the SCTP state machine corks the socket while processing the first COOKIE_ECHO and then loses the association and forgets to uncork the socket. To deal with the issue, add a new SCTP command which can be used to set the association explicitly. Use this new command when processing the second COOKIE_ECHO chunk to restore the context for the SCTP state machine. Signed-off-by: Max Matveev <> Signed-off-by: David S. Miller <>
2011-09-16  wireless: Fix rate mask for scan request  Rajkumar Manoharan  -0/+2
The scan request received from cfg80211_connect does not have a proper rate mask. So the probe requests sent on each channel do not have the proper supported rates IE. Cc: Reviewed-by: Johannes Berg <> Signed-off-by: Rajkumar Manoharan <> Signed-off-by: John W. Linville <>
2011-09-16  wireless: Reset beacon_found while updating regulatory  Rajkumar Manoharan  -0/+1
During the association, the regulatory is updated by country IE that reaps the previously found beacons. The impact is that after a STA disconnects *or* when for any reason a regulatory domain change happens, the beacon hint flag is not cleared, therefore preventing future beacon hints from being learned. This is important as a regulatory domain change or a restore of regulatory settings would set back the passive scan and no-ibss flags on the channel. This is the right place to do this given that it covers any regulatory domain change. Cc: Reviewed-by: Luis R. Rodriguez <> Signed-off-by: Rajkumar Manoharan <> Acked-by: Luis R. Rodriguez <> Signed-off-by: John W. Linville <>
2011-09-16  libceph: fix linger request requeuing  Sage Weil  -3/+1
The r_req_lru_item list node moves between several lists, and that cycle is not directly related (and does not begin) with __register_request(). Initialize it in the request constructor, not __register_request(). This fixes later badness (below) when OSDs restart underneath an rbd mount. Crashes we've seen due to this include:
[ 213.974288] kernel BUG at net/ceph/messenger.c:2193!
and
[ 144.035274] BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
[ 144.035278] IP: [<ffffffffa036c053>] con_work+0x1463/0x2ce0 [libceph]
Signed-off-by: Sage Weil <>
2011-09-16  libceph: fix parse options memory leak  Noah Watkins  -0/+1
ceph_destroy_options does not free opt->mon_addr that is allocated in ceph_parse_options. Signed-off-by: Noah Watkins <> Signed-off-by: Sage Weil <>
2011-09-16  libceph: initialize ack_stamp to avoid unnecessary connection reset  Jim Schutt  -0/+1
Commit 4cf9d544631c recorded when an outgoing ceph message was ACKed, in order to avoid unnecessary connection resets when an OSD is busy. However, ack_stamp is uninitialized, so there is a window between when the message is sent and when it is ACKed in which handle_timeout() interprets the uninitialized value as an expired timeout, and resets the connection unnecessarily. Close the window by initializing ack_stamp. Signed-off-by: Jim Schutt <> Signed-off-by: Sage Weil <>
2011-09-16  Merge branch 'master' of ../netdev/  David S. Miller  -37/+39
2011-09-15  Merge branch 'master' of git:// into for-davem  John W. Linville  -1/+1
2011-09-15  net: don't clear IFF_XMIT_DST_RELEASE in ether_setup  nhorman  -1/+1
d88733150 introduced the IFF_SKB_TX_SHARING flag, which I unilaterally set in ether_setup. In doing this I didn't realize that other flags (such as IFF_XMIT_DST_RELEASE) might be set prior to calling the ether_setup routine. This patch changes ether_setup to OR in SKB_TX_SHARING so as not to inadvertently clear other existing flags. Thanks to Pekka Riikonen for pointing out my error. Signed-off-by: Neil Horman <> Reported-by: Pekka Riikonen <> CC: "David S. Miller" <> Acked-by: Eric Dumazet <> Signed-off-by: David S. Miller <>
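The difference between assigning a flag and OR-ing it in, as described in the ether_setup message above, fits in a few lines of C. This is an illustrative model only: the flag values and `struct net_device_model` are assumptions, and `setup_buggy()` / `setup_fixed()` are hypothetical names for the before and after behaviour.

```c
/* Illustrative bit values; the real kernel constants may differ. */
#define IFF_XMIT_DST_RELEASE 0x1u
#define IFF_SKB_TX_SHARING   0x2u

struct net_device_model {
    unsigned int priv_flags;
};

/* Before the fix: plain assignment wipes any flag set earlier. */
static void setup_buggy(struct net_device_model *dev)
{
    dev->priv_flags = IFF_SKB_TX_SHARING;
}

/* After the fix: OR the new flag in and keep the existing ones. */
static void setup_fixed(struct net_device_model *dev)
{
    dev->priv_flags |= IFF_SKB_TX_SHARING;
}
```

A device that already has IFF_XMIT_DST_RELEASE set keeps it only through `setup_fixed()`.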
2011-09-15  net: copy userspace buffers on device forwarding  Michael S. Tsirkin  -5/+25
dev_forward_skb loops an skb back into host networking stack which might hang on the memory indefinitely. In particular, this can happen in macvtap in bridged mode. Copy the userspace fragments to avoid blocking the sender in that case. As this patch makes skb_copy_ubufs extern now, I also added some documentation and made it clear the SKBTX_DEV_ZEROCOPY flag automatically instead of doing it in all callers. This can be made into a separate patch if people feel it's worth it. Signed-off-by: Michael S. Tsirkin <> Signed-off-by: David S. Miller <>
2011-09-15  net: Make flow cache namespace-aware  dpward  -1/+4
flow_cache_lookup will return a cached object (or null pointer) that the resolver (i.e. xfrm_policy_lookup) previously found for another namespace using the same key/family/dir. Instead, make the namespace part of what identifies entries in the cache. As before, flow_entry_valid will return 0 for entries where the namespace has been deleted, and they will be removed from the cache the next time flow_cache_gc_task is run. Reported-by: Andrew Dickinson <> Signed-off-by: David Ward <> Signed-off-by: David S. Miller <>
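Making the namespace part of the cache identity, as the flow cache message above describes, amounts to adding one more field to the match. The sketch below is a userspace model with hypothetical types (`struct flow_key_model`, `struct flow_entry_model`) and a hypothetical `entry_matches()` helper; it does not reproduce the kernel's flow cache structures.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical lookup key: what the cache matched on before the change. */
struct flow_key_model {
    uint32_t key;
    uint16_t family;
    uint8_t dir;
};

/* Hypothetical cache entry: the owning namespace is stored with the key. */
struct flow_entry_model {
    const void *net;            /* identifies the network namespace */
    struct flow_key_model k;
    void *object;               /* cached resolver result */
};

/* An entry is a hit only if the namespace matches as well as the key. */
static bool entry_matches(const struct flow_entry_model *e,
                          const void *net,
                          const struct flow_key_model *k)
{
    return e->net == net &&
           e->k.key == k->key &&
           e->k.family == k->family &&
           e->k.dir == k->dir;
}
```

Without the `net` comparison, two namespaces using the same key, family and direction would share one cached object, which is the bug being reported.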
2011-09-15  net/can/af_can.c: Change del_timer to
On SMP platforms it is important to check whether the timer function is executing on another CPU when deleting the timer. Signed-off-by: Rajan Aggarwal <Rajan Aggarwal> Acked-by: Oliver Hartkopp <> Signed-off-by: David S. Miller <>
2011-09-15  tcp: Change possible SYN flooding messages  Eric Dumazet  -49/+33
"Possible SYN flooding on port xxxx " messages can fill logs on servers. Change logic to log the message only once per listener, and add two new SNMP counters to track:
  TCPReqQFullDoCookies: number of times a SYNCOOKIE was replied to client
  TCPReqQFullDrop: number of times a SYN request was dropped because syncookies were not enabled.
Based on a prior patch from Tom Herbert, and suggestions from David. Signed-off-by: Eric Dumazet <> CC: Tom Herbert <> Signed-off-by: David S. Miller <>
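The "log once per listener, count every event" logic from the SYN flooding message above could be modelled as follows. This is a rough userspace sketch, not the TCP code: `struct listener_model`, `struct snmp_model` and `syn_queue_full()` are hypothetical, and the exact log wording is an assumption.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-listener state: remembers whether we already warned. */
struct listener_model {
    unsigned short port;
    bool synflood_warned;
};

/* Hypothetical counters mirroring the two new SNMP counters. */
struct snmp_model {
    unsigned long req_q_full_do_cookies;  /* TCPReqQFullDoCookies */
    unsigned long req_q_full_drop;        /* TCPReqQFullDrop */
};

/*
 * Called when the request queue is full. Every event is counted, but the
 * message is printed at most once per listener.
 * Returns true if a message was logged by this call.
 */
static bool syn_queue_full(struct listener_model *l, struct snmp_model *m,
                           bool syncookies_enabled)
{
    if (syncookies_enabled)
        m->req_q_full_do_cookies++;
    else
        m->req_q_full_drop++;

    if (l->synflood_warned)
        return false;

    l->synflood_warned = true;
    fprintf(stderr, "Possible SYN flooding on port %u. %s.\n",
            (unsigned int)l->port,
            syncookies_enabled ? "Sending cookies" : "Dropping request");
    return true;
}
```

The counters keep the full picture available to monitoring tools, so the log line only has to announce that the condition occurred.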
2011-09-15  pkt_sched: cls_rsvp.h was outdated  Igor Maravić  -14/+13
File cls_rsvp.h in /net/sched was outdated. I'm sending you patch for this file. [ tb[] array should be indexed by X not X-1 -DaveM ] Signed-off-by: Igor Maravić <> Signed-off-by: David S. Miller <>
2011-09-15  Bluetooth: Fix timeout on scanning for the second time  Oliver Neukum  -9/+8
The checks for HCI_INQUIRY and HCI_MGMT were in the wrong order, so that second scans always failed. Signed-off-by: Oliver Neukum <> Signed-off-by: Gustavo F. Padovan <>