Branch :
| Commit | Date | Message | Author | CI |
|---|---|---|---|---|
| 49f05fab | 2025-02-24 09:40:01 | Refactor LRO turn off code. It's easier to turn off LRO via ioctl calls inside of several hardware and pseudo interfaces. Thus, we avoid manipulating internal data structures from the outside and avoid unnecessary reinitializations. Tested by bluhm@ OK bluhm@ | ||
| 45a54130 | 2025-02-21 22:21:20 | Move kassert from resolve to add case in rtrequest(). In case RTM_RESOLVE there is already an assertion about ifa_ifp != NULL. Move it down after the fallthrough to cover also RTM_ADD. This should give a better hint from syzkaller what is going wrong. Reported-by: syzbot+f77fe03091e5efd9aaf9@syzkaller.appspotmail.com OK claudio@ | ||
| 9a71ea36 | 2025-02-21 06:20:12 | replace "if (!task_del) taskq_barrier" with "taskq_del_barrier". as per src/sys/kern/kern_task.c r1.36, it's possible for a task to be re-added while it's currently running. in this situation the "if (!task_del)" skips the barrier but doesn't do anything about the currently running code, which taskq_del_barrier properly handles. | ||
| 4039bfa0 | 2025-02-17 20:31:25 | Handle RTF_GATEWAY route with rt_gwroute NULL. rtrequest_delete() calls rt_putgwroute() to set rt_gwroute to NULL. When another thread holds a reference to such a route, an assertion failed in rtisvalid() and rt_getll(). Handle this case, rt_getll() may return NULL then. OK claudio@ | ||
| 3411e7e2 | 2025-02-16 11:39:28 | Revert SMR protection of rt_gwroute. Using a smr_barrier() in rt_putgwroute() slows down adding routes. This is the hot path for BGP router. Syncing the FIB is now taking ages and the system is close to unresponsive in that time. found by claudio@ | ||
| 51442b8a | 2025-02-14 13:14:13 | add tunneldf support to sec(4) sec(4) is a very thin wrapper around the existing ipsec output processing for encapsulating packets, and inherited the behaviour that the DF flag was propagated from the encapsulated packet to the outer ip header. this means if the sec(4) interface has a large mtu and is carrying packets with DF set over a network that can't transport large(r) packets, these packets are effectively dropped. ipsec applied via the SPD copes with this by having SAs figure out the path mtu and using that when applying policy, but sec(4) is an interface, so the network stack uses the interface mtu rather than the associated SA path mtu. rfc4459 discusses this kind of problem and offers a variety of solutions. this implements one of the simpler options, which is to allow the tunnel endpoints to manage the DF regardless of the payload and reassemble the encapsulated packets. to actually do this, ipsec output packet processing has to be able to take an argument that says how you want DF to be handled. in the future we're going to look at how we can use the path mtu determined by the ipsec SA to try and implement one of the other solutions from the RFC, which is to signal the lower mtu to the sources of tunnelled packets. tested by and ok claudio@ | ||
| dddbedba | 2025-02-13 21:01:34 | Fix route entry race when accessing rt_gwroute. Kassert in rt_getll() was triggered as rt_gwroute could be NULL. Problem was introduced by shared netlock around tcp_timer_rexmt(). PMTU discovery calls rtrequest_delete() which was missing proper locking around rt_gwroute. As rt_getll() is called by ARP and ND6 resolve in the hot path, use SMR to provide the pointer to rt_gwroute lockless. Reference count of the returned route is incremented, caller has to free it. Modifying rt_gwroute or rt_cachecnt in rt_putgwroute() is protected by per route lock. OK mvs@ | ||
| 910ed27a | 2025-02-05 18:29:17 | Limit net.bpf.maxbufsize sysctl(8) to a value that malloc(9) can handle. Introduce MALLOC_MAX definition to keep this value in sync and use it system wide. Reported-by: syzbot+3b7e5274349f7165bf5f@syzkaller.appspotmail.com ok claudio bluhm | ||
| e7387209 | 2025-02-03 09:44:30 | The previous commit missed the release of the reference counter of the pipex session. Also check the source address on PPPoE. ok mvs | ||
| bec1a366 | 2025-02-03 08:58:52 | Limit RX queue of loopback interfaces with 8192 packets. Unlimited queues allow to reach mbufs limit and make network unusable on some architectures. Based on diff proposed by dlg@, but limits only loopback interfaces. Tested by bluhm, additional arm64 tests by kirill. ok bluhm | ||
| 9915416f | 2025-02-01 21:10:02 | Fix pf fragment hole count. Fragment reassembly finishes when no holes are left in the fragment queue. In certain overlap conditions, the hole counter was wrong and pf(4) created an incomplete IP packet. Before adjusting the length, remove the overlapping fragment from the queue and insert it again afterwards. pf_frent_remove() and pf_frent_insert() adjust the hole counter automatically. bug reported and fix tested by Lucas Aubard with Johan Mazel, Gilles Guette and Pierre Chifflier; OK claudio@ | ||
| 1d90c3fb | 2025-01-30 14:40:50 | Get rid of unused `so' argument in sbspace(). No functional changes. ok bluhm | ||
| f08653c5 | 2025-01-25 14:51:34 | wg(4) logging enhancement. * Updated wg(4) debug logging to use log(9) instead of printf(9) * Logging now includes IP addresses of remote endpoints From Lloyd <ng2d68 at proton dot me> ok sthen kirill | ||
| d1df5f10 | 2025-01-25 10:53:36 | Fix if_getgrouplist() mistype made in previous commit. Found and reported by anton@ | ||
| f168c03c | 2025-01-25 02:06:40 | Check the source address for the tunneled packets. ok mvs | ||
| eb64c487 | 2025-01-24 09:19:07 | Move interface groups copyout(9)s out of netlock within ifioctl_get(). The interface groups use a complicated linking scheme with special data structures allowing to link multiple interfaces with multiple groups. We can't use iterators here, because some paths are netlock covered and we can't sleep in refcnt_finalize(9). We also can't use double locking to protect this linking data because this new lock will cover a very wide area in the kernel. Link desired interface groups or interfaces from group into temporary lists, protected by new dedicated `if_tmplist_lock' rwlock(9). Bump the reference counter to make the concurrent destruction thread wait until the temporary linked data becomes unused. Delivered data are immutable, so netlock is required only while filling temporary lists. ok bluhm | ||
| 53d74c0c | 2025-01-24 09:16:55 | Move copyout(9) out of netlock within sysctl_source(). Netlock required only to store data to local variable, the rest could be done lockless. Use union of sockaddr_in and sockaddr_in6 as temporary buffer. ok bluhm | ||
| f6224b7e | 2025-01-21 17:40:57 | Copy if_data stuff to ifnet descriptor. 'if_data' structure contains interface data like type or baudrate mixed with statistics counters. Each interface descriptor 'ifnet' contains an instance of 'if_data' for delivery to userland on request. It is not clear which lock protects this data. Some old drivers rely on kernel lock, some rely on net lock, ifioctl() relies on both. Moreover, a network interface could have per-cpu counters, but in such a case both `if_data' counters and per-cpu counters will be used by some pseudo drivers like bridge(4). Copy 'if_data' stuff into 'ifnet' descriptor to separate interface counters from the rest of the data. This separation allows consistent locking to be introduced for such data. Note, non per-cpu counters are represented as an array and accessed in the per-cpu counters style to unify future usage paths. ok bluhm | ||
| 129cf7bd | 2025-01-19 03:27:27 | make BIOCSWTIMEOUT work with kq events. makes sense jmatthew@ | ||
| 8b2d8634 | 2025-01-16 17:20:23 | Move some copyout()s within ifioctl_get() out of shared netlock. UVM releases exclusive netlock while going to swap, but it can't determine whether shared netlock is held. ifioctl_get() does read-only access, so it could follow sogetopt() way. The copyout()s under shared netlock are kept for ifgroup stuff, I will fix this separately. ok bluhm | ||
| a4cc1f24 | 2025-01-15 06:15:44 | let pppoe data packets go through if_vinput instead of the pppoeinq. provide pppoe_vinput() for ether input to call. if the packet is for data in an established pppoe session, it can push it straight into the stack with if_vinput. otherwise pppoe_vinput returns the packet to ether_input, which can queue it for processing with the existing code. this should improve throughput and reduce jitter for pppoe input, and there's some evidence that it reduces packet loss. tested by maurice janssen and myself. | ||
| 0e5b9f78 | 2025-01-09 18:20:29 | Replace bcopy() with memcpy() in route_peeraddr(). from dhill@; OK mvs@ | ||
| 91b74865 | 2025-01-07 05:36:52 | Delete ether_frm_control() which just returned EOPNOTSUPP: pru_control() does that automatically when pr_usrreqs.pru_control is NULL and there are no current plans to add ioctls() on this. ok dlg@ | ||
| 476a4da7 | 2025-01-05 12:36:48 | Retire PR_MPSOCKET flag. TCP socket layer is MP safe for more than a week now. That means all protocols with pr_usrreqs have the PR_MPSOCKET flag. Remove PR_MPSOCKET and use the logic that was used when set. OK mvs@ | ||
| 84d9c64a | 2025-01-03 21:27:40 | Use atomic operations to modify the MTU of route. When unlocking TCP, path MTU discovery will run in parallel. To keep route MTU consistent, make access to rt_mtu atomic. Use compare-and-swap function to detect whether another thread is modifying the MTU field. In this case skip updating rt_mtu. OK mvs@ | ||
| b9ae17a0 | 2024-12-30 02:46:00 | All the device and file type ioctl routines just ignore FIONBIO, so stop calling down into those layer from fcntl(F_SETFL) or ioctl(FIONBIO) and delete the "do nothing for this" stubs in all the *ioctl routines. ok dlg@ | ||
| 7c6c9ed7 | 2024-12-27 10:15:09 | Unlock ah_sysctl() and ipcomp_sysctl(). Both are atomically accessed `ah_enable' and `ipcomp_enable' booleans and per-CPU counters based statistics. esp_sysctl() is much more system wide, so unlock it separately. ok bluhm | ||
| 535d4cde | 2024-12-26 10:15:27 | Make access to tcp_mssdflt atomic. To further unlock TCP sysctl, we need atomic access to tcp_mssdflt. pf(4) is reading the value multiple times. Better read it once and pass mssdflt down the call stack. There was a potential integer underflow in pf_calc_mss(). Use the signed variants imax(9) and imin(9), as was done in the TCP stack. OK mvs@ | ||
| e541a7ae | 2024-12-18 02:25:30 | go back to r1.326, before i fiddled with packet generation and bpf. i've had a couple of reports of redundant firewalls misbehaving since these changes, so until i can figure out what's wrong i'm backing them out. reported by hrvoje popovski and mark patruck | ||
| 2fbde403 | 2024-12-18 01:56:05 | let LLDP packets fall through to being handled on the port interfaces. 802.1ax says that LLDP packets sent to the multicast groups listed in 802.1ab (the lldp spec) should be treated as "control frames" so they can be processed by an lldp agent on physical interface. in our situation that means we shouldn't aggregate LLDP packets so they appear to enter the system on aggr(4) interfaces, we should let the physical port interfaces handle them. this will allow AF_FRAME sockets listening on aggr port interfaces receive lldp packets. jmatthew@ says it looks good. | ||
| 6fb93e47 | 2024-12-15 11:00:05 | add an AF_FRAME socket domain and an IFT_ETHER protocol family under it. this allows userland to use sockets to send and receive Ethernet frames. as per the upcoming frame.4 man page: frame protocol family sockets are designed as an alternative to bpf(4) for handling low data and packet rate communication protocols. Rather than filtering every frame entering the system before the network stack like bpf(4), the frame protocol family processing avoids this overhead by running after the built in protocol handlers in the kernel. For this reason, it is not possible to handle IPv4 or IPv6 packets with frame protocol sockets because the kernel network stack consumes them before the receive handling for frame sockets is run. if you've used udp sockets then these should feel much the same. my main motivation is to implement an lldp agent in userland, but without having to have bpf look at every packet when lldp happens every minute or two. the only feedback i had was positive, so i'm putting it in ok claudio@ | ||
| 09380e00 | 2024-12-11 04:22:41 | get rid of code for an extra DLT_LOOP bpf attachment. pfsync doesn't know the source address in IP packets before it calls ip_output, so the extra bpf attachment has a distorted view of what IP packets are being sent anyway. you can tcpdump on the pfsync syncdev if you want to see what will be on the wire. | ||
| 780061c3 | 2024-12-11 04:18:52 | fix pfsync_encap to cope with pfsync_sendout changes. problem noticed by hrvoje popovski | ||
| b9b60940 | 2024-12-04 18:20:46 | Unlock gre_sysctl(). Both `gre_allow' and `gre_wccp' are atomically accessed integers. They could have only '0' and '1' values, so no extra dances around atomic_load_int(9) required. ok bluhm | ||
| c6b373c6 | 2024-11-26 10:42:58 | let bpf pick the first attached dlt when attaching to an interface. this is instead of picking the lowest numbered dlt, which was done to make bpf more predictable with interfaces that attached multiple DLTs. i think the real problem was that bpf would keep the list in the reverse order of attachment and would prefer the last dlt. interfaces that attach multiple DLTs attach ethernet first, which is what you want the majority of the time anyway. but letting bpf pick the first one means drivers can control which dlt they want to default to, regardless of the numeric id behind a dlt. ok claudio@ | ||
| a4f86d2e | 2024-11-20 02:18:45 | provide ifq_deq_set_oactive. ifq_deq_set_oactive is a variation on ifq_set_oactive that can be called inside an if_deq_begin "transaction". afresh@ found de(4) was calling ifq_set_oactive while holding the ifq mutex via ifq_deq_begin, which led to a panic because ifq_set_oactive also tries to take the ifq mutex. ifq_deq_set_oactive assumes the caller is already holding the mutex. de(4) is confusing, so it seemed simpler to add a small tweak to ifqs than try and do major surgery on such a hairy driver. tested by afresh@ | ||
| 1511e544 | 2024-11-19 23:26:35 | use a tailq for the global list of bpf_if structs. this replaces a hand rolled list that's been here since 1.1. ok claudio@ kn@ tb@ | ||
| ddee6534 | 2024-11-19 02:11:03 | fix tcpdump on pfsync interfaces. after the last rewrite i was showing bpf ip packets, not the pfsync payload like the PFSYNC DLT expected. this also lets bpf see packets being processed by pfsync input handling, so if you want to see only what's being sent you'll need to filter by direction. reported by Marc Boisis | ||
| 2e7772af | 2024-11-17 23:31:01 | bump the "mru" up to MAXMCLBYTES. there's no reason to limit tun/tap to small packets. ok claudio@ | ||
| 05ecc67f | 2024-11-17 23:21:45 | include tun_hdr in the length reported by FIONREAD and kq if it's enabled. | ||
| a921796a | 2024-11-17 12:21:48 | make sure bpfsdetach is holding a bpf_d ref when invalidating stuff. when bpfsdetach is called by an interface being destroyed, it iterates over the bpf descriptors using the interface and calls vdevgone and klist_invalidate against them. however, i'm not sure the reference the interface holds against the bpf_d is accounted for properly, so vdevgone might drop it to 0 and free it, which makes the klist_invalidate a use after free. avoid this by taking a bpf_d ref before calling vdevgone and klist_invalidate so the memory can't be freed out from under the feet of bpfsdetach. Reported-by: syzbot+b3927f8ad162452a2f39@syzkaller.appspotmail.com i wasn't able to reproduce whatever syzkaller did. it's possible this is a double free, but we'll wait and see if it pops up again. ok mpi@ | ||
| 1c386161 | 2024-11-17 00:25:07 | provide network offloads between the kernel and userland again. userland can request that network packets that are read from or written to the device special file get prepended with a "tun_hdr" struct. this struct contains bits which say what offloads are requested for the packet, including things like ip/tcp/udp/icmp checksums, tcp segmentation offloads, or ethernet vlan tags. userland can write a packet with any of these offloads requested into the kernel at any time, but has to request which ones it's able to handle coming from the kernel. enabling the tun_hdr struct and which offloads userland can handle is done with a new TUNSCAP ioctl. this is based on the virtio_net_hdr in linux, which jan@ actually implemented and had working with vmd. however, claudio@ and i strongly opposed what feels like a layer violation by pulling virtio structures into the tun driver, and then trying to emulate virtio/linux semantics in our network stack, and playing catch up when the "upstream" projects decide to change the shape or meaning of these bits. tun_hdr is specific to the openbsd network stack and its semantics, which simplifies our kernel implementation. jan has been pretty gracious about the extra work on the vmd side of things. tested by and ok jan@ ok claudio@ sthen@ backed this out cos of confusion with the ioctl numbers i picked for controlling this feature. i've picked new numbers that don't conflict this time. | ||
| d43f9610 | 2024-11-14 13:47:38 | revert tun(4) changes for now, breaks in kdump build (TUNSCAP/TIOCEXT clash) tb@ agrees | ||
| 67a47b08 | 2024-11-14 01:51:57 | provide a way to negotiate network offloads between the kernel and userland. userland can request that network packets that are read from or written to the device special file get prepended with a "tun_hdr" struct. this struct contains bits which say what offloads are requested for the packet, including things like ip/tcp/udp/icmp checksums, tcp segmentation offloads, or ethernet vlan tags. userland can write a packet with any of these offloads requested into the kernel at any time, but has to request which ones it's able to handle coming from the kernel. enabling the tun_hdr struct and which offloads userland can handle is done with a new TUNSCAP ioctl. this is based on the virtio_net_hdr in linux, which jan@ actually implemented and had working with vmd. however, claudio@ and i strongly opposed what feels like a layer violation by pulling virtio structures into the tun driver, and then trying to emulate virtio/linux semantics in our network stack, and playing catch up when the "upstream" projects decide to change the shape or meaning of these bits. tun_hdr is specific to the openbsd network stack and its semantics, which simplifies our kernel implementation. jan has been pretty gracious about the extra work on the vmd side of things. tested by and ok jan@ ok claudio@ | ||
| 42a2f8b7 | 2024-11-12 04:14:51 | bump the type used to specify traffic queue bandwidth to 64bit. this should let people specify interface and queue bandwidths greater than ~4Gbit. this changes the pf ioctls used to specify queues, so if you want to try this you'll need a new kernel, new headers, and a new pfctl (and systat). or upgrade using a snapshot. the effort and benefit of providing compat isn't worth it. putting it in now so people can kick it around. | ||
| 9618b9b7 | 2024-11-09 04:09:56 | remove unused ifq_is_serialized() missed when the prototype was removed in ifq.h rev 1.25 ok dlg@ | ||
| 0d48c46c | 2024-11-08 13:22:09 | pf(4) when doing af-to translation for ICMP protocol sends packets with TTL field to zero. To fix it function pf_test_state_icmp() must initialize ttl field in pf_pdesc structure for inner packet. feedback from bluhm@ OK bluhm@ | ||
| 67d75188 | 2024-11-04 00:13:15 | remove unused inline function; ok dlg@ | ||
| f5210b0e | 2024-11-01 02:07:14 | remove unused local variable | ||
| c0619985 | 2024-10-31 12:33:11 | Rewrite mbuf handling in wg(4). . Use m_align() to ensure that mbufs are packed towards the end so that additional headers don't require costly m_prepends. . Stop using m_copyback(), the way it was used there was actually wrong, instead just use memcpy since this is just a single mbuf. . Kill all usage of m_calchdrlen(), again this is not needed or can simply be m->m_pkthdr.len = m->m_len since all this code uses a single buffer. . In wg_encap() remove the min() with t->t_mtu when calculating plaintext_len and out_len. The code does not correctly cope with this min() at all with severe consequences. Initial diff by dhill@ who found the m_prepend() issue. Tested by various people. OK dhill@ mvs@ bluhm@ sthen@ | ||
| da76ba4d | 2024-10-31 11:41:31 | Drop forgotten backslashes within vxlan_input(). Seems they are stalled from macro copy-paste. No functional changes. ok mpi dlg | ||
| 5641e477 | 2024-10-29 23:57:54 | move hfsc to using nanoseconds for keeping times. before it was using 256000000 things per second, so this isn't a huge change, but it can use nsecuptime() to get the time. kjc and cheloa like it ok claudio@ | ||
| 5873c738 | 2024-10-29 23:25:45 | use nsecuptime instead of using nanouptime and doing a bunch of maths. ok claudio@ | ||
| 7bbcf947 | 2024-10-22 22:05:17 | correct argument to klist_free(); ok visa@ mvs@ | ||
| defbe25c | 2024-10-17 05:02:12 | remove unneeded if_wg.h and pfsync.h includes | ||
| 21537d41 | 2024-10-16 11:12:31 | cut tun_init() out, it does pointless work. tun_init turns interface/stack config into a set of flags that tun(4) keeps in tun_softc sc_flags, but never uses. ok miod@ kn@ | ||
| cca0aa06 | 2024-10-16 11:03:55 | remove SIOCSIFDSTADDR from the network ioctls. netintro says it's deprecated, and most of our other drivers are doing fine without it. ok miod@ kn@ patrick@ | ||
| ff46e7d6 | 2024-10-15 00:41:40 | remove struct arpreq from net/if_arp.h unused since "rewrite to merge arp and routing tables" in CSRG if_ether.c 7.14 (Berkeley) 06/25/91 used by SIOCSARP, SIOCGARP, SIOCDARP, OSIOCGARP ioctls in Net/2 which were removed before 4.4BSD-Lite ok sthen@ who tested this with a ports build | ||
| c5093773 | 2024-10-13 00:53:21 | remove unneeded limits.h and errno.h includes | ||
| 8a978b4c | 2024-10-12 23:31:14 | remove unneeded rwlock.h include | ||
| e6796ada | 2024-10-12 23:18:10 | remove unneeded time.h include | ||
| 2cc786cd | 2024-10-12 23:10:07 | remove unneeded percpu.h include | ||
| 2d7d7ba6 | 2024-10-10 06:50:58 | neuter the tun/tap ioctls that try and modify interface flags. historically there was just tun(4) that supported both layer 3 p2p and ethernet modes, but had to be reconfigured at runtime by userland to properly change the interface type and interface flags. this is obviously not a great idea, mostly because a lot of stack behaviour around address management makes assumptions based on these parameters, and changing them at runtime confuses things. splitting tun so ethernet was handled by a specific tap(4) driver was a first step at locking this down. this takes a further step by restricting userland's ability to reconfigure the interface flags, specifically IFF_BROADCAST, IFF_MULTICAST, and IFF_POINTOPOINT. this change lets userland pass those values via the ioctls, but only if they match the current set of flags on the interface. these flags are set appropriately for the type of interface when it's created, but should not be changed afterwards. nothing in base uses these ioctls, so the only fall out will be from ports doing weird things. ok claudio@ kn@ | ||
| b985d824 | 2024-09-27 00:38:49 | Previous pipex.c,v 1.155 was broken if the client was not behind a NAT. ok mvs | ||
| 479c151d | 2024-09-20 02:00:46 | remove unneeded semicolons; checked by millert@ | ||
| efb6c398 | 2024-09-09 07:37:47 | Don't take netlock while setting `if_description'. net/if_pppx.c is the only place where `if_description' accessed outside ifioctl() path and there is no reason to take netlock here. SIOCSIFDESCR case of ifioctl() modifies `if_description' with the only kernel lock. ok bluhm | ||
| 845086ff | 2024-09-07 22:41:55 | fix RBT_ENTRY in pf_state and pf_state_key ok sashan@ | ||
| 9593dc34 | 2024-09-04 07:54:51 | Fix some spelling. Input and ok jmc@, jsg@ | ||
| 54fbbda3 | 2024-09-01 03:08:56 | spelling; checked by jmc@, ok miod@ mglocker@ krw@ | ||
| 45a062b0 | 2024-08-31 04:17:14 | add rport(4) for p2p l3 connectivity between route domains. you can basically plug rdomains together and route between them over rport interfaces. people keep asking me if this is so you can leak routes between rdomains, and the answer is yes. this is like pair(4) but cheaper because it avoids all the mucking around with putting an ethernet header on the mbuf just to take it off again later, and is more efficient with address space because it's a p2p ip interface. it has a small tweak from mvs@ ok denis@ claudio@ | ||
| 938c962f | 2024-08-27 13:52:41 | remove some dead code that wasn't cleaned up ok sashan | ||
| 02e922b0 | 2024-08-20 07:47:25 | Unlock etherip_sysctl(). - ETHERIPCTL_ALLOW - atomically accessed integer; - ETHERIPCTL_STATS - per-CPU counters ok bluhm | ||
| 895cab01 | 2024-08-17 09:52:11 | Allow PPP interface to run in an rdomain and get a default route installed in the same routing domain Input and OK claudio@ | ||
| e88074f0 | 2024-08-15 12:20:20 | add BIOCSETFNR, which is like BIOCSETF but doesn't reset the buffer or stats. from Matthew Luckie <mjl@luckie.org.nz> via tech@ deraadt@ likes it. | ||
| 34cc435a | 2024-08-12 17:02:58 | Prepare bpf_sysctl() for upcoming net_sysctl() unlocking. Both NET_BPF_MAXBUFSIZE and NET_BPF_BUFSIZE (`bpf_maxbufsize' and `bpf_bufsize' respectively) are atomically accessed integers. No locks required to modify them. ok bluhm | ||
| c80589b8 | 2024-08-06 16:56:09 | Unlock sysctl net.inet.ip.directed-broadcast. ip_directedbcast is read once in either ip_input() or pf_test() during packet processing. So writing the variable does not need net lock. OK mvs@ | ||
| 2293e682 | 2024-08-05 23:56:10 | restrict the maximum wait time you can set via BIOCSWTIMEOUT to 5 minutes. this avoids passing excessively large values to timeout_add_nsec. Reported-by: syzbot+f650785d4f2b3fe28284@syzkaller.appspotmail.com | ||
| 270a6ceb | 2024-08-05 17:47:29 | Fix bridging IPv6 fragments with pf reassembly. Sending IPv6 fragments over a bridge with pf did not work. During input pf reassembles the packet, and at bridge output it should be refragmented. This is only done for PF_FWD direction, but bridge(4) and veb(4) called pf_test() with PF_OUT argument. OK sashan@ | ||
| 5e1af158 | 2024-07-30 13:41:15 | Exports the statistics when PIPEXDSESSION. Found by ymatsui at iij. ok mvs | ||
| 8fd4ba52 | 2024-07-26 15:51:09 | Mark ipsecflowinfo immutable. ok mvs | ||
| 207bd73a | 2024-07-26 15:45:31 | In pipex_l2tp_input(), check if ipsecflowinfo is not changed instead of updating it blindly. ok mvs | ||
| 26723e1a | 2024-07-23 20:04:51 | Accept and ignore SADB_X_EXT_REPLAY and SADB_X_EXT_COUNTER payloads for incoming SADB_ADD and SADB_UPDATE message. Since we send them as part of the SADB_GET reply we must also accept them on SADB_ADD/UPDATE as sasyncd will forward payloads previously received in SADB_GET. Fixes a bug where sasyncd can't restore SAs because pfkey returns EINVAL. From Rafał Ramocki ok bluhm@ | ||
| 862c3389 | 2024-07-18 14:46:28 | In pfattach() pass malloc type instead of flags to cpumem_malloc(). from markus@ | ||
| 1f9e444e | 2024-07-14 18:53:39 | Unlock IPv6 sysctl net.inet6.ip6.forwarding from net lock. Use atomic operations to read ip6_forwarding while processing packets in the network stack. To make clear where actually the router property is needed, use the i_am_router variable based on ip6_forwarding. It already existed in nd6_nbr. Move i_am_router setting up the call stack until all users are independent. The forwarding decisions in pf_test, pf_refragment6, ip6_input do also not interfere. Use a new array ipv6ctl_vars_unlocked to make transition of all the integer sysctls easier. Adapt IPv4 to the new style. OK mvs@ | ||
| ac42138b | 2024-07-12 17:20:18 | Switch `so_snd' of udp(4) sockets to the new locking scheme. udp_send() and following udp{,6}_output() do not append packets to `so_snd' socket buffer. This means the sosend() and sosplice() sending paths are dummy pru_send() and there are no problems running them simultaneously on the same socket. Push shared solock() deep down to sosend() and take it only around pru_send(), but keep somove() running under exclusive solock(). Since sosend() doesn't modify `so_snd' the unlocked `so_snd' space checks within somove() are safe. Corresponding `sb_state' and `sb_flags' modifications are protected by `sb_mtx' mutex(9). Tested and OK bluhm. | ||
| 2c9f5b1f | 2024-07-12 09:25:27 | Run sysctl net.inet.ip.forwarding without net lock. The places in packet processing where ip_forwarding is evaluated have been consolidated. The remaining pieces in pf test, ip input, and icmp input do not need consistent information. If the integer value is changed by another CPU, it is harmless. The sysctl syscall sets the value atomically, so add atomic read in network processing and remove the net lock in sysctl IPCTL_FORWARDING. OK claudio@ mvs@ | ||
| 28c60e63 | 2024-07-04 12:50:08 | Implement IPv6 forwarding IPsec only. IPsec gateways set the forwarding sysctl to 2. While this worked for IPv4 since a long time, adapt this feature for IPv6 now. Set sysctl net.inet6.ip6.forwarding=2 to forward only packets that have been processed by IPsec. Set IPV6_FORWARDING_IPSEC in ip6_input() and pass the flag down to the call stack. This provides consistent view on global variable ip6_forwarding. In ip6_output() or ip6_forward() drop packets that do not match the policy. OK denis@ | ||
| 0e25137a | 2024-07-02 18:33:47 | Read IPsec forwarding information once. Fix MP race between reading ip_forwarding in ip_input() and checking ip_forwarding == 2 in ip_output(). In theory ip_forwarding could be 2 during ip_input() and later 0 in ip_output(). Then a packet would be forwarded that was never allowed. Currently exclusive netlock in sysctl(2) prevents all races. Introduce IP_FORWARDING_IPSEC and pass it with the flags parameter that was introduced for IP_FORWARDING. Instead of calling m_tag_find(), traversing the list, and comparing with NULL, just check the PACKET_TAG_IPSEC_IN_DONE bit. Reading ipsec_in_use in ip_output() is a performance hack that is not necessary. New code only checks tree bits. OK mvs@ | ||
| da5607f6 | 2024-06-26 01:40:49 | return type on a dedicated line when declaring functions ok mglocker@ | ||
| e7c2d835 | 2024-06-22 10:22:29 | remove space between function names and argument list | ||
| e9494cab | 2024-06-21 12:51:29 | My earlier commit [1.1169 of pf.c (2023/01/05)] makes pf(4) report the wrong rule and anchor number when a packet matches a rule found at anchor depth 2 or more. The issue has been noticed and reported by Giannis Kapetanakis (billias _at_ edu.physics.uoc.gr), who also co-developed and tested the final fix presented in this commit. To fix the issue pf(4) must also remember the anchor where the matching rule belongs while rules are traversed to find a match for a given packet. The information on the anchor is now kept in the anchor stack frame. OK sthen@ | ||
| ab457133 | 2024-06-20 19:25:42 | Read IPv6 forwarding value only once while processing a packet. IPv4 uses IP_FORWARDING to pass down a consistent value of net.inet.ip.forwarding down the stack. This is needed for unlocking sysctl. Do the same for IPv6. Read ip6_forwarding once in ip6_input_if() and pass down IPV6_FORWARDING as flags to ip6_ours(), ip6_hbhchcheck(), ip6_forward(). Replace the srcrt value with IPV6_REDIRECT flag for consistency with IPv4. To have common syntax with IPv4, use ip6_forwarding == 0 checks instead of !ip6_forwarding. This will also make it easier to implement net.inet6.ip6.forwarding=2 for IPsec only forwarding later. In nd6_ns_input() and nd6_na_input() read ip6_forwarding once and store it in i_am_router. The variable name has been chosen to avoid confusion with is_router, which indicates router flag of the packet. Reading of ip6_forwarding is done independently from ip6_input_if(), consistency does not really matter. One is for ND router behavior the other for forwarding. Again use the ip6_forwarding != 0 check, so when ip6_forwarding IPsec only value 2 gets implemented, it will behave like a router. OK deraadt@ sashan@ florian@ claudio@ | ||
| dfc54264 | 2024-06-14 08:32:22 | Switch AF_ROUTE sockets to the new locking scheme. At sockets layer only mark buffers as SB_MTXLOCK. At PCB layer only protect `so_rcv' with corresponding `sb_mtx' mutex(9). SS_ISCONNECTED and SS_CANTRCVMORE bits are redundant for AF_ROUTE sockets. Since SS_CANTRCVMORE modifications performed with both solock() and `sb_mtx' held, the 'unlocked' SS_CANTRCVMORE check in rtm_senddesync() is safe. ok bluhm | ||
| 1a9be797 | 2024-06-09 16:25:27 | Introduce IFCAP_VLAN_HWOFFLOAD for vio(4). Add IFCAP_VLAN_HWOFFLOAD to signal hardware like vio(4) can handle checksum or TSO offloading with inline VLAN tags. tested by Mark Patruck, sf@ and bluhm@ ok sf@ and bluhm@ | ||
| 84b2c343 | 2024-06-07 18:24:16 | Read IP forwarding variables only once. Do not assume that ip_forwarding and ip_directedbcast cannot change while processing one packet. Read it once and pass down its value with a flag. This is necessary for unlocking the sysctl path. There are a few places where a consistent value does not really matter, they are unchanged. Use a proper ip_ prefix for the global variable. OK claudio@ | ||
| baaf996c | 2024-06-07 13:43:21 | remove ph_ppp_proto define, unused since rev 1.123 | ||
| 528abb89 | 2024-05-29 00:48:14 | remove prototypes with no matching function | ||
| 6064c65c | 2024-05-24 06:38:41 | pfsync must let to progress state for destination peer The issue has been noticed by matthieu@ when he was chasing cause of excessive pfsync traffic between firewall boxes. When comparing content of state tables between primary and backup firewall the backup firewall showed many states as follows: ESTABLISHED:SYN_SENT FIN_WAIT_2:SYN_SENT * :SYN_SENT this is caused by pfsync_upd_tcp() which fails to update TCP-state for destination connection peer, so it remains stuck in SYN_SENT. matthieu@ confirms diff helps with 'stuck-state'. It also seems to help with excessive pfsync traffic. ok @dlg | ||
| 150eb84f | 2024-05-17 19:02:04 | Switch AF_KEY sockets to the new locking scheme. The simplest case. Nothing to change in sockets layer, only set SB_MTXLOCK on socket buffers. ok bluhm | ||
| 1adf4b76 | 2024-05-17 18:58:26 | Fix uninitialized memory access in pfkeyv2_sysctl(). pfkeyv2_sysctl() reads the SA type from uninitialized memory if it is not provided by the caller of sysctl(2) because of a missing length check. From Carsten Beckmann. ok bluhm |
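
Commit 84d9c64a above describes updating a route's MTU with compare-and-swap and skipping the update when another thread has changed the field in the meantime. The following is a minimal userland sketch of that pattern using C11 atomics. It is not the kernel code: `struct route_sketch` and `route_sketch_set_mtu()` are hypothetical names made up for illustration, and the kernel uses its own atomic primitives rather than `<stdatomic.h>`.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a route entry; only the MTU field matters here. */
struct route_sketch {
	_Atomic unsigned int mtu;
};

/*
 * Try to replace the MTU value the caller observed earlier ("seen")
 * with a new one. Returns true if the swap happened. Returns false if
 * the field no longer holds "seen", meaning someone else modified it
 * first; in that case the field is left untouched and the update is
 * skipped.
 */
bool
route_sketch_set_mtu(struct route_sketch *rt, unsigned int seen,
    unsigned int mtu)
{
	return atomic_compare_exchange_strong(&rt->mtu, &seen, mtu);
}
```

A caller would first snapshot the field with `atomic_load(&rt->mtu)`, compute the new MTU from that snapshot, and then call `route_sketch_set_mtu()` with both values. A `false` return is not an error here; it only means a concurrent update won.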