Clamp TCP MSS on MikroTik: what is it?



Edge Router & BNG Optimisation Guide for ISPs

Last updated on 23 February 2023

It would be appreciated if you could help me continue to provide valuable network engineering content by supporting my non-profit solitary efforts. Your donation will help me conduct valuable experiments. Click here to donate now.

Introduction

This guide provides configuration instructions for MikroTik RouterOS, but the principles can be applied to other Network Operating Systems (NOSes) as well. The guide will be updated regularly as new technologies, use cases, and more efficient configurations are discovered.

Many ISPs around the globe use MikroTik RouterOS to provide access to their customers via BNGs over PPPoE and for various other roles such as edge routers. In this guide, I will explore common issues and solutions along with best practices.

This guide is also available on the APNIC Blog, but it is not frequently updated there. I recommend you follow the source here for the most up-to-date information.

A brief history of this project

  • The configuration was first tested and deployed on AS135756 (small-sized ISP) with its proprietor and my peer Mr. Varun Singhania.
  • In 2021-22, I tested the configuration further as a downstream customer on AS132559 (IP Transit provider & medium-sized ISP), where I was able to assess the impact and config changes both as an end-user and a consultant.
  • As of 2022-23 I tested the configuration on my own network (AS149794), including the firewall rules, to ensure it would work in any environment as long as the instructions are followed. The tests confirmed that the configuration does not disrupt layer 4 protocols or cause problems for end-users in the last mile.

A few things to keep in mind

  • RouterOS is based on the Linux Kernel. As of RouterOS v7.7 it still uses legacy iptables for packet filtering instead of nftables, which has a negative impact on performance.
  • The guide will be focused on RouterOS v7 as it is the current version of RouterOS.
  • This guide assumes the reader has a basic understanding of typical use cases and technologies/protocols used in an ISP/Telco production environment.
  • This guide focuses on layer 2-4 configuration (and occasionally up to layer 7) by following various RFCs and BCOPs. It is not a network architecture guide, for which Kevin Myers’s guide is recommended.
  • Most of this article (virtually everything) has been tested on RouterOS v7.6 (stable, with 7.6 RouterBOARD firmware).

Basic Router Terminology and overview

  • An edge or border router is an inter-AS router that is used for connecting different networks, such as transit, IXP, or PNIs.
    • It is important to keep an edge router stateless i.e. without connection tracking (stateful firewall filter rules or NAT), to avoid performance issues and vulnerability to DDoS attacks.
    • Do not use an edge router for customer delegation, as it will become stateful.
    • Do not confuse an edge router with a Provider Edge router, which is an MPLS-specific terminology.
    • However, some people may incorrectly refer to an edge router as a core router due to linguistic, cultural reasons, or misinformation.

    General Configuration Changes

    Below are the general guidelines that should be applied on all MikroTik devices for optimal performance and security.

    • Upgrade RouterOS and the RouterBOARD firmware to the latest stable (or long-term, if available) v7 releases. Use this command to enable firmware auto-upgrade: "/system routerboard settings set auto-upgrade=yes". Remember to reboot the router twice after a RouterOS upgrade to ensure the firmware gets automatically upgraded.
    • Implement basic security measures, including reverse path filtering (rp-filter) and TCP SYN cookies, both of which are found in IP>Settings. See the command sketch after this list.
      • For rp-filter, use loose mode when a device sits behind asymmetric routing (or when in doubt); use strict mode when a device sits behind symmetric routing.
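
      A minimal sketch of these commands (a hedged example; rp-filter is shown as loose here, adjust to strict where your routing is symmetric):
      /system routerboard settings set auto-upgrade=yes
      /ip settings set rp-filter=loose tcp-syncookies=yes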

      IPv6 Router Advertisements (RAs) are used for SLAAC. In MikroTik they are configured under Neighbor Discovery (ND), which is a bit confusing, as ND is an umbrella term encompassing various protocols and behaviours, not only RAs.

      IPv6 RA (ND) is enabled by default on all interfaces in RouterOS. It should be disabled to prevent RAs from being sent out of interfaces on which you do not use SLAAC, for security reasons (such as preventing someone from obtaining an IPv6 address simply by connecting a host to a particular port or VLAN), and to reduce unnecessary BUM traffic in your network. We disable it using this command:
      /ipv6 nd set [ find default=yes ] disabled=yes

      You can enable IPv6 RA on a per-interface basis as and when required; i.e. if you set "advertise=yes" for an interface via IPv6>Address, then you need to configure RA/ND for that interface, as in the example below:
      /ipv6 nd add interface=Management_VLAN

      Interface Lists

      Interface lists help us simplify firewall rule management by enabling us to refer to an entire list in a single rule instead of multiple rules for every interface.

      An interface list should only contain layer 3 (L3) interfaces, i.e. interfaces with IP addressing attached, such as a physical port, an L3 sub-interface VLAN, an L3 bonding interface or a GRE interface.

      The following are basic guidelines for which lists to create and what should be included on those lists:

      • "WAN" interface list should contain the interfaces used for connecting to transit, PNI, IXP and upstream peering.
      • "LAN" interface list should contain the interfaces used for downstream connectivity to your retail customers, IP Transit customers, etc. You should include "dynamic" interfaces to account for PPPoE clients on BNGs.
      • "Intra-AS" interface list should contain the interfaces used for connecting one device to another within the same network, such as redundant connectivity between two routers horizontally.
      • "Management" interface list should contain the interfaces used exclusively for management.
      • Do not add bridge members individually into any list, as they are purely layer 2 (L2) interfaces.

      It is, however, important to note that when you are using bridges (discussed later in this article), interface placement depends on how you set up the bridge. If you are using a single bridge with physical/bonding interfaces as bridge members and no VLAN configuration, then the bridge itself will be a member of "LAN". But if you are using VLANs on top of the bridge, then place the VLANs into the appropriate LAN/Intra-AS/Management list based on your local network topology. For example, "Management VLAN" will be in the Management list, while VLAN123 may be in the "Intra-AS" or "LAN" list.
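
      A minimal sketch of creating such lists and adding members (interface names are illustrative; "dynamic" is RouterOS's built-in list of dynamic interfaces):
      /interface list add name=WAN
      /interface list add name=LAN include=dynamic
      /interface list add name=Intra-AS
      /interface list add name=Management
      /interface list member add list=WAN interface=sfp-sfpplus1
      /interface list member add list=Management interface=Management_VLAN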

      Figure-1 (LAN Include Dynamic)

      Connection Tracking

      • Disable connection tracking on the edge router and enable loose TCP tracking on all routers using the following commands:
        “/ip firewall connection tracking set enabled=no”
        “/ip firewall connection tracking set loose-tcp-tracking=yes”
      • Use the recommended connection tracking timeout values to improve stability and performance, especially for UDP traffic like VoIP and gaming. If necessary, upgrade the router’s RAM to accommodate these values.

      Figure-2 (Recommended Connection Tracking Timeout Values)

      Miscellaneous

      • Give the router an accurate system clock by enabling the Network Time Protocol (NTP) client and specifying a reliable NTP server such as this example:
        “/system ntp client set enabled=yes server-dns-names=time.cloudflare.com”

      To ensure reliable network performance, it is essential to configure the MTU consistently across all devices in the path in both L2 and L3. Inconsistent MTU configurations can result in dropped frames or strange behaviours. Additionally, it is essential to minimize IP fragmentation, properly deploy RFC4638, and ensure PMTUD is working for both IPv4 and IPv6. This will help to ensure reasonable auto-detected TCP MSS negotiation values.

      Jumbo frames are ideally the way to go for MTU configuration, as they future-proof your network for whatever protocols you may throw at it. You should encourage your providers, peers, and customers to configure jumbo frames on their networks as well.

      Bigger frames = more data per frame, meaning fewer frames are required to transmit the same data and less CPU/resource utilisation, as the packets-per-second rate decreases.

      Guidelines

      Layer 2 MTU

      L2 MTU, also known as the “underlay MTU” should be configured to the maximum supported value on physical interfaces such as Ethernet ports, SFP and wireless interfaces. This applies to any networking hardware, including routers, switches, and hypervisors. The maximum supported value may vary by vendor or model, but that is okay as the L3 MTU will handle the actual packet size negotiation.

      However, it is important to note that you must keep the configured maximum values consistent across interfaces to minimise the number of MTU profiles on the device; the switch chip or ASIC supports only a limited number of MTU profiles, and exceeding it could hurt performance.

      By properly configuring the L2 MTU, you can run any protocol you want (such as VXLAN, MPLS, VPLS, or WireGuard) and still have an MTU far greater than 1500 for layer 3 packets, thereby avoiding fragmentation completely on the overlay intra-AS.
      Example:

      • Edge router (L2 MTU 10k) > BNG (L2 MTU 10k) > Switch (L2 MTU 10k) > Wireless AP (L2 MTU 2290) > Customer (L2 MTU for WAN will be 2290, as it is the smallest in the path)
      • Edge router (L2 MTU 9216) > BNG (L2 MTU 9216) > Switch (L2 MTU 9k) > OLT (L2 MTU 9k) > Customer (L2 MTU for WAN will be 9k, as it is the smallest in the path)

      Layer 3 MTU

      Configure it to the maximum allowed value on all interfaces, including physical ports. If there is any L2 overhead, such as on a layer 3 sub-interface VLAN, the system will automatically subtract it from the underlay and show the resulting value, which you can simply copy and paste into the L3 MTU parameter.

      The basic gist of this is, we use the maximum allowed L3 MTU on intra-AS interfaces and even inter-AS.

      This allows your downstream transit customers to talk to your network and your customers using jumbo frames. If you have enabled jumbo frames for a customer, inform them that their L3 MTU must match your L3 MTU.

      But if, for example, you are configuring an interface towards your transit or IXP, then you should ask your provider whether they support >1500 MTU and configure accordingly. Some transit providers and IXPs support 9000 MTU, so take advantage of that when possible.

      Some things to be careful of:

      • If using stacked VLANs (QinQ), both S and C VLANs should have an equal L3 MTU.
      • If your customer's equipment does not support jumbo frame sizes greater than 9000, then simply configure your L3 MTU to match theirs, which is usually 9000 (downstream customer port <> your port).
      • Edge router (L3 MTU 10k) > BNG (L3 MTU 10k) > Switch (L3 MTU 10k) > Wireless AP (L3 MTU 2290) > Customer (L3 MTU for WAN will be 2290, as it is the smallest in the path)
      • Edge router (L3 MTU 9216) > BNG (L3 MTU 9216) > Switch (L3 MTU 9k) > OLT (L3 MTU 9k) > Customer (L3 MTU for WAN will be 9k, as it is the smallest in the path)

      MTU Scripts

      You can automate the MTU configuration using the scripts below. Please run each one separately, as I did not add delays between commands to prevent synchronisation issues, and be mindful to manually configure the L2 MTU, L3 MTU and advertised L2 MTU for VPLS/other PPP interfaces.

      The screenshots below are for reference only; they reflect my now-obsolete advice of a 1600 MTU, and I will not be updating them.

      Figure-3 [Ethernet MTU (Jumbo Frames on L2, L3 = 1500 for WAN and 1600 for LAN)]
      Figure-4 [L3 MTU for Bonding interfaces (L3 = 1500 for WAN and 1600 for LAN)]
      Figure-5 (VLAN L3 MTU = 1600)
      Figure-6 [QinQ (Stacked VLANs) L3 MTU = 1600 on both]
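
      As a hedged sketch of what such a script can look like (the values, and the assumption that every interface accepts them, are illustrative; some hardware caps l2mtu lower and some platforms do not allow setting it at all):
      /interface ethernet set [ find ] l2mtu=9216 mtu=9216
      /interface bonding set [ find ] mtu=9216
      /interface vlan set [ find ] mtu=9216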

      Linux Bridge Approach

      A Linux bridge is a kernel module that acts as a virtual network switch and is used to forward packets between connected interfaces (also known as bridge ports or members). Many network operators do not follow MikroTik’s official guidelines to properly implement L2/3 using a bridge, which results in degraded performance as hardware offloading and/or bridge Fast Path/Fast Forward becomes unusable along with the inability to perform L2 filtering.

      To maximize performance benefits and give you L2 filtering capabilities, it is recommended by MikroTik to create a single bridge per device with all downstream (and intra-AS) interfaces (physical, LACP bonding etc) as bridge members. Tagged/untagged VLANs and hybrid VLANs can be configured using bridge VLAN filtering. Refer to vendor guidelines for model-specific configuration instructions.

      If you created an LACP bonding interface between two routers (or switches) for redundancy, you can add the bond interface into the same bridge as a bridge member, where in turn either the bridge itself or the L3 sub-interface VLANs will be an interface list member depending on your topology as discussed in the previous interface lists section.

      You can also add your management port to the bridge, and segregate it with VLAN as the other ports, to help keep configuration simple.

      A separate bridge can also be created as a loopback interface without impacting physical interface performance. You can assign the “.0” IPv4 address to this interface along with the “::1” of an IPv6 subnet for management, testing purposes or for using as the loopback IPs with OSPF.

      Below is a sample configuration from a CCR1036 router using MikroTik guidelines along with sample interface lists:
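
      A minimal sketch of this single-bridge, VLAN-filtering approach (interface, VLAN and list names are illustrative and not the author's CCR1036 export; VLAN filtering is enabled last):
      /interface bridge add name=bridge1
      /interface bridge port add bridge=bridge1 interface=sfp-sfpplus1
      /interface bridge port add bridge=bridge1 interface=bonding-to-sw1
      /interface bridge vlan add bridge=bridge1 tagged=bridge1,sfp-sfpplus1,bonding-to-sw1 vlan-ids=100
      /interface vlan add name=Management_VLAN interface=bridge1 vlan-id=100
      /interface list member add list=Management interface=Management_VLAN
      /interface bridge set bridge1 vlan-filtering=yes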

      (R/M)STP

      I will not deep dive into how STP works, as that is outside the scope of a guide post like this one. However, a few quick things to keep in mind:

      • MikroTik allows us to selectively enable/disable STP/BPDU per-port if required. This may be needed in your network with complex layer 2 designs.

      Multicast traffic on the bridge

      I personally had a few challenges with multicast traffic/IGMP snooping best practices, for which I had to reach out to MikroTik support for clarity. Below are a few basic guidelines based on what I gathered from the MikroTik docs and their support team. This is of utmost importance for networks that make use of multicast routing and traffic for IPTV services and the like.

      • Be mindful of IGMP Snooping (and IGMP Proxy/PIM) limitations such as tagged VLAN, and features depending on your local network topology.
      • Keep in mind that IPv6 SLAAC will break if you enable multicast querier, for which, you need RouterOS v7.7 onwards to work around this.
      • In a layer 2 network if you are using IGMP Snooping, it should be enabled on all the bridges (devices) involved.
      • You can also enable IGMP multicast querier on all the bridges, only one will get elected with the rest acting as failover in case a device fails.
      • If you are using PPPoE, then there is no such thing as true multicast: while traffic may be multicast at layer 3, it will not be true multicast at layer 2, due to the nature of PPPoE as a tunnel over layer 2. If you are using DHCP (preferably) or IPoE, this issue does not apply.

      Prefix size for PTP links

      I have noticed a lot of operators talking about how short they are on IPv4 addresses – Yet for unknown reasons they like to waste 2 extra addresses for every PTP or inter-router link by using a /30. Please, stop doing that and start using /31s for PTP links as per RFC3021.

      However, RouterOS v6 and v7 do not support /31 natively; the following is how we work around it.

      Example below:
      Prefix: 103.176.189.0/31
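
      A hedged sketch of one commonly used workaround (each side gets a /32 with the peer's address as the network; addresses follow the example prefix above and the interface name is illustrative):
      # Router A side
      /ip address add address=103.176.189.0/32 network=103.176.189.1 interface=ether1
      # Router B side
      /ip address add address=103.176.189.1/32 network=103.176.189.0 interface=ether1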

      As per RFC6164, it is advised to use /127s on PTP links to avoid various forms of network attacks described in the RFC.

      However, for ease of management and subnetting, I would advise not to subnet longer (smaller) than a /64. Say you have a /48 that you’d like to use for backbone/core/PTP, subnet it directly to /64s out of which, you can use a /127 from each /64 per PTP link i.e. a /64 is reserved for only one PTP link – This ensures there’s room for growth in the future in case your link/network grows and a /127 is no longer sufficient.

      Note that on MikroTik, /127s do not work with BGP for unknown reasons and hence the longest prefix size we can use would be a /126.

      Example below:
      Prefix: 2400:7060::/126

      However, if you look closely, you might have noticed that I avoided using the all-zeroes interface ID ("2400:7060::") and instead started addressing from "2400:7060::1/126". The reason is that on some routers, using the "::" (all-zeroes) interface ID on a link can cause strange behaviour.
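
      A minimal sketch of addressing such a link (the interface name is illustrative):
      # Router A side
      /ipv6 address add address=2400:7060::1/126 advertise=no interface=ether1
      # Router B side
      /ipv6 address add address=2400:7060::2/126 advertise=no interface=ether1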

      Routing loops with RFC6890 space

      I have observed that in most of the networks, including my own personal home lab (AS149794), I find a lot of traffic where source IP = my end hosts or CPE WAN IP (either it is CGNAT IP or public IP), but destination IP = unused RFC6890 blocks. This is why I (and MikroTik themselves) created a forward rule to drop RFC6890 from escaping to WAN.

      Now let us step back and think about this: the majority of ISPs do not implement these filter rules, which means traffic from customers with dst-IP = RFC6890 is forwarded from their CPE to the BNGs, and from there the underlying L3/L2 paths carry it all the way to the edge router, and further towards your transit or peers if there is a default route. If there is no default route, or no more specific route for a given dst-IP matching the RFC6890 blocks, the traffic simply loops back and forth until the TTL expires, which means wasted resources, CPU and bandwidth once your network is at scale and you have thousands of customers. To solve this with a quick fix, I derived a simple yet effective solution: route the RFC6890 blocks to a blackhole.

      We route all RFC6890 space to a blackhole directly on the edge routers to cover edge cases, and we do the same on the BNGs.

      It will not impact your use of private space on any given interface/servers etc., because more specific prefixes always win; your private /24s will always be preferred over the less specific /10, for example, and will therefore remain accessible. Someone on the MikroTik forum has discussed this a bit in the past.
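
      A hedged sketch of such blackhole routes on RouterOS v7 (only a few RFC 6890 blocks are shown; on v6 the equivalent is type=blackhole):
      /ip route add dst-address=10.0.0.0/8 blackhole comment="RFC6890"
      /ip route add dst-address=100.64.0.0/10 blackhole comment="RFC6890"
      /ip route add dst-address=172.16.0.0/12 blackhole comment="RFC6890"
      /ip route add dst-address=192.168.0.0/16 blackhole comment="RFC6890"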

      For BNG

      Note: For 2023 and going forward it is recommended to migrate to FQ_Codel to minimise bufferbloat end-to-end on your network. I will share further details in the coming year as you need to deploy it on your entire network backbone for maximum efficiency, not just on BNG or customer simple queues.

      There have been decades-long debates on which algorithm to use, and which method to implement the best possible QoS mechanism.

      In my testing, I observed the following:

      • Capping on a per-customer basis using a single simple queue worked best
      • As for the algorithm of choice
        • I pick SFQ due to the observed low jitter/bufferbloat phenomenon on the customer side
        • Keep in mind, high bufferbloat = bad, low bufferbloat = good

        I have not included a screenshot for every algorithm, as that is unnecessary. The test scenario was simple: SFQ compared against the other algorithms, and SFQ gave the best bufferbloat score in my testing. A minimal queue sketch follows.
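
        A hedged sketch of a per-customer simple queue using SFQ (the names, target address and rate are illustrative):
        /queue type add name=SFQ-upload kind=sfq
        /queue type add name=SFQ-download kind=sfq
        /queue simple add name=Customer-100.64.0.25 target=100.64.0.25/32 max-limit=50M/50M queue=SFQ-upload/SFQ-download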

        Figure-7 (Simple Queue + PFIFO resulted in high bufferbloat)
        Figure-8 (Simple Queue + SFQ resulted in low bufferbloat)

        PPPoE

        Issues

        • Packet fragmentation due to a non-standard MTU/MRU (below 1500)
          • Typically, ISPs use 1492, 1480 or some other strange MTU size
          • Both the BNG and the customer router then need hacks like TCP MSS clamping to work around this
          • PMTUD is simply unreliable, as per RFC 8900
            • It gets worse with CGNAT, because remote end-points cannot determine the MTU of your PPPoE customer behind it
        • Many assume that using a single profile for different PPPoE servers running on different interfaces will work fine

            Solutions

            • The real long-term solution is to migrate to DHCP, to completely avoid the performance and MTU issues that are exclusive to PPPoE and similar encapsulation protocols.
            • Deploy RFC 4638
              • Keep in mind that MTU affects the whole path of L2/L3 devices, whether physical or virtual; as long as you follow the MTU section above, you should be good
              • Simply set the MTU and MRU to 1500 in the PPPoE server on the BNG
                • However, if you are interested in jumbo frames towards your peers/PNI/IXP etc., you can configure the MTU/MRU to a fixed 9000 bytes; the reason for 9000 bytes for inter-AS traffic is explained here
                  • For this to work correctly, you need to strictly follow the MTU section
                  • If using wireless APs, it would be 2290-8 = 2282 bytes

                  Figure-9 (PPPoE Server MTU/MRU & TCP MSS Clamping config)

                  • Disable (and delete!) TCP MSS clamping rules inside IP>Firewall>Mangle
                    • Why set some arbitrary value when you can let the engine determine it automatically to ensure optimal performance?
                      • MikroTik has long supported automatic TCP MSS clamping
                      • Make use of PPP>Profile>Default* to enable TCP MSS clamping directly on the PPPoE engine (see the sketch after this list). This will do the work for any customer whose MTU/MRU is less than 1500.
                      • On the customer side, not all routers can take advantage of RFC 4638 (TP-Link, Tenda, etc.); for them, the MTU will remain capped at 1492.
                        • The 1492 limitation on their end will not cause packet fragmentation issues, as packets fragment at the source (their routers) before they exit the interface and hit the BNG, and TCP MSS clamping on the PPPoE engine takes care of anything coming in from the outside world towards the customer
                        • I have observed a 1500 MRU when pinging from the outside world, suggesting that some of these consumer routers do support a 1500 MRU
                        • If customers are using MikroTik, pfSense, VyOS etc., they can take advantage of RFC 4638, i.e. 1500 MTU/MRU, for their PPPoE client
                        • Some ONT/ONU devices have strange MTU negotiation behaviour and simply do not allow RFC 4638 to work (even in bridge mode); only a few brands, such as GX, TP-Link and Huawei, have been flawless in my personal testing.
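
                        A hedged sketch of the relevant settings (parameter names as I understand RouterOS; verify against your version):
                        /interface pppoe-server server set [ find ] max-mtu=1500 max-mru=1500
                        /ppp profile set default change-tcp-mss=yes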

                        Verify MTU config

                        If you have properly configured MTU and MSS Clamping as per the steps above, then you should see the following results when testing from customer-side using this tool:

                        Figure-10 (MTU and TCP MSS correctly working on the internet)

                        Extra Note on PPPoE

                        • Create a single CGNAT pool per BNG; you can then use it for n number of PPPoE servers on n number of interfaces
                          /ip pool
                          add name=CGNAT_Pool comment="100.64.0.0-9 is reserved for each PPPoE Server Gateway/Profile" ranges=100.64.0.10-100.127.255.255
                          • Here we are reserving 100.64.0.0-9 for gateway IPs on a per-interface/PPPoE server basis, assuming we only have 10 VLANs/interfaces
                            • Reserve as per your local requirements
                            • One common mistake is using the router's public IP from the WAN interface as the local address, which I have seen lead to issues such as traceroute failures or strange packet loss; you should use an address that does not exist in IP>Address
                            • Each PPPoE server needs a unique profile/gateway in order to allow inter-VLAN communication between CPEs (needed, for example, to allow two customers behind a NATted IP on different VLANs to play a P2P Xbox game with each other) and to keep the network approach clean (a sketch follows this list)
                              • If you have 100 PPPoE servers, there should be 100 unique PPP profiles, each with its own unique local address
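
                              A hedged sketch of two such per-server profiles (the VLAN and profile names are illustrative):
                              /ppp profile add name=PPPoE_VLAN101 local-address=100.64.0.1 remote-address=CGNAT_Pool change-tcp-mss=yes
                              /ppp profile add name=PPPoE_VLAN102 local-address=100.64.0.2 remote-address=CGNAT_Pool change-tcp-mss=yes
                              /interface pppoe-server server add service-name=VLAN101 interface=VLAN101 default-profile=PPPoE_VLAN101 max-mtu=1500 max-mru=1500 disabled=no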

                              CGNAT

                              Issues

                              • The majority of ISPs are using RFC1918 subnets for CGNAT and can clash with subnets on the customer site
                              • Breaks P2P traffic
                              • Kills the end-to-end principle
                              • Requires proper NAT traversal for various protocols including IPsec
                              • Routing Loops will occur for any traffic coming from the outside destined towards the public IP pools

                              Solutions

                              • Make use of the 100.64.0.0/10 subnet as it’s meant for CGNAT usage to prevent clashing on the customer site
                              • Enable the NAT traversal Helpers on the Router like the following inside IP>Firewall>Service Ports

                              Figure-11 (NAT Traversal Helpers on RouterOS)

                               • Use a simple netmap rule with IPsec passthrough configured (this allows customers to initiate IPsec outbound without issues).
                               • Use a single NAT rule for all CGNAT customers per BNG to reduce CPU usage, for example:
                                 /ip firewall nat add action=netmap chain=srcnat comment="CGNAT rule" dst-address-list=!not_in_internet ipsec-policy=out,none out-interface-list=WAN src-address-list=cgnat_subnets to-addresses=103.176.189.0/25
                                   • Here cgnat_subnets is an address list containing the CGNAT subnets, i.e. 100.64.0.0/10 (a sketch of the supporting address lists follows this list)
                                   • dst-address-list=!not_in_internet is self-explanatory: anything destined towards private subnets should not be NATted towards the WAN
                                     • Customers should be able to talk to each other using their CGNAT IPs; Xbox makes use of this, and it is mentioned in RFC 7021. This is (sort of) equivalent to the old-school days of everyone having a public IP and hence being reachable
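
                                     A hedged sketch of the supporting address lists (only a few RFC 6890 entries shown for not_in_internet; extend it to the full bogon set):
                                     /ip firewall address-list add list=cgnat_subnets address=100.64.0.0/10
                                     /ip firewall address-list add list=not_in_internet address=10.0.0.0/8
                                     /ip firewall address-list add list=not_in_internet address=100.64.0.0/10
                                     /ip firewall address-list add list=not_in_internet address=172.16.0.0/12
                                     /ip firewall address-list add list=not_in_internet address=192.168.0.0/16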

                                    Below is what MikroTik support had to say about my port forwarding rules

                                    Figure-12 (MikroTik support suggests my port forwarding rules are correct)

                                     • Avoid deterministic NAT; the above configuration allows P2P traffic initiated from the inside to be reachable from the outside for various applications that make use of ephemeral ports/UDP NAT punching/STUN etc.
                                     • We were able to successfully seed the official Ubuntu torrent behind CGNAT with the above configuration, which can mean only one thing: P2P networking from inbound established connections works!

                                    Figure-13 (BitTorrent Seeding Behind CGNAT)

                                     • We tried src-nat as the action for the srcnat chain, but it resulted in the NATted public IP constantly changing on the customer side and breaking things

                                     Below is what MikroTik support had to say about netmap vs src-nat as the action for the srcnat chain:

                                    Figure-14 (Src nat = breaks P2P traffic | Netmap = static mapping per client IP)

                                     • Now we fix routing loops
                                       • We use DST NAT to account for remaining traffic, such as ICMP, and NAT it to a loopback interface
                                         • Remember to add that bridge to the LAN interface list and add its /31 to the lan_subnets address list as well

                                        Subscription Ratio Recommendation

                                         

                                         In my extensive testing and observations, when using the above parameters and steps I was able to put 200 users behind a /30 without any known complaints from them. BitTorrent worked as expected too, most likely because not all 200 users will max out 65k connections and hence use up every IP:port combination. Where will you find a CPE that can handle 65k NAT entries anyway?

                                         So, tl;dr: you can use a /30 per 200 users as long as you follow the steps properly; to be future-proof and safe, make sure you also provide IPv6.

                                        End Result

                                        Figure-15 (Your NAT Table should look as dead simple as this one)

                                        Logging compliances for government and regulatory requirements

                                         For CGNAT logging for compliance purposes, you can use Traffic Flow, which also adds an additional option for NAT event logging in its configuration.

                                         IPv6

                                         Issues

                                         • Addressing may not be optimally subnetted/broken down
                                         • An ISP may only have something like a single /48 for 5,000 downstream customers, which exceeds the 256 possible /56s in a /48
                                         • Not following the proper guidelines for IPv6 deployment
                                         • Lack of a persistent-assignment feature on MikroTik
                                           • This applies to the majority of ISPs, even those using Cisco, Juniper etc., which do support persistent assignment configuration

                                          Solutions

                                           • IPv6 address planning and architecture will be covered in a separate article that is currently a work in progress
                                             • However, the logic is simple:
                                               • Ensure customers get a /64 on the WAN side and a /56 on the LAN side for home users
                                               • Ensure customers get a /64 on the WAN side and a /48 on the LAN side for enterprises/SMEs/DCs etc.

                                            Now I will cover a simple configuration use-case where a BNG has exactly 1000 customers. The goal here is to ensure that the WAN side of each customer gets a /64 and the LAN side gets a /56.

                                            • Disable redirects
                                              /ipv6 settings set accept-redirects=no

                                             The following struck-through text is no longer recommended. Please follow the IPv6 section at the beginning of this article. I will remove this text in the near future.

                                            • Modify the parameters for Neighbour Discovery Protocol (these values ensure quick discovery)
                                              • /ipv6 nd set [ find default=yes ] ra-interval=30s-1m
                                              • /ipv6 nd prefix default set preferred-lifetime=45m valid-lifetime=1h30m
                                               • Next, we need to create two separate pools: one for the WAN side and one for the LAN side of the customer
                                                 /ipv6 pool
                                                 add name=Customer-CPE-LAN prefix=2405:a140:8::/46 prefix-length=56
                                                 add name=Customer-CPE-WAN prefix=2405:a140:f:d400::/54 prefix-length=64
                                                   • Here, prefix-length specifies the prefix length each customer receives; in line with standards, we give the WAN side a /64 and the LAN side a /56
                                                   • The Remote IPv6 Prefix pool is for the WAN side of the customer
                                                   • The DHCPv6 PD pool is for the LAN side of the customer (a profile sketch follows this list)
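
                                                   A hedged sketch of tying these pools to the PPP profile (parameter names as I understand RouterOS; verify on your version):
                                                   /ppp profile set default remote-ipv6-prefix-pool=Customer-CPE-WAN dhcpv6-pd-pool=Customer-CPE-LAN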

                                                  Figure-16 (PPPoE IPv6 configuration)

                                                   That's it: customers will now dynamically get a routed /64 and a routed /56 for the WAN and LAN sides respectively.

                                                  Verify IPv6 config

                                                  If you have properly configured IPv6 as per the steps above, then you should see the following results when testing from customer-side using this tool:

                                                  Figure-17 (IPv6 working correctly)

                                                  Routing Loop prevention

                                                   If a customer goes offline (due to power loss, etc.), traffic destined for that customer will continue to arrive until the flows time out, increasing CPU usage. To solve this, we simply route the aggregated customer prefixes to a blackhole. Remember that in routing, more specific prefixes always win: while a customer is online, their more specific route is preferred; when it disappears, the less specific (aggregated) blackhole route takes over, pending traffic is dropped immediately, and CPU usage stays optimal.
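
                                                   A hedged sketch on RouterOS v7, assuming the example aggregates used earlier in this article (substitute your own allocations; v6 uses type=blackhole instead):
                                                   /ip route add dst-address=103.176.189.0/24 blackhole comment="Customer aggregate - loop prevention"
                                                   /ipv6 route add dst-address=2405:a140:8::/46 blackhole comment="Customer LAN aggregate - loop prevention"
                                                   /ipv6 route add dst-address=2405:a140:f:d400::/54 blackhole comment="Customer WAN aggregate - loop prevention"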

                                                  Firewall/Security

                                                  Issues

                                                   • Blocking inbound ports based on the false logic of "protecting" the customer
                                                     • Port blocking does little to improve security; it mainly breaks legitimate traffic, such as apps or games that use various methods for VoIP
                                                     • Malware can simply use port 443; that is the reality of modern-day malware anyway
                                                     • Another example is blocking TCP/UDP traffic destined towards Cloudflare or Google anycast DNS

                                                    Solutions

                                                    • Remove most “port blocking” rules
                                                      • Customer Site security should be handled on the customer site such as having proper basic firewalling on their Edge Routers
                                                      • I’ve dropped some ports on the RAW table directly
                                                      • Source of truth for ICMPv4 deprecated types
                                                      • Source of truth for ICMPv6 deprecated types

                                                      Below are the generic firewall rules that should be deployed on the BNG to cover basic security grounds.

                                                      IPv4 Firewall

                                                      IPv6 Firewall

                                                       I have now added a rule in the raw table to drop IPv6 extension headers 0 and 43, as per this. The linked article also suggests dropping header 60, but I decided not to drop header 60 for the reasons stated in the re-tweet here. Please note this only works on RouterOS v7.4 onwards, as a bug affecting it was fixed in that version.

                                                       I have now also removed the forward-chain rules completely, to improve performance, and moved them to the raw table.
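
                                                       A hedged sketch of such raw rules (the headers matcher syntax is my assumption of how RouterOS expresses this; confirm on v7.4+ before deploying):
                                                       /ipv6 firewall raw add chain=prerouting headers=hop:contains action=drop comment="Drop hop-by-hop extension header (0)"
                                                       /ipv6 firewall raw add chain=prerouting headers=route:contains action=drop comment="Drop routing extension header (43)"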

                                                      For Edge Router

                                                      The purpose of the Edge router is to route as fast as possible. So, with that in mind, along with the basic general changes I’ve mentioned at the beginning of this article, the following should also be kept in mind:

                                                      1. No NAT
                                                      2. No connection tracking aka stateful firewalling (filter table on the firewall section)
                                                        • If you enable stateful firewalling on the edge, the router will die in case of DDoS attacks or even just heavy traffic in general
                                                      3. No fancy “features” (like Hotspot, PPPoE)
                                                        • Use your BNG routers for any customer delegation that is required

                                                      BGP Optimisation

                                                      This is a work in progress section and at this point in time, I am writing based on my experience with Indian ISPs, so if you’re in the EU/US or other locations, you’re probably already implementing the following:

                                                      BGP Timers

                                                       Based on the Huawei documentation here and here, I personally tested the following configuration and observed that BGP negotiation time and stability (during occasional link flaps/packet loss) improved significantly, so I would recommend that network operators set the same timers globally on their networks (for both eBGP and iBGP): a keepalive time of 20s and a hold time of 60s.

                                                      • /routing bgp template
                                                        set default as=149794 disabled=no hold-time=1m keepalive-time=20s

                                                       Preferably, convince your peers to apply the same configuration on their end as well, at least for the individual BGP sessions between you and them.

                                                       Traffic Engineering and loop prevention
                                                       • Always route your aggregated prefixes (say a /24 or /22 for IPv4, or a /32 or /36 for IPv6) to a blackhole, for both IPv4 and IPv6, to prevent layer 3 loops. Stop disabling synchronisation on RouterOS v6; on RouterOS v7 it is in any case mandatory to either route the prefix to a blackhole or have it assigned to an interface.
                                                         • This also reduces CPU usage whenever downstream routers/users/switches go offline: incomplete traffic from remote hosts/networks keeps trying to establish connections, and since it is routed to the blackhole it times out immediately and saves resources.
                                                           • In other words, there is no sense in doing things that increase CPU usage (i.e. not routing to a blackhole)
                                                           • And there is no sense in avoiding loop-prevention mechanisms
                                                       • If you have multi-homed transit
                                                         • Always, at the very least, request a partial routing table from all the upstream providers you are connected to. If the router can handle full tables from the upstreams, go for it!
                                                           • This ensures your router has the best paths to choose from
                                                           • Stop following the strange practice of taking only default routes from the upstreams and creating asymmetric routing conditions where outgoing traffic goes via Transit A and incoming traffic comes in via Transit B.
                                                           • If you need traffic engineering, consider BGP-based load balancing or local preferences, with some automation like Pathvector
                                                           • Still, request a partial or full table (whichever fits your router's specs) to future-proof in case you plan to go multi-homed

                                                              Filtering & Security

                                                              We only need to do broadly two things for filtering and security:

                                                              1. Implement MANRS throughout your network (and business)
                                                               2. Use the RAW table to drop remaining bogon/rubbish traffic, similar to the rules used on the BNG; you can also use it for ACLs if you need them
                                                                 • CPU usage stays minimal when using the RAW table
                                                                 • Absolutely nothing in the filter table, i.e. no stateful firewalling
                                                                   • The only exception is that we can use FastTrack for untracked, i.e. stateless, traffic to improve IPv4 routing performance
                                                              IPv4 Firewall
                                                              IPv6 Firewall

                                                              Firewall Explanation

                                                               I will keep this concise: as stated earlier, I suggest you study how iptables works in general and study the packet flow to know which rule does what. With that said, I will break it down into simpler points.

                                                               • I used this and this as the sources for building the base of the firewall
                                                                 • MikroTik has taken care to conform to various RFCs and not to break any legitimate protocol/traffic
                                                                 • The RAW rules drop anything coming from WAN that is spoofed (RFC 6890 addresses)
                                                                 • The RAW rules drop anything coming from LAN that does not match your public prefixes/internal subnets (the lan_subnets address list), meaning spoofed traffic is prevented from leaving your network
                                                                 • See also: an APNIC blog post detailing more on this subject

                                                                Strange Anomalies

                                                                These are some strange behaviours that I could not explain. If you have further information, please reach out to me.

                                                                TCP MSS – Maximum Segment Size option

                                                                 This article discusses the TCP Maximum Segment Size option, also called the TCP MSS value, and gives a brief overview of how TCP MSS works, with examples.

                                                                 The RFCs specify that a host must not send datagrams larger than 576 bytes to a destination unless the destination has indicated that it can accept larger datagrams.

                                                                 According to the above, the Maximum Segment Size will be 576 - 20 (IP) - 20 (TCP) = 536 bytes if no options are present in the IP and TCP headers.

                                                                TCP MSS Option

                                                                 If a host does not advertise that it can receive larger segments, the sender will send TCP segments of only 536 bytes, causing lower throughput, more packets on the wire, and longer file transfer times.

                                                                 TCP therefore provides the Maximum Segment Size option, exchanged during the three-way handshake in the SYN and SYN+ACK segments. When received, this option tells each side the maximum segment size the other side will accept on the connection. Remember that, unlike some other TCP options, MSS is not negotiated: the value advertised in the option field is completely independent on each side.

                                                                 The value advertised to the sender (the largest segment the receiver can receive) is calculated as:

                                                                 MSS = MTU - IP header - TCP header
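
                                                                 For example, with a standard Ethernet MTU of 1500 bytes, MSS = 1500 - 20 - 20 = 1460 bytes; with a PPPoE MTU of 1492 bytes (8 bytes of PPPoE overhead), MSS = 1492 - 20 - 20 = 1452 bytes.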

                                                                 As the screenshot below shows, the TCP MSS option field is 32 bits long and includes a 16-bit MSS value. The maximum value can be 65535, although that is very rare.

                                                                 (Screenshot: TCP MSS option in Wireshark)

                                                                 The MSS value depends on the interface MTU, so changing the MTU of the source interface changes the advertised MSS value.

                                                                Clamping TCP MSS

                                                                 The MSS value can be changed, or clamped, either at the source host or at transit nodes such as routers and firewalls.

                                                                Source Host clamping

                                                                 To simulate this, we will use a Windows machine and change the interface MTU from 1500 bytes to 900 bytes.

                                                                 The current Wi-Fi interface MTU is 1500 bytes, as shown below.

                                                                 Now we set the MTU to 900 bytes, as shown below.

                                                                 After initiating a connection, observe that the MSS value has changed to 860 bytes (900 - 20 - 20), as seen in the Wireshark output below.

                                                                 (Screenshot: clamped TCP MSS value in Wireshark)

                                                                Transit Node Clamping MSS

                                                                 The MSS value can also be clamped in transit, on the way to the destination host; this is mostly performed by routers or firewalls.

                                                                 (Diagram: TCP MSS clamping at a router)

                                                                 On Cisco routers, the per-interface command "ip tcp adjust-mss" can be used to change the maximum segment size value in SYN segments passing through the router.

                                                                 In the figure above, Router A clamps the value to 1000 bytes, causing Host B to send segments no larger than 1000 bytes even though Host A can receive 1460-byte segments.
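
                                                                 On MikroTik RouterOS, equivalent transit-node clamping can be done with a mangle rule where mangle-based clamping is still needed (for example on tunnel paths); a hedged sketch, with the interface list as an assumption:
                                                                 /ip firewall mangle add chain=forward protocol=tcp tcp-flags=syn action=change-mss new-mss=clamp-to-pmtu passthrough=yes out-interface-list=WAN comment="Clamp TCP MSS to PMTU for forwarded traffic"
                                                                 Note that the BNG guide earlier on this page recommends the PPP profile's automatic clamping instead of mangle rules for PPPoE customers.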

                                                                 iptables clamp-mss-to-pmtu and choosing the chain

                                                                 Where is the correct place to apply clamp-mss-to-pmtu and --set-mss in iptables: in FORWARD, in mangle FORWARD, or in POSTROUTING? However many docs and articles I have looked through, I have not found a consensus.

                                                                 The general answer is: in the mangle table. Whether FORWARD or POSTROUTING depends on the desired result: only transit (forwarded) packets pass through FORWARD, while both transit and locally generated packets pass through POSTROUTING.
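
                                                                 As a hedged illustration of the difference (tun0 and the MSS value 1380 are placeholders, not values from this thread):
                                                                 # mangle/FORWARD: affects only routed (transit) packets
                                                                 iptables -t mangle -A FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu
                                                                 # mangle/POSTROUTING: affects transit and locally generated packets leaving tun0
                                                                 iptables -t mangle -A POSTROUTING -o tun0 -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --set-mss 1380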

                                                                 Thank you! One more question about traffic direction: is a single rule for outbound traffic enough, or is an inbound one also needed?

                                                                 Do you mean incoming/outgoing? MSS is adjusted on outgoing packets; an incoming packet has already arrived at you as it is.

                                                                 Some sites do not open through the tunnel.

                                                                 Like this, with a single rule, it does not work.

                                                                 And what if I add:

                                                                 As I already wrote, "it depends on the desired result". In the general case, POSTROUTING is enough for everyone :)

                                                                 In your example these are two rules with almost different behaviour. In the first case it matches a "new connection" packet that will leave via tun0, regardless of which interface it arrived on, including tun0 itself. In the second, it matches a "new connection" packet that arrived on tun0 and will leave via any interface, including tun0 itself. The two rules behave identically only when a packet's incoming and outgoing interfaces are both tun0.
                                                                 P.S. Sorry if that was unclear; in my first reply I was not considering interface matching at all, only the mangle table and the choice between the FORWARD and POSTROUTING chains.

                                                                 First, what is your tun0? And second, --clamp-mss-to-pmtu is applied on the interface facing the provider, i.e. certainly not on tun0. If your tun0 is OpenVPN, then the MSS needs to be fixed in OpenVPN itself; there is a parameter for that in its config. I don't have it to hand, so google it.

                                                                 The setup is as follows: a GRE/IPsec tunnel with a MikroTik on one end and an Ubuntu VPS on the other. The MTU on the tunnel interface is 1418, the same on both the Linux and the MikroTik side. On the MikroTik side there are several clients that should go through this tunnel. A problem came up with a number of sites that do not open; apparently PMTUD cannot be used for some reason. Hence the idea of using MSS. Currently on Ubuntu:

                                                                 Well, here is my case: I share Wi-Fi to my phone, for example; what should I write?

                                                                 iptables -t mangle -A POSTROUTING -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu

                                                                 "And second, --clamp-mss-to-pmtu is applied on the interface facing the provider, i.e. certainly not on tun0."

                                                                 Is that so? My provider-facing interface is eth1, but the default route goes out via tun15, over the provider, via UDP; so how would fixing the MSS on the eth1 interface help me in that case?

                                                                 And if it doesn't? What then, what then? Help.

                                                                 You are looking at the task too narrowly: one incoming interface, one outgoing interface, and mss-to-pmtu. Imagine instead a scenario (not a real one, but it illustrates the problem). You have a server with more than one outgoing interface, on which:
                                                                 1. OpenVPN server ovpn1: all clients need MSS 1000 set on new outgoing connections, with clients spread across different outgoing interfaces.
                                                                 2. OpenVPN server ovpn2: all clients need MSS 1200 set on new outgoing connections, with clients spread across different outgoing interfaces.
                                                                 3. IPsec: clients get MSS 1380.
                                                                 4. Local outgoing traffic: the MSS is left untouched.
                                                                 This can only be solved by matching on the incoming interface (or, in the IPsec case, the direction), not on the outgoing one.


Question about firewall mark-routing

Post by atbizz » 06 Feb 2019, 08:15

There is an ordinary GRE tunnel between a MikroTik and a server outside Russia. Traffic to, for example, the blocked rutracker.org needs to be sent through this tunnel, while the rest should follow the default route.

If the routing is set up like this (below), everything works:

Re: Question about firewall mark-routing

Post by Chupaka » 06 Feb 2019, 17:10

Well, in general, it should work identically (except for the "src-address=192.168.1.0/24" part).

Are you sure you have no other rules that could interfere with this, for example rules re-marking some of the packets?

Regarding Change-MSS: if you simply tick the Clamp TCP MSS option on the GRE interface, does it not work without these rules?
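
For reference, RouterOS exposes this GRE option on the CLI as well; a hedged sketch, assuming an interface named gre-tunnel1 and that the property is called clamp-tcp-mss:
/interface gre set gre-tunnel1 clamp-tcp-mss=yes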

                                                                 
