
Veeam backup causes BGP route flapping on VMware NSX-T Edge VMs

If you run VMware NSX-T with BGP and BFD and back up your environment with Veeam, you may see BGP route flapping, BGP neighbor ADJCHANGE events, or "Down BGP Notification FSM-ERR" messages.

The issue can be caused by Veeam backup, which creates a snapshot of your NSX-T Edge VM in order to back it up. The VM is briefly stopped ("stunned") while this happens, as the logs below show.

Logs show something like:
2020-12-20T20:38:05.278Z| vcpu-0| I125: Checkpoint_Unstun: vm stopped for 142898 us
2020-12-20T20:35:05.806Z| vcpu-0| I125: SnapshotVMXTakeSnapshotComplete: Done with snapshot 'VEEAM BACKUP TEMPORARY SNAPSHOT': 153
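
To get a feeling for how long the Edge VM was frozen, you can pull the stun duration out of log lines like the ones above. This is only a minimal sketch: the function names are made up for illustration, and the BFD timer values are placeholders, not values taken from this setup. The detection time is simplified to interval x multiplier.

```python
import re

# Matches the stun duration reported in log lines like the one above.
UNSTUN_RE = re.compile(r"Checkpoint_Unstun: vm stopped for (\d+) us")


def stun_durations_ms(log_lines):
    """Return the stun duration in milliseconds for every unstun line."""
    durations = []
    for line in log_lines:
        match = UNSTUN_RE.search(line)
        if match:
            durations.append(int(match.group(1)) / 1000.0)
    return durations


def exceeds_detection_time(stun_ms, interval_ms, multiplier):
    """True if the stun lasted at least interval * multiplier (simplified)."""
    return stun_ms >= interval_ms * multiplier


sample = [
    "2020-12-20T20:38:05.278Z| vcpu-0| I125: Checkpoint_Unstun: vm stopped for 142898 us",
]

for ms in stun_durations_ms(sample):
    # Placeholder timers: replace with the interval/multiplier you configured.
    print(ms, "ms, longer than detection time:",
          exceeds_detection_time(ms, interval_ms=50, multiplier=3))
```

If the stun is longer than your detection time, a flap during the backup window is at least plausible. Compare the timestamps with your router logs before drawing conclusions.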

Router logs show something like:
date=2020-12-20,time=20:35:10,devname="fwdev01",logid="0103020300",type="event",subtype="router",level="warning",vd="dev",eventtime=693310,logdesc="BGP neighbor status changed",msg="BGP: %BGP-5-ADJCHANGE: neighbor 172.23.39.35 Up "
date=2020-12-20,time=20:35:10,devname="fwdev01",logid="0103020301",type="event",subtype="router",level="warning",vd="dev",eventtime=693310,logdesc="Routing log",msg="BGP: 172.23.39.35-Outgoing [DECODE] Open Cap: unrecognized capability code 73 len 8"
date=2020-12-20,time=20:35:10,devname="fwdev01",logid="0103020301",type="event",subtype="router",level="warning",vd="dev",eventtime=693310,logdesc="Routing log",msg="BGP: 172.23.39.35-Outgoing [DECODE] Open Cap: unrecognized capability code 69 len 4"
date=2020-12-20,time=20:35:06,devname="fwdev01",logid="0103020300",type="event",subtype="router",level="warning",vd="dev",eventtime=693306,logdesc="BGP neighbor status changed",msg="BGP: %BGP-5-ADJCHANGE: neighbor 172.23.39.35 Down BGP Notification FSM-ERR"
date=2020-12-20,time=20:35:06,devname="fwdev01",logid="0103020301",type="event",subtype="router",level="warning",vd="dev",eventtime=693306,logdesc="Routing log",msg="BGP: %BGP-3-NOTIFICATION: received from 172.23.39.35 6/2 (Cease/Administratively Shutdown.) 0 data-bytes
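
If you want to correlate the flaps with your backup window, a small parser for key=value log lines like the ones above can help. The following is a rough sketch and not a complete parser for this log format: the regular expressions and function names are my own assumptions and only cover the fields seen in the sample lines.

```python
import re

# key=value pairs, where the value is either "quoted" or runs up to the next comma.
FIELD_RE = re.compile(r'(\w+)=(?:"([^"]*)"|([^,]*))')
ADJ_RE = re.compile(r"%BGP-5-ADJCHANGE: neighbor (\S+) (Up|Down)")


def parse_fields(line):
    """Split one log line into a dict of its key=value fields."""
    fields = {}
    for key, quoted, bare in FIELD_RE.findall(line):
        fields[key] = quoted if quoted else bare
    return fields


def adjacency_changes(lines):
    """Return (date, time, neighbor, state) tuples, sorted by date and time."""
    events = []
    for line in lines:
        fields = parse_fields(line)
        match = ADJ_RE.search(fields.get("msg", ""))
        if match:
            events.append((fields["date"], fields["time"],
                           match.group(1), match.group(2)))
    return sorted(events)


SAMPLE = [
    'date=2020-12-20,time=20:35:10,devname="fwdev01",logid="0103020300",type="event",subtype="router",level="warning",vd="dev",eventtime=693310,logdesc="BGP neighbor status changed",msg="BGP: %BGP-5-ADJCHANGE: neighbor 172.23.39.35 Up "',
    'date=2020-12-20,time=20:35:06,devname="fwdev01",logid="0103020300",type="event",subtype="router",level="warning",vd="dev",eventtime=693306,logdesc="BGP neighbor status changed",msg="BGP: %BGP-5-ADJCHANGE: neighbor 172.23.39.35 Down BGP Notification FSM-ERR"',
]

for event in adjacency_changes(SAMPLE):
    print(event)
```

The sorted output lists the Down event before the Up event, which makes it easier to line the flap up against the snapshot timestamps.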

Small/Medium Businesses - New network devices (switches, routers, ...) - Minimum ToDo list

Most small and medium businesses don't do much configuration, monitoring or config baselining, and don't follow best practices for their network devices such as switches, routers, wireless controllers and access points. Here is a short list of things you should do at minimum:

1. Extend your monitoring of your network devices

Don't just ping them and check their uptime via SNMP, but also:
1.1 Monitor all uplinks (e.g. SNMP bandwidth)
1.2 Monitor all important ports (ports of the servers, firewalls, storage, etc; again e.g. with SNMP bandwidth)
1.3 Monitor device health, fan status, temperature, etc
1.4 Monitor the routing table, especially if you use dynamic routing protocols and/or have many routes
1.5 Monitor utilization of CPU, memory, I/O, etc.
1.6 Monitor everything with secure protocols like SSH, SNMPv3 AuthPriv AES+SHA
1.7 Send SNMP traps from devices to your monitoring system
1.8 Send Syslog from your devices to your monitoring system & logging solution
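
For the bandwidth checks in 1.1 and 1.2, monitoring systems usually poll an interface octet counter twice and turn the difference into a utilization figure. The sketch below shows that arithmetic only. It assumes you already have two counter samples (for example from IF-MIB ifHCInOctets, or the 32-bit ifInOctets) and that the counter wrapped at most once between polls; the function name is hypothetical.

```python
def utilization_percent(octets_t0, octets_t1, seconds, link_speed_bps,
                        counter_bits=64):
    """Link utilization in percent between two octet-counter samples.

    The modulo handles a single counter wrap between the two polls.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    delta_octets = (octets_t1 - octets_t0) % (2 ** counter_bits)
    return delta_octets * 8 * 100.0 / (seconds * link_speed_bps)


# 750,000,000 octets in 60 seconds on a 1 Gbit/s uplink.
print(utilization_percent(0, 750_000_000, 60, 1_000_000_000))
```

With 32-bit counters on fast links the counter can wrap more than once between polls, which is one reason to prefer the 64-bit counters where the device supports them.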

2. Harden your network

2.1 Disable telnet
2.2 Disable http
2.3 Implement ACLs for allowing access only from dedicated trusted hosts
2.4 Implement ACLs for dynamic routing protocols like BGP, OSPF, etc.
2.5 Use LDAPS or Radius to authenticate device management logins against LDAP/AD
2.6 Send Syslog from your devices to your monitoring system & logging solution
2.7 Disable SNMPv1/v2c
2.8 Use DHCP snooping for rogue DHCP server protection
2.9 Use ARP Spoofing Protection
2.10 Think about disabling link-layer discovery protocols like LLDP, CDP, EDP, etc
2.11 Allow local admin account login only if LDAPS/Radius server is not reachable
2.12 Delete default users, groups and communities
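
The idea behind the management ACL in 2.3 is simply "allow the source only if it belongs to a known list of trusted hosts or networks". Here is a tiny sketch of that check, useful e.g. for validating an ACL plan before you type it into a device. The addresses are documentation ranges used as placeholders, and the names are made up for illustration.

```python
import ipaddress

# Placeholder management sources (documentation address ranges).
TRUSTED_MGMT = [
    ipaddress.ip_network("192.0.2.10/32"),     # a single jump host
    ipaddress.ip_network("198.51.100.0/28"),   # a small admin subnet
]


def is_trusted(source_ip):
    """True if source_ip is inside one of the trusted management networks."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in net for net in TRUSTED_MGMT)


for ip in ("192.0.2.10", "198.51.100.14", "198.51.100.16"):
    print(ip, is_trusted(ip))
```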

3. Authentication & dynamic VLAN assignment

3.1 Use IEEE 802.1X with certificates (at least two AAA Radius servers (e.g. FreeRADIUS) with EAP-TLS)
3.2 Use RFC 3580 for dynamic VLAN assignment
3.3 Think about using either a quarantine fallback VLAN for unauthenticated clients or a guest VLAN with internet access only
3.4 Think about using DHCP snooping (forwarding) on your devices to enable device fingerprinting
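
With RFC 3580, the Radius server tells the switch which VLAN to use via three tunnel attributes in the Access-Accept: Tunnel-Type = VLAN (13), Tunnel-Medium-Type = 802 (6) and Tunnel-Private-Group-ID = the VLAN ID. The sketch below only builds that attribute set. The role names, VLAN IDs and the fallback to a quarantine VLAN for unknown roles are hypothetical examples, not a recommendation for your VLAN plan.

```python
# Hypothetical role-to-VLAN mapping and fallback VLAN.
ROLE_TO_VLAN = {"staff": 10, "voice": 20, "printer": 30}
QUARANTINE_VLAN = 999


def vlan_reply_attributes(role):
    """Build the RFC 3580 tunnel attributes for a given role."""
    vlan_id = ROLE_TO_VLAN.get(role, QUARANTINE_VLAN)
    return {
        "Tunnel-Type": 13,                        # VLAN
        "Tunnel-Medium-Type": 6,                  # IEEE 802
        "Tunnel-Private-Group-ID": str(vlan_id),  # VLAN ID as a string
    }


print(vlan_reply_attributes("staff"))
print(vlan_reply_attributes("unknown-device"))
```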

4. Documentation

4.1 Create a layer 1 and layer 2 network plan (e.g. in Visio)
4.2 Create a layer 2 and layer 3 network plan (e.g. in Visio)
4.3 Use the layer 2 & layer 3 plan as the background of a map in your monitoring system to get a live overview

5. Testing

5.1 Test your loop protection (m/r/stp, loop-protect, elrp, + broadcast limit thresholds like max 200 broadcasts per second, etc) in a maintenance window
5.2 Test your "CrossVlan Protection" in a maintenance window. By "CrossVlan" I mean unwanted connections between two VLANs which should be separated (m/r/stp, loop-protect, extra VLAN which is tagged on all ports and sends ELRP or similar loop protection protocols, etc)
5.3 Test your monitoring alerting: is an alert really sent when e.g. an important uplink is full or disconnected, or when an important LACP LAG is down? (Test using simulation, e.g. via jPerf, Observer, etc.)
5.4 Check and test whether all best practices of the vendor are applied

6. IP-Subnetting

Yes, so many small and medium companies still have a huge flat layer 2 network per site :(
6.1 The more subnets you have, the more likely a network issue stays contained in one small subnet
6.2 The smaller the subnet, the less background noise
6.3 Microsegmentation is key! The smaller the subnet and the more it is separated (using private VLANs, ACLs, a firewall, filtering device, host firewalls, a microsegmentation solution, NSX-T or something similar), the better it is protected and the harder lateral movement gets.
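
Splitting a flat site network into smaller subnets is easy to plan with Python's ipaddress module. A small sketch, assuming a hypothetical 10.20.0.0/16 site range that is cut into /24 subnets; the segment names are placeholders.

```python
import ipaddress


def plan_subnets(site_cidr, new_prefix, names):
    """Assign the first len(names) subnets of site_cidr to the given names."""
    site = ipaddress.ip_network(site_cidr)
    subnets = site.subnets(new_prefix=new_prefix)
    return {name: next(subnets) for name in names}


plan = plan_subnets("10.20.0.0/16", 24, ["servers", "clients", "printers", "mgmt"])
for name, net in plan.items():
    # num_addresses includes the network and broadcast address.
    print(name, net, "usable hosts:", net.num_addresses - 2)
```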

There are many more things, like using LACP instead of static link aggregation groups, using LACP fast mode instead of the default slow mode, using Bidirectional Forwarding Detection (BFD) wherever it is supported, using multi-chassis link aggregation (like MC-LAG, MLAG, etc) instead of stacking (firmware updates and reboots usually force the whole stack to reboot, which is not the case with MLAG), using out-of-band management, and much more.
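
Why LACP fast instead of slow? In fast mode LACPDUs are sent every second and the partner is timed out after 3 seconds; in slow mode they are sent every 30 seconds and the timeout is 90 seconds. The little sketch below just puts these standard timer values side by side. The dictionary and function names are made up for illustration.

```python
# Standard LACP timer values in seconds: (PDU interval, partner timeout).
LACP_TIMERS = {
    "fast": (1, 3),
    "slow": (30, 90),
}


def worst_case_detection_seconds(mode):
    """Seconds until a silent partner is declared down in the given mode."""
    _interval, timeout = LACP_TIMERS[mode]
    return timeout


for mode in ("fast", "slow"):
    print(mode, worst_case_detection_seconds(mode), "s")
```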

The listed items are the things which should be done at minimum.

Monitor UniFi WLAN Access Point with PRTG with SNMPv3 Auth+Encrypted

This is a tiny guide on how to monitor your UniFi wireless access point, in this case a UniFi U7 Pro, with SNMPv3 using AES encryption and SHA authentication...