One of the most common questions from people beginning their BGP journey concerns the BGP ‘Next-Hop Self’ configuration option. What does it do? Should I use it on my network? What will happen if I forget to configure it? Today we’ll try to answer these questions.
BGP Next-Hop Attribute
RFC 4271 defines the Next-Hop attribute as follows:
The NEXT_HOP is a well-known mandatory attribute that defines the IP address of the router that SHOULD be used as the next hop to the destinations listed in the UPDATE message.
Basically, Next-Hop forces the router to do a recursive lookup in order to determine which egress interface should be used to send the packets out.
Let’s look at the following example. The router’s BGP table is populated with BGP routes and their associated Next-Hop attributes.
CE1#show ip bgp
BGP table version is 38, local router ID is 192.168.3.231
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
              x best-external, a additional-path, c RIB-compressed,
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found
     Network          Next Hop            Metric LocPrf Weight Path
 *>  1.0.0.0          120.0.4.17                             0 100 i
For example, if the router needs to forward traffic destined to a device within the 1.0.0.0/8 range, it will:
- determine that the next-hop for this network is 120.0.4.17
- do a route lookup to determine which egress interface and next-hop to use to send traffic to 120.0.4.17
- forward packets destined to 1.0.0.0/8 via the interface / next-hop determined in the previous step
Here is a way to retrieve this information from a Cisco IOS-based router:
CE1#show ip bgp 1.0.0.0/8
BGP routing table entry for 1.0.0.0/8, version 3
Paths: (1 available, best #1, table default)
  Not advertised to any peer
  Refresh Epoch 1
  100
    120.0.4.17 from 120.0.4.17 (120.0.2.2)
      Origin IGP, localpref 100, valid, external, best
      rx pathid: 0, tx pathid: 0x0
CE1#show ip route 120.0.4.17
Routing entry for 120.0.4.16/30
  Known via "connected", distance 0, metric 0 (connected, via interface)
  Routing Descriptor Blocks:
  * directly connected, via GigabitEthernet2
      Route metric is 0, traffic share count is 1
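The recursive lookup described above can be sketched in a few lines of Python. This is only an illustration, not how IOS implements it: the `RIB` dictionary, `longest_match` and `resolve` are hypothetical names, and the table holds just the two routes visible in the CE1 output.

```python
import ipaddress

# Hypothetical, simplified view of CE1's routing table:
# prefix -> (next hop, egress interface).
# The BGP route only has a next hop; the connected route only has an interface.
RIB = {
    "1.0.0.0/8": ("120.0.4.17", None),
    "120.0.4.16/30": (None, "GigabitEthernet2"),
}


def longest_match(rib, address):
    """Return the entry of the most specific prefix covering address, or None."""
    addr = ipaddress.ip_address(address)
    best = None
    for prefix, entry in rib.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, entry)
    return best[1] if best else None


def resolve(rib, destination, max_depth=8):
    """Follow next hops until a route with an egress interface is found.

    Returns (interface, directly reachable next hop) or None if unresolved.
    """
    target = destination
    for _ in range(max_depth):
        entry = longest_match(rib, target)
        if entry is None:
            return None
        next_hop, interface = entry
        if interface is not None:
            return interface, target
        target = next_hop  # recurse on the next-hop address
    return None


print(resolve(RIB, "1.0.0.5"))  # ('GigabitEthernet2', '120.0.4.17')
print(resolve(RIB, "8.8.8.8"))  # None
```

The `max_depth` guard is an assumption added to keep the sketch from looping forever on a badly formed table.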
Indirect Next-Hop
The previous example demonstrated a simple scenario in which the BGP speaker uses its own interface IP address as the Next-Hop IP. This is the traditional case for EBGP deployments.
Things get more complicated if CE1 has to re-advertise the EBGP-learned route via IBGP to other routers within the network.
The default behavior for CE1 is to propagate EBGP-learned prefixes to IBGP peers without changing the next-hop.
This means that both CE2 and CE3 will receive the 1.0.0.0/8 prefix with a next-hop attribute of 120.0.4.17.
CE2:
#show ip bgp 1.0.0.0/8
BGP routing table entry for 1.0.0.0/8, version 0
Paths: (1 available, no best path)
  Not advertised to any peer
  Refresh Epoch 1
  100
    120.0.4.17 (inaccessible) from 200.0.0.11 (200.0.0.11)
      Origin IGP, metric 0, localpref 100, valid, internal
      rx pathid: 0, tx pathid: 0
CE3:
#show ip bgp 1.0.0.0/8
BGP routing table entry for 1.0.0.0/8, version 0
Paths: (1 available, no best path)
  Not advertised to any peer
  Refresh Epoch 2
  100
    120.0.4.17 (inaccessible) from 200.0.0.11 (200.0.0.11)
      Origin IGP, metric 0, localpref 100, valid, internal
      rx pathid: 0, tx pathid: 0
But now we face a problem: the Next-Hop IP 120.0.4.17 is not known to CE2 and CE3. Because CE2 and CE3 don’t know how to reach this IP, the path is marked inaccessible and the 1.0.0.0/8 route is not installed in the routing table.
CE2:
#show ip route 1.0.0.0 255.0.0.0
% Network not in table
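The check that CE2 performs before using a BGP path can be approximated as follows. This is a rough sketch under stated assumptions: `CE2_ROUTES` is a hypothetical stand-in for CE2's non-BGP routes (only the 200.0.0.11/32 entry is taken from the outputs in this article), and `next_hop_reachable` and `installable` are illustrative names, not router internals.

```python
import ipaddress

# Assumed subset of CE2's IGP routes; real tables contain more entries.
CE2_ROUTES = ["200.0.0.11/32"]

# BGP paths received by CE2: prefix -> Next-Hop attribute.
CE2_BGP_PATHS = {"1.0.0.0/8": "120.0.4.17"}


def next_hop_reachable(routes, next_hop):
    """True if at least one known prefix covers the next-hop address."""
    addr = ipaddress.ip_address(next_hop)
    return any(addr in ipaddress.ip_network(prefix) for prefix in routes)


def installable(bgp_paths, routes):
    """Return the BGP prefixes whose next hop can be resolved."""
    return [
        prefix
        for prefix, next_hop in bgp_paths.items()
        if next_hop_reachable(routes, next_hop)
    ]


print(next_hop_reachable(CE2_ROUTES, "120.0.4.17"))  # False -> "(inaccessible)"
print(installable(CE2_BGP_PATHS, CE2_ROUTES))        # []
```

With no route covering 120.0.4.17, the list of installable prefixes is empty, which matches the "Network not in table" result on CE2.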
Next-Hop Reachability
We determined that without Next-Hop reachability, a BGP-learned route will not be installed in the routing table. There are two ways to solve this issue:
- Advertise Next-Hop subnet via IGP (OSPF, IS-IS, RIP, EIGRP, etc)
- Use the Next-Hop Self command to modify the next-hop IP
Solution 1 – Inject Next-Hop IP into IGP
This method requires you to advertise the EBGP-facing subnet into the IGP.
In our lab, we run OSPF as the IGP, so we’d add the following configuration statements on CE1:
router ospf 1
 network 120.0.4.16 0.0.0.3 area 0
Now, if you check CE2, you should see that the network 1.0.0.0/8 has a valid best path and is installed in the routing table:
CE2#show ip bgp 1.0.0.0/8
BGP routing table entry for 1.0.0.0/8, version 2
Paths: (1 available, best #1, table default)
  Not advertised to any peer
  Refresh Epoch 1
  100
    120.0.4.17 (metric 2) from 200.0.0.11 (200.0.0.11)
      Origin IGP, metric 0, localpref 100, valid, internal, best
      rx pathid: 0, tx pathid: 0x0
CE2#show ip route 120.0.4.17
Routing entry for 120.0.4.16/30
  Known via "ospf 1", distance 110, metric 2, type intra area
  Last update from 200.0.1.1 on GigabitEthernet2, 00:06:06 ago
  Routing Descriptor Blocks:
  * 200.0.1.1, from 200.0.0.11, 00:06:06 ago, via GigabitEthernet2
      Route metric is 2, traffic share count is 1
CE2#show ip route 1.0.0.0 255.0.0.0
Routing entry for 1.0.0.0/8
  Known via "bgp 111100", distance 200, metric 0
  Tag 100, type internal
  Last update from 120.0.4.17 00:06:14 ago
  Routing Descriptor Blocks:
  * 120.0.4.17, from 200.0.0.11, 00:06:14 ago
      Route metric is 0, traffic share count is 1
      AS Hops 1
      Route tag 100
      MPLS label: none
If you decide to adopt this approach, it is essential to configure the EBGP-facing interface as passive in the IGP; otherwise you risk forming an IGP adjacency with your EBGP peer and merging the two IGP domains. In simple terms, you might end up creating one huge OSPF area that spans multiple networks under different administrative control, which is a disaster in the making.
router ospf 1
 passive-interface GigabitEthernet2
Solution 2 – Next-Hop Self
Although injecting EBGP point-to-point blocks into the IGP is a possible solution to the Next-Hop reachability problem, it creates an unnecessary security risk and complicates configuration. A more elegant solution is to force EBGP-speaking routers to modify the Next-Hop attribute before re-advertising the route to IBGP peers.
In this case, you are no longer required to advertise EBGP-facing point-to-point IP prefixes via the IGP. All you need to do is configure Next-Hop Self on the IBGP sessions.
router bgp 111100
 neighbor 200.0.0.12 next-hop-self
 neighbor 200.0.0.13 next-hop-self
This will change the next-hop IP for 1.0.0.0/8 on CE2 and CE3:
CE2>show ip bgp 1.0.0.0/8
BGP routing table entry for 1.0.0.0/8, version 70
Paths: (1 available, best #1, table default)
  Not advertised to any peer
  Refresh Epoch 1
  100
    200.0.0.11 (metric 2) from 200.0.0.11 (200.0.0.11)
      Origin IGP, metric 0, localpref 100, valid, internal, best
      rx pathid: 0, tx pathid: 0x0
CE2>show ip route 200.0.0.11
Routing entry for 200.0.0.11/32
  Known via "ospf 1", distance 110, metric 2, type intra area
  Last update from 200.0.1.1 on GigabitEthernet2, 00:49:37 ago
  Routing Descriptor Blocks:
  * 200.0.1.1, from 200.0.0.11, 00:49:37 ago, via GigabitEthernet2
      Route metric is 2, traffic share count is 1
CE2>show ip route 1.0.0.0 255.0.0.0
Routing entry for 1.0.0.0/8
  Known via "bgp 111100", distance 200, metric 0
  Tag 100, type internal
  Last update from 200.0.0.11 00:02:18 ago
  Routing Descriptor Blocks:
  * 200.0.0.11, from 200.0.0.11, 00:02:18 ago
      Route metric is 0, traffic share count is 1
      AS Hops 1
      Route tag 100
      MPLS label: none
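To make the effect of the command concrete, here is a small Python model of what CE1 does when it re-advertises the EBGP-learned path to an IBGP peer. It is a sketch only: `BgpPath` and `advertise_to_ibgp` are hypothetical names, and 200.0.0.11 is taken to be CE1's IBGP session address because that is the next hop CE2 reports above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class BgpPath:
    prefix: str
    next_hop: str
    as_path: tuple


def advertise_to_ibgp(path, local_address, next_hop_self=False):
    """Return the path as it would be sent to an IBGP peer.

    Without next-hop-self the EBGP next hop is passed on unchanged;
    with it, the advertising router substitutes its own address.
    """
    if next_hop_self:
        return replace(path, next_hop=local_address)
    return path


# Path as learned by CE1 from its EBGP peer in AS 100.
learned = BgpPath(prefix="1.0.0.0/8", next_hop="120.0.4.17", as_path=(100,))

print(advertise_to_ibgp(learned, "200.0.0.11").next_hop)
# 120.0.4.17  (default: next hop untouched, inaccessible on CE2/CE3)
print(advertise_to_ibgp(learned, "200.0.0.11", next_hop_self=True).next_hop)
# 200.0.0.11  (next-hop-self: CE1's own address, already known via OSPF)
```

Only the Next-Hop changes; the prefix and AS path are passed through as they were learned.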
Conclusion
Next-Hop is a well-known mandatory attribute and as such is present in all IP prefix advertisements. If a router cannot resolve the Next-Hop, the path stays in the BGP table marked as inaccessible, is not selected as best, and is not installed in the routing table.
While injecting EBGP peer-facing point-to-point IP addresses into the IGP is a viable workaround, it is recommended to use Next-Hop Self instead to simplify configuration and avoid security issues.
Next-Hop Self FAQ
Next-Hop Attributes and Route-Reflectors
Route-Reflectors must not change the Next-Hop attribute of routes that are being reflected. Failure to follow this rule will attract data traffic to the Route-Reflectors. This is not desirable, as Route-Reflectors are control-plane nodes, not data-plane nodes, and might not have the capacity to forward traffic.
Next-Hop Self and EBGP Peers
By default, routes advertised to EBGP peers will have the Next-Hop attribute changed to the EBGP session’s source IP address. You don’t have to do anything. There is no point in configuring ‘next-hop self’ on EBGP sessions, because everything is done automatically.
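The two FAQ answers above boil down to a simple rule, sketched here in Python. This is a deliberately simplified model of the default behavior only (it ignores corner cases such as third-party next hops on a shared subnet), and `Advertisement`, `default_next_hop` and the address 120.0.4.18 for CE1's side of the EBGP link are assumptions made for illustration.

```python
from enum import Enum


class Advertisement(Enum):
    TO_EBGP_PEER = "ebgp"
    TO_IBGP_PEER = "ibgp"
    REFLECTED = "reflected"  # route reflector passing a route to a client


def default_next_hop(kind, learned_next_hop, session_source_ip):
    """Next hop placed in the UPDATE when no next-hop-self is configured."""
    if kind is Advertisement.TO_EBGP_PEER:
        # EBGP: the advertising router substitutes its session source address.
        return session_source_ip
    # IBGP and reflected routes: the learned next hop is left untouched.
    return learned_next_hop


# Assumed address for CE1's side of the 120.0.4.16/30 link.
print(default_next_hop(Advertisement.TO_EBGP_PEER, "200.0.0.12", "120.0.4.18"))
# 120.0.4.18
print(default_next_hop(Advertisement.REFLECTED, "200.0.0.12", "200.0.0.11"))
# 200.0.0.12
```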
How to configure Next-Hop Self on different platforms
- Cisco IOS and IOS-XE
router bgp 111100
 neighbor 200.0.0.12 next-hop-self
- Cisco IOS-XR
RP/0/0/CPU0:router(config)# router bgp 111100
RP/0/0/CPU0:router(config-bgp)# neighbor 200.0.0.12
RP/0/0/CPU0:router(config-bgp-nbr)# remote-as 111100
RP/0/0/CPU0:router(config-bgp-nbr)# address-family ipv4 unicast
RP/0/0/CPU0:router(config-bgp-nbr-af)# next-hop-self
- Cisco NX-OS
switch(config)# router bgp 111100
switch(config-router)# neighbor 200.0.0.12 remote-as 111100
switch(config-router-neighbor)# address-family ipv4 unicast
switch(config-router-neighbor-af)# next-hop-self
- Juniper JunOS
Set Format:
set policy-options policy-statement NextHopSelf term one then next-hop self
set protocols bgp group IBGP export NextHopSelf
Curly Braces:
policy-options {
    policy-statement NextHopSelf {
        term one {
            then {
                next-hop self;
            }
        }
    }
}
protocols {
    bgp {
        group IBGP {
            export NextHopSelf;
        }
    }
}