BGP Next-Hop Self Explained

One of the most common questions from people beginning their BGP journey concerns the BGP ‘Next-Hop Self’ configuration option. What does it do? Should I use it on my network? What happens if I forget to configure it? Today we’ll try to answer these questions.

BGP Next-Hop Attribute

RFC 4271 defines the NEXT_HOP attribute as follows:

The NEXT_HOP is a well-known mandatory attribute that defines the IP address of the router that SHOULD be used as the next hop to the destinations listed in the UPDATE message.

In practice, the router has to perform a recursive lookup on the Next-Hop address to determine which egress interface to use when sending packets out.

Let’s look at the following example. The router’s BGP table is populated with BGP routes and their associated Next-Hop attributes.

CE1#show ip bgp
BGP table version is 38, local router ID is 192.168.3.231
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
              x best-external, a additional-path, c RIB-compressed,
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

     Network          Next Hop            Metric LocPrf Weight Path
 *>  1.0.0.0          120.0.4.17                             0 100 i


 

For example, if the router needs to forward traffic destined to a device within the 1.0.0.0/8 range, it will:

  • determine that the next hop for this network is 120.0.4.17
  • perform a route lookup to determine which egress interface and next hop to use to reach 120.0.4.17
  • forward packets destined to 1.0.0.0/8 via the interface and next hop determined in the previous step
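
The three steps above can be sketched in a few lines of Python. This is only an illustration of the recursive lookup, not how IOS implements it: the two dictionaries are hypothetical stand-ins for the BGP table and the connected routes on CE1, and the function names are made up for this example.

```python
import ipaddress

# Hypothetical, simplified tables modelled on CE1 (illustration only).
BGP_ROUTES = {
    ipaddress.ip_network("1.0.0.0/8"): ipaddress.ip_address("120.0.4.17"),
}
CONNECTED_ROUTES = {
    ipaddress.ip_network("120.0.4.16/30"): "GigabitEthernet2",
}


def longest_match(table, address):
    """Return the most specific prefix in `table` that contains `address`, or None."""
    matches = [prefix for prefix in table if address in prefix]
    return max(matches, key=lambda prefix: prefix.prefixlen, default=None)


def resolve(destination):
    """Resolve a destination to (BGP next hop, egress interface), or None."""
    address = ipaddress.ip_address(destination)

    # Step 1: find the BGP route and its Next-Hop attribute.
    bgp_prefix = longest_match(BGP_ROUTES, address)
    if bgp_prefix is None:
        return None
    next_hop = BGP_ROUTES[bgp_prefix]

    # Step 2: recursive lookup, i.e. how do we reach the next hop itself?
    connected_prefix = longest_match(CONNECTED_ROUTES, next_hop)
    if connected_prefix is None:
        return None

    # Step 3: forward via the interface found in step 2.
    return str(next_hop), CONNECTED_ROUTES[connected_prefix]


print(resolve("1.2.3.4"))  # ('120.0.4.17', 'GigabitEthernet2')
```

If the second lookup finds nothing, the function returns None, which mirrors a router being unable to use a route whose next hop it cannot resolve.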

Here is a way to retrieve this information from a Cisco IOS-based router:

CE1#show ip bgp 1.0.0.0/8
BGP routing table entry for 1.0.0.0/8, version 3
Paths: (1 available, best #1, table default)
  Not advertised to any peer
  Refresh Epoch 1
  100
    120.0.4.17 from 120.0.4.17 (120.0.2.2)
      Origin IGP, localpref 100, valid, external, best
      rx pathid: 0, tx pathid: 0x0

CE1#show ip route 120.0.4.17
Routing entry for 120.0.4.16/30
  Known via "connected", distance 0, metric 0 (connected, via interface)
  Routing Descriptor Blocks:
  * directly connected, via GigabitEthernet2
      Route metric is 0, traffic share count is 1

Indirect Next-Hop

The previous example demonstrated a simple scenario in which a BGP speaker uses its interface IP address as the Next-Hop IP. This is the typical case for EBGP deployments, as shown below:

BGP Next-Hop Self over EBGP

Things get more complicated if CE1 has to re-advertise an EBGP-learned route via IBGP to other routers within the network.

The default behavior for CE1 is to propagate EBGP-learned prefixes to IBGP peers without changing the next-hop.

BGP Next-Hop Unchanged

This means that both CE2 and CE3 will receive the 1.0.0.0/8 prefix with a Next-Hop attribute of 120.0.4.17.

CE2#show ip bgp 1.0.0.0/8
BGP routing table entry for 1.0.0.0/8, version 0
Paths: (1 available, no best path)
  Not advertised to any peer
  Refresh Epoch 1
  100
    120.0.4.17 (inaccessible) from 200.0.0.11 (200.0.0.11)
      Origin IGP, metric 0, localpref 100, valid, internal
      rx pathid: 0, tx pathid: 0



CE3#show ip bgp 1.0.0.0/8
BGP routing table entry for 1.0.0.0/8, version 0
Paths: (1 available, no best path)
  Not advertised to any peer
  Refresh Epoch 2
  100
    120.0.4.17 (inaccessible) from 200.0.0.11 (200.0.0.11)
      Origin IGP, metric 0, localpref 100, valid, internal
      rx pathid: 0, tx pathid: 0

But now we are facing a problem: the Next-Hop IP 120.0.4.17 is not known to CE2 and CE3. Because CE2 and CE3 don’t know how to reach this IP, the path is marked as inaccessible and the 1.0.0.0/8 route will not be installed in the routing table.

CE2#show ip route 1.0.0.0 255.0.0.0
% Network not in table
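
The reachability check that CE2 applies can be sketched as follows. This is a simplified model rather than the router's actual algorithm: the function names are invented for this example, and the list of IGP prefixes is an assumption (it supposes that CE2 knows only CE1's address 200.0.0.11/32 through the IGP).

```python
import ipaddress


def next_hop_reachable(next_hop, igp_prefixes):
    """True when at least one known IGP or connected prefix covers the next hop."""
    address = ipaddress.ip_address(next_hop)
    return any(address in ipaddress.ip_network(prefix) for prefix in igp_prefixes)


def usable_paths(bgp_paths, igp_prefixes):
    """Keep only the BGP paths whose next hop can be resolved."""
    return {
        prefix: next_hop
        for prefix, next_hop in bgp_paths.items()
        if next_hop_reachable(next_hop, igp_prefixes)
    }


# The path as CE2 receives it from CE1, with the next hop left unchanged.
ce2_bgp_paths = {"1.0.0.0/8": "120.0.4.17"}

# Assumed IGP view of CE2: only CE1's own address is known.
ce2_igp = ["200.0.0.11/32"]

print(usable_paths(ce2_bgp_paths, ce2_igp))  # {}
print(usable_paths(ce2_bgp_paths, ce2_igp + ["120.0.4.16/30"]))  # {'1.0.0.0/8': '120.0.4.17'}
```

The first call returns an empty result because nothing covers 120.0.4.17. Once a prefix covering the next hop is known, the same path becomes usable.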

Next-Hop Reachability

We determined that without Next-Hop reachability, a BGP-learned route will not be installed in the routing table. There are two ways to solve this issue:

  • Advertise the Next-Hop subnet via an IGP (OSPF, IS-IS, RIP, EIGRP, etc.)
  • Use the Next-Hop Self command to modify the Next-Hop IP

Solution 1 – Inject Next-Hop IP into IGP

This method requires you to advertise the EBGP-facing range into the IGP.

In our lab, we run OSPF as the IGP, so we’d add the following configuration statements on CE1:

router ospf 1
 network 120.0.4.16 0.0.0.3 area 0

Now, if you check the BGP and routing tables on CE2, you should see that the network 1.0.0.0/8 has a reachable next hop and is installed:

CE2#show ip bgp 1.0.0.0/8
BGP routing table entry for 1.0.0.0/8, version 2
Paths: (1 available, best #1, table default)
  Not advertised to any peer
  Refresh Epoch 1
  100
    120.0.4.17 (metric 2) from 200.0.0.11 (200.0.0.11)
      Origin IGP, metric 0, localpref 100, valid, internal, best
      rx pathid: 0, tx pathid: 0x0

CE2#show ip route 120.0.4.17
Routing entry for 120.0.4.16/30
  Known via "ospf 1", distance 110, metric 2, type intra area
  Last update from 200.0.1.1 on GigabitEthernet2, 00:06:06 ago
  Routing Descriptor Blocks:
  * 200.0.1.1, from 200.0.0.11, 00:06:06 ago, via GigabitEthernet2
      Route metric is 2, traffic share count is 1

 

CE2#show ip route 1.0.0.0 255.0.0.0
Routing entry for 1.0.0.0/8
  Known via "bgp 111100", distance 200, metric 0
  Tag 100, type internal
  Last update from 120.0.4.17 00:06:14 ago
  Routing Descriptor Blocks:
  * 120.0.4.17, from 200.0.0.11, 00:06:14 ago
      Route metric is 0, traffic share count is 1
      AS Hops 1
      Route tag 100
      MPLS label: none

If you decide to adopt this approach, it is essential to configure the EBGP-facing interface as passive in the IGP; otherwise you risk forming an IGP adjacency with your EBGP peer and merging the two IGP domains. In simple terms, you might end up with one huge OSPF area spanning multiple networks under different administrative control, which is a disaster in the making.

router ospf 1
 passive-interface GigabitEthernet2

Solution 2 – Next-Hop Self

Although injecting EBGP point-to-point blocks into the IGP is a possible solution to the Next-Hop reachability problem, it creates unnecessary security risk and complicates configuration. A more elegant solution is to force EBGP-speaking routers to modify the Next-Hop attribute before re-advertising the route to IBGP peers.

BGP Next-Hop Self

In this case, you are no longer required to advertise EBGP-facing point-to-point prefixes via the IGP. All you need to do is configure Next-Hop Self on the IBGP sessions of CE1:

router bgp 111100
 neighbor 200.0.0.12 next-hop-self
 neighbor 200.0.0.13 next-hop-self

 

This will change the next-hop IP for 1.0.0.0/8 on CE2 and CE3:

CE2>show ip bgp 1.0.0.0/8
BGP routing table entry for 1.0.0.0/8, version 70
Paths: (1 available, best #1, table default)
  Not advertised to any peer
  Refresh Epoch 1
  100
    200.0.0.11 (metric 2) from 200.0.0.11 (200.0.0.11)
      Origin IGP, metric 0, localpref 100, valid, internal, best
      rx pathid: 0, tx pathid: 0x0

CE2>show ip route 200.0.0.11
Routing entry for 200.0.0.11/32
  Known via "ospf 1", distance 110, metric 2, type intra area
  Last update from 200.0.1.1 on GigabitEthernet2, 00:49:37 ago
  Routing Descriptor Blocks:
  * 200.0.1.1, from 200.0.0.11, 00:49:37 ago, via GigabitEthernet2
      Route metric is 2, traffic share count is 1

CE2>show ip route 1.0.0.0 255.0.0.0
Routing entry for 1.0.0.0/8
  Known via "bgp 111100", distance 200, metric 0
  Tag 100, type internal
  Last update from 200.0.0.11 00:02:18 ago
  Routing Descriptor Blocks:
  * 200.0.0.11, from 200.0.0.11, 00:02:18 ago
      Route metric is 0, traffic share count is 1
      AS Hops 1
      Route tag 100
      MPLS label: none
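
To make the effect of the knob concrete, here is a minimal Python sketch of what CE1 does when it re-advertises the EBGP-learned route to an IBGP peer. The Route class and the advertise_to_ibgp_peer function are hypothetical names used only for illustration, and the model ignores everything except the Next-Hop attribute.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Route:
    prefix: str
    next_hop: str
    as_path: tuple


def advertise_to_ibgp_peer(route, local_address, next_hop_self=False):
    """Return the route as it would be sent to an IBGP peer.

    Without next-hop-self the Next-Hop is passed on unchanged;
    with it, the advertising router substitutes its own address.
    """
    if next_hop_self:
        return replace(route, next_hop=local_address)
    return route


# The route as CE1 learned it from its EBGP peer.
learned = Route(prefix="1.0.0.0/8", next_hop="120.0.4.17", as_path=(100,))

print(advertise_to_ibgp_peer(learned, "200.0.0.11").next_hop)
# 120.0.4.17
print(advertise_to_ibgp_peer(learned, "200.0.0.11", next_hop_self=True).next_hop)
# 200.0.0.11
```

Only the Next-Hop changes. The prefix and AS path are passed on untouched, which matches the outputs on CE2, where the path is still 100.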

Conclusion

The Next-Hop IP is a mandatory attribute and as such is present in all IP prefix advertisements. If a router cannot resolve the Next-Hop, the BGP route is treated as inaccessible and is not installed in the routing table.

While injecting EBGP peer-facing point-to-point addresses into the IGP is a viable workaround, it is recommended to use Next-Hop Self instead to simplify configuration and avoid security issues.

Next-Hop Self FAQ

Next-Hop Attributes and Route-Reflectors

Route-Reflectors must not change the Next-Hop attribute of routes that are being reflected. Failure to follow this rule will attract data traffic to the Route-Reflectors. This is not desirable, as Route-Reflectors are control-plane nodes, not data-plane nodes, and might not have the capacity to forward traffic.

Next-Hop Self and EBGP Peers

By default, routes advertised to EBGP peers have the Next-Hop attribute changed to the EBGP session’s source IP address. You don’t have to do anything: there is no point in configuring ‘next-hop self’ on EBGP sessions, because it is done automatically.

How to configure Next-Hop Self on different platforms

  • Cisco IOS and IOS-XE
router bgp 111100
 neighbor 200.0.0.12 next-hop-self
  • Cisco IOS-XR
RP/0/0/CPU0:router(config)# router bgp 111100
RP/0/0/CPU0:router(config-bgp)# neighbor 200.0.0.12
RP/0/0/CPU0:router(config-bgp-nbr)# remote-as 111100
RP/0/0/CPU0:router(config-bgp-nbr)# address-family ipv4 unicast
RP/0/0/CPU0:router(config-bgp-nbr-af)# next-hop-self
  • Cisco NX-OS
switch(config)# router bgp 111100
switch(config-router)# neighbor 200.0.0.12 remote-as 111100
switch(config-router-neighbor)# address-family ipv4 unicast
switch(config-router-neighbor-af)# next-hop-self
  • Juniper JunOS
Set Format:

set policy-options policy-statement NextHopSelf term one then next-hop self
set protocols bgp group IBGP export NextHopSelf

Curly Braces:

policy-options {
    policy-statement NextHopSelf {
        term one {
            then {
                next-hop self;
            }
        }
    }
}
protocols {
    bgp {
        group IBGP {
            export NextHopSelf;
        }
    }
}
