|
0:00:15
|
We are going to talk about the other part of high availability, which is failover
|
|
0:00:20
|
Now the ASA supports two different types of failover; we are talking about
|
|
0:00:25
|
active standby and active active
|
|
0:00:27
|
where active standby means that there is going to be one physical unit that is passing the traffic
|
|
0:00:33
|
as the standby unit waits
|
|
0:00:35
|
for the active unit to fail
|
|
0:00:38
|
for the second variation which is active active failover
|
|
0:00:41
|
both units are going to be forwarding traffic at the same time
|
|
0:00:45
|
because we are running in multiple context mode
|
|
0:00:48
|
and different contexts are going to be active
|
|
0:00:51
|
on the different physical firewalls at the same time
|
|
0:00:55
|
So if I had asa1 and asa2
|
|
0:00:58
|
and context a and context b
|
|
0:01:00
|
I could have asa1
|
|
0:01:02
|
forwarding for context a
|
|
0:01:04
|
and asa2 forwarding for context b
|
|
0:01:07
|
then if there were any link failures or any node failures
|
|
0:01:11
|
say the entire box just crashes
|
|
0:01:13
|
then whatever is the standby
|
|
0:01:16
|
for that particular context is going to take over
|
|
0:01:19
|
So if asa1 is forwarding just for a
|
|
0:01:22
|
and asa2 goes down
|
|
0:01:24
|
asa1 will then start forwarding for a and b at the same time
|
|
0:01:29
|
so active-active is only going to be available when we are working with the
|
|
0:01:34
|
multiple context mode
|
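The asa1/asa2 and context a/b example can be sketched as a small Python model. This is only a toy illustration of the ownership idea described in the talk, not how the ASA implements it; the function name `forwarding_map` and the rule that the first surviving unit picks up orphaned contexts are assumptions made for the sketch.

```python
def forwarding_map(preferred_owner, healthy_units):
    """Map each healthy unit to the contexts it forwards for.

    preferred_owner: {context: unit that is normally active for it}
    healthy_units:   set of units that are currently up
    """
    survivors = sorted(healthy_units)
    forwarding = {unit: [] for unit in survivors}
    for context, owner in sorted(preferred_owner.items()):
        if owner in healthy_units:
            forwarding[owner].append(context)
        elif survivors:
            # Owner is down: the surviving peer takes over this context
            # (assumed tie-break: first survivor in sorted order).
            forwarding[survivors[0]].append(context)
    return forwarding


preferred = {"a": "asa1", "b": "asa2"}
both_up = forwarding_map(preferred, {"asa1", "asa2"})
asa2_down = forwarding_map(preferred, {"asa1"})
print(both_up)    # asa1 forwards for a, asa2 forwards for b
print(asa2_down)  # asa1 forwards for both a and b
```

With both units healthy each one forwards for its own context; once asa2 is removed from the healthy set, asa1 carries both.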
|
0:01:36
|
Now when we look at the variations of the firewall mode
|
|
0:01:41
|
active standby
|
|
0:01:42
|
is going to run in either
|
|
0:01:44
|
single context mode with the routed firewall
|
|
0:01:47
|
or single context mode with the transparent firewall
|
|
0:01:52
|
where active active is going to run in multi context mode with the routed firewall
|
|
0:01:56
|
or multi context mode with the transparent firewall
|
|
0:02:02
|
now in either case
|
|
0:02:05
|
regardless of whether we are running active standby or active active
|
|
0:02:08
|
and what the firewall mode is, whether it's routed or transparent
|
|
0:02:13
|
the standby unit is always going to be monitoring the active unit in two different ways
|
|
0:02:19
|
based on a dedicated failover link monitoring
|
|
0:02:23
|
that is going to be using a special layer 2 keepalive
|
|
0:02:26
|
that is specific to the ASAs
|
|
0:02:28
|
and then also interface monitoring
|
|
0:02:31
|
for links that are running IPv4
|
|
0:02:34
|
we are going to use regular icmp pings as the monitoring
|
|
0:02:38
|
so this is typically where we would be watching the outside interface or the inside interface
|
|
0:02:43
|
where the layer2 polling over the failover link
|
|
0:02:47
|
is a separate physical dedicated link between the devices
|
|
0:02:53
|
Now there are two different modes of failover
|
|
0:02:56
|
that are considered stateless or stateful failover
|
|
0:03:00
|
where with stateless failover, the connection state table or the translation table
|
|
0:03:05
|
is not going to be copied from the active device down to the standby device
|
|
0:03:11
|
So this means that if there is a link failure and the standby device needs to take over
|
|
0:03:15
|
or the active completely fails
|
|
0:03:18
|
and standby needs to take over
|
|
0:03:20
|
all the connections are going to need to be re-established
|
|
0:03:23
|
so if you are in the middle of a phone call
|
|
0:03:26
|
and failover occurs
|
|
0:03:27
|
if you are running stateless failover, then the phone call is going to drop
|
|
0:03:31
|
The same would be true of something like a telnet session
|
|
0:03:34
|
and this is the default mode for
|
|
0:03:36
|
the failover
|
|
0:03:39
|
Now for stateful failover
|
|
0:03:42
|
the active unit is constantly going to be copying the state table down
|
|
0:03:46
|
that's going to include things like the NAT translations
|
|
0:03:49
|
the tcp sessions, the udp sessions
|
|
0:03:52
|
any of the phase I
|
|
0:03:54
|
negotiations for IKE
|
|
0:03:56
|
which is our ISAKMP security association
|
|
0:03:59
|
any of our IPsec phase II negotiations
|
|
0:04:02
|
for the IPSec sa
|
|
0:04:04
|
and then other things like the MAC address table
|
|
0:04:06
|
if we were in transparent mode
|
|
0:04:08
|
or the ARP cache if we are in routed mode
|
|
0:04:12
|
but to do this
|
|
0:04:13
|
we are going to have to have a dedicated link that is specifically for the stateful failover
|
|
0:04:18
|
it could be the same interface that we're doing the normal failover tracking on
|
|
0:04:22
|
or it could be an additional dedicated physical link
|
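The difference between stateless and stateful failover can be sketched in a few lines of Python. This is a deliberately simplified model of the behaviour described above; the function name and the sample connection tuples are hypothetical and exist only for illustration.

```python
def connections_after_failover(active_connections, stateful):
    """Return the connections the new active unit knows about right after failover."""
    if stateful:
        # The state table was being copied to the standby all along.
        return set(active_connections)
    # Stateless: nothing was copied, so every session must be re-established.
    return set()


# Hypothetical sessions (client address, destination port): a call and a telnet session.
sessions = {("10.0.0.5", 5060), ("10.0.0.9", 23)}
print(connections_after_failover(sessions, stateful=False))  # empty: the call drops
print(connections_after_failover(sessions, stateful=True))   # both sessions survive
```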
|
0:04:28
|
Now for the active standby failover
|
|
0:04:32
|
we are going to use the dedicated
|
|
0:04:33
|
failover interface which is also called the lan based failover
|
|
0:04:37
|
for the standby asa
|
|
0:04:40
|
to poll the active one
|
|
0:04:43
|
Now once we do the initial
|
|
0:04:45
|
failover configuration
|
|
0:04:47
|
we are going to make all of our changes on the active firewall
|
|
0:04:52
|
then the configuration is going to be replicated down from the active to the standby one
|
|
0:04:58
|
Now as I mentioned by default it is not stateful failover so the state tables are not going to be copied
|
|
0:05:04
|
but this portion here
|
|
0:05:07
|
this is going to be a very important point
|
|
0:05:09
|
when we look at the order of operations of our configuration
|
|
0:05:13
|
if you configure the failover in the wrong order
|
|
0:05:17
|
you can have the standby
|
|
0:05:20
|
firewall become active
|
|
0:05:22
|
and replicate its configuration over to the one that is supposed to be active
|
|
0:05:29
|
So lets say for example that we have
|
|
0:05:32
|
just like in our particular design
|
|
0:05:34
|
that we're trying to do failover
|
|
0:05:36
|
for asa2
|
|
0:05:39
|
so asa2 is completely configured
|
|
0:05:42
|
with its interfaces with its IP addresses, with its routing
|
|
0:05:46
|
with all its policies configured
|
|
0:05:48
|
then we bring asa1 online
|
|
0:05:52
|
and we want this to be the backup device or the standby device
|
|
0:05:57
|
Now if I configure this in the wrong order
|
|
0:06:00
|
and asa1 accidentally becomes
|
|
0:06:03
|
the active router or the active firewall
|
|
0:06:06
|
then it could replicate its config
|
|
0:06:08
|
this direction
|
|
0:06:10
|
and if the only thing that I have is just the failover
|
|
0:06:15
|
and essentially everything is blank
|
|
0:06:17
|
other than that
|
|
0:06:19
|
you can end up
|
|
0:06:20
|
failing a blank configuration over the correct one
|
|
0:06:26
|
So before you do this, you always want to make sure that you have a backup of the configuration
|
|
0:06:31
|
of the device that is supposed to be active
|
|
0:06:34
|
So in the case that you do it in the wrong order and the config gets deleted
|
|
0:06:37
|
you could just paste it back in and it's going to recover
|
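Why the order matters can be shown with a minimal Python sketch of one-way replication. It assumes, as described above, that whichever unit is active pushes its configuration to the other; the `replicate` helper and the placeholder config entries are hypothetical.

```python
def replicate(configs, active_unit):
    """Return every unit's config after one-way replication from the active unit."""
    source = list(configs[active_unit])
    return {unit: list(source) for unit in configs}


configs = {
    # asa2 is fully configured; asa1 only has its failover settings.
    "asa2": ["interfaces", "ip addresses", "routing", "policies", "failover"],
    "asa1": ["failover"],
}

intended = replicate(configs, active_unit="asa2")
wrong_order = replicate(configs, active_unit="asa1")
print(intended["asa1"])     # asa1 receives the full configuration
print(wrong_order["asa2"])  # asa2's working configuration is overwritten
```

In the second case the only way back is the backup taken beforehand, which is the point of saving the intended active unit's configuration first.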
|
0:06:42
|
Now once the failover actually occurs
|
|
0:06:46
|
the standby unit is going to take over
|
|
0:06:49
|
for not only the ip address
|
|
0:06:51
|
but the mac address of the primary device
|
|
0:06:55
|
so this would then be subject to any other layer2 convergence
|
|
0:07:00
|
that we have underneath
|
|
0:07:02
|
So the amount of time that a layer 2 switch needs to update its CAM table
|
|
0:07:05
|
to possibly update its spanning tree topology
|
|
0:07:09
|
again the rest of the network infrastructure
|
|
0:07:11
|
is going to have to plug in to the overall high availability design
|
|
0:07:17
|
now the failover can either be detected
|
|
0:07:20
|
based on the polling that's happening on the physical link
|
|
0:07:24
|
or the polling that's happening on the
|
|
0:07:27
|
ip interfaces, like with the icmp pings
|
|
0:07:30
|
or for testing we can do manual failover
|
|
0:07:34
|
So I can say that I want this standby unit to become active
|
|
0:07:38
|
then once we
|
|
0:07:40
|
start forwarding the traffic on the new device
|
|
0:07:43
|
then we could take the old one out for
|
|
0:07:45
|
whatever type of maintenance that we need to do on it
|
|
0:07:48
|
So it's kind of like doing manual failover of
|
|
0:07:52
|
a route processor
|
|
0:07:53
|
in like the 7200s, or the manual failover of a supervisor in
|
|
0:07:58
|
some of the catalyst switches
|
|
0:08:03
|
Now for the actual configuration
|
|
0:08:05
|
in active standby
|
|
0:08:08
|
first thing we need to do is specify
|
|
0:08:10
|
what is going to be the ip addressing
|
|
0:08:13
|
of the primary unit
|
|
0:08:15
|
versus the secondary unit
|
|
0:08:18
|
So since on interfaces that are running IP
|
|
0:08:21
|
like our normal inside or normal outside
|
|
0:08:24
|
the monitoring is going to be using icmp
|
|
0:08:27
|
we need to make sure that the
|
|
0:08:29
|
primary device has an IP address and so does the secondary device
|
|
0:08:33
|
so like in our design here
|
|
0:08:35
|
with asa2, if we are going to have it fail over to asa1
|
|
0:08:39
|
on the outside link
|
|
0:08:41
|
I need an address that is dedicated to asa1
|
|
0:08:45
|
on the DMZ and on the inside
|
|
0:08:47
|
assuming that I want to poll those interfaces or track those interfaces to make sure that they are
|
|
0:08:52
|
actually working
|
|
0:08:56
|
So once I configure my addressing
|
|
0:08:58
|
I am going to specify who is the unit that is the primary one
|
|
0:09:01
|
this is the one that is in charge of replicating the configuration
|
|
0:09:05
|
down to the other devices
|
|
0:09:08
|
I then need to specify what's the actual interface
|
|
0:09:11
|
that I am going to send the failover information over
|
|
0:09:14
|
this is the failover lan interface
|
|
0:09:17
|
and then specify what the ip addressing is on that
|
|
0:09:21
|
So in my case I am going to use
|
|
0:09:24
|
ethernet 0/2 as the failover interface
|
|
0:09:27
|
when I configure the IP
|
|
0:09:30
|
it's not going to go under interface ethernet 0/2 mode
|
|
0:09:34
|
it's going to go in global config
|
|
0:09:36
|
with the failover interface ip command
|
|
0:09:40
|
then the last thing, I am going to enable failover
|
|
0:09:42
|
with just the failover command
|
|
0:09:46
|
now this option here, this is going to be the very last step
|
|
0:09:49
|
once both the active device
|
|
0:09:52
|
and the standby device are configured for the failover
|
|
0:09:58
|
if you do this one in the wrong order
|
|
0:10:01
|
that's where you can end up in the case where you
|
|
0:10:03
|
fail the configuration over in the wrong direction
|
|
0:10:06
|
and you overwrite your good working configuration with the blank one
|
|
0:10:12
|
Now for the
|
|
0:10:14
|
standby configuration or the secondary configuration
|
|
0:10:18
|
we need to designate this unit as the
|
|
0:10:21
|
secondary, as opposed to the primary
|
|
0:10:23
|
So it means that it is going to be receiving
|
|
0:10:25
|
the configuration down from the
|
|
0:10:27
|
active unit, or from the primary unit
|
|
0:10:31
|
the rest of the configuration is
|
|
0:10:33
|
pretty standard
|
|
0:10:34
|
as compared to the other one
|
|
0:10:36
|
where we need to specify what's the interface we are actually doing it on
|
|
0:10:40
|
what's the address that's going to be assigned there
|
|
0:10:42
|
and then finally enable the failover
|
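The primary and secondary steps just described can be summarised with a small Python helper that builds the command list for each unit. Treat it as a sketch: the command spellings are the ASA CLI forms as commonly documented and should be checked against your software version, Ethernet0/2 comes from the talk, and the link name `FOLINK` and the 192.0.2.x addresses are placeholders invented for the example.

```python
def failover_commands(role, link_name, physical_if, active_ip, mask, standby_ip):
    """Build the global-config commands for one unit of an active/standby pair."""
    if role not in ("primary", "secondary"):
        raise ValueError("role must be 'primary' or 'secondary'")
    return [
        f"failover lan unit {role}",
        f"failover lan interface {link_name} {physical_if}",
        # The failover link's addressing goes in global config, not under the interface.
        f"failover interface ip {link_name} {active_ip} {mask} standby {standby_ip}",
        # Enabling failover is deliberately the very last step.
        "failover",
    ]


# Placeholder link name and addresses; only Ethernet0/2 is taken from the talk.
args = ("FOLINK", "Ethernet0/2", "192.0.2.1", "255.255.255.0", "192.0.2.2")
primary = failover_commands("primary", *args)
secondary = failover_commands("secondary", *args)
print("\n".join(primary))
print("\n".join(secondary))
```

Only the first line differs between the two units, and `failover` stays at the end of both lists to match the ordering warning above.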
|
0:10:46
|
so after this final step, if I turn
|
|
0:10:48
|
on failover, with that actual command
|
|
0:10:50
|
failover, on the active device and the standby device
|
|
0:10:53
|
then I should see the configuration actually replicate down
|
|
0:10:57
|
between the two of them
|
|
0:10:59
|
So we should see that the active device tries to poll
|
|
0:11:03
|
the secondary one, it says that it is searching for a secondary mate
|
|
0:11:08
|
then on the secondary device, you should see it detect the active device
|
|
0:11:12
|
and then the configuration being copied down
|
|
0:11:17
|
Okay, there is a question: why are there standby IPs
|
|
0:11:21
|
on the various interfaces of the secondary device if it's going to assume
|
|
0:11:25
|
the primary address
|
|
0:11:27
|
the reason is because of the monitoring
|
|
0:11:29
|
So when the
|
|
0:11:31
|
standby device is tracking the active one
|
|
0:11:35
|
it's going to do that with the icmp messages between the
|
|
0:11:39
|
active address and the standby address
|
|
0:11:42
|
but once the standby device takes over as the active unit
|
|
0:11:47
|
it's going to inherit the
|
|
0:11:49
|
primary address or the active address
|
|
0:11:51
|
So which one is actually used
|
|
0:11:54
|
is dependent on whether you are active or standby
|
|
0:11:56
|
and we will see that when we look at show failover on the command line
|
|
0:12:03
|
Now in addition to configuring the normal failover parameters
|
|
0:12:07
|
like who is the primary or secondary units
|
|
0:12:09
|
what's the interface we are going to use, what's the address we are going to use
|
|
0:12:13
|
we may also want to change how the
|
|
0:12:16
|
devices are actually tracking each other
|
|
0:12:18
|
and what particular links that they are doing the
|
|
0:12:21
|
tracking on
|
|
0:12:23
|
now the polling of the device itself, the entire unit,
|
|
0:12:28
|
is going to be controlled by the failover poll time
|
|
0:12:32
|
where we could also control this per interface level
|
|
0:12:36
|
with a separate failover poll time for the interface
|
|
0:12:40
|
but under normal cases
|
|
0:12:43
|
the fault detection is at the link level
|
|
0:12:45
|
that's already going to be on by default
|
|
0:12:48
|
So we are looking at a case where the
|
|
0:12:51
|
secondary device is sending the layer2 keep alive
|
|
0:12:55
|
so that's the failover polling
|
|
0:12:57
|
if the active device doesn't respond
|
|
0:13:00
|
it thinks that there is some sort of software failure
|
|
0:13:03
|
So maybe the link is still in the up/up state
|
|
0:13:06
|
but if it can't respond to the failover
|
|
0:13:09
|
maybe there is
|
|
0:13:10
|
some sort of hung interface
|
|
0:13:14
|
so it's going to know this based on the failover polling
|
|
0:13:17
|
for the link level
|
|
0:13:19
|
this is where we are using the ip
|
|
0:13:22
|
packets for monitoring, we are using the icmp pings
|
|
0:13:25
|
this would be controlled by the monitor-interface command
|
|
0:13:27
|
and then the failover poll time for the interface
|
|
0:13:31
|
So if we have the case where
|
|
0:13:34
|
the ASAs are running failover
|
|
0:13:38
|
So we have asa1
|
|
0:13:41
|
asa2
|
|
0:13:44
|
we have the outside interfaces
|
|
0:13:47
|
but these links are not going to be physically connected to each other
|
|
0:13:51
|
they are going to go to some layer2 switches
|
|
0:13:54
|
So whatever they are, we will say switch1 and switch2 in this case
|
|
0:13:58
|
but if there is a failure
|
|
0:14:01
|
on ASA1's outside link
|
|
0:14:05
|
we know that this is not going to affect the line protocol
|
|
0:14:08
|
of the outside interface of asa2
|
|
0:14:12
|
so this is why we want to do the monitoring on these interfaces
|
|
0:14:17
|
but we also want to do it on the dedicated failover link between them
|
|
0:14:22
|
So if I don't hear the keepalive here
|
|
0:14:24
|
then I definitely know I need to take over as active
|
|
0:14:28
|
but for the polling of the interfaces here
|
|
0:14:30
|
this is going to be based on a policy that we can define
|
|
0:14:34
|
which is the interface monitoring policy
|
|
0:14:37
|
So the monitoring policy says normally
|
|
0:14:40
|
that if one of your interfaces you are tracking goes down
|
|
0:14:44
|
then failover should occur
|
|
0:14:47
|
but I can say that I am going to wait
|
|
0:14:49
|
for maybe two of the links to go down
|
|
0:14:51
|
before failover occurs
|
|
0:14:53
|
because I might have multiple outsides and multiple insides
|
|
0:14:58
|
where I may not care if one of my inside links goes down
|
|
0:15:01
|
as long as there is still one left
|
|
0:15:04
|
but if both of them go down then maybe I want to fail over to the
|
|
0:15:08
|
secondary device
|
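The interface monitoring policy boils down to a threshold check, which can be sketched in Python as follows. The function name, the interface names, and the `threshold` parameter are hypothetical; the default of one failed interface mirrors the normal behaviour described above.

```python
def should_fail_over(monitored, threshold=1):
    """Decide whether to fail over.

    monitored: {interface name: True if the interface is healthy}
    threshold: number of failed monitored interfaces that triggers failover
    """
    failed = sum(1 for healthy in monitored.values() if not healthy)
    return failed >= threshold


# Hypothetical unit with two inside links, one of which has gone down.
links = {"outside": True, "inside1": False, "inside2": True}
print(should_fail_over(links))               # default policy: one failure is enough
print(should_fail_over(links, threshold=2))  # relaxed policy: wait for a second failure
```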
|
0:15:11
|
Okay, there is a question: the failover interface IPs don't change,
|
|
0:15:15
|
it's all the other interfaces for the through traffic
|
|
0:15:21
|
that change?
|
|
0:15:25
|
Correct, so
|
|
0:15:26
|
if I had said that the primary address
|
|
0:15:29
|
was
|
|
0:15:31
|
like in this case, if I had said the primary address was
|
|
0:15:35
|
the .12 here
|
|
0:15:38
|
so router5 is using that as its default gateway
|
|
0:15:40
|
the ACS server is using that .12 address as its gateway
|
|
0:15:44
|
if this fails over to asa1
|
|
0:15:47
|
asa1 needs to pick up this address of .12
|
|
0:15:51
|
in which case then asa2
|
|
0:15:54
|
would be taking over with whatever the
|
|
0:15:57
|
secondary address is, that was configured
|
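The address swap in this answer can be modelled with a short Python sketch. The .12 address comes from the example above; the standby address `.11` and the helper name `interface_addresses` are made up for illustration.

```python
def interface_addresses(active_unit, units, active_ip, standby_ip):
    """Return which address each unit answers on, based on its current role."""
    return {
        unit: (active_ip if unit == active_unit else standby_ip)
        for unit in units
    }


units = ["asa1", "asa2"]
before = interface_addresses("asa2", units, active_ip=".12", standby_ip=".11")
after = interface_addresses("asa1", units, active_ip=".12", standby_ip=".11")
print(before)  # asa2 owns .12, the default gateway for router5 and the ACS server
print(after)   # after failover asa1 owns .12 and asa2 falls back to the standby address
```

The hosts behind the firewall keep pointing at .12 throughout; only the unit answering for it changes.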
|
0:16:01
|
So if asa2 fails and then comes back later
|
|
0:16:04
|
it's not automatically going to do preemption
|
|
0:16:08
|
we can either force it to
|
|
0:16:10
|
become active again or just
|
|
0:16:12
|
leave it in the standby state for a later time
|
|
0:16:17
|
because ideally it really doesn't matter which device is active versus standby
|
|
0:16:21
|
because they have to have the same physical properties to begin with
|
|
0:16:25
|
so the same type of physical interfaces
|
|
0:16:27
|
the same model, even the same licensing
|
|
0:16:31
|
so you cannot run failover between, like, a
|
|
0:16:33
|
5540 and a 5520
|
|
0:16:35
|
it will have to be two 5540s or two 5520s
|
|
0:16:42
|
then the other portion that we would need to
|
|
0:16:45
|
be concerned about here is how the stateful failover works
|
|
0:16:49
|
Now this is configured separately from the lan based failover
|
|
0:16:53
|
which is going to use either a separate link
|
|
0:16:56
|
or potentially the same interface
|
|
0:16:59
|
that is used for the lan failover
|
|
0:17:02
|
but the key is that state information
|
|
0:17:04
|
needs to be exchanged between the active device and the standby device
|
|
0:17:09
|
and from a design point of view
|
|
0:17:11
|
usually it's recommended to not use the same interface
|
|
0:17:15
|
because the amount of state information can generate a lot of traffic
|
|
0:17:21
|
so especially for very
|
|
0:17:23
|
high traffic platforms
|
|
0:17:26
|
when you look at some of the
|
|
0:17:27
|
higher models
|
|
0:17:29
|
we may be forwarding incredibly fast over the links
|
|
0:17:33
|
if we go to cisco.com/go/asa
|
|
0:17:38
|
and look at the model comparison
|
|
0:17:41
|
So some of these high grade ones, the high end network security appliances
|
|
0:17:45
|
this one says the
|
|
0:17:48
|
throughput is 5 gigabits per second
|
|
0:17:51
|
So that's the aggregate of all the links
|
|
0:17:54
|
So if we were forwarding that fast
|
|
0:17:57
|
and it says we could make potentially 90,000 connections per second
|
|
0:18:01
|
or even some of the higher level platforms, this one says we can make 350,000 connections per second
|
|
0:18:07
|
figure each time you make one of those
|
|
0:18:09
|
you would need to replicate that state down to the other device
|
|
0:18:14
|
So typically when we get to the larger level platforms
|
|
0:18:18
|
you would want a different interface
|
|
0:18:20
|
that's used for the stateful failover
|
|
0:18:23
|
versus the normal
|
|
0:18:24
|
lan failover
|
|
0:18:26
|
but in our application, we are going to use the same one
|
|
0:18:30
|
but the configuration of the stateful failover is only one additional command: it's failover link
|
|
0:18:37
|
So that's the interface that the state information is then going to be exchanged over
|