Thursday, January 28, 2016

Upgrading Check Point Gateway Cluster (R77.30)

An earlier post, "Install / Upgrade Checkpoint Full HA (Gateway and Management)", covered installing or upgrading to R77.10. This post records the R77.30 upgrade in more detail, although the steps are almost the same as for the previous version.
1. Standalone Check Point Gateway Upgrade
Upgrading a Check Point product is not that complicated, and Check Point provides a couple of ways to do it:
1.1 CPUSE (WebUI)
You will need a valid license, and your gateway will need Internet access to reach the Check Point User Center so that it can update the list of available hotfixes and packages. You can also import a package downloaded manually from the Check Point Support site and then install it from the CPUSE / WebUI interface.




1.2 CLI
The command line is also a popular way to install a Check Point hotfix or upgrade package. These are the commands used for the R77.30 upgrade, run from the directory that contains the uploaded package.

[Expert@HostName]# tar -zxvf Check_Point_R77.30_T204.Gaia.tgz
[Expert@HostName]# ./UnixInstallScript
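
A common stumbling block with the two commands above is running ./UnixInstallScript from a directory other than the one the archive was extracted into, which ends in "No such file or directory". Here is a minimal sketch of a wrapper that extracts the package into a known directory and confirms the installer is there before you run it. The function name extract_package and the working directory argument are my own, and it assumes UnixInstallScript sits at the top level of the archive, as the commands above suggest.

```shell
# Extract an upgrade archive into a work directory and print the path of
# the installer script. Assumption: UnixInstallScript is at the top level
# of the archive. extract_package is a hypothetical helper name.
extract_package() {
    local archive="$1" workdir="$2"
    if [ ! -f "$archive" ]; then
        echo "archive not found: $archive" >&2
        return 1
    fi
    mkdir -p "$workdir" || return 1
    tar -zxf "$archive" -C "$workdir" || return 1
    if [ -f "$workdir/UnixInstallScript" ]; then
        echo "$workdir/UnixInstallScript"
    else
        echo "UnixInstallScript not found in $workdir" >&2
        return 1
    fi
}
```

For example, extract_package Check_Point_R77.30_T204.Gaia.tgz /var/log/upgrade prints the installer path on success; you would then cd into that directory and run ./UnixInstallScript yourself. The /var/log/upgrade path is only an illustration.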


2. Check Point Gateway Cluster Upgrade
Those methods work well for a standalone Check Point implementation. If your implementation is a cluster (ClusterXL), the procedure is different, and you will have to think about downtime during the upgrade.

Check Point's SK107042 lists all the ways to upgrade a cluster:

  • Minimal Effort (Maximum Downtime will be needed)
  • Zero Downtime (Short Downtime for old Connections to be dropped)
  • Optimal Service Upgrade (Very Short Downtime for old Connections to be dropped)
  • Connectivity Upgrade (No Downtime, no Connections dropped)
  • Full Connectivity Upgrade (Not Supported since R75 GA)

Upgrade Method: Minimal Effort (Simple Upgrade)
  • Description: Cluster members can be upgraded to any version. Each cluster member is upgraded as an independent Security Gateway.
  • Network Impact: Existing connections are disrupted. There is no connectivity while all cluster members are out for maintenance. Requires a substantial maintenance window.
  • Duration of Upgrade: As long as it takes to upgrade all cluster members.

Upgrade Method: Zero Downtime
  • Description: Cluster members can be upgraded to any version. During this type of upgrade, there is always at least one Active cluster member that handles the traffic. Connections are not synchronized between cluster members running different Check Point software versions. Upgraded cluster members are in "Ready" state until the cluster members running the previous version are stopped (with the cphastop or cpstop command).
  • Network Impact: Connections that were initiated on a cluster member running the previous version are dropped when the cluster member is upgraded to a new version. Requires a relatively short maintenance window for old connections to be dropped.
  • Duration of Upgrade: Relatively short.

Upgrade Method: Optimal Service Upgrade
  • Description: Cluster members can be upgraded according to the "Upgrade paths" table in the SK. Newly established connections are forwarded to the upgraded cluster members while the cluster members running the previous version continue to inspect the old existing connections. The more time the upgrade procedure takes, the fewer old connections exist, and upon stopping the cluster members running the previous version, the connection drop is minimal. Despite the long duration of this upgrade procedure, security and connectivity are fully maintained.
  • Network Impact: A minimal number of connections that were initiated before the upgrade and were not closed during the upgrade procedure are dropped after the upgrade. Requires a very short maintenance window for old connections to be dropped.
  • Duration of Upgrade: Long. The nature of this upgrade procedure requires time for old connections to be closed while newly established connections are transferred to the upgraded cluster members for inspection.

Upgrade Method: Connectivity Upgrade
  • Description: Cluster members can be upgraded to R77.20 and above according to the "Upgrade paths" table in the SK. Connection failover is guaranteed. The procedure is very similar to "Zero Downtime" with the addition of synchronizing the connections to the upgraded cluster members.
  • Network Impact: No connections are dropped. Requires no maintenance window.
  • Duration of Upgrade: Short.

Upgrade Method: Full Connectivity Upgrade
  • Description: This upgrade method is considered obsolete and not supported since R75 GA.


Even after reading all of this documentation, you may still be unsure which method to choose. Which one is best for this upgrade?

Check Point describes the situations each method is suited to:

Effort and time efficient upgrades with some loss of connectivity
  • Simple Upgrade (with downtime) - Select this option if you have a period of time during which network downtime is allowed. This method is the simplest, because each cluster member is upgraded as an independent Gateway.
  • Zero Downtime - Select this option if you cannot have any network downtime and need to complete the upgrade quickly, with a minimal number of dropped connections. During this type of upgrade, there is always at least one active member that handles traffic. Connections are not synchronized between cluster members running different Check Point software versions.
    Note - Connections that were initiated on a cluster member running the old version get dropped when the cluster member is upgraded to a new version. Network connectivity, however, remains available during the upgrade, and connections initiated on an upgraded cluster member are not dropped.


Upgrades that guarantee minimal connectivity loss
  • Optimal Service Upgrade (OSU) - Select this option if security is of utmost concern. During this type of upgrade two cluster members process network traffic. Connections that are initiated during the upgrade stay up through the upgrade. A minimal number of connections that were initiated before the upgrade get dropped after the upgrade.
  • Connectivity Upgrade (CU) - Select this option, if you need to upgrade a Security Gateway or a VSX cluster to any version, and guarantee connection failover. Connections that were initiated before the upgrade are synchronized with the upgraded Security Gateways and cluster members so that no connections are dropped.


In my experience, Zero Downtime is a good fit for most situations. If there is a strict requirement that no connections be dropped, Connectivity Upgrade is the only choice.
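
The choice can be boiled down to two questions. Below is a rough sketch of that rule of thumb as a shell function. The function name choose_upgrade_method and its yes/no arguments are invented for illustration, and it deliberately leaves out Optimal Service Upgrade, so treat it as a simplification of the guidance above rather than an official decision tree.

```shell
# Rule-of-thumb selector for a cluster upgrade method.
# $1: "yes" if a full network outage is acceptable during the upgrade
# $2: "yes" if dropping connections opened before the upgrade is acceptable
# Hypothetical helper; Optimal Service Upgrade is intentionally not covered.
choose_upgrade_method() {
    local downtime_allowed="$1" drops_allowed="$2"
    if [ "$downtime_allowed" = "yes" ]; then
        echo "Simple Upgrade"
    elif [ "$drops_allowed" = "yes" ]; then
        echo "Zero Downtime"
    else
        echo "Connectivity Upgrade"
    fi
}
```

For the upgrade in this post, the answers were "no" to an outage and "yes" to dropping old connections, which lands on Zero Downtime.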

Whichever method you choose, a backup is always the first thing to do. Either a Snapshot or a Backup will help you recover if an unexpected failure happens during the upgrade.
Snapshot

Backup
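
Both can also be taken from the command line. The sketch below wraps the Clish commands in a function you can call from Expert mode. The commands "add snapshot <name> desc <text>" and "add backup local" are as I recall them from the Gaia administration guide, so please verify them on your version first. The function name pre_upgrade_backup and the description text are my own. Snapshot and backup creation may continue in the background after the command returns, so check that they finished before starting the upgrade.

```shell
# Take a Gaia snapshot and a local backup before upgrading.
# Assumption: the Clish commands below exist in this form on your Gaia
# version; pre_upgrade_backup is a hypothetical helper name.
pre_upgrade_backup() {
    local snap_name="$1"
    if [ -z "$snap_name" ]; then
        echo "usage: pre_upgrade_backup <snapshot_name>" >&2
        return 2
    fi
    # clish -c runs a single Clish command from Expert mode.
    clish -c "add snapshot ${snap_name} desc before_upgrade" || return 1
    clish -c "add backup local" || return 1
}
```

A call such as pre_upgrade_backup pre_R77_30 would be run on each cluster member before it is upgraded.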

Here are the steps I followed to upgrade a cluster from R77.10 to R77.30.
2.1 Find and Download Check Point Upgrade Package from Upgrade Wizard
You will download the upgrade package Check_Point_R77.30_T204.Gaia.tgz, which is about 1.5 GB in size.

2.2 Upload Image to Check Point Gateways

You may need to change the admin user's login shell from Clish to Bash so that you can use SFTP to upload this 1.5 GB file to the installation folder.

HostName> set user admin shell /bin/bash
HostName> save config

To change it back, you can use the following command from Expert mode, or the equivalent Clish commands:

[Expert@Pub-cp2:0]# chsh -s /etc/cli.sh admin
Changing shell for admin.

Shell changed.
Pub-cp2> set user admin shell 
shell: specifies the user's command interpreter, which is invoked on login.

        Range: No range. See file /etc/shells for valid login shells.
        Default: /etc/cli.sh.

Pub-cp2> set user admin shell /etc/cli.sh

Pub-cp2> save config

Once the user's login shell is Bash, you can use SFTP software such as WinSCP to upload the package.
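
With a file this large it is worth confirming that the upload was not truncated before you extract it. Here is a minimal sketch that compares the MD5 of the uploaded file with a checksum you already have, for example one shown on the download page if it lists one. The function name verify_md5 is my own; md5sum and awk are standard tools.

```shell
# Compare a file's MD5 with an expected value.
# Prints "OK" on match; returns non-zero on mismatch or missing file.
# verify_md5 is a hypothetical helper name.
verify_md5() {
    local file="$1" expected="$2" actual
    if [ ! -f "$file" ]; then
        echo "file not found: $file" >&2
        return 2
    fi
    actual=$(md5sum "$file" | awk '{ print $1 }')
    if [ "$actual" = "$expected" ]; then
        echo "OK"
    else
        echo "MISMATCH: expected $expected, got $actual" >&2
        return 1
    fi
}
```

You would call it as verify_md5 Check_Point_R77.30_T204.Gaia.tgz followed by the expected checksum.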

2.3 Install Uploaded Package


[Expert@FW-CP1:0]# tar -zxvf Check_Point_R77.30_T204.Gaia.tgz

[Expert@FW-CP1:0]# ./UnixInstallScript 


***********************************************************
Welcome to Check Point R77.30 installation 
***********************************************************
Verifying installation environment for R77.30...Done!
The following components will be installed:
* R77.30
Installation program is about to stop all Check Point Processes.
Do you want to continue (y/n) ? y
Stopping Check Point Processes...Done!
Installing Security Gateway / Security Management R77.30...Done!

Installing Mobile Access R77.30...Done!

Installing Performance Pack R77.30...Done!

UserAuthority Server  is not installed. Skipping installation.

INIT: version 2.86 reloading

Installing GAIA R77.30...Done!


************************************************************************
Package Name                                                    Status
------------                                                    ------
Security Gateway / Security Management R77.30                   Succeeded
Mobile Access R77.30                                            Succeeded
Performance Pack R77.30                                         Succeeded
UserAuthority Server R77.30                                     Skipped
GAIA R77.30                                                     Succeeded
************************************************************************


Installation program completed successfully.
Do you wish to reboot your machine (y/n) ? y

Broadcast message from admin (pts/2) (Thu Jan 28 15:22:02 2016):

The system is going down for reboot NOW!

Broadcast message from admin (pts/2) (Thu Jan 28 15:22:02 2016):

The system is going down for reboot NOW!

INIT: Sending processes the TERM signal
[Expert@FW-CP2:0]# Stopping sshd: [  OK  ]
Stopping arp: <not configured> 
Stopping xinetd: [  OK  ]
Stopping acpi daemon: [  OK  ]
Stopping crond: [  OK  ]
CPshell shutdown:  [  OK  ]
Stopping auditd: [  OK  ]
Shutting down kernel logger: [  OK  ]
Shutting down system logger: [  OK  ]
Starting killall:  [  OK  ]
Starting bypass_on:  [  OK  ]
Sending all processes the TERM signal... xpand[6routed[4874]: task_terminate: manager quitting
routed[4874]: Exit routed[4874] version routed-06.21.2015-13:44:30

Sending all processes the KILL signal... 
Saving random seed:  
Syncing hardware clock to system time 
Turning off swap:  
Unmounting file systems:  
mount: /proc is busy
Please stand by while rebooting the system...
Restarting system.



Console output during the reboot:

                                                                                  
PCI: BIOS Bug: MCFG area at e0000000 is not E820-reserved                       
PCI: Not using MMCONFIG.                                                        
ACPI: Getting cpuindex for acpiid 0x3
ACPI: Getting cpuindex for acpiid 0x4
  Reading all physical volumes.  This may take a while...
  Found volume group "vg_splat" using metadata type lvm2                        
  8 logical volume(s) in volume group "vg_splat" now active                     
Setting clock  (utc): Thu Jan 28 15:22:41 EST 2016 [  OK  ]                     
Starting udev: [  OK  ]
Setting hostname FW-CP1:  [  OK  ]                                         
Setting up Logical Volume Management:   8 logical volume(s) in volume group "vg_splat" now active
[  OK  ]                                                                        
Checking filesystems                                                            
Checking all file systems.                                                      
[/sbin/fsck.ext3 (1) -- /] fsck.ext3 -a /dev/mapper/vg_splat-lv_current         
/dev/mapper/vg_splat-lv_current: clean, 41374/4194304 files, 2589690/8388608 blocks
[/sbin/fsck.ext3 (1) -- /boot] fsck.ext3 -a /dev/sda1                           
/boot: clean, 230/38152 files, 96780/152584 blocks                              
[/sbin/fsck.ext3 (1) -- /var/log] fsck.ext3 -a /dev/mapper/vg_splat-lv_log      
/dev/mapper/vg_splat-lv_log: clean, 2503/15728640 files, 3081118/15728640 blocks
[  OK  ]
Remounting root filesystem in read-write mode:  [  OK  ]
Mounting local filesystems:  [  OK  ]
vm.balance_pgdat_limit = 20
vm.balance_pgdat_zone = 2
grep: /etc/udev/rules.d//00-OS-XX.rules: No such file or directory
vm.max_map_count = 524288
Enabling /etc/fstab swaps:  [  OK  ]
INIT: Entering runlevel: 3
Applying Intel CPU microcode update: [  OK  ]
Starting sysstat:  Calling the system activity data collector (sadc): 
[  OK  ]
Running UP accel driver check.
IP series driver not present
Starting background readahead: [  OK  ]
Checking for hardware changes [  OK  ]
Configuring ipv6 kernel support:  ipv6_xlate[4451]: ipv6_xlate: FW ipv6 state OFF
[  OK  ]
Starting kdump:[  OK  ]
Inserting ipsctlmod.2.6.18.cp.i686: [  OK  ]
CKP: Loading SecureXL:  [  OK  ]
no ixgbe interfaces on the machine
no igb interfaces on the machine
CKP: Loading FW-1 IPv4 Instance 0:  [  OK  ]
CKP: Loading VPN-1     Instance 0:  [  OK  ]
CKP: Loading FW-1 IPv4 Instance 1:  [  OK  ]
CKP: Loading VPN-1     Instance 1:  [  OK  ]
FW1: Starting cpWatchDog
fwha_read_boot_conf: WARNING: cluster_id is not set in ha_boot.conf.
Starting wrp:  
[  OK  ]
Starting auditd: [  OK  ]
Starting system logger: [  OK  ]
Starting kernel logger: [  OK  ]
Fulcrum switch not installed
Starting upgrade_db:  [  OK  ]
Update Interfaces in Database:  0 bindings were imported
[  OK  ]
Generating vrfs:  [  OK  ]
Configuring NetAccess:  [  OK  ]
Generating NTP configuration:  [  OK  ]
Generating Time Zone configuration:  [  OK  ]
Generating domain name configuration:  [  OK  ]
Generating keyboard mapping configuration:  [  OK  ]
Generating hostname configuration:  [  OK  ]
Configuring Interfaces:  [  OK  ]
Generating /etc/monitor_mode:  [  OK  ]
Generating /etc/fonic_pairs:  [  OK  ]
Configuring NDP:  [  OK  ]
Generating hosts.conf:  [  OK  ]
Generating resolv.conf:  [  OK  ]
Generating dhclient.conf:  [  OK  ]
Generating pwcontrol.conf [  OK  ]
Generating passwd + shadow [  OK  ]
Generating group + gshadow [  OK  ]
Generating routed.conf [  OK  ]
Generating routed0.conf [  OK  ]
Generating extended commands:  [  OK  ]
Generating MOTD:  [  OK  ]
Generating banner message:  [  OK  ]
Generating /etc/raddb/server:  [  OK  ]
Generating TACACS+ configuration:  [  OK  ]
Generating /etc/msmtp.conf:  [  OK  ]
Generating /etc/pam.d/system-auth:  [  OK  ]
Generating /etc/sysconfig/external.if:  [  OK  ]
Generating /etc/lldpd.conf:  [  OK  ]
Generating DHCP server configuration:  Write DSTATE called 
ServerConfigured = 1 
DdnsConfigured = 0 
[  OK  ]
Generating /etc/adjust_radius:  [  OK  ]
Running /bin/arp_xlate:  [  OK  ]
Generating SNMP configuration:  [  OK  ]
Generating SNMP Monitor configuration:  [  OK  ]
Generating Job Scheduler configuration:  [  OK  ]
Updating general configuraion file:  [  OK  ]
Updating syslogd configuration:  Reloading syslogd...[  OK  ]
Reloading klogd...[  OK  ]
[  OK  ]
Updating httpd2 configuration:  [  OK  ]
 Updating httpd-ssl configuration:  [  OK  ]
Applying NetFlow configuration [  OK  ]
Configuring PPPoE:  [  OK  ]
Configuring hostaccess:  [  OK  ]
CPshell initialization:  [  OK  ]
Initializing CP Process Manager..
Starting cp_pm_rl2:  [  OK  ]
Starting cp_pm_rl3:  [  OK  ]
Starting cp_pm_rl4:  [  OK  ]
Starting acpi daemon: [  OK  ]
Starting sshd: [  OK  ]
Starting arp: <not configured> 
Starting xinetd: [  OK  ]
Starting bp_init:  [  OK  ]
Starting bypass_off:  [  OK  ]
Starting crond: [  OK  ]
Starting cpri_d:  cpridstart: Starting cprid
[1] 7362
[  OK  ]
Starting cpboot:  cpstart: Power-Up self tests passed successfully

cpstart: Starting product - SVN Foundation

SVN Foundation: cpWatchDog already running
Starting cpviewd
starting the history daemon
cpwd_admin: 
Process HISTORYD started successfully (pid=7428) 
SVN Foundation: Starting cpd
SVN Foundation: Starting PostgreSQL Database
Multiportal daemon: starting mpdaemon
SVN Foundation started

cpstart: Starting product - VPN-1

FireWall-1: starting external VPN module -- OK
fwha_read_boot_conf: WARNING: cluster_id is not set in ha_boot.conf.
cpwd_admin: 
Process CPHAMCSET started successfully (pid=7728) 
FireWall-1: Starting fwd

SecureXL disabled, cannot use affinity commands
SecureXL will be started after a policy is loaded. 
FireWall-1: Fetching policy

Installing Security Policy FW_1 on all.all@FW-CP1
Fetching Security Policy from localhost succeeded
SIM: using arbitrary CPU 0

Fetching FW1 Security Policy From: 10.4.2.5

 Local Policy is Up-To-Date.
 The Policy was not installed because it is the same as the Policy already on the Security Gateway.
Installing Threat Prevention policy from -n

Fetching Threat Prevention Security Policy From: 10.4.20.50 

Threat Prevention Security Policy wasn't loaded
Fetching Threat Prevention policy failed
AntiMalware was not started
FireWall-1: enabling bridge forwarding
FireWall-1 started
SIM: using arbitrary CPU 0

cpstart: Starting product - FloodGate-1

FloodGate-1 is disabled. If you wish to start the service, please run 'etmstart enable'.

cpstart: Starting product - SmartView Monitor

SmartView Monitor: Not active

cpstart: Starting product - SmartLog


cpstart: Starting product - Mobile Access

Mobile Access service is disabled.
If you wish to start Mobile Access, please enable the Mobile Access blade in the SmartDashboard and configure the Mobile Access policy.

cpstart: Starting product - Deployment Agent

cpwd_admin: 
Process DASERVICE started successfully (pid=9527) 
[  OK  ]
Starting cpboot_refetch:  [  OK  ]
Inserting vrrp_lkm.2.6.18.cp.i686: [  OK  ]


This system is for authorized use only.
login: 




Log in to the gateway to verify the installed version.

[Expert@FW-CP1:0]# fw ver
This is Check Point's software version R77.30 - Build 503
[Expert@FW-CP1:0]#
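
If you want to script this check, for example to confirm both members report the same version, the release can be pulled out of the fw ver output. A small sketch, assuming the output line has the same shape as the one above. The function name parse_fw_version is my own.

```shell
# Read `fw ver` output on stdin and print the release (for example R77.30).
# Assumption: the line contains "software version R<digits and dots>".
# parse_fw_version is a hypothetical helper name.
parse_fw_version() {
    sed -n 's/.*software version \(R[0-9][0-9.]*\).*/\1/p'
}
```

On the gateway you would run fw ver | parse_fw_version and compare the result with the version you expect.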



2.4 Check Point Mgmt Server Changes

2.4.1 Change Gateway version to R77.30
2.4.2 Install Policy with the option "For Gateway Clusters install on all the members, if it fails do not install at all" cleared

Installation on the upgraded gateway (R77.30) will succeed with a warning notification, because the installation fails on the active but not yet upgraded gateway (R77.10). You can safely ignore the warning.

2.5 Upgrade another Cluster member from R77.10 to R77.30
On the remaining R77.10 gateway, run cpstop to fail over the active role to the new R77.30 gateway, then follow the same steps as in 2.2 and 2.3 to upgrade it from R77.10 to R77.30.

After the second gateway is upgraded to R77.30, push the policy again. This time there will be no warning notification in the policy push status.

2.6 Verify Status

[Expert@FW-CP1:0]# cphaprob stat

Cluster Mode:   High Availability (Active Up) with IGMP Membership

Number     Unique Address  Assigned Load   State       

1 (local)  10.9.9.15     0%              Ready            

(*) 'Ready' state might be caused due to configuration inconsistency between members:
    32bit/64bit/usermode, number of CoreXL instances or different SW version.

[Expert@FW-CP1:0]# cphaprob stat

Cluster Mode:   High Availability (Active Up) with IGMP Membership

Number     Unique Address  Assigned Load   State       

1 (local)  10.9.9.15     100%            Active          


[Expert@FW-CP1:0]# fw stat
HOST      POLICY     DATE            
localhost FW_Policy_1  28Jan2016 15:42:36 :  [>eth1] [<eth1] [>eth2] [<eth2] [>eth3] [>Mgmt] [<Mgmt] 
[Expert@FW-CP1:0]#  
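
The state of the local member can also be read out of the cphaprob stat output by a script, which is handy when you are waiting for the upgraded member to move from Ready to Active. A minimal sketch, assuming the output keeps the layout shown above, with the state as the last column of the "(local)" row. The function name local_cluster_state is my own.

```shell
# Read `cphaprob stat` output on stdin and print the local member's state.
# Assumption: the local member's row contains "(local)" and the state is
# the last whitespace-separated field on that row.
# local_cluster_state is a hypothetical helper name.
local_cluster_state() {
    awk '/\(local\)/ { print $NF }'
}
```

On the gateway, cphaprob stat | local_cluster_state would print Ready while the member is waiting for the old-version member to be stopped, and Active once it has taken over.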






1 comment:

  1. Getting this error messages. Can you help? What dir did you run that command from?
    ./UnixInstallScript
    bash: ./UnixInstallScript: No such file or directory
