Best security practices for ESXi environments

Credit to Author: Angela Gunn| Date: Wed, 07 Aug 2024 09:52:08 +0000

Organizations worldwide use the VMware ESXi hypervisor for virtualization. ESXi is a type-1 (or “bare metal”) hypervisor, which means that it sits directly on the hardware, rather than atop an operating system such as Windows.  It is common for enterprises to run mission-critical servers on one or more ESXi hosts, all managed by vCenter Server (VMware’s platform for managing such environments and their dependent components).

Unfortunately for defenders, ESXi hosts themselves do not currently support natively run EDR (endpoint detection and response). If logging is enabled, certain events on those hosts can be forwarded to a SIEM, but this workaround is less than ideal for a variety of reasons. Many small and mid-size businesses have neither a SIEM nor the staffing to properly monitor and react to SIEM logs and alerts. This gap in protection has not gone unnoticed by attackers; in particular, all too many ransomware attacks over the years have exploited it.

The Sophos Managed Risk team regularly fields questions about insecure host configurations, and provides guidance for how those can be remediated. Though nothing can substitute for in-depth conversations with live humans, we’ve compiled a top-ten list of recommended practices for this article. Where appropriate, we describe and link to the most current instructions available, which are generally maintained by VMware (Broadcom) itself. In a few cases, we’ve shared tips or tricks we’ve amassed through our own experience with these remediations.

Why ESXi?

What makes ESXi hosts so attractive to attackers? It’s a matter of speed and efficiency, in addition to ESXi’s significant market share.

Generally speaking, with insecure host configurations, an attacker doesn’t even have to rely on the type of exploits that EDR would typically flag; in other words, if they aim for the host, the bar for attackers is set far lower. (“Think like an attacker” applies here: Why deal with EDR, and potentially even MDR (managed detection and response), by attacking the VMs themselves, when you can just duck all those protections and target the underlying, insecurely configured host?)

By targeting the host, an attacker can quickly do a disproportionate amount of damage to an organization — encrypting an entire ESXi host, along with the VMs it is hosting, literally with one click. For some organizations, an attacker might potentially still wreak havoc, and command a ransom payment, if they only encrypt the ESXi infrastructure. (Sophos X-Ops’ Incident Response team has written about potential methods to extract data from encrypted virtual disks, but it’s obviously best to never reach that state.)

Fortunately, there are things defenders can do to interfere with an attack on ESXi. At minimum, these precautions slow attackers down (giving defenders more opportunity to detect and respond), and they may even succeed in stopping the attack against ESXi altogether. This article covers ten tactics, with links to source materials and additional information where appropriate. In no particular order:

  1. Ensure that vCenter and ESXi hosts are running supported versions and are fully patched
  2. Consider not joining vCenter and ESXi hosts to the domain
  3. Enable normal lockdown mode
  4. Deactivate SSH when not in use
  5. Enforce password complexity for vCenter and ESXi hosts
  6. Require account lockout after failed login attempts
  7. Enable UEFI Secure Boot
  8. Configure host to only run binaries delivered via signed VIB
  9. Deactivate Managed Object Browser (MOB), CIM, SLP, and SNMP services if not used
  10. Set up persistent logging

For the purposes of this guide, we will use ESXi (as opposed to vSphere) to denote the host-plus-management-center configuration in question.

Where possible, this guide covers implementation of the recommendations for environments that utilize vCenter to manage all hosts, as well as environments that do not.

Ensure that vCenter Server and ESXi hosts are running supported versions and are fully patched

Why it helps

Ensuring that all vCenter Servers and ESXi hosts are running supported versions of their respective software, and that they are patched regularly, will reduce the attack surface associated with known vulnerabilities for which a patch exists.

How to do it

When applying updates, it is recommended to first update vCenter Server (if an update is available), and then update the ESXi hosts. It is best that the management layer’s updates be fully in place before the hosts start updating.

At the time of writing (early August 2024), only vCenter Server / ESXi versions 7.0 and 8.0 are in support. Version 7.0’s time is also coming to an end: VMware has announced that it will reach end-of-life on April 2, 2025, after which they will provide no further updates. If you have not already upgraded to 8.0, you should use the time before April 2025 to plan and execute your upgrades. In addition, VMware strongly recommends having vCenter Server on the same or higher version of the ESXi Host build number; in VMware’s own words, “connecting ESXi Host of a higher build number to vCenter Server may succeed but [is] not recommended.” If you are running a version that’s already out of support, your upgrade situation gets both more urgent and more complicated: to upgrade a vCenter Server appliance older than version 6.7, you must first upgrade to version 6.7 or 7.0, and then upgrade to version 8.0.
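
The version rules in the paragraph above can be sketched as a small planning helper. This is a minimal illustration in Python, not a VMware tool: the function names and the tuple-based version comparison are assumptions made for this example, and it encodes only the rules stated in this article (the versions in support as of early August 2024, the intermediate hop through 6.7 or 7.0, and keeping vCenter Server at the same or a higher release than its hosts).

```python
# Versions in support per the article (early August 2024); adjust as this changes.
SUPPORTED_RELEASES = {(7, 0), (8, 0)}
TARGET_RELEASE = (8, 0)


def parse_version(text):
    """Turn '6.5' or '8.0.2' into a tuple of ints, e.g. (6, 5) or (8, 0, 2)."""
    return tuple(int(part) for part in text.split("."))


def is_supported(current):
    """True when the major.minor release is one the article lists as in support."""
    return parse_version(current)[:2] in SUPPORTED_RELEASES


def vcenter_upgrade_path(current):
    """Return the upgrade hops needed to reach 8.0 from the current release."""
    release = parse_version(current)[:2]
    if release >= TARGET_RELEASE:
        return []
    if release < (6, 7):
        # Appliances older than 6.7 need an intermediate stop first.
        return ["6.7 or 7.0", "8.0"]
    return ["8.0"]


def vcenter_covers_host(vcenter_version, host_version):
    """True when vCenter is at the same or a higher release than the host."""
    return parse_version(vcenter_version) >= parse_version(host_version)
```

For example, `vcenter_upgrade_path("6.5")` returns two hops, while `vcenter_upgrade_path("7.0.3")` returns only the final hop to 8.0. The comparison works on dotted release strings rather than raw build numbers, which is a simplification chosen for this sketch.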

While the vCenter process to upgrade versions is essentially a migration to a new instance, patching is straightforward. The patching process is done via the vCenter Server Management portal; the full instruction set is available on the VMware Docs site. (It is advised that you back up vCenter Server before installing any update or patch.)

To upgrade and patch ESXi hosts that are connected to vCenter, you’ll use the vSphere Lifecycle Manager. VMware has published an excellent video covering this multipart process; we’ve found that in this specific situation, it’s easiest to simply watch a video rather than reading the instructions step by step.

To patch a standalone ESXi host via the web client, you’ll need to access the host via SSH (Secure Shell protocol). We will have more to say about proper SSH hygiene in a later section, but for now:

  1. Select Host > Actions > Enter maintenance mode
  2. Expand Actions again, select Services > Enable Secure Shell (SSH)
  3. Access the host via SSH
  4. Run the following command to identify what current updates and VIBs are installed:
    esxcli software profile get
  5. Run the following command to allow web traffic through the firewall:
    esxcli network firewall ruleset set -e true -r httpClient
  6. List the online update packages available to you (grep your version at the end for a faster response):
    esxcli software sources profile list -d https://hostupdate.vmware.com/software/VUM/PRODUCTION/main/vmw-depot-index.xml  | grep -i ESXi-7
  7. Identify the package you want to install (ideally the most recent) and insert the package name into the following command:
    esxcli software profile update -p PACKAGE-NAME -d https://hostupdate.vmware.com/software/VUM/PRODUCTION/main/vmw-depot-index.xml
  8. Reboot the host once the update is complete
  9. Verify that the installation was successful by running the following command again:
    esxcli software profile get
  10. If it was successful, run the following command to disable web traffic through the firewall:
    esxcli network firewall ruleset set -e false -r httpClient
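
Because the commands above are interleaved with list numbering, it can help to generate a clean, ordered copy for a specific package. The following Python sketch only assembles the command strings exactly as they appear in the steps above; it does not connect to a host or run anything. The function name and the split into "before reboot" and "after reboot" lists are choices made for this example, and PACKAGE-NAME remains a placeholder for the profile you select in step 7.

```python
DEPOT = "https://hostupdate.vmware.com/software/VUM/PRODUCTION/main/vmw-depot-index.xml"


def patch_commands(profile_name, version_filter="ESXi-7"):
    """Return the esxcli commands from the steps above, split around the reboot."""
    before_reboot = [
        # Step 4: record the currently installed profile and VIBs.
        "esxcli software profile get",
        # Step 5: open the firewall for outbound web traffic.
        "esxcli network firewall ruleset set -e true -r httpClient",
        # Step 6: list the available profiles, filtered by version.
        f"esxcli software sources profile list -d {DEPOT} | grep -i {version_filter}",
        # Step 7: apply the chosen profile.
        f"esxcli software profile update -p {profile_name} -d {DEPOT}",
    ]
    after_reboot = [
        # Step 9: confirm the new profile is in place.
        "esxcli software profile get",
        # Step 10: close the firewall rule again.
        "esxcli network firewall ruleset set -e false -r httpClient",
    ]
    return before_reboot, after_reboot


if __name__ == "__main__":
    before, after = patch_commands("PACKAGE-NAME")
    print("\n".join(before))
    print("# reboot the host, then:")
    print("\n".join(after))
```

Keeping the firewall change as the first and last network-related command makes it harder to forget to close the httpClient rule once patching is done.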

Interim Mitigation Options

Running currently supported, fully patched software should always be the goal. That said, there are situations in which the newer version of the software requires upgrades to the hardware on which it’s running. Depending on timing and budget, this may not be something the enterprise can undertake right away. As an interim mitigation, consider running the management functions of the ESXi hosts on a separate network from the VMs on those hosts – ideally, setting up a separate network just for ESXi management. Depending on the resources at your disposal, this could be handled primarily via code, using VLANs and tagging, or even by deploying a combination of physical switches and routers. The goal in this situation is to limit the network exposure of the host until it can be upgraded. It should not be treated as a permanent or even a long-term alternative to upgrading.
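
As a quick sanity check on the address plan for such a management network, something like the following Python sketch can flag overlaps before any switch or VLAN changes are made. The subnets and function names are hypothetical, and this only checks addressing; it says nothing about whether VLAN tagging, routing, or firewall rules actually enforce the separation.

```python
import ipaddress


def management_is_isolated(mgmt_cidr, vm_cidrs):
    """True when the management subnet overlaps none of the VM subnets."""
    mgmt = ipaddress.ip_network(mgmt_cidr)
    return not any(mgmt.overlaps(ipaddress.ip_network(cidr)) for cidr in vm_cidrs)


def misplaced_interfaces(mgmt_cidr, mgmt_addresses):
    """Return management addresses that sit outside the management subnet."""
    mgmt = ipaddress.ip_network(mgmt_cidr)
    return [addr for addr in mgmt_addresses if ipaddress.ip_address(addr) not in mgmt]


if __name__ == "__main__":
    # Hypothetical plan: one management subnet, two VM subnets.
    print(management_is_isolated("10.10.50.0/24", ["10.10.20.0/24", "10.10.30.0/24"]))
    print(misplaced_interfaces("10.10.50.0/24", ["10.10.50.11", "10.10.20.12"]))
```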

Consider not joining vCenter and ESXi hosts to the domain

Why it helps

Just as “keep your estate patched” is good general infosec advice with specific application to ESXi, “mind your passwords” is general advice with special ESXi and vCenter applicability. If an attacker manages by whatever means to acquire credentials to a sufficiently privileged domain account, they may well use those to target vCenter or ESXi hosts for purposes of lateral movement or (again) data encryption. Keeping vCenter and ESXi hosts separated from the organization’s domain reduces this attack surface, especially when combined with unique and complex passwords.

At this writing, Microsoft has just released an advisory regarding a vulnerability that granted full administrative access to the ESXi hypervisor by default, without proper validation, to accounts that had been added to the ESX Admins AD group. Vulnerabilities like these are an additional reason to consider not joining vCenter and ESXi hosts to the domain.

How to do it

In practice, good password hygiene means that an alternate set of credentials will be required for individuals who administer vCenter/ESXi. These credentials should be unique within the organization and should vary significantly from the individuals’ domain password (e.g., domain pass = <securepass>@123, esxi pass = <securepass>@123! is a bad choice). An effective way to manage this is the use of a corporate password manager such as 1Password or Keeper, which can tackle the overhead associated with keeping track of multiple unique passwords or passphrases. A corporate password manager is strongly preferred to an individual employee password manager, as that gives the corporation several benefits; these include the ability to audit security logs associated with the password manager used, enforcement of password complexity, and the ability to require multifactor authentication (MFA) to access the password manager itself. (More on ESXi and MFA in a moment.)

Best practice also dictates that each ESXi administrator-level user should have their own named account, as opposed to sharing “root” or “administrator” accounts. In terms of role permissions within vCenter, there are three roles available:

  • Operator: Local users with the operator user role can read vCenter Server configuration
  • Administrator: Local users with the administrator user role can configure vCenter Server
  • Super Administrator: Local users with the super administrator user role can configure vCenter Server, manage the local accounts, and use the Bash shell

Please note that the default root user in ESXi is a Super Administrator – another strong argument for not permitting shared root or admin accounts. In any case, actions should be taken from root accounts only in very limited circumstances, such as when adding a host to vCenter or when managing local account creation/deletion.

To see a list of all local user accounts in vCenter, access the vCenter appliance shell via an account with Super Administrator privileges and run the following command:

  • localaccounts.user.list

If you wish to add an admin account, this is done with the following command. In all cases, the password prompt will appear after command execution.

  • localaccounts.user.add --role admin --username test --password

If you wish to add an admin account and specify the full name and email of the user:

  • localaccounts.user.add --role admin --username test --password --fullname TestName --email test@mail.com

If you wish to update the password of a user:

  • localaccounts.user.password.update --username test --password

In addition

Complicating matters slightly is the lack of native support of MFA for vCenter access by local accounts. It’s possible to handle that indirectly, should your enterprise choose to do so. In this case, one easy approach would be to use robust (long, unique, complex) passwords as recommended above; while it’s still a single authentication factor, long complex passwords are extremely resistant to brute forcing. Another option would be to set up an isolated network for the ESXi management portals, similar to those described in the “Interim Mitigation Options” section of the previous recommendation. In this case, you would use your MFA-enabled remote access solution of choice to apply access controls to the gateway. Only explicitly defined users would be able to access the jump host (cautious administrators will also wish to define known hosts for each user), and only the jump host, along with the vCenter local users, could access the management portals.

Enable normal lockdown mode

Why it helps

Implementing normal lockdown mode restricts direct access to ESXi hosts, mandating management via vCenter Server to uphold defined roles and access control. This mitigates risks associated with unauthorized or insufficiently audited activities. When a host is in lockdown mode, users on the Exception Users list can access the host from the ESXi Shell and through SSH, if they have the Administrator role on the host. (Because this control involves vCenter, it is not available for standalone ESXi hosts.)

Some administrators may be concerned that normal lockdown mode may interfere with certain operations like backup and troubleshooting. If this is a consideration, temporary deactivation is an option, as long as reactivation upon completion of a given task is standard operating procedure.

How to do it

For an ESXi host, via vSphere Web Client:

  1. Select the host
  2. Select Configure, then expand System and select Security Profile > Lockdown Mode > Edit
  3. Click the Normal radio button

Alternatively, connect to the ESXi host and, from a PowerCLI command prompt, run the following commands, replacing <hostname> with the name of the host. (These are shown in the list below, but all four can actually be entered at the same time. If you choose to cut and paste from this article, be sure to avoid the bullets.)

  • $level = "lockdownNormal"
  • $vmhost = Get-VMHost -Name <hostname> | Get-View
  • $lockdown = Get-View $vmhost.ConfigManager.HostAccessManager
  • $lockdown.ChangeLockdownMode($level)

Deactivate SSH when not in use

Why it helps

Now and then it’s necessary to use SSH when interacting with vCenter Servers and ESXi hosts — for instance, while patching, as mentioned above. However, turning off SSH when not in use reduces the attack surface by removing a target for brute force attacks, or use of compromised credentials.

How to do it

For vCenter, follow the instructions on the linked page, making sure the Enable SSH login radio button is unselected.

For an ESXi host, via vSphere Web Client:

  1. Select the host
  2. Select Configure > System > Services
  3. Select SSH > Edit Startup Policy
  4. Set the Startup Policy to Start and Stop Manually
  5. Click OK
  6. While SSH is still selected, click Stop

Alternatively, use the following PowerCLI command (beware the bullet):

  • Get-VMHost | Get-VMHostService | Where { $_.key -eq "TSM-SSH" } | Set-VMHostService -Policy Off

For a standalone ESXi host via the web client:

  1. Select Manage > Services > TSM-SSH > Actions
  2. Click “Stop”
  3. Select Actions again, then Policy > Start and stop manually

Enforce password complexity for vCenter and ESXi hosts

Why it helps

Complex passwords help to mitigate brute force attacks. Attackers will often utilize password lists that are publicly available; they also may create their own lists based on information about your organization that they have gathered in advance of (or during) an attack. Ensuring that vCenter and the ESXi hosts themselves don’t accept a non-complex password is helpful for password policy enforcement. As mentioned above, a password manager can help greatly with this mitigation, even providing extra security and auditability.

How to do it

The enforcement of password complexity is managed through the Security.PasswordQualityControl parameter. With it, you can control allowed password length, character set requirements, and the number of retries a user gets when choosing a new password.

The CIS benchmark recommended setting is:

retry=3 min=disabled,15,15,15,15 max=64 similar=deny passphrase=3

ESXi uses the pam_passwdqc module for password control, which is documented elsewhere. Referencing that manual, though, we can quickly break down what the individual components of this CIS recommendation accomplish:

  • With “retry=3,” the user will be prompted up to three times if a new password is not sufficiently strong, or if the password is not entered correctly twice
  • For the “min” component:
  •      The “disabled” setting disallows passwords consisting of characters from one character class only (the four character classes are digits, lowercase letters, uppercase letters, and other characters)
  •      The first 15 is the minimum length for a password of two character classes
  •      The second 15 is the minimum length for a passphrase
  •      The third and fourth 15s are minimum lengths for passwords consisting of characters from three and four character classes, respectively
  • The “max=64” component sets the maximum password length
  • The “similar=deny” component will deny a password that is similar to the previous one. (Passwords are considered to be similar when there is a sufficiently long common substring between the two, and the new password with the substring removed would be too weak; e.g., password123 and password124)
  • The “passphrase” switch sets the minimum number of words (here, three) required to create a passphrase; this is in addition to the character length requirement set above
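
To make the breakdown above concrete, here is a minimal Python sketch that parses the setting string and applies only the length-by-character-class rule to a candidate password. It is an illustration of how the "min" slots map to character classes, not a reimplementation of pam_passwdqc: the function names are made up for this example, and it deliberately ignores the "similar" check, passphrase word counting, and passwdqc's finer points (for instance, the real module does not count an uppercase first character or a trailing digit toward the class total).

```python
import string

CIS_VALUE = "retry=3 min=disabled,15,15,15,15 max=64 similar=deny passphrase=3"


def parse_policy(value):
    """Parse a Security.PasswordQualityControl string into a dict."""
    policy = {}
    for token in value.split():
        key, _, raw = token.partition("=")
        if key == "min":
            # Five slots: one class, two classes, passphrase, three classes, four classes.
            policy["min"] = [None if v == "disabled" else int(v) for v in raw.split(",")]
        elif raw.isdigit():
            policy[key] = int(raw)
        else:
            policy[key] = raw
    return policy


def count_classes(password):
    """Count how many of the four character classes appear in the password."""
    groups = (string.digits, string.ascii_lowercase, string.ascii_uppercase)
    found = sum(any(ch in group for ch in password) for group in groups)
    if any(ch not in string.digits + string.ascii_letters for ch in password):
        found += 1  # "other characters" class
    return found


def length_ok(password, policy):
    """Apply the min/max length rule for the password's character-class count."""
    classes = count_classes(password)
    if classes == 0:
        return False  # empty password
    # Map class count to its slot in "min" (slot 2 is reserved for passphrases).
    slot = {1: 0, 2: 1, 3: 3, 4: 4}[classes]
    required = policy["min"][slot]
    if required is None:
        return False  # this class count is disabled outright
    return required <= len(password) <= policy["max"]
```

With the CIS value, `length_ok("lowercaseonlypassword", parse_policy(CIS_VALUE))` is rejected because single-class passwords are disabled, while a 15-character-or-longer password drawing on two or more classes passes this particular check.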

For an ESXi host, via vSphere Web Client:

  1. Select the host > Configure > System > Advanced System Settings
  2. Select the Security.PasswordQualityControl value and set it, as shown above, to “retry=3 min=disabled,15,15,15,15 max=64 similar=deny passphrase=3” (or, if your organization’s standards vary, adjust the values according to your policy)

For a standalone ESXi host via the web client:

  1. Select Manage > System > Advanced settings
  2. Scroll or search for Security.PasswordQualityControl
  3. Select Edit option
  4. Set the value to “retry=3 min=disabled,15,15,15,15 max=64 similar=deny passphrase=3” (or, if your organization’s standards vary, adjust the values according to your policy)
  5. Click Save

For vCenter implementation, the CIS benchmark does not specifically address vCenter password policies. However, based on our understanding of the components of the CIS benchmark recommendation, some portions can be partially applied to vCenter password configurations.

  1. In vSphere Client, select Administration in the hamburger menu
  2. Under Single Sign On, select Configuration
  3. Select Local Accounts > Password Policy > Edit
  4. Set the Maximum lifetime number in accordance with your organization’s policy concerning password lifetime
  5. Set Restrict reuse in accordance with your organization’s password-reuse policy
  6. Set Maximum length to 64, as in the settings above
  7. Set Minimum length to 15, as in the settings above
  8. For Character requirements, set the “At least” value in accordance with your organization’s policy; the minimum value is 1
  9. Set “Identical adjacent characters” in accordance with your organization’s password-adjacent characters policy

Require account lockout after failed login attempts

Why it helps

The enforcement of account lockouts also interferes with brute force attacks. Technically, the attacker can still try a brute force attack, but they will have to be extremely lucky to get it right with only five chances before being locked out. This control is applicable for vCenter, SSH, and vSphere Web Services SDK access, though not for the Direct Console User Interface (DCUI) and the ESXi Shell.
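
To put a rough number on “extremely lucky,” the following Python sketch computes the chance that an attacker hits the right password within the allowed attempts. It rests on simplifying assumptions that are not from any VMware documentation: the password is somewhere in the attacker’s wordlist, every entry is equally likely, no guess is repeated, and the attacker gets no further tries after the lockout. Real attackers order their guesses by likelihood, so treat this as an illustration of scale rather than a risk estimate.

```python
from fractions import Fraction


def chance_of_hit(wordlist_size, attempts=5):
    """Probability that one of `attempts` distinct guesses is the password."""
    attempts = min(attempts, wordlist_size)
    return Fraction(attempts, wordlist_size)


if __name__ == "__main__":
    # Hypothetical wordlist of one million candidates, five tries before lockout.
    odds = chance_of_hit(1_000_000)
    print(odds, float(odds))
```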

How to do it

The CIS recommended setting is to configure hosts to have the Security.AccountLockFailures parameter set to 5. This control can also be implemented on vCenter.

For vCenter itself:

  1. Log in as root
  2. Select Administration > Single Sign-on > Configuration > Local Accounts > Lockout Policy
  3. Set the maximum number of failed attempts to 5

For an ESXi host, via vSphere Web Client:

  1. Select the host
  2. Select Configure > System > Advanced System Settings
  3. Set the Security.AccountLockFailures value to 5

From a PowerCLI command prompt while connected to the ESXi host, run the following command (if copying and pasting, beware of the bullet):

  • Get-VMHost | Get-AdvancedSetting -Name Security.AccountLockFailures | Set-AdvancedSetting -Value 5

For a standalone ESXi host via the web client:

  1. Select Manage > System > Advanced settings
  2. Scroll or search for Security.AccountLockFailures
  3. Select Edit option
  4. Set the value to 5
  5. Click “Save”

Enable UEFI Secure Boot

Why it helps

UEFI Secure Boot’s primary purpose is to ensure that only signed and trusted boot loaders and operating system kernels are allowed to execute during system startup. By verifying the digital signatures of bootable applications and drivers, Secure Boot prevents potentially harmful code from compromising the boot process, thereby reducing the attack surface of the ESXi hosts. This configuration is also a prerequisite for the recommendation in the next section, “Configure host to only run binaries delivered via signed VIB.”

How to do it

The target ESXi host must have a Trusted Platform Module (TPM) for this configuration to be enabled; older hardware may not have TPM. Assuming your hardware has TPM, the steps are as follows:

  1. Access the target ESXi host via the ESXi shell
  2. Verify whether secure boot is currently enabled with the following command:
    esxcli system settings encryption get
     • If Require Secure Boot’s value is “true,” no change is necessary
     • If Require Secure Boot’s value is “false,” enable it by following the remaining steps
     • If Require Secure Boot’s value is “none,” first enable a TPM in the host’s firmware and then run the following command:
    esxcli system settings encryption set --mode=TPM
  3. Enable the secure boot environment:
     • Shut the host down gracefully (right-click the ESXi host in the vSphere Client and select Power > Shut Down)
     • Enable secure boot in the firmware of the host; this procedure will vary depending on the hardware on which you run your ESXi host(s), so consult your specific vendor’s hardware documentation
  4. Restart the host
  5. Run the following ESXCLI command:
    esxcli system settings encryption set --require-secure-boot=T
  6. Verify that the change took effect:
    esxcli system settings encryption get
     • If Require Secure Boot’s value is “true,” then you are all set
  7. To save the setting, run the following command:
    /bin/backup.sh 0
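
The three-way decision in step 2 can be sketched as a small parser over text copied from the “encryption get” command. This Python example is a hedged illustration: the sample text and its “Key: value” layout (including the “Mode” line) are assumptions about what the command prints, the function names are invented here, and the returned strings simply paraphrase the steps above.

```python
def parse_encryption_settings(output):
    """Parse 'Key: value' lines into a dict with lower-cased values."""
    settings = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            settings[key.strip()] = value.strip().lower()
    return settings


def next_action(output):
    """Map the reported state to the matching branch of step 2."""
    settings = parse_encryption_settings(output)
    state = settings.get("Require Secure Boot")
    # Treat a missing TPM mode the same as a "none" value: the TPM comes first.
    if state == "none" or settings.get("Mode") == "none":
        return "enable a TPM in firmware and set --mode=TPM first"
    if state == "true":
        return "no change necessary"
    if state == "false":
        return "enable Secure Boot in firmware, then set --require-secure-boot=T"
    return "unrecognized output; check manually"


# Assumed sample, not captured from a real host.
SAMPLE = """
   Mode: TPM
   Require Executables Only From Installed VIBs: false
   Require Secure Boot: false
"""

if __name__ == "__main__":
    print(next_action(SAMPLE))
```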

Configure host to only run binaries delivered via signed VIB

Why it helps

To enhance the integrity of the system, an ESXi host can be configured to only execute binaries originating from a valid, signed vSphere Installable Bundle (VIB). This measure thwarts attackers’ attempts to use prebuilt toolkits on the host. This configuration requires UEFI Secure Boot to be enabled.

How to do it 

The setting governing this behavior is VMkernel.Boot.execInstalledOnly, which should be set to True.

For an ESXi host, via vSphere Web Client:

  1. Select the host
  2. Select Configure > System > Advanced System Settings
  3. Select the “VMkernel.Boot.execInstalledOnly” value and set it to True

For a standalone ESXi host via the web client:

  1. Select Manage > System > Advanced settings
  2. Scroll or search for VMkernel.Boot.execInstalledOnly
  3. Select Edit option
  4. Set the value to True
  5. Click Save

Deactivate Managed Object Browser (MOB), CIM, SLP, and SNMP services if not used

Why it helps

Shutting down all externally accessible services that your organization does not make use of is critical for reducing attack surface; these four in particular should be managed.

  • The Managed Object Browser (MOB) is a web-based server application that lets you examine and change system objects and configurations
  • The Common Information Model (CIM) system provides an interface for hardware-level management from remote applications via a set of standard application programming interfaces (APIs)
  • The Service Location Protocol (SLP) is used for the discovery and selection of network services in local area networks; admins use it to simplify configuration by allowing computers to find necessary services automatically. The service that handles this is called the SLPD (Service Location Protocol Daemon), as shown in the steps below
  • The venerable Simple Network Management Protocol (SNMP) facilitates the management of networked devices

How to do it

For an ESXi host, via the vSphere web client:

  1. Select the host
  2. Select Configure > System > Advanced System Settings
  3. Select the Config.HostAgent.plugins.solo.enableMob value and set it to False
  4. Select Configure > System > Services > CIM Server > Edit Startup Policy
  5. Set the Startup Policy to Start and Stop Manually
  6. Stop the CIM Server service if it is currently running
  7. Select SLPD > Edit Startup Policy
  8. Set the Startup Policy to Start and Stop Manually
  9. Stop the SLPD service if it is currently running
  10. Select SNMP Server > Edit Startup Policy
  11. Set the Startup Policy to Start and Stop Manually
  12. Stop the SNMP Server service if it is currently running

For a standalone ESXi host via the web client:

  1. Select Manage > System > Advanced settings
  2. Scroll or search for Config.HostAgent.plugins.solo.enableMob
  3. Select Edit and set the value to False
  4. Click Save
  5. Select Services > SLPD > Actions
  6. Click Stop
  7. Select Actions again and click on Policy
  8. Select Start and stop manually from the menu
  9. Select sfcbd-watchdog (this is the CIM server) and select Actions
  10. Click “Stop”
  11. Select Actions > Policy again
  12. Select Start and stop manually from the menu
  13. Select snmpd and click Actions
  14. Click “Stop”
  15. Select Actions > Policy once more
  16. Select Start and stop manually from the drop-down menu
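
When checking many hosts, it can be useful to record each service’s state and compare it against the targets above. The Python sketch below works on a hand-built inventory rather than querying a host: the dictionary layout, the policy strings ("manual", "on"), and the function name are all hypothetical, while the service keys (slpd, sfcbd-watchdog, snmpd) and the MOB setting name are the ones used in this article.

```python
# Service keys as named in the steps above, with labels for readable findings.
SERVICES_TO_REVIEW = {
    "slpd": "SLPD",
    "sfcbd-watchdog": "CIM Server",
    "snmpd": "SNMP Server",
}


def audit_services(inventory, in_use=(), mob_enabled=False):
    """Return findings for services that are not in use but not shut down.

    inventory maps a service key to {"running": bool, "policy": str}.
    in_use lists service keys the organization genuinely relies on.
    """
    findings = []
    if mob_enabled:
        findings.append("Config.HostAgent.plugins.solo.enableMob is True")
    for key, label in SERVICES_TO_REVIEW.items():
        if key in in_use or key not in inventory:
            continue
        state = inventory[key]
        if state["running"]:
            findings.append(f"{label} ({key}) is still running")
        if state["policy"] != "manual":
            findings.append(f"{label} ({key}) startup policy is {state['policy']}, not manual")
    return findings


if __name__ == "__main__":
    # Hypothetical state collected by hand from one host.
    example = {
        "slpd": {"running": True, "policy": "on"},
        "sfcbd-watchdog": {"running": False, "policy": "manual"},
        "snmpd": {"running": False, "policy": "on"},
    }
    for finding in audit_services(example):
        print(finding)
```

The `in_use` argument reflects the “if not used” qualifier in this recommendation: a service the organization depends on is skipped rather than reported.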

Set up persistent logging

Why it helps

Configuring persistent logging is the only recommendation on this list that doesn’t reduce attack surface. However, it will come in handy in the event of a security incident affecting ESXi hosts. By default, ESXi logs are stored in an in-memory filesystem that retains only a single day’s worth of logs. Since those logs are stored in memory, they will be lost on reboot. While a persistent local log is a significant improvement over the default, sending the logs to a remote syslog collector is even better. In the unfortunate event that your ESXi hosts are encrypted along with any attached drives, with a syslog collector in place there is a higher chance that you will still have access to those logs, or to some portion of them. The other benefit of shipping logs out of the host is that if your organization uses a SIEM, the ESXi logs could be ingested there as well.

How to do it 

First, create a persistent location. After that’s done:

For an ESXi host, via vSphere Web Client:

  1. Select the host
  2. Select Configure > System > Advanced System Settings
  3. Select the Syslog.global.logDir value and set it to the location you designated for log storage

For a standalone ESXi host via the web client:

  1. Select Manage > System > Advanced settings
  2. Scroll or search for Syslog.global.logDir
  3. Click Edit option
  4. Set the value to the location you designated for log storage
  5. Click Save

Next, set a target syslog collector for ESXi logs, and enable outbound syslog traffic on the ESXi host firewall.

For an ESXi host, via vSphere Web Client:

  1. Select the host
  2. Click the Configure tab
  3. Select Logging > Actions > Edit Settings
  4. Under Host Syslog Configuration, select Send log data to a remote syslog server
  5. Set the value to the address associated with your syslog server
  6. Click OK
  7. While still on the Configure tab for the host, expand System and select Firewall
  8. Browse to the syslog outbound rule and enable it

For a standalone ESXi host via the web client:

  1. Select Manage > System > Advanced settings
  2. Scroll or search for Syslog.global.logHost
  3. Click Edit option
  4. Set the value to the address associated with your syslog server
  5. Click Save
  6. In the sidebar navigator on the left, select Networking > Firewall rules
  7. Select the syslog rule and choose Actions
  8. Click Enable
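
The “address associated with your syslog server” is typically entered as a protocol, host, and port. The Python sketch below builds and validates such a value; it assumes the protocol://host:port form with udp, tcp, or ssl as the protocol and a comma to separate multiple collectors, so confirm both against VMware’s documentation for your ESXi version. The function names and the example addresses are hypothetical.

```python
ALLOWED_PROTOCOLS = ("udp", "tcp", "ssl")


def build_loghost(host, port, protocol="tcp"):
    """Build one Syslog.global.logHost entry such as tcp://192.0.2.10:514."""
    if protocol not in ALLOWED_PROTOCOLS:
        raise ValueError(f"unsupported protocol: {protocol}")
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    if not host or any(ch.isspace() for ch in host):
        raise ValueError("host must be a non-empty name or address without spaces")
    return f"{protocol}://{host}:{port}"


def join_loghosts(entries):
    """Combine several entries into one comma-separated setting value."""
    return ",".join(entries)


if __name__ == "__main__":
    primary = build_loghost("192.0.2.10", 514)
    secondary = build_loghost("syslog.example.internal", 1514, "ssl")
    print(join_loghosts([primary, secondary]))
```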

Conclusion

While implementing the recommendations covered in this article is no guarantee that your ESXi hosts are safe, doing so can make it considerably more difficult for attackers to cause quick and severe harm. Moreover, layering controls increases friction for would-be attackers, costing them time and effort – precisely what they were likely hoping to avoid by going after ESXi – and giving defenders a larger window and more options for detection and response.

 
