Over the last few years, I have worked with the storage architecture surrounding Dell EqualLogic. EqualLogic’s market seems to be the mid-range storage tier, providing a cost-effective way for small to mid-size organizations to start their own SAN (storage area network) deployment. When EqualLogic was first released to the market it was an independent company, but it was eventually purchased by Dell, and since then we have seen changes, some for the better and some for the worse.
Example Infrastructure Design:
- HP DL585 G7 Cluster (3 Hosts in the Cluster)
- Management and vMotion are on a dedicated VLAN
- vSphere 5.1 Update 1 (Build 1117900)
- Dell MEM 1.1.2 installed
- EqualLogic PS5000 units in a pool
- iSCSI traffic is on a separate VLAN from vSphere management
- Spanning tree disabled, flow control enabled, jumbo frames enabled
- Network infrastructure: a pair of Cisco Nexus 2248s connected to a pair of Cisco Nexus 5000s
One specific issue we have been experiencing is an ongoing multipathing-related error in our vSphere event logs. An example is below:
We were seeing these events several times every 5 minutes, after which everything would return to its normal state. An example is below:
Notice the time difference of approximately 1 second.
The corresponding entries in the EqualLogic event logs look similar to the following:
iqn.2001-05.com.equallogic:0-8a0906-0b273f402-e2f00171e7b516be-vol-development01-1g' from initiator '172.17.1.54:55272, iqn.1998-01.com.vmware:esx4' was closed. | Logout request was received from the initiator.
iqn.2001-05.com.equallogic:0-8a0906-0b273f402-e2f00171e7b516be-vol-development01-1g' from initiator '172.17.1.54:58958, iqn.1998-01.com.vmware:esx4' successful, using Jumbo Frame length.
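To see how often a given volume is affected, entries like the two above can be paired up programmatically. The following is only a minimal sketch: the regular expression is derived from the two sample lines shown here, not from any documented EqualLogic log format, and the function names `parse_event` and `count_reconnects` are made up for illustration.

```python
import re
from collections import Counter

# Pattern inferred from the two sample lines above (an assumption, not a
# documented format): target IQN, initiator IP:port, initiator IQN, outcome.
EVENT_RE = re.compile(
    r"(?P<target>iqn\.[^']+)' from initiator "
    r"'(?P<ip>[\d.]+):(?P<port>\d+), (?P<initiator>iqn\.[^']+)' "
    r"(?P<outcome>was closed|successful)"
)


def parse_event(line):
    """Return a dict describing one iSCSI event, or None if the line does not match."""
    match = EVENT_RE.search(line)
    if match is None:
        return None
    event = match.groupdict()
    event["port"] = int(event["port"])
    # "was closed" marks a logout; "successful" marks a login.
    event["kind"] = "logout" if event.pop("outcome") == "was closed" else "login"
    return event


def count_reconnects(lines):
    """Count logout-then-login pairs per (target, initiator IP, initiator IQN)."""
    pending = set()          # sessions that logged out and have not returned yet
    reconnects = Counter()
    for line in lines:
        event = parse_event(line)
        if event is None:
            continue
        # The source port changes on every reconnect, so it is left out of the key.
        key = (event["target"], event["ip"], event["initiator"])
        if event["kind"] == "logout":
            pending.add(key)
        elif key in pending:
            pending.discard(key)
            reconnects[key] += 1
    return reconnects
```

Feeding the exported event log line by line into `count_reconnects` gives a per-volume, per-host tally, which makes it easier to tell whether the logouts are spread evenly or concentrated on one host.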
After working with Dell, a level-two support engineer came up with the following solution, which has solved the issue for us.
Disable LRO in ESX v4/v5 Note:
After upgrading the ESXi host to 4.1 or 5.x and upgrading VMware Tools in the VMs to 4.1 or 5.x, you may experience slow TCP performance on VMs running on the 4.1 or 5.x ESXi host. You can address this situation by disabling Large Receive Offload (LRO) on the ESXi host.
To disable LRO, follow this procedure:
- Log into the ESXi host or its vCenter with vSphere Client.
- Select the host > Configuration > Software:Advanced Settings.
- Select Net and scroll down slightly more than half way.
- Set the following parameters from 1 to 0:
Reboot the ESXi host to activate these changes.
Within VMware, the following command will query the current LRO value.
- esxcfg-advcfg -g /Net/TcpipDefLROEnabled
To set the LRO value to zero (disabled):
- esxcfg-advcfg -s 0 /Net/TcpipDefLROEnabled
A server reboot is required.
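If you have several hosts to change, the query and set commands above can be wrapped in a small script. This is a rough sketch under a few assumptions: it targets Python 3.7+, it expects `esxcfg-advcfg` to be on the PATH (older ESXi releases ship an older Python, so you may need to run the commands over SSH from a workstation instead), and it assumes the query prints a line ending in `is <number>`, such as `Value of TcpipDefLROEnabled is 1`. The function names are hypothetical.

```python
import re
import subprocess

OPTION = "/Net/TcpipDefLROEnabled"


def parse_advcfg_value(output):
    """Extract the integer from output assumed to end in 'is <number>'."""
    match = re.search(r"is\s+(-?\d+)\s*$", output.strip())
    if match is None:
        raise ValueError(f"unrecognized esxcfg-advcfg output: {output!r}")
    return int(match.group(1))


def run(args):
    """Run a command and return its standard output; raises on a non-zero exit."""
    return subprocess.run(args, capture_output=True, text=True, check=True).stdout


def disable_lro(runner=run):
    """Set the LRO option to 0 if needed. Returns True if a change was made."""
    current = parse_advcfg_value(runner(["esxcfg-advcfg", "-g", OPTION]))
    if current == 0:
        return False  # already disabled, nothing to do
    runner(["esxcfg-advcfg", "-s", "0", OPTION])
    return True  # the host still needs a reboot for this to take effect
```

The `runner` parameter exists so the logic can be exercised with a stand-in function before it is pointed at a real host.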
Your guest VMs should now have normal TCP networking performance.
As is evident on both sides, something is triggering an event that causes the software iSCSI initiator to log out of the volume and then immediately reconnect. There seems to be some disconnect between the Dell MEM module and what vSphere is trying to do.