
Transfer a virtual machine between two ESXi Hosts

Oct 22 2014

The Problem

Recently I had to migrate a bunch of files from one VMware hypervisor to another. It turned out that there are multiple ways to do this. The setup is as follows: two computers, both running the VMware ESXi vSphere Hypervisor, connected to the same LAN via a 1 Gigabit connection.

Solutions

As stated before, there are multiple ways to copy data between those two machines. In this article I am focusing on techniques that do not require third-party software. The following sections describe my attempts to copy an ISO image from one host (192.168.30.3) to another (192.168.30.79). All of these attempts solve the problem, but their performance differs. All tests were performed on two similar machines with no additional load on them.

To be able to reproduce these steps you have to turn on the ESXi Shell on the hypervisor to get access via SSH. Then you can access the console via PuTTY on Windows or ssh under Linux.

login as: root
Using keyboard-interactive authentication.
Password:
The time and date of this login have been sent to the system logs.

VMware offers supported, powerful system administration tools.  Please
see www.vmware.com/go/sysadmintools for details.

The ESXi Shell can be disabled by an administrative user. See the
vSphere Security documentation for more information.
~ # 
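
As an aside: if you already have local ESXi Shell access (e.g. via the DCUI), the SSH service can also be switched on from the command line. A hedged sketch using vim-cmd; the exact service names may differ between ESXi versions:

# enable the SSH service persistently and start it right away
vim-cmd hostsvc/enable_ssh
vim-cmd hostsvc/start_ssh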

Copy via SSH / SCP

The most obvious solution is to copy the data via SCP from one machine to another. Even if you have turned on the ESXi Shell, you will discover that a simple copy operation is not possible because the firewall blocks outgoing connections on port 22. So the first step is to enable the SSH client in the firewall:

[Screenshot: Firewall_Settings.jpg - the SSH Client rule enabled in the ESXi firewall settings]
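
The same rule can also be toggled from the shell itself. A hedged sketch using esxcli (ESXi 5.x syntax; the ruleset name is an assumption and may differ on other versions):

# enable the outgoing SSH client ruleset in the ESXi firewall
esxcli network firewall ruleset set --ruleset-id sshClient --enabled true
# verify that the rule is active
esxcli network firewall ruleset list | grep sshClient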

Once you have done this you can connect to the other machine via SSH/SCP and transfer files. So let's try to copy a 468 MB file from the source to the target:

/vmfs/volumes/5444e609-c0e56465-2154-901b0e0f11ad/iso-images # scp root@192.168.30.3:/vmfs/volumes/53ce71c4-5f426b5e-94a4-901b0e307310/iso-images/acp_systemcd.iso .

Password:
acp_systemcd.iso                              100%  468MB   6.4MB/s   01:13

We see that it takes 1:13 to copy 468 MB of data. Since this is not very fast, I tried to copy it in the opposite direction: instead of pulling it from the source, I wanted to push it onto the target:

/vmfs/volumes/53ce71c4-5f426b5e-94a4-901b0e307310/iso-images # scp ./acp_systemcd.iso root@192.168.30.79:/vmfs/volumes/5444e609-c0e56465-2154-901b0e0f11ad/iso-images/

The authenticity of host '192.168.30.79 (192.168.30.79)' can't be established.
RSA key fingerprint is 50:de:1e:46:b6:5f:30:7e:31:51:66:00:d7:59:b5:f8.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.168.30.79' (RSA) to the list of known hosts.
Password:
acp_systemcd.iso                                                                          100%  468MB   5.8MB/s   01:21

This also wasn't as fast as I would have expected; after all, the machines are connected with a 1 GBit link! Then I had the following idea: let's send the data compressed via scp, which should speed up the effective data transfer a little bit. This time I tried to send a 3 GB file over the wire. The compression is done on the fly by adding the -C flag to the scp command.

/vmfs/volumes/53ce71c4-5f426b5e-94a4-901b0e307310/iso-images # scp -C ./Server2008R2_Eval.iso  root@192.168.30.79:/vmfs/volumes/5444e609-c0e56465-2154-901b0e0f11ad/iso-images/
Password:
Server2008R2_Eval.iso                                                                     100% 3053MB   3.6MB/s   14:02

WOW! The transfer rate dropped to 3.6 MB/s. OK, my next guess was that the system was somehow busy, but since there were no VMs running this could not be the case. And if the machine were busy, then everything should be slow ... so let's copy the same file again, but without compression.

/vmfs/volumes/53ce71c4-5f426b5e-94a4-901b0e307310/iso-images # scp ./Server2008R2_Eval.iso  root@192.168.30.79:/vmfs/volumes/5444e609-c0e56465-2154-901b0e0f11ad/iso-images/
Password:
Server2008R2_Eval.iso                                                                     100% 3053MB   7.3MB/s   06:58

... and again I was disappointed (or confused). After a cup of coffee and several more attempts, I came to the following conclusion:

Even if compression usually speeds up data transmissions, in the VMware ESXi Hypervisor this is not the case. SCP performs better without compression!

It takes about twice as long to transmit the same file compressed as uncompressed.
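
Besides compression there is one more scp knob worth mentioning: with a stock OpenSSH client one would usually also try a computationally cheaper cipher before blaming the network. A hedged sketch, assuming the scp build on ESXi accepts the -c flag and the server offers this cipher:

# try a cheaper cipher to rule out CPU-bound encryption
scp -c aes128-ctr ./acp_systemcd.iso root@192.168.30.79:/vmfs/volumes/5444e609-c0e56465-2154-901b0e0f11ad/iso-images/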

Copy via netcat

Some internet forums indicated that VMware intentionally slows down scp operations in the admin shell. Well, there are other ways to get your data shipped. One of the most famous tools among hackers and system administrators is netcat, often referred to as the Swiss army knife of networking. Fortunately, netcat is installed out of the box on ESXi hosts. So let's try to move the data by running netcat as a server on the destination machine. The tricky part was to find a port that could be used without manipulating the firewall settings. I have chosen port 8000, which is (according to a VMware Knowledge Base article) reserved for the vMotion service. Since my hosts do not use vMotion, this seemed a reasonable choice.

On the receiving host I fired up netcat as a server that listens on port 8000 and writes everything it receives into the file acp_test_netcat.iso:

# nc -v -l 8000 > /vmfs/volumes/system_datastore/iso-images/acp_test_netcat.iso

On the sending host I fired up netcat as a client that connects to the server and streams the contents of acp_systemcd.iso:

# nc -v 192.168.30.79 8000 < ./acp_systemcd.iso

It took 213 seconds to copy 467 MB of data, which means that netcat reaches 2.19 MB/s. Since netcat uses neither a heavyweight transfer protocol nor compression, my assumption (and the assertions of the internet forums) was confirmed: some things are intentionally slowed down within the VMware ESXi Shell.
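
By the way, the same trick works for whole directories if you pipe netcat through tar. A hedged sketch; whether the BusyBox tar on ESXi supports every flag shown here is an assumption:

# on the receiving host: unpack everything arriving on port 8000
nc -l 8000 | tar -xf - -C /vmfs/volumes/system_datastore/

# on the sending host: stream the iso-images directory to the receiver
tar -cf - iso-images | nc 192.168.30.79 8000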

Copy via Python & wget

OK, now it was time to figure out which programs are not crippled. Since VMware offers its own features to move data between two hosts, I assumed that they slowed down everything in the standard shell that could be used for this purpose. To prove my theory I tried something less usual: I searched for a way to write a server / client myself. Then I stumbled upon Python. Python is a general-purpose scripting language with a large standard library that usually enables you to run a webserver almost out of the box.

The other reasons to choose Python were:

  • It is a general-purpose language - so it is not that easy to cripple a certain part of it (say, limit network speed), because before you run a script the Python interpreter has no idea what you are going to do.
  • It is used by VMware itself - so why would they cripple their own tools?

Python 2.6 is installed out of the box. So my idea was to use SimpleHTTPServer as a simple webserver on the sender's side and the preinstalled wget as HTTP client on the receiving host. For all of you who are not aware of what SimpleHTTPServer does: this Python class serves files from the current directory and below, directly mapping the directory structure to HTTP requests.

Unfortunately this class was not installed on my machines, so I had to download it before I could start the server:

/vmfs/volumes/53ce71c4-5f426b5e-94a4-901b0e307310/iso-images # wget http://svn.python.org/projects/python/branches/release26-maint/Lib/SimpleHTTPServer.py
/vmfs/volumes/53ce71c4-5f426b5e-94a4-901b0e307310/iso-images # python -m SimpleHTTPServer
Serving HTTP on 0.0.0.0 port 8000 ...
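
By default the module binds to port 8000 on all interfaces. A different port can be passed as an argument; the only caveat is that the chosen port must of course also be reachable through the firewall:

# optional: serve on another port, e.g. 8080
python -m SimpleHTTPServer 8080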

A first attempt to access the Python webserver with a standard browser showed me this beautiful picture:

[Screenshot: Filelisting.jpg - the directory listing served by SimpleHTTPServer, viewed in a browser]

SUCCESS - now I just had to download the files on the client:

/vmfs/volumes/5444e609-c0e56465-2154-901b0e0f11ad/iso-images # wget http://192.168.30.3:8000/acp_systemcd.iso
Connecting to 192.168.30.3:8000 (192.168.30.3:8000)
acp_systemcd.iso     100% |*******************************|   467M  0:00:00 ETA

BIG SUCCESS: The entirely uncompressed test downloads led to the following results:

Size      Time      Throughput
467 MB    1m:02s    6.22 MB/s
3 GB      6m:45s    7.58 MB/s
~20 GB    ~36m      9.25 MB/s

It seems that Python is not restricted in terms of speed: it sometimes reaches speeds above 10 MB/s and is therefore sometimes able to outperform scp.

However: this result was not 100% reproducible, but it seems that this is the fastest way to get data out of the machine. And as an additional bonus, you can simply download the desired files with a plain browser on any machine that can connect to the host.
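
For large images it is worth verifying the transfer afterwards. A hedged sketch using the BusyBox md5sum applet, assuming it is present in your ESXi build:

# run on both hosts and compare the output
md5sum acp_systemcd.iso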

Copy via NFS-Share

As a last attempt I tried to move the data from one host to the other by copying it to an intermediate NFS share (which is, of course, also connected to the hosts via 1 GBit). In this scenario I connected an additional datastore to the hosts. From previous experiments with VMware I remembered that a VM is able to run from a connected NFS storage. Of course it is not as fast as running from local disks, but since the difference was not too big I hoped that NFS would be a way to get my data transferred rather quickly. I created the new NFS share on a Buffalo TeraStation and connected it via the management GUI to both VMware systems. Once the additional datastore is recognized by the hypervisor, it is automatically mounted into the /vmfs/volumes directory. In my case the volume was called Test.
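
Connecting the share does not strictly require the management GUI. A hedged sketch using esxcli (ESXi 5.x syntax; the host name and export path below are placeholders, not my actual values):

# mount an NFS export as a datastore named "Test"
esxcli storage nfs add --host=nas.example.local --share=/mnt/array1/share --volume-name=Test
# list the currently mounted NFS datastores
esxcli storage nfs list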

/vmfs/volumes # ls
2187187c-a7760563-fab1-163290c0c43f  5ebabd7d-d5b1bb03-41bf-0a54d491997a
53ce71b8-524a85a0-b04a-901b0e307310  Test
53ce71c4-5f426b5e-94a4-901b0e307310  b890daea-77712a93
53ce71c9-ae797f15-cf51-901b0e307310  local_datastorage
53d113eb-62b09240-52c6-901b0e307310  local_systemstorage

/vmfs/volumes # cp local_systemstorage/iso-images/Server2008R2_Eval.iso Test/

I again copied a 3 GB file from the local storage onto the mounted NFS share, and it took 13 minutes to complete. This means that the transfer speed is about 3.9 MB/s, but this is just the first half of the transfer. If you take into consideration that you still have to copy the file from the NFS share to the target computer to complete the task, then the effective transfer speed is about 1.85 MB/s.

Summary

Especially when I copied small files via scp, I discovered peaks in speed of up to 30 MB/s.

Fujitsu_ServerView.iso                   0%   32MB  19.7MB/s   06:44 ETA

But these peaks usually disappeared within a few seconds, and then the data transfer rate settled at about 7.1 MB/s. (I copied 100 GB of data with scp in 4h:05m, which translates into an average speed of about 7.1 MB/s.)

I have never encountered average speeds above 10.5 MB/s. So where is this (intentionally placed?) bottleneck within the vSphere Hypervisor? It seems that the bottleneck is located somewhere around the disk or within the disk scheduler, because if you just compress files on the local disk, the speed is also around 10 MB/s. To be more precise: the decompression (tar -xzf) of a 22.5 GB tar archive took around 45 minutes, which translates into a speed of ~8.5 MB/s. The compression (tar -czf) was slightly faster; there I got a speed of about 10 MB/s. From a hypervisor's point of view this would make sense, because its first priority should be to serve the VMs, and maintenance jobs are normally not as important as running systems.
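
For reference, the local disk benchmark looked roughly like this. A hedged sketch; whether the BusyBox shell on your ESXi build provides the time applet is an assumption, and the paths are placeholders:

# compress and decompress on the local datastore and take the time
time tar -czf backup.tgz vm-directory/
time tar -xzf backup.tgz -C /vmfs/volumes/local_datastorage/restore/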

However: I was not able to demystify how exactly VMware slows down the internal tools, but (for me) it confirmed the theory that some tools are intentionally slowed down. In the end I copied my data with uncompressed scp, because speed was not the only concern: SCP also encrypts the data that is sent over the wire.

Update 2016-01-24: The latest migration of a VM between two hosts which are connected via a new switch performed as it should. The average throughput was about 70 MB/s (which is OK for a 1 GBit connection).

So it *could* be that the bad results are somehow server / hardware specific.
