vCenter & Distributed vSwitch on two ESXi hosts with a single NIC

I was doing some lab work the other day with two IBM Flex nodes that only had a single 10Gb NIC.

The vCenter for the environment was located on the aforementioned ESXi hosts and my plan was to use the Distributed vSwitch, rather than the Standard vSwitch.

If you have ever moved an ESXi host that runs the vCenter VM to a Distributed vSwitch, you know it is easy when the host has more than one NIC. Just move one of the NICs to the Distributed vSwitch, and then change the network configuration for the vCenter.

But when you are trying to move an ESXi host with a single NIC (whitebox, demo equipment, etc.), things get a little bit more complicated.

When you attempt to move the vCenter and the ESXi host to a new Distributed Portgroup, the vCenter loses its connection and the process is rolled back. So you are still stuck with the NIC on the Standard vSwitch. Status quo…

The best way to make this work is to:

  1. Move the ESXi host that doesn’t run the vCenter VM to the Distributed vSwitch. Create VM traffic portgroups.
  2. Clone the vCenter VM and place the clone on the ESXi host that doesn’t run the vCenter VM.
  3. Connect the newly cloned vCenter VM to a Distributed Portgroup on that ESXi host (the one that was connected to the DVS in step 1).
  4. Turn off the original vCenter.
  5. Turn on the cloned vCenter and configure the network settings (accept the error about a previous network adapter using the IP if the vCenter runs on Windows Server).
  6. Move the remaining host to the Distributed vSwitch.

Now you have a working vCenter on single-NIC hosts with a Distributed vSwitch.

Custom alarms for events in vCenter 5.x

Some customers have been asking if I know why some machines fail to consolidate the snapshot at the end of the backup job. The job seems to finish, but the snapshot deletion fails, sometimes leaving behind a large snapshot, or even some “ghost” snapshots. Sometimes the event isn’t noticed until days later, or even worse, when the datastore fills up.

When this happens, an event is logged for the virtual machine, stating that the VM’s disk consolidation failed:

Virtual machine {vm.name} disks consolidation failed on {host.name} in cluster {computeResource.name} in {datacenter.name}.

This is a perfect case for a custom alarm so the administrator can be informed when the consolidation failed.

  1. First you need a way to create custom alarms in vCenter. My main source of information is this handy document from the VMware communities (author hmundt): More fun with vSphere Alarms
  2. Second, you need a list of events for the vSphere API. Veeam has been so kind as to publish a list of events from the API for vSphere 5.0, which they make available to users of their great product Veeam One (and if anyone from Veeam reads this, an updated list for vSphere 5.1 would be much appreciated).
  3. Next, you create a new alarm on the vCenter level, choose Virtual Machine, Event, and for the Event trigger you just paste the vSphere API event text. In this case it’s:

com.vmware.vc.VmDiskFailedToConsolidateEvent

Next time a consolidation job fails an Alarm will light up that VM and bother all the people you added on the email notification list.

Of course this list can be used to watch for EVERY event known in the vSphere API and is very handy when you need to watch for a specific event in one of those troubleshooting sessions.
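If you already have events exported somewhere and just want to see which VMs hit this event, the same event ID can be used as a filter. This is only a minimal sketch: the dict keys "eventTypeId" and "vm", the function name and the sample data are all hypothetical and stand in for whatever export you have. It is not a vCenter API call.

```python
# Event type ID taken from the post; everything else here is illustrative.
CONSOLIDATION_FAILED = "com.vmware.vc.VmDiskFailedToConsolidateEvent"


def failed_consolidations(events, event_type_id=CONSOLIDATION_FAILED):
    """Return the VM names that logged the given event type, in first-seen order.

    `events` is a hypothetical export: an iterable of dicts with the keys
    "eventTypeId" and "vm". It is not a real vCenter API response.
    """
    seen = []
    for event in events:
        vm_name = event.get("vm")
        if event.get("eventTypeId") == event_type_id and vm_name not in seen:
            seen.append(vm_name)
    return seen


# Made-up sample data for illustration only.
sample_events = [
    {"eventTypeId": "VmPoweredOnEvent", "vm": "web01"},
    {"eventTypeId": CONSOLIDATION_FAILED, "vm": "sql01"},
    {"eventTypeId": CONSOLIDATION_FAILED, "vm": "file01"},
    {"eventTypeId": CONSOLIDATION_FAILED, "vm": "sql01"},
]

print(failed_consolidations(sample_events))
```

Passing a different `event_type_id` lets you reuse the same filter for any other event from the list.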

Upgrading a vCenter SQL Express database

The other day I got my hands on a full vCenter SQL 2005 SP2 Express database. The vCenter database had filled up the 4GB allowed for SQL 2005 Express DBs.

As the shop I was in had no SQL servers to work with, it was decided to upgrade to SQL 2008 R2 SP2 Express, which has a 10GB limit per database.
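To put the two limits side by side, here is a small sketch of the arithmetic. The limits are the ones quoted above; the function names and the 4.0 GB example size are just for illustration.

```python
# Per-database size limits quoted in the post, in gigabytes.
EXPRESS_LIMIT_GB = {
    "SQL 2005 Express": 4,
    "SQL 2008 R2 Express": 10,
}


def headroom_gb(edition, database_size_gb):
    """Return how many GB are left before the edition's per-database limit."""
    return EXPRESS_LIMIT_GB[edition] - database_size_gb


def percent_used(edition, database_size_gb):
    """Return the share of the limit already consumed, as a percentage."""
    return 100.0 * database_size_gb / EXPRESS_LIMIT_GB[edition]


# A database that has just hit the 2005 limit, checked against both editions.
for edition in EXPRESS_LIMIT_GB:
    print(edition, headroom_gb(edition, 4.0), percent_used(edition, 4.0))
```

A database that is full at 4GB on 2005 Express sits at 40% of the limit after the move, so the upgrade buys time but does not remove the need for housekeeping.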

The environment was running on vSphere 5.0, and I had recently upgraded it from 4.1 to 5.0. There’s quite an increase in the number of tables between 4.1 and 5.0, so this will happen to most environments sooner or later.

Note this procedure will only work if you will still be using the same vCenter server as in the beginning. Not to be used for whole vCenter relocations.

So the way to do this is quite easy, and you don’t need to be a SQL admin. 🙂

You will need to break this procedure into 3 parts: 1) Preparation 2) Upgrade 3) Test

1) Preparation

  • ODBC connections: Check whether the ODBC connection is configured for Integrated Windows or SQL authentication.
  • Services: Check which user runs the VirtualCenter Server service. Most likely System or a domain/local admin.
  • Name of the Database: I recommend not changing the name of the database. Most likely the name will end at SQL*\SQLEXP_VIM.
  • Get the installation files for SQL 2008 R2 Express and also for SQL Server Management Studio Express.
  • Open up the SQL instance using SQL Management Studio, and note who the DBOwner is for each database that will be moved. If it is a SQL user, note that down as well.

2) Upgrade

    1. Stop all vCenter related services
      • vSphere Web Client
      • VMware VirtualCenter Server (delayed start)
      • VMware VirtualCenter Management Webservices (delayed start)
      • VMware vSphere Update Manager Service
      • VMware vSphere Profile-Driven Storage
      • vCenter Inventory Service
      • VMwareVCMSDS
    2. Put all stopped services to disabled.
      • This is done as you will need to restart the server after a SQL upgrade and you will not want the services to start when you do.
    3. Open up the old SQL 2005 Express database using the SQL Management Studio.
    4. Back up each database (e.g. if you have both vCenter and Update Manager databases).
      • Right click the database, go to Tasks and select Backup. Backup to a known location.
    5. Go to the DATA folder for the SQL instance. On a 32-bit OS it’s in c:/Program Files/Microsoft SQL Server//…, and on a 64-bit OS in c:/Program Files (x86)/….
      • There you will find all the database and log files for the vCenter server.
      • Names are most likely VIM_VCDB.ldf for logs, and VIM_VCDB.mdf for the database itself.
    6. Detach the database. Make sure you stopped the vCenter services.
      • Right click the database, go to Tasks and select Detach.
      • Move the database and log file to another location.
    7. Though you can upgrade 2005 Express to 2008 Express in place, I find it much “cleaner” to just uninstall 2005 and install a new SQL 2008 R2 Express instance.
      • Remove the SQL 2005 Express instance. (you will need to turn off the SQL service)
    8. Restart
    9. Install a new SQL 2008 R2 Express instance.
      • When installing the new instance, make sure you write down the sa account password and/or give a domain/computer account sysadmin privileges to the instance.
      • Make sure you name the instance as SQLEXP_VIM. Otherwise you will need to change a registry setting for the VirtualCenter service to start (pointing it to the new name).
    10. Just to make sure, restart again.
    11. Move the database and log file to the new folder for the 2008 R2 Express instance.
    12. Log in to the instance using SQL Management Studio.
    13. Right click Databases and select Tasks->Attach. Attach the database. You don’t need to attach another log file when the pop-up appears; there’s only 1 log file and it is already associated with the database.
    14. Go to properties of the vCenter database and make sure the DBO (database owner) is the same one as on the 2005 instance.
      • You might need to add the user in the Login section of the instance.
    15. Create a new file using Notepad and save it as connections.udl (it must end in .udl). Open its properties and go to Connection. There you can try out the SQL connection. This is a handy tool for testing SQL connections, and it will be used in the next steps.
    16. Go to SQL Server Configuration Manager (should be available in the Start menu).
      • Select SQL server network configuration and enable both Named pipes and TCP/IP.
      • Go to Properties on TCP/IP. Select IP Addresses and go to the bottom where you see a section called IPAll. Put in 1433 in TCP port. Push OK.
    17. Go to both ODBC managers (32-bit and 64-bit: C:\Windows\SysWOW64 for 32-bit and C:\Windows\system32 for 64-bit, yes they have conflicting names…).
      • Make sure you have a connection to the database. The 32-bit one is for Update Manager.
      • The user that connects to the database needs to be a user that has access to the database in SQL Management Studio. Best practice is a domain system account that is a DBO on the vCenter database and is also the one that starts the vCenter service.
    18. Open SQL Management Studio and open up the vCenter database.
    19. Put all the services to their former startup selection.
    20. Restart the server, or go through restarting the services. I find it easier just to restart it.
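The two authentication choices from the preparation step (Integrated Windows or SQL) end up as slightly different ODBC connection strings. Here is a minimal sketch that builds both variants. It assumes the legacy "SQL Server" ODBC driver name and the instance and database names mentioned above; the function name is made up, and the sketch only builds the string, it does not connect to anything.

```python
def odbc_connection_string(server, database, sql_user=None, sql_password=None,
                           driver="SQL Server"):
    """Build an ODBC connection string for a SQL Server instance.

    With no sql_user, Integrated Windows authentication is requested;
    otherwise SQL authentication with the given credentials is used.
    The default driver name is an assumption; use the one your ODBC
    manager actually lists.
    """
    parts = [
        "Driver={%s}" % driver,
        "Server=%s" % server,
        "Database=%s" % database,
    ]
    if sql_user is None:
        parts.append("Trusted_Connection=yes")
    else:
        parts.append("UID=%s" % sql_user)
        parts.append("PWD=%s" % sql_password)
    return ";".join(parts) + ";"


# Instance and database names follow the post; ".\\" means the local machine.
print(odbc_connection_string(".\\SQLEXP_VIM", "VIM_VCDB"))
```

Whichever variant you use, it should match what the ODBC connection and the DBO were set to on the old 2005 instance.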

3) Test

    1. After restarting make sure the vCenter server service starts and all your performance data is showing.

Notes (stuff you should know about vCenter SQL Express databases):

  • Rollup jobs (the jobs that move performance data between week->month->year) are not running as a separate job, so you should not need to fix those. They are being run by the VirtualCenter service and are a part of the database (located in vCenter DB > Programmability > Stored Procedures). This is only the case for SQL Express instances.
  • I always recommend putting vCenter Databases on real SQL servers. But I’ve seen small environments of at least 100 machines run for years on an Express database (NOT SUPPORTED).
  • Most misconfigurations on SQL Express DBs are user related. Double check the user that runs the VirtualCenter service, who the DBO is, and the ODBC connections.

vCenter SQL Database Troubleshooting

As I have stated before, I love databases, especially the vCenter database. That’s why I really wanted to create another vCenter database post, this time regarding troubleshooting.

I recommend watching the video from my previous post, regarding how performance statistics are processed.

Starting with a small overview, it’s good to know that the vCenter database is quite a complex database, and it’s not getting any simpler with each new version of vSphere.

  • In vCenter 2.5 it had 88 tables and 8 procedures
  • In vCenter 4.0 it had 196 tables and 12 procedures
  • In vCenter 5.0 it has 247 tables and 27 procedures

I have had my share of database troubleshooting, even though I live in Iceland 🙂 Mostly it’s because of server sprawl on vCenters that started out on SQL Express.

The most common problems are:

  • Timeouts – Gaps in the Performance tab in the vSphere Client.
  • Slow response getting performance data.
  • Only real time data is available – or only Real-time and 1 day. Missing the week, month and year old performance statistics.
  • No space left for database growth.
  • Overloaded SQL server.

Timeouts:

  • If specific to one ESXi host – restart the host’s Management agent. KB article (1003490): Restart Management agents on ESXi and ESX.
  • If showing on every host – a) Check if the SQL server is overloaded (CPU and memory). b) Check the size of the first vpx_hist table; if it is larger than 10 million rows you will need to dig deeper. Most likely the rollup jobs aren’t working correctly (you should see a lot of other signs if they are not working, like older performance data not showing). You can also try to run the jobs manually.
  • Check and see when the rollup jobs last ran:

select max (sample_time) from vpx_sample_time2

  • If it has been a long time since the last run, your rollup jobs aren’t functioning correctly.

Slow response:

  • Most likely it’s because of fragmented vpx_hist tables. Go through the steps in this KB article (1003990): Defragmenting VirtualCenter performance data indexes on a Microsoft SQL database.
  • Large database. Truncate or purge (archive first).

Only Real-time data available:

  • When selecting data older than 1 day, you will get “Performance data is currently not available for this entity”. Culprit: Rollup jobs. Check that the SQL Agent Service is started and recreate the rollup jobs.
  • Check when a rollup job last ran. Use this query:

select max (sample_time) from vpx_sample_time2

  • Manually run the rollup jobs – purge the tables – truncate data from the tables.
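What counts as “a long time” depends on the rollup interval, so treat the following as a rough sketch only. It takes the timestamp returned by the max(sample_time) query and compares it with an assumed threshold. The function name, the 2 hour default and the example dates are my own placeholders, not VMware-documented values.

```python
from datetime import datetime, timedelta


def rollup_is_stale(last_sample_time, now, max_age=timedelta(hours=2)):
    """Return True when the newest rolled-up sample is older than max_age.

    last_sample_time is the value returned by the max(sample_time) query,
    or None when the table is empty. max_age is an assumed threshold, not
    a VMware-documented one; tune it for your environment.
    """
    if last_sample_time is None:
        return True
    return now - last_sample_time > max_age


# Hypothetical timestamps for illustration only.
now = datetime(2013, 5, 1, 12, 0)
print(rollup_is_stale(datetime(2013, 5, 1, 11, 30), now))  # recent sample
print(rollup_is_stale(datetime(2013, 4, 28, 9, 0), now))   # days old
```

Make sure both timestamps are in the same time zone before comparing them.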

No space for database growth:

  • Most likely the database is configured for Full Recovery. Unless the transaction log is backed up regularly it keeps growing, and it will eventually consume all the space on the SQL server. Change the recovery model to Simple, but note that with Simple you can only restore to the last backup, so after a restore you will probably lose some days of performance data.
  • To check where the growth is occurring see this KB article (1028356): Determining where growth is occurring in the vCenter Server database. You can also run these commands:

exec sp_spaceused vpx_hist_statx

or

select count(*) from vpx_hist_statx

  • Then you can purge data, KB Article (1025914): Purging old data from the database used by vCenter Server.
  • Or truncate the data. Use these commands:

truncate table vpx_hist_statx

or

truncate table vpx_sample_timex
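Before truncating anything, it helps to run the read-only size checks against every numbered table. Here is a small sketch that generates those statements so you can paste them into a query window. It assumes the x in the table names stands for the numbers 1 through 4; adjust the suffixes if your database differs. The function name is hypothetical, and it deliberately generates only the read-only checks, not the truncate commands.

```python
def space_check_statements(table_prefix="vpx_hist_stat", suffixes=(1, 2, 3, 4)):
    """Return read-only T-SQL statements for each numbered table.

    The default suffixes are an assumption about what the "x" in the
    table names stands for.
    """
    statements = []
    for suffix in suffixes:
        table = "%s%d" % (table_prefix, suffix)
        statements.append("exec sp_spaceused %s" % table)
        statements.append("select count(*) from %s" % table)
    return statements


for statement in space_check_statements():
    print(statement)
```

Calling it with `table_prefix="vpx_sample_time"` gives the same checks for the sample time tables.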

Overloaded SQL server:

  1. Use Perfmon on the SQL server to check specific SQL counters. Most common culprits are memory or IO bottlenecks.

That probably made you more confused than you were before, so here is the short version for vCenter SQL troubleshooting:

  1. Loss of performance data is only specific to one host – restart the Management agent. KB article (1003490): Restart Management agents on ESXi and ESX.
  2. Check if there is space for database growth – SQL server drives. KB article (1028356): Determining where growth is occurring in the vCenter Server database. When you have found the guilty table, purge it. KB Article (1025914): Purging old data from the database used by vCenter Server. Or truncate it, see commands above.
  3. Check the health of the transaction logs. What is the recovery model for the database? It should be Simple. KB article (1003980): Troubleshooting transaction logs on a Microsoft SQL database server.
  4. Slow response means fragmented tables. KB article (1003990): Defragmenting VirtualCenter performance data indexes on a Microsoft SQL database.
  5. Other good housekeeping tasks include: check the SQL server’s resources, no extra applications on the vCenter, keep vCenter and SQL separate, avoid using a statistics level higher than 2, do not use SQL Express, check for network congestion between vCenter and SQL.
  6. Put the vCenter database in your SQL preventive maintenance schedule – defrag, check for huge tables etc. Also look at this great post from Chris Wahl at wahlnetwork.com where he uses the SQL’s Maintenance plan to configure daily backups, and this post as well, where he configures Log shipping for a vCenter SQL database.

Other than that you can also see my previous post about vCenter SQL database performance considerations.

Note! There are multiple other vCenter SQL troubleshooting scenarios, but they all involve loss of service, a locked database, ODBC errors in vpxd.log (wrong authentication (KB1006482) or permissions (KB1003052)), or time sync (important for SQL servers).

Recommended reading:

KB: Investigating the health of a vCenter Server database

VirtualCenter Database Maintenance – SQL http://www.vmware.com/files/pdf/vc_microsoft_sql_server.pdf (for vCenter 2.5… but still)

Industrial strength defrag from get-admin.com, great post for even more defragmentation.