Thursday, August 28, 2014

VMworld 2014 - Sights to See

San Francisco sights to see, within walking distance of the Moscone Center.

San Francisco Bay

Golden Gate Bridge

Alcatraz

Pier 39 and Fisherman's Wharf

San Francisco City Hall

Chinatown

Grace Cathedral

Cable Car Museum

Coit Tower

Lombard Street

Ghirardelli Square

Tuesday, August 26, 2014

VMworld 2014 - Getting the Most out of vMotion - Architecture, Features, Debugging


"vMotion is a key, widely adopted technology which enables the live migration of virtual machines on the vSphere platform. It enables critical datacenter workflows, including automated load-balancing with DRS and DPM, hardware maintenance, and the permanent migration of workloads. Each vSphere release introduces new vMotion functionality and significant performance improvements to address key customer requests and enable new use cases. In this session, join engineers from the development and performance teams to get an insider's view of vMotion architecture, cutting-edge features, best practices, and tools for performance troubleshooting. Performance studies will be presented for some of the hot topics, including Monster VM migrations, migrations over IPv6, and Metro migrations over Distributed/Federated storage deployments. Finally, take a sneak peek into the future and performance directions for vMotion, including long distance migrations and migration to public clouds."
  • Gabriel Tarasuk-Levin - Staff Engineer 2, VMware
  • Sreekanth Setty - Staff Engineer, VMware

vMotion

Transparent (live) move of a running VM to another host.
vMotion requires shared storage
vMotion enables features like DRS and FT
vMotion workflow:
  1. Create a skeleton VM on the destination
  2. Copy VM memory state - the most complex portion of the workflow
  3. Quiesce the VM on the source
  4. Transfer device state and the remaining memory changes
  5. Resume the VM on the destination
  6. Power off the source VM
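How does the memory copy work? It is fairly complex. vMotion uses iterative memory pre-copy: cycle through the memory pages, copying them while monitoring for changes (dirty pages), then re-send the pages that were dirtied. Repeat until the copy converges, i.e. the remaining dirty set is small enough to send while the VM is quiesced.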
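To make the convergence idea concrete, here is a toy model of iterative pre-copy. It is a sketch only, not VMware's implementation: the function name `precopy`, the page counts, the send/dirty rates and the switch-over threshold are all hypothetical illustration values. Each pass re-sends whatever the running guest dirtied while the previous pass was in flight.

```python
def precopy(total_pages, send_rate, dirty_rate, switchover_pages, max_passes=30):
    """Toy model of iterative memory pre-copy (hypothetical, not vMotion internals).

    send_rate        -- pages the network can copy per second
    dirty_rate       -- pages the running guest dirties per second
    switchover_pages -- dirty-set size small enough to send while quiesced
    Returns (passes, pages_left_for_switchover).
    """
    to_send = total_pages          # pass 1 copies all of memory
    passes = 0
    while passes < max_passes:
        passes += 1
        # While this pass is in flight (to_send / send_rate seconds), the guest
        # keeps running and dirties pages that must be re-sent on the next pass.
        # Integer arithmetic keeps the toy model deterministic.
        to_send = min(total_pages, dirty_rate * to_send // send_rate)
        if to_send <= switchover_pages:
            break                  # converged: quiesce and send the remainder
    return passes, to_send


print(precopy(1_000_000, 250_000, 25_000, 500))    # (4, 100): converges
print(precopy(1_000_000, 250_000, 300_000, 500))   # (30, 1000000): never converges
```

The second call shows the hard case: a guest that dirties memory faster than the network can copy it never converges. Stun During Page Send (SDPS) targets exactly this case by slowing the guest's vCPUs so pre-copy can catch up.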

Storage vMotion

Flip side of vMotion: only cares about disk state.
VM remains on the same host.
Storage vMotion has a similar workflow to vMotion:
  1. Create a skeleton VM on the destination
  2. Copy VM cold data, such as snapshots
  3. Copy VM hot data content
  4. Quiesce the VM on the source
  5. Transfer device state and hand off memory state
  6. Resume the VM on the destination
  7. Free VM resources on the source

vMotion without Shared Storage

Available since ESXi 5.1
Only available from the Web Client
Moving VM atomically to another host without shared storage!
Workflow looks like Storage vMotion
Currently transfers cold data across the management network - an architectural decision that VMware is trying to fix
Works with any storage type (NFS, SAS, etc.)
Technology written by the presenter

Features

History:
  • vMotion (2003)
  • v5.0: Multi-NIC vMotion, Stun During Page Send (SDPS)
  • v5.1: vMotion without shared storage
  • v5.5: Metro VPLEX support, IPv6 improvements

Performance

Performance metrics:
  • migration time (memory/disk)
  • switch over time
  • application impact (throughput and latency)
Monster VM Migration Performance
  • 2 NICs show a significant benefit; a 3rd NIC adds little (due to a vMotion helper threads limitation).

vMotion Across Metro Distances

Metro distances: up to 10 ms round-trip time
EMC VPLEX optimizes vMotion duration
VPLEX uses some caching features for syncing across Metro

What's Next for vMotion

  • vNIC improvements - such as 3rd NIC performance
  • support Array Replication with VVOLs
  • Long Distance vMotion
  • vMotion within/to the Hybrid Cloud (vCHS)

VMworld 2014 - Virtual SAN Best Practices for Monitoring and Troubleshooting


Presentation Notes


Virtual SAN

Virtual SAN - software-based storage built into ESXi
  • Aggregates local Flash and HDDs
  • Shared datastore for VM consumption
  • Distributed architecture
  • Deeply integrated with VMware stack
VSAN GA with ESXi 5.5 Update 1

RVC

RVC - started as a VMware Labs "Fling"
  • Interactive command line, with lots of VSAN commands
  • Included in vCenter since 5.5 (Windows and appliance)
  • Presents inventory as a file structure

HCL

Verify Hardware against VMware Compatibility Guide (VCG)
HCL Guides:
  • vSphere general compatibility guide (Servers, NICs, etc)
  • Virtual SAN compatibility guide - adapters, Flash devices and HDDs
Show adapters using RVC:
vsan.disks_info --show-adapters <cluster>/hosts/*
Virtual SAN HCL - http://vmware.re/vsanhcl
HCL steps:
When viewing an HCL entry, also check the device "Class", since performance is important

Network

Network - Misconfiguration Detected
  • VSAN requires 10GbE (or dedicated 1GbE)
  • Single L2 network among ESX hosts
  • IP Multicast
Show ESX configuration:
esxcli vsan cluster get
RVC: vsan.cluster_info <cluster>
Ensure all hosts have VSAN vmknic configured
WebClient: host -> manage -> networking -> vmkernel adapters
esxcli vsan network list
RVC: vsan.cluster_info <cluster>
Ensure VSAN vmknics are on right subnet
WebClient: host -> manage -> networking -> vmkernel adapters
esxcli ?
RVC: vsan.cluster_info <cluster>
Ensure Multicast is configured
tcpdump-uw -i <vmknic> udp port 23451
tcpdump-uw -i <vmknic> igmp

Issues

VM shows as non-compliant / inaccessible / orphaned
  • non-compliant - maybe one mirror down
  • inaccessible - really bad
  • orphaned - VC has forgotten about the VM
VSAN object accessible:
  • at least one RAID mirror is fully intact
  • quorum: more than 50% of components need to be available (witnesses count here)
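The two accessibility rules above can be checked mechanically. This is a minimal sketch, assuming every component (data or witness) carries exactly one vote, which is a simplification; the `Component` class and `is_accessible` function are hypothetical names, not an RVC or VSAN API.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    available: bool


def is_accessible(mirrors, witnesses):
    """Apply the two rules from the notes to one VSAN object (hypothetical helper).

    mirrors   -- list of RAID-1 replicas, each a list of Components (stripes)
    witnesses -- list of witness Components
    Assumes one vote per component (a simplification).
    """
    components = [c for replica in mirrors for c in replica] + list(witnesses)
    available = sum(c.available for c in components)
    # Quorum: strictly more than 50% of all components, witnesses included.
    has_quorum = 2 * available > len(components)
    # At least one replica must have every one of its components intact.
    has_full_mirror = any(all(c.available for c in replica) for replica in mirrors)
    return has_quorum and has_full_mirror


# Hypothetical FTT=1 layout: two mirrors plus one witness on three hosts.
a, b, w = Component("mirror-a", True), Component("mirror-b", True), Component("witness", True)
a.available = False                      # host holding mirror A fails
print(is_accessible([[a], [b]], [w]))    # True: 2 of 3 available, mirror B intact
w.available = False                      # witness host also fails
print(is_accessible([[a], [b]], [w]))    # False: 1 of 3 available, no quorum
```

Both conditions are needed: with striped mirrors, an object can keep quorum and still be inaccessible if every replica has lost at least one stripe.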

RVC Reports

VSAN RVC state reports:
vsan.vm_object_info <vm>
vsan.disks_stats <cluster>
vsan.obj_status_report <cluster>
vsan.obj_status_report --filter-table 2/3 --print-uuids <cluster>
vsan.cluster_info <cluster>
vsan.resync_dashboard <cluster>
vsan.check_state --refresh-state <cluster>
vsan.check_limits <cluster>

Diagnostics

Use the vSphere Web Client - the C# desktop client doesn't show VSAN or VSAN errors
VM provisioning started failing:
  • Don't use: Cluster - Manage - Settings - Disk Management (this is where disks were set up; it is not the right place to check disk health)
  • Use: Monitor - Virtual SAN - Physical Disks
Proactive approach: try creating a VM on every host in the cluster:
Web Client: standard method
RVC: diagnostics.vm_create -d <datastore> -v <vmfolder> <cluster>
VMware believes in "Dog Fooding" - have many internal VSAN clusters running

Benchmarking

VSAN Observer (vsan.observer in RVC)
  • collects stats every 60 seconds
  • web interface
  • HOL Plug: check out VSAN Observer Hands On Labs
High outstanding IO in the Observer chart is a good indicator that SSD speed is not sufficient (which increases latency)
VSAN implements a priority traffic scheduler

VMworld 2014 - Virtualizing Databases and Doing It Right


Presentation Notes

Book:
  • Virtualizing SQL Server with VMware: Doing IT Right - by Michael Corey, Jeff Szastak, Michael Webster

Presentation covers Microsoft SQL and Oracle
Microsoft SQL people don't want to talk to Oracle people, and vice versa - we are going to do it anyway
DBAs shouldn't care about the infrastructure. You don't care about the Cell Phone towers, you just expect them to work.
The #1 cause of BSODs is drivers. With virtualization, the number of different drivers involved is minimized
Virtualization is an incredible return on investment
Able to adjust resources with the click of a button (assuming additional shared virtual resources are available to add)
If you adjust memory, you will need to restart the database to take advantage
"Any Resource, Any Server, At Any Time" within the pool
Is your database too "Big" to virtualize? - doubt it
Virtualization has about a 5% overhead, but if your setup doesn't have at least 5% wiggle room, you are doing something wrong anyway
Management expectations - set them correctly, and explain the costs and what it will actually take
If asked to meet an SLA, make sure it can be met, and set the expectations. If you can't meet it, let management know as soon as possible.
Optimize optimize optimize - the defaults on servers, applications, etc are not tuned for performance
Read the documentation from all vendors!
Professional Association for SQL Server (PASS) - join it if you are doing SQL Server - http://virtualization.sqlpass.org/
Oracle VMware users group - http://ioug.org/VMware/
SLAs:
  • two nines - 99%
  • three nines - 99.9%
  • four nines - 99.99%
  • five nines - 99.999%
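Each extra nine cuts the allowed downtime by a factor of ten, which is worth spelling out when negotiating an SLA with management. A small sketch that converts a number of nines into maximum yearly downtime; the helper name is hypothetical and the year is assumed to be 365 days.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; ignores leap years


def downtime_minutes_per_year(nines):
    """Maximum yearly downtime allowed by an SLA of `nines` nines (hypothetical helper)."""
    unavailability = 10 ** -nines      # e.g. 3 nines -> 99.9% up -> 0.001 down
    return MINUTES_PER_YEAR * unavailability


for n in range(2, 6):
    availability = 100 * (1 - 10 ** -n)
    print(f"{n} nines ({availability:.{n - 2}f}%): "
          f"{downtime_minutes_per_year(n):.2f} minutes/year")
```

Two nines allow about 5,256 minutes (roughly 3.65 days) of downtime per year; five nines allow only about 5.26 minutes.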
If it doesn't perform well in physical, don't expect it to perform well virtualized - garbage in, garbage out
Baseline, baseline, baseline - "there are no silver bullets"
A slow storage array equates to a slow database
When baselining, keep your sampling interval short (seconds). A lot can happen in minutes.
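Why short sampling intervals matter: averaging over a long window hides bursts. The sketch below rolls up synthetic per-second latency samples into one-minute averages; the numbers are made up for illustration, not measured data, and `rollup` is a hypothetical helper.

```python
def rollup(samples, window):
    """Average consecutive samples into windows of `window` samples."""
    return [sum(samples[i:i + window]) / len(samples[i:i + window])
            for i in range(0, len(samples), window)]


# Synthetic, illustrative per-second latency samples (ms): one minute at 5 ms
# with a 10-second burst at 200 ms in the middle. Not measured data.
per_second = [5.0] * 25 + [200.0] * 10 + [5.0] * 25
per_minute = rollup(per_second, 60)

print(max(per_second))  # peak visible at 1-second granularity
print(per_minute)       # the same minute as a single average
```

At one-second granularity the 200 ms burst is obvious; the one-minute average reports 37.5 ms and the burst disappears into it.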
SLOB (Silly Little Oracle Benchmark) - good free tool to look at
"Check It Before You Wreck It" -- Jeff Szastak
"Build New" - when migrating from Physical to Virtual, take the opportunity to "Build New"
TSANet (http://tsanet.org/) - for hardware or software issues, VMware can set up support calls that include Oracle
If your OS and database don't know they are virtualized, don't tell them
Understand your workload types - if you don't know them, how can you tune and configure?
  • Also allows you to combine VMs whose workloads peak at different times (gaining more for your investment)
Separate development, test, and production environments
Trivia:
  • First use of the word "nerd" - Dr. Seuss (If I Ran the Zoo)
  • Americans eat the most food on Super Bowl Sunday
Have more VMs rather than fewer. Giant VMs with everything in them are harder to manage than smaller VMs, which give better resource management and tuning options.
Storage - Spindle count and RAID configurations still rule
Know where your bottlenecks are
VMFS vs RDM - perform about the same, valid reasons for using either
  • VMware recommends VMFS unless you have a really good reason
Thin provisioning has a first-write penalty - use Thick Provision Eager Zeroed for performance
Microsoft recommends 1 data file per CPU
PVSCSI adapters are high-performance - use them
80% of issues are performance or storage misconfiguration
  • #1 issue: not enough spindles to support the app
vCPUs - count a hyper-thread as only 0.2 of a CPU (nearly nothing) when doing your calculations
Recommendation: 1-1 Ratio physical cores to vCPUs
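The two CPU rules of thumb above translate into simple sizing arithmetic. A minimal sketch: the function names are hypothetical, the 0.2 hyper-thread weight is the talk's rule of thumb, and the 2-socket, 8-core host is an assumed example.

```python
def sizing_cores(physical_cores, hyperthreading=True, ht_weight=0.2):
    """Cores to use in capacity math: each hyper-thread adds only `ht_weight`
    of a core (0.2 per the rule of thumb above). Hypothetical helper."""
    if hyperthreading:
        return physical_cores * (1 + ht_weight)
    return float(physical_cores)


def vcpu_ratio(vm_vcpus, physical_cores):
    """Allocated vCPUs per physical core; 1.0 or less meets the 1:1 recommendation."""
    return sum(vm_vcpus) / physical_cores


# Assumed example host: 2 sockets x 8 cores = 16 physical cores.
print(sizing_cores(16))               # about 19.2, not 32, despite 32 logical CPUs
print(vcpu_ratio([8, 4, 4], 16))      # 1.0  -> meets 1:1
print(vcpu_ratio([8, 8, 4], 16))      # 1.25 -> over-committed
```

Note that the 1:1 check counts physical cores, not logical CPUs; hyper-threading only adds a small margin on top.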
Ntirety Rule - for SQL server
NUMA (Non-Uniform Memory Access) - size VMs to fit within a single socket's NUMA node (e.g. a host with 128GB and 4 sockets has 32GB per socket, so keep VMs under 32GB for optimal performance)
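The 128GB / 4-socket example can be written as a quick fit check. This sketch assumes one NUMA node per socket with memory populated evenly; the function names are hypothetical, and the vCPU-per-socket condition is an added companion check, not something stated in the talk.

```python
def numa_node_memory_gb(host_memory_gb, sockets):
    """Memory local to one socket, assuming one NUMA node per socket and
    DIMMs populated evenly across sockets (hypothetical helper)."""
    return host_memory_gb / sockets


def fits_numa_node(vm_memory_gb, vm_vcpus, host_memory_gb, sockets, cores_per_socket):
    """VM memory stays under one node's memory (the '<32GB' in the example);
    the vCPU check against one socket's cores is an assumed extra condition."""
    node = numa_node_memory_gb(host_memory_gb, sockets)
    return vm_memory_gb < node and vm_vcpus <= cores_per_socket


print(numa_node_memory_gb(128, 4))        # 32.0
print(fits_numa_node(24, 4, 128, 4, 8))   # True
print(fits_numa_node(48, 4, 128, 4, 8))   # False: would span two nodes
```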
vNUMA (NUMA topology exposed to the guest OS) is better than UMA (node interleaving)
Swapping:
  • Guest VM
  • ESXi host level
Ballooning and memory compression slow things down
  • Ballooning is good, don't shut it off, but there is a performance hit when it kicks in
How many VMs can fit on a host? As many as fit within active memory
Memory Reservations can lock out memory for critical VMs
Jumbo Frames are good, if you use them correctly
  • Have to configure on ESX, network switch, and application
  • If there is a constriction point anywhere in the path, it breaks down. Don't set Jumbo Frames larger than the smallest MTU in the network path
  • Use VMXNET3 (reduces physical CPU overhead)
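The "smallest bottleneck" rule is just a minimum over the hops. A minimal sketch, assuming a hypothetical path where one switch was left at the standard 1500-byte MTU; the hop names and helper functions are illustrative, not from the talk.

```python
IPV4_HEADER = 20  # bytes, without IP options
ICMP_HEADER = 8   # bytes


def path_mtu(hop_mtus):
    """A frame only survives the path if it fits the smallest MTU on it."""
    return min(hop_mtus.values())


def jumbo_config_ok(endpoint_mtu, hop_mtus):
    """True if the MTU set on the vmknic/guest fits every hop."""
    return endpoint_mtu <= path_mtu(hop_mtus)


def max_ping_payload(mtu):
    """Largest ICMP payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


# Hypothetical path: the core switch was never configured for jumbo frames.
hops = {"vSwitch": 9000, "top-of-rack switch": 9000, "core switch": 1500}
print(path_mtu(hops))               # 1500
print(jumbo_config_ok(9000, hops))  # False: the core switch is the constriction
print(max_ping_payload(9000))       # 8972
```

On ESXi, `vmkping -d -s 8972 <target>` sends a don't-fragment ping of that payload size; if it fails while a small ping succeeds, some hop in the path is not configured for jumbo frames.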
WSFC - Cluster Validation Wizard - should run before you call Microsoft Support

Sunday, August 24, 2014

VMworld 2014 - Backpack SWAG

As silly as this sounds, the VMworld backpack SWAG is significantly better quality than last year's, and that makes me happy.  My old laptop bag was getting pretty old, so now I have a decent replacement.

VMworld 2014

Arrived at VMworld 2014 - excited to see what new things await us...