Hi Andreas
I think I can rule out "normal" operations. The same user, task or script deleted all VMs on 3 hosts with 6 datastores before resetting the partition tables, reformatting the VMFS volumes and resetting the ESXi configs. When he or it was done, the 3 hosts rebooted and came up without network config and with blank datastores.
On at least one of those hosts random Windows files were written into the section where we would expect VMFS metadata - I found an MFT mirror in the place where I would expect the .vh.sf.
Weird - really weird - I need to find out whether the commands were entered manually via ssh, or whether it was a cronjob or interaction via some API.
Here is another example:
2013-08-05T12:58:01.192Z [64FC2B90 verbose 'Vmsvc' opID=9042E070-00000053] Released Vm Id: 4.
2013-08-05T12:58:01.193Z [64FC2B90 verbose 'HostsvcPlugin' opID=9042E070-00000053] RemoveEntry '4'
2013-08-05T12:58:01.193Z [64FC2B90 verbose 'HostsvcPlugin' opID=9042E070-00000053] RemoveEntry succeeded
2013-08-05T12:58:01.193Z [64FC2B90 verbose 'ResourcePool ha-root-pool' opID=9042E070-00000053] Removed child 4 from pool
2013-08-05T12:58:01.193Z [64FC2B90 verbose 'HostsvcPlugin' opID=9042E070-00000053] Security domain hostd4 not found
2013-08-05T12:58:01.195Z [64C8FB90 info 'GuestFileTransferImpl'] VmOperationListener: unregister notification received for VM: 4
2013-08-05T12:58:01.195Z [64C8FB90 info 'GuestFileTransferImpl'] VmOperationListener succeeded
2013-08-05T12:58:01.196Z [64C8FB90 info 'Hbrsvc'] Replicator: UnregisterListener triggered for config VM 4
2013-08-05T12:58:01.196Z [64C8FB90 verbose 'Statssvc'] EntityRemovedListener: Deleting stats for entity 4
2013-08-05T12:58:01.210Z [64A58B90 info 'Libs'] Vix: [6796 foundryVMPowerOps.c:973]: FoundryVMPowerStateChangeCallback: /vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/replacedname.vmx, vmx/execState/val = poweredOff.
2013-08-05T12:58:01.352Z [64A58B90 info 'Libs'] SNAPSHOT: SnapshotDeleteFile deleted '/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/564d3d8a-33cb-3aeb-1385-5c2b15747102.vmem'.
2013-08-05T12:58:01.352Z [64A58B90 info 'Libs'] SNAPSHOT: SnapshotDeleteFile deleted '/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/564d3d8a-33cb-3aeb-1385-5c2b15747102.vmem.lck'.
2013-08-05T12:58:01.872Z [64A58B90 info 'Libs'] SNAPSHOT: SnapshotDeleteFile deleted '/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/gmon.kstats'.
2013-08-05T12:58:01.873Z [64A58B90 info 'Libs'] SNAPSHOT: SnapshotDeleteFile deleted '/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/gmon.kstats32'.
2013-08-05T12:58:01.873Z [64A58B90 info 'Libs'] SNAPSHOT: SnapshotDeleteFile deleted '/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/gmon.kstats64'.
2013-08-05T12:58:01.873Z [64A58B90 info 'Libs'] SNAPSHOT: SnapshotDeleteFile deleted '/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/gmon.monitor'.
2013-08-05T12:58:01.874Z [64A58B90 info 'Libs'] SNAPSHOT: SnapshotDeleteFile deleted '/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/gmon.monitor32'.
2013-08-05T12:58:01.874Z [64A58B90 info 'Libs'] SNAPSHOT: SnapshotDeleteFile deleted '/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/gmon.monitor32.1'.
2013-08-05T12:58:01.874Z [64A58B90 info 'Libs'] SNAPSHOT: SnapshotDeleteFile deleted '/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/gmon.monitor64'.
2013-08-05T12:58:01.874Z [64A58B90 info 'Libs'] SNAPSHOT: SnapshotDeleteFile deleted '/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/gmon.monitor64.1'.
2013-08-05T12:58:01.875Z [64A58B90 info 'Libs'] SNAPSHOT: SnapshotDeleteFile deleted '/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/gmon.perf'.
2013-08-05T12:58:01.875Z [64A58B90 info 'Libs'] SNAPSHOT: SnapshotDeleteFile deleted '/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/gmon.perf_kstats'.
2013-08-05T12:58:01.875Z [64A58B90 info 'Libs'] SNAPSHOT: SnapshotDeleteFile deleted '/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/gmon.monitor32.0'.
2013-08-05T12:58:01.875Z [64A58B90 info 'Libs'] SNAPSHOT: SnapshotDeleteFile deleted '/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/gmon.monitor64.0'.
2013-08-05T12:58:01.875Z [64A58B90 info 'Libs'] SNAPSHOT: SnapshotDeleteFile deleted '/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/status'.
2013-08-05T12:58:01.876Z [64A58B90 info 'Libs'] SNAPSHOT: SnapshotDeleteFile deleted '/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/vprintproxy.log'.
2013-08-05T12:58:01.876Z [64A58B90 info 'Libs'] SNAPSHOT: SnapshotDeleteFile deleted '/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/autoinst.flp'.
2013-08-05T12:58:01.876Z [64A58B90 info 'Libs'] SNAPSHOT: SnapshotDeleteFile deleted '/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/autoinst.iso'.
2013-08-05T12:58:01.876Z [64A58B90 info 'Libs'] SNAPSHOT: SnapshotDeleteFile deleted '/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/quicklook-cache.png'.
2013-08-05T12:58:01.877Z [64A58B90 info 'Libs'] SNAPSHOT: SnapshotDeleteFile deleted '/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/replacedname-7d38e0c8.vswp'.
2013-08-05T12:58:01.877Z [64A58B90 info 'Libs'] SNAPSHOT: SnapshotDeleteFile deleted '/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/vmware-stats.log'.
2013-08-05T12:58:01.888Z [64A58B90 info 'Libs'] SNAPSHOT: SnapshotDeleteFile deleted '/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/vmware.log'.
2013-08-05T12:58:01.890Z [64A58B90 info 'Libs'] SNAPSHOT: SnapshotDeleteFile deleted '/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/vmware-6.log'.
2013-08-05T12:58:01.909Z [64A58B90 info 'Libs'] SNAPSHOT: SnapshotDeleteFile deleted '/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/vmware-7.log'.
2013-08-05T12:58:01.911Z [64A58B90 info 'Libs'] SNAPSHOT: SnapshotDeleteFile deleted '/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/vmware-5.log'.
2013-08-05T12:58:01.912Z [64A58B90 info 'Libs'] SNAPSHOT: SnapshotDeleteFile deleted '/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/vmware-4.log'.
2013-08-05T12:58:01.914Z [64A58B90 info 'Libs'] SNAPSHOT: SnapshotDeleteFile deleted '/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/vmware-3.log'.
2013-08-05T12:58:01.915Z [64A58B90 info 'Libs'] SNAPSHOT: SnapshotDeleteFile deleted '/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/vmware-2.log'.
2013-08-05T12:58:01.916Z [64A58B90 info 'Libs'] SNAPSHOT: SnapshotDeleteFile deleted '/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/replacedname.vmxf'.
2013-08-05T12:58:01.917Z [64A58B90 info 'Libs'] SNAPSHOT: SnapshotDeleteFile deleted '/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/replacedname.nvram'.
2013-08-05T12:58:01.921Z [64A58B90 info 'Libs'] SNAPSHOT: SnapshotDeleteFile deleted '/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/replacedname.vmsd'.
2013-08-05T12:58:01.922Z [64A58B90 info 'Libs'] SNAPSHOT: SnapshotDeleteFile deleted '/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/replacedname.vmx'.
2013-08-05T12:58:01.932Z [64FC2B90 info 'vm:/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/replacedname.vmx' opID=9042E070-00000053] Failed to unset VM medatadata: FileIO error: Could not find file : /vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/replacedname-aux.xml.tmp.
2013-08-05T12:58:01.934Z [64FC2B90 verbose 'vm:/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/replacedname.vmx' opID=9042E070-00000053] DeleteVmDirectory: Deleting vm dir (as superuser) '/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname'
2013-08-05T12:58:01.934Z [64F81B90 verbose 'Hostsvc::DatastoreSystem'] Datastore-Vdisk refresh: scheduling thread
2013-08-05T12:58:01.934Z [64FC2B90 info 'TaskManager' opID=9042E070-00000053] Task Completed : haTask-4-vim.ManagedEntity.destroy-104631949 Status success
2013-08-05T12:58:01.934Z [64FC2B90 verbose 'vm:/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/replacedname.vmx' opID=9042E070-00000053] Close Handle called
2013-08-05T12:58:01.935Z [64FC2B90 verbose 'vm:/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/replacedname.vmx' opID=9042E070-00000053] Shutting down VMDB service...
2013-08-05T12:58:01.935Z [64FC2B90 verbose 'vm:/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/replacedname.vmx' opID=9042E070-00000053] Unregistering callback...
2013-08-05T12:58:01.935Z [64FC2B90 verbose 'vm:/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/replacedname.vmx' opID=9042E070-00000053] ...done
2013-08-05T12:58:01.935Z [64FC2B90 verbose 'vm:/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/replacedname.vmx' opID=9042E070-00000053] Unsubscribed from events.
2013-08-05T12:58:01.935Z [64FC2B90 info 'Libs' opID=9042E070-00000053] Vix: [20705 foundryVM.c:10650]: Error VIX_E_INVALID_ARG in VixVM_CancelOps(): One of the parameters was invalid
2013-08-05T12:58:01.935Z [64FC2B90 verbose 'vm:/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/replacedname.vmx' opID=9042E070-00000053] Canceled outstanding Foundry operations.
2013-08-05T12:58:01.935Z [64FC2B90 verbose 'vm:/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/replacedname.vmx' opID=9042E070-00000053] Released VM handle.
2013-08-05T12:58:01.936Z [64FC2B90 verbose 'vm:/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/replacedname.vmx' opID=9042E070-00000053] Closed VM handle.
2013-08-05T12:58:01.936Z [64FC2B90 info 'ha-eventmgr' opID=9042E070-00000053] Event 272 : Removed replacedname (Backup) on blablabla.local from ha-datacenter
2013-08-05T12:58:01.936Z [64FC2B90 info 'vm:/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/replacedname.vmx' opID=9042E070-00000053] State Transition (VM_STATE_DELETING -> VM_STATE_GONE)
2013-08-05T12:58:01.936Z [64FC2B90 verbose 'ha-host' opID=9042E070-00000053] ModeMgr::End: op = normal, current = normal, count = 5
2013-08-05T12:58:01.936Z [64FC2B90 verbose 'vm:/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/replacedname.vmx' opID=9042E070-00000053] Destroy VM complete
2013-08-05T12:58:01.937Z [64FC2B90 info 'vm:/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/replacedname.vmx'] Virtual machine object cleanup
2013-08-05T12:58:01.937Z [64FC2B90 verbose 'vm:/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/replacedname.vmx'] Closed VM handle.
pam_per_user: create_subrequest_handle(): doing map lookup for user "root"
pam_per_user: create_subrequest_handle(): creating new subrequest (user="root", service="system-auth-generic")
Accepted password for user root from 10.49.2.58
2013-08-05T12:58:16.889Z [FFF10A90 info 'Vimsvc'] [Auth]: User root
2013-08-05T12:58:16.889Z [FFF10A90 info 'ha-eventmgr'] Event 273 : User root@10.49.2.58 logged in
pam_per_user: create_subrequest_handle(): doing map lookup for user "root"
pam_per_user: create_subrequest_handle(): creating new subrequest (user="root", service="system-auth-generic")
Accepted password for user root from 10.49.2.58
2013-08-05T12:58:17.228Z [64840B90 info 'Vimsvc'] [Auth]: User root
2013-08-05T12:58:17.228Z [64840B90 info 'ha-eventmgr'] Event 274 : User root@10.49.2.58 logged in
2013-08-05T12:58:17.370Z [FFFB4B90 verbose 'Default'] CloseSession called for session id=520a9289-5eb6-acb7-50ab-b1042027528d
2013-08-05T12:58:17.370Z [FFFB4B90 info 'ha-eventmgr'] Event 275 : User root logged out
...
I have not seen a hostd log entry like:
info 'Libs'] SNAPSHOT: SnapshotDeleteFile deleted '/vmfs/volumes/518ccb24-ba8547cb-132a-90b11c496b72/A4/replacedname/gmon.monitor32'.
before.
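To get an overview of how many files were removed per VM directory, and which destroy tasks they belong to, the log can be summarized with a small script. This is only a rough sketch: the regular expressions are based on nothing more than the excerpt above, the function name summarize_deletions is made up for illustration, and other ESXi builds may format these lines differently.

```python
import re
from collections import Counter

# Patterns derived only from the hostd excerpt in this post (assumption:
# other ESXi versions may word these lines differently).
DELETE_RE = re.compile(
    r"^(\S+) \[.*?\] SNAPSHOT: SnapshotDeleteFile deleted '(.+)'\.$"
)
TASK_RE = re.compile(
    r"^(\S+) \[.*?opID=(\S+)\] Task Completed : "
    r"(\S*vim\.ManagedEntity\.destroy\S*) Status (\w+)"
)


def summarize_deletions(lines):
    """Return (files deleted per directory, list of completed destroy tasks).

    Each task is a tuple (timestamp, opID, task name, status).
    """
    deleted = Counter()
    tasks = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        m = DELETE_RE.match(line)
        if m:
            # Group by the directory that contained the deleted file.
            directory = m.group(2).rsplit("/", 1)[0]
            deleted[directory] += 1
            continue
        m = TASK_RE.match(line)
        if m:
            tasks.append(m.groups())
    return deleted, tasks


if __name__ == "__main__":
    import sys

    # Usage: python summarize.py hostd.log  (the file name is up to you)
    with open(sys.argv[1], errors="replace") as fh:
        per_dir, destroy_tasks = summarize_deletions(fh)
    for directory, count in per_dir.most_common():
        print(count, directory)
    for task in destroy_tasks:
        print(*task)
```

Sorting the destroy tasks by timestamp should show whether the VMs were removed one after another within seconds, which would point to a script rather than somebody clicking through a client.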
This issue has kept me busy for a few weeks - in that time I have restored about 100 critical VMs from unreadable VMFS volumes.
On Monday ESXi hosts number 7, 8 and 9 were affected. It almost always happens following this pattern:
the hosts lose their network config, reboot and come up with empty datastores or unpartitioned volumes.
Good practice for me, by the way, but the customer is getting sick of it - this is the first time I was able to find the IP of the host that seems to be responsible - needless to say, that IP is no longer active.
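
In case it helps anybody chasing something similar: the login events in the log can be collected per source address with a few lines of Python. Again, this is just a sketch under assumptions. The pattern matches only the 'ha-eventmgr' "logged in" lines as they appear in the excerpt above, and the names LOGIN_RE and logins_by_source are invented for this example.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   <timestamp> [<thread> info 'ha-eventmgr'] Event 273 : User root@10.49.2.58 logged in
# (format taken from the excerpt in this post; treat it as an assumption elsewhere)
LOGIN_RE = re.compile(
    r"^(\S+) \[.*?'ha-eventmgr'.*?\] Event (\d+) : "
    r"User ([^@\s]+)@(\S+) logged in"
)


def logins_by_source(lines):
    """Map each source address to a list of (timestamp, event id, user)."""
    result = defaultdict(list)
    for raw in lines:
        m = LOGIN_RE.match(raw.rstrip("\r\n"))
        if m:
            timestamp, event_id, user, source = m.groups()
            result[source].append((timestamp, int(event_id), user))
    return dict(result)


if __name__ == "__main__":
    import sys

    # Usage: python logins.py hostd.log [more log files ...]
    for path in sys.argv[1:]:
        with open(path, errors="replace") as fh:
            for source, events in logins_by_source(fh).items():
                print(path, source, len(events), "logins, first at", events[0][0])
```

Running this over the logs of all affected hosts should show quickly whether the same address turns up on each of them shortly before or after the deletions.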