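Procedure: Virtual Machine Evacuation
Note: Evacuation is only usable on newer versions; older versions have bugs. Before using it, confirm that the resource pool version is not lower than 6.4.3.0.0.0. Known defects per version are recorded on the linked page "5.版本缺陷记录" (5. Version Defect Records).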
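When to evacuate
When a compute node goes down because of hardware maintenance or some other cause, the virtual machines on it can be evacuated to other compute nodes so that they are running again.
Run nova service-list | grep sh03-compute-10e114e56e89 to check whether the nova-compute service on that node is in the down state. If it is down, the evacuation commands can be used.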
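The down-state check can be scripted so that a runbook stops early when the node is not actually down. This is a minimal sketch, not part of the original procedure: the function name compute_is_down is hypothetical, and it assumes only that nova service-list prints a pipe-delimited table with the binary name, the host name and the state in separate cells.

```shell
# Hypothetical helper: reads a `nova service-list` table on stdin.
# Exit status 0 if some row has the cells "nova-compute", "$1" (the host) and "down".
compute_is_down() {
    awk -F'|' -v host="$1" '
        {
            compute = 0; on_host = 0; down = 0
            for (i = 1; i <= NF; i++) {
                cell = $i
                gsub(/^[ \t]+|[ \t]+$/, "", cell)   # trim padding around the cell
                if (cell == "nova-compute") compute = 1
                if (cell == host)           on_host = 1
                if (cell == "down")         down = 1
            }
            if (compute && on_host && down) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}
```

Possible usage: nova service-list | compute_is_down sh03-compute-10e114e56e89 && echo "nova-compute is down, evacuation may proceed"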
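How to use it
1) Before evacuating, migrate away the volumes managed by the failed machine
If a volume attached to a VM being evacuated happens to be managed by the failed machine, the evacuation will run into errors.
Run the following command on a control node: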
cinder --os-volume-api-version 3.33 list --filters host=sh03-compute-10e114e56e89 --all
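Check whether the failed machine manages any volumes. If it does, migrate them with the commands below; the target must be a healthy node.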
cinder-manage --config-file /etc/cinder/cinder.conf volume update_host --currenthost sh03-compute-10e114e56e89@ceph --newhost sh03-compute-10e114e56e141@ceph
cinder-manage --config-file /etc/cinder/cinder.conf backup update_backup_host --currenthost sh03-compute-10e114e56e89 --newhost sh03-compute-10e114e56e141
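Note: Double-check the arguments of the commands above, especially the host names. These commands do not report an error when an argument is wrong.
2) Verify again that the failed machine no longer manages any volumes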
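Because a mistyped host name is not reported, one way to reduce the risk is to generate the two commands from a single pair of arguments and review them before running anything. The following is only a sketch: print_update_host_cmds is a hypothetical helper, and the ceph backend suffix is taken from the example above and assumed as the default.

```shell
# Hypothetical helper: print (do not run) the two cinder-manage commands for review.
# $1 = failed host, $2 = healthy target host, $3 = backend suffix (assumed default: ceph)
print_update_host_cmds() {
    if [ -z "${1:-}" ] || [ -z "${2:-}" ] || [ "$1" = "$2" ]; then
        echo "need two different, non-empty host names" >&2
        return 1
    fi
    printf 'cinder-manage --config-file /etc/cinder/cinder.conf volume update_host --currenthost %s@%s --newhost %s@%s\n' \
        "$1" "${3:-ceph}" "$2" "${3:-ceph}"
    printf 'cinder-manage --config-file /etc/cinder/cinder.conf backup update_backup_host --currenthost %s --newhost %s\n' \
        "$1" "$2"
}
```

Possible usage: print_update_host_cmds sh03-compute-10e114e56e89 sh03-compute-10e114e56e141, then compare the printed host names against the service listing before executing the lines by hand.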
cinder --os-volume-api-version 3.33 list --filters host=sh03-compute-10e114e56e89 --all
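If the output is empty, the volumes were migrated successfully.
3) Two ways to evacuate
One is to evacuate a single instance with the {{nova evacuate}} command; the other is to evacuate every instance on a compute node with {{nova host-evacuate}}. The latter first queries the list of instances on the compute node and then runs the evacuate operation on each of them in a loop.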
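The "output is empty" check can be made mechanical by counting the table rows that start with a volume UUID. This sketch assumes the listing is the usual pipe-delimited table with the volume ID in the first column; count_uuid_rows is a hypothetical name.

```shell
# Hypothetical helper: count table rows on stdin whose first cell is a UUID
# (one row per volume). Prints 0 for empty input or a header-only table.
count_uuid_rows() {
    # grep -c exits non-zero when the count is 0, so mask that status
    grep -cE '^\|[[:space:]]*[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}[[:space:]]*\|' || true
}
```

Possible usage: pipe the cinder list command from step 2 into count_uuid_rows and continue only when it prints 0.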
[root@gd02-control-11e115e64e13 ~]# nova help evacuate
usage: nova evacuate [--password <password>] [--force] <server> [<host>]
Evacuate server from failed host.
Positional arguments:
<server> Name or ID of server.
<host> Name or ID of the target host. If no host is
specified, the scheduler will choose one.
Optional arguments:
--password <password> Set the provided admin password on the evacuated
server. Not applicable if the server is on shared
storage.
--force Force to not verify the scheduler if a host is
provided. (Supported by API versions '2.29' -
'2.latest')
[root@gd02-control-11e115e64e13 ~]# nova help host-evacuate
usage: nova host-evacuate [--target_host <target_host>] [--force] <host>
Evacuate all instances from failed host.
Positional arguments:
<host> The hypervisor hostname (or pattern) to search
for. WARNING: Use a fully qualified domain name
if you only want to evacuate from a specific
host.
Optional arguments:
--target_host <target_host> Name of target host. If no host is specified
the scheduler will select a target.
--force Force to not verify the scheduler if a host is
provided. (Supported by API versions '2.29' -
'2.latest')
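The relationship between the two commands (host-evacuate loops over the instances and evacuates each one) can be sketched as a small loop. This is an illustration under stated assumptions, not the client's actual implementation: evacuate_each and the NOVA_CMD override are hypothetical, and the server IDs are expected one per line on stdin.

```shell
# Hypothetical sketch of "evacuate every instance in a list".
# stdin: one server ID per line; $1: optional target host.
# NOVA_CMD can be set to `echo` for a dry run (assumption, defaults to nova).
evacuate_each() {
    while IFS= read -r server_id; do
        [ -n "$server_id" ] || continue          # skip blank lines
        # ${1:+"$1"} appends the target host only when one was given;
        # </dev/null keeps the command from consuming the ID list on stdin
        ${NOVA_CMD:-nova} evacuate "$server_id" ${1:+"$1"} </dev/null
    done
}
```

With NOVA_CMD=echo the loop only prints the commands it would run, which is a cheap way to review a partial evacuation before executing it.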
Examples:
a. Evacuating a single instance
[root@hb02-other-172e28e8e132 ~]# nova evacuate ea24caf2-5e99-43f5-838f-164bb16891c0 hb02-other-172e28e8e139
ERROR (BadRequest): Compute service of hb02-other-172e28e8e137 is still in use. (HTTP 400) (Request-ID: req-683aee56-af5a-40eb-88c6-c18ebe4890df)
[root@hb02-other-172e28e8e132 ~]# nova hypervisor-list
+--------------------------------------+-------------------------+-------+---------+
| ID | Hypervisor hostname | State | Status |
+--------------------------------------+-------------------------+-------+---------+
| 29571b62-cd25-42bf-8bee-79a60db54032 | hb02-other-172e28e8e137 | down | enabled |
| 4f62f8d2-9fc5-4555-b362-8e99dc421181 | hb02-other-172e28e8e138 | up | enabled |
| a1a2e14b-ffeb-4a56-895b-56969a1e4de6 | hb02-other-172e28e8e139 | up | enabled |
+--------------------------------------+-------------------------+-------+---------+
[root@hb02-other-172e28e8e132 ~]# nova list --all --host hb02-other-172e28e8e137
+--------------------------------------+---------+----------------------------------+--------+------------+-------------+----------------------+
| ID | Name | Tenant ID | Status | Task State | Power State | Networks |
+--------------------------------------+---------+----------------------------------+--------+------------+-------------+----------------------+
| ea24caf2-5e99-43f5-838f-164bb16891c0 | dawei_2 | 9e5b5032812940d0830fe674517d5f66 | ACTIVE | - | Running | test1=192.168.101.15 |
+--------------------------------------+---------+----------------------------------+--------+------------+-------------+----------------------+
[root@hb02-other-172e28e8e132 ~]# nova evacuate ea24caf2-5e99-43f5-838f-164bb16891c0 hb02-other-172e28e8e139
[root@hb02-other-172e28e8e132 ~]# nova list --all --host hb02-other-172e28e8e139
+--------------------------------------+---------+----------------------------------+--------+------------+-------------+----------------------+
| ID | Name | Tenant ID | Status | Task State | Power State | Networks |
+--------------------------------------+---------+----------------------------------+--------+------------+-------------+----------------------+
| ea24caf2-5e99-43f5-838f-164bb16891c0 | dawei_2 | 9e5b5032812940d0830fe674517d5f66 | ACTIVE | - | Running | test1=192.168.101.15 |
+--------------------------------------+---------+----------------------------------+--------+------------+-------------+----------------------+
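The final nova list in this example confirms by eye that the instance now lives on the target host. A scripted version of that check might look like the sketch below; list_server_ids is a hypothetical helper and assumes the instance ID is the first column of the table.

```shell
# Hypothetical helper: print the server ID of every data row in a
# `nova list` table read from stdin (header and border rows are skipped).
list_server_ids() {
    awk -F'|' '
        {
            id = $2
            gsub(/[ \t]/, "", id)                 # strip cell padding
            # a UUID is 36 characters of hex digits and hyphens
            if (length(id) == 36 && id ~ /^[0-9a-f-]+$/) print id
        }
    '
}
```

Possible usage: nova list --all --host hb02-other-172e28e8e139 | list_server_ids | grep -qx ea24caf2-5e99-43f5-838f-164bb16891c0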
b. Evacuating a whole node
[root@sh03-control-10e114e56e42 ~]# nova host-evacuate --target_host sh03-compute-10e114e56e141 sh03-compute-10e114e56e89 --force
+--------------------------------------+-------------------+---------------+
| Server UUID | Evacuate Accepted | Error Message |
+--------------------------------------+-------------------+---------------+
| 4df2296a-536e-44a3-afcd-3750408856ee | True | |
| b4244c8d-00fe-4811-8238-c194dbd55376 | True | |
| 84136335-f480-4208-97cb-6e8d8542210d | True | |
| d98e1e61-4046-4a6e-b9f8-f13c90500026 | True | |
| 99f81672-0e37-4e06-949b-16b3412a7388 | True | |
| 5d7c6262-ab95-4bc7-8405-774f982b183a | True | |
| 06523349-ce43-4304-b6b1-948d7c639f7b | True | |
| 45868bf9-0440-477f-ba42-8029a5f824fe | True | |
| 571a77a0-653d-4dca-8a97-c7e675c10fa5 | True | |
| 54215bcc-2ec6-4e5a-ba31-2b01dff2e0d2 | True | |
| 4dfcd1d8-1e7f-46f7-8dbb-e571ba54268e | True | |
| ff0d4f74-1c89-4bd9-ace8-f1986abb96a8 | True | |
| fa804181-1073-4c5b-a774-2e49188556c6 | True | |
| 44c4b9b9-65e3-4b12-9a7c-105ec950e835 | True | |
| a455c316-786c-485a-87e8-d481736d747a | True | |
| 30bedfc4-dd75-4288-b9f7-3e9488370d08 | True | |
| 78a83bfc-3038-41b6-9ec0-7883960db0bc | True | |
| 69aafa11-4cab-4b1e-bf62-b98fdfe059f3 | True | |
| 8e8fb657-9e1c-492e-bbd2-ce3bf940164a | True | |
| 6a66fad7-87c6-47c7-8cf2-1fff99ce43b2 | True | |
| 1a726189-a5f7-474f-9d9f-5c5587fade74 | True | |
+--------------------------------------+-------------------+---------------+
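With many rows, a request that was not accepted is easy to miss. The sketch below lists the UUIDs whose "Evacuate Accepted" cell is anything other than True; list_rejected is a hypothetical name, and the column order is assumed to match the table above. Note that an accepted request only means the evacuation was started, so the instance state should still be checked afterwards.

```shell
# Hypothetical helper: read `nova host-evacuate` output on stdin and print
# the UUID of every row whose "Evacuate Accepted" cell is not "True".
list_rejected() {
    awk -F'|' '
        {
            id = $2; accepted = $3
            gsub(/[ \t]/, "", id)
            gsub(/[ \t]/, "", accepted)
            if (length(id) == 36 && id ~ /^[0-9a-f-]+$/ && accepted != "True") print id
        }
    '
}
```

An empty result means every request in the table was accepted.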
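Note: The host-evacuate command above moves every instance on the failed machine to the one target node, so first confirm that the target node has enough free resources, in particular vCPUs and memory. If it does not, evacuate some of the instances manually first, then evacuate the rest of the host.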
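The capacity check described in the note comes down to simple arithmetic. The sketch below is an illustration only: both function names are hypothetical, the per-instance vCPU and memory figures are assumed to come from the instances' flavors, the free figures are assumed to come from the target hypervisor's reported statistics, and any CPU or memory overcommit ratio is ignored.

```shell
# Hypothetical helper: sum "vcpus ram_mb" pairs (one instance per line on stdin)
# and print the totals as "total_vcpus total_ram_mb".
sum_requirements() {
    awk '{ v += $1; r += $2 } END { printf "%d %d\n", v, r }'
}

# Hypothetical helper: exit status 0 if the demand fits into the free capacity.
# $1 = vCPUs needed, $2 = RAM (MB) needed, $3 = vCPUs free, $4 = RAM (MB) free
fits_on_target() {
    [ "$1" -le "$3" ] && [ "$2" -le "$4" ]
}
```

For example, two instances with 2 vCPUs / 4096 MB and 4 vCPUs / 8192 MB need 6 vCPUs and 12288 MB in total; fits_on_target 6 12288 8 16384 succeeds, while fits_on_target 6 12288 4 16384 fails and signals that part of the load should go to another node.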
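Summary
- evacuate in effect uses the instance information in the database to create an identical VM on another host. Note that evacuation is only meaningful with shared storage: with local disks, the disk data cannot be moved once the physical machine is down.
- The nova client has two commands, {{nova evacuate}} and {{nova host-evacuate}}; the latter loops over the servers and runs evacuate on each one.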