
Sail Internet review

As one of the very first customers to get Sail wireless internet service installed, I got a much faster upload speed of around 200 Mbit/s; download is around 300 Mbit/s, still a lot faster than the Comcast service I currently have, especially considering the cost and the lack of a data cap.

It took the installer about an hour to get the antenna and pole installed on the rooftop. The Ethernet cable runs from an existing vent into the attic, and the Sail technician helped me wire it into my central panel, where I have a hub for the Ethernet cables going to the other floors and rooms.

So it’s painless to switch from Comcast cable to Sail: simply unplug the cable from the cable modem and plug in the new Ethernet cable from the PoE adapter, which supplies power to the rooftop antenna.

I’m not sure whether there is any management interface/UI for end users to check wireless status such as signal strength.

The major drawback is that network latency is noticeably higher, greater than 20 ms, compared to Comcast cable, which is usually below 10 ms. It may not be the best option for online gamers or other latency-sensitive applications. Personally, download and upload speed are my priority, so it’s no big deal.

Overall, the installation experience was very smooth; I highly recommend Sail if it’s available in your location.

I will see how things go as more people start adopting the Sail wireless service; I would assume the bandwidth is shared. So far so good.

Nova Placement API / NoValidHost

This is about finding the gap when the placement API database is out of sync, or when you keep receiving NoValidHost errors; there is a good chance the resources tracked in the placement database are out of date. One change in the Nova scheduler is that the memory, disk, and CPU filters are no longer there; instead, the scheduler first asks the placement API for a list of hypervisor candidates matching the flavor, and then continues with the remaining filters such as ComputeFilter, etc.
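If the osc-placement CLI plugin is available (it may not be on older releases), the same data can be inspected from the command line; the resource amounts below are just an example:

# list resource providers known to placement
openstack resource provider list

# inventory and usage for one provider
openstack resource provider inventory list &lt;provider-uuid&gt;
openstack resource provider usage show &lt;provider-uuid&gt;

# ask placement for allocation candidates matching a flavor's resources
openstack allocation candidate list --resource VCPU=4 --resource MEMORY_MB=8192 --resource DISK_GB=80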

DB query to get basic capacity details:

select p.uuid, p.name hypervisor_hostname,
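-- resource_class_id here assumes the standard classes: 0 = VCPU, 1 = MEMORY_MB, 2 = DISK_GB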
(select i.total from inventories i where p.id=i.resource_provider_id and i.resource_class_id=0) vcpus,
(select sum(used) from allocations a where a.resource_provider_id=p.id and a.resource_class_id=0) vcpus_used,
(select i.total from inventories i where p.id=i.resource_provider_id and i.resource_class_id=1) memory_mb,
(select sum(used) from allocations a where a.resource_provider_id=p.id and a.resource_class_id=1) memory_mb_used,
(select i.total from inventories i where p.id=i.resource_provider_id and i.resource_class_id=2) local_gb,
(select sum(used) from allocations a where a.resource_provider_id=p.id and a.resource_class_id=2) local_gb_used,
(select count(id) from allocations a where a.resource_provider_id=p.id and a.resource_class_id=2) running_vms
from resource_providers p order by p.name;

Combining the Nova DB and the placement DB

Assume this query runs against the placement API database and that the Nova cell database is named nova_cell_1.

select x.hypervisor_hostname, x.vcpus p_vcpus, y.vcpus n_vcpus, x.vcpus_used p_vcpus_used,y.vcpus_used n_vcpus_used, x.memory_mb p_memory_mb, y.memory_mb n_memory_mb, x.memory_mb_used p_memory_mb_used, y.memory_mb_used n_memory_mb_used, x.local_gb p_local_gb, y.local_gb n_local_gb, x.local_gb_used p_local_gb_used, y.local_gb_used n_local_gb_used, x.running_vms p_running_vms, y.running_vms n_running_vms from
(select p.name hypervisor_hostname,
(select i.total from inventories i where p.id=i.resource_provider_id and i.resource_class_id=0) vcpus,
(select sum(used) from allocations a where a.resource_provider_id=p.id and a.resource_class_id=0) vcpus_used,
(select i.total from inventories i where p.id=i.resource_provider_id and i.resource_class_id=1) memory_mb,
(select sum(used) from allocations a where a.resource_provider_id=p.id and a.resource_class_id=1) memory_mb_used,
(select i.total from inventories i where p.id=i.resource_provider_id and i.resource_class_id=2) local_gb,
(select sum(used) from allocations a where a.resource_provider_id=p.id and a.resource_class_id=2) local_gb_used,
(select count(id) from allocations a where a.resource_provider_id=p.id and a.resource_class_id=2) running_vms
from resource_providers p) x
join
(select hypervisor_hostname, vcpus, vcpus_used, memory_mb, memory_mb_used, local_gb, local_gb_used, running_vms from nova_cell_1.compute_nodes where deleted=0 ) y
on x.hypervisor_hostname=y.hypervisor_hostname
order by x.hypervisor_hostname;

Is there any utility to rebuild the DB?
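If I remember correctly, newer releases (Rocky and later) ship a nova-manage command that recreates missing allocations in placement from the instances Nova knows about; whether it applies depends on your release:

# create placement allocations for instances that are missing them
nova-manage placement heal_allocations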

Hackintosh 8700k z370n rx580 10.13.6

hardware list

Motherboard : Gigabyte Z370N Wifi ITX

CPU:  Intel i7-8700K

CPU Fan: Le Grand Macho RT

Memory: CORSAIR – Vengeance LPX 32GB (2PK x 16GB) 2.6 GHz DDR4 (certified compatible with  z370n)

WIFI+Bluetooth board:  Broadcom BCM94352Z DW1560 6XRYC 802.11 AC 867 Mbps Bluetooth 4.0

Graphics : Sapphire NITRO+ Radeon RX 580 8GD5

PSU: CORSAIR RMx Series RM550x

Case: Fractal Design Define Nano S Black Silent Mini ITX Mini Tower

Keyboard: Apple Magic Keyboard & Magic Trackpad 2

Disk: XPG SX8200 PCIe NVMe Gen3x4 M.2 2280 480GB SSD (ASX8200NP-480GT-C) w/ Black XPG Heatsink

OS:  High Sierra 10.13.6

Replace the Wi-Fi module on the motherboard with the DW1560.

Installed 10.13.6 without the RX 580, following the guide at https://hackintosher.com/builds/gigabyte-z370n-wifi-itx-hackintosh-guide-4k-htpc-build/ (SMBIOS iMac18,1, with internal graphics).

After a successful install, I changed it to iMac18,3 before shutting down.

Quoting the guide:

“If using a dedicated Nvidia or AMD graphics card change ig-platform-id to 0x59120003 and use iMac 18,2/iMac 18,3”

  1. Disable internal graphics and change the initial display output to PCIe slot 1 in the BIOS.
  2. Shut down. (Do not try to boot into the OS before everything is complete; it will not work.)
  3. Install the RX 580 card.
  4. Connect the monitor via HDMI or DP; both worked fine for me.
  5. Boot up.

The RX 580 worked OOB, with no patch applied. (Make sure the Wi-Fi antennas are installed, otherwise the connection will be very flaky even at close range.)

Before ordering parts, please make sure:

  1. The case is tall enough to hold the CPU cooler.
  2. Either the 8700 or the 8700K is fine, depending on how much cheaper one is…
  3. There is no need for a standalone graphics card if you’re not playing games… I’m not sure how much it helps for video editing.

Set up Spice Console

Basic Flow

Obtain a URL for the web-based SPICE console:

openstack console url show --spice UUID

+-------+-----------------------------------------------------------------------------------------------------+
| Field | Value                                                                                               |
+-------+-----------------------------------------------------------------------------------------------------+
| type  | spice-html5                                                                                         |
| url   | http://yyy.xxx.com/spice_auto.html?token=9654cb37-000-000-000-fbf30f17293b |
+-------+-----------------------------------------------------------------------------------------------------+

Stick the above URL into a browser; it loads the SPICE console written in JavaScript, which connects back to spicehtml5proxy, which in turn forwards traffic to the corresponding HV.

Obtain the access URL and token

Request path:

Client → Nova API → RPC (get_spice_console) → Cell → RPC (get_spice_console) → HV

Response path:

Client ← (URL + token) ← Nova API ← Cell (saves token → connection info into the console auth store with a TTL) ← (host, port, URL + token, token) ← HV

In the browser:

The URL first loads the client app.

Round 1:

browser → (URL with token) → VIP → spicehtml5proxy (static content served from the [DEFAULT] web directory)

browser ← VIP ← (SPICE web-based client) ← spicehtml5proxy

Round 2:

browser ←→ (WebSocket-based SPICE traffic + token) ←→ VIP ←→ spicehtml5proxy (obtains connection info by token from console auth, then forwards traffic) ←→ HV

* Python >= 2.7.4 is required on the controller.

Nova Controller setting

[DEFAULT]
web = /usr/share/spice-html5 (default location from which all web content gets loaded)
[vnc]
enabled = False
[spice]
agent_enabled = True (optional)
enabled = True
html5proxy_base_url = http://os-vnc-vip-b01.ccg23.paypalc3.com/spice_auto.html (URL the end user can reach from the browser, VIP etc...)
#server_listen  (used by the HV only)
#server_proxyclient_address  (used by the HV only)
html5proxy_host = <controller IP the proxy listens on; must be reachable from the VIP>
html5proxy_port = <port the proxy listens on>
[console]
allowed_origins = http://os-vnc-vip-b01.ccg23.paypalc3.com (to be safe in case the LB modifies Origin or Host)
token_ttl = <token TTL>

Hypervisor setting

The HV returns connection details telling the proxy where to connect back to, including host and port.

The token is also generated on the HV, and the cell manager later adds it into console auth, along with the full access URL containing the token, so html5proxy_base_url is required on every HV as well.

[vnc]
enabled = False
[spice]
agent_enabled = True (optional)
enabled = True
server_listen = <IP reachable from the controller>
server_proxyclient_address = <hostname or IP reachable from the controller>
# server_listen and server_proxyclient_address need to be identical
html5proxy_base_url = (URL the end user can reach from the browser, VIP etc...)

Console-auth & token store backend

By default, console-auth saves everything in a local dictionary, so each spicehtml5proxy or cell manager (when authorizing a new token) may talk to a different console-auth instance, each maintaining its own independent token cache.

Fortunately, the token store is implemented with the oslo.cache library, whose caching backend is configurable.

In production, memcached should be used so that console-auth can scale.

[cache]
enabled = True
memcache_servers = .....
[consoleauth]
token_ttl = <token TTL>

More details on oslo.cache configuration:

https://docs.openstack.org/oslo.cache/latest/configuration/index.html

VIP/LB

Last but not least, the VIP/LB has to support either WebSocket (if doing L7) or just plain TCP, since SPICE traffic requires upgrading HTTP to WebSocket.
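For example, a minimal HAProxy sketch in plain TCP mode (the names, backend addresses, and the 6082 spicehtml5proxy port below are assumptions to adapt):

frontend spice_console
    bind *:80
    mode tcp
    default_backend spice_html5_proxies

backend spice_html5_proxies
    mode tcp
    # keep long-lived WebSocket tunnels open
    timeout tunnel 1h
    server proxy1 192.0.2.11:6082 check
    server proxy2 192.0.2.12:6082 check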

OpenStack Nova Live Migration Security: SASL

Live migration setup

For block-based migration, set the following flag (or leave it at the default):

# /etc/nova/nova.conf

block_migration_flag=VIR_MIGRATE_UNDEFINE_SOURCE,VIR_MIGRATE_PEER2PEER,VIR_MIGRATE_LIVE,VIR_MIGRATE_TUNNELLED,VIR_MIGRATE_NON_SHARED_INC,VIR_MIGRATE_AUTO_CONVERGE

====TCP based======

/etc/nova/nova.conf

——————

..

[libvirt]

live_migration_uri = "qemu+tcp://%s/system"

..

/etc/libvirt/libvirtd.conf

——————

listen_tls = 0
listen_tcp = 1
unix_sock_group = "libvirt"
unix_sock_ro_perms = "0777"
unix_sock_rw_perms = "0770"
auth_unix_ro = "none"
auth_unix_rw = "none"
auth_tcp = "none"

/etc/default/libvirtd

——————————

start_libvirtd="yes"
libvirtd_opts="-l"

*restart both libvirt-bin & nova-compute *

testing

virsh -c qemu+tcp://hostname_peer/system hostname

====TCP+SASL ======

install

apt update

apt install sasl2-bin

/etc/nova/nova.conf

——————

..

[libvirt]

live_migration_uri = "qemu+tcp://%s/system"

..

/etc/libvirt/libvirtd.conf

——————

listen_tls = 0
listen_tcp = 1
unix_sock_group = "libvirt"
unix_sock_ro_perms = "0777"
unix_sock_rw_perms = "0770"
auth_unix_ro = "none"
auth_unix_rw = "none"
auth_tcp = "sasl"

/etc/default/libvirtd

——————————

start_libvirtd="yes"
libvirtd_opts="-l"

/etc/sasl2/libvirt.conf

—————————————

mech_list: digest-md5

sasldb_path: /etc/sasldb2

Create user test with password nova:

———

saslpasswd2 -a libvirt test

sasldblistusers2 -f /etc/sasldb2

Enable the libvirt client to authenticate automatically without prompting:

——

/etc/libvirt/auth.conf

——————————

[credentials-defgrp]

authname=test

password=nova

[auth-libvirt-default]

credentials=defgrp

*restart both libvirt-bin & nova-compute *

Test; it should just work, with no prompt for username & password:

virsh -c qemu+tcp://hostname_peer/system hostname

=====================

====TLS full validation ======

key tools

apt-get install gnutls-bin

/etc/nova/nova.conf

——————

..

[libvirt]

live_migration_uri = "qemu+tls://%s/system"

..

/etc/libvirt/libvirtd.conf

——————

listen_tls = 1
tls_no_verify_certificate = 0
tls_no_verify_address = 0
listen_tcp = 0
unix_sock_group = "libvirt"
unix_sock_ro_perms = "0777"
unix_sock_rw_perms = "0770"
auth_unix_ro = "none"
auth_unix_rw = "none"
auth_tls = "none"

/etc/default/libvirtd

——————————

start_libvirtd="yes"
libvirtd_opts="-l"

generate key pairs & certs following

https://wiki.libvirt.org/page/TLSSetup
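Roughly, the wiki's flow with gnutls certtool looks like this (the .info template contents come from the wiki; the install paths below are the usual libvirt defaults and should be double-checked):

# CA key and self-signed CA certificate
certtool --generate-privkey > cakey.pem
certtool --generate-self-signed --load-privkey cakey.pem --template ca.info --outfile cacert.pem

# server key and certificate signed by the CA (clients are done the same way)
certtool --generate-privkey > serverkey.pem
certtool --generate-certificate --load-privkey serverkey.pem --load-ca-certificate cacert.pem --load-ca-privkey cakey.pem --template server.info --outfile servercert.pem

# typical install locations:
# /etc/pki/CA/cacert.pem
# /etc/pki/libvirt/servercert.pem
# /etc/pki/libvirt/private/serverkey.pem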

service nova-compute restart

virsh -c qemu+tls://venus-2/system hostname

virsh -c qemu+tls://venus-6/system hostname

*restart both libvirt-bin & nova-compute *

test, should just work

virsh -c qemu+tls://hostname_peer/system hostname

======================

===TLS no validation ======

key tools

apt-get install gnutls-bin

/etc/nova/nova.conf

——————

..

[libvirt]

live_migration_uri = "qemu+tls://%s/system"

..

/etc/libvirt/libvirtd.conf

——————

listen_tls = 1
tls_no_verify_certificate = 1
tls_no_verify_address = 1
listen_tcp = 0
unix_sock_group = "libvirt"
unix_sock_ro_perms = "0777"
unix_sock_rw_perms = "0770"
auth_unix_ro = "none"
auth_unix_rw = "none"
auth_tls = "none"

/etc/default/libvirtd

——————————

start_libvirtd="yes"
libvirtd_opts="-l"

generate key pairs & certs following

https://wiki.libvirt.org/page/TLSSetup

service libvirt-bin restart

service nova-compute restart

virsh -c qemu+tls://host-1/system hostname

virsh -c qemu+tls://host-2/system hostname

*restart both libvirt-bin & nova-compute *

testing

virsh -c "qemu+tls://host-1/system?no_verify=1" hostname

virsh -c "qemu+tls://host-2/system?no_verify=1" hostname

========TLS no validation + SASL========

Following the TLS-no-validation setup, with a few changes:

/etc/libvirt/libvirtd.conf

——————

listen_tls = 1
tls_no_verify_certificate = 1
tls_no_verify_address = 1
listen_tcp = 0
unix_sock_group = "libvirt"
unix_sock_ro_perms = "0777"
unix_sock_rw_perms = "0770"
auth_unix_ro = "none"
auth_unix_rw = "none"
auth_tls = "sasl"

/etc/sasl2/libvirt.conf

—————————————

mech_list: digest-md5

sasldb_path: /etc/sasldb2

  • scram-sha-1 requires properly signed certs

Create user test with password nova:

———

saslpasswd2 -a libvirt test

sasldblistusers2 -f /etc/sasldb2

Enable the libvirt client to authenticate automatically without prompting:

——

/etc/libvirt/auth.conf

——————————

[credentials-defgrp]

authname=test

password=nova

[auth-libvirt-default]

credentials=defgrp

*restart both libvirt-bin & nova-compute *

testing

virsh -c “qemu+tls://host-1/system?no_verify=1” hostname

====ssh tunneling ======

No libvirt change required

Only nova.conf changes are needed, plus generating keys for the nova user.

nova.conf

live_migration_uri = "qemu+ssh://nova@%s/system?no_verify=1&keyfile=/var/lib/nova/.ssh/id_rsa"

Note that the nova user and the path to its private key file are specified in the URI.

Add the public key to the destination hosts' authorized_keys files.

Note: the nova user needs a login shell… further research is required to limit the set of binaries it can run (one idea is sketched below).
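One possible (untested here) way to narrow what the key can do is a forced command in authorized_keys on the destination; with older libvirt the ssh transport essentially just needs netcat access to the libvirt socket, so something along these lines might work (the socket path and nc options are assumptions that vary by distro and libvirt version):

# /var/lib/nova/.ssh/authorized_keys on the destination HV
no-port-forwarding,no-X11-forwarding,no-agent-forwarding,command="nc -q 0 -U /var/run/libvirt/libvirt-sock" ssh-rsa AAAA... nova@source-hv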

Enable live migration using local disk with Nova

Instructions:

  • Enable key-based authentication on the destination HV, allowing root login from the source using the key.
  • Modify the Nova configuration to pass extra arguments to libvirt.
  • Restart only nova-compute.
  • Launch the live migration from the Nova CLI.

Verify /etc/nova/nova.conf on the main controller (if there are no cells) or on the cell controllers to make sure the UUID is used as the VM name; otherwise Nova will complain that there is no VM with name ‘instance-xxxx’ if the primary key changes in the Nova database. Make sure instance_name_template uses the UUID in the [DEFAULT] section:

Force using uuid as instance name

#/etc/nova/nova.conf

instance_name_template=%(uuid)s

# requires restarting nova-conductor on main controllers or Cell controllers, depending where change is made.

Verify that both the short name and the FQDN of the destination are resolvable; otherwise add them to the local hosts file /etc/hosts:

# /etc/hosts, in case the hostname cannot be properly resolved
192.168.1.100  test1 test1.localdomain.com
Generate a key pair for the root user on the source HV (this is what enables key-based root access to the destination):

# this is done on the source
ssh-keygen -t rsa

Copy the public key to the destination:

ssh-copy-id root@test1

Verify that we can SSH to the destination by its short name as root, to make sure key-based authentication works and, at the same time, to accept and persist the destination host key on the source:

ssh root@test1

Modify /etc/nova/nova.conf to enable live migration over SSH tunneling and local-disk (block) based live migration:
# /etc/nova/nova.conf
[DEFAULT]

live_migration_uri=qemu+ssh://%s/system
live_migration_flag=VIR_MIGRATE_UNDEFINE_SOURCE,VIR_MIGRATE_PEER2PEER,VIR_MIGRATE_LIVE,VIR_MIGRATE_NON_SHARED_INC
block_migration_flag=VIR_MIGRATE_UNDEFINE_SOURCE,VIR_MIGRATE_PEER2PEER,VIR_MIGRATE_LIVE,VIR_MIGRATE_NON_SHARED_INC

Restart the nova-compute service:

service nova-compute restart

Now, migration through the Nova CLI should work:

nova live-migration --block-migrate a0a4f38d-ee65-4126-a90f-f4dcbb12d7e6 hv2

The VM state changes to migrating, and back to active when the process completes.
Concerns:

Migration across cells is not supported.

Havana’s nova-compute has a bug when dealing with nbd mounts. This requires a code change:
Code fix in the Havana version of nova-compute
nova/virt/disk/mount/nbd.py
# added self.partition = 1 at line 72

if not os.path.exists('/sys/block/nbd0'):
    LOG.error(_('nbd module not loaded'))
    self.error = _('nbd unavailable: module not loaded')
    return None
devices = self._detect_nbd_devices()
random.shuffle(devices)
device = self._find_unused(devices)
if not device:
    # really want to log this info, not raise
    self.error = _('No free nbd devices')
    return None
self.partition = 1
return os.path.join('/dev', device)
...

# then restart nova-compute

Quick qmailtoaster migration to Postfix

Assume you are running example.com with qmailtoaster.

  1. Install the latest iRedMail, which significantly simplifies the whole installation and configuration process.
  2. Dump the vpopmail database and import it into a new DB (a minimal dump-and-import sketch follows the SQL below).
  3. Copy /home/vpopmail/domains/* to /var/vmail1.
  4. chown -R vmail:vmail /var/vmail1
  5. Run the following SQL to move all the accounts into the Postfix vmail.mailbox table.
  6. Done.

INSERT INTO vmail.mailbox ( username      , password                               , name      ,language  , storagebasedirectory   , storagenode   , maildir                , quota , domain           , transport , department , rank     , employeeid , isadmin , isglobaladmin , enablesmtp , enablesmtpsecured , enablepop3 , enablepop3secured , enableimap , enableimapsecured , enabledeliver , enablelda , enablemanagesieve , enablemanagesievesecured , enablesieve , enablesievesecured , enableinternal , enabledoveadm , `enablelib-storage` , `enableindexer-worker` , enablelmtp , enabledsync , enablesogo  , lastlogindate       , lastloginipv4 , lastloginprotocol , settings                  , passwordlastchange  , created , modified, expired               , active , local_part )
SELECT                    concat(r.pw_name,'@example.com') , r.pw_passwd , r.pw_gecos, 'en_US'  , '/var/vmail'           , 'vmail1'      , substring(r.pw_dir,24) ,     0 ,'example.com' , ''        , ''         , 'normal' , ''       ,       0 ,             0 ,          1 ,                 1 ,          1 ,                 1 ,          1 ,                 1 ,             1 ,         1 ,                 1 ,                        1 ,           1 ,                  1 ,              1 ,             1 ,                 1 ,                    1 ,          1 ,           1 ,          1  , '1970-01-01 01:01:01' ,           0 , ''                , '' , '1970-01-01 01:01:01' , now()   , now()   , '9999-12-31 00:00:00' ,      1 , r.pw_name
FROM vpopmail.example_com r;
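For step 2, a minimal dump-and-import sketch (database names and credentials are assumptions to adjust):

# on the old qmailtoaster box
mysqldump -u root -p vpopmail > vpopmail.sql

# on the new iRedMail box
mysql -u root -p -e 'CREATE DATABASE vpopmail'
mysql -u root -p vpopmail < vpopmail.sql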

Havana message reorder fix

Probably a few people are still running Havana. A critical messaging-related issue there is message reordering.

Nova RPC sends back results in multiple messages, which may go through different connections and reach the node that owns the master queue through different routes in the RabbitMQ cluster.

As a result, RPC results may end up corrupted or incomplete.

You can see that at least two messages are sent out for a single RPC result, each sent over a connection grabbed from the pool on the fly.

nova/openstack/common/rpc/amqp.py

    
   def _process_data(self, ctxt, version, method, namespace, args):
        ctxt.update_store()

        try:
            rval = self.proxy.dispatch(ctxt, version, method, namespace,
                                       **args)
            # Check if the result was a generator
            if inspect.isgenerator(rval):
                for x in rval:
                    ctxt.reply(x, None, connection_pool=self.connection_pool)
            else:
                ctxt.reply(rval, None, connection_pool=self.connection_pool)
            # This final None tells multicall that it is done.
            ctxt.reply(ending=True, connection_pool=self.connection_pool) 

class RpcContext(rpc_common.CommonRpcContext):

    def reply(self, reply=None, failure=None, ending=False,
              connection_pool=None, log_failure=True):
        if self.msg_id:
            msg_reply(self.conf, self.msg_id, self.reply_q, connection_pool,
                      reply, failure, ending, log_failure)
            if ending:
                self.msg_id = None
def msg_reply(conf, msg_id, reply_q, connection_pool, reply=None,
              failure=None, ending=False, log_failure=True):
    ....
    with ConnectionContext(conf, connection_pool) as conn:
    ....
        if reply_q:
            msg['_msg_id'] = msg_id
            conn.direct_send(reply_q, rpc_common.serialize_msg(msg))
        else:
            conn.direct_send(msg_id, rpc_common.serialize_msg(msg))

A quick fix is to force all messages for the same reply to go through the same connection.

class RpcContext(rpc_common.CommonRpcContext):
    def reply2(self, reply=None, connection_pool=None):
        if self.msg_id:
            msg_reply2(self.conf, self.msg_id, self.reply_q,
                       connection_pool, reply)
            self.msg_id = None


def msg_reply2(conf, msg_id, reply_q, connection_pool, reply=None):
    def reply_msg(content, ending, conn):
        msg = {'result': content, 'failure': None}
        if ending:
            msg['ending'] = True
        _add_unique_id(msg)
        if reply_q:
            msg['_msg_id'] = msg_id
            conn.direct_send(reply_q, rpc_common.serialize_msg(msg))
        else:
            conn.direct_send(msg_id, rpc_common.serialize_msg(msg))

    # hold a single connection for the whole reply so every part,
    # including the final "ending" message, travels over one channel
    with ConnectionContext(conf, connection_pool) as conn:
        # Check if the result was a generator
        if inspect.isgenerator(reply):
            for x in reply:
                reply_msg(x, False, conn)
        else:
            reply_msg(reply, False, conn)
        # this final message tells multicall that it is done
        reply_msg(None, True, conn)



class ProxyCallback(_ThreadPoolWithWait):

    def _process_data(self, ctxt, version, method, namespace, args):
        """Process a message in a new thread.

        If the proxy object we have has a dispatch method
        (see rpc.dispatcher.RpcDispatcher), pass it the version,
        method, and args and let it dispatch as appropriate.  If not, use
        the old behavior of magically calling the specified method on the
        proxy we have here.
        """
        ctxt.update_store()
        try:
            rval = self.proxy.dispatch(ctxt, version, method, namespace,
                                       **args)
            ctxt.reply2(rval, self.connection_pool)
        except rpc_common.ClientException as e:
            LOG.debug(_('Expected exception during message handling (%s)') %
                      e._exc_info[1])
            ctxt.reply(None, e._exc_info,
                       connection_pool=self.connection_pool,
                       log_failure=False)
        except Exception:
            # sys.exc_info() is deleted by LOG.exception().
            exc_info = sys.exc_info()
            LOG.error(_('Exception during message handling'),
                      exc_info=exc_info)
            ctxt.reply(None, exc_info, connection_pool=self.connection_pool)


A more robust fix would reconstruct results based on a sequence number and the total number of messages; such a fix would always need to handle timeouts, etc.
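A hypothetical sketch of that idea (not Havana code; the names are made up): tag every part of a reply with a sequence number and a total, then reassemble on the receiving side regardless of arrival order.

def tag_parts(parts, msg_id):
    # attach a sequence number and total count to each outgoing part
    total = len(parts)
    return [{'_msg_id': msg_id, 'seq': i, 'total': total, 'result': p}
            for i, p in enumerate(parts)]

def reassemble(buffered, incoming):
    # collect parts for one reply; return the ordered results once complete
    buffered.append(incoming)
    if len(buffered) == incoming['total']:
        return [m['result'] for m in sorted(buffered, key=lambda m: m['seq'])]
    return None  # still waiting; a real fix also needs a timeout for lost parts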

Kilo has already fixed this issue by returning the result in a single message.

nova suspend issue

nova suspend operations all ended up in an error state.

Looking at the libvirt log, I saw the error: qemuMigrationUpdateJobStatus:946 : operation failed: domain save job: unexpectedly failed

I found that the save images were generated in the following directory:

/var/lib/libvirt/qemu/save

/var is mounted on a separate partition with limited space, so it ran out of space on /var again…

The workaround is to create a soft link to a different location with more space.
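A minimal sketch of that workaround, assuming /data is a partition with enough free space (the path is an assumption; keep ownership, permissions, and any SELinux/AppArmor labels matching the original directory):

# move the save directory to a bigger partition and link it back
mv /var/lib/libvirt/qemu/save /data/libvirt-qemu-save
ln -s /data/libvirt-qemu-save /var/lib/libvirt/qemu/save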