- 12 Jan, 2022 1 commit
-
-
Harald Jensås authored
On neutron routed provider networks IP allocation is deferred until 'binding:host_id' is set. When ironic creates neutron ports it first creates the port, then updates the port setting binding information. When using IPv6 networking ironic adds additional address allocations to ensure network chain-booting will succeed. When address allocation is deferred on port create ironic cannot detect that IPv6 is used and does not add the required additional addresses. This change ensures the 'port' object is updated after the port update setting the port binding required for neutron to allocate the address. This allows ironic to correctly detect IPv6 is used, and it will add the required IP address allocations. Story: 2009773 Task: 44254 Change-Id: I863dd4ab9615a9ce3b3dcb8798af674ac9966bf2 (cherry picked from commit 3404dc91)
-
- 09 Dec, 2021 1 commit
-
-
Julia Kreger authored
We're seeing OOM events in CI, hopefully this helps. Change-Id: Id8c0e4830011ca2fa526df461ed5b9b01f769cf9
-
- 08 Nov, 2021 1 commit
-
-
Aija Jauntēva authored
Because setting the boot device through the WS-Man iDRAC API requires creating a BIOS job, and iDRAC allows only one open job per subsystem, there is validation that checks that the job queue is empty before continuing to set the boot device. This does not work well when using autoupdatescheduler, which creates a `Repository Update` job that stays Scheduled until executed and is then followed by a new Scheduled `Repository Update` job. This patch allows non-BIOS jobs to be present in the queue when setting the boot device. It will still fail when BIOS jobs are present. In such cases, consider moving to idrac-redfish, which does not create a BIOS job or any other job to set the boot device. Story: 2009251 Task: 43437 Change-Id: I91e9ba3024a85897aeead21cede57464294b409b (cherry picked from commit b1d08ae8)
-
- 14 Sep, 2021 1 commit
-
-
Aija Jauntēva authored
set_power_state has returned to the caller immediately without confirming the system has reached the requested state. This fixes that by synchronously waiting until the target state has been read before returning. That bug can cause instance workload deployments to fail on Dell EMC PowerEdge server models on which IPA ramdisk soft power off fails and ironic employs its OOB fallback strategy. After an otherwise successful deployment, the node is active, but is powered off. No error is reported in last_error. If the subsequent instance workflow expects the system to be powered on into the operating system, it fails. Story: 2009204 Task: 43261 Change-Id: I3112a22149c07e5508f26c79f33d09aeb905c308 (cherry picked from commit 2a0fd1d1)
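The synchronous wait described above can be sketched as a simple polling loop. This is a minimal illustration, not the driver's code: `wait_for_power_state`, its parameters, and the default timeout and interval are all hypothetical names and values chosen for the example.

```python
import time


def wait_for_power_state(get_state, target, timeout=60.0, interval=1.0,
                         sleep=time.sleep, clock=time.monotonic):
    """Poll get_state() until it returns target or timeout seconds elapse.

    All names and defaults here are hypothetical; get_state stands in for
    whatever call reads the current power state from the BMC.
    """
    deadline = clock() + timeout
    while True:
        state = get_state()
        if state == target:
            # Only return once the requested state has actually been read.
            return state
        if clock() >= deadline:
            raise TimeoutError(
                "power state is %r, expected %r" % (state, target))
        sleep(interval)
```

The `sleep` and `clock` arguments are injected so the loop can be exercised in tests without real delays.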
-
- 03 Aug, 2021 1 commit
-
-
Dmitry Tantsur authored
Also fix the documentation to use the correct paths and versions. Change-Id: I7f004d40c1b8c617f9a456216df091e44d69693f (cherry picked from commit 294046be)
-
- 29 Jul, 2021 1 commit
-
-
Zuul authored
-
- 27 Jul, 2021 1 commit
-
-
Riccardo Pittau authored
All the tox jobs are based on openstack-tox, we should convert ironic-tox-unit-with-driver-libs too. Change-Id: I20836d586edccfb8cd8fed1f3a89f1497ff96943 (cherry picked from commit 475af371)
-
- 26 Jul, 2021 1 commit
-
-
Dmitry Tantsur authored
Actually, it's only available on RHEL/CentOS 8, but I hope the other distributions will catch up. Change-Id: I53314b8f16fd7b965c58370e33ab83501e7cb067 (cherry picked from commit 3199d289)
-
- 05 Jul, 2021 1 commit
-
-
Dmitry Tantsur authored
In order to avoid potential cache coherency issues when using a globally cached AgentClient, e.g. with TLS certificates from the IPA, cache the AgentClient on a per task basis. Co-Authored-By:
Arne Wiebalck <arne.wiebalck@cern.ch> Conflicts: ironic/drivers/modules/agent.py ironic/drivers/modules/agent_base.py ironic/drivers/modules/ansible/deploy.py ironic/drivers/modules/iscsi_deploy.py ironic/tests/unit/drivers/modules/test_agent.py Story: #2009004 Task: #42678 Change-Id: I0c458c8d9ae673181beb6d85c2ee68235ccef239 (cherry picked from commit fcb6a109)
-
- 29 Jun, 2021 1 commit
-
-
Dhuldev Valekar authored
Fixes an issue of powering off with the ``idrac-wsman`` management interface while the execution of a clear job queue cleaning step is proceeding. Prior to this fix, the clean step would fail when powering off a node. Story: 2008988 Task: 42641 Change-Id: Ib4ab755c806f028d97379b80a8c27d6ade63cba1 (cherry picked from commit 741a4d8a)
-
- 21 Jun, 2021 1 commit
-
-
Julia Kreger authored
The instance_uuid handling on the detailed node information endpoint of the API (/v1/nodes/detail?instance_uuid=<uuid>), which is used by services such as Nova for explicit node status lookups, previously had special conditional logic that kept the API requestor's project ID from being incorporated into the database query. Ultimately, this allowed an authenticated user to obtain a partially redacted node entry where sensitive informational fields were scrubbed from the response payload. With this fix, queries for an explicit instance_uuid now follow the standard path inside the Ironic API to the database, which includes the requestor's project ID if required by configured policy. Change-Id: I9bfa5a54e02c8a1e9c8cad6b9acdbad6ab62bef3 Story: 2008976 Task: 42620 (cherry picked from commit be3c153d)
-
- 09 Jun, 2021 1 commit
-
-
Aija Jauntēva authored
  - Re-usable helper created to avoid duplication.
  - Although there is only one manager per system in known iDRAC systems, still iterate through the collection to allow for future changes.
  - Restructured exception raising and error logging for better feedback.
  - Removed some unit tests to avoid duplicating what is covered by method-specific unit tests.
Change-Id: I03fdb48e47c9557c207a20ee876eccf3f3459d9f (cherry picked from commit 39cd751a)
-
- 02 Jun, 2021 2 commits
-
-
Zuul authored
-
Julia Kreger authored
The branch was never updated during the release and we have some jobs (and developers) which could pull in master branch IPA as a result in testing and whatnot. In order to prevent this possible case and confusion, the branch should be set on the job. Change-Id: I8034bff4068463fe9b28fe721cdb9bb58367728e
-
- 31 May, 2021 1 commit
-
-
Julia Kreger authored
Yes, project conundrum is a code-name for our transition to OFTC, as we do not want to put anything into freenode IRC which may abruptly result in the channel being seized or shut down by the new owners/operators of freenode. Change-Id: I45c07e0b2138f6643f865d58155c64317114fd02 See: http://lists.openstack.org/pipermail/openstack-discuss/2021-May/022718.html (cherry picked from commit d5971fdf)
-
- 19 May, 2021 1 commit
-
-
LinPeiWen authored
The OpenStack Ussuri and Victoria releases no longer support the CentOS 7 and Python 2 environment packages. Correct the related problems in the latest documentation. Change-Id: I60787243fdc6ed2741522355ec79970bdb912f41 (cherry picked from commit 35dea078) (cherry picked from commit 77be4c6c)
-
- 10 May, 2021 1 commit
-
-
Riccardo Pittau authored
Since ironic-python-agent-builder has stable branches starting from wallaby, we need to point all the older branches to stable/wallaby, unless they're already pinned to an older version. Change-Id: I90a0d4d75fb4581805f11e79ca7185cfdb66f77a
-
- 07 May, 2021 1 commit
-
-
Dmitry Tantsur authored
If the agent accepts a command, but is unable to reply to Ironic (which sporadically happens because of eventlet's TLS implementation), we currently retry the request and fail because the command is already executing. Ironic now detects this situation by checking the list of executing commands after receiving a connection error. If the requested command is the last one, we assume that the command request succeeded. Ideally, we should pass a request ID to IPA and then compare it. Such a change would affect the API contract between the agent and Ironic and thus would not be backportable. Change-Id: I2ea21c9ec440fa7ddf8578cf7b34d6d0ebbb5dc8 (cherry picked from commit abfe383c)
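The detection logic described above can be sketched roughly as follows. Everything here is an assumption made for illustration: `send_command`, the `post` and `list_commands` callables, and the `command_name` key are hypothetical stand-ins, not the real agent client API.

```python
def send_command(post, list_commands, name):
    """Send a command; treat a lost reply as success if it is running.

    post(name) sends the command and list_commands() returns the agent's
    command list, oldest first. Both callables and the 'command_name' key
    are hypothetical placeholders for this sketch.
    """
    try:
        return post(name)
    except ConnectionError:
        # The agent may have accepted the request even though its reply
        # never arrived. Check whether the newest command is ours.
        commands = list_commands()
        if commands and commands[-1].get("command_name") == name:
            return commands[-1]
        # Otherwise the request really did not go through.
        raise
```

Comparing only the command name is a heuristic, which is why the commit message notes that a request ID would be the more robust choice.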
-
- 05 May, 2021 1 commit
-
-
Dmitry Tantsur authored
InvalidImageRef is a kind of InvalidParameterValue and can happen during validation, causing a traceback now. Change-Id: I5f10fe7240e74d337f991bbd1a5220cc4e713de7 (cherry picked from commit 47398edd)
-
- 21 Apr, 2021 1 commit
-
-
Dmitry Tantsur authored
Also make sure the pregenerated flag is always reset. Change-Id: I73aaa803d3eb84ddac59a778e998836a645217eb (cherry picked from commit c6e8281f)
-
- 15 Apr, 2021 2 commits
- 14 Apr, 2021 1 commit
-
-
Armstrong Liu authored
Some higher versions of grub2 (e.g. 2.05 or 2.06-rc1) use grub.cfg-01-MAC, while other, lower versions of grub2 (e.g. 2.04) use MAC.conf, so we generate both paths in order to be compatible with both. For more information, see: https://docs.oracle.com/en/operating-systems/oracle-linux/7/install/ol7-install-prepare.html#ol7-install-pxe-boot-uefi Change-Id: Icbdd284de38b8e54c52cdbba709bba0e644c35cd Signed-off-by:
Armstrong Liu <vpbvmw651078@gmail.com> (cherry picked from commit 8f474bfe)
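Generating both file names for one MAC address might look like the sketch below. The function name is hypothetical, and the exact formatting (lower-casing and dash-separating the MAC in the `grub.cfg-01-` variant, keeping it unchanged in the `.conf` variant) is an assumption for illustration rather than something the commit message states.

```python
import posixpath


def grub_mac_config_paths(root_dir, mac):
    """Return both candidate grub config paths for a MAC address.

    Hypothetical helper: the precise MAC formatting in each file name is
    an assumption of this sketch.
    """
    dashed = mac.replace(":", "-").lower()
    return [
        # Name looked up by newer grub2 builds: grub.cfg-01-<MAC>
        posixpath.join(root_dir, "grub.cfg-01-" + dashed),
        # Name looked up by older grub2 builds: <MAC>.conf
        posixpath.join(root_dir, mac + ".conf"),
    ]
```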
-
- 09 Apr, 2021 2 commits
-
-
Steve Baker authored
Calculating the ipmitool `-N` and `-R` arguments from ironic.conf [ipmi] `command_retry_timeout` and `min_command_interval` now takes into account the 1 second interval increment that ipmitool adds on each retry event. Failure-path ipmitool run duration will now be just less than `command_retry_timeout` instead of much longer. Change-Id: Ia3d8d85497651290c62341ac121e2aa438b4ac50 (cherry picked from commit 1de3db3b)
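The retry arithmetic can be sketched as below, assuming each retry waits one second longer than the previous one, as the message describes. The function name and the plain integer arguments are hypothetical; the real code reads these values from ironic.conf.

```python
def ipmitool_timing_args(command_retry_timeout, min_command_interval):
    """Compute ipmitool -R (tries) and -N (interval) arguments.

    A sketch under the assumption that ipmitool adds one second to the
    interval on every retry, so the total failure-path duration is the
    sum of a growing series of intervals.
    """
    interval = min_command_interval
    duration = 0
    tries = 0
    # Keep adding attempts while the next wait still fits in the budget.
    while duration + interval <= command_retry_timeout:
        duration += interval
        interval += 1  # ipmitool's per-retry increment
        tries += 1
    return ["-R", str(tries), "-N", str(min_command_interval)]
```

With a 60 second timeout and a 5 second interval, the waits are 5, 6, 7, 8, 9, 10 and 11 seconds, which gives 7 tries and a worst case of 56 seconds.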
-
Aija Jauntēva authored
Use error_handlers instead of process_event('fail'); otherwise, in case of failure, the node gets stuck and fails because of a timeout instead of failing earlier due to the step failure. Also improve coverage to test this error handling as well as the happy paths. Story: 2008307 Task: 41197 Change-Id: I1e957c2b526abc37920212b6431b11eedc9f89be (cherry picked from commit 83ce7c42)
-
- 01 Apr, 2021 1 commit
-
-
Bob Fournier authored
The fix for https://storyboard.openstack.org/#!/story/2008252 synced the boot mode after changing the boot device, because Supermicro nodes reset the boot mode if not included in the boot device set. However this can cause a problem on Dell nodes when changing the mode uefi->bios or bios->uefi. Restrict the syncing of the boot mode to Supermicro. Story: 2008712 Task: 42046 Change-Id: I9f305cb3f33766c1c93cf4347368b1ce025fc635 (cherry picked from commit 8bd25a98)
-
- 24 Mar, 2021 1 commit
-
-
Zuul authored
-
- 23 Mar, 2021 1 commit
-
-
Zuul authored
-
- 21 Mar, 2021 1 commit
-
-
Steve Baker authored
Currently if the baremetal boot mode is unknown and the driver doesn't support setting the boot mode then the error is logged and deployment continues. However if the BMC doesn't support getting or setting the boot mode then setting the boot mode raises an error which results in the deploy failing. This is the case for HPE Gen9 baremetal, which doesn't have a 'BootSourceOverrideMode' attribute in its system Boot field, and raises a 400 iLO.2.14.UnsupportedOperation in response to setting the boot mode. This is raised from set_boot_mode as a RedfishError. This change raises UnsupportedDriverExtension exception when the 'mode' attribute is missing from the 'boot' field, allowing the deployment to continue. Change-Id: I360ff8180be252de21f5fcd2208947087e332a39 (cherry picked from commit 9f221a7d)
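The control flow described above can be illustrated with a small stand-alone sketch. The exception class is a local stand-in that only borrows the name from the message, and `set_boot_mode`, `prepare_boot` and the dict-based `boot` field are hypothetical simplifications of the real driver objects.

```python
class UnsupportedDriverExtension(Exception):
    """Local stand-in for the exception named in the commit message."""


def set_boot_mode(boot, mode):
    """Set the boot mode, or signal that the BMC does not expose it."""
    if "mode" not in boot:
        # The BMC has no boot mode attribute, so report "unsupported"
        # instead of letting a lower-level error fail the deployment.
        raise UnsupportedDriverExtension("boot mode is not supported")
    boot["mode"] = mode


def prepare_boot(boot, mode, warnings):
    """Try to set the boot mode; log and continue if unsupported."""
    try:
        set_boot_mode(boot, mode)
    except UnsupportedDriverExtension as exc:
        warnings.append(str(exc))
        return False
    return True
```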
-
- 15 Mar, 2021 1 commit
-
-
Riccardo Pittau authored
Tinycore 12 requires some more RAM than its predecessor. Change-Id: Ibced843f34c78af7780bfe1ade833208b458bb8b (cherry picked from commit 807a6d2b)
-
- 08 Mar, 2021 2 commits
-
-
Arne Wiebalck authored
In order to reduce the load on the database backend, only lazy-load a node's ports, portgroups, volume_connectors, and volume_targets. With the power-sync as the main user, this change should reduce the number of DB operations by roughly two thirds. Change-Id: Id9a9a53156f7fd866d93569347a81e27c6f0673c (cherry picked from commit 82cab603)
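Lazy loading in general can be sketched with a property that queries the backend only on first access. This is a generic illustration of the idea, not the code from the change: the `Node` class and the `load_ports` callable here are hypothetical.

```python
class Node:
    """Hypothetical node object that loads its ports on first use."""

    def __init__(self, uuid, load_ports):
        self.uuid = uuid
        self._load_ports = load_ports  # callable that queries the DB
        self._ports = None

    @property
    def ports(self):
        # Query the backend only when the attribute is actually read,
        # then reuse the cached result.
        if self._ports is None:
            self._ports = self._load_ports(self.uuid)
        return self._ports
```

A caller that only needs the power state never touches `ports`, so no port query is issued for it.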
-
Arne Wiebalck authored
Restore test symmetry. Change-Id: I54a9fed73e366a30545c3cd1982588d2f544d228 (cherry picked from commit 61c5b3fd)
-
- 05 Mar, 2021 1 commit
-
-
Zuul authored
-
- 03 Mar, 2021 1 commit
-
-
Jason Anderson authored
There are some Ironic execution workflows where there is not an easy way to retry, such as when attempting to hand off the processing of an async task to a conductor. Task handoff can require releasing a lock on the node, so the next entity processing the task can acquire the lock itself. However, this is vulnerable to race conditions, as there is no uniform retry mechanism built into such handoffs. Consider the continue_node_deploy/clean logic, which does this:

    method = 'continue_node_%s' % operation
    # Need to release the lock to let the conductor take it
    task.release_resources()
    getattr(rpc, method)(task.context, uuid, topic=topic)

If another process obtains a lock between the releasing of resources and the acquiring of the lock during the continue_node_* operation, and holds the lock longer than the max attempt * interval window (which defaults to 3 seconds), then the handoff will never complete. Beyond that, because there is no proper queue for processes waiting on the lock, there is no fairness, so it's also possible that instead of one long lock being held, the lock is obtained and held for a short window several times by other competing processes.

This manifests as nodes occasionally getting stuck in the "DEPLOYING" state during a deploy. For example, a user may attempt to open or access the serial console before the deploy is complete--the serial console process obtains a lock and starves the conductor of the lock, so the conductor cannot finish the deploy. It's also possible a long heartbeat or badly-timed sequence of heartbeats could do the same.

To fix this, this commit introduces the concept of a "patient" lock, which will retry indefinitely until it doesn't encounter the NodeLocked exception. This overrides any retry behavior.

.. note::
   There may be other cases where such a lock is desired.

Story: #2008323 Change-Id: I9937fab18a50111ec56a3fd023cdb9d510a1e990 (cherry picked from commit bfc2ad56)
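The difference between a bounded retry and a "patient" lock can be sketched as follows. This is a simplified illustration: `NodeLocked` is a local stand-in for the exception named above, and `reserve_with_retry`, its parameters, and the default of 3 attempts at 1 second are assumptions chosen to match the 3 second window mentioned in the message.

```python
import time


class NodeLocked(Exception):
    """Local stand-in for the NodeLocked exception named above."""


def reserve_with_retry(reserve, attempts=3, interval=1.0, patient=False,
                       sleep=time.sleep):
    """Call reserve() until it succeeds.

    With patient=False, give up after `attempts` tries. With patient=True,
    ignore the attempt limit and keep retrying for as long as reserve()
    raises NodeLocked. All names and defaults are hypothetical.
    """
    attempt = 0
    while True:
        attempt += 1
        try:
            return reserve()
        except NodeLocked:
            if not patient and attempt >= attempts:
                raise
            sleep(interval)
```

A patient caller trades bounded latency for a guarantee that the handoff eventually completes once competing holders release the lock.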
-
- 02 Mar, 2021 1 commit
-
-
Julia Kreger authored
The agent command exec model is based upon an incoming heartbeat, however heartbeats are independent and commands can take a long time. For example, software RAID setup in CI can encounter this. From an IPA log:

    [-] Picked root device /dev/md0 for node c6ca0af2-baec-40d6-879d-cbb5c751aafb based on root device hints {'name': '/dev/md0'}
    [-] Attempting to download image from http://199.204.45.248:3928/agent_images/c6ca0af2-baec-40d6-879d-cbb5c751aafb
    [-] Executing command: standby.get_partition_uuids with args: {} execute_command /usr/local/lib/python3.6/site-packages/ironic_python_agent/extensions/base.py:255
    [-] Tried to execute standby.get_partition_uuids, agent is still executing Command name: execute_deploy_step, params: {'step': {'interface': 'deploy', 'step': 'write_image', 'args': {'image_info': {'id': 'cb9e199a-af1b-4a6f-b00e-f284008b8046', 'urls': ['http://199.204.45.248:3928/agent_images/c6ca0af2-baec-40d6-879d-cbb5c751aafb'], 'disk_format': 'raw', 'container_format': 'bare', 'stream_raw_images': True, 'os_hash_algo': 'sha512', 'os_hash_value':<trimmed>

This was with code built on master, using master images. Inside the conductor log, it notes that it is likely an out of date agent because only AgentAPIError is evaluated, however any API error is evaluated this way. In reality, we need to explicitly flag *when* we have an error that is because we've tried too soon as something is already being worked upon. The result is to evaluate and return an exception indicating work is already in flight.

Update - It looks like the original fix to prevent busy agent recognition did not fully detect all cases, as getting steps is a command which can get skipped by accident with a busy agent, under certain circumstances. Change I5d86878b5ed6142ed2630adee78c0867c49b663f in ironic-python-agent also changed the string that was being checked for the previous handling, where we really should have just made the string we were checking lower case in ironic. Oh well!
This should fix things right up. Story: 2008167 Task: 41175 Change-Id: Ia169640b7084d17d26f22e457c7af512db6d21d6 (cherry picked from commit 545dc210)
-
- 28 Feb, 2021 1 commit
-
-
Dmitry Tantsur authored
Change-Id: Id5fcd4cc1f73b80e8a9e9d2c50e2e4e1667c01cb (cherry picked from commit 7abac806)
-
- 24 Feb, 2021 1 commit
-
-
Dmitry Tantsur authored
The fixed configdrive_use_object_store requires them. Change-Id: Ie7323ae107c7f801be010353c7c4f3b8a43c3a1a (cherry picked from commit 5533077c)
-
- 23 Feb, 2021 1 commit
-
-
Dmitry Tantsur authored
When it is set to True, we try to write text data to a binary file, which is not possible in Python 3. The issue has been "helpfully" hidden by the fact that we use bytes in unit tests, as well as by lack of CI coverage. Change-Id: Ibbf90dcbcb36a5f7cf084a44a221c0c5c003b95a (cherry picked from commit 73bdebd1)
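The underlying Python 3 behaviour is easy to reproduce: a file opened in binary mode rejects `str` and accepts only bytes-like objects. The sketch below shows one way to guard against that. `write_payload` and `demo` are hypothetical helpers, and encoding as UTF-8 is an assumption of this example, not necessarily what the actual fix does.

```python
import io


def write_payload(fileobj, data):
    """Write data to a binary file object, encoding text first.

    Hypothetical helper: in Python 3, writing str to a binary file raises
    TypeError, so text is converted to bytes (UTF-8 assumed here).
    """
    if isinstance(data, str):
        data = data.encode("utf-8")
    fileobj.write(data)


def demo():
    """Write text into an in-memory binary buffer and return its bytes."""
    buf = io.BytesIO()
    write_payload(buf, "config drive")
    return buf.getvalue()
```

Unit tests that pass bytes throughout never hit the failing branch, which is how this kind of bug can stay hidden.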
-
- 18 Feb, 2021 2 commits
-
-
Zuul authored
-
Dmitry Tantsur authored
384M no longer works reliably with newer tinyIPA. Change-Id: I7e48b2e682dc0d5e6109e17b0e73ee9763a29d23 (cherry picked from commit 414f0ca2)
-