- 18 Aug, 2022 1 commit
-
-
Jan Beulich authored
-
- 15 Aug, 2022 8 commits
-
-
Jan Beulich authored
The last "wildcard" use of either function went away with f5917558 ("IOMMU/PCI: don't let domain cleanup continue when device de-assignment failed"). Don't allow them to be called this way anymore. Besides simplifying the code this also fixes two bugs: 1) When seg != -1, the outer loops should have been terminated after the first iteration, or else a device with the same BDF but on another segment could be found / returned. Reported-by:
Rahul Singh <rahul.singh@arm.com> 2) When seg == -1 calling get_pseg() is bogus. The function (taking a u16) would look for segment 0xffff, which might exist. If it exists, we might then find / return a wrong device. In pci_get_pdev_by_domain() also switch from using the per-segment list to using the per-domain one, with the exception of the hardware domain (see the code comment there). While there also constify "pseg" and drop "pdev"'s already previously unnecessary initializer. Signed-off-by:
Jan Beulich <jbeulich@suse.com> Reviewed-by:
Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by:
Rahul Singh <rahul.singh@arm.com> Tested-by:
Rahul Singh <rahul.singh@arm.com> master commit: 8cf6e0738906fc269af40135ed82a07815dd3b9c master date: 2022-08-12 08:34:33 +0200
-
Jan Beulich authored
Commit 68f5aac012b9 ("build: suppress future GNU ld warning about RWX load segments") didn't quite cover all the cases: Apparently I missed ones in the building of 32-bit helper objects because of only looking at incremental builds (where those wouldn't normally be re-built). Clone the workaround there to the specific Makefile in question. Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by:
Jan Beulich <jbeulich@suse.com> Acked-by:
Andrew Cooper <andrew.cooper3@citrix.com> master commit: 3eb1865ae305772b558757904d81951e31de43de master date: 2022-08-11 17:45:12 +0200
-
Ross Lagerwall authored
This should only be called for the boot CPU to avoid calling _init code after it has been unloaded. Fixes: 062868a5a8b4 ("x86/amd: Work around CLFLUSH ordering on older parts") Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Reviewed-by:
Jan Beulich <jbeulich@suse.com> master commit: 31b41ce858c8bd5159212d40969f8e0b7124bbf0 master date: 2022-08-11 17:44:26 +0200
-
Andrew Cooper authored
The PBRSB_NO bit indicates that the CPU is not vulnerable to the Post-Barrier RSB speculative vulnerability. Signed-off-by:
Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by:
Jan Beulich <jbeulich@suse.com> master commit: b874e47eb13feb75be3ee7b5dc4ae9c97d80d774 master date: 2022-08-11 16:19:50 +0100
-
Anthony PERARD authored
"-sdl" is deprecated upstream since 6695e4c0fd9e ("softmmu/vl: Deprecate the -sdl and -curses option"), QEMU v6.2, and the option is removed by 707d93d4abc6 ("ui: Remove deprecated options "-sdl" and "-curses""), in upcoming QEMU v7.1. Instead, use "-display sdl", available since 1472a95bab1e ("Introduce -display argument"), before QEMU v1.0. Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Reviewed-by:
Jason Andryuk <jandryuk@gmail.com> master commit: 41fcb3af8ad6d4c9f65a9d72798e6d18afec55ac master date: 2022-08-11 11:47:11 +0200
-
Dario Faggioli authored
Right now, affinity for dom0 vCPUs is set up in two steps. This is a problem as, at least in Credit2, unit_insert() sees and uses the "intermediate" affinity, and places the vCPUs on CPUs where they cannot be run. And this in turn results in boot hangs if the "dom0_nodes" parameter is used. Fix this by setting up the affinity properly once and for all, in sched_init_vcpu() called by create_vcpu(). Note that, unless a soft-affinity is explicitly specified for dom0 (by using the relaxed mode of "dom0_nodes"), we set it to the default, which is all CPUs, instead of computing it based on the hard affinity (if any). This is because hard and soft affinity should be considered as independent user controlled properties. In fact, if we do derive dom0's soft-affinity from its boot-time hard-affinity, the computed value will continue to be used even if the user later changes the hard-affinity. And this could result in the vCPUs behaving differently than what the user wants and expects. Fixes: dafd936d ("Make credit2 the default scheduler") Reported-by:
Olaf Hering <ohering@suse.de> Signed-off-by:
Dario Faggioli <dfaggioli@suse.com> Reviewed-by:
Jan Beulich <jbeulich@suse.com> master commit: c79e4d209be3ed2a6b8e97c35944786ed2a66b94 master date: 2022-08-11 11:46:22 +0200
-
Jason Andryuk authored
commit e4647427 ("x86/intel: Expose MSR_ARCH_CAPS to dom0") started exposing MSR_ARCH_CAPS to dom0. More bits in MSR_ARCH_CAPS have since been defined, but they haven't been exposed. Update the list to allow them through. As one example, this allows a Linux Dom0 to know that it has the appropriate microcode via FB_CLEAR. Notably, and with the updated microcode, dom0's /sys/devices/system/cpu/vulnerabilities/mmio_stale_data changes from: "Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown" to: "Mitigation: Clear CPU buffers; SMT Host state unknown" This exposes the MMIO Stale Data and Intel Branch History Injection (BHI) controls as well as the page size change MCE issue bit. Fixes: commit 2ebe8fe9b7e0 ("x86/spec-ctrl: Enumeration for MMIO Stale Data controls") Fixes: commit cea9ae062295 ("x86/spec-ctrl: Enumeration for new Intel BHI controls") Fixes: commit 59e89cda ("x86/vtx: Disable executable EPT superpages to work around CVE-2018-12207") Signed-off-by:
Jason Andryuk <jandryuk@gmail.com> Acked-by:
Andrew Cooper <andrew.cooper3@citrix.com> master commit: e83cd54611fec5b7a539fa1281a14319143490e6 master date: 2022-08-09 16:35:25 +0100
-
Andrew Cooper authored
There is a corner case where a VT-x guest which manages to reliably trigger non-fatal #MC's could evade the rogue RSB speculation protections that were supposed to be in place. This is a lack of defence in depth; Xen does not architecturally execute more RET than CALL instructions, so an attacker would have to locate a different gadget (e.g. SpectreRSB) first to execute a transient path of excess RET instructions. Signed-off-by:
Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by:
Jan Beulich <jbeulich@suse.com> master commit: e570e8d520ab542d8d35666b95cb3a0125b7b110 master date: 2022-08-05 12:16:24 +0100
-
- 03 Aug, 2022 11 commits
-
-
Dmytro Semenets authored
When shutting down (or rebooting) the platform, Xen will call stop_cpu() on all the CPUs but one. The last CPU will then request the system to shutdown/restart. On platforms using PSCI, stop_cpu() will call PSCI CPU off. Per the spec (section 5.5.2 DEN0022D.b), the call could return DENIED if the Trusted OS is resident on the CPU that is about to be turned off. As Xen doesn't migrate off the trusted OS (which BTW may not be migratable), it would be possible to hit the panic(). In the ideal situation, Xen should migrate the trusted OS or make sure the CPU off is not called. However, when shutting down (or rebooting) the platform, it is pointless to try to turn off all the CPUs (per section 5.10.2, it is only required to put the core in a known state). So solve the problem by open-coding stop_cpu() in halt_this_cpu() and not calling PSCI CPU off. Signed-off-by:
Dmytro Semenets <dmytro_semenets@epam.com> Acked-by:
Julien Grall <jgrall@amazon.com> (cherry picked from commit ee11f092b515bf3c926eaad053d12d3f2b6e593e)
-
Bertrand Marquis authored
SMCC_WORKAROUND_3 handles both Spectre v2 and Spectre BHB. So when a guest asks whether we support workaround 1, answer yes if we apply workaround 3 on exception entry, as it handles it too. This allows guests that do not support Spectre BHB but are impacted by Spectre v2 to still handle it correctly. The modified behaviour is coherent with what the Linux kernel does in KVM for guests. While there, use ARM_SMCCC_SUCCESS instead of 0 for the workaround-detection return code, to be coherent with the Workaround 2 handling. Signed-off-by:
Bertrand Marquis <bertrand.marquis@arm.com> Acked-by:
Julien Grall <jgrall@amazon.com> (cherry picked from commit af570d1c90f1ed6040d724732f6c582383782e90)
-
Jiamei Xie authored
virt_vtimer_save() will calculate the next deadline when the vCPU is scheduled out. At the moment, Xen will use the following equation: virt_timer.cval + virt_time_base.offset - boot_count The three values are 64-bit and one (cval) is controlled by the domain. In theory, it would be possible that the domain has started a long time after the system boot, so virt_time_base.offset - boot_count may be a large number. This means a domain may inadvertently set a cval such that the result overflows. Consequently, the deadline would be set very far in the future. This could result in the loss of timer interrupts or the vCPU getting blocked "forever". One way to solve the problem would be to separately 1) compute when the domain was created in ns, 2) convert cval to ns, 3) add 1 and 2 together. The first part of the equation never changes (the value is set/known at domain creation), so take the opportunity to store it in the domain structure. Signed-off-by:
Jiamei Xie <jiamei.xie@arm.com> Reviewed-by:
Julien Grall <jgrall@amazon.com> Reviewed-by:
Bertrand Marquis <bertrand.marquis@arm.com> (cherry picked from commit 6655eb81092a94e065fdcd0b47a1b1d69dc4e54c)
-
Hongda Deng authored
When vGIC performs irouter registers emulation, to get the target vCPU via virq conveniently, Xen doesn't store the irouter value directly, instead it will use the value (affinities) in irouter to calculate the target vCPU, and then save the target vCPU in irq rank->vcpu[offset]. When vGIC tries to get the target vCPU, it first calculates the target vCPU index via int target = read_atomic(&rank->vcpu[virq & INTERRUPT_RANK_MASK]); and then it gets the target vCPU via v->domain->vcpu[target]; When vGIC tries to store irouter for one virq, the target vCPU index in the rank is computed as offset &= virq & INTERRUPT_RANK_MASK; finally it gets the target vCPU via d->vcpu[read_atomic(&rank->vcpu[offset])]; There is a difference between them while getting the target vCPU index in the rank. Actually (virq & INTERRUPT_RANK_MASK) would already get the target vCPU index in the rank, it's wrong to add '&' before '=' when calculate the offset. For example, the target vCPU index in the rank should be 6 for virq 38, but vGIC will get offset=0 when vGIC stores the irouter for this virq, and finally vGIC will access the wrong target vCPU index in the rank when updating the irouter. Fixes: 5d495f43 ("xen/arm: vgic: Optimize the way to store the target vCPU in the rank") Signed-off-by:
Hongda Deng <Hongda.Deng@arm.com> Reviewed-by:
Julien Grall <jgrall@amazon.com> (cherry picked from commit 800f21499e0ec112771ce1e94490ca5811578bc2)
-
Julien Grall authored
Write to SCTLR_EL2/HSCTLR may not be visible until the next context synchronization. When initializing the CPU, we want the update to take effect right now. So add an isb afterwards. Spec references: - AArch64: D13.1.2 ARM DDI 0406C.d - AArch32 v8: G8.1.2 ARM DDI 0406C.d - AArch32 v7: B5.6.3 ARM DDI 0406C.d Signed-off-by: Julien Grall <jgrall@amazon.com> Reviewed-by:
Michal Orzel <michal.orzel@arm.com> Reviewed-by:
Bertrand Marquis <bertrand.marquis@arm.com> (cherry picked from commit 25424d1a6b7b7e875230aba77c2f044a4883e49a)
-
Michal Orzel authored
The correct erratum ID should be 834220. Fixes: 0a7ba293 ("xen/arm: arm64: Add Cortex-A57 erratum 834220 workaround") Signed-off-by:
Michal Orzel <michal.orzel@arm.com> Acked-by:
Julien Grall <jgrall@amazon.com> (cherry picked from commit a6f7ed5fc7d5fb5001ef82db99d34bc8a85fc2b6)
-
Michal Orzel authored
Value of macro MIDR_IMPLEMENTOR_MASK exceeds the range of integer and can lead to overflow. Currently there is no issue as it is used in an expression implicitly casted to u32 in MIDR_IS_CPU_MODEL_RANGE. To avoid possible problems, fix the macro. Signed-off-by:
Michal Orzel <michal.orzel@arm.com> Link: https://lore.kernel.org/r/20220426070603.56031-1-michal.orzel@arm.com Acked-by:
Catalin Marinas <catalin.marinas@arm.com> Acked-by:
Julien Grall <jgrall@amazon.com> (cherry picked from commit aa1cba100bff84b211f27639bd6efeaf7e701bcc)
-
Alex Bennée authored
When we introduced FEAT_LPA to QEMU's -cpu max we discovered older kernels had a bug where the physical address was copied directly from ID_AA64MMFR0_EL1.PARange field. The early cpu_init code of Xen commits the same error by blindly copying across the max supported range. Unsurprisingly when the page tables aren't set up for these greater ranges hilarity ensues and the hypervisor crashes fairly early on in the boot-up sequence. This happens when we write to the control register in enable_mmu(). Attempt to fix this the same way as the Linux kernel does by gating PARange to the maximum the hypervisor can handle. I also had to fix up code in p2m which panics when it sees an "invalid" entry in PARange. Signed-off-by:
Alex Bennée <alex.bennee@linaro.org> Cc: Richard Henderson <richard.henderson@linaro.org> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Julien Grall <julien@xen.org> Cc: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com> Cc: Bertrand Marquis <bertrand.marquis@arm.com> Tested-by:
Luca Fancellu <luca.fancellu@arm.com> Acked-by:
Julien Grall <jgrall@amazon.com> (cherry picked from commit 407b13a71e324aba76b11e5f66f59ce4a304a088)
-
Rahul Singh authored
When Xen boots on a platform that implements the GIC 600, an ITS MAPC_LPI_OFF uncorrectable command error issue is observed. As per the GIC-600 TRM (Revision: r1p6), a MAPC_LPI_OFF command error can be reported if the MAPC command has tried to map a collection to a core that does not have LPIs enabled. The definition of GICR.EnableLPIs also suggests enabling the LPIs before sending any ITS command that involves LPIs:
  0b0: LPI support is disabled. Any doorbell interrupt generated as a result of a write to a virtual LPI register must be discarded, and any ITS translation requests or commands involving LPIs in this Redistributor are ignored.
  0b1: LPI support is enabled.
To fix the MAPC command error issue, enable the LPIs using GICR_CTLR.EnableLPIs before mapping the collection table. gicv3_enable_lpis() is using writel_relaxed(); the write to the GICR_CTLR register may not be visible before gicv3_its_setup_collection() sends the MAPC command. Use wmb() after writel_relaxed() to make sure the register write to enable LPIs is visible. Signed-off-by: Rahul Singh <rahul.singh@arm.com> Acked-by:
Julien Grall <jgrall@amazon.com> Reviewed-by:
Bertrand Marquis <bertrand.marquis@arm.com> (cherry picked from commit 95604873ccf56eb81e96ed0dc8b4dec3278f40ca)
-
Edwin Török authored
The latest Intel manual now says the X2APIC reserved range is only 0x800 to 0x8ff (NOT 0xbff). This changed between SDM 68 (Nov 2018) and SDM 69 (Jan 2019). The AMD manual documents 0x800-0x8ff too. There are non-X2APIC MSRs in the 0x900-0xbff range now: e.g. 0x981 is IA32_TME_CAPABILITY, an architectural MSR. The new MSR in this range appears to have been introduced in Icelake, so this commit should be backported to Xen versions supporting Icelake. Signed-off-by:
Edwin Török <edvin.torok@citrix.com> Reviewed-by:
Jan Beulich <jbeulich@suse.com> master commit: 13316827faadbb4f72ae6c625af9938d8f976f86 master date: 2022-07-27 12:57:10 +0200
-
Roger Pau Monné authored
Introduce support in libxl for fetching the default backend trusted option for disk and nic devices. Users can set LIBXL_{DISK,NIC}_BACKEND_UNTRUSTED environment variable to notify libxl of whether the backends for disk and nic devices should be trusted. Such information is passed into the frontend so it can take the appropriate measures. This is part of XSA-403. Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by:
Anthony PERARD <anthony.perard@citrix.com>
-
- 27 Jul, 2022 8 commits
-
-
Luca Fancellu authored
In common/memory.c the ifdef code surrounding ptdom_max_order is using HAS_PASSTHROUGH instead of CONFIG_HAS_PASSTHROUGH, fix the problem using the correct macro. Fixes: e0d44c1f ("build: convert HAS_PASSTHROUGH use to Kconfig") Signed-off-by:
Luca Fancellu <luca.fancellu@arm.com> Reviewed-by:
Jan Beulich <jbeulich@suse.com> master commit: 5707470bf3103ebae43697a7ac2faced6cd35f92 master date: 2022-07-26 08:33:46 +0200
-
Jan Beulich authored
Passing -mno-sse alone is not enough: The compiler may still find (questionable) reasons to use MMX insns. In particular with gcc12 use of MOVD+PUNPCKLDQ+MOVQ was observed in an apparent attempt to auto-vectorize the storing of two adjacent zeroes, 32 bits each. Reported-by:
ChrisD <chris@dalessio.org> Signed-off-by:
Jan Beulich <jbeulich@suse.com> Acked-by:
Andrew Cooper <andrew.cooper3@citrix.com> master commit: 6fe2e39a0243bddba60f83b77b972a5922d25eb8 master date: 2022-07-20 15:48:49 +0200
-
Jan Beulich authored
Already ISE rev 044 added text to this effect; rev 045 further dropped leftover earlier text indicating the contrary: - ENQCMD requires the low 32 bits of the memory operand to be clear, - ENQCMDS requires bits 20...30 of the memory operand to be clear. Fixes: d2738596 ("x86emul: support ENQCMD insns") Signed-off-by:
Jan Beulich <jbeulich@suse.com> Acked-by:
Andrew Cooper <andrew.cooper3@citrix.com> master commit: d620c66bdbe5510c3bae89be8cc7ca9a2a6cbaba master date: 2022-07-20 15:46:48 +0200
-
Jan Beulich authored
While a number of issues we previously had with pre-release gcc12 were fixed in the final release, we continue to have one issue (with multiple instances) when doing release builds (i.e. at higher optimization levels): The compiler takes issue with subtracting (always 1 in our case) from artificial labels (expressed as array) marking the end of certain regions. This isn't an unreasonable position to take. Simply hide the "array-ness" by casting to an integer type. To keep things looking consistent, apply the same cast also on the respective expressions dealing with the starting addresses. (Note how efi_arch_memory_setup()'s l2_table_offset() invocations avoid a similar issue by already having the necessary casts.) In is_xen_fixed_mfn() further switch from __pa() to virt_to_maddr() to better match the left sides of the <= operators. Reported-by:
Charles Arnold <carnold@suse.com> Signed-off-by:
Jan Beulich <jbeulich@suse.com> Acked-by:
Andrew Cooper <andrew.cooper3@citrix.com> master commit: 9723507daf2120131410c91980d4e4d9b0d0aa90 master date: 2022-07-19 08:37:29 +0200
-
Jan Beulich authored
There are command line controls for this and the default also isn't "always enable when hardware supports it", which logging should take into account. Signed-off-by:
Jan Beulich <jbeulich@suse.com> Reviewed-by:
Andrew Cooper <andrew.cooper3@citrix.com> master commit: fdbf8bdfebc2ed323c521848f642cc4f6b8cb662 master date: 2022-07-19 08:36:53 +0200
-
Jan Beulich authored
Move the "credit expired" loop exit to the middle of the loop, immediately after "return true". This way having reached the goal on the last iteration would be reported as success to the caller, rather than as "timed out". Signed-off-by:
Jan Beulich <jbeulich@suse.com> Reviewed-by:
Anthony PERARD <anthony.perard@citrix.com> master commit: d8f8cb8bdd02fad3b6986ae93511f750fa7f7e6a master date: 2022-07-18 17:48:18 +0200
-
Juergen Gross authored
In case of maxmem != memsize the E820 map of the PVH stubdom is wrong, as it is missing the RAM above memsize. Additionally the memory map should only specify the Xen special pages as reserved. Signed-off-by:
Juergen Gross <jgross@suse.com> Reviewed-by:
Anthony PERARD <anthony.perard@citrix.com> master commit: 134d53f577076d4f26091e25762f27cc3c73bf58 master date: 2022-07-12 15:25:20 +0200
-
Jan Beulich authored
While in principle possible also under other conditions as long as other parallel operations potentially consuming memory aren't "locked out", in particular with IOMMU large page mappings used in Dom0 (for PV when in strict mode; for PVH when not sharing page tables with HAP) ballooning out of individual pages can actually lead to less free memory available afterwards. This is because to split a large page, one or more page table pages are necessary (one per level that is split). When rebooting a guest I've observed freemem() to fail: A single page was required to be ballooned out (presumably because of heap fragmentation in the hypervisor). This ballooning out of a single page of course went fast, but freemem() then found that it would require to balloon out another page. Repeating this just one more time leads the function to signal failure to the caller - without having come anywhere near the designated 30s that the whole process is allowed to not make any progress at all. Convert from a simple retry count to actually calculating elapsed time, subtracting from an initial credit of 30s. Don't go as far as limiting the "wait_secs" value passed to libxl_wait_for_memory_target(), though. While this leads to the overall process now possibly taking longer (if the previous iteration ended very close to the intended 30s), this compensates to some degree for the value passed really meaning "allowed to run for this long without making progress". Signed-off-by:
Jan Beulich <jbeulich@suse.com> Reviewed-by:
Anthony PERARD <anthony.perard@citrix.com> master commit: e58370df76eacf1f7ca0340e9b96430c77b41a79 master date: 2022-07-12 15:25:00 +0200
-
- 26 Jul, 2022 1 commit
-
-
Jan Beulich authored
When this logic was moved, it was moved across the point where nx is updated to hold the new type for the page. IOW originally it was equivalent to using x (and perhaps x would better have been used), but now it isn't anymore. Switch to using x, which then brings things in line again with the slightly earlier comment there (now) talking about transitions _from_ writable. I have to confess though that I cannot make a direct connection between the reported observed behavior of guests leaving several pages around with pending general references and the change here. Repeated testing, nevertheless, confirms the reported issue is no longer there. This is CVE-2022-33745 / XSA-408. Reported-by:
Charles Arnold <carnold@suse.com> Fixes: 8cc5036bc385 ("x86/pv: Fix ABAC cmpxchg() race in _get_page_type()") Signed-off-by:
Jan Beulich <jbeulich@suse.com> Reviewed-by:
Andrew Cooper <andrew.cooper3@citrix.com> master commit: a9949efb288fd6e21bbaf9d5826207c7c41cda27 master date: 2022-07-26 14:54:34 +0200
-
- 12 Jul, 2022 11 commits
-
-
Andrew Cooper authored
Branch Type Confusion affects AMD/Hygon CPUs on Zen2 and earlier. To mitigate, we require SMT safety (STIBP on Zen2, no-SMT on Zen1), and to issue an IBPB on each entry to Xen, to flush the BTB. Due to performance concerns, dom0 (which is trusted in most configurations) is excluded from protections by default. Therefore: * Use STIBP by default on Zen2 too, which now means we want it on by default on all hardware supporting STIBP. * Break the current IBPB logic out into a new function, extending it with IBPB-at-entry logic. * Change the existing IBPB-at-ctxt-switch boolean to be tristate, and disable it by default when IBPB-at-entry is providing sufficient safety. If all PV guests on the system are trusted, then it is recommended to boot with `spec-ctrl=ibpb-entry=no-pv`, as this will provide an additional marginal perf improvement. This is part of XSA-407 / CVE-2022-23825. Signed-off-by:
Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by:
Jan Beulich <jbeulich@suse.com> (cherry picked from commit d8cb7e0f069e0f106d24941355b59b45a731eabe)
-
Andrew Cooper authored
... as instructed in the Branch Type Confusion whitepaper. This is part of XSA-407. Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> (cherry picked from commit 9deaf2d932f08c16c6b96a1c426e4b1142c0cdbe)
-
Andrew Cooper authored
BTC_NO indicates that hardware is not susceptible to Branch Type Confusion. Zen3 CPUs don't suffer BTC. This is part of XSA-407. Signed-off-by:
Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by:
Jan Beulich <jbeulich@suse.com> (cherry picked from commit 76cb04ad64f3ab9ae785988c40655a71dde9c319)
-
Andrew Cooper authored
We are going to need this to mitigate Branch Type Confusion on AMD/Hygon CPUs, but as we've talked about using it in other cases too, arrange to support it generally. However, this is also very expensive in some cases, so we're going to want per-domain controls. Introduce SCF_ist_ibpb and SCF_entry_ibpb controls, adding them to the IST and DOM masks as appropriate. Also introduce X86_FEATURE_IBPB_ENTRY_{PV,HVM} to patch the code blocks. For SVM, the STGI is serialising enough to protect against Spectre-v1 attacks, so no "else lfence" is necessary. VT-x will use the MSR host load list, so doesn't need any code in the VMExit path. For the IST path, we can't safely check CPL==0 to skip a flush, as we might have hit an entry path before its IBPB. As IST hitting Xen is rare, flush irrespective of CPL. A later path, SCF_ist_sc_msr, provides Spectre-v1 safety. For the PV paths, we know we're interrupting CPL>0, while for the INTR paths, we can safely check CPL==0. Only flush when interrupting guest context. An "else lfence" is needed for safety, but we want to be able to skip it on unaffected CPUs, so the block wants to be an alternative, which means the lfence has to be inline rather than UNLIKELY() (the replacement block doesn't have displacements fixed up for anything other than the first instruction). As with SPEC_CTRL_ENTRY_FROM_INTR_IST, %rdx is 0 on entry so rely on this to shrink the logic marginally. Update the comments to specify this new dependency. This is part of XSA-407. Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by:
Jan Beulich <jbeulich@suse.com> (cherry picked from commit 53a570b285694947776d5190f591a0d5b9b18de7)
-
Andrew Cooper authored
We are shortly going to add a conditional IBPB in this path. Therefore, we cannot hold spec_ctrl_flags in %eax, and rely on only clobbering it after we're done with its contents. %rbx is available for use, and the more normal register to hold preserved information in. With %rax freed up, use it instead of %rdx for the RSB tmp register, and for the adjustment to spec_ctrl_flags. This leaves no use of %rdx, except as 0 for the upper half of WRMSR. In practice, %rdx is 0 from SAVE_ALL on all paths and isn't likely to change in the foreseeable future, so update the macro entry requirements to state this dependency. This marginal optimisation can be revisited if circumstances change. No practical change. This is part of XSA-407. Signed-off-by:
Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by:
Jan Beulich <jbeulich@suse.com> (cherry picked from commit e9b8d31981f184c6539f91ec54bd9cae29cdae36)
-
Andrew Cooper authored
We are about to introduce the use of IBPB at different points in Xen, making opt_ibpb ambiguous. Rename it to opt_ibpb_ctxt_switch. No functional change. Signed-off-by:
Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by:
Jan Beulich <jbeulich@suse.com> (cherry picked from commit a8e5ef079d6f5c88c472e3e620db5a8d1402a50d)
-
Andrew Cooper authored
We are about to introduce SCF_ist_ibpb, at which point SCF_ist_wrmsr becomes ambiguous. No functional change. This is part of XSA-407. Signed-off-by:
Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by:
Jan Beulich <jbeulich@suse.com> (cherry picked from commit 76d6a36f645dfdbad8830559d4d52caf36efc75e)
-
Andrew Cooper authored
We are shortly going to need to context switch new bits in both the vcpu and S3 paths. Introduce SCF_IST_MASK and SCF_DOM_MASK, and rework d->arch.verw into d->arch.spec_ctrl_flags to accommodate. No functional change. This is part of XSA-407. Signed-off-by:
Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by:
Jan Beulich <jbeulich@suse.com> (cherry picked from commit 5796912f7279d9348a3166655588d30eae9f72cc)
-
Anthony PERARD authored
libxl__xs_directory() can potentially return NULL without setting `n`. As `n` isn't initialised, we need to check libxl__xs_directory() return value before checking `n`. Otherwise, `n` might be non-zero with `bdfs` NULL which would lead to a segv. Fixes: 57bff091 ("libxl: add 'name' field to 'libxl_device_pci' in the IDL...") Reported-by:
"G.R." <firemeteor@users.sourceforge.net> Signed-off-by:
Anthony PERARD <anthony.perard@citrix.com> Reviewed-by:
Juergen Gross <jgross@suse.com> Tested-by:
"G.R." <firemeteor@users.sourceforge.net> master commit: d778089ac70e5b8e3bdea0c85fc8c0b9ed0eaf2f master date: 2022-07-12 08:38:51 +0200
-
Anthony PERARD authored
Missing prototype of asprintf() without _GNU_SOURCE. Signed-off-by:
Anthony PERARD <anthony.perard@citrix.com> Reviewed-by:
Henry Wang <Henry.Wang@arm.com> master commit: d693b22733044d68e9974766b5c9e6259c9b1708 master date: 2022-07-12 08:38:35 +0200
-
Andrew Cooper authored
Support controlling the PV/HVM suboption of msr-sc/rsb/md-clear, which previously wasn't possible. Signed-off-by:
Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by:
Jan Beulich <jbeulich@suse.com> master commit: 27357c394ba6e1571a89105b840ce1c6f026485c master date: 2022-07-11 15:21:35 +0100
-