- 12 Jul, 2022 13 commits
-
-
Andrew Cooper authored
Branch Type Confusion affects AMD/Hygon CPUs on Zen2 and earlier. To mitigate, we require SMT safety (STIBP on Zen2, no-SMT on Zen1), and to issue an IBPB on each entry to Xen, to flush the BTB.

Due to performance concerns, dom0 (which is trusted in most configurations) is excluded from protections by default. Therefore:
* Use STIBP by default on Zen2 too, which now means we want it on by default on all hardware supporting STIBP.
* Break the current IBPB logic out into a new function, extending it with IBPB-at-entry logic.
* Change the existing IBPB-at-ctxt-switch boolean to be tristate, and disable it by default when IBPB-at-entry is providing sufficient safety.

If all PV guests on the system are trusted, then it is recommended to boot with `spec-ctrl=ibpb-entry=no-pv`, as this will provide an additional marginal perf improvement.

This is part of XSA-407 / CVE-2022-23825.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit d8cb7e0f069e0f106d24941355b59b45a731eabe)
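The tristate default logic described above can be sketched as follows. This is an illustrative model under stated assumptions, not Xen's actual code: the function name `ibpb_ctxt_switch_wanted` and its parameters are hypothetical, standing in for the real boot-time resolution of the option.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative sketch (not Xen's actual code) of the tristate default:
 * -1 means "no explicit spec-ctrl= choice", resolved at boot based on
 * whether IBPB-at-entry already flushes the BTB on every entry to Xen,
 * making a second IBPB at context switch redundant. */
static int8_t opt_ibpb_ctxt_switch = -1;

bool ibpb_ctxt_switch_wanted(bool ibpb_entry_pv, bool ibpb_entry_hvm)
{
    if (opt_ibpb_ctxt_switch >= 0)   /* explicit command line choice wins */
        return opt_ibpb_ctxt_switch;

    /* Default: only keep the ctxt-switch IBPB when entry IBPB doesn't
     * already provide sufficient safety for all guest types. */
    return !(ibpb_entry_pv && ibpb_entry_hvm);
}
```

With both entry protections active, the default resolves to skipping the context-switch IBPB; an explicit boolean on the command line overrides the heuristic either way.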
-
Andrew Cooper authored
... as instructed in the Branch Type Confusion whitepaper.

This is part of XSA-407.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
(cherry picked from commit 9deaf2d932f08c16c6b96a1c426e4b1142c0cdbe)
-
Andrew Cooper authored
BTC_NO indicates that hardware is not susceptible to Branch Type Confusion. Zen3 CPUs don't suffer BTC.

This is part of XSA-407.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 76cb04ad64f3ab9ae785988c40655a71dde9c319)
-
Andrew Cooper authored
We are going to need this to mitigate Branch Type Confusion on AMD/Hygon CPUs, but as we've talked about using it in other cases too, arrange to support it generally. However, this is also very expensive in some cases, so we're going to want per-domain controls.

Introduce SCF_ist_ibpb and SCF_entry_ibpb controls, adding them to the IST and DOM masks as appropriate. Also introduce X86_FEATURE_IBPB_ENTRY_{PV,HVM} to patch the code blocks.

For SVM, the STGI is serialising enough to protect against Spectre-v1 attacks, so no "else lfence" is necessary. VT-x will use the MSR host load list, so doesn't need any code in the VMExit path.

For the IST path, we can't safely check CPL==0 to skip a flush, as we might have hit an entry path before its IBPB. As IST hitting Xen is rare, flush irrespective of CPL. A later path, SCF_ist_sc_msr, provides Spectre-v1 safety.

For the PV paths, we know we're interrupting CPL>0, while for the INTR paths, we can safely check CPL==0. Only flush when interrupting guest context.

An "else lfence" is needed for safety, but we want to be able to skip it on unaffected CPUs, so the block wants to be an alternative, which means the lfence has to be inline rather than UNLIKELY() (the replacement block doesn't have displacements fixed up for anything other than the first instruction).

As with SPEC_CTRL_ENTRY_FROM_INTR_IST, %rdx is 0 on entry so rely on this to shrink the logic marginally. Update the comments to specify this new dependency.

This is part of XSA-407.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 53a570b285694947776d5190f591a0d5b9b18de7)
-
Andrew Cooper authored
We are shortly going to add a conditional IBPB in this path. Therefore, we cannot hold spec_ctrl_flags in %eax, and rely on only clobbering it after we're done with its contents. %rbx is available for use, and the more normal register to hold preserved information in.

With %rax freed up, use it instead of %rdx for the RSB tmp register, and for the adjustment to spec_ctrl_flags.

This leaves no use of %rdx, except as 0 for the upper half of WRMSR. In practice, %rdx is 0 from SAVE_ALL on all paths and isn't likely to change in the foreseeable future, so update the macro entry requirements to state this dependency. This marginal optimisation can be revisited if circumstances change.

No practical change.

This is part of XSA-407.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit e9b8d31981f184c6539f91ec54bd9cae29cdae36)
-
Andrew Cooper authored
We are about to introduce the use of IBPB at different points in Xen, making opt_ibpb ambiguous. Rename it to opt_ibpb_ctxt_switch.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit a8e5ef079d6f5c88c472e3e620db5a8d1402a50d)
-
Andrew Cooper authored
We are about to introduce SCF_ist_ibpb, at which point SCF_ist_wrmsr becomes ambiguous.

No functional change.

This is part of XSA-407.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 76d6a36f645dfdbad8830559d4d52caf36efc75e)
-
Andrew Cooper authored
We are shortly going to need to context switch new bits in both the vcpu and S3 paths. Introduce SCF_IST_MASK and SCF_DOM_MASK, and rework d->arch.verw into d->arch.spec_ctrl_flags to accommodate.

No functional change.

This is part of XSA-407.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 5796912f7279d9348a3166655588d30eae9f72cc)
-
Andrew Cooper authored
Support controlling the PV/HVM suboption of msr-sc/rsb/md-clear, which previously wasn't possible.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 27357c394ba6e1571a89105b840ce1c6f026485c)
-
Andrew Cooper authored
This will help parsing a sub-option which has boolean and non-boolean options available.

First, rework 'int val' into 'bool has_neg_prefix'. This inverts its value, but the resulting logic is far easier to follow.

Second, reject anything of the form 'no-$FOO='. This excludes ambiguous constructs such as 'no-$foo=yes' which have never been valid.

This just leaves the case where everything is otherwise fine, but parse_bool() can't interpret the provided string.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 382326cac528dd1eb0d04efd5c05363c453e29f4)
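The parsing rules described above can be sketched as follows. This is a minimal model, not Xen's actual `parse_boolean()`: the helper names and the set of accepted boolean strings are assumptions for illustration only.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical sketch of the described rules: accept "foo", "no-foo",
 * and "foo=<bool>"; reject the ambiguous "no-foo=<anything>"; return -1
 * when the value string isn't a recognisable boolean, so the caller can
 * fall back to sub-option parsing. */
static int parse_bool_str(const char *s)
{
    if (!strcmp(s, "yes") || !strcmp(s, "true") || !strcmp(s, "1"))
        return 1;
    if (!strcmp(s, "no") || !strcmp(s, "false") || !strcmp(s, "0"))
        return 0;
    return -1;
}

int parse_boolean(const char *name, const char *s)
{
    bool has_neg_prefix = !strncmp(s, "no-", 3);
    const char *val;

    if (has_neg_prefix)
        s += 3;

    if (strncmp(s, name, strlen(name)))
        return -1;                  /* not this option at all */

    val = s + strlen(name);

    if (*val == '\0')
        return !has_neg_prefix;     /* plain "foo" / "no-foo" */

    if (*val != '=')
        return -1;

    if (has_neg_prefix)
        return -1;                  /* "no-foo=..." is always invalid */

    return parse_bool_str(val + 1); /* may still be -1: caller retries */
}
```

Note how the inverted `has_neg_prefix` makes each early return read naturally, which is the readability gain the commit describes.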
-
Andrew Cooper authored
STIBP and PSFD are slightly weird bits, because they're both implied by other bits in MSR_SPEC_CTRL. Add fine grain controls for them, and take the implications into account when setting IBRS/SSBD.

Rearrange the IBPB text/variables/logic to keep all the MSR_SPEC_CTRL bits together, for consistency.

However, AMD have a hardware hint CPUID bit recommending that STIBP be set unilaterally. This is advertised on Zen3, so follow the recommendation. Furthermore, in such cases, set STIBP behind the guest's back for now. This has negligible overhead for the guest, but saves a WRMSR on vmentry. This is the only default change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
(cherry picked from commit fef244b179c06fcdfa581f7d57fa6e578c49ff50)
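The implications mentioned above (architecturally, IBRS implies STIBP protection, and SSBD implies PSFD on AMD) can be sketched as a value composition. The bit positions match the SDM/APM definitions of MSR_SPEC_CTRL, but the composing function itself is an illustrative assumption, not Xen's code; here the implied bit is simply set whenever the implying bit is.

```c
#include <stdbool.h>
#include <stdint.h>

/* MSR_SPEC_CTRL bit positions per the Intel SDM / AMD APM. */
#define SPEC_CTRL_IBRS   (1u << 0)
#define SPEC_CTRL_STIBP  (1u << 1)
#define SPEC_CTRL_SSBD   (1u << 2)
#define SPEC_CTRL_PSFD   (1u << 7)

/* Sketch: fold the implications into the composed value, so fine-grain
 * STIBP/PSFD controls interact correctly with IBRS/SSBD choices. */
uint32_t spec_ctrl_value(bool ibrs, bool stibp, bool ssbd, bool psfd)
{
    uint32_t val = 0;

    if (ibrs)
        val |= SPEC_CTRL_IBRS;
    if (stibp || ibrs)          /* IBRS implies STIBP protection */
        val |= SPEC_CTRL_STIBP;
    if (ssbd)
        val |= SPEC_CTRL_SSBD;
    if (psfd || ssbd)           /* SSBD implies PSFD */
        val |= SPEC_CTRL_PSFD;

    return val;
}
```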
-
Andrew Cooper authored
Back at the time of the original Spectre-v2 fixes, it was recommended to clear MSR_SPEC_CTRL when going idle. This is because of the side effects on the sibling thread caused by the microcode IBRS and STIBP implementations which were retrofitted to existing CPUs.

However, there are no relevant cross-thread impacts for the hardware IBRS/STIBP implementations, so this logic should not be used on Intel CPUs supporting eIBRS, or any AMD CPUs; doing so only adds unnecessary latency to the idle path.

Furthermore, there's no point playing with MSR_SPEC_CTRL in the idle paths if SMT is disabled for other reasons.

Fixes: 8d03080d2a33 ("x86/spec-ctrl: Cease using thunk=lfence on AMD")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
(cherry picked from commit ffc7694e0c99eea158c32aa164b7d1e1bb1dc46b)
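The resulting policy condenses to a small predicate. This is a sketch under stated assumptions; the function and parameter names are hypothetical, standing in for the checks Xen performs at boot.

```c
#include <stdbool.h>

/* Sketch of the described policy: zeroing MSR_SPEC_CTRL around idle only
 * matters for the microcode-retrofitted ("legacy") IBRS/STIBP
 * implementations, i.e. pre-eIBRS Intel parts, and only when an SMT
 * sibling is actually online to benefit. */
bool idle_should_clear_spec_ctrl(bool is_amd, bool has_eibrs, bool smt_active)
{
    if (is_amd || has_eibrs)   /* hardware impls: no cross-thread cost */
        return false;

    return smt_active;         /* pointless if the sibling is offline */
}
```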
-
Andrew Cooper authored
This was an oversight from when unpriv-mmio was introduced.

Fixes: 8c24b70fedcb ("x86/spec-ctrl: Add spec-ctrl=unpriv-mmio")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 4cdb519d797c19ebb8fadc5938cdb47479d5a21b)
-
- 16 Jun, 2022 3 commits
-
-
Andrew Cooper authored
Per Xen's support statement, PCI passthrough should be to trusted domains because the overall system security depends on factors outside of Xen's control.

As such, Xen, in a supported configuration, is not vulnerable to DRPW/SBDR. However, users who have risk assessed their configuration may be happy with the risk of DoS, but unhappy with the risk of cross-domain data leakage. Such users should enable this option.

On CPUs vulnerable to MDS, the existing mitigations are the best we can do to mitigate MMIO cross-domain data leakage. On CPUs with MDS fixed, but vulnerable to MMIO stale data leakage, this option:
* On CPUs susceptible to FBSDP, mitigates cross-domain fill buffer leakage using FB_CLEAR.
* On CPUs susceptible to SBDR, mitigates RNG data recovery by engaging the srb-lock, previously used to mitigate SRBDS.

Both mitigations require microcode from IPU 2022.1, May 2022.

This is part of XSA-404.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
(cherry picked from commit 8c24b70fedcb52633b2370f834d8a2be3f7fa38e)
-
Andrew Cooper authored
The three *_NO bits indicate non-susceptibility to the SSDP, FBSDP and PSDP data movement primitives.

FB_CLEAR indicates that the VERW instruction has re-gained its Fill Buffer flushing side effect. This is only enumerated on parts where VERW had previously lost its flushing side effect due to the MDS/TAA vulnerabilities being fixed in hardware.

FB_CLEAR_CTRL is available on a subset of FB_CLEAR parts where the Fill Buffer clearing side effect of VERW can be turned off for performance reasons.

This is part of XSA-404.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
(cherry picked from commit 2ebe8fe9b7e0d36e9ec3cfe4552b2b197ef0dcec)
-
Andrew Cooper authored
Currently, VERW flushing to mitigate MDS is boot time conditional per domain type. However, to provide mitigations for DRPW (CVE-2022-21166), we need to conditionally use VERW based on the trustworthiness of the guest, and the devices passed through.

Remove the PV/HVM alternatives and instead issue a VERW on the return-to-guest path depending on the SCF_verw bit in cpuinfo spec_ctrl_flags.

Introduce spec_ctrl_init_domain() and d->arch.verw to calculate the VERW disposition at domain creation time, and context switch the SCF_verw bit.

For now, VERW flushing is used and controlled exactly as before, but later patches will add per-domain cases too.

No change in behaviour.

This is part of XSA-404.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
(cherry picked from commit e06b95c1d44ab80da255219fc9f1e2fc423edcb6)
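The new shape can be sketched as follows. This is a toy model, not Xen's code: the struct layouts and the SCF_verw bit position are illustrative. The point is that the exit path consults a per-CPU flag which context-switch code copies from the incoming domain, replacing boot-time PV/HVM alternatives.

```c
#include <stdbool.h>
#include <stdint.h>

#define SCF_verw (1u << 3)   /* bit position illustrative only */

struct domain  { uint32_t spec_ctrl_flags; };   /* set at domain create */
struct cpuinfo { uint32_t spec_ctrl_flags; };   /* per-CPU state */

/* Context switch: carry the incoming domain's VERW disposition into the
 * per-CPU flags consulted on the return-to-guest path. */
void ctxt_switch_to(struct cpuinfo *c, const struct domain *d)
{
    c->spec_ctrl_flags = (c->spec_ctrl_flags & ~SCF_verw) |
                         (d->spec_ctrl_flags & SCF_verw);
}

/* Return-to-guest path: issue VERW iff the flag says so. */
bool exit_needs_verw(const struct cpuinfo *c)
{
    return c->spec_ctrl_flags & SCF_verw;
}
```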
-
- 10 Jun, 2022 1 commit
-
-
Jan Beulich authored
While PGT_pae_xen_l2 will be zapped once the type refcount of an L2 page reaches zero, it'll be retained as long as the type refcount is non-zero. Hence any checking against the requested type needs to either zap the bit from the type or include it in the used mask.

Fixes: 9186e96b199e ("x86/pv: Clean up _get_page_type()")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: c2095ac76be0f4a1940346c9ffb49fb967345060
master date: 2022-06-10 10:21:06 +0200
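The masking requirement can be illustrated with a toy comparison. The bit values below are invented for the example (Xen's real PGT_* constants differ); what matters is that the sticky PGT_pae_xen_l2 bit must be zapped from the stored type (or included in the comparison mask) before comparing against the requested type.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values only; not Xen's real PGT_* constants. */
#define PGT_l2_page_table  (2u << 28)
#define PGT_pae_xen_l2     (1u << 27)   /* sticky while type refcount > 0 */
#define PGT_type_mask      (0x1fu << 27)

/* Does the page's current type match the requested one?  Without zapping
 * PGT_pae_xen_l2, re-validating an L2 that already carries the bit would
 * spuriously mismatch against a plain PGT_l2_page_table request. */
bool type_matches(uint32_t stored, uint32_t requested)
{
    return (stored & ~PGT_pae_xen_l2 & PGT_type_mask)
           == (requested & PGT_type_mask);
}
```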
-
- 09 Jun, 2022 7 commits
-
-
Andrew Cooper authored
There are legitimate uses of WC mappings of RAM, e.g. for DMA buffers with devices that make non-coherent writes. The Linux sound subsystem makes extensive use of this technique.

For such usecases, the guest's DMA buffer is mapped and consistently used as WC, and Xen doesn't interact with the buffer.

However, a mischievous guest can use WC mappings to deliberately create non-coherency between the cache and RAM, and use this to trick Xen into validating a pagetable which isn't actually safe.

Allocate a new PGT_non_coherent to track the non-coherency of mappings. Set it whenever a non-coherent writeable mapping is created. If the page is used as anything other than PGT_writable_page, force a cache flush before validation. Also force a cache flush before the page is returned to the heap.

This is CVE-2022-26364, part of XSA-402.

Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: c1c9cae3a9633054b177c5de21ad7268162b2f2c
master date: 2022-06-09 14:23:37 +0200
-
Andrew Cooper authored
On pre-CLFLUSHOPT AMD CPUs, CLFLUSH is weakly ordered with everything, including reads and writes to the address, and LFENCE/SFENCE instructions. This creates a multitude of problematic corner cases, laid out in the manual. Arrange to use MFENCE on both sides of the CLFLUSH to force proper ordering.

This is part of XSA-402.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 062868a5a8b428b85db589fa9a6d6e43969ffeb9
master date: 2022-06-09 14:23:07 +0200
-
Andrew Cooper authored
Subsequent changes will want a fully flushing version.

Use the new helper rather than opencoding it in flush_area_local(). This resolves an outstanding issue where the conditional sfence is on the wrong side of the clflushopt loop. clflushopt is ordered with respect to older stores, not to younger stores.

Rename gnttab_cache_flush()'s helper to avoid colliding in name. grant_table.c can see the prototype from cache.h so the build fails otherwise.

This is part of XSA-402.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 9a67ffee3371506e1cbfdfff5b90658d4828f6a2
master date: 2022-06-09 14:22:38 +0200
-
Andrew Cooper authored
Changeset 55f97f49 ("x86: Change cache attributes of Xen 1:1 page mappings in response to guest mapping requests") attempted to keep the cacheability consistent between different mappings of the same page. The reason wasn't described in the changelog, but it is understood to be in regards to a concern over machine check exceptions, owing to errata when using mixed cacheabilities. It did this primarily by updating Xen's mapping of the page in the direct map when the guest mapped a page with reduced cacheability.

Unfortunately, the logic didn't actually prevent mixed cacheability from occurring:
* A guest could map a page normally, and then map the same page with different cacheability; nothing prevented this.
* The cacheability of the directmap was always latest-takes-precedence in terms of guest requests.
* Grant-mapped frames with lesser cacheability didn't adjust the page's cacheattr settings.
* The map_domain_page() function still unconditionally created WB mappings, irrespective of the page's cacheattr settings.

Additionally, update_xen_mappings() had a bug where the alias calculation was wrong for mfn's which were .init content, which should have been treated as fully guest pages, not Xen pages.

Worse yet, the logic introduced a vulnerability whereby necessary pagetable/segdesc adjustments made by Xen in the validation logic could become non-coherent between the cache and main memory. The CPU could subsequently operate on the stale value in the cache, rather than the safe value in main memory.

The directmap contains primarily mappings of RAM. PAT/MTRR conflict resolution is asymmetric, and generally for MTRR=WB ranges, PAT of lesser cacheability resolves to being coherent. The special case is WC mappings, which are non-coherent against MTRR=WB regions (except for fully-coherent CPUs). Xen must not have any WC cacheability in the directmap, to prevent Xen's actions from creating non-coherency. (Guest actions creating non-coherency is dealt with in subsequent patches.) As all memory types for MTRR=WB ranges inter-operate coherently, leave Xen's directmap mappings as WB.

Only PV guests with access to devices can use reduced-cacheability mappings to begin with, and they're trusted not to mount DoSs against the system anyway.

Drop PGC_cacheattr_{base,mask} entirely, and the logic to manipulate them. Shift the later PGC_* constants up, to gain 3 extra bits in the main reference count. Retain the check in get_page_from_l1e() for special_pages() because a guest has no business using reduced cacheability on these.

This reverts changeset 55f97f49.

This is CVE-2022-26363, part of XSA-402.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
master commit: ae09597da34aee6bc5b76475c5eea6994457e854
master date: 2022-06-09 14:22:08 +0200
-
Andrew Cooper authored
... rather than opencoding the PAT/PCD/PWT attributes in __PAGE_HYPERVISOR_* constants. These are going to be needed by forthcoming logic.

No functional change.

This is part of XSA-402.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 1be8707c75bf4ba68447c74e1618b521dd432499
master date: 2022-06-09 14:21:38 +0200
-
Andrew Cooper authored
_get_page_type() suffers from a race condition where it incorrectly assumes that because 'x' was read and a subsequent cmpxchg() succeeds, the type cannot have changed in-between. Consider:

CPU A:
1. Creates an L2e referencing pg
   `-> _get_page_type(pg, PGT_l1_page_table), sees count 0, type PGT_writable_page
2. Issues flush_tlb_mask()
CPU B:
3. Creates a writeable mapping of pg
   `-> _get_page_type(pg, PGT_writable_page), count increases to 1
4. Writes into new mapping, creating a TLB entry for pg
5. Removes the writeable mapping of pg
   `-> _put_page_type(pg), count goes back down to 0
CPU A:
7. Issues cmpxchg(), setting count 1, type PGT_l1_page_table

CPU B now has a writeable mapping to pg, which Xen believes is a pagetable and suitably protected (i.e. read-only). The TLB flush in step 2 must be deferred until after the guest is prohibited from creating new writeable mappings, which is after step 7.

Defer all safety actions until after the cmpxchg() has successfully taken the intended typeref, because that is what prevents concurrent users from using the old type.

Also remove the early validation for writeable and shared pages. This removes race conditions where one half of a parallel mapping attempt can return successfully before:
* The IOMMU pagetables are in sync with the new page type
* Writeable mappings to shared pages have been torn down

This is part of XSA-401 / CVE-2022-26362.

Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
master commit: 8cc5036bc385112a82f1faff27a0970e6440dfed
master date: 2022-06-09 14:21:04 +0200
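The corrected ordering can be modelled with a small cmpxchg loop. This is a toy sketch, not Xen's `_get_page_type()`: only the count==0 path is modelled, the constants are invented, and a counter stands in for `flush_tlb_mask()`. The essential property is that the safety action happens only after the cmpxchg has taken the typeref.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

struct page { _Atomic uint32_t type_info; };

static int tlb_flushes;   /* stands in for flush_tlb_mask() */

/* Toy model: take the typeref with cmpxchg *first*, and only then
 * perform safety actions, so no concurrent user can still be relying on
 * the old type when the flush happens. */
bool get_page_type(struct page *pg, uint32_t type)
{
    for (;;) {
        uint32_t x = atomic_load(&pg->type_info);

        if (x != 0)               /* model only the count==0 path */
            return false;

        /* Low bit models a refcount of 1, upper bits the type. */
        if (atomic_compare_exchange_strong(&pg->type_info, &x, type | 1)) {
            tlb_flushes++;        /* deferred until *after* success */
            return true;
        }
        /* type_info changed under us: retry with a fresh read. */
    }
}
```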
-
Andrew Cooper authored
Various fixes for clarity, ahead of making complicated changes.

* Split the overflow check out of the if/else chain for type handling, as it's somewhat unrelated.
* Comment the main if/else chain to explain what is going on. Adjust one ASSERT() and state the bit layout for validate-locked and partial states.
* Correct the comment about TLB flushing, as it's backwards. The problem case is when writeable mappings are retained to a page becoming read-only, as it allows the guest to bypass Xen's safety checks for updates.
* Reduce the scope of 'y'. It is an artefact of the cmpxchg loop and not valid for use by subsequent logic. Switch to using ACCESS_ONCE() to treat all reads as explicitly volatile. The only thing preventing the validated wait-loop being infinite is the compiler barrier hidden in cpu_relax().
* Replace one page_get_owner(page) with the already-calculated 'd' already in scope.

No functional change.

This is part of XSA-401 / CVE-2022-26362.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
master commit: 9186e96b199e4f7e52e033b238f9fe869afb69c7
master date: 2022-06-09 14:20:36 +0200
-
- 12 Apr, 2022 1 commit
-
-
Jan Beulich authored
-
- 08 Apr, 2022 7 commits
-
-
Roger Pau Monné authored
Track whether symbols belong to ignored sections in order to avoid applying relocations referencing those symbols. The address of such symbols won't be resolved and thus the relocation will likely fail or write garbage to the destination.

Return an error in that case, as leaving unresolved relocations would lead to malfunctioning payload code.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Tested-by: Bjoern Doebel <doebel@amazon.de>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>
master commit: 9120b5737f517fe9d2a3936c38d3a2211630323b
master date: 2022-04-08 10:27:11 +0200
-
Roger Pau Monné authored
A side effect of ignoring such sections is that symbols belonging to them won't be resolved, and that could make relocations belonging to other sections that reference those symbols fail. For example it's likely to have an empty .altinstr_replacement with symbols pointing to it, and marking the section as ignored will prevent the symbols from being resolved, which in turn will cause any relocations against them to fail.

In order to solve this do not ignore sections with 0 size, only ignore sections that don't have the SHF_ALLOC flag set.

Special case such empty sections in move_payload so they are not taken into account in order to decide whether a livepatch can be safely re-applied after a revert.

Fixes: 98b728a7 ('livepatch: Disallow applying after an revert')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Tested-by: Bjoern Doebel <doebel@amazon.de>
Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>
master commit: 0dc1f929e8fed681dec09ca3ea8de38202d5bf30
master date: 2022-04-08 10:24:10 +0200
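The revised rule reduces to a flag test rather than a size test. The sketch below uses a minimal stand-in struct (not Xen's or ELF's real types) to show the change: a zero-size SHF_ALLOC section such as an empty .altinstr_replacement is now kept, so symbols anchored in it still resolve.

```c
#include <stdbool.h>
#include <stdint.h>

#define SHF_ALLOC 0x2   /* per the ELF specification */

/* Minimal stand-in for an ELF section header. */
struct section { uint64_t sh_flags; uint64_t sh_size; };

/* Revised rule: ignore only sections that aren't SHF_ALLOC; size is no
 * longer part of the decision. */
bool section_ignored(const struct section *s)
{
    return !(s->sh_flags & SHF_ALLOC);
}
```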
-
Jan Beulich authored
4.14 doesn't know of this format specifier extension yet.

Fixes: 47188b2f ("vpci/msix: fix PBA accesses")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
-
Andrew Cooper authored
c/s 1a914256 increased the AMD max leaf from 0x8000001c to 0x80000021, but did not adjust anything in the calculate_*_policy() chain. As a result, on hardware supporting these leaves, we read the real hardware values into the raw policy, then copy into host, and all the way into the PV/HVM default policies.

All 4 of these leaves have enable bits (first two by TopoExt, next by SEV, next by PQOS), so any software following the rules is fine and will leave them alone. However, leaf 0x8000001d takes a subleaf input and at least two userspace utilities have been observed to loop indefinitely under Xen (clearly waiting for eax to report "no more cache levels").

Such userspace is buggy, but Xen's behaviour isn't great either.

In the short term, clobber all information in these leaves. This is a giant bodge, but there are complexities with implementing all of these leaves properly.

Fixes: 1a914256 ("x86/cpuid: support LFENCE always serialising CPUID bit")
Link: https://github.com/QubesOS/qubes-issues/issues/7392
Reported-by: fosslinux <fosslinux@aussies.space>
Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: d4012d50082c2eae2f3cbe7770be13b9227fbc3f
master date: 2022-04-07 11:36:45 +0100
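The userspace pattern that hangs can be modelled as follows. This is an illustrative sketch: `emulated_cpuid_eax` is a stand-in for the CPUID values the guest observes, here modelling the clobbered (all-zero) leaf. Enumerating leaf 0x8000001d means incrementing the subleaf until EAX's cache-type field (bits 4:0) reads 0; if the leaf is leaked through without subleaf semantics, that terminator never appears and the loop spins forever.

```c
#include <stdint.h>

/* Stand-in for the guest-visible CPUID: with the leaf clobbered to
 * zeroes, the very first subleaf reports "no more caches". */
static uint32_t emulated_cpuid_eax(uint32_t leaf, uint32_t subleaf)
{
    (void)leaf;
    (void)subleaf;
    return 0;
}

/* The (buggy-under-leaked-values) userspace enumeration pattern. */
int count_cache_levels(void)
{
    int levels = 0;

    for (uint32_t sub = 0; ; sub++) {
        uint32_t eax = emulated_cpuid_eax(0x8000001d, sub);

        if ((eax & 0x1f) == 0)   /* bits 4:0 = cache type; 0 = no more */
            break;
        levels++;
    }
    return levels;
}
```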
-
Jan Beulich authored
Despite the comment there infinite recursion was still possible, by flip-flopping between two domains. This is because prev_dom is derived from the DID found in the context entry, which was already updated by the time error recovery is invoked. Simply introduce yet another mode flag to prevent rolling back an in-progress roll-back of a prior mapping attempt.

Also drop the existing recursion prevention for having been dead anyway: Earlier in the function we already bail when prev_dom == domain.

Fixes: 8f41e481b485 ("VT-d: re-assign devices directly")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
master commit: 99d829dba1390b98a3ca07b365713e62182ee7ca
master date: 2022-04-07 12:31:16 +0200
-
Jan Beulich authored
First there's a printk() which actually wrongly uses pdev in the first place: We want to log the coordinates of the (perhaps fake) device acted upon, which may not be pdev.

Then it was quite pointless for eb19326a328d ("VT-d: prepare for per-device quarantine page tables (part I)") to add a domid_t parameter to domain_context_unmap_one(): It's only used to pass back here via me_wifi_quirk() -> map_me_phantom_function(). Drop the parameter again.

Finally there's the invocation of domain_context_mapping_one(), which needs to be passed the correct domain ID. Avoid taking that path when pdev is NULL and the quarantine state is what would need restoring to. This means we can't security-support non-PCI-Express devices with RMRRs (if such exist in practice) any longer; note that as of the 1st of the two commits referenced below assigning them to DomU-s is unsupported anyway.

Fixes: 8f41e481b485 ("VT-d: re-assign devices directly")
Fixes: 14dd241aad8a ("IOMMU/x86: use per-device page tables for quarantining")
Coverity ID: 1503784
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
master commit: 608394b906e71587f02e6662597bc985bad33a5a
master date: 2022-04-07 12:30:19 +0200
-
Jan Beulich authored
If get_iommu_domid() in domain_context_unmap_one() fails, we better wouldn't clear the context entry in the first place, as we're then unable to issue the corresponding flush. However, we have no need to look up the DID in the first place: What needs flushing is very specifically the DID that was in the context entry before our clearing of it.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
master commit: 445ab9852d69d8957467f0036098ebec75fec092
master date: 2022-04-07 12:29:03 +0200
-
- 07 Apr, 2022 8 commits
-
-
Roger Pau Monné authored
Prevent the assembler from creating a .note.gnu.property section on the output objects, as it's not useful for firmware related binaries, and breaks the resulting rombios image.

This requires modifying the cc-option Makefile macro so it can test assembler options (by replacing the usage of the -S flag with -c) and also stripping the -Wa, prefix if present when checking for the test output.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: e270af94280e6a9610705ebc1fdd1d7a9b1f8a98
master date: 2022-04-04 12:30:07 +0100
-
Roger Pau Monné authored
Do so right in firmware/Rules.mk, like it's done for other compiler flags.

Fixes: 3667f7f8f7 ('x86: Introduce support for CET-IBT')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 7225f6e0cd3afd48b4d61c43dd8fead0f4c92193
master date: 2022-04-04 12:30:00 +0100
-
Jason Andryuk authored
I've observed this failed assertion:

libxl_event.c:2057: libxl__ao_inprogress_gc: Assertion `ao' failed.

AFAICT, this is happening in qmp_proxy_spawn_outcome where sdss->qmp_proxy_spawn.ao is NULL. The out label of spawn_stub_launch_dm() calls qmp_proxy_spawn_outcome(), but it is only in the success path that sdss->qmp_proxy_spawn.ao gets set to the current ao.

qmp_proxy_spawn_outcome() should instead use sdss->dm.spawn.ao, which is the already in-use ao when spawn_stub_launch_dm() is called. The same is true for spawn_qmp_proxy().

With this, move sdss->qmp_proxy_spawn.ao initialization to spawn_qmp_proxy() since its use is for libxl__spawn_spawn() and it can be initialized along with the rest of sdss->qmp_proxy_spawn.

Fixes: 83c84503 ("libxl: use vchan for QMP access with Linux stubdomain")
Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
master commit: d62a34423a1a98aefd7c30e22d2d82d198f077c8
master date: 2022-04-01 17:01:57 +0100
-
Jason Andryuk authored
If domain_soft_reset_cb can't rename the save file, it doesn't call initiate_domain_create() and calls domcreate_complete(). Skipping initiate_domain_create() means dcs->console_wait is uninitialized and all 0s.

We have: domcreate_complete() -> libxl__xswait_stop() -> libxl__ev_xswatch_deregister(). The uninitialized slotnum 0 is considered valid (-1 is the invalid sentinel), so the NULL wpath is passed to xs_unwatch(), which segfaults.

libxl__ev_xswatch_deregister:watch w=0x12bc250 wpath=(null) token=0/0: deregister slotnum=0

Move dcs->console_xswait initialization into the callers of initiate_domain_create, do_domain_create() and do_domain_soft_reset(), so it is initialized along with the other dcs state.

Fixes: c57e6ebd ("(lib)xl: soft reset support")
Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
master commit: d2ecf97f911fc00a85b34b70ca311b5d355a9756
master date: 2022-04-01 17:01:57 +0100
-
Jason Andryuk authored
commit babde47a "introduce a 'passthrough' configuration option to xl.cfg..." moved the pci list parsing ahead of the global pci option parsing. This broke the global pci configuration options since they need to be set first so that looping over the pci devices assigns their values.

Move the global pci options ahead of the pci list to restore their function.

Fixes: babde47a ("introduce a 'passthrough' configuration option to xl.cfg...")
Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
master commit: e45ad0b1b0bd6a43f59aaf4a6f86d88783c630e5
master date: 2022-03-31 19:48:12 +0100
-
Roger Pau Monné authored
Map the PBA in order to access it from the MSI-X read and write handlers. Note that previously the handlers would pass the physical host address into the {read,write}{l,q} handlers, which is wrong as those expect a linear address.

Map the PBA using ioremap when the first access is performed. Note that 32bit arches might want to abstract the call to ioremap into a vPCI arch handler, so they can use a fixmap range to map the PBA.

Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Alex Olson <Alex.Olson@starlab.io>
master commit: b4f21160601155762a4d014db9623af921fec959
master date: 2022-03-09 16:21:01 +0100
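The map-on-first-access pattern can be sketched with a toy model. `fake_ioremap()` and the `msix` struct below are stand-ins for `ioremap()` and vPCI's real state, purely for illustration; the point is that a linear address obtained from the mapping is dereferenced, never the raw host physical address, and the mapping is created once and cached.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t fake_bar[1024];   /* pretend device memory */
static int ioremap_calls;

/* Stand-in for ioremap(): returns a linear address for a host physical
 * range, counting invocations so laziness is observable. */
static void *fake_ioremap(uint64_t hpa, size_t len)
{
    (void)hpa;
    (void)len;
    ioremap_calls++;
    return fake_bar;
}

struct msix { void *pba; uint64_t pba_hpa; };

uint32_t pba_read(struct msix *m, unsigned int off)
{
    if (!m->pba)   /* map on first access only; cached afterwards */
        m->pba = fake_ioremap(m->pba_hpa, sizeof(fake_bar));

    /* Dereference the linear address, never the physical one. */
    return *(volatile uint32_t *)((uint8_t *)m->pba + off);
}
```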
-
Lasse Collin authored
This might matter, for example, if the underlying type of enum xz_check was a signed char. In such a case the validation wouldn't have caught an unsupported header. I don't know if this problem can occur in the kernel on any arch but it's still good to fix it because some people might copy the XZ code to their own projects from Linux instead of the upstream XZ Embedded repository.

This change may increase the code size by a few bytes. An alternative would have been to use an unsigned int instead of enum xz_check but using an enumeration looks cleaner.

Link: https://lore.kernel.org/r/20211010213145.17462-3-xiang@kernel.org
Signed-off-by: Lasse Collin <lasse.collin@tukaani.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Origin: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 4f8d7abaa413
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Luca Fancellu <luca.fancellu@arm.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 0a21660515c24f09c4ee060ce0bb42e4b2e6b6fa
master date: 2022-03-07 09:08:54 +0100
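The hazard can be illustrated as follows. This sketch is not the actual XZ Embedded code; the function name is hypothetical. The check IDs used (0 = None, 1 = CRC32, 4 = CRC64, 10 = SHA-256) are the values defined by the .xz format, which reserves 4 bits for the field. Validating the raw byte as an unsigned value *before* converting to the enum avoids the case where a signed underlying type could wrap an out-of-range byte (e.g. 0x80) into a negative value that slips past a post-conversion check.

```c
#include <stdbool.h>
#include <stdint.h>

enum xz_check {
    XZ_CHECK_NONE   = 0,
    XZ_CHECK_CRC32  = 1,
    XZ_CHECK_CRC64  = 4,
    XZ_CHECK_SHA256 = 10,
};

/* Sketch of the fix: range-check the raw header byte first, then
 * convert.  The enum conversion only happens on in-range input. */
bool check_is_supported(uint8_t raw)
{
    if (raw > 15)   /* .xz format: check IDs occupy 4 bits */
        return false;

    enum xz_check c = (enum xz_check)raw;

    return c == XZ_CHECK_NONE || c == XZ_CHECK_CRC32 ||
           c == XZ_CHECK_CRC64 || c == XZ_CHECK_SHA256;
}
```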
-
Lasse Collin authored
With valid files, the safety margin described in lib/decompress_unxz.c ensures that these buffers cannot overlap. But if the uncompressed size of the input is larger than the caller thought, which is possible when the input file is invalid/corrupt, the buffers can overlap. Obviously the result will then be garbage (and usually the decoder will return an error too) but no other harm will happen when such an over-run occurs.

This change only affects uncompressed LZMA2 chunks and so this should have no effect on performance.

Link: https://lore.kernel.org/r/20211010213145.17462-2-xiang@kernel.org
Signed-off-by: Lasse Collin <lasse.collin@tukaani.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Origin: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 83d3c4f22a36
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Luca Fancellu <luca.fancellu@arm.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 10454f381f9157bce26d5db15e07e857b317b4af
master date: 2022-03-07 09:08:08 +0100
-