1. 18 Aug, 2022 1 commit
  2. 15 Aug, 2022 8 commits
    • PCI: simplify (and thus correct) pci_get_pdev{,_by_domain}() · 2abe83f9
      Jan Beulich authored
      The last "wildcard" use of either function went away with f5917558
      ("IOMMU/PCI: don't let domain cleanup continue when device de-assignment
      failed"). Don't allow them to be called this way anymore. Besides
      simplifying the code this also fixes two bugs:
      
      1) When seg != -1, the outer loops should have been terminated after the
         first iteration, or else a device with the same BDF but on another
         segment could be found / returned.
      
      Reported-by: Rahul Singh <rahul.singh@arm.com>
      
      2) When seg == -1 calling get_pseg() is bogus. The function (taking a
         u16) would look for segment 0xffff, which might exist. If it exists,
         we might then find / return a wrong device.
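Bug 2 comes down to an integer conversion. The following is a minimal sketch, not the Xen code: `lookup_key()` is a hypothetical stand-in for a lookup that takes a 16-bit segment number, as the message says get_pseg() does.

```c
#include <stdint.h>

/* Hypothetical stand-in for a lookup keyed by a 16-bit segment number. */
static inline uint16_t lookup_key(uint16_t seg)
{
    return seg;
}

/* Key actually searched for when a caller passes the -1 "wildcard". */
static inline uint16_t wildcard_key(void)
{
    int seg = -1;

    /* Conversion to uint16_t is modulo 2^16, so -1 becomes 0xffff. */
    return lookup_key((uint16_t)seg);
}
```

In this sketch `wildcard_key()` yields 0xffff, which is also a valid segment number, so a wildcard lookup can silently turn into a lookup of a real segment.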
      
      In pci_get_pdev_by_domain() also switch from using the per-segment list
      to using the per-domain one, with the exception of the hardware domain
      (see the code comment there).
      
      While there also constify "pseg" and drop "pdev"'s already previously
      unnecessary initializer.
      
      Signed-off-by: Jan Beulich <jbeulich@suse.com>
      Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
      Reviewed-by: Rahul Singh <rahul.singh@arm.com>
      Tested-by: Rahul Singh <rahul.singh@arm.com>
      master commit: 8cf6e0738906fc269af40135ed82a07815dd3b9c
      master date: 2022-08-12 08:34:33 +0200
      2abe83f9
    • build/x86: suppress GNU ld 2.39 warning about RWX load segments · 3fd9a7d5
      Jan Beulich authored
      
      
      Commit 68f5aac012b9 ("build: suppress future GNU ld warning about RWX
      load segments") didn't quite cover all the cases: Apparently I missed
      ones in the building of 32-bit helper objects because of only looking at
      incremental builds (where those wouldn't normally be re-built). Clone
      the workaround there to the specific Makefile in question.
      
      Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
      Signed-off-by: Jan Beulich <jbeulich@suse.com>
      Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
      master commit: 3eb1865ae305772b558757904d81951e31de43de
      master date: 2022-08-11 17:45:12 +0200
      3fd9a7d5
    • x86/amd: only call setup_force_cpu_cap for boot CPU · 9123e60c
      Ross Lagerwall authored
      
      
      This should only be called for the boot CPU to avoid calling _init code
      after it has been unloaded.
      
      Fixes: 062868a5a8b4 ("x86/amd: Work around CLFLUSH ordering on older parts")
      Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
      Reviewed-by: Jan Beulich <jbeulich@suse.com>
      master commit: 31b41ce858c8bd5159212d40969f8e0b7124bbf0
      master date: 2022-08-11 17:44:26 +0200
      9123e60c
    • x86/spec-ctrl: Enumeration for PBRSB_NO · 940fc00e
      Andrew Cooper authored
      
      
      The PBRSB_NO bit indicates that the CPU is not vulnerable to the Post-Barrier
      RSB speculative vulnerability.
      
      Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
      Reviewed-by: Jan Beulich <jbeulich@suse.com>
      master commit: b874e47eb13feb75be3ee7b5dc4ae9c97d80d774
      master date: 2022-08-11 16:19:50 +0100
      940fc00e
    • tools/libxl: Replace deprecated -sdl option on QEMU command line · e6a760b8
      Anthony PERARD authored
      
      
      "-sdl" is deprecated upstream since 6695e4c0fd9e ("softmmu/vl:
      Deprecate the -sdl and -curses option"), QEMU v6.2, and the option is
      removed by 707d93d4abc6 ("ui: Remove deprecated options "-sdl" and
      "-curses""), in upcoming QEMU v7.1.
      
      Instead, use "-display sdl", available since 1472a95bab1e ("Introduce
      -display argument"), before QEMU v1.0.
      
      Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
      Reviewed-by: Jason Andryuk <jandryuk@gmail.com>
      master commit: 41fcb3af8ad6d4c9f65a9d72798e6d18afec55ac
      master date: 2022-08-11 11:47:11 +0200
      e6a760b8
    • xen/sched: setup dom0 vCPUs affinity only once · 0f7eff5e
      Dario Faggioli authored
      Right now, affinity for dom0 vCPUs is set up in two steps. This is a
      problem as, at least in Credit2, unit_insert() sees and uses the
      "intermediate" affinity, and places the vCPUs on CPUs where they cannot
      be run. And this in turn results in boot hangs, if the "dom0_nodes"
      parameter is used.
      
      Fix this by setting up the affinity properly once and for all, in
      sched_init_vcpu() called by create_vcpu().
      
      Note that, unless a soft-affinity is explicitly specified for dom0 (by
      using the relaxed mode of "dom0_nodes") we set it to the default, which
      is all CPUs, instead of computing it based on hard affinity (if any).
      This is because hard and soft affinity should be considered as
      independent user controlled properties. In fact, if we do derive dom0's
      soft-affinity from its boot-time hard-affinity, such computed value will
      continue to be used even if later the user changes the hard-affinity.
      And this could result in the vCPUs behaving differently than what the
      user wanted and expects.
      
      Fixes: dafd936d ("Make credit2 the default scheduler")
      Reported-by: Olaf Hering <ohering@suse.de>
      Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
      Reviewed-by: Jan Beulich <jbeulich@suse.com>
      master commit: c79e4d209be3ed2a6b8e97c35944786ed2a66b94
      master date: 2022-08-11 11:46:22 +0200
      0f7eff5e
    • x86: Expose more MSR_ARCH_CAPS to hwdom · 2a362668
      Jason Andryuk authored
      Commit e4647427 ("x86/intel: Expose MSR_ARCH_CAPS to dom0") started
      exposing MSR_ARCH_CAPS to dom0.  More bits in MSR_ARCH_CAPS have since
      been defined, but they haven't been exposed.  Update the list to allow
      them through.
      
      As one example, this allows a Linux Dom0 to know that it has the
      appropriate microcode via FB_CLEAR.  Notably, and with the updated
      microcode, this changes dom0's
      /sys/devices/system/cpu/vulnerabilities/mmio_stale_data from:
      
        "Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown"
      
      to:
      
        "Mitigation: Clear CPU buffers; SMT Host state unknown"
      
      This exposes the MMIO Stale Data and Intel Branch History Injection
      (BHI) controls as well as the page size change MCE issue bit.
      
      Fixes: commit 2ebe8fe9b7e0 ("x86/spec-ctrl: Enumeration for MMIO Stale Data controls")
      Fixes: commit cea9ae062295 ("x86/spec-ctrl: Enumeration for new Intel BHI controls")
      Fixes: commit 59e89cda ("x86/vtx: Disable executable EPT superpages to work around CVE-2018-12207")
      Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
      Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
      master commit: e83cd54611fec5b7a539fa1281a14319143490e6
      master date: 2022-08-09 16:35:25 +0100
      2a362668
    • x86/spec-ctrl: Use IST RSB protection for !SVM systems · 4e351880
      Andrew Cooper authored
      
      
      There is a corner case where a VT-x guest which manages to reliably trigger
      non-fatal #MC's could evade the rogue RSB speculation protections that were
      supposed to be in place.
      
      This is a lack of defence in depth; Xen does not architecturally execute more
      RET than CALL instructions, so an attacker would have to locate a different
      gadget (e.g. SpectreRSB) first to execute a transient path of excess RET
      instructions.
      
      Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
      Reviewed-by: Jan Beulich <jbeulich@suse.com>
      master commit: e570e8d520ab542d8d35666b95cb3a0125b7b110
      master date: 2022-08-05 12:16:24 +0100
      4e351880
  3. 03 Aug, 2022 11 commits
    • xen: arm: Don't use stop_cpu() in halt_this_cpu() · 48b67651
      Dmytro Semenets authored
      
      
      When shutting down (or rebooting) the platform, Xen will call stop_cpu()
      on all the CPUs but one. The last CPU will then request the system to
      shut down/restart.
      
      On platforms using PSCI, stop_cpu() will call PSCI CPU off. Per the spec
      (section 5.5.2 DEN0022D.b), the call could return DENIED if the Trusted
      OS is resident on the CPU that is about to be turned off.
      
      As Xen doesn't migrate off the trusted OS (which BTW may not be
      migratable), it would be possible to hit the panic().
      
      In the ideal situation, Xen should migrate the trusted OS or make sure
      the CPU off is not called. However, when shutting down (or rebooting)
      the platform, it is pointless to try to turn off all the CPUs (per
      section 5.10.2, it is only required to put the core in a known state).
      
      So solve the problem by open-coding stop_cpu() in halt_this_cpu() and
      not call PSCI CPU off.
      
      Signed-off-by: Dmytro Semenets <dmytro_semenets@epam.com>
      Acked-by: Julien Grall <jgrall@amazon.com>
      (cherry picked from commit ee11f092b515bf3c926eaad053d12d3f2b6e593e)
      48b67651
    • xen/arm: Advertise workaround 1 if we apply 3 · a0b823dc
      Bertrand Marquis authored
      
      
      SMCC_WORKAROUND_3 handles both Spectre v2 and Spectre BHB.
      So when a guest asks whether we support workaround 1, answer yes if we
      apply workaround 3 on exception entry, as it covers workaround 1 too.
      
      This will allow guests not supporting Spectre BHB but impacted by
      Spectre v2 to still handle it correctly.
      The modified behaviour is consistent with what the Linux kernel does in
      KVM for guests.
      
      While there, use ARM_SMCCC_SUCCESS instead of 0 for the return code value
      for workaround detection, to be consistent with Workaround 2 handling.
      
      Signed-off-by: Bertrand Marquis <bertrand.marquis@arm.com>
      Acked-by: Julien Grall <jgrall@amazon.com>
      (cherry picked from commit af570d1c90f1ed6040d724732f6c582383782e90)
      a0b823dc
    • xen/arm: avoid overflow when setting vtimer in context switch · 48e7440e
      Jiamei Xie authored
      
      
      virt_vtimer_save() will calculate the next deadline when the vCPU is
      scheduled out. At the moment, Xen will use the following equation:
      
        virt_timer.cval + virt_time_base.offset - boot_count
      
      The three values are 64-bit and one (cval) is controlled by the domain. In
      theory, it would be possible that the domain has started a long time
      after the system boot. So virt_time_base.offset - boot_count may be a
      large number.
      
      This means a domain may inadvertently set a cval so the result would
      overflow. Consequently, the deadline would be set very far in the
      future. This could result in the loss of timer interrupts or the vCPU
      getting blocked "forever".
      
      One way to solve the problem would be to separately
         1) compute when the domain was created in ns
         2) convert cval to ns
         3) add 1 and 2 together
      
      The first part of the equation never changes (the value is set/known at
      domain creation). So take the opportunity to store it in the domain structure.
      
      Signed-off-by: Jiamei Xie <jiamei.xie@arm.com>
      Reviewed-by: Julien Grall <jgrall@amazon.com>
      Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
      (cherry picked from commit 6655eb81092a94e065fdcd0b47a1b1d69dc4e54c)
      48e7440e
    • arm/vgic-v3: fix virq offset in the rank when storing irouter · 3050769a
      Hongda Deng authored
      When vGIC performs irouter registers emulation, to get the target vCPU
      via virq conveniently, Xen doesn't store the irouter value directly,
      instead it will use the value (affinities) in irouter to calculate the
      target vCPU, and then save the target vCPU in irq rank->vcpu[offset].
      
      When vGIC tries to get the target vCPU, it first calculates the target
      vCPU index via
        int target = read_atomic(&rank->vcpu[virq & INTERRUPT_RANK_MASK]);
      and then it gets the target vCPU via
        v->domain->vcpu[target];
      
      When vGIC tries to store irouter for one virq, the target vCPU index
      in the rank is computed as
        offset &= virq & INTERRUPT_RANK_MASK;
      finally it gets the target vCPU via
        d->vcpu[read_atomic(&rank->vcpu[offset])];
      
      The two paths differ in how they compute the target vCPU index in the
      rank. (virq & INTERRUPT_RANK_MASK) already yields the target vCPU
      index in the rank, so it is wrong to add '&' before '=' when
      calculating the offset.
      
      For example, the target vCPU index in the rank should be 6 for virq 38,
      but vGIC will get offset=0 when vGIC stores the irouter for this virq,
      and finally vGIC will access the wrong target vCPU index in the rank
      when updating the irouter.
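The virq 38 example can be checked with a small sketch. This is an illustration under stated assumptions, not the vGIC code: it assumes INTERRUPT_RANK_MASK is 0x1f (32 interrupts per rank) and that `offset` initially holds the byte offset of the IROUTER register, taken here as virq * 8. NR_BYTES_PER_IROUTER and the helper names are hypothetical.

```c
#define INTERRUPT_RANK_MASK   0x1fU  /* assumed: 32 interrupts per rank */
#define NR_BYTES_PER_IROUTER  8U     /* assumed: one 64-bit IROUTER per vIRQ */

/* Index used on the read path. */
static inline unsigned int rank_index(unsigned int virq)
{
    return virq & INTERRUPT_RANK_MASK;
}

/* Index computed by the buggy store path. */
static inline unsigned int buggy_store_index(unsigned int virq)
{
    unsigned int offset = virq * NR_BYTES_PER_IROUTER;

    /* The stray '&' also ANDs in the old byte offset. */
    offset &= virq & INTERRUPT_RANK_MASK;

    return offset;
}

/* Index computed once the stray '&' is dropped. */
static inline unsigned int fixed_store_index(unsigned int virq)
{
    return virq & INTERRUPT_RANK_MASK;
}
```

For virq 38 the read path gives 38 & 0x1f = 6, while the buggy store path gives 304 & 6 = 0, so the two paths touch different slots of the rank.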
      
      Fixes: 5d495f43 ("xen/arm: vgic: Optimize the way to store the target vCPU in the rank")
      Signed-off-by: Hongda Deng <Hongda.Deng@arm.com>
      Reviewed-by: Julien Grall <jgrall@amazon.com>
      (cherry picked from commit 800f21499e0ec112771ce1e94490ca5811578bc2)
      3050769a
    • xen/arm: head: Add missing isb after writing to SCTLR_EL2/HSCTLR · 0d362e5e
      Julien Grall authored
      
      
      A write to SCTLR_EL2/HSCTLR may not be visible until the next context
      synchronization. When initializing the CPU, we want the update to take
      effect right away. So add an isb afterwards.
      
      Spec references:
          - AArch64: D13.1.2 ARM DDI 0406C.d
          - AArch32 v8: G8.1.2 ARM DDI 0406C.d
          - AArch32 v7: B5.6.3 ARM DDI 0406C.d
      
      Signed-off-by: Julien Grall <jgrall@amazon.com>
      Reviewed-by: Michal Orzel <michal.orzel@arm.com>
      Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
      (cherry picked from commit 25424d1a6b7b7e875230aba77c2f044a4883e49a)
      0d362e5e
    • xen/arm: traps: Fix reference to invalid erratum ID · 6f650400
      Michal Orzel authored
      The correct erratum ID should be 834220.
      
      Fixes: 0a7ba293 ("xen/arm: arm64: Add Cortex-A57 erratum 834220 workaround")
      Signed-off-by: Michal Orzel <michal.orzel@arm.com>
      Acked-by: Julien Grall <jgrall@amazon.com>
      (cherry picked from commit a6f7ed5fc7d5fb5001ef82db99d34bc8a85fc2b6)
      6f650400
    • xen/arm: Avoid overflow using MIDR_IMPLEMENTOR_MASK · 04818518
      Michal Orzel authored
      
      
      The value of the MIDR_IMPLEMENTOR_MASK macro exceeds the range of int
      and can lead to overflow. Currently there is no issue as it is used
      in an expression implicitly cast to u32 in MIDR_IS_CPU_MODEL_RANGE.
      To avoid possible problems, fix the macro.
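A minimal sketch of this kind of fix, assuming the mask is an 8-bit field shifted left by 24 and that the correction is an unsigned suffix on the constant. The shift value, the helper `midr_implementor()` and the sample MIDR value in the note are illustrative assumptions, not taken from the patch.

```c
#include <stdint.h>

#define MIDR_IMPLEMENTOR_SHIFT  24

/*
 * With a plain 0xff the constant has type int, and 0xff << 24 does not fit
 * in a 32-bit int. The U suffix makes the shift happen in unsigned int.
 */
#define MIDR_IMPLEMENTOR_MASK   (0xffU << MIDR_IMPLEMENTOR_SHIFT)

/* Extract the implementer field from a 32-bit MIDR value. */
static inline unsigned int midr_implementor(uint32_t midr)
{
    return (midr & MIDR_IMPLEMENTOR_MASK) >> MIDR_IMPLEMENTOR_SHIFT;
}
```

With these definitions, `midr_implementor(0x410fd070U)` evaluates to 0x41 and the mask itself is 0xff000000.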
      
      Signed-off-by: Michal Orzel <michal.orzel@arm.com>
      Link: https://lore.kernel.org/r/20220426070603.56031-1-michal.orzel@arm.com
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>
      Acked-by: Julien Grall <jgrall@amazon.com>
      (cherry picked from commit aa1cba100bff84b211f27639bd6efeaf7e701bcc)
      04818518
    • xen/arm: p2m don't fall over on FEAT_LPA enabled hw · fe02a534
      Alex Bennée authored
      
      
      When we introduced FEAT_LPA to QEMU's -cpu max we discovered older
      kernels had a bug where the physical address was copied directly from
      ID_AA64MMFR0_EL1.PARange field. The early cpu_init code of Xen commits
      the same error by blindly copying across the max supported range.
      
      Unsurprisingly when the page tables aren't set up for these greater
      ranges hilarity ensues and the hypervisor crashes fairly early on in
      the boot-up sequence. This happens when we write to the control
      register in enable_mmu().
      
      Attempt to fix this the same way as the Linux kernel does by gating
      PARange to the maximum the hypervisor can handle. I also had to fix up
      code in p2m which panics when it sees an "invalid" entry in PARange.
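The gating idea can be sketched as a simple clamp. This is a sketch under assumptions rather than the actual change: it assumes the hypervisor handles at most a 48-bit physical address range (PARange encoding 0b0101), and the macro and function names are hypothetical.

```c
#include <stdint.h>

#define PARANGE_MASK   0xfU  /* ID_AA64MMFR0_EL1.PARange is bits [3:0] */
#define PARANGE_48BIT  0x5U  /* encoding for a 48-bit physical address range */

/* Assumption: the page tables are set up for at most 48 bits of PA. */
#define PARANGE_MAX_SUPPORTED  PARANGE_48BIT

/* Return the PARange to program, never more than what is supported. */
static inline unsigned int gated_parange(uint64_t id_aa64mmfr0)
{
    unsigned int parange = (unsigned int)(id_aa64mmfr0 & PARANGE_MASK);

    return parange > PARANGE_MAX_SUPPORTED ? PARANGE_MAX_SUPPORTED : parange;
}
```

A CPU reporting the 52-bit encoding (0b0110, as with FEAT_LPA) would then be treated as 48-bit, while smaller ranges pass through unchanged.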
      
      Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
      Cc: Richard Henderson <richard.henderson@linaro.org>
      Cc: Stefano Stabellini <sstabellini@kernel.org>
      Cc: Julien Grall <julien@xen.org>
      Cc: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
      Cc: Bertrand Marquis <bertrand.marquis@arm.com>
      Tested-by: Luca Fancellu <luca.fancellu@arm.com>
      Acked-by: Julien Grall <jgrall@amazon.com>
      (cherry picked from commit 407b13a71e324aba76b11e5f66f59ce4a304a088)
      fe02a534
    • arm/its: enable LPIs before mapping the collection table · 271e9e86
      Rahul Singh authored
      
      
      When Xen boots on a platform that implements the GIC-600, an ITS
      MAPC_LPI_OFF uncorrectable command error is observed.
      
      As per the GIC-600 TRM (Revision: r1p6), a MAPC_LPI_OFF command error can
      be reported if the MAPC command has tried to map a collection to a core
      that does not have LPIs enabled. The definition of GICR_CTLR.EnableLPIs
      also suggests enabling the LPIs before sending any ITS command that
      involves LPIs:
      
      0b0 LPI support is disabled. Any doorbell interrupt generated as a
          result of a write to a virtual LPI register must be discarded,
          and any ITS translation requests or commands involving LPIs in
          this Redistributor are ignored.
      
      0b1 LPI support is enabled.
      
      To fix the MAPC command error issue, enable the LPIs using
      GICR_CTLR.EnableLPIs before mapping the collection table.
      
      gicv3_enable_lpis() uses writel_relaxed(), so the write to the GICR_CTLR
      register may not be visible before gicv3_its_setup_collection() sends the
      MAPC command. Use wmb() after writel_relaxed() to make sure the register
      write enabling LPIs is visible.
      
      Signed-off-by: Rahul Singh <rahul.singh@arm.com>
      Acked-by: Julien Grall <jgrall@amazon.com>
      Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
      (cherry picked from commit 95604873ccf56eb81e96ed0dc8b4dec3278f40ca)
      271e9e86
    • x86/msr: fix X2APIC_LAST · 89fe6d0e
      Edwin Török authored
      
      
      The latest Intel manual now says the X2APIC reserved range is only
      0x800 to 0x8ff (NOT 0xbff).
      This changed between SDM 68 (Nov 2018) and SDM 69 (Jan 2019).
      The AMD manual documents 0x800-0x8ff too.
      
      There are non-X2APIC MSRs in the 0x900-0xbff range now:
      e.g. 0x981 is IA32_TME_CAPABILITY, an architectural MSR.
      
      The new MSR in this range appears to have been introduced in Icelake,
      so this commit should be backported to Xen versions supporting Icelake.
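The effect of the narrower range can be sketched with a simple range check. The bounds come from the message above; the macro and function names are assumptions made for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define MSR_X2APIC_FIRST  0x00000800U
#define MSR_X2APIC_LAST   0x000008ffU  /* previously treated as 0xbff */

/* True if the MSR index falls in the x2APIC reserved range. */
static inline bool is_x2apic_msr(uint32_t msr)
{
    return msr >= MSR_X2APIC_FIRST && msr <= MSR_X2APIC_LAST;
}
```

With the corrected upper bound, 0x981 (IA32_TME_CAPABILITY) is no longer classified as an x2APIC MSR.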
      
      Signed-off-by: Edwin Török <edvin.torok@citrix.com>
      Reviewed-by: Jan Beulich <jbeulich@suse.com>
      master commit: 13316827faadbb4f72ae6c625af9938d8f976f86
      master date: 2022-07-27 12:57:10 +0200
      89fe6d0e
    • tools/libxl: env variable to signal whether disk/nic backend is trusted · 6689cab2
      Roger Pau Monné authored
      
      
      Introduce support in libxl for fetching the default backend trusted
      option for disk and nic devices.
      
      Users can set the LIBXL_{DISK,NIC}_BACKEND_UNTRUSTED environment variables
      to notify libxl of whether the backends for disk and nic devices
      should be trusted.  Such information is passed into the frontend so it
      can take the appropriate measures.
      
      This is part of XSA-403.
      
      Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
      Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
      6689cab2
  4. 27 Jul, 2022 8 commits
    • common/memory: Fix ifdefs for ptdom_max_order · d77bb6e5
      Luca Fancellu authored
      In common/memory.c the ifdef code surrounding ptdom_max_order
      uses HAS_PASSTHROUGH instead of CONFIG_HAS_PASSTHROUGH. Fix the
      problem by using the correct macro.
      
      Fixes: e0d44c1f ("build: convert HAS_PASSTHROUGH use to Kconfig")
      Signed-off-by: Luca Fancellu <luca.fancellu@arm.com>
      Reviewed-by: Jan Beulich <jbeulich@suse.com>
      master commit: 5707470bf3103ebae43697a7ac2faced6cd35f92
      master date: 2022-07-26 08:33:46 +0200
      d77bb6e5
    • x86: also suppress use of MMX insns · 5e3a9b45
      Jan Beulich authored
      
      
      Passing -mno-sse alone is not enough: The compiler may still find
      (questionable) reasons to use MMX insns. In particular with gcc12 use
      of MOVD+PUNPCKLDQ+MOVQ was observed in an apparent attempt to auto-
      vectorize the storing of two adjacent zeroes, 32 bits each.
      
      Reported-by: ChrisD <chris@dalessio.org>
      Signed-off-by: Jan Beulich <jbeulich@suse.com>
      Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
      master commit: 6fe2e39a0243bddba60f83b77b972a5922d25eb8
      master date: 2022-07-20 15:48:49 +0200
      5e3a9b45
    • x86emul: add memory operand low bits checks for ENQCMD{,S} · a5361f91
      Jan Beulich authored
      Already ISE rev 044 added text to this effect; rev 045 further dropped
      leftover earlier text indicating the contrary:
      - ENQCMD requires the low 32 bits of the memory operand to be clear,
      - ENQCMDS requires bits 20...30 of the memory operand to be clear.
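The two checks listed above can be sketched as bit tests on the low 32 bits of the memory operand. This is only an illustration of the stated rules: the function names are hypothetical, the mask is derived from "bits 20...30", and what an emulator does on failure is deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bits 20...30 inclusive: eleven bits starting at bit 20. */
#define ENQCMDS_MBZ_MASK  (0x7ffU << 20)

/* User-mode form: the whole low dword of the operand must be zero. */
static inline bool enqcmd_low_bits_ok(uint32_t low_dword)
{
    return low_dword == 0;
}

/* Supervisor form (the S variant): only bits 20...30 must be zero. */
static inline bool enqcmds_low_bits_ok(uint32_t low_dword)
{
    return (low_dword & ENQCMDS_MBZ_MASK) == 0;
}
```

For example, a low dword of 0x800fffff passes the supervisor-form check (bits 0...19 and bit 31 are outside the mask) but fails the user-mode one.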
      
      Fixes: d2738596 ("x86emul: support ENQCMD insns")
      Signed-off-by: Jan Beulich <jbeulich@suse.com>
      Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
      master commit: d620c66bdbe5510c3bae89be8cc7ca9a2a6cbaba
      master date: 2022-07-20 15:46:48 +0200
      a5361f91
    • x86: deal with gcc12 release build issues · d09c4272
      Jan Beulich authored
      
      
      While a number of issues we previously had with pre-release gcc12 were
      fixed in the final release, we continue to have one issue (with multiple
      instances) when doing release builds (i.e. at higher optimization
      levels): The compiler takes issue with subtracting (always 1 in our
      case) from artificial labels (expressed as arrays) marking the end of
      certain regions. This isn't an unreasonable position to take. Simply
      hide the "array-ness" by casting to an integer type. To keep things
      looking consistent, apply the same cast also to the respective
      expressions dealing with the starting addresses. (Note how
      efi_arch_memory_setup()'s l2_table_offset() invocations avoid a similar
      issue by already having the necessary casts.) In is_xen_fixed_mfn()
      further switch from __pa() to virt_to_maddr() to better match the left
      sides of the <= operators.
      
      Reported-by: Charles Arnold <carnold@suse.com>
      Signed-off-by: Jan Beulich <jbeulich@suse.com>
      Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
      master commit: 9723507daf2120131410c91980d4e4d9b0d0aa90
      master date: 2022-07-19 08:37:29 +0200
      d09c4272
    • x86/spec-ctrl: correct per-guest-type reporting of MD_CLEAR · 9ab8e95d
      Jan Beulich authored
      
      
      There are command line controls for this and the default also isn't "always
      enable when hardware supports it", which logging should take into account.
      
      Signed-off-by: Jan Beulich <jbeulich@suse.com>
      Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
      master commit: fdbf8bdfebc2ed323c521848f642cc4f6b8cb662
      master date: 2022-07-19 08:36:53 +0200
      9ab8e95d
    • xl: move freemem()'s "credit expired" loop exit · bfbcae44
      Jan Beulich authored
      
      
      Move the "credit expired" loop exit to the middle of the loop,
      immediately after "return true". This way having reached the goal on the
      last iteration would be reported as success to the caller, rather than
      as "timed out".
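A minimal sketch of the loop shape this describes, assuming a time-credit scheme with an initial budget of 30s. The `struct attempt` input and `freemem_sketch()` are hypothetical and only simulate what each iteration would observe; they are not xl's code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record of one loop iteration, for simulation only. */
struct attempt {
    unsigned long free_kb;  /* free memory seen at the top of the iteration */
    double elapsed_s;       /* seconds the ballooning attempt then took */
};

static inline bool freemem_sketch(const struct attempt *a, size_t n,
                                  unsigned long need_kb)
{
    double credit = 30.0;   /* assumed initial time budget in seconds */
    size_t i;

    for (i = 0; i < n; i++) {
        if (a[i].free_kb >= need_kb)
            return true;    /* goal reached: success, even with no credit left */

        if (credit <= 0)
            return false;   /* "credit expired" exit, placed after the check */

        credit -= a[i].elapsed_s;
    }

    return false;           /* simulation data ran out */
}
```

With two 20s attempts followed by an iteration that finds enough free memory, the sketch reports success even though the credit is already negative, which is the behaviour the reordering is meant to give.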
      
      Signed-off-by: Jan Beulich <jbeulich@suse.com>
      Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
      master commit: d8f8cb8bdd02fad3b6986ae93511f750fa7f7e6a
      master date: 2022-07-18 17:48:18 +0200
      bfbcae44
    • tools/init-xenstore-domain: fix memory map for PVH stubdom · 6e542a83
      Juergen Gross authored
      
      
      In case of maxmem != memsize the E820 map of the PVH stubdom is wrong,
      as it is missing the RAM above memsize.
      
      Additionally the memory map should only specify the Xen special pages
      as reserved.
      
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
      master commit: 134d53f577076d4f26091e25762f27cc3c73bf58
      master date: 2022-07-12 15:25:20 +0200
      6e542a83
    • xl: relax freemem()'s retry calculation · 6f814c37
      Jan Beulich authored
      
      
      While in principle possible also under other conditions as long as other
      parallel operations potentially consuming memory aren't "locked out", in
      particular with IOMMU large page mappings used in Dom0 (for PV when in
      strict mode; for PVH when not sharing page tables with HAP) ballooning
      out of individual pages can actually lead to less free memory available
      afterwards. This is because to split a large page, one or more page
      table pages are necessary (one per level that is split).
      
      When rebooting a guest I've observed freemem() to fail: A single page
      was required to be ballooned out (presumably because of heap
      fragmentation in the hypervisor). This ballooning out of a single page
      of course went fast, but freemem() then found that it needed to
      balloon out another page. Repeating this just one more time leads to the
      function signalling failure to the caller, without having come anywhere
      near the designated 30s that the whole process is allowed to not make
      any progress at all.
      
      Convert from a simple retry count to actually calculating elapsed time,
      subtracting from an initial credit of 30s. Don't go as far as limiting
      the "wait_secs" value passed to libxl_wait_for_memory_target(), though.
      While this leads to the overall process now possibly taking longer (if
      the previous iteration ended very close to the intended 30s), this
      compensates to some degree for the value passed really meaning "allowed
      to run for this long without making progress".
      
      Signed-off-by: Jan Beulich <jbeulich@suse.com>
      Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
      master commit: e58370df76eacf1f7ca0340e9b96430c77b41a79
      master date: 2022-07-12 15:25:00 +0200
      6f814c37
  5. 26 Jul, 2022 1 commit
    • x86/mm: correct TLB flush condition in _get_page_type() · 221f6a97
      Jan Beulich authored
      
      
      When this logic was moved, it was moved across the point where nx is
      updated to hold the new type for the page. IOW originally it was
      equivalent to using x (and perhaps x would better have been used), but
      now it isn't anymore. Switch to using x, which then brings things in
      line again with the slightly earlier comment there (now) talking about
      transitions _from_ writable.
      
      I have to confess though that I cannot make a direct connection between
      the reported observed behavior of guests leaving several pages around
      with pending general references and the change here. Repeated testing,
      nevertheless, confirms the reported issue is no longer there.
      
      This is CVE-2022-33745 / XSA-408.
      
      Reported-by: Charles Arnold <carnold@suse.com>
      Fixes: 8cc5036bc385 ("x86/pv: Fix ABAC cmpxchg() race in _get_page_type()")
      Signed-off-by: Jan Beulich <jbeulich@suse.com>
      Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
      master commit: a9949efb288fd6e21bbaf9d5826207c7c41cda27
      master date: 2022-07-26 14:54:34 +0200
      221f6a97
  6. 12 Jul, 2022 11 commits
    • x86/spec-ctrl: Mitigate Branch Type Confusion when possible · 0a5387a0
      Andrew Cooper authored
      
      
      Branch Type Confusion affects AMD/Hygon CPUs on Zen2 and earlier.  To
      mitigate, we require SMT safety (STIBP on Zen2, no-SMT on Zen1), and to issue
      an IBPB on each entry to Xen, to flush the BTB.
      
      Due to performance concerns, dom0 (which is trusted in most configurations) is
      excluded from protections by default.
      
      Therefore:
       * Use STIBP by default on Zen2 too, which now means we want it on by default
         on all hardware supporting STIBP.
       * Break the current IBPB logic out into a new function, extending it with
         IBPB-at-entry logic.
       * Change the existing IBPB-at-ctxt-switch boolean to be tristate, and disable
         it by default when IBPB-at-entry is providing sufficient safety.
      
      If all PV guests on the system are trusted, then it is recommended to boot
      with `spec-ctrl=ibpb-entry=no-pv`, as this will provide an additional marginal
      perf improvement.
      
      This is part of XSA-407 / CVE-2022-23825.
      
      Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
      Reviewed-by: Jan Beulich <jbeulich@suse.com>
      (cherry picked from commit d8cb7e0f069e0f106d24941355b59b45a731eabe)
      0a5387a0
    • x86/spec-ctrl: Enable Zen2 chickenbit · 5457a687
      Andrew Cooper authored
      
      
      ... as instructed in the Branch Type Confusion whitepaper.
      
      This is part of XSA-407.
      
      Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
      (cherry picked from commit 9deaf2d932f08c16c6b96a1c426e4b1142c0cdbe)
      5457a687
    • x86/cpuid: Enumeration for BTC_NO · 0826c759
      Andrew Cooper authored
      
      
      BTC_NO indicates that hardware is not susceptible to Branch Type Confusion.
      
      Zen3 CPUs don't suffer BTC.
      
      This is part of XSA-407.
      
      Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
      Reviewed-by: Jan Beulich <jbeulich@suse.com>
      (cherry picked from commit 76cb04ad64f3ab9ae785988c40655a71dde9c319)
      0826c759
    • x86/spec-ctrl: Support IBPB-on-entry · 76c5fcee
      Andrew Cooper authored
      
      
      We are going to need this to mitigate Branch Type Confusion on AMD/Hygon CPUs,
      but as we've talked about using it in other cases too, arrange to support it
      generally.  However, this is also very expensive in some cases, so we're going
      to want per-domain controls.
      
      Introduce SCF_ist_ibpb and SCF_entry_ibpb controls, adding them to the IST and
      DOM masks as appropriate.  Also introduce X86_FEATURE_IBPB_ENTRY_{PV,HVM} to
      patch the code blocks.
      
      For SVM, the STGI is serialising enough to protect against Spectre-v1 attacks,
      so no "else lfence" is necessary.  VT-x will use the MSR host load list,
      so doesn't need any code in the VMExit path.
      
      For the IST path, we can't safely check CPL==0 to skip a flush, as we might
      have hit an entry path before its IBPB.  As IST hitting Xen is rare, flush
      irrespective of CPL.  A later path, SCF_ist_sc_msr, provides Spectre-v1
      safety.
      
      For the PV paths, we know we're interrupting CPL>0, while for the INTR paths,
      we can safely check CPL==0.  Only flush when interrupting guest context.
      
      An "else lfence" is needed for safety, but we want to be able to skip it on
      unaffected CPUs, so the block wants to be an alternative, which means the
      lfence has to be inline rather than UNLIKELY() (the replacement block doesn't
      have displacements fixed up for anything other than the first instruction).
      
      As with SPEC_CTRL_ENTRY_FROM_INTR_IST, %rdx is 0 on entry so rely on this to
      shrink the logic marginally.  Update the comments to specify this new
      dependency.
      
      This is part of XSA-407.
      
      Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
      Reviewed-by: Jan Beulich <jbeulich@suse.com>
      (cherry picked from commit 53a570b285694947776d5190f591a0d5b9b18de7)
      76c5fcee
    • x86/spec-ctrl: Rework SPEC_CTRL_ENTRY_FROM_INTR_IST · 2a9e690a
      Andrew Cooper authored
      
      
      We are shortly going to add a conditional IBPB in this path.
      
      Therefore, we cannot hold spec_ctrl_flags in %eax, and rely on only clobbering
      it after we're done with its contents.  %rbx is available for use, and the
      more normal register to hold preserved information in.
      
      With %rax freed up, use it instead of %rdx for the RSB tmp register, and for
      the adjustment to spec_ctrl_flags.
      
      This leaves no use of %rdx, except as 0 for the upper half of WRMSR.  In
      practice, %rdx is 0 from SAVE_ALL on all paths and isn't likely to change in
      the foreseeable future, so update the macro entry requirements to state this
      dependency.  This marginal optimisation can be revisited if circumstances
      change.
      
      No practical change.
      
      This is part of XSA-407.
      
      Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
      Reviewed-by: Jan Beulich <jbeulich@suse.com>
      (cherry picked from commit e9b8d31981f184c6539f91ec54bd9cae29cdae36)
      2a9e690a
    • x86/spec-ctrl: Rename opt_ibpb to opt_ibpb_ctxt_switch · e7671561
      Andrew Cooper authored
      
      
      We are about to introduce the use of IBPB at different points in Xen, making
      opt_ibpb ambiguous.  Rename it to opt_ibpb_ctxt_switch.
      
      No functional change.
      
      Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
      Reviewed-by: Jan Beulich <jbeulich@suse.com>
      (cherry picked from commit a8e5ef079d6f5c88c472e3e620db5a8d1402a50d)
      e7671561
    • x86/spec-ctrl: Rename SCF_ist_wrmsr to SCF_ist_sc_msr · 31aa2a20
      Andrew Cooper authored
      
      
      We are about to introduce SCF_ist_ibpb, at which point SCF_ist_wrmsr becomes
      ambiguous.
      
      No functional change.
      
      This is part of XSA-407.
      
      Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
      Reviewed-by: Jan Beulich <jbeulich@suse.com>
      (cherry picked from commit 76d6a36f645dfdbad8830559d4d52caf36efc75e)
      31aa2a20
    • x86/spec-ctrl: Rework spec_ctrl_flags context switching · 3a280cba
      Andrew Cooper authored
      
      
      We are shortly going to need to context switch new bits in both the vcpu and
      S3 paths.  Introduce SCF_IST_MASK and SCF_DOM_MASK, and rework d->arch.verw
      into d->arch.spec_ctrl_flags to accommodate.
      
      No functional change.
      
      This is part of XSA-407.
      
      Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
      Reviewed-by: Jan Beulich <jbeulich@suse.com>
      (cherry picked from commit 5796912f7279d9348a3166655588d30eae9f72cc)
      3a280cba
    • libxl: check return value of libxl__xs_directory in name2bdf · 744accad
      Anthony PERARD authored
      libxl__xs_directory() can potentially return NULL without setting `n`.
      As `n` isn't initialised, we need to check libxl__xs_directory()
      return value before checking `n`. Otherwise, `n` might be non-zero
      with `bdfs` NULL which would lead to a segv.
      
      Fixes: 57bff091 ("libxl: add 'name' field to 'libxl_device_pci' in the IDL...")
      Reported-by: "G.R." <firemeteor@users.sourceforge.net>
      Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
      Reviewed-by: Juergen Gross <jgross@suse.com>
      Tested-by: "G.R." <firemeteor@users.sourceforge.net>
      master commit: d778089ac70e5b8e3bdea0c85fc8c0b9ed0eaf2f
      master date: 2022-07-12 08:38:51 +0200
      744accad
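The hazard described in the commit above (an out-parameter left untouched on failure) can be illustrated with a small sketch. The functions `list_entries` and `count_entries` below are hypothetical stand-ins written for this example; they are not libxl code and do not reproduce the signature of libxl__xs_directory().

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for a directory lookup: on failure it returns NULL
 * and leaves *n untouched, which is the behaviour the commit describes.
 */
const char **list_entries(int fail, unsigned int *n)
{
    static const char *entries[] = { "entry-a", "entry-b" };

    assert(n != NULL);
    if (fail)
        return NULL;        /* *n is deliberately not written here */

    *n = 2;
    return entries;
}

/* Returns the number of entries, or -1 if the lookup failed. */
int count_entries(int fail)
{
    unsigned int n;         /* uninitialised, as in the scenario described */
    const char **entries = list_entries(fail, &n);

    /* Check the returned pointer first: n is indeterminate on failure. */
    if (entries == NULL)
        return -1;

    return (int)n;
}
```

Testing `n` before the pointer would read an indeterminate value on the failure path, so the order of the two checks is the whole point of the fix.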
    • tools/helpers: fix build of xen-init-dom0 with -Werror · 14fd97e3
      Anthony PERARD authored
      
      
      Missing prototype of asprintf() without _GNU_SOURCE.
      
      Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
      Reviewed-by: Henry Wang <Henry.Wang@arm.com>
      master commit: d693b22733044d68e9974766b5c9e6259c9b1708
      master date: 2022-07-12 08:38:35 +0200
      14fd97e3
    • x86/spec-ctrl: Add fine-grained cmdline suboptions for primitives · f066c8bb
      Andrew Cooper authored
      
      
      Support controlling the PV/HVM suboption of msr-sc/rsb/md-clear, which
      previously wasn't possible.
      
      Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
      Reviewed-by: Jan Beulich <jbeulich@suse.com>
      master commit: 27357c394ba6e1571a89105b840ce1c6f026485c
      master date: 2022-07-11 15:21:35 +0100
      f066c8bb