1. 29 Jul, 2016 4 commits
  2. 05 Jun, 2016 3 commits
  3. 04 Jun, 2016 1 commit
      Cast Police · 29cee6cc
      Matteo Frigo authored
      Eliminate some useless (but harmless) int<->size_t conversions.
  4. 13 Mar, 2016 1 commit
  5. 20 Jan, 2016 3 commits
  6. 30 Sep, 2015 2 commits
  7. 08 Sep, 2015 2 commits
  8. 05 Aug, 2015 1 commit
  9. 26 May, 2015 3 commits
      Update VSX SIMD to avoid inline assembly · 8cd9bfa3
      Erik Lindahl authored
      Thanks to some help from Michael Gschwind of
      IBM, this removes the remaining inline assembly
      calls and replaces them with vector functions. This
      avoids interfering with the optimizer on both GCC
      and XLC, and gets us another 3-10% of performance
      when using VSX SIMD. Tested with GCC-4.9 and XLC-13.1
      in single and double precision on little-endian POWER8.
      Enable SSE2 automatically with AVX, AVX2, or AVX512. · 579cec9a
      Erik Lindahl authored
      256-bit AVX can be significantly slower than
      128-bit SIMD. Despite recommendations, many
      distributions appear to enable only AVX, but not
      SSE2. This fixes the problem by also enabling
      SSE2 when we use the wider SIMD instructions.
      Turn AVX-128 into AMD-specific AVX-128-FMA · dd80210e
      Erik Lindahl authored
      The only platform where AVX-128 really matters
      is AMD (since the compute units can execute either a
      single 256-bit or two 128-bit SIMD instructions),
      so we now use it only there, which means we can
      also enable FMA instructions.
  10. 25 May, 2015 7 commits
  11. 13 May, 2015 1 commit
  12. 07 May, 2015 1 commit
      Separate routines to query 128-bit AVX support · cd2b27d1
      Erik Lindahl authored
      This also disables 256-bit AVX for current AMD processors
      that work better with 128-bit AVX. Note that this is not
      detected by the timing routines since the effect is only
      apparent when using multiple cores.
  13. 20 Apr, 2015 2 commits
  14. 16 Apr, 2015 1 commit
  15. 14 Apr, 2015 1 commit
  16. 13 Apr, 2015 1 commit
  17. 12 Apr, 2015 4 commits
  18. 08 Apr, 2015 2 commits
      Improved compiler flags for OS X · 7960d08a
      Erik Lindahl authored
      Separate detection for AVX/AVX2 on gcc and clang.
      Clang works for AVX, but AVX2 leads to a compiler
      crash. Issue 20471870 has been filed with Apple.
      When using gcc, we now request the external
      system assembler; otherwise the AVX/AVX2 instructions
      cause errors.
      Fix alignments for generic SIMD. · 91928338
      Erik Lindahl authored
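
The SIMD selection policy described in commits 579cec9a, dd80210e, and cd2b27d1 can be sketched as a small C function. This is an illustration only, not code from the repository: the `cpu_features` and `simd_choice` structs and the `choose_simd` function are hypothetical names, and the fallback to SSE2 alone for an AMD processor that lacks FMA is an assumption the commit messages do not spell out.

```c
#include <stdbool.h>

/* Hypothetical summary of what the CPU reports (not a type from the repository). */
struct cpu_features {
    bool is_amd;    /* vendor identified as AMD */
    bool has_sse2;
    bool has_avx;
    bool has_fma;
};

/* Hypothetical result: which SIMD code paths to enable. */
struct simd_choice {
    bool use_sse2;
    bool use_avx256;
    bool use_avx128_fma;
};

struct simd_choice choose_simd(struct cpu_features f)
{
    struct simd_choice c = { false, false, false };

    /* 579cec9a: keep SSE2 available alongside the wider instruction sets,
     * because 256-bit AVX can be slower than 128-bit SIMD. */
    c.use_sse2 = f.has_sse2;

    if (!f.has_avx)
        return c;

    if (f.is_amd) {
        /* cd2b27d1: 256-bit AVX is disabled on the AMD processors in question.
         * dd80210e: the 128-bit AVX path is AMD-specific and uses FMA.
         * Assumption: without FMA we fall back to SSE2 only. */
        c.use_avx128_fma = f.has_fma;
    } else {
        /* Other vendors keep the 256-bit AVX path. */
        c.use_avx256 = true;
    }
    return c;
}
```

Under these assumptions, an AMD processor with AVX and FMA gets SSE2 plus the 128-bit FMA path, while a non-AMD processor with AVX gets SSE2 plus 256-bit AVX. Keeping the decision in a pure function separates it from the actual feature query, which the commit messages describe only at a high level.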