1. 08 Jan, 2015 3 commits
  2. 15 Dec, 2014 1 commit
    • use assert(0) instead of exit(1) · e7131cfd
      Loïc Dachary authored
      
      
      When a fatal error (unaligned memory etc.) is detected, gf-complete should
      call assert(3) instead of exit(3), giving the calling program a chance to
      intercept the failure (a failed assert calls abort(3), which raises
      SIGABRT) and display a stack trace. Although gdb can display the stack
      trace and break on exit, libraries are not usually expected to terminate
      the calling program this way.
      
      Signed-off-by: Loic Dachary <loic@dachary.org>
      (cherry picked from commit 29427efa)
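A minimal sketch of what this change makes possible, assuming a POSIX system: a failed assert() ends in abort(), which raises SIGABRT, so a calling program can install a handler and regain control, something exit() never allows. `check_aligned` is a hypothetical stand-in for a gf-complete routine (not its real API), and the 16-byte alignment rule is likewise an assumption. A real handler would more likely print a backtrace (for example with glibc's backtrace()) and then terminate.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>

static sigjmp_buf recover_point;

/* SIGABRT handler: jump back instead of returning, so abort() never
   completes and the process keeps running. */
static void on_abort(int sig) {
  (void)sig;
  siglongjmp(recover_point, 1);
}

/* Hypothetical library check: a misaligned pointer is a fatal error,
   reported with assert(0) rather than exit(1). */
static void check_aligned(const void *p) {
  if ((uintptr_t)p % 16 != 0)
    assert(0 && "misaligned region");
}

/* Calls check_aligned(p) with a temporary SIGABRT handler installed.
   Returns 1 if the assertion fired and was intercepted, 0 otherwise. */
static int call_and_catch_abort(const void *p) {
  struct sigaction sa, old;
  memset(&sa, 0, sizeof sa);
  sa.sa_handler = on_abort;
  sigemptyset(&sa.sa_mask);
  sigaction(SIGABRT, &sa, &old);

  int caught = 0;
  /* savemask=1 so the jump also restores the signal mask, unblocking
     SIGABRT again after it was blocked during handler execution. */
  if (sigsetjmp(recover_point, 1) == 0)
    check_aligned(p);
  else
    caught = 1;

  sigaction(SIGABRT, &old, NULL);  /* restore the caller's disposition */
  return caught;
}
```

Note that assert(0) is compiled out when NDEBUG is defined, so a library built that way would simply continue past the check.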
  3. 24 Oct, 2014 6 commits
    • Merged in jannau/gf-complete/neon (pull request #25) · 70dd94ae
      Kevin Greenan authored
      ARM NEON optimisations
    • arm: NEON optimisations for gf_w64 · 6fdd8bc3
      Janne Grunau authored
      Optimisations for the 4,64 split table region multiplications. They are
      only used on ARMv8-A, since they are not faster on ARMv7-A.
    • arm: NEON optimisations for gf_w32 · 370c88b9
      Janne Grunau authored
      Optimisations for 4,32 split table multiplications.
      
      Selected time_tool.sh results on a 1.7 GHz Cortex-A9:
      Region Best (MB/s):   346.67   W-Method: 32 -m SPLIT 32 4 -r SIMD -
      Region Best (MB/s):    92.89   W-Method: 32 -m SPLIT 32 4 -r NOSIMD -
      Region Best (MB/s):   258.17   W-Method: 32 -m SPLIT 32 4 -r SIMD -r ALTMAP -
      Region Best (MB/s):   162.00   W-Method: 32 -m SPLIT 32 8 -
      Region Best (MB/s):   160.53   W-Method: 32 -m SPLIT 8 8 -
      Region Best (MB/s):    32.74   W-Method: 32 -m COMPOSITE 2 - -
      Region Best (MB/s):   199.79   W-Method: 32 -m COMPOSITE 2 - -r ALTMAP -
    • arm: NEON optimisations for gf_w16 · 474010a9
      Janne Grunau authored
      Optimisations for the 4,16 split table region multiplications.
      
      Selected time_tool.sh 16 -A -B results for a 1.7 GHz Cortex-A9:
      Region Best (MB/s):   532.14   W-Method: 16 -m SPLIT 16 4 -r SIMD -
      Region Best (MB/s):   212.34   W-Method: 16 -m SPLIT 16 4 -r NOSIMD -
      Region Best (MB/s):   801.36   W-Method: 16 -m SPLIT 16 4 -r SIMD -r ALTMAP -
      Region Best (MB/s):    93.20   W-Method: 16 -m SPLIT 16 4 -r NOSIMD -r ALTMAP -
      Region Best (MB/s):   273.99   W-Method: 16 -m SPLIT 16 8 -
      Region Best (MB/s):   270.81   W-Method: 16 -m SPLIT 8 8 -
      Region Best (MB/s):    70.42   W-Method: 16 -m COMPOSITE 2 - -
      Region Best (MB/s):   393.54   W-Method: 16 -m COMPOSITE 2 - -r ALTMAP -
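The 4,16 split table idea can be sketched in scalar C. Multiplication by a constant c distributes over XOR, so c·a equals the XOR of c times each 4-bit nibble of a (kept in its bit position), and four 16-entry tables replace a full field multiply. This sketch shows only the table arithmetic, not the NEON code or the ALTMAP layout; the polynomial 0x1100B and all names here are assumptions for illustration, not gf-complete's internals.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GF16_POLY 0x1100bu /* assumed primitive polynomial for w=16 */

/* Reference shift-and-add multiply in GF(2^16). */
static uint16_t gf16_mult(uint16_t a, uint16_t b) {
  uint32_t x = a, p = 0;
  while (b) {
    if (b & 1) p ^= x;
    b >>= 1;
    x <<= 1;                           /* multiply x by the generator */
    if (x & 0x10000u) x ^= GF16_POLY;  /* reduce when degree reaches 16 */
  }
  return (uint16_t)p;
}

/* Hypothetical split-table context: tbl[j][v] = c * (v << 4j),
   one 16-entry table per nibble position. */
typedef struct { uint16_t tbl[4][16]; } split16_4;

static void split16_4_init(split16_4 *s, uint16_t c) {
  for (int j = 0; j < 4; j++)
    for (int v = 0; v < 16; v++)
      s->tbl[j][v] = gf16_mult(c, (uint16_t)(v << (4 * j)));
}

/* dst[i] = c * src[i]: four lookups and three XORs per 16-bit word. */
static void split16_4_region(const split16_4 *s, const uint16_t *src,
                             uint16_t *dst, size_t n) {
  assert(n == 0 || (src != NULL && dst != NULL));
  for (size_t i = 0; i < n; i++) {
    uint16_t a = src[i], r = 0;
    for (int j = 0; j < 4; j++)
      r ^= s->tbl[j][(a >> (4 * j)) & 0xf];
    dst[i] = r;
  }
}
```

The tables are built once per constant, so their cost is amortised over the whole region.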
    • arm: NEON optimisations for gf_w8 · bec15359
      Janne Grunau authored
      Optimisations for the 4,4 split table region multiplication and carry-less
      multiplication using NEON's polynomial long multiplication.
      arm: w8: NEON carry-less multiplication
      
      Selected time_tool.sh results for a 1.7 GHz Cortex-A9:
      Region Best (MB/s):   375.86   W-Method: 8 -m CARRY_FREE -
      Region Best (MB/s):   142.94   W-Method: 8 -m TABLE -
      Region Best (MB/s):   225.01   W-Method: 8 -m TABLE -r DOUBLE -
      Region Best (MB/s):   211.23   W-Method: 8 -m TABLE -r DOUBLE -r LAZY -
      Region Best (MB/s):   160.09   W-Method: 8 -m LOG -
      Region Best (MB/s):   123.61   W-Method: 8 -m LOG_ZERO -
      Region Best (MB/s):   123.85   W-Method: 8 -m LOG_ZERO_EXT -
      Region Best (MB/s):  1183.79   W-Method: 8 -m SPLIT 8 4 -r SIMD -
      Region Best (MB/s):   177.68   W-Method: 8 -m SPLIT 8 4 -r NOSIMD -
      Region Best (MB/s):    87.85   W-Method: 8 -m COMPOSITE 2 - -
      Region Best (MB/s):   428.59   W-Method: 8 -m COMPOSITE 2 - -r ALTMAP -
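Carry-less multiplication treats a and b as polynomials over GF(2), using XOR in place of addition, and then reduces the up-to-15-bit product modulo the field polynomial. NEON's VMULL.P8 (the vmull_p8 intrinsic) computes eight such 16-bit products at once; the scalar sketch below shows only the arithmetic for a single pair and assumes the w=8 polynomial is 0x11D (x^8 + x^4 + x^3 + x^2 + 1). The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Carry-less product of two 8-bit values: shifted copies of a are
   combined with XOR, so no carries propagate. Result fits in 15 bits. */
static uint16_t clmul8(uint8_t a, uint8_t b) {
  uint16_t p = 0;
  for (int i = 0; i < 8; i++)
    if (b & (1u << i))
      p ^= (uint16_t)(a << i);
  return p;
}

/* Reduce modulo the assumed polynomial 0x11d, clearing bits 14..8
   from the top down. */
static uint8_t gf8_reduce(uint16_t p) {
  assert(p < 0x8000); /* degree <= 14 for a product of two bytes */
  for (int bit = 14; bit >= 8; bit--)
    if (p & (1u << bit))
      p ^= (uint16_t)(0x11d << (bit - 8));
  return (uint8_t)p;
}

/* GF(2^8) multiply = carry-less multiply followed by reduction. */
static uint8_t gf8_mult_clmul(uint8_t a, uint8_t b) {
  return gf8_reduce(clmul8(a, b));
}
```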
    • arm: NEON optimisations for gf_w4 · 1311a44f
      Janne Grunau authored
      Optimisations for the single table region multiplication and carry-less
      multiplication using NEON's polynomial multiplication of 8-bit values.
      
      The polynomial multiplication of a single pair of values is not that
      useful, but the vector version is useful for region multiplication.
      
      Selected time_tool.sh results for a 1.7 GHz Cortex-A9:
      Region Best (MB/s):   672.72   W-Method: 4 -m CARRY_FREE -
      Region Best (MB/s):   265.84   W-Method: 4 -m BYTWO_p -
      Region Best (MB/s):   329.41   W-Method: 4 -m TABLE -r DOUBLE -
      Region Best (MB/s):   278.63   W-Method: 4 -m TABLE -r QUAD -
      Region Best (MB/s):   329.81   W-Method: 4 -m TABLE -r QUAD -r LAZY -
      Region Best (MB/s):  1318.03   W-Method: 4 -m TABLE -r SIMD -
      Region Best (MB/s):   165.15   W-Method: 4 -m TABLE -r NOSIMD -
      Region Best (MB/s):    99.73   W-Method: 4 -m LOG -
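For w=4 each byte can hold two 4-bit symbols, so a single 16-entry table of c·v serves both nibbles, and a vector table-lookup instruction can then process a whole register of bytes per step. The scalar sketch below assumes two symbols packed per byte and the polynomial 0x13 (x^4 + x + 1); the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Shift-and-add multiply in GF(2^4) with the assumed polynomial 0x13. */
static uint8_t gf4_mult(uint8_t a, uint8_t b) {
  uint8_t p = 0;
  a &= 0xf;
  b &= 0xf;
  while (b) {
    if (b & 1) p ^= a;
    b >>= 1;
    a <<= 1;
    if (a & 0x10) a ^= 0x13; /* reduce when degree reaches 4 */
  }
  return p;
}

/* Multiply a region of packed 4-bit symbols by c using one 16-entry
   table, looked up once for the low nibble and once for the high. */
static void gf4_region_mult(uint8_t c, const uint8_t *src, uint8_t *dst,
                            size_t n) {
  assert((c & 0xf0) == 0); /* c must itself be a 4-bit symbol */
  uint8_t t[16];
  for (int v = 0; v < 16; v++)
    t[v] = gf4_mult(c, (uint8_t)v);
  for (size_t i = 0; i < n; i++)
    dst[i] = (uint8_t)(t[src[i] & 0xf] | (t[src[i] >> 4] << 4));
}
```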
  4. 09 Oct, 2014 7 commits
  5. 03 Oct, 2014 1 commit
  6. 17 Sep, 2014 2 commits
  7. 23 Aug, 2014 2 commits
  8. 16 Jun, 2014 4 commits
  9. 09 Jun, 2014 3 commits
  10. 06 Jun, 2014 1 commit
  11. 14 May, 2014 9 commits
  12. 13 May, 2014 1 commit