assembly: improve organization of simd add

2026-01-25 19:21:35 +01:00 · 2019-05-12 00:00:09 +00:00
parent b5f5f6a5bc
commit 11359838a1
6 changed files with 44 additions and 12 deletions
--- a/README.adoc
+++ b/README.adoc
@@ -11690,13 +11690,13 @@ ____
 Much like ADD for non-SIMD, start learning SIMD instructions by looking at the integer and floating point SIMD ADD instructions of each ISA:

 * x86
-** link:userland/arch/x86_64/addpd.S[]: `ADDPS`, `ADDPD`
-** link:userland/arch/x86_64/paddq.S[]: `PADDQ`, `PADDL`, `PADDW`, `PADDB`
+** <<x86-addpd-instruction>>
+** <<x86-paddq-instruction>>
 * arm
-** link:userland/arch/arm/vadd.S[]
+** <<arm-vadd-instruction>>
 * aarch64
-** link:userland/arch/aarch64/add_vector.S[]
-** link:userland/arch/aarch64/fadd_vector.S[]
+** <<armv8-aarch64-add-vector-instruction>>
+** <<armv8-aarch64-fadd-vector-instruction>>

 Then it is just a huge copy paste of infinite boring details:

@@ -12009,6 +12009,20 @@ History:
 * AVX2:2013
 * AVX-512: 2016. 512-bit ZMM registers. Extension of YMM.

+===== x86 SSE2
+
+====== x86 addpd instruction
+
+link:userland/arch/x86_64/addpd.S[]: `addps`, `addpd`
+
+Good first instruction to learn SIMD: <<simd-assembly>>
+
+====== x86 paddq instruction
+
+link:userland/arch/x86_64/paddq.S[]: `paddq`, `paddd`, `paddw`, `paddb`
+
+Good first instruction to learn SIMD: <<simd-assembly>>
+
 === rdtsc

 TODO: review this section, make a more controlled userland experiment with <<m5ops>> instrumentation.
@@ -12594,11 +12608,11 @@ Bibliography: https://stackoverflow.com/questions/1875491/nop-for-iphone-binarie

 ==== ARM fadd vs vadd

-It is very confusing, but `fadds` and `faddd` in Aarch32 are <<gnu-gas-assembler-arm-unified-syntax,pre-UAL>> for `vadd.f32` and `vadd.f64`.
+It is very confusing, but `fadds` and `faddd` in AArch32 are <<gnu-gas-assembler-arm-unified-syntax,pre-UAL>> for `vadd.f32` and `vadd.f64`, which we use in this tutorial: <<arm-vadd-instruction>>

 The same goes for most ARMv7 mnemonics: `f*` is old, and `v*` is the newer better syntax.

-But then, in ARMv8, they decided to use `fadd` as the main floating point add name, and get rid of `vadd`!
+But then, in ARMv8, they decided to use `fadd` as the main floating point add name (see <<armv8-aarch64-fadd-vector-instruction>>), and get rid of `vadd`!

 Also keep in mind that fused multiply add is `fmadd`.

@@ -12606,6 +12620,24 @@ Examples at: <<simd-assembly>>

 ==== ARM SIMD instructions

+===== ARM vadd instruction
+
+link:userland/arch/arm/vadd.S[]
+
+Good first instruction to learn SIMD: <<simd-assembly>>
+
+===== ARMv8 aarch64 add vector instruction
+
+link:userland/arch/aarch64/add_vector.S[]
+
+Good first instruction to learn SIMD: <<simd-assembly>>
+
+===== ARMv8 aarch64 fadd vector instruction
+
+link:userland/arch/aarch64/fadd_vector.S[]
+
+Good first instruction to learn SIMD: <<simd-assembly>>
+
 ===== ARM vcvt instruction

 Example: link:userland/arch/arm/vcvt.S[]
--- a/userland/arch/aarch64/add_vector.S
+++ b/userland/arch/aarch64/add_vector.S
@@ -1,4 +1,4 @@
-/* https://github.com/cirosantilli/linux-kernel-module-cheat#simd-assembly
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#armv8-aarch64-add-vector-instruction
 *
 * Add a bunch of integers in one go.
 */
--- a/userland/arch/aarch64/fadd_vector.S
+++ b/userland/arch/aarch64/fadd_vector.S
@@ -1,4 +1,4 @@
-/* https://github.com/cirosantilli/linux-kernel-module-cheat#simd-assembly
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#armv8-aarch64-fadd-vector-instruction
 *
 * Add a bunch of floating point numbers in one go.
 */
--- a/userland/arch/arm/vadd.S
+++ b/userland/arch/arm/vadd.S
@@ -1,4 +1,4 @@
-/* https://github.com/cirosantilli/linux-kernel-module-cheat#arm-simd-instruction-assembly */
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#arm-vadd-instruction */

 #include "common.h"

--- a/userland/arch/x86_64/addpd.S
+++ b/userland/arch/x86_64/addpd.S
@@ -1,4 +1,4 @@
-/* https://github.com/cirosantilli/linux-kernel-module-cheat#x86-simd
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#x86-addpd-instruction
 *
 * Add a bunch of floating point numbers in one go.
 */
--- a/userland/arch/x86_64/paddq.S
+++ b/userland/arch/x86_64/paddq.S
@@ -1,4 +1,4 @@
-/* https://github.com/cirosantilli/linux-kernel-module-cheat#simd-assembly
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#x86-paddq-instruction
 *
 * Add a bunch of integers in one go.
 *
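
The integer examples reorganized by this commit (`paddb`, `paddw`, `paddq` and the AArch64 vector `add`) all share one idea: the same 128 bits are added as independent lanes, and a carry never crosses a lane boundary. The following is a minimal portable C sketch of that idea, not code from the repository. The function name `packed_add` and its `lane_bytes` parameter are hypothetical, chosen only for illustration, and lanes are assumed to be stored little-endian as on x86.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of the repository): add two 16-byte
 * vectors lane by lane, the way a packed integer SIMD add does.
 * lane_bytes selects the lane width: 1, 2, 4 or 8 bytes.
 * Bytes within a lane are treated as little-endian. */
static void packed_add(const uint8_t a[16], const uint8_t b[16],
                       uint8_t out[16], size_t lane_bytes)
{
    assert(lane_bytes == 1 || lane_bytes == 2 ||
           lane_bytes == 4 || lane_bytes == 8);
    for (size_t lane = 0; lane < 16; lane += lane_bytes) {
        unsigned carry = 0; /* reset per lane: carries never cross lanes */
        for (size_t i = 0; i < lane_bytes; i++) {
            unsigned sum = (unsigned)a[lane + i] + b[lane + i] + carry;
            out[lane + i] = (uint8_t)sum; /* keep the low 8 bits */
            carry = sum >> 8;             /* propagate within the lane */
        }
        /* A carry out of the lane's top byte is discarded (wraparound). */
    }
}
```

With `a` starting with byte `0xFF` and `b` starting with byte `0x01` (all other bytes zero), a lane width of 1 leaves the second output byte at `0x00` because the carry is dropped, while a lane width of 8 sets it to `0x01` because the carry stays inside the 64-bit lane.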