diff --git a/README.adoc b/README.adoc
index 2e537cf..09893d8 100644
--- a/README.adoc
+++ b/README.adoc
@@ -11690,13 +11690,13 @@ ____
 Much like ADD for non-SIMD, start learning SIMD instructions by looking at the integer and floating point SIMD ADD instructions of each ISA:
 
 * x86
-** link:userland/arch/x86_64/addpd.S[]: `ADDPS`, `ADDPD`
-** link:userland/arch/x86_64/paddq.S[]: `PADDQ`, `PADDL`, `PADDW`, `PADDB`
+** <>
+** <>
 * arm
-** link:userland/arch/arm/vadd.S[]
+** <>
 * aarch64
-** link:userland/arch/aarch64/add_vector.S[]
-** link:userland/arch/aarch64/fadd_vector.S[]
+** <>
+** <>
 
 Then it is just a huge copy paste of infinite boring details:
 
@@ -12009,6 +12009,20 @@ History:
 * AVX2:2013
 * AVX-512: 2016. 512-bit ZMM registers. Extension of YMM.
 
+===== x86 SSE2
+
+====== x86 addpd instruction
+
+link:userland/arch/x86_64/addpd.S[]: `addps`, `addpd`
+
+Good first instruction to learn SIMD: <>
+
+====== x86 paddq instruction
+
+link:userland/arch/x86_64/paddq.S[]: `paddq`, `paddl`, `paddw`, `paddb`
+
+Good first instruction to learn SIMD: <>
+
 === rdtsc
 
 TODO: review this section, make a more controlled userland experiment with <> instrumentation.
@@ -12594,11 +12608,11 @@ Bibliography: https://stackoverflow.com/questions/1875491/nop-for-iphone-binarie
 
 ==== ARM fadd vs vadd
 
-It is very confusing, but `fadds` and `faddd` in Aarch32 are <> for `vadd.f32` and `vadd.f64`.
+It is very confusing, but `fadds` and `faddd` in Aarch32 are <> for `vadd.f32` and `vadd.f64` which we use in this tutorial: <>
 
 The same goes for most ARMv7 mnemonics: `f*` is old, and `v*` is the newer better syntax.
 
-But then, in ARMv8, they decided to use `fadd` as the main floating point add name, and get rid of `vadd`!
+But then, in ARMv8, they decided to use <> as the main floating point add name, and get rid of `vadd`!
 
 Also keep in mind that fused multiply add is `fmadd`.
 
@@ -12606,6 +12620,24 @@ Examples at: <>
 
 ==== ARM SIMD instructions
 
+===== ARM vadd instruction
+
+link:userland/arch/arm/vadd.S[]
+
+Good first instruction to learn SIMD: <>
+
+===== ARMv8 aarch64 add vector instruction
+
+link:userland/arch/aarch64/add_vector.S[]
+
+Good first instruction to learn SIMD: <>
+
+===== ARMv8 aarch64 fadd vector instruction
+
+link:userland/arch/aarch64/fadd_vector.S[]
+
+Good first instruction to learn SIMD: <>
+
 ===== ARM vcvt instruction
 
 Example: link:userland/arch/arm/vcvt.S[]
diff --git a/userland/arch/aarch64/add_vector.S b/userland/arch/aarch64/add_vector.S
index b449b73..c919551 100644
--- a/userland/arch/aarch64/add_vector.S
+++ b/userland/arch/aarch64/add_vector.S
@@ -1,4 +1,4 @@
-/* https://github.com/cirosantilli/linux-kernel-module-cheat#simd-assembly
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#armv8-aarch64-add-vector-instruction
  *
  * Add a bunch of integers in one go.
  */
diff --git a/userland/arch/aarch64/fadd_vector.S b/userland/arch/aarch64/fadd_vector.S
index ebaa595..d94a144 100644
--- a/userland/arch/aarch64/fadd_vector.S
+++ b/userland/arch/aarch64/fadd_vector.S
@@ -1,4 +1,4 @@
-/* https://github.com/cirosantilli/linux-kernel-module-cheat#simd-assembly
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#armv8-aarch64-fadd-vector-instruction
  *
  * Add a bunch of floating point numbers in one go.
  */
diff --git a/userland/arch/arm/vadd.S b/userland/arch/arm/vadd.S
index fb44d18..e688584 100644
--- a/userland/arch/arm/vadd.S
+++ b/userland/arch/arm/vadd.S
@@ -1,4 +1,4 @@
-/* https://github.com/cirosantilli/linux-kernel-module-cheat#arm-simd-instruction-assembly */
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#arm-vadd-instruction */
 
 #include "common.h"
 
diff --git a/userland/arch/x86_64/addpd.S b/userland/arch/x86_64/addpd.S
index eefdfd2..cef8599 100644
--- a/userland/arch/x86_64/addpd.S
+++ b/userland/arch/x86_64/addpd.S
@@ -1,4 +1,4 @@
-/* https://github.com/cirosantilli/linux-kernel-module-cheat#x86-simd
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#x86-addpd-instruction
  *
  * Add a bunch of floating point numbers in one go.
  */
diff --git a/userland/arch/x86_64/paddq.S b/userland/arch/x86_64/paddq.S
index ff6b4a4..fbe23e3 100644
--- a/userland/arch/x86_64/paddq.S
+++ b/userland/arch/x86_64/paddq.S
@@ -1,4 +1,4 @@
-/* https://github.com/cirosantilli/linux-kernel-module-cheat#simd-assembly
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#x86-paddq-instruction
  *
  * Add a bunch of integers in one go.
  *