From 9f9db3680d3f38e05f8f2459ee34dcb8d94460f3 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ciro=20Santilli=20=E5=85=AD=E5=9B=9B=E4=BA=8B=E4=BB=B6=20?=
 =?UTF-8?q?=E6=B3=95=E8=BD=AE=E5=8A=9F?=
Date: Sun, 12 May 2019 00:00:10 +0000
Subject: [PATCH] ld2 move in

---
 README.adoc                                   | 46 +++++++++++--------
 .../arch/aarch64/{simd_interleave.S => ld2.S} |  2 +-
 2 files changed, 28 insertions(+), 20 deletions(-)
 rename userland/arch/aarch64/{simd_interleave.S => ld2.S} (96%)

diff --git a/README.adoc b/README.adoc
index 09893d8..9ade68c 100644
--- a/README.adoc
+++ b/README.adoc
@@ -12606,7 +12606,25 @@ Bibliography: https://stackoverflow.com/questions/1875491/nop-for-iphone-binarie
 
 === ARM SIMD
 
-==== ARM fadd vs vadd
+==== ARM vadd instruction
+
+link:userland/arch/arm/vadd.S[]
+
+Good first instruction to learn SIMD: <>
+
+==== ARMv8 aarch64 add vector instruction
+
+link:userland/arch/aarch64/add_vector.S[]
+
+Good first instruction to learn SIMD: <>
+
+==== ARMv8 aarch64 fadd vector instruction
+
+link:userland/arch/aarch64/fadd_vector.S[]
+
+Good first instruction to learn SIMD: <>
+
+===== ARM fadd vs vadd
 
 It is very confusing, but `fadds` and `faddd` in Aarch32 are <> for `vadd.f32` and `vadd.f64` which we use in this tutorial: <>
 
@@ -12618,27 +12636,17 @@ Also keep in mind that fused multiply add is `fmadd`.
 
 Examples at: <>
 
-==== ARM SIMD instructions
+==== arm ld2 instruction
 
-===== ARM vadd instruction
+Example: link:userland/arch/aarch64/ld2.S[]
 
-link:userland/arch/arm/vadd.S[]
+We can load multiple vectors interleaved from memory in one single instruction!
 
-Good first instruction to learn SIMD: <>
+This is why the `ldN` instructions take an argument list denoted by `{}` for the registers, much like armv7 <>.
 
-===== ARMv8 aarch64 add vector instruction
+There are analogous `ld3` and `ld4` instruction.
 
-link:userland/arch/aarch64/add_vector.S[]
-
-Good first instruction to learn SIMD: <>
-
-===== ARMv8 aarch64 fadd vector instruction
-
-link:userland/arch/aarch64/fadd_vector.S[]
-
-Good first instruction to learn SIMD: <>
-
-===== ARM vcvt instruction
+==== ARM vcvt instruction
 
 Example: link:userland/arch/arm/vcvt.S[]
 
@@ -12658,7 +12666,7 @@ E.g., in our 32-bit float to 32-bit unsigned example we use:
 vld1.32.f32
 ....
 
-====== ARM vcvtr instruction
+===== ARM vcvtr instruction
 
 Example: link:userland/arch/arm/vcvtr.S[]
 
@@ -12670,7 +12678,7 @@ Rounding mode selection is exposed in the ANSI C standard through link:https://e
 
 TODO: is the initial rounding mode specified by the ELF standard? Could not find a reference.
 
-====== ARM vcvta instruction
+===== ARM vcvta instruction
 
 Example: link:userland/arch/arm/vcvt.S[]
 
diff --git a/userland/arch/aarch64/simd_interleave.S b/userland/arch/aarch64/ld2.S
similarity index 96%
rename from userland/arch/aarch64/simd_interleave.S
rename to userland/arch/aarch64/ld2.S
index 38bcc8b..22d4304 100644
--- a/userland/arch/aarch64/simd_interleave.S
+++ b/userland/arch/aarch64/ld2.S
@@ -1,4 +1,4 @@
-/* https://github.com/cirosantilli/linux-kernel-module-cheat#arm-simd-interleaving */
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#arm-ld2-instruction */
 
 #include "common.h"