From 9f9db3680d3f38e05f8f2459ee34dcb8d94460f3 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ciro=20Santilli=20=E5=85=AD=E5=9B=9B=E4=BA=8B=E4=BB=B6=20?=
 =?UTF-8?q?=E6=B3=95=E8=BD=AE=E5=8A=9F?= <ciro.santilli@gmail.com>
Date: Sun, 12 May 2019 00:00:10 +0000
Subject: [PATCH] ld2 move in

---
 README.adoc                                   | 46 +++++++++++--------
 .../arch/aarch64/{simd_interleave.S => ld2.S} |  2 +-
 2 files changed, 28 insertions(+), 20 deletions(-)
 rename userland/arch/aarch64/{simd_interleave.S => ld2.S} (96%)
diff --git a/README.adoc b/README.adoc
index 09893d8..9ade68c 100644
--- a/README.adoc
+++ b/README.adoc
@@ -12606,7 +12606,25 @@ Bibliography: https://stackoverflow.com/questions/1875491/nop-for-iphone-binarie
 
 === ARM SIMD
 
-==== ARM fadd vs vadd
+==== ARM vadd instruction
+
+link:userland/arch/arm/vadd.S[]
+
+Good first instruction to learn SIMD: <<simd-assembly>>
+
+==== ARMv8 aarch64 add vector instruction
+
+link:userland/arch/aarch64/add_vector.S[]
+
+Good first instruction to learn SIMD: <<simd-assembly>>
+
+==== ARMv8 aarch64 fadd vector instruction
+
+link:userland/arch/aarch64/fadd_vector.S[]
+
+Good first instruction to learn SIMD: <<simd-assembly>>
+
+===== ARM fadd vs vadd
 
 It is very confusing, but `fadds` and `faddd` in Aarch32 are <<gnu-gas-assembler-arm-unified-syntax,pre-UAL>> for `vadd.f32` and `vadd.f64` which we use in this tutorial: <<arm-vadd-instruction>>
 
@@ -12618,27 +12636,17 @@ Also keep in mind that fused multiply add is `fmadd`.
 
 Examples at: <<simd-assembly>>
 
-==== ARM SIMD instructions
+==== ARM ld2 instruction
 
-===== ARM vadd instruction
+Example: link:userland/arch/aarch64/ld2.S[]
 
-link:userland/arch/arm/vadd.S[]
+A single `ld2` instruction loads two vectors from memory and de-interleaves them: even-indexed elements go to the first register and odd-indexed elements to the second.
 
-Good first instruction to learn SIMD: <<simd-assembly>>
+This is why the `ldN` instructions take a register list delimited by `{}`, much like ARMv7 <<ldmia>>.
 
-===== ARMv8 aarch64 add vector instruction
+There are analogous `ld3` and `ld4` instructions.
 
-link:userland/arch/aarch64/add_vector.S[]
-
-Good first instruction to learn SIMD: <<simd-assembly>>
-
-===== ARMv8 aarch64 fadd vector instruction
-
-link:userland/arch/aarch64/fadd_vector.S[]
-
-Good first instruction to learn SIMD: <<simd-assembly>>
-
-===== ARM vcvt instruction
+==== ARM vcvt instruction
 
 Example: link:userland/arch/arm/vcvt.S[]
 
@@ -12658,7 +12666,7 @@ E.g., in our 32-bit float to 32-bit unsigned example we use:
 vld1.32.f32
 ....
 
-====== ARM vcvtr instruction
+===== ARM vcvtr instruction
 
 Example: link:userland/arch/arm/vcvtr.S[]
 
@@ -12670,7 +12678,7 @@ Rounding mode selection is exposed in the ANSI C standard through link:https://e
 
 TODO: is the initial rounding mode specified by the ELF standard? Could not find a reference.
 
-====== ARM vcvta instruction
+===== ARM vcvta instruction
 
 Example: link:userland/arch/arm/vcvt.S[]
 
diff --git a/userland/arch/aarch64/simd_interleave.S b/userland/arch/aarch64/ld2.S
similarity index 96%
rename from userland/arch/aarch64/simd_interleave.S
rename to userland/arch/aarch64/ld2.S
index 38bcc8b..22d4304 100644
--- a/userland/arch/aarch64/simd_interleave.S
+++ b/userland/arch/aarch64/ld2.S
@@ -1,4 +1,4 @@
-/* https://github.com/cirosantilli/linux-kernel-module-cheat#arm-simd-interleaving */
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#arm-ld2-instruction */
 
 #include "common.h"