From 180e26590a7997399a28a733f08b8eef3fcd5a87 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ciro=20Santilli=20=E5=85=AD=E5=9B=9B=E4=BA=8B=E4=BB=B6=20?=
 =?UTF-8?q?=E6=B3=95=E8=BD=AE=E5=8A=9F?=
Date: Sun, 12 May 2019 00:00:07 +0000
Subject: [PATCH] move more arm in

---
 README.adoc | 86 +++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 84 insertions(+), 2 deletions(-)

diff --git a/README.adoc b/README.adoc
index 14e4e18..13fecaf 100644
--- a/README.adoc
+++ b/README.adoc
@@ -11701,6 +11701,7 @@ Much like ADD for non-SIMD, start learning SIMD instructions by looking at the i
 Then it is just a huge copy paste of infinite boring details:
 
 * <>
+* <>
 
 === User vs system assembly
 
@@ -12086,6 +12087,8 @@ ARMv8 has link:https://en.wikipedia.org/wiki/ARM_architecture#ARMv8-A[had severa
 * v8.4: TODO
 * v8.5: 2018
 
+They are described at: <> A1.7 "ARMv8 architecture extensions".
+
 ===== AArch32
 
 32-bit mode of operation of ARMv8.
@@ -12099,7 +12102,7 @@ For this reason, QEMU and GAS seems to enable both AArch32 and ARMv7 under `arm`
 
 There are however some extensions over ARMv7, many of them are functionality that ARMv8 has and that designers decided to backport on AArch32 as well, e.g.:
 
-* <>
+* <>
 
 ===== AArch32 vs AArch64
 
@@ -12107,6 +12110,7 @@ A great summary of differences can be found at: https://en.wikipedia.org/wiki/AR
 
 Some random ones:
 
+* aarch32 has two encodings: Thumb and ARM: <>
 * in ARMv8, the stack has to 16-byte aligned. Therefore, the main way to push things to stack is with 8-byte pair pushes with the <>
 
 ==== Free ARM implementations
@@ -12135,6 +12139,36 @@ ____
 ARM designed CPUs however are mostly called `Coretx-A`: https://en.wikipedia.org/wiki/List_of_applications_of_ARM_cores Vortex and Tempest are Apple designed ones.
 
 Bibliography: https://www.quora.com/Why-is-it-that-you-need-a-license-from-ARM-to-design-an-ARM-CPU-How-are-the-instruction-sets-protected
+==== ARM instruction encodings
+
+Understanding the basics of instruction encodings is fundamental to remembering what instructions do and why some things are possible or not, notably the <> and the <>.
+
+aarch32 has two "instruction sets", which look just like encodings.
+
+Some control bit determines which one we are currently in, and userland can switch between them with the <>.
+
+The encodings are:
+
+* A32: every instruction is 4 bytes long. Can encode every instruction.
+* T32: most common instructions are 2 bytes long. Many other less common ones are 4 bytes long.
++
+T stands for "Thumb", which is the original name for the technology. The word "Thumb" does not appear in <> however. It does appear in <> though.
++
+See also: <> F2.1.3 "Instruction encodings".
+
+Within each instruction set, there can be multiple encodings for a given function, and they are noted simply as:
+
+* A1, A2, ...: A32 encodings
+* T1, T2, ...: T32 encodings
+
+This RISC-y mostly fixed instruction length design likely makes processor design easier and allows for certain optimizations, at the cost of slightly more complex assembly, as you can't encode 4 / 8 byte addresses in a single instruction. Totally worth it IMHO.
+
+This design can be contrasted with x86, which has widely variable instruction length.
+
+Bibliography:
+
+* https://stackoverflow.com/questions/28669905/what-is-the-difference-between-the-arm-thumb-and-thumb-2-instruction-encodings
+* https://reverseengineering.stackexchange.com/questions/6080/how-to-detect-thumb-mode-in-arm-disassembly
 
 === ARM branch instructions
 
@@ -12447,7 +12481,7 @@ Cannot load from or to memory, since only the `ldr` and `str` instruction famili
 
 Example: link:userland/arch/arm/mov.S[]
 
-Since every instruction <>, there is not enough space to encode arbitrary 32-bit immediates in a single instruction, since some of the bits are needed to actually encode the instruction itself.
+Since every instruction <>, there is not enough space to encode arbitrary 32-bit immediates in a single instruction, since some of the bits are needed to actually encode the instruction itself.
 
 The solutions to this problem are mentioned at:
 
@@ -12553,6 +12587,54 @@ gdb-multiarch -batch -ex 'arch arm' -ex "file v7/nop.out" -ex "disassemble/rs as
 
 Bibliography: https://stackoverflow.com/questions/1875491/nop-for-iphone-binaries
 
+=== ARM SIMD
+
+==== ARM SIMD instructions
+
+===== ARM vcvt instruction
+
+Example: link:userland/arch/arm/vcvt.S[]
+
+Convert between integers and floating point.
+
+<> on rounding:
+
+____
+The floating-point to fixed-point operation uses the Round towards Zero rounding mode. The fixed-point to floating-point operation uses the Round to Nearest rounding mode.
+____
+
+Notice how the opcode takes two types.
+
+E.g., in our 32-bit float to 32-bit unsigned example we use:
+
+....
+vcvt.u32.f32
+....
+
+====== ARM vcvtr instruction
+
+Example: link:userland/arch/arm/vcvtr.S[]
+
+Like <>, but the rounding mode is selected by the FPSCR.RMode field.
+
+Selecting rounding mode explicitly per instruction was apparently not possible in ARMv7, but was made possible in <> e.g. with <>.
+
+Rounding mode selection is exposed in the C standard (since C99) through link:https://en.cppreference.com/w/c/numeric/fenv/feround[`fesetround`].
+
+TODO: is the initial rounding mode specified by the ELF standard? Could not find a reference.
+
+====== ARM vcvta instruction
+
+Example: link:userland/arch/arm/vcvt.S[]
+
+Added in ARMv8 <> only, not present in ARMv7.
+
+In ARMv7, to use a non-round-to-zero rounding mode, you had to set the rounding mode with FPSCR and use the R version of the instruction e.g. <>.
+
+Now in AArch32 it is possible to do it explicitly per-instruction.
+
+Also there was no ties to away mode in ARMv7. This mode does not exist in C99 either.
+
 === ARM assembly bibliography
 
 ==== ARM non-official bibliography