
Merge branch 'feature/ulp' into 'master'

Initial support for generation of ULP coprocessor code

This adds basic support for writing ULP coprocessor programs using an assembly-like syntax, with integer labels and branches to labels.

See merge request !261

Ivan Grokhotkov, 9 years ago
parent
commit
c8685c2002

+ 121 - 0
components/ulp/README.rst

@@ -0,0 +1,121 @@
+ULP coprocessor programming
+===========================
+
+.. warning:: The ULP coprocessor programming approach described here is experimental. Once binutils support for the ULP is done, this preprocessor-based approach will probably be deprecated. We welcome discussion about and contributions to ULP programming tools.
+
+The ULP coprocessor is a simple FSM (finite state machine) designed to perform measurements using the ADC, the temperature sensor, and external I2C sensors while the main processors are in deep sleep mode. The ULP coprocessor can access the RTC_SLOW_MEM memory region and registers in the RTC_CNTL, RTC_IO, and SARADC peripherals. It uses fixed-width 32-bit instructions and 32-bit memory addressing, and has four 16-bit general-purpose registers.
+
+The ULP coprocessor doesn't have a dedicated binutils port yet. ULP coprocessor programs can instead be written by embedding assembly-like macros into an ESP32 application.
+Here is an example of how this can be done::
+
+    const ulp_insn_t program[] = {
+        I_MOVI(R3, 16),         // R3 <- 16
+        I_LD(R0, R3, 0),        // R0 <- RTC_SLOW_MEM[R3 + 0]
+        I_LD(R1, R3, 1),        // R1 <- RTC_SLOW_MEM[R3 + 1]
+        I_ADDR(R2, R0, R1),     // R2 <- R0 + R1
+        I_ST(R2, R3, 2),        // R2 -> RTC_SLOW_MEM[R3 + 2]
+        I_HALT()
+    };
+    size_t load_addr = 0;
+    size_t size = sizeof(program)/sizeof(ulp_insn_t);
+    ulp_process_macros_and_load(load_addr, program, &size);
+    ulp_run(load_addr);
+
+The ``program`` array is an array of ``ulp_insn_t``, i.e. ULP coprocessor instructions. Each ``I_XXX`` preprocessor define translates into a single 32-bit instruction. Arguments of these preprocessor defines can be register numbers (``R0`` to ``R3``) and literal constants. See the `ULP coprocessor instruction defines`_ section for descriptions of the instructions and the arguments they take.
+
+Load and store instructions use addresses expressed in 32-bit words. Address 0 corresponds to the first word of ``RTC_SLOW_MEM`` (which is address 0x50000000 as seen by the main CPUs).
+
+To generate branch instructions, special ``M_`` preprocessor defines are used. The ``M_LABEL`` define marks a branch target; the label identifier is a 16-bit integer. The ``M_Bxxx`` defines generate branch instructions whose target is set to a particular label.
+
+Implementation note: these ``M_`` preprocessor defines do not map one-to-one onto instructions. ``M_LABEL`` expands into a single token value which contains the label number, while each ``M_Bxxx`` define expands into two ``ulp_insn_t`` values: a token value which contains the label number, followed by the actual branch instruction. The ``ulp_process_macros_and_load`` function resolves each label number to an address, modifies the branch instruction to use the correct address, and removes the extra ``ulp_insn_t`` tokens which contain the label numbers.
+
+Here is an example of using labels and branches::
+
+    const ulp_insn_t program[] = {
+        I_MOVI(R0, 34),         // R0 <- 34
+        M_LABEL(1),             // label_1
+        I_MOVI(R1, 32),         // R1 <- 32
+        I_LD(R1, R1, 0),        // R1 <- RTC_SLOW_MEM[R1]
+        I_MOVI(R2, 33),         // R2 <- 33
+        I_LD(R2, R2, 0),        // R2 <- RTC_SLOW_MEM[R2]
+        I_SUBR(R3, R1, R2),     // R3 <- R1 - R2
+        I_ST(R3, R0, 0),        // R3 -> RTC_SLOW_MEM[R0 + 0]
+        I_ADDI(R0, R0, 1),      // R0++
+        M_BL(1, 64),            // if (R0 < 64) goto label_1
+        I_HALT(),
+    };
+    RTC_SLOW_MEM[32] = 42;
+    RTC_SLOW_MEM[33] = 18;
+    size_t load_addr = 0;
+    size_t size = sizeof(program)/sizeof(ulp_insn_t);
+    ulp_process_macros_and_load(load_addr, program, &size);
+    ulp_run(load_addr);
+
+
+Functions
+^^^^^^^^^
+
+.. doxygenfunction:: ulp_process_macros_and_load
+.. doxygenfunction:: ulp_run
+
+Error codes
+^^^^^^^^^^^
+
+.. doxygendefine:: ESP_ERR_ULP_BASE
+.. doxygendefine:: ESP_ERR_ULP_SIZE_TOO_BIG
+.. doxygendefine:: ESP_ERR_ULP_INVALID_LOAD_ADDR
+.. doxygendefine:: ESP_ERR_ULP_DUPLICATE_LABEL
+.. doxygendefine:: ESP_ERR_ULP_UNDEFINED_LABEL
+.. doxygendefine:: ESP_ERR_ULP_BRANCH_OUT_OF_RANGE
+
+ULP coprocessor registers
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The ULP coprocessor has four 16-bit general-purpose registers. All registers have the same functionality, with one exception: the R0 register is used as the implicit source register by the relative compare-and-branch instructions (``I_BL``, ``I_BGE``, ``M_BL``, and ``M_BGE``).
+ 
+These definitions can be used for all instructions which require a register.
+
+.. doxygengroup:: ulp_registers
+    :content-only:
+    
+ULP coprocessor instruction defines
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. doxygendefine:: I_DELAY
+.. doxygendefine:: I_HALT
+.. doxygendefine:: I_ST
+.. doxygendefine:: I_LD
+.. doxygendefine:: I_BL
+.. doxygendefine:: I_BGE
+.. doxygendefine:: I_BXR
+.. doxygendefine:: I_BXI
+.. doxygendefine:: I_BXZR
+.. doxygendefine:: I_BXZI
+.. doxygendefine:: I_BXFR
+.. doxygendefine:: I_BXFI
+.. doxygendefine:: I_ADDR
+.. doxygendefine:: I_SUBR
+.. doxygendefine:: I_ANDR
+.. doxygendefine:: I_ORR
+.. doxygendefine:: I_MOVR
+.. doxygendefine:: I_LSHR
+.. doxygendefine:: I_RSHR
+.. doxygendefine:: I_ADDI
+.. doxygendefine:: I_SUBI
+.. doxygendefine:: I_ANDI
+.. doxygendefine:: I_ORI
+.. doxygendefine:: I_MOVI
+.. doxygendefine:: I_LSHI
+.. doxygendefine:: I_RSHI
+.. doxygendefine:: M_LABEL
+.. doxygendefine:: M_BL
+.. doxygendefine:: M_BGE
+.. doxygendefine:: M_BX
+.. doxygendefine:: M_BXZ
+.. doxygendefine:: M_BXF
+
+Defines
+^^^^^^^
+
+.. doxygendefine:: RTC_SLOW_MEM
+

+ 0 - 0
components/ulp/component.mk


+ 708 - 0
components/ulp/include/esp32/ulp.h

@@ -0,0 +1,708 @@
+// Copyright 2016 Espressif Systems (Shanghai) PTE LTD
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+#pragma once
+#include <stdint.h>
+#include <stddef.h>
+#include <stdlib.h>
+#include "esp_err.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+
+/**
+ * @defgroup ulp_registers ULP coprocessor registers
+ * @{
+ */
+
+
+#define R0 0    /*!< general purpose register 0 */
+#define R1 1    /*!< general purpose register 1 */
+#define R2 2    /*!< general purpose register 2 */
+#define R3 3    /*!< general purpose register 3 */
+/**@}*/
+
+/** @defgroup ulp_opcodes ULP coprocessor opcodes, sub opcodes, and various modifiers/flags
+ *
+ * These definitions are not intended to be used directly.
+ * They are used in definitions of instructions later on.
+ *
+ * @{
+ */
+
+#define OPCODE_WR_REG 1         /*!< Instruction: write peripheral register (RTC_CNTL/RTC_IO/SARADC) (not implemented yet) */
+
+#define OPCODE_RD_REG 2         /*!< Instruction: read peripheral register (RTC_CNTL/RTC_IO/SARADC) (not implemented yet) */
+
+#define OPCODE_I2C 3            /*!< Instruction: read/write I2C (not implemented yet) */
+
+#define OPCODE_DELAY 4          /*!< Instruction: delay (nop) for a given number of cycles */
+
+#define OPCODE_ADC 5            /*!< Instruction: SAR ADC measurement (not implemented yet) */
+
+#define OPCODE_ST 6             /*!< Instruction: store indirect to RTC memory */
+#define SUB_OPCODE_ST 4         /*!< Store 32 bits, 16 MSBs contain PC, 16 LSBs contain value from source register */
+
+#define OPCODE_ALU 7            /*!< Arithmetic instructions */
+#define SUB_OPCODE_ALU_REG 0    /*!< Arithmetic instruction, both source values are in register */
+#define SUB_OPCODE_ALU_IMM 1    /*!< Arithmetic instruction, one source value is an immediate */
+#define SUB_OPCODE_ALU_CNT 2    /*!< Arithmetic instruction between counter register and an immediate (not implemented yet)*/
+#define ALU_SEL_ADD 0           /*!< Addition */
+#define ALU_SEL_SUB 1           /*!< Subtraction */
+#define ALU_SEL_AND 2           /*!< Logical AND */
+#define ALU_SEL_OR  3           /*!< Logical OR */
+#define ALU_SEL_MOV 4           /*!< Copy value (immediate to destination register, or source register to destination register) */
+#define ALU_SEL_LSH 5           /*!< Shift left by given number of bits */
+#define ALU_SEL_RSH 6           /*!< Shift right by given number of bits */
+
+#define OPCODE_BRANCH 8         /*!< Branch instructions */
+#define SUB_OPCODE_BX  0        /*!< Branch to absolute PC (immediate or in register) */
+#define BX_JUMP_TYPE_DIRECT 0   /*!< Unconditional jump */
+#define BX_JUMP_TYPE_ZERO 1     /*!< Branch if last ALU result is zero */
+#define BX_JUMP_TYPE_OVF 2      /*!< Branch if last ALU operation caused an overflow */
+#define SUB_OPCODE_B  1         /*!< Branch to a relative offset */
+#define B_CMP_L 0               /*!< Branch if R0 is less than an immediate */
+#define B_CMP_GE 1              /*!< Branch if R0 is greater than or equal to an immediate */
+
+#define OPCODE_END 9            /*!< Stop executing the program (not implemented yet) */
+#define SUB_OPCODE_END 0        /*!< Stop executing the program and optionally wake up the chip */
+#define SUB_OPCODE_SLEEP 1      /*!< Stop executing the program and run it again after selected interval */
+
+#define OPCODE_TSENS 10         /*!< Instruction: temperature sensor measurement (not implemented yet) */
+
+#define OPCODE_HALT 11          /*!< Halt the coprocessor */
+
+#define OPCODE_LD 13            /*!< Indirect load lower 16 bits from RTC memory */
+
+#define OPCODE_MACRO 15         /*!< Not a real opcode. Used to identify labels and branches in the program */
+#define SUB_OPCODE_MACRO_LABEL 0    /*!< Label macro */
+#define SUB_OPCODE_MACRO_BRANCH 1   /*!< Branch macro */
+/**@}*/
+
+/**@{*/
+#define ESP_ERR_ULP_BASE                0x1200                  /*!< Offset for ULP-related error codes */
+#define ESP_ERR_ULP_SIZE_TOO_BIG        (ESP_ERR_ULP_BASE + 1)  /*!< Program doesn't fit into RTC memory reserved for the ULP */
+#define ESP_ERR_ULP_INVALID_LOAD_ADDR   (ESP_ERR_ULP_BASE + 2)  /*!< Load address is outside of RTC memory reserved for the ULP */
+#define ESP_ERR_ULP_DUPLICATE_LABEL     (ESP_ERR_ULP_BASE + 3)  /*!< More than one label with the same number was defined */
+#define ESP_ERR_ULP_UNDEFINED_LABEL     (ESP_ERR_ULP_BASE + 4)  /*!< Branch instruction references an undefined label */
+#define ESP_ERR_ULP_BRANCH_OUT_OF_RANGE (ESP_ERR_ULP_BASE + 5)  /*!< Branch target is out of range of B instruction (try replacing with BX) */
+/**@}*/
+
+
+/**
+ * @brief Instruction format structure
+ *
+ * All ULP instructions are 32 bits long.
+ * This union contains field layouts used by all of the supported instructions.
+ * This union also includes a special "macro" instruction layout.
+ * This is not a real instruction which can be executed by the coprocessor. It acts
+ * as a token which is removed from the program by the
+ * ulp_process_macros_and_load function.
+ *
+ * These structures are not intended to be used directly.
+ * The preprocessor definitions provided below fill the fields of these structures with
+ * the right arguments.
+ */
+typedef union {
+
+    struct {
+        uint32_t cycles : 16;       /*!< Number of cycles to sleep */
+        uint32_t unused : 12;       /*!< Unused */
+        uint32_t opcode : 4;        /*!< Opcode (OPCODE_DELAY) */
+    } delay;                        /*!< Format of DELAY instruction */
+
+    struct {
+        uint32_t dreg : 2;          /*!< Register which contains data to store */
+        uint32_t sreg : 2;          /*!< Register which contains address in RTC memory (expressed in words) */
+        uint32_t unused1 : 6;       /*!< Unused */
+        uint32_t offset : 11;       /*!< Offset to add to sreg */
+        uint32_t unused2 : 4;       /*!< Unused */
+        uint32_t sub_opcode : 3;    /*!< Sub opcode (SUB_OPCODE_ST) */
+        uint32_t opcode : 4;        /*!< Opcode (OPCODE_ST) */
+    } st;                           /*!< Format of ST instruction */
+
+    struct {
+        uint32_t dreg : 2;          /*!< Register where the data should be loaded to */
+        uint32_t sreg : 2;          /*!< Register which contains address in RTC memory (expressed in words) */
+        uint32_t unused1 : 6;       /*!< Unused */
+        uint32_t offset : 11;       /*!< Offset to add to sreg */
+        uint32_t unused2 : 7;       /*!< Unused */
+        uint32_t opcode : 4;        /*!< Opcode (OPCODE_LD) */
+    } ld;                           /*!< Format of LD instruction */
+
+    struct {
+        uint32_t unused : 28;       /*!< Unused */
+        uint32_t opcode : 4;        /*!< Opcode (OPCODE_HALT) */
+    } halt;                         /*!< Format of HALT instruction */
+
+    struct {
+        uint32_t dreg : 2;          /*!< Register which contains target PC, expressed in words (used if .reg == 1) */
+        uint32_t addr : 11;         /*!< Target PC, expressed in words (used if .reg == 0) */
+        uint32_t unused : 8;        /*!< Unused */
+        uint32_t reg : 1;           /*!< Target PC in register (1) or immediate (0) */
+        uint32_t type : 3;          /*!< Jump condition (BX_JUMP_TYPE_xxx) */
+        uint32_t sub_opcode : 3;    /*!< Sub opcode (SUB_OPCODE_BX) */
+        uint32_t opcode : 4;        /*!< Opcode (OPCODE_BRANCH) */
+    } bx;                           /*!< Format of BRANCH instruction (absolute address) */
+
+    struct {
+        uint32_t imm : 16;          /*!< Immediate value to compare against */
+        uint32_t cmp : 1;           /*!< Comparison to perform: B_CMP_L or B_CMP_GE */
+        uint32_t offset : 7;        /*!< Absolute value of target PC offset w.r.t. current PC, expressed in words */
+        uint32_t sign : 1;          /*!< Sign of target PC offset: 0: positive, 1: negative */
+        uint32_t sub_opcode : 3;    /*!< Sub opcode (SUB_OPCODE_B) */
+        uint32_t opcode : 4;        /*!< Opcode (OPCODE_BRANCH) */
+    } b;                            /*!< Format of BRANCH instruction (relative address) */
+
+    struct {
+        uint32_t dreg : 2;          /*!< Destination register */
+        uint32_t sreg : 2;          /*!< Register with operand A */
+        uint32_t treg : 2;          /*!< Register with operand B */
+        uint32_t unused : 15;       /*!< Unused */
+        uint32_t sel : 4;           /*!< Operation to perform, one of ALU_SEL_xxx */
+        uint32_t sub_opcode : 3;    /*!< Sub opcode (SUB_OPCODE_ALU_REG) */
+        uint32_t opcode : 4;        /*!< Opcode (OPCODE_ALU) */
+    } alu_reg;                      /*!< Format of ALU instruction (both sources are registers) */
+
+    struct {
+        uint32_t dreg : 2;          /*!< Destination register */
+        uint32_t sreg : 2;          /*!< Register with operand A */
+        uint32_t imm : 16;          /*!< Immediate value of operand B */
+        uint32_t unused : 1;        /*!< Unused */
+        uint32_t sel : 4;           /*!< Operation to perform, one of ALU_SEL_xxx */
+        uint32_t sub_opcode : 3;    /*!< Sub opcode (SUB_OPCODE_ALU_IMM) */
+        uint32_t opcode : 4;        /*!< Opcode (OPCODE_ALU) */
+    } alu_imm;                      /*!< Format of ALU instruction (one source is an immediate) */
+
+    struct {
+        uint32_t addr : 8;          /*!< Address within either RTC_CNTL, RTC_IO, or SARADC */
+        uint32_t periph_sel : 2;    /*!< Select peripheral: RTC_CNTL (0), RTC_IO(1), SARADC(2) */
+        uint32_t data : 8;          /*!< 8 bits of data to write */
+        uint32_t high : 5;          /*!< High bit */
+        uint32_t low : 5;           /*!< Low bit */
+        uint32_t opcode : 4;        /*!< Opcode (OPCODE_WR_REG) */
+    } wr_reg;                       /*!< Format of WR_REG instruction */
+
+    struct {
+        uint32_t addr : 8;          /*!< Address within either RTC_CNTL, RTC_IO, or SARADC */
+        uint32_t periph_sel : 2;    /*!< Select peripheral: RTC_CNTL (0), RTC_IO(1), SARADC(2) */
+        uint32_t unused : 8;        /*!< Unused */
+        uint32_t high : 5;          /*!< High bit */
+        uint32_t low : 5;           /*!< Low bit */
+        uint32_t opcode : 4;        /*!< Opcode (OPCODE_RD_REG) */
+    } rd_reg;                       /*!< Format of RD_REG instruction */
+
+    struct {
+        uint32_t dreg : 2;          /*!< Register where to store ADC result */
+        uint32_t mux : 4;           /*!< Select SARADC pad (mux + 1) */
+        uint32_t sar_sel : 1;       /*!< Select SARADC0 (0) or SARADC1 (1) */
+        uint32_t unused1 : 1;       /*!< Unused */
+        uint32_t cycles : 16;       /*!< TBD, cycles used for measurement */
+        uint32_t unused2 : 4;       /*!< Unused */
+        uint32_t opcode: 4;         /*!< Opcode (OPCODE_ADC) */
+    } adc;                          /*!< Format of ADC instruction */
+
+    struct {
+        uint32_t dreg : 2;          /*!< Register where to store temperature measurement result */
+        uint32_t wait_delay: 14;    /*!< Cycles to wait after measurement is done */
+        uint32_t cycles: 12;        /*!< Cycles used to perform measurement */
+        uint32_t opcode: 4;         /*!< Opcode (OPCODE_TSENS) */
+    } tsens;                        /*!< Format of TSENS instruction */
+
+    struct {
+        uint32_t i2c_addr : 8;      /*!< I2C slave address */
+        uint32_t data : 8;          /*!< Data to read or write */
+        uint32_t low_bits : 3;      /*!< TBD */
+        uint32_t high_bits : 3;     /*!< TBD */
+        uint32_t i2c_sel : 4;       /*!< TBD, select reg_i2c_slave_address[7:0] */
+        uint32_t unused : 1;        /*!< Unused */
+        uint32_t rw : 1;            /*!< Write (1) or read (0) */
+        uint32_t opcode : 4;        /*!< Opcode (OPCODE_I2C) */
+    } i2c;                          /*!< Format of I2C instruction */
+
+    struct {
+        uint32_t wakeup : 1;        /*!< Set to 1 to wake up chip */
+        uint32_t unused : 24;       /*!< Unused */
+        uint32_t sub_opcode : 3;    /*!< Sub opcode (SUB_OPCODE_END) */
+        uint32_t opcode : 4;        /*!< Opcode (OPCODE_END) */
+    } end;                          /*!< Format of END instruction with wakeup */
+
+    struct {
+        uint32_t cycle_sel : 4;     /*!< Select which one of SARADC_ULP_CP_SLEEP_CYCx_REG to get the sleep duration from */
+        uint32_t unused : 21;       /*!< Unused */
+        uint32_t sub_opcode : 3;    /*!< Sub opcode (SUB_OPCODE_SLEEP) */
+        uint32_t opcode : 4;        /*!< Opcode (OPCODE_END) */
+    } sleep;                        /*!< Format of END instruction with sleep */
+
+    struct {
+        uint32_t label : 16;        /*!< Label number */
+        uint32_t unused : 8;        /*!< Unused */
+        uint32_t sub_opcode : 4;    /*!< SUB_OPCODE_MACRO_LABEL or SUB_OPCODE_MACRO_BRANCH */
+        uint32_t opcode: 4;         /*!< Opcode (OPCODE_MACRO) */
+    } macro;                        /*!< Format of tokens used by LABEL and BRANCH macros */
+
+} ulp_insn_t;
+
+/**
+ * Delay (nop) for a given number of cycles
+ */
+#define I_DELAY(cycles_) { .delay = {\
+    .opcode = OPCODE_DELAY, \
+    .unused = 0, \
+    .cycles = cycles_ } }
+
+/**
+ * Halt the coprocessor
+ */
+#define I_HALT() { .halt = {\
+    .unused = 0, \
+    .opcode = OPCODE_HALT } }
+
+
+/**
+ * Store value from register reg_val into RTC memory.
+ *
+ * The value is written to an offset calculated by adding value of
+ * reg_addr register and offset_ field (this offset is expressed in 32-bit words).
+ * 32 bits written to RTC memory are built as follows:
+ * - 5 MSBs are zero
+ * - next 11 bits hold the PC of current instruction, expressed in 32-bit words
+ * - next 16 bits hold the actual value to be written
+ *
+ * RTC_SLOW_MEM[addr + offset_] = { 5'b0, insn_PC[10:0], val[15:0] }
+ */
+#define I_ST(reg_val, reg_addr, offset_) { .st = { \
+    .dreg = reg_val, \
+    .sreg = reg_addr, \
+    .unused1 = 0, \
+    .offset = offset_, \
+    .unused2 = 0, \
+    .sub_opcode = SUB_OPCODE_ST, \
+    .opcode = OPCODE_ST } }
+
+
+/**
+ * Load value from RTC memory into reg_dest register.
+ *
+ * Loads 16 LSBs from RTC memory word given by the sum of value in reg_addr and
+ * value of offset_.
+ */
+#define I_LD(reg_dest, reg_addr, offset_) { .ld = { \
+    .dreg = reg_dest, \
+    .sreg = reg_addr, \
+    .unused1 = 0, \
+    .offset = offset_, \
+    .unused2 = 0, \
+    .opcode = OPCODE_LD } }
+
+
+/**
+ *  Branch relative if R0 is less than immediate value.
+ *
+ *  pc_offset is expressed in words, and can be from -127 to 127
+ *  imm_value is a 16-bit value to compare R0 against
+ */
+#define I_BL(pc_offset, imm_value) { .b = { \
+    .imm = imm_value, \
+    .cmp = B_CMP_L, \
+    .offset = abs(pc_offset), \
+    .sign = (pc_offset >= 0) ? 0 : 1, \
+    .sub_opcode = SUB_OPCODE_B, \
+    .opcode = OPCODE_BRANCH } }
+
+/**
+ *  Branch relative if R0 is greater than or equal to immediate value.
+ *
+ *  pc_offset is expressed in words, and can be from -127 to 127
+ *  imm_value is a 16-bit value to compare R0 against
+ */
+#define I_BGE(pc_offset, imm_value) { .b = { \
+    .imm = imm_value, \
+    .cmp = B_CMP_GE, \
+    .offset = abs(pc_offset), \
+    .sign = (pc_offset >= 0) ? 0 : 1, \
+    .sub_opcode = SUB_OPCODE_B, \
+    .opcode = OPCODE_BRANCH } }
+
+/**
+ * Unconditional branch to absolute PC, address in register.
+ *
+ * reg_pc is the register which contains address to jump to.
+ * Address is expressed in 32-bit words.
+ */
+#define I_BXR(reg_pc) { .bx = { \
+    .dreg = reg_pc, \
+    .addr = 0, \
+    .unused = 0, \
+    .reg = 1, \
+    .type = BX_JUMP_TYPE_DIRECT, \
+    .sub_opcode = SUB_OPCODE_BX, \
+    .opcode = OPCODE_BRANCH } }
+
+/**
+ *  Unconditional branch to absolute PC, immediate address.
+ *
+ *  Address imm_pc is expressed in 32-bit words.
+ */
+#define I_BXI(imm_pc) { .bx = { \
+    .dreg = 0, \
+    .addr = imm_pc, \
+    .unused = 0, \
+    .reg = 0, \
+    .type = BX_JUMP_TYPE_DIRECT, \
+    .sub_opcode = SUB_OPCODE_BX, \
+    .opcode = OPCODE_BRANCH } }
+
+/**
+ * Branch to absolute PC if ALU result is zero, address in register.
+ *
+ * reg_pc is the register which contains address to jump to.
+ * Address is expressed in 32-bit words.
+ */
+#define I_BXZR(reg_pc) { .bx = { \
+    .dreg = reg_pc, \
+    .addr = 0, \
+    .unused = 0, \
+    .reg = 1, \
+    .type = BX_JUMP_TYPE_ZERO, \
+    .sub_opcode = SUB_OPCODE_BX, \
+    .opcode = OPCODE_BRANCH } }
+
+/**
+ * Branch to absolute PC if ALU result is zero, immediate address.
+ *
+ * Address imm_pc is expressed in 32-bit words.
+ */
+#define I_BXZI(imm_pc) { .bx = { \
+    .dreg = 0, \
+    .addr = imm_pc, \
+    .unused = 0, \
+    .reg = 0, \
+    .type = BX_JUMP_TYPE_ZERO, \
+    .sub_opcode = SUB_OPCODE_BX, \
+    .opcode = OPCODE_BRANCH } }
+
+/**
+ * Branch to absolute PC if ALU overflow, address in register
+ *
+ * reg_pc is the register which contains address to jump to.
+ * Address is expressed in 32-bit words.
+ */
+#define I_BXFR(reg_pc) { .bx = { \
+    .dreg = reg_pc, \
+    .addr = 0, \
+    .unused = 0, \
+    .reg = 1, \
+    .type = BX_JUMP_TYPE_OVF, \
+    .sub_opcode = SUB_OPCODE_BX, \
+    .opcode = OPCODE_BRANCH } }
+
+/**
+ * Branch to absolute PC if ALU overflow, immediate address
+ *
+ * Address imm_pc is expressed in 32-bit words.
+ */
+#define I_BXFI(imm_pc) { .bx = { \
+    .dreg = 0, \
+    .addr = imm_pc, \
+    .unused = 0, \
+    .reg = 0, \
+    .type = BX_JUMP_TYPE_OVF, \
+    .sub_opcode = SUB_OPCODE_BX, \
+    .opcode = OPCODE_BRANCH } }
+
+
+/**
+ * Addition: dest = src1 + src2
+ */
+#define I_ADDR(reg_dest, reg_src1, reg_src2) { .alu_reg = { \
+    .dreg = reg_dest, \
+    .sreg = reg_src1, \
+    .treg = reg_src2, \
+    .unused = 0, \
+    .sel = ALU_SEL_ADD, \
+    .sub_opcode = SUB_OPCODE_ALU_REG, \
+    .opcode = OPCODE_ALU } }
+
+/**
+ * Subtraction: dest = src1 - src2
+ */
+#define I_SUBR(reg_dest, reg_src1, reg_src2) { .alu_reg = { \
+    .dreg = reg_dest, \
+    .sreg = reg_src1, \
+    .treg = reg_src2, \
+    .unused = 0, \
+    .sel = ALU_SEL_SUB, \
+    .sub_opcode = SUB_OPCODE_ALU_REG, \
+    .opcode = OPCODE_ALU } }
+
+/**
+ * Logical AND: dest = src1 & src2
+ */
+#define I_ANDR(reg_dest, reg_src1, reg_src2) { .alu_reg = { \
+    .dreg = reg_dest, \
+    .sreg = reg_src1, \
+    .treg = reg_src2, \
+    .unused = 0, \
+    .sel = ALU_SEL_AND, \
+    .sub_opcode = SUB_OPCODE_ALU_REG, \
+    .opcode = OPCODE_ALU } }
+
+/**
+ * Logical OR: dest = src1 | src2
+ */
+#define I_ORR(reg_dest, reg_src1, reg_src2)  { .alu_reg = { \
+    .dreg = reg_dest, \
+    .sreg = reg_src1, \
+    .treg = reg_src2, \
+    .unused = 0, \
+    .sel = ALU_SEL_OR, \
+    .sub_opcode = SUB_OPCODE_ALU_REG, \
+    .opcode = OPCODE_ALU } }
+
+/**
+ * Copy: dest = src
+ */
+#define I_MOVR(reg_dest, reg_src) { .alu_reg = { \
+    .dreg = reg_dest, \
+    .sreg = reg_src, \
+    .treg = 0, \
+    .unused = 0, \
+    .sel = ALU_SEL_MOV, \
+    .sub_opcode = SUB_OPCODE_ALU_REG, \
+    .opcode = OPCODE_ALU } }
+
+/**
+ * Logical shift left: dest = src << shift
+ */
+#define I_LSHR(reg_dest, reg_src, reg_shift)  { .alu_reg = { \
+    .dreg = reg_dest, \
+    .sreg = reg_src, \
+    .treg = reg_shift, \
+    .unused = 0, \
+    .sel = ALU_SEL_LSH, \
+    .sub_opcode = SUB_OPCODE_ALU_REG, \
+    .opcode = OPCODE_ALU } }
+
+
+/**
+ * Logical shift right: dest = src >> shift
+ */
+#define I_RSHR(reg_dest, reg_src, reg_shift)  { .alu_reg = { \
+    .dreg = reg_dest, \
+    .sreg = reg_src, \
+    .treg = reg_shift, \
+    .unused = 0, \
+    .sel = ALU_SEL_RSH, \
+    .sub_opcode = SUB_OPCODE_ALU_REG, \
+    .opcode = OPCODE_ALU } }
+
+/**
+ * Add register and an immediate value: dest = src + imm
+ */
+#define I_ADDI(reg_dest, reg_src, imm_) { .alu_imm = { \
+    .dreg = reg_dest, \
+    .sreg = reg_src, \
+    .imm = imm_, \
+    .unused = 0, \
+    .sel = ALU_SEL_ADD, \
+    .sub_opcode = SUB_OPCODE_ALU_IMM, \
+    .opcode = OPCODE_ALU } }
+
+
+/**
+ *  Subtract an immediate value from register: dest = src - imm
+ */
+#define I_SUBI(reg_dest, reg_src, imm_) { .alu_imm = { \
+    .dreg = reg_dest, \
+    .sreg = reg_src, \
+    .imm = imm_, \
+    .unused = 0, \
+    .sel = ALU_SEL_SUB, \
+    .sub_opcode = SUB_OPCODE_ALU_IMM, \
+    .opcode = OPCODE_ALU } }
+
+/**
+ * Logical AND register and an immediate value: dest = src & imm
+ */
+#define I_ANDI(reg_dest, reg_src, imm_) { .alu_imm = { \
+    .dreg = reg_dest, \
+    .sreg = reg_src, \
+    .imm = imm_, \
+    .unused = 0, \
+    .sel = ALU_SEL_AND, \
+    .sub_opcode = SUB_OPCODE_ALU_IMM, \
+    .opcode = OPCODE_ALU } }
+
+/**
+ * Logical OR register and an immediate value: dest = src | imm
+ */
+#define I_ORI(reg_dest, reg_src, imm_) { .alu_imm = { \
+    .dreg = reg_dest, \
+    .sreg = reg_src, \
+    .imm = imm_, \
+    .unused = 0, \
+    .sel = ALU_SEL_OR, \
+    .sub_opcode = SUB_OPCODE_ALU_IMM, \
+    .opcode = OPCODE_ALU } }
+
+/**
+ * Copy an immediate value into register: dest = imm
+ */
+#define I_MOVI(reg_dest, imm_) { .alu_imm = { \
+    .dreg = reg_dest, \
+    .sreg = 0, \
+    .imm = imm_, \
+    .unused = 0, \
+    .sel = ALU_SEL_MOV, \
+    .sub_opcode = SUB_OPCODE_ALU_IMM, \
+    .opcode = OPCODE_ALU } }
+
+/**
+ * Logical shift left register value by an immediate: dest = src << imm
+ */
+#define I_LSHI(reg_dest, reg_src, imm_) { .alu_imm = { \
+    .dreg = reg_dest, \
+    .sreg = reg_src, \
+    .imm = imm_, \
+    .unused = 0, \
+    .sel = ALU_SEL_LSH, \
+    .sub_opcode = SUB_OPCODE_ALU_IMM, \
+    .opcode = OPCODE_ALU } }
+
+
+/**
+ * Logical shift right register value by an immediate: dest = src >> imm
+ */
+#define I_RSHI(reg_dest, reg_src, imm_) { .alu_imm = { \
+    .dreg = reg_dest, \
+    .sreg = reg_src, \
+    .imm = imm_, \
+    .unused = 0, \
+    .sel = ALU_SEL_RSH, \
+    .sub_opcode = SUB_OPCODE_ALU_IMM, \
+    .opcode = OPCODE_ALU } }
+
+/**
+ * Define a label with number label_num.
+ *
+ * This is a macro which doesn't generate a real instruction.
+ * The token generated by this macro is removed by ulp_process_macros_and_load
+ * function. Label defined using this macro can be used in branch macros defined
+ * below.
+ */
+#define M_LABEL(label_num) { .macro = { \
+    .label = label_num, \
+    .unused = 0, \
+    .sub_opcode = SUB_OPCODE_MACRO_LABEL, \
+    .opcode = OPCODE_MACRO } }
+
+/**
+ * Token macro used by the M_Bxxx branch macros (M_BL, M_BGE, M_BX, M_BXZ, M_BXF). Not to be used directly.
+ */
+#define M_BRANCH(label_num) { .macro = { \
+    .label = label_num, \
+    .unused = 0, \
+    .sub_opcode = SUB_OPCODE_MACRO_BRANCH, \
+    .opcode = OPCODE_MACRO } }
+
+/**
+ * Macro: branch to label label_num if R0 is less than immediate value.
+ *
+ * This macro generates two ulp_insn_t values separated by a comma, and should
+ * be used when defining contents of ulp_insn_t arrays. First value is not a
+ * real instruction; it is a token which is removed by ulp_process_macros_and_load
+ * function.
+ */
+#define M_BL(label_num, imm_value) \
+    M_BRANCH(label_num), \
+    I_BL(0, imm_value)
+
+/**
+ * Macro: branch to label label_num if R0 is greater than or equal to immediate value
+ *
+ * This macro generates two ulp_insn_t values separated by a comma, and should
+ * be used when defining contents of ulp_insn_t arrays. First value is not a
+ * real instruction; it is a token which is removed by ulp_process_macros_and_load
+ * function.
+ */
+#define M_BGE(label_num, imm_value) \
+    M_BRANCH(label_num), \
+    I_BGE(0, imm_value)
+
+/**
+ * Macro: unconditional branch to label
+ *
+ * This macro generates two ulp_insn_t values separated by a comma, and should
+ * be used when defining contents of ulp_insn_t arrays. First value is not a
+ * real instruction; it is a token which is removed by ulp_process_macros_and_load
+ * function.
+ */
+#define M_BX(label_num) \
+    M_BRANCH(label_num), \
+    I_BXI(0)
+
+/**
+ * Macro: branch to label if ALU result is zero
+ *
+ * This macro generates two ulp_insn_t values separated by a comma, and should
+ * be used when defining contents of ulp_insn_t arrays. First value is not a
+ * real instruction; it is a token which is removed by ulp_process_macros_and_load
+ * function.
+ */
+#define M_BXZ(label_num) \
+    M_BRANCH(label_num), \
+    I_BXZI(0)
+
+/**
+ * Macro: branch to label if ALU overflow
+ *
+ * This macro generates two ulp_insn_t values separated by a comma, and should
+ * be used when defining contents of ulp_insn_t arrays. First value is not a
+ * real instruction; it is a token which is removed by ulp_process_macros_and_load
+ * function.
+ */
+#define M_BXF(label_num) \
+    M_BRANCH(label_num), \
+    I_BXFI(0)
+
+
+
+#define RTC_SLOW_MEM ((uint32_t*) 0x50000000)       /*!< RTC slow memory, 8k size */
+
+/**
+ * @brief Resolve all macro references in a program and load it into RTC memory
+ * @param load_addr  address where the program should be loaded, expressed in 32-bit words
+ * @param program  ulp_insn_t array with the program
+ * @param psize  pointer to the size of the program (number of ulp_insn_t entries in the array), expressed in 32-bit words
+ * @return
+ *      - ESP_OK on success
+ *      - ESP_ERR_NO_MEM if an auxiliary temporary structure cannot be allocated
+ *      - one of ESP_ERR_ULP_xxx if the program is not valid or cannot be loaded
+ */
+esp_err_t ulp_process_macros_and_load(uint32_t load_addr, const ulp_insn_t* program, size_t* psize);
+
+/**
+ * @brief Run the program loaded into RTC memory
+ * @param entry_point entry point, expressed in 32-bit words
+ * @return  ESP_OK on success
+ */
+esp_err_t ulp_run(uint32_t entry_point);
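Both `load_addr` and `entry_point` count 32-bit words from the start of RTC slow memory, not bytes. A small host-side sketch of that arithmetic follows, using the 0x50000000 base from `RTC_SLOW_MEM` above; the helper names are hypothetical, and the fit check mirrors the size test in `ulp_process_macros_and_load` under the assumption that the reserved region is given in bytes, as `CONFIG_ULP_COPROC_RESERVE_MEM` is.

```c
#include <assert.h>
#include <stdint.h>

#define RTC_SLOW_MEM_BASE 0x50000000u   /* value of RTC_SLOW_MEM in ulp.h */

/* Hypothetical helper: byte address of the word at index word_addr. */
static uint32_t ulp_word_to_byte_addr(uint32_t word_addr)
{
    return RTC_SLOW_MEM_BASE + word_addr * (uint32_t) sizeof(uint32_t);
}

/* Hypothetical helper: does a program of n_words words fit when loaded at
 * word_addr inside a reservation of reserve_bytes bytes? */
static int ulp_program_fits(uint32_t word_addr, uint32_t n_words, uint32_t reserve_bytes)
{
    uint32_t mem_end_words = reserve_bytes / (uint32_t) sizeof(uint32_t);
    return word_addr + n_words <= mem_end_words;
}
```

With the 512-byte reservation used by the unit-test app, 128 words are available, so a 128-word program fits only at load address 0.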
+
+
+#ifdef __cplusplus
+}
+#endif

+ 94 - 0
components/ulp/test/test_ulp.c

@@ -0,0 +1,94 @@
+// Copyright 2010-2016 Espressif Systems (Shanghai) PTE LTD
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+#include <stdio.h>
+#include <string.h>
+#include <freertos/FreeRTOS.h>
+#include <freertos/task.h>
+#include <freertos/semphr.h>
+
+#include <unity.h>
+#include "esp_attr.h"
+#include "esp_err.h"
+#include "esp_log.h"
+
+#include "esp32/ulp.h"
+
+#include "soc/soc.h"
+#include "soc/rtc_cntl_reg.h"
+#include "soc/saradc_reg.h"
+
+#include "sdkconfig.h"
+
+static void hexdump(const uint32_t* src, size_t count) {
+    for (size_t i = 0; i < count; ++i) {
+        printf("%08x ", *src);
+        ++src;
+        if ((i + 1) % 4 == 0) {
+            printf("\n");
+        }
+    }
+}
+
+TEST_CASE("ulp add test", "[ulp]")
+{
+    memset(RTC_SLOW_MEM, 0, CONFIG_ULP_COPROC_RESERVE_MEM);
+    const ulp_insn_t program[] = {
+        I_MOVI(R3, 16),
+        I_LD(R0, R3, 0),
+        I_LD(R1, R3, 1),
+        I_ADDR(R2, R0, R1),
+        I_ST(R2, R3, 2),
+        I_HALT()
+    };
+    RTC_SLOW_MEM[16] = 10;
+    RTC_SLOW_MEM[17] = 11;
+    size_t size = sizeof(program)/sizeof(ulp_insn_t);
+    TEST_ASSERT_EQUAL(ESP_OK, ulp_process_macros_and_load(0, program, &size));
+    TEST_ASSERT_EQUAL(ESP_OK, ulp_run(0));
+    ets_delay_us(1000);
+    hexdump(RTC_SLOW_MEM, CONFIG_ULP_COPROC_RESERVE_MEM / 4);
+    TEST_ASSERT_EQUAL(10 + 11, RTC_SLOW_MEM[18] & 0xffff);
+}
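What the add test expects can be reasoned about without hardware. The sketch below is a hypothetical host-side model, not an emulator of the real ULP: registers are 16 bits wide, LD takes the low 16 bits of a word, and ST is modelled as writing only the register value. Whatever the hardware stores in the upper half of the word is not modelled, which is also why the test compares only `RTC_SLOW_MEM[18] & 0xffff`.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_WORDS 64   /* arbitrary memory size for the model */

/* Hypothetical host-side model of the "ulp add test" program. */
static void run_add_program(uint32_t mem[MEM_WORDS])
{
    uint16_t r3 = 16;                       /* I_MOVI(R3, 16)     */
    uint16_t r0 = (uint16_t) mem[r3 + 0];   /* I_LD(R0, R3, 0)    */
    uint16_t r1 = (uint16_t) mem[r3 + 1];   /* I_LD(R1, R3, 1)    */
    uint16_t r2 = (uint16_t) (r0 + r1);     /* I_ADDR(R2, R0, R1) */
    mem[r3 + 2] = r2;                       /* I_ST(R2, R3, 2), low half only */
}                                           /* I_HALT()           */
```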
+
+TEST_CASE("ulp branch test", "[ulp]")
+{
+    assert(CONFIG_ULP_COPROC_RESERVE_MEM >= 260 && "this test needs ULP_COPROC_RESERVE_MEM option set in menuconfig");
+    memset(RTC_SLOW_MEM, 0, CONFIG_ULP_COPROC_RESERVE_MEM);
+    const ulp_insn_t program[] = {
+        I_MOVI(R0, 34),     // r0 = dst
+        M_LABEL(1),
+        I_MOVI(R1, 32),
+        I_LD(R1, R1, 0),    // r1 = mem[32]
+        I_MOVI(R2, 33),
+        I_LD(R2, R2, 0),    // r2 = mem[33]
+        I_SUBR(R3, R1, R2), // r3 = r1 - r2
+        I_ST(R3, R0, 0),    // dst[0] = r3
+        I_ADDI(R0, R0, 1),
+        M_BL(1, 64),
+        I_HALT(),
+    };
+    RTC_SLOW_MEM[32] = 42;
+    RTC_SLOW_MEM[33] = 18;
+    hexdump(RTC_SLOW_MEM, CONFIG_ULP_COPROC_RESERVE_MEM / 4);
+    size_t size = sizeof(program)/sizeof(ulp_insn_t);
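+    TEST_ASSERT_EQUAL(ESP_OK, ulp_process_macros_and_load(0, program, &size));
+    TEST_ASSERT_EQUAL(ESP_OK, ulp_run(0));
+    ets_delay_us(1000);    // let the ULP program finish before checking memory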
+    printf("\n\n");
+    hexdump(RTC_SLOW_MEM, CONFIG_ULP_COPROC_RESERVE_MEM / 4);
+    for (int i = 34; i < 64; ++i) {
+        TEST_ASSERT_EQUAL(42 - 18, RTC_SLOW_MEM[i] & 0xffff);
+    }
+    TEST_ASSERT_EQUAL(0, RTC_SLOW_MEM[64]);
+}
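The branch test loops from label 1 while R0 stays below 64, so it writes words 34 through 63 and leaves word 64 untouched. Since the highest index the test reads is 64, it needs 65 words, i.e. 260 bytes, of reserved memory, which is where the `>= 260` check comes from. A hypothetical host-side model of the loop follows; it assumes `M_BL(1, 64)` means "jump to label 1 while R0 < 64", which is what the test's expectations imply, and it models ST as writing only the low half of the word.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_WORDS 128   /* 512 bytes, as in the unit-test sdkconfig */

/* Hypothetical model of the "ulp branch test" program. */
static void run_branch_program(uint32_t mem[MEM_WORDS])
{
    uint16_t r0 = 34;                          /* I_MOVI(R0, 34): dst  */
    do {                                       /* M_LABEL(1)           */
        uint16_t r1 = (uint16_t) mem[32];      /* I_MOVI(R1, 32); I_LD */
        uint16_t r2 = (uint16_t) mem[33];      /* I_MOVI(R2, 33); I_LD */
        uint16_t r3 = (uint16_t) (r1 - r2);    /* I_SUBR(R3, R1, R2)   */
        mem[r0] = r3;                          /* I_ST(R3, R0, 0)      */
        r0 = (uint16_t) (r0 + 1);              /* I_ADDI(R0, R0, 1)    */
    } while (r0 < 64);                         /* M_BL(1, 64)          */
}                                              /* I_HALT()             */
```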

+ 270 - 0
components/ulp/ulp.c

@@ -0,0 +1,270 @@
+// Copyright 2010-2016 Espressif Systems (Shanghai) PTE LTD
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//     http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+#include <stdio.h>
+#include <string.h>
+#include <stdlib.h>
+
+#include "esp_attr.h"
+#include "esp_err.h"
+#include "esp_log.h"
+#include "esp32/ulp.h"
+
+#include "soc/soc.h"
+#include "soc/rtc_cntl_reg.h"
+#include "soc/saradc_reg.h"
+
+#include "sdkconfig.h"
+
+static const char* TAG = "ulp";
+
+typedef struct {
+    uint32_t label : 16;
+    uint32_t addr : 11;
+    uint32_t unused : 1;
+    uint32_t type : 4;
+} reloc_info_t;
+
+#define RELOC_TYPE_LABEL   0
+#define RELOC_TYPE_BRANCH  1
+
+/* This record means: there is a label at address
+ * insn_addr, with number label_num.
+ */
+#define RELOC_INFO_LABEL(label_num, insn_addr) (reloc_info_t) { \
+    .label = label_num, \
+    .addr = insn_addr, \
+    .unused = 0, \
+    .type = RELOC_TYPE_LABEL }
+
+/* This record means: there is a branch instruction at
+ * insn_addr, it needs to be changed to point to address
+ * of label label_num.
+ */
+#define RELOC_INFO_BRANCH(label_num, insn_addr) (reloc_info_t) { \
+    .label = label_num, \
+    .addr = insn_addr, \
+    .unused = 0, \
+    .type = RELOC_TYPE_BRANCH }
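The bitfield widths of `reloc_info_t` add up to exactly one 32-bit word (16 + 11 + 1 + 4), and 11 address bits cover 2048 words, i.e. the full 8 KB of RTC slow memory. The host-side sketch below copies the struct layout to check this; the `make_reloc` helper and `RELOC_MAX_ADDR` are hypothetical, and the 4-byte size relies on GCC-style bitfield packing, which the C standard leaves implementation-defined. It also shows that an out-of-range address silently wraps in the 11-bit field.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t label : 16;   /* label number from M_LABEL / M_BRANCH */
    uint32_t addr : 11;    /* instruction address in 32-bit words */
    uint32_t unused : 1;
    uint32_t type : 4;     /* RELOC_TYPE_LABEL or RELOC_TYPE_BRANCH */
} reloc_info_t;

#define RELOC_TYPE_LABEL   0
#define RELOC_TYPE_BRANCH  1

/* Largest word address representable in the 11-bit addr field. */
#define RELOC_MAX_ADDR ((1u << 11) - 1u)

/* Hypothetical constructor; values wider than a field are truncated. */
static reloc_info_t make_reloc(uint32_t label, uint32_t addr, uint32_t type)
{
    reloc_info_t r = { .label = label, .addr = addr, .unused = 0, .type = type };
    return r;
}
```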
+
+
+/* Processing branch and label macros involves four steps:
+ *
+ * 1. Iterate over program and count all instructions
+ *    with "macro" opcode. Allocate relocations array
+ *    with number of entries equal to number of macro
+ *    instructions.
+ *
+ * 2. Remove all fake instructions with "macro" opcode
+ *    and record their locations into relocations array.
+ *    Removal is done using two pointers. Instructions
+ *    are read from read_ptr, and written to write_ptr.
+ *    When a macro instruction is encountered,
+ *    its contents are recorded into the relocations
+ *    array, and read_ptr is advanced while write_ptr
+ *    stays in place.
+ *    When a real instruction is encountered, it is
+ *    read via read_ptr and written to write_ptr.
+ *    In the end, all macro instructions are removed,
+ *    and the size of the program (expressed in words)
+ *    is reduced by the number of macro instructions
+ *    that were present.
+ *
+ * 3. Sort relocations array by label number, and then
+ *    by type ("label" or "branch") if label numbers
+ *    match. This is done to simplify lookup on the next
+ *    step.
+ *
+ * 4. Iterate over entries of relocations table.
+ *    For each label number, label entry comes first
+ *    because the array was sorted at the previous step.
+ *    Label address is recorded, and all subsequent
+ *    "branch" entries which point to the same label number
+ *    are processed. For each branch entry, correct offset
+ *    or absolute address is calculated, depending on branch
+ *    type, and written into the appropriate field of
+ *    the instruction.
+ *
+ */
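Step 2 can be illustrated on a simplified program model. In the hypothetical sketch below, each item is either a real instruction word or a label/branch marker (`item_t`, `reloc_t` and `strip_markers` are illustrative names, not part of this component). Markers are recorded with the address of the real instruction that follows them, and only real instructions advance the write pointer and the current address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified program item: a real instruction word, or a
 * marker carrying a label number. Not the real ulp_insn_t. */
typedef enum { ITEM_INSN, ITEM_LABEL, ITEM_BRANCH } item_kind_t;

typedef struct {
    item_kind_t kind;
    uint32_t value;     /* instruction word, or label number for markers */
} item_t;

typedef struct {
    item_kind_t kind;   /* ITEM_LABEL or ITEM_BRANCH */
    uint32_t label;
    uint32_t addr;      /* word address of the next real instruction */
} reloc_t;

/* Copy real instructions from in[] to out[] with two pointers and record
 * every marker. Returns the number of real instructions written. */
static size_t strip_markers(const item_t* in, size_t n, uint32_t load_addr,
                            uint32_t* out, reloc_t* relocs, size_t* n_relocs)
{
    const item_t* read_ptr = in;
    const item_t* end = in + n;
    uint32_t* write_ptr = out;
    uint32_t cur_addr = load_addr;
    size_t r = 0;
    while (read_ptr < end) {
        if (read_ptr->kind != ITEM_INSN) {
            relocs[r].kind = read_ptr->kind;
            relocs[r].label = read_ptr->value;
            relocs[r].addr = cur_addr;   /* markers do not advance the address */
            ++r;
            ++read_ptr;
            assert(read_ptr != end && "program cannot end with a marker");
        } else {
            *write_ptr++ = read_ptr->value;
            ++read_ptr;
            ++cur_addr;
        }
    }
    *n_relocs = r;
    return (size_t) (write_ptr - out);
}
```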
+
+static esp_err_t do_single_reloc(ulp_insn_t* program, uint32_t load_addr,
+        reloc_info_t label_info, reloc_info_t branch_info)
+{
+    size_t insn_offset = branch_info.addr - load_addr;
+    ulp_insn_t* insn = &program[insn_offset];
+    // B and BX have the same layout of opcode/sub_opcode fields,
+    // and share the same opcode
+    assert(insn->b.opcode == OPCODE_BRANCH
+            && "branch macro was applied to a non-branch instruction");
+    switch (insn->b.sub_opcode) {
+        case SUB_OPCODE_B: {
+            int32_t offset = ((int32_t) label_info.addr) - ((int32_t) branch_info.addr);
+            uint32_t abs_offset = abs(offset);
+            uint32_t sign = (offset >= 0) ? 0 : 1;
+            if (abs_offset > 127) {
+                ESP_LOGW(TAG, "target out of range: branch from %x to %x",
+                        branch_info.addr, label_info.addr);
+                return ESP_ERR_ULP_BRANCH_OUT_OF_RANGE;
+            }
+            insn->b.offset = abs_offset;
+            insn->b.sign = sign;
+            break;
+        }
+        case SUB_OPCODE_BX: {
+            assert(insn->bx.reg == 0 &&
+                    "relocation applied to a jump with offset in register");
+            insn->bx.addr = label_info.addr;
+            break;
+        }
+        default:
+            assert(false && "unexpected sub-opcode");
+    }
+    return ESP_OK;
+}
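For relative (B-type) branches, the offset is stored as a magnitude plus a sign bit, and targets more than 127 words away are rejected. The hypothetical helper below isolates that encoding from the instruction struct so it can be checked on a host; the 127-word limit is taken from the check in `do_single_reloc` above, and the return value -1 stands in for `ESP_ERR_ULP_BRANCH_OUT_OF_RANGE`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper mirroring the SUB_OPCODE_B case: encode the distance
 * from branch_addr to label_addr (both in words) as magnitude and sign.
 * Returns 0 on success, -1 if the target is out of range. */
static int encode_rel_branch(uint32_t branch_addr, uint32_t label_addr,
                             uint32_t* magnitude, uint32_t* sign)
{
    int32_t offset = (int32_t) label_addr - (int32_t) branch_addr;
    uint32_t abs_offset = (uint32_t) abs(offset);
    if (abs_offset > 127) {
        return -1;
    }
    *magnitude = abs_offset;
    *sign = (offset >= 0) ? 0 : 1;   /* 1 means a backward branch */
    return 0;
}
```

A backward loop such as the one in the branch test (branch at word 9 back to label 1) encodes as magnitude 8 with the sign bit set.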
+
+esp_err_t ulp_process_macros_and_load(uint32_t load_addr, const ulp_insn_t* program, size_t* psize)
+{
+    const ulp_insn_t* read_ptr = program;
+    const ulp_insn_t* end = program + *psize;
+    size_t macro_count = 0;
+    // step 1: calculate number of macros
+    while (read_ptr < end) {
+        ulp_insn_t r_insn = *read_ptr;
+        if (r_insn.macro.opcode == OPCODE_MACRO) {
+            ++macro_count;
+        }
+        ++read_ptr;
+    }
+    size_t real_program_size = *psize - macro_count;
+    const size_t ulp_mem_end = CONFIG_ULP_COPROC_RESERVE_MEM / sizeof(ulp_insn_t);
+    if (load_addr > ulp_mem_end) {
+        ESP_LOGW(TAG, "invalid load address %x, max is %x",
+                (unsigned) load_addr, (unsigned) ulp_mem_end);
+        return ESP_ERR_ULP_INVALID_LOAD_ADDR;
+    }
+    if (real_program_size + load_addr > ulp_mem_end) {
+        ESP_LOGE(TAG, "program too big: %u words, max is %u words",
+                (unsigned) real_program_size, (unsigned) (ulp_mem_end - load_addr));
+        return ESP_ERR_ULP_SIZE_TOO_BIG;
+    }
+    // If no macros found, copy the program and return.
+    if (macro_count == 0) {
+        memcpy(((ulp_insn_t*) RTC_SLOW_MEM) + load_addr, program, *psize * sizeof(ulp_insn_t));
+        return ESP_OK;
+    }
+    reloc_info_t* reloc_info =
+            (reloc_info_t*) malloc(sizeof(reloc_info_t) * macro_count);
+    if (reloc_info == NULL) {
+        return ESP_ERR_NO_MEM;
+    }
+
+    // step 2: record macros into reloc_info array
+    // and remove them from the program
+    read_ptr = program;
+    ulp_insn_t* output_program = ((ulp_insn_t*) RTC_SLOW_MEM) + load_addr;
+    ulp_insn_t* write_ptr = output_program;
+    uint32_t cur_insn_addr = load_addr;
+    reloc_info_t* cur_reloc = reloc_info;
+    while (read_ptr < end) {
+        ulp_insn_t r_insn = *read_ptr;
+        if (r_insn.macro.opcode == OPCODE_MACRO) {
+            switch(r_insn.macro.sub_opcode) {
+                case SUB_OPCODE_MACRO_LABEL:
+                    *cur_reloc = RELOC_INFO_LABEL(r_insn.macro.label,
+                            cur_insn_addr);
+                    break;
+                case SUB_OPCODE_MACRO_BRANCH:
+                    *cur_reloc = RELOC_INFO_BRANCH(r_insn.macro.label,
+                            cur_insn_addr);
+                    break;
+                default:
+                    assert(0 && "invalid sub_opcode for macro insn");
+            }
+            ++read_ptr;
+            assert(read_ptr != end && "program can not end with macro insn");
+            ++cur_reloc;
+        } else {
+            // normal instruction (not a macro)
+            *write_ptr = *read_ptr;
+            ++read_ptr;
+            ++write_ptr;
+            ++cur_insn_addr;
+        }
+    }
+
+    // step 3: sort relocations array
+    int reloc_sort_func(const void* p_lhs, const void* p_rhs) {
+        const reloc_info_t lhs = *(const reloc_info_t*) p_lhs;
+        const reloc_info_t rhs = *(const reloc_info_t*) p_rhs;
+        if (lhs.label < rhs.label) {
+            return -1;
+        } else if (lhs.label > rhs.label) {
+            return 1;
+        }
+        // label numbers are equal
+        if (lhs.type < rhs.type) {
+            return -1;
+        } else if (lhs.type > rhs.type) {
+            return 1;
+        }
+
+        // both label number and type are equal
+        return 0;
+    }
+    qsort(reloc_info, macro_count, sizeof(reloc_info_t),
+            reloc_sort_func);
+
+    // step 4: walk relocations array and fix instructions
+    reloc_info_t* reloc_end = reloc_info + macro_count;
+    cur_reloc = reloc_info;
+    while(cur_reloc < reloc_end) {
+        reloc_info_t label_info = *cur_reloc;
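+        if (label_info.type != RELOC_TYPE_LABEL) {
+            // entries are sorted with labels first, so a branch here
+            // refers to a label number that was never defined
+            ESP_LOGE(TAG, "branch to an undefined label: %d",
+                    label_info.label);
+            free(reloc_info);
+            return ESP_ERR_ULP_UNDEFINED_LABEL;
+        }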
+        ++cur_reloc;
+        while (cur_reloc < reloc_end) {
+            if (cur_reloc->type == RELOC_TYPE_LABEL) {
+                if(cur_reloc->label == label_info.label) {
+                    ESP_LOGE(TAG, "duplicate label definition: %d",
+                            label_info.label);
+                    free(reloc_info);
+                    return ESP_ERR_ULP_DUPLICATE_LABEL;
+                }
+                break;
+            }
+            if (cur_reloc->label != label_info.label) {
+                ESP_LOGE(TAG, "branch to an undefined label: %d",
+                        cur_reloc->label);
+                free(reloc_info);
+                return ESP_ERR_ULP_UNDEFINED_LABEL;
+            }
+            esp_err_t rc = do_single_reloc(output_program, load_addr,
+                    label_info, *cur_reloc);
+            if (rc != ESP_OK) {
+                free(reloc_info);
+                return rc;
+            }
+            ++cur_reloc;
+        }
+    }
+    free(reloc_info);
+    *psize = real_program_size;
+    return ESP_OK;
+}
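Steps 3 and 4 (sort, then walk each label group) can be modelled on their own. The sketch below uses hypothetical names (`rel_t`, `rel_cmp`, `resolve_labels`) and records the absolute label address in a `target` field instead of patching instruction fields. It defines the comparator at file scope, since nested function definitions are a GCC extension rather than standard C. It treats a branch that sorts ahead of any label with its number as a reference to an undefined label, and a second label with the same number as a duplicate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

enum { R_LABEL = 0, R_BRANCH = 1 };   /* labels sort before branches */
enum { RESOLVE_OK = 0, RESOLVE_DUPLICATE = -1, RESOLVE_UNDEFINED = -2 };

typedef struct {
    uint32_t label;
    uint32_t type;     /* R_LABEL or R_BRANCH */
    uint32_t addr;     /* word address of the label or branch */
    uint32_t target;   /* filled in for branches */
} rel_t;

/* Step 3 comparator: by label number, then labels before branches. */
static int rel_cmp(const void* a, const void* b)
{
    const rel_t* l = a;
    const rel_t* r = b;
    if (l->label != r->label) {
        return l->label < r->label ? -1 : 1;
    }
    if (l->type != r->type) {
        return l->type < r->type ? -1 : 1;
    }
    return 0;
}

/* Step 4: after sorting, every group must start with its label entry. */
static int resolve_labels(rel_t* rels, size_t n)
{
    qsort(rels, n, sizeof(rel_t), rel_cmp);
    rel_t* cur = rels;
    rel_t* end = rels + n;
    while (cur < end) {
        if (cur->type != R_LABEL) {
            return RESOLVE_UNDEFINED;      /* no label with this number */
        }
        const rel_t* label = cur++;
        while (cur < end && cur->label == label->label) {
            if (cur->type == R_LABEL) {
                return RESOLVE_DUPLICATE;  /* same label defined twice */
            }
            cur->target = label->addr;
            ++cur;
        }
    }
    return RESOLVE_OK;
}
```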
+
+esp_err_t ulp_run(uint32_t entry_point)
+{
+    SET_PERI_REG_MASK(SARADC_SAR_START_FORCE_REG, SARADC_ULP_CP_FORCE_START_TOP_M);
+    SET_PERI_REG_BITS(SARADC_SAR_START_FORCE_REG, SARADC_PC_INIT_V, entry_point, SARADC_PC_INIT_S);
+    SET_PERI_REG_MASK(SARADC_SAR_START_FORCE_REG, SARADC_ULP_CP_START_TOP_M);
+    return ESP_OK;
+}
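`ulp_run` writes the entry point into the PC_INIT field of `SARADC_SAR_START_FORCE_REG` before setting the start bit. The sketch below reproduces that kind of field update on an ordinary variable, assuming `SET_PERI_REG_BITS` performs the usual clear-then-set read-modify-write. The helper name `set_field` and the 11-bit mask and shift used in the checks are illustrative assumptions, not the real `SARADC_PC_INIT_V`/`SARADC_PC_INIT_S` values.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical read-modify-write of a bit field: clear mask << shift, then
 * OR in (value & mask) << shift. Bits outside the field are preserved. */
static uint32_t set_field(uint32_t reg, uint32_t mask, uint32_t value, uint32_t shift)
{
    reg &= ~(mask << shift);
    reg |= (value & mask) << shift;
    return reg;
}
```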

+ 2 - 1
docs/Doxyfile

@@ -26,7 +26,8 @@ INPUT = ../components/esp32/include/esp_wifi.h \
 	../components/esp32/include/esp_int_wdt.h \
 	../components/esp32/include/esp_task_wdt.h \
 	../components/app_update/include/esp_ota_ops.h \
-	../components/ethernet/include/esp_eth.h
+	../components/ethernet/include/esp_eth.h \
+	../components/ulp/include/esp32/ulp.h
 
 ## Get warnings for functions that have no documentation for their parameters or return value 
 ##

+ 1 - 0
docs/api/ulp.rst

@@ -0,0 +1 @@
+.. include:: ../../components/ulp/README.rst

+ 1 - 0
docs/index.rst

@@ -36,6 +36,7 @@ Contents:
    build_system
    openocd
    Secure Boot <security/secure-boot>
+   ULP coprocessor <api/ulp.rst>
 
 .. API Reference
    ..

+ 3 - 2
tools/unit-test-app/sdkconfig

@@ -93,8 +93,8 @@ CONFIG_SYSTEM_EVENT_QUEUE_SIZE=32
 CONFIG_SYSTEM_EVENT_TASK_STACK_SIZE=2048
 CONFIG_MAIN_TASK_STACK_SIZE=4096
 CONFIG_NEWLIB_STDOUT_ADDCR=y
-# CONFIG_ULP_COPROC_ENABLED is not set
-CONFIG_ULP_COPROC_RESERVE_MEM=0
+CONFIG_ULP_COPROC_ENABLED=y
+CONFIG_ULP_COPROC_RESERVE_MEM=512
 # CONFIG_ESP32_PANIC_PRINT_HALT is not set
 CONFIG_ESP32_PANIC_PRINT_REBOOT=y
 # CONFIG_ESP32_PANIC_SILENT_REBOOT is not set
@@ -112,6 +112,7 @@ CONFIG_ESP32_RTC_CLOCK_SOURCE_INTERNAL_RC=y
 CONFIG_ESP32_PHY_AUTO_INIT=y
 # CONFIG_ESP32_PHY_INIT_DATA_IN_PARTITION is not set
 CONFIG_ESP32_PHY_MAX_TX_POWER=20
+# CONFIG_ETHERNET is not set
 
 #
 # FreeRTOS