Merge branch 'main' into gitbook

Wenyong Huang, 2 years ago
parent
commit
421ffc793c
100 changed files changed, with 3879 additions and 881 deletions
  1. 6 5
      .devcontainer/Dockerfile
  2. 49 15
      .github/workflows/build_wamr_lldb.yml
  3. 2 2
      .github/workflows/build_wamr_vscode_ext.yml
  4. 123 27
      .github/workflows/compilation_on_android_ubuntu.yml
  5. 23 28
      .github/workflows/nightly_run.yml
  6. 1 0
      .gitignore
  7. 1 0
      README.md
  8. 4 0
      build-scripts/config_common.cmake
  9. 2 2
      build-scripts/runtime_lib.cmake
  10. 7 0
      core/config.h
  11. 5 1
      core/deps/install_tensorflow.sh
  12. 15 0
      core/iwasm/aot/aot_loader.c
  13. 8 21
      core/iwasm/aot/aot_runtime.c
  14. 0 3
      core/iwasm/aot/aot_runtime.h
  15. 44 11
      core/iwasm/aot/arch/aot_reloc_riscv.c
  16. 26 4
      core/iwasm/aot/arch/aot_reloc_x86_32.c
  17. 19 0
      core/iwasm/compilation/aot.h
  18. 95 2
      core/iwasm/compilation/aot_compiler.c
  19. 337 1
      core/iwasm/compilation/aot_emit_aot_file.c
  20. 33 156
      core/iwasm/compilation/aot_emit_function.c
  21. 1 0
      core/iwasm/compilation/aot_emit_function.h
  22. 686 87
      core/iwasm/compilation/aot_llvm.c
  23. 21 5
      core/iwasm/compilation/aot_llvm.h
  24. 12 2
      core/iwasm/compilation/aot_llvm_extra.cpp
  25. 44 0
      core/iwasm/compilation/aot_llvm_extra2.cpp
  26. 17 0
      core/iwasm/compilation/aot_llvm_extra2.h
  27. 31 5
      core/iwasm/compilation/aot_orc_extra.cpp
  28. 5 0
      core/iwasm/compilation/aot_orc_extra.h
  29. 145 0
      core/iwasm/compilation/aot_orc_extra2.cpp
  30. 14 11
      core/iwasm/compilation/debug/dwarf_extractor.cpp
  31. 10 8
      core/iwasm/compilation/debug/dwarf_extractor.h
  32. 25 20
      core/iwasm/interpreter/wasm_loader.c
  33. 8 19
      core/iwasm/interpreter/wasm_runtime.c
  34. 0 4
      core/iwasm/interpreter/wasm_runtime.h
  35. 28 0
      core/iwasm/libraries/libc-uvwasi/FindLIBUV.cmake
  36. 25 0
      core/iwasm/libraries/libc-uvwasi/FindUVWASI.cmake
  37. 42 25
      core/iwasm/libraries/libc-uvwasi/libc_uvwasi.cmake
  38. 2 0
      core/iwasm/libraries/libc-wasi/libc_wasi_wrapper.c
  39. 8 4
      core/iwasm/libraries/wasi-nn/README.md
  40. 23 26
      core/iwasm/libraries/wasi-nn/cmake/Findtensorflow_lite.cmake
  41. 46 0
      core/iwasm/libraries/wasi-nn/cmake/iwasm_helper.cmake
  42. 22 0
      core/iwasm/libraries/wasi-nn/cmake/wasi_nn.cmake
  43. 58 0
      core/iwasm/libraries/wasi-nn/external/CMakeLists.txt
  44. 13 0
      core/iwasm/libraries/wasi-nn/external/README.md
  45. 0 0
      core/iwasm/libraries/wasi-nn/include/wasi_nn.h
  46. 3 0
      core/iwasm/libraries/wasi-nn/include/wasi_nn_types.h
  47. 133 63
      core/iwasm/libraries/wasi-nn/src/wasi_nn.c
  48. 3 8
      core/iwasm/libraries/wasi-nn/src/wasi_nn_private.h
  49. 1 2
      core/iwasm/libraries/wasi-nn/src/wasi_nn_tensorflowlite.cpp
  50. 0 173
      core/iwasm/libraries/wasi-nn/test/CMakeLists.txt
  51. 5 2
      core/iwasm/libraries/wasi-nn/test/Dockerfile.compile
  52. 14 5
      core/iwasm/libraries/wasi-nn/test/Dockerfile.cpu
  53. 21 11
      core/iwasm/libraries/wasi-nn/test/Dockerfile.nvidia-gpu
  54. 52 38
      core/iwasm/libraries/wasi-nn/test/Dockerfile.vx-delegate
  55. 1 1
      core/iwasm/libraries/wasi-nn/test/build.sh
  56. 8 11
      core/iwasm/libraries/wasi-nn/test/test_tensorflow.c
  57. 0 22
      core/iwasm/libraries/wasi-nn/wasi_nn.cmake
  58. 1 0
      core/shared/mem-alloc/ems/ems_alloc.c
  59. 4 1
      core/shared/mem-alloc/ems/ems_gc_internal.h
  60. 1 0
      core/shared/platform/android/platform_internal.h
  61. 2 0
      core/shared/platform/linux/platform_internal.h
  62. 8 1
      doc/build_wamr.md
  63. 74 0
      doc/perf_tune.md
  64. 1 1
      product-mini/platforms/darwin/CMakeLists.txt
  65. 1 1
      product-mini/platforms/freebsd/CMakeLists.txt
  66. 1 1
      product-mini/platforms/ios/CMakeLists.txt
  67. 17 0
      product-mini/platforms/linux-sgx/CMakeLists.txt
  68. 81 1
      product-mini/platforms/linux-sgx/enclave-sample/App/App.cpp
  69. 42 0
      product-mini/platforms/linux-sgx/enclave-sample/Enclave/Enclave.cpp
  70. 3 2
      product-mini/platforms/linux-sgx/enclave-sample/Makefile
  71. 3 3
      product-mini/platforms/linux/CMakeLists.txt
  72. 22 0
      samples/mem_allocator/CMakeLists.txt
  73. 58 0
      samples/mem_allocator/main.c
  74. 1 1
      samples/wasm-c-api/CMakeLists.txt
  75. 6 1
      test-tools/wamr-ide/VSCode-Extension/.gitignore
  76. 1 0
      test-tools/wamr-ide/VSCode-Extension/.npmrc
  77. 11 0
      test-tools/wamr-ide/VSCode-Extension/.vscode/launch.json
  78. 8 0
      test-tools/wamr-ide/VSCode-Extension/.vscodeignore
  79. 747 0
      test-tools/wamr-ide/VSCode-Extension/formatters/rust.py
  80. 10 5
      test-tools/wamr-ide/VSCode-Extension/package.json
  81. 2 0
      test-tools/wamr-ide/VSCode-Extension/resource/test/build.sh
  82. 35 0
      test-tools/wamr-ide/VSCode-Extension/resource/test/test.rs
  83. 29 5
      test-tools/wamr-ide/VSCode-Extension/src/debugConfigurationProvider.ts
  84. 6 6
      test-tools/wamr-ide/VSCode-Extension/src/extension.ts
  85. 33 0
      test-tools/wamr-ide/VSCode-Extension/src/test/runTest.ts
  86. 183 0
      test-tools/wamr-ide/VSCode-Extension/src/test/suite/extension.test.ts
  87. 42 0
      test-tools/wamr-ide/VSCode-Extension/src/test/suite/index.ts
  88. 43 0
      test-tools/wamr-ide/VSCode-Extension/src/test/suite/utils.ts
  89. 2 2
      test-tools/wamr-ide/VSCode-Extension/src/utilities/dockerUtilities.ts
  90. 15 10
      test-tools/wamr-ide/VSCode-Extension/src/utilities/lldbUtilities.ts
  91. 4 0
      tests/benchmarks/coremark/README.md
  92. 7 2
      tests/benchmarks/coremark/test_pgo.sh
  93. 7 2
      tests/benchmarks/dhrystone/test_pgo.sh
  94. 4 0
      tests/benchmarks/jetstream/README.md
  95. 7 2
      tests/benchmarks/jetstream/test_pgo.sh
  96. 6 0
      tests/benchmarks/libsodium/README.md
  97. 7 2
      tests/benchmarks/libsodium/test_pgo.sh
  98. 6 0
      tests/benchmarks/polybench/README.md
  99. 7 2
      tests/benchmarks/polybench/test_pgo.sh
  100. 4 0
      tests/benchmarks/sightglass/README.md

+ 6 - 5
.devcontainer/Dockerfile

@@ -81,7 +81,7 @@ RUN mkdir /opt/bazelisk \
 #
 # install clang+llvm
 ARG LLVM_VER=14
-RUN apt-get purge -y clang-10 llvm-10 && apt autoremove -y
+RUN apt-get purge -y clang-10 llvm-10 && apt-get autoremove -y
 WORKDIR /etc/apt/apt.conf.d
 RUN touch 99verfiy-peer.conf \
   && echo "Acquire { https::Verify-Peer false }" > 99verfiy-peer.conf
@@ -110,14 +110,15 @@ RUN python3 -m pip install --no-cache-dir --upgrade pip \
 
 #
 # Install github-cli. It doens't work as a feature of devcontainer.json
-RUN cd /tmp \
-  && wget https://github.com/cli/cli/releases/download/v2.20.2/gh_2.20.2_linux_amd64.deb \
+WORKDIR /tmp
+RUN wget -q https://github.com/cli/cli/releases/download/v2.20.2/gh_2.20.2_linux_amd64.deb \
   && dpkg -i gh_2.20.2_linux_amd64.deb
 
 #
 # Install NodeJS
-RUN curl -fsSL https://deb.nodesource.com/setup_19.x | bash -
-RUN apt-get install -y nodejs
+RUN wget -qO- https://deb.nodesource.com/setup_19.x | bash -
+# hadolint ignore=DL3008
+RUN apt-get install -y nodejs --no-install-recommends
 
 # set path
 ENV PATH="/opt/bazelisk:/usr/lib/llvm-${LLVM_VER}/bin:${PATH}"

+ 49 - 15
.github/workflows/build_wamr_lldb.yml

@@ -36,6 +36,11 @@ jobs:
     needs: try_reuse
     if: needs.try_reuse.outputs.result != 'hit'
     runs-on: ${{ inputs.runner }}
+
+    env:
+      PYTHON_VERSION: '3.10'
+      PYTHON_UBUNTU_STANDALONE_BUILD: https://github.com/indygreg/python-build-standalone/releases/download/20230507/cpython-3.10.11+20230507-x86_64-unknown-linux-gnu-install_only.tar.gz
+      PYTHON_MACOS_STANDALONE_BUILD: https://github.com/indygreg/python-build-standalone/releases/download/20230507/cpython-3.10.11+20230507-x86_64-apple-darwin-install_only.tar.gz
     steps:
       - uses: actions/checkout@v3
 
@@ -63,10 +68,12 @@ jobs:
       - name: install utils macos
         if: steps.lldb_build_cache.outputs.cache-hit != 'true' && contains(inputs.runner, 'macos')
         run: |
-          brew install swig cmake ninja libedit
+          brew remove swig
+          brew install swig@3 cmake ninja libedit
+          brew link --overwrite swig@3
           sudo rm -rf /Library/Developer/CommandLineTools
 
-      - name: intsall utils ubuntu
+      - name: install utils ubuntu
         if: steps.lldb_build_cache.outputs.cache-hit != 'true' && contains(inputs.runner, 'ubuntu')
         run: sudo apt update && sudo apt-get install -y lld ninja-build
 
@@ -88,6 +95,20 @@ jobs:
           git apply ../../../build-scripts/lldb-wasm.patch
         working-directory: core/deps/llvm-project
 
+      - name: get stand-alone python ubuntu
+        if: steps.lldb_build_cache.outputs.cache-hit != 'true' && contains(inputs.runner, 'ubuntu')
+        run: |
+          wget ${{ env.PYTHON_UBUNTU_STANDALONE_BUILD }} -O python.tar.gz
+          tar -xvf python.tar.gz
+        working-directory: core/deps
+
+      - name: get stand-alone python macos
+        if: steps.lldb_build_cache.outputs.cache-hit != 'true' && contains(inputs.runner, 'macos')
+        run: |
+          wget ${{ env.PYTHON_MACOS_STANDALONE_BUILD }} -O python.tar.gz
+          tar -xvf python.tar.gz
+        working-directory: core/deps
+
       - name: build lldb ubuntu
         if: steps.lldb_build_cache.outputs.cache-hit != 'true' && contains(inputs.runner, 'ubuntu')
         run: |
@@ -102,17 +123,21 @@ jobs:
             -DLLVM_TARGETS_TO_BUILD:STRING="X86;WebAssembly" \
             -DLLVM_BUILD_BENCHMARKS:BOOL=OFF \
             -DLLVM_BUILD_DOCS:BOOL=OFF \
-            -DLLVM_BUILD_EXAMPLES:BOOL=OFF  \
+            -DLLVM_BUILD_EXAMPLES:BOOL=OFF \
             -DLLVM_BUILD_LLVM_DYLIB:BOOL=OFF \
-            -DLLVM_BUILD_TESTS:BOOL=OFF  \
-            -DLLVM_INCLUDE_BENCHMARKS:BOOL=OFF  \
+            -DLLVM_BUILD_TESTS:BOOL=OFF \
+            -DLLVM_INCLUDE_BENCHMARKS:BOOL=OFF \
             -DLLVM_INCLUDE_DOCS:BOOL=OFF \
             -DLLVM_INCLUDE_EXAMPLES:BOOL=OFF \
             -DLLVM_INCLUDE_TESTS:BOOL=OFF \
             -DLLVM_ENABLE_BINDINGS:BOOL=OFF \
             -DLLVM_ENABLE_LIBXML2:BOOL=ON \
-            -DLLDB_ENABLE_PYTHON:BOOL=OFF \
-            -DLLVM_ENABLE_LLD:BOOL=ON
+            -DLLVM_ENABLE_LLD:BOOL=ON \
+            -DLLDB_ENABLE_PYTHON:BOOL=ON \
+            -DLLDB_EMBED_PYTHON_HOME=ON \
+            -DLLDB_PYTHON_HOME=.. \
+            -DLLDB_PYTHON_RELATIVE_PATH=lib/lldb-python \
+            -DPython3_EXECUTABLE="$(pwd)/../python/bin/python${{ env.PYTHON_VERSION }}"
           cmake --build build --target lldb install --parallel $(nproc)
         working-directory: core/deps/llvm-project
 
@@ -130,20 +155,21 @@ jobs:
             -DLLVM_TARGETS_TO_BUILD:STRING="X86;WebAssembly" \
             -DLLVM_BUILD_BENCHMARKS:BOOL=OFF \
             -DLLVM_BUILD_DOCS:BOOL=OFF \
-            -DLLVM_BUILD_EXAMPLES:BOOL=OFF  \
+            -DLLVM_BUILD_EXAMPLES:BOOL=OFF \
             -DLLVM_BUILD_LLVM_DYLIB:BOOL=OFF \
-            -DLLVM_BUILD_TESTS:BOOL=OFF  \
-            -DLLVM_INCLUDE_BENCHMARKS:BOOL=OFF  \
+            -DLLVM_BUILD_TESTS:BOOL=OFF \
+            -DLLVM_INCLUDE_BENCHMARKS:BOOL=OFF \
             -DLLVM_INCLUDE_DOCS:BOOL=OFF \
             -DLLVM_INCLUDE_EXAMPLES:BOOL=OFF \
             -DLLVM_INCLUDE_TESTS:BOOL=OFF \
-            -DLLVM_BUILD_BENCHMARKS:BOOL=OFF \
-            -DLLVM_BUILD_DOCS:BOOL=OFF \
-            -DLLVM_BUILD_LLVM_DYLIB:BOOL=OFF \
             -DLLVM_ENABLE_BINDINGS:BOOL=OFF \
             -DLLVM_ENABLE_LIBXML2:BOOL=ON \
-            -DLLDB_ENABLE_PYTHON:BOOL=OFF \
-            -DLLDB_BUILD_FRAMEWORK:BOOL=OFF
+            -DLLDB_BUILD_FRAMEWORK:BOOL=OFF \
+            -DLLDB_ENABLE_PYTHON:BOOL=ON \
+            -DLLDB_EMBED_PYTHON_HOME=ON \
+            -DLLDB_PYTHON_HOME=.. \
+            -DLLDB_PYTHON_RELATIVE_PATH=lib/lldb-python \
+            -DPython3_EXECUTABLE="$(pwd)/../python/bin/python${{ env.PYTHON_VERSION }}"
           cmake --build build --target lldb install --parallel $(nproc)
         working-directory: core/deps/llvm-project
 
@@ -162,12 +188,20 @@ jobs:
         run: |
           cp build/lib/liblldb*.so wamr-lldb/lib
           cp build/lib/liblldb*.so.* wamr-lldb/lib
+          cp -R build/lib/lldb-python wamr-lldb/lib
+          cp -R ../python/lib/python* wamr-lldb/lib
+          cp ../python/lib/libpython${{ env.PYTHON_VERSION }}.so.1.0 wamr-lldb/lib
         working-directory: core/deps/llvm-project
 
       - name: pack macos specific libraries
         if: steps.lldb_build_cache.outputs.cache-hit != 'true' && contains(inputs.runner, 'macos')
         run: |
           cp build/lib/liblldb*.dylib wamr-lldb/lib
+          cp -R build/lib/lldb-python wamr-lldb/lib
+          cp -R ../python/lib/python* wamr-lldb/lib
+          cp ../python/lib/libpython*.dylib wamr-lldb/lib
+          install_name_tool -change /install/lib/libpython${{ env.PYTHON_VERSION }}.dylib @rpath/libpython${{ env.PYTHON_VERSION }}.dylib wamr-lldb/lib/liblldb.*.dylib
+        # Patch path of python library -> https://github.com/indygreg/python-build-standalone/blob/85923ca3911784e6978b85d56e06e9ae75cb2dc4/docs/quirks.rst?plain=1#L412-L446  
         working-directory: core/deps/llvm-project
 
       - name: compress the binary

+ 2 - 2
.github/workflows/build_wamr_vscode_ext.yml

@@ -20,10 +20,10 @@ jobs:
     steps:
       - uses: actions/checkout@v3
 
-      - name: Use Node.js 14.x
+      - name: Use Node.js 16.x
         uses: actions/setup-node@v3
         with:
-          node-version: 14.x
+          node-version: 16.x
 
       - name: set vscode extension to correct version
         run: |

+ 123 - 27
.github/workflows/compilation_on_android_ubuntu.yml

@@ -1,7 +1,7 @@
 # Copyright (C) 2019 Intel Corporation.  All rights reserved.
 # SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
 
-name: compilation on android, ubuntu-20.04, ubuntu-22.04
+name: compilation on android, ubuntu-22.04
 
 on:
   # will be triggered on PR events
@@ -21,6 +21,7 @@ on:
       - "tests/wamr-test-suites/**"
       - "wamr-compiler/**"
       - "wamr-sdk/**"
+      - "test-tools/wamr-ide/**"
   # will be triggered on push events
   push:
     branches:
@@ -38,6 +39,7 @@ on:
       - "tests/wamr-test-suites/**"
       - "wamr-compiler/**"
       - "wamr-sdk/**"
+      - "test-tools/wamr-ide/**"
   # allow to be triggered manually
   workflow_dispatch:
 
@@ -65,12 +67,6 @@ env:
   WASI_TEST_OPTIONS: "-s wasi_certification -w"
 
 jobs:
-  build_llvm_libraries_on_ubuntu_2004:
-    uses: ./.github/workflows/build_llvm_libraries.yml
-    with:
-      os: "ubuntu-20.04"
-      arch: "X86"
-
   build_llvm_libraries_on_ubuntu_2204:
     uses: ./.github/workflows/build_llvm_libraries.yml
     with:
@@ -79,13 +75,11 @@ jobs:
   
   build_wamrc:
     needs:
-      [build_llvm_libraries_on_ubuntu_2004, build_llvm_libraries_on_ubuntu_2204]
+      [build_llvm_libraries_on_ubuntu_2204]
     runs-on: ${{ matrix.os }}
     strategy:
       matrix:
         include:
-          - os: ubuntu-20.04
-            llvm_cache_key: ${{ needs.build_llvm_libraries_on_ubuntu_2004.outputs.cache_key }}
           - os: ubuntu-22.04
             llvm_cache_key: ${{ needs.build_llvm_libraries_on_ubuntu_2204.outputs.cache_key }}
     steps:
@@ -119,7 +113,7 @@ jobs:
 
   build_iwasm:
     needs:
-      [build_llvm_libraries_on_ubuntu_2004, build_llvm_libraries_on_ubuntu_2204]
+      [build_llvm_libraries_on_ubuntu_2204]
     runs-on: ${{ matrix.os }}
     strategy:
       matrix:
@@ -151,7 +145,7 @@ jobs:
             "-DWAMR_BUILD_TAIL_CALL=1",
             "-DWAMR_DISABLE_HW_BOUND_CHECK=1",
           ]
-        os: [ubuntu-20.04, ubuntu-22.04]
+        os: [ubuntu-22.04]
         platform: [android, linux]
         exclude:
           # uncompatiable feature and platform
@@ -215,12 +209,7 @@ jobs:
             platform: android
           - make_options_run_mode: $MULTI_TIER_JIT_BUILD_OPTIONS
             platform: android
-          # only test andorid on ubuntu latest
-          - os: ubuntu-20.04
-            platform: android
         include:
-          - os: ubuntu-20.04
-            llvm_cache_key: ${{ needs.build_llvm_libraries_on_ubuntu_2004.outputs.cache_key }}
           - os: ubuntu-22.04
             llvm_cache_key: ${{ needs.build_llvm_libraries_on_ubuntu_2204.outputs.cache_key }}
     steps:
@@ -256,7 +245,6 @@ jobs:
     needs:
       [
         build_iwasm,
-        build_llvm_libraries_on_ubuntu_2004,
         build_llvm_libraries_on_ubuntu_2204,
         build_wamrc,
       ]
@@ -274,7 +262,7 @@ jobs:
             $LLVM_EAGER_JIT_BUILD_OPTIONS,
             $MULTI_TIER_JIT_BUILD_OPTIONS,
           ]
-        os: [ubuntu-20.04, ubuntu-22.04]
+        os: [ubuntu-22.04]
         wasi_sdk_release:
           [
             "https://github.com/WebAssembly/wasi-sdk/releases/download/wasi-sdk-20/wasi-sdk-20.0-linux.tar.gz",
@@ -284,8 +272,6 @@ jobs:
             "https://github.com/WebAssembly/wabt/releases/download/1.0.31/wabt-1.0.31-ubuntu.tar.gz",
           ]
         include:
-          - os: ubuntu-20.04
-            llvm_cache_key: ${{ needs.build_llvm_libraries_on_ubuntu_2004.outputs.cache_key }}
           - os: ubuntu-22.04
             llvm_cache_key: ${{ needs.build_llvm_libraries_on_ubuntu_2204.outputs.cache_key }}
 
@@ -338,7 +324,7 @@ jobs:
     runs-on: ${{ matrix.os }}
     strategy:
       matrix:
-        os: [ubuntu-20.04, ubuntu-22.04]
+        os: [ubuntu-22.04]
         wasi_sdk_release:
           [
             "https://github.com/WebAssembly/wasi-sdk/releases/download/wasi-sdk-20/wasi-sdk-20.0-linux.tar.gz",
@@ -430,7 +416,6 @@ jobs:
     needs:
       [
         build_iwasm,
-        build_llvm_libraries_on_ubuntu_2004,
         build_llvm_libraries_on_ubuntu_2204,
         build_wamrc,
       ]
@@ -438,7 +423,7 @@ jobs:
     strategy:
       fail-fast: false
       matrix:
-        os: [ubuntu-20.04, ubuntu-22.04]
+        os: [ubuntu-22.04]
         running_mode:
           [
             "classic-interp",
@@ -461,9 +446,6 @@ jobs:
             "https://github.com/WebAssembly/wasi-sdk/releases/download/wasi-sdk-20/wasi-sdk-20.0-linux.tar.gz",
           ]
         include:
-          - os: ubuntu-20.04
-            llvm_cache_key: ${{ needs.build_llvm_libraries_on_ubuntu_2004.outputs.cache_key }}
-            ubuntu_version: "20.04"
           - os: ubuntu-22.04
             llvm_cache_key: ${{ needs.build_llvm_libraries_on_ubuntu_2204.outputs.cache_key }}
             ubuntu_version: "22.04"
@@ -565,3 +547,117 @@ jobs:
         if: env.TEST_ON_X86_32 == 'true'
         run: ./test_wamr.sh ${{ env.X86_32_TARGET_TEST_OPTIONS }} ${{ matrix.test_option }} -t ${{ matrix.running_mode }}
         working-directory: ./tests/wamr-test-suites
+
+  test-wamr-ide:
+    needs:
+      [
+        build_iwasm
+      ]
+    runs-on: ubuntu-22.04
+    env:
+      PYTHON_VERSION: '3.10'
+      PYTHON_UBUNTU_STANDALONE_BUILD: https://github.com/indygreg/python-build-standalone/releases/download/20230507/cpython-3.10.11+20230507-x86_64-unknown-linux-gnu-install_only.tar.gz
+
+    steps:
+      - name: checkout
+        uses: actions/checkout@v3
+
+      - name: install dependencies
+        run: |
+          rustup target add wasm32-wasi
+          sudo apt update && sudo apt-get install -y lld ninja-build
+          npm install
+        working-directory: test-tools/wamr-ide/VSCode-Extension
+
+      - name: build iwasm with source debugging feature
+        run: |
+          mkdir build
+          cd build
+          cmake .. -DWAMR_BUILD_DEBUG_INTERP=1
+          make
+        working-directory: product-mini/platforms/linux
+
+      - name: Cache LLDB
+        id: cache-lldb
+        uses: actions/cache@v3
+        env:
+          cache-name: cache-lldb-vscode
+        with:
+          path: test-tools/wamr-ide/VSCode-Extension/resource/debug/linux
+          key: ${{ env.cache-name }}-${{ hashFiles('build-scripts/lldb-wasm.patch') }}-${{ env.PYTHON_UBUNTU_STANDALONE_BUILD }}
+
+      - if: ${{ steps.cache-lldb.outputs.cache-hit != 'true' }}
+        name: get stand-alone python ubuntu
+        run: |
+          wget ${{ env.PYTHON_UBUNTU_STANDALONE_BUILD }} -O python.tar.gz
+          tar -xvf python.tar.gz
+        working-directory: core/deps
+
+      - if: ${{ steps.cache-lldb.outputs.cache-hit != 'true' }}
+        name: download llvm
+        run: |
+          wget https://github.com/llvm/llvm-project/archive/1f27fe6128769f00197925c3b8f6abb9d0e5cd2e.zip
+          unzip -q 1f27fe6128769f00197925c3b8f6abb9d0e5cd2e.zip
+          mv llvm-project-1f27fe6128769f00197925c3b8f6abb9d0e5cd2e llvm-project
+        working-directory: core/deps
+
+      - if: ${{ steps.cache-lldb.outputs.cache-hit != 'true' }}
+        name: apply wamr patch
+        run: |
+          git init
+          git config user.email "action@github.com"
+          git config user.name "github action"
+          git apply ../../../build-scripts/lldb-wasm.patch
+        working-directory: core/deps/llvm-project
+
+      - if: ${{ steps.cache-lldb.outputs.cache-hit != 'true' }}
+        name: build lldb ubuntu
+        run: |
+          echo "start to build lldb..."
+          mkdir -p wamr-lldb
+          cmake -S ./llvm -B build \
+            -G Ninja \
+            -DCMAKE_INSTALL_PREFIX=../wamr-lldb \
+            -DCMAKE_BUILD_TYPE:STRING="Release" \
+            -DCMAKE_EXPORT_COMPILE_COMMANDS=ON \
+            -DLLVM_ENABLE_PROJECTS="clang;lldb" \
+            -DLLVM_TARGETS_TO_BUILD:STRING="X86;WebAssembly" \
+            -DLLVM_BUILD_BENCHMARKS:BOOL=OFF \
+            -DLLVM_BUILD_DOCS:BOOL=OFF \
+            -DLLVM_BUILD_EXAMPLES:BOOL=OFF \
+            -DLLVM_BUILD_LLVM_DYLIB:BOOL=OFF \
+            -DLLVM_BUILD_TESTS:BOOL=OFF \
+            -DLLVM_INCLUDE_BENCHMARKS:BOOL=OFF \
+            -DLLVM_INCLUDE_DOCS:BOOL=OFF \
+            -DLLVM_INCLUDE_EXAMPLES:BOOL=OFF \
+            -DLLVM_INCLUDE_TESTS:BOOL=OFF \
+            -DLLVM_ENABLE_BINDINGS:BOOL=OFF \
+            -DLLVM_ENABLE_LIBXML2:BOOL=ON \
+            -DLLVM_ENABLE_LLD:BOOL=ON \
+            -DLLDB_ENABLE_PYTHON:BOOL=ON \
+            -DLLDB_EMBED_PYTHON_HOME=ON \
+            -DLLDB_PYTHON_HOME=.. \
+            -DLLDB_PYTHON_RELATIVE_PATH=lib/lldb-python \
+            -DPython3_EXECUTABLE="$(pwd)/../python/bin/python${{ env.PYTHON_VERSION }}"
+          cmake --build build --target lldb install --parallel $(nproc)
+        working-directory: core/deps/llvm-project
+
+      - if: ${{ steps.cache-lldb.outputs.cache-hit != 'true' }}
+        name: copy lldb to extension folder
+        run: |
+          mkdir -p bin
+          mkdir -p lib
+          cp ../../../../../../core/deps/llvm-project/lldb/tools/lldb-vscode/package.json ./
+          cp -r ../../../../../../core/deps/llvm-project/lldb/tools/lldb-vscode/syntaxes/ ./
+          cp ../../../../../../core/deps/llvm-project/build/bin/lldb* bin
+          cp ../../../../../../core/deps/llvm-project/build/lib/liblldb*.so lib
+          cp ../../../../../../core/deps/llvm-project/build/lib/liblldb*.so.* lib
+          cp -R ../../../../../../core/deps/llvm-project/build/lib/lldb-python lib
+          cp -R ../../../../../../core/deps/python/lib/python* lib
+          cp ../../../../../../core/deps/python/lib/libpython${{ env.PYTHON_VERSION }}.so.1.0 lib
+        working-directory: test-tools/wamr-ide/VSCode-Extension/resource/debug/linux
+
+      - name: run tests
+        timeout-minutes: 5
+        run: xvfb-run npm run test
+        working-directory: test-tools/wamr-ide/VSCode-Extension

+ 23 - 28
.github/workflows/nightly_run.yml

@@ -4,6 +4,14 @@
 name: nightly_run
 
 on:
+  pull_request:
+    types:
+      - opened
+      - synchronize
+    #running nightly pipeline if you're changing it 
+    paths:
+      - ".github/workflows/nightly_run.yml"
+      
   # midnight UTC
   schedule:
     - cron: "0 0 * * *"
@@ -39,24 +47,16 @@ jobs:
     with:
       os: "ubuntu-20.04"
       arch: "X86"
-
-  build_llvm_libraries_on_ubuntu_2204:
-    uses: ./.github/workflows/build_llvm_libraries.yml
-    with:
-      os: "ubuntu-22.04"
-      arch: "X86"
   
   build_wamrc:
     needs:
-      [build_llvm_libraries_on_ubuntu_2004, build_llvm_libraries_on_ubuntu_2204]
+      [build_llvm_libraries_on_ubuntu_2004]
     runs-on: ${{ matrix.os }}
     strategy:
       matrix:
         include:
           - os: ubuntu-20.04
-            llvm_cache_key: ${{ needs.build_llvm_libraries_on_ubuntu_2004.outputs.cache_key }}
-          - os: ubuntu-22.04
-            llvm_cache_key: ${{ needs.build_llvm_libraries_on_ubuntu_2204.outputs.cache_key }}
+            llvm_cache_key: ${{ needs.build_llvm_libraries_on_ubuntu_2004.outputs.cache_key }}  
     steps:
       - name: checkout
         uses: actions/checkout@v3
@@ -88,7 +88,7 @@ jobs:
 
   build_iwasm:
     needs:
-      [build_llvm_libraries_on_ubuntu_2004, build_llvm_libraries_on_ubuntu_2204]
+      [build_llvm_libraries_on_ubuntu_2004]
     runs-on: ${{ matrix.os }}
     strategy:
       matrix:
@@ -120,7 +120,7 @@ jobs:
             "-DWAMR_BUILD_TAIL_CALL=1",
             "-DWAMR_DISABLE_HW_BOUND_CHECK=1",
           ]
-        os: [ubuntu-20.04, ubuntu-22.04]
+        os: [ubuntu-20.04]
         platform: [android, linux]
         exclude:
           # uncompatiable feature and platform
@@ -184,14 +184,10 @@ jobs:
             platform: android
           - make_options_run_mode: $MULTI_TIER_JIT_BUILD_OPTIONS
             platform: android
-          # only test andorid on ubuntu latest
-          - os: ubuntu-20.04
-            platform: android
         include:
           - os: ubuntu-20.04
             llvm_cache_key: ${{ needs.build_llvm_libraries_on_ubuntu_2004.outputs.cache_key }}
-          - os: ubuntu-22.04
-            llvm_cache_key: ${{ needs.build_llvm_libraries_on_ubuntu_2204.outputs.cache_key }}
+
     steps:
       - name: checkout
         uses: actions/checkout@v3
@@ -283,7 +279,12 @@ jobs:
         uses: actions/checkout@v3
 
       - name: Install dependencies
-        run: apt update && apt install -y make g++-4.8 gcc-4.8 wget git
+        uses: nick-fields/retry@v2
+        with:
+          timeout_minutes: 10
+          max_attempts: 3
+          command: apt update && apt install -y make g++-4.8 gcc-4.8 wget git
+          on_retry_command: sudo rm -r /var/lib/apt/lists/*
 
       - name: Install cmake
         run: |
@@ -303,7 +304,6 @@ jobs:
       [
         build_iwasm,
         build_llvm_libraries_on_ubuntu_2004,
-        build_llvm_libraries_on_ubuntu_2204,
         build_wamrc,
       ]
     runs-on: ${{ matrix.os }}
@@ -321,7 +321,7 @@ jobs:
             $LLVM_EAGER_JIT_BUILD_OPTIONS,
             $MULTI_TIER_JIT_BUILD_OPTIONS,
           ]
-        os: [ubuntu-20.04, ubuntu-22.04]
+        os: [ubuntu-20.04]
         wasi_sdk_release:
           [
             "https://github.com/WebAssembly/wasi-sdk/releases/download/wasi-sdk-20/wasi-sdk-20.0-linux.tar.gz",
@@ -333,8 +333,6 @@ jobs:
         include:
           - os: ubuntu-20.04
             llvm_cache_key: ${{ needs.build_llvm_libraries_on_ubuntu_2004.outputs.cache_key }}
-          - os: ubuntu-22.04
-            llvm_cache_key: ${{ needs.build_llvm_libraries_on_ubuntu_2204.outputs.cache_key }}
         exclude:
           - make_options: $MULTI_TIER_JIT_BUILD_OPTIONS
             sanitizer: asan
@@ -386,7 +384,7 @@ jobs:
     runs-on: ${{ matrix.os }}
     strategy:
       matrix:
-        os: [ubuntu-20.04, ubuntu-22.04]
+        os: [ubuntu-20.04]
         wasi_sdk_release:
           [
             "https://github.com/WebAssembly/wasi-sdk/releases/download/wasi-sdk-20/wasi-sdk-20.0-linux.tar.gz",
@@ -470,14 +468,13 @@ jobs:
       [
         build_iwasm,
         build_llvm_libraries_on_ubuntu_2004,
-        build_llvm_libraries_on_ubuntu_2204,
         build_wamrc,
       ]
     runs-on: ${{ matrix.os }}
     strategy:
       fail-fast: false
       matrix:
-        os: [ubuntu-20.04, ubuntu-22.04]
+        os: [ubuntu-20.04]
         sanitizer: ["", "ubsan", "asan"]
         running_mode:
           [
@@ -504,9 +501,7 @@ jobs:
           - os: ubuntu-20.04
             llvm_cache_key: ${{ needs.build_llvm_libraries_on_ubuntu_2004.outputs.cache_key }}
             ubuntu_version: "20.04"
-          - os: ubuntu-22.04
-            llvm_cache_key: ${{ needs.build_llvm_libraries_on_ubuntu_2204.outputs.cache_key }}
-            ubuntu_version: "22.04"
+
         exclude:
           # uncompatiable modes and features
           - os: ubuntu-20.04

+ 1 - 0
.gitignore

@@ -1,4 +1,5 @@
 .cache
+.clangd
 .vs
 .vscode
 .venv

+ 1 - 0
README.md

@@ -77,6 +77,7 @@ The following platforms are supported, click each link below for how to build iw
 - [Blog: Introduction to WAMR running modes](https://bytecodealliance.github.io/wamr.dev/blog/introduction-to-wamr-running-modes/)
 - [Memory usage tunning](./doc/memory_tune.md): the memory model and how to tune the memory usage
 - [Memory usage profiling](./doc/build_wamr.md#enable-memory-profiling-experiment): how to profile the memory usage
+- [Performance tunning](./doc/perf_tune.md): how to tune the performance
 - [Benchmarks](./tests/benchmarks): checkout these links for how to run the benchmarks: [PolyBench](./tests/benchmarks/polybench), [CoreMark](./tests/benchmarks/coremark), [Sightglass](./tests/benchmarks/sightglass), [JetStream2](./tests/benchmarks/jetstream)
 - [Performance and footprint data](https://github.com/bytecodealliance/wasm-micro-runtime/wiki/Performance): the performance and footprint data
 

+ 4 - 0
build-scripts/config_common.cmake

@@ -392,3 +392,7 @@ if (WAMR_BUILD_STATIC_PGO EQUAL 1)
   add_definitions (-DWASM_ENABLE_STATIC_PGO=1)
   message ("     AOT static PGO enabled")
 endif ()
+if (WAMR_DISABLE_WRITE_GS_BASE EQUAL 1)
+  add_definitions (-DWASM_DISABLE_WRITE_GS_BASE=1)
+  message ("     Write linear memory base addr to x86 GS register disabled")
+endif ()

+ 2 - 2
build-scripts/runtime_lib.cmake

@@ -101,7 +101,7 @@ if (WAMR_BUILD_LIB_PTHREAD_SEMAPHORE EQUAL 1)
 endif ()
 
 if (WAMR_BUILD_WASI_NN EQUAL 1)
-    include (${IWASM_DIR}/libraries/wasi-nn/wasi_nn.cmake)
+    include (${IWASM_DIR}/libraries/wasi-nn/cmake/wasi_nn.cmake)
 endif ()
 
 if (WAMR_BUILD_LIB_PTHREAD EQUAL 1)
@@ -177,7 +177,7 @@ set (source_all
     ${UTILS_SHARED_SOURCE}
     ${LIBC_BUILTIN_SOURCE}
     ${LIBC_WASI_SOURCE}
-    ${LIBC_WASI_NN_SOURCE}
+    ${WASI_NN_SOURCES}
     ${IWASM_COMMON_SOURCE}
     ${IWASM_INTERP_SOURCE}
     ${IWASM_AOT_SOURCE}

+ 7 - 0
core/config.h

@@ -449,4 +449,11 @@
 #define WASM_ENABLE_STATIC_PGO 0
 #endif
 
+/* Disable writing linear memory base address to GS segment register,
+   by default only in linux x86-64, linear memory base addr is written
+   to GS segment register before calling wasm/aot function. */
+#ifndef WASM_DISABLE_WRITE_GS_BASE
+#define WASM_DISABLE_WRITE_GS_BASE 0
+#endif
+
 #endif /* end of _CONFIG_H_ */
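The two hunks above work as a pair. `build-scripts/config_common.cmake` adds `-DWASM_DISABLE_WRITE_GS_BASE=1` when `WAMR_DISABLE_WRITE_GS_BASE` is set to 1, and `core/config.h` falls back to 0 otherwise, so the GS-base write stays on by default. The following is a minimal sketch of how code could branch on that macro. The helper `should_write_gs_base` and the `__linux__`/`__x86_64__` test are illustrative assumptions based on the config.h comment ("by default only in linux x86-64"); they are not WAMR's actual call site.

```c
#include <stdbool.h>

/* Same default as core/config.h: the GS-base write stays enabled unless
   the build passes -DWASM_DISABLE_WRITE_GS_BASE=1, for example through
   the WAMR_DISABLE_WRITE_GS_BASE CMake option. */
#ifndef WASM_DISABLE_WRITE_GS_BASE
#define WASM_DISABLE_WRITE_GS_BASE 0
#endif

/* Hypothetical helper: reports whether the runtime would write the linear
   memory base address to the GS segment register before it calls a
   wasm/AOT function. The decision is made entirely at compile time. */
static bool
should_write_gs_base(void)
{
#if WASM_DISABLE_WRITE_GS_BASE == 0 && defined(__linux__) \
    && defined(__x86_64__)
    return true;
#else
    return false;
#endif
}
```

Because the switch is a preprocessor constant, a build with `-DWAMR_DISABLE_WRITE_GS_BASE=1` removes the enabled branch entirely, so there is no runtime cost for the check.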

+ 5 - 1
core/deps/install_tensorflow.sh

@@ -6,6 +6,10 @@ cd ${DEPS_ROOT}
 echo "Downloading tensorflow in ${PWD}..."
 
 git clone https://github.com/tensorflow/tensorflow.git tensorflow-src \
-    --branch v2.11.1
+    --branch v2.12.0
+
+# NOTE: fixes this https://github.com/tensorflow/tensorflow/issues/59631
+cd tensorflow-src
+git cherry-pick 5115fa96d7c5b41451674892317be43e30b7c389
 
 exit 0

+ 15 - 0
core/iwasm/aot/aot_loader.c

@@ -1996,6 +1996,20 @@ do_text_relocation(AOTModule *module, AOTRelocationGroup *group,
             symbol_addr = module->func_ptrs[func_index];
 #endif
         }
+#if defined(BH_PLATFORM_WINDOWS) && defined(BUILD_TARGET_X86_32)
+        /* AOT function name starts with '_' in windows x86-32 */
+        else if (!strncmp(symbol, "_" AOT_FUNC_PREFIX,
+                          strlen("_" AOT_FUNC_PREFIX))) {
+            p = symbol + strlen("_" AOT_FUNC_PREFIX);
+            if (*p == '\0'
+                || (func_index = (uint32)atoi(p)) > module->func_count) {
+                set_error_buf_v(error_buf, error_buf_size, "invalid symbol %s",
+                                symbol);
+                goto check_symbol_fail;
+            }
+            symbol_addr = module->func_ptrs[func_index];
+        }
+#endif
         else if (!strcmp(symbol, ".text")) {
             symbol_addr = module->code;
         }
@@ -2006,6 +2020,7 @@ do_text_relocation(AOTModule *module, AOTRelocationGroup *group,
                  || !strncmp(symbol, ".rodata.cst", strlen(".rodata.cst"))
                  /* ".rodata.strn.m" */
                  || !strncmp(symbol, ".rodata.str", strlen(".rodata.str"))
+                 || !strcmp(symbol, AOT_STACK_SIZES_SECTION_NAME)
 #if WASM_ENABLE_STATIC_PGO != 0
                  || !strncmp(symbol, "__llvm_prf_cnts", 15)
                  || !strncmp(symbol, "__llvm_prf_data", 15)
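
The Windows x86-32 branch added to `do_text_relocation` strips the `"_" AOT_FUNC_PREFIX` prefix and converts the rest of the symbol name into an index into `module->func_ptrs`. The following is a minimal sketch of that parse. It assumes the default `AOT_FUNC_PREFIX` of `"aot_func#"` and that `func_ptrs` holds exactly `func_count` entries. `parse_func_index` is a hypothetical name. It uses `strtoul` with an end-pointer check instead of `atoi`, so trailing characters and out-of-range numbers are rejected:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumed to match the default AOT_FUNC_PREFIX in aot.h. */
#define AOT_FUNC_PREFIX "aot_func#"

/*
 * Hypothetical helper: if `symbol` is `prefix` followed by a decimal
 * index in [0, func_count), store the index and return true.
 */
static bool
parse_func_index(const char *symbol, const char *prefix, uint32_t func_count,
                 uint32_t *out_index)
{
    size_t prefix_len;
    const char *p;
    char *end;
    unsigned long idx;

    assert(symbol != NULL && prefix != NULL && out_index != NULL);
    prefix_len = strlen(prefix);
    if (strncmp(symbol, prefix, prefix_len) != 0)
        return false; /* not a function symbol */
    p = symbol + prefix_len;
    if (*p < '0' || *p > '9')
        return false; /* empty suffix, sign or whitespace */
    errno = 0;
    idx = strtoul(p, &end, 10);
    if (errno != 0 || *end != '\0')
        return false; /* out of range or trailing characters */
    if (idx >= func_count)
        return false; /* would index past the end of func_ptrs */
    *out_index = (uint32_t)idx;
    return true;
}
```

On Windows x86-32, a caller would pass `"_" AOT_FUNC_PREFIX` as `prefix`. Other targets would pass `AOT_FUNC_PREFIX` alone.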

+ 8 - 21
core/iwasm/aot/aot_runtime.c

@@ -1201,17 +1201,6 @@ aot_instantiate(AOTModule *module, bool is_sub_inst, WASMExecEnv *exec_env_main,
     }
 #endif
 
-#if WASM_ENABLE_WASI_NN != 0
-    if (!is_sub_inst) {
-        if (!(((AOTModuleInstanceExtra *)module_inst->e)->wasi_nn_ctx =
-                  wasi_nn_initialize())) {
-            set_error_buf(error_buf, error_buf_size,
-                          "wasi nn initialization failed");
-            goto fail;
-        }
-    }
-#endif
-
     /* Initialize the thread related data */
     if (stack_size == 0)
         stack_size = DEFAULT_WASM_STACK_SIZE;
@@ -1310,12 +1299,8 @@ aot_deinstantiate(AOTModuleInstance *module_inst, bool is_sub_inst)
             ((AOTModuleInstanceExtra *)module_inst->e)->c_api_func_imports);
 
 #if WASM_ENABLE_WASI_NN != 0
-    if (!is_sub_inst) {
-        WASINNContext *wasi_nn_ctx =
-            ((AOTModuleInstanceExtra *)module_inst->e)->wasi_nn_ctx;
-        if (wasi_nn_ctx)
-            wasi_nn_destroy(wasi_nn_ctx);
-    }
+    if (!is_sub_inst)
+        wasi_nn_destroy(module_inst);
 #endif
 
     wasm_runtime_free(module_inst);
@@ -2797,12 +2782,14 @@ aot_dump_call_stack(WASMExecEnv *exec_env, bool print, char *buf, uint32 len)
 
         /* function name not exported, print number instead */
         if (frame.func_name_wp == NULL) {
-            line_length = snprintf(line_buf, sizeof(line_buf), "#%02d $f%d\n",
-                                   n, frame.func_index);
+            line_length =
+                snprintf(line_buf, sizeof(line_buf),
+                         "#%02" PRIu32 " $f%" PRIu32 "\n", n, frame.func_index);
         }
         else {
-            line_length = snprintf(line_buf, sizeof(line_buf), "#%02d %s\n", n,
-                                   frame.func_name_wp);
+            line_length =
+                snprintf(line_buf, sizeof(line_buf), "#%02" PRIu32 " %s\n", n,
+                         frame.func_name_wp);
         }
 
         if (line_length >= sizeof(line_buf)) {
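
The `aot_dump_call_stack` hunk replaces `%d` with `PRIu32` because `n` and `frame.func_index` are `uint32` values (assumed here to be `uint32_t`). The `<inttypes.h>` macros expand to the conversion specifier that matches `uint32_t` on each platform. The following is a minimal sketch of the two formatting branches. `format_frame_line` is a hypothetical helper that returns the `snprintf` result, so the caller can detect truncation by comparing `line_length` against `sizeof(line_buf)`, as the original does:

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper mirroring the two branches in aot_dump_call_stack:
 * print "$f<index>" when the function has no exported name.
 * Returns the length snprintf would have written (>= size means the
 * output was truncated), or a negative value on encoding error.
 */
static int
format_frame_line(char *buf, size_t size, uint32_t n, uint32_t func_index,
                  const char *func_name)
{
    assert(buf != NULL && size > 0);
    if (func_name == NULL)
        return snprintf(buf, size, "#%02" PRIu32 " $f%" PRIu32 "\n", n,
                        func_index);
    return snprintf(buf, size, "#%02" PRIu32 " %s\n", n, func_name);
}
```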

+ 0 - 3
core/iwasm/aot/aot_runtime.h

@@ -89,9 +89,6 @@ typedef struct AOTFunctionInstance {
 
 typedef struct AOTModuleInstanceExtra {
     CApiFuncImport *c_api_func_imports;
-#if WASM_ENABLE_WASI_NN != 0
-    WASINNContext *wasi_nn_ctx;
-#endif
 } AOTModuleInstanceExtra;
 
 #if defined(OS_ENABLE_HW_BOUND_CHECK) && defined(BH_PLATFORM_WINDOWS)

+ 44 - 11
core/iwasm/aot/arch/aot_reloc_riscv.c

@@ -9,6 +9,8 @@
 #define R_RISCV_64 2
 #define R_RISCV_CALL 18
 #define R_RISCV_CALL_PLT 19
+#define R_RISCV_PCREL_HI20 23
+#define R_RISCV_PCREL_LO12_I 24
 #define R_RISCV_HI20 26
 #define R_RISCV_LO12_I 27
 #define R_RISCV_LO12_S 28
@@ -267,9 +269,10 @@ typedef struct RelocTypeStrMap {
     }
 
 static RelocTypeStrMap reloc_type_str_maps[] = {
-    RELOC_TYPE_MAP(R_RISCV_32),       RELOC_TYPE_MAP(R_RISCV_CALL),
-    RELOC_TYPE_MAP(R_RISCV_CALL_PLT), RELOC_TYPE_MAP(R_RISCV_HI20),
-    RELOC_TYPE_MAP(R_RISCV_LO12_I),   RELOC_TYPE_MAP(R_RISCV_LO12_S),
+    RELOC_TYPE_MAP(R_RISCV_32),           RELOC_TYPE_MAP(R_RISCV_CALL),
+    RELOC_TYPE_MAP(R_RISCV_CALL_PLT),     RELOC_TYPE_MAP(R_RISCV_PCREL_HI20),
+    RELOC_TYPE_MAP(R_RISCV_PCREL_LO12_I), RELOC_TYPE_MAP(R_RISCV_HI20),
+    RELOC_TYPE_MAP(R_RISCV_LO12_I),       RELOC_TYPE_MAP(R_RISCV_LO12_S),
 };
 
 static const char *
@@ -369,13 +372,29 @@ apply_relocation(AOTModule *module, uint8 *target_section_addr,
             break;
         }
 
-        case R_RISCV_HI20:
+        case R_RISCV_HI20:       /* S + A */
+        case R_RISCV_PCREL_HI20: /* S + A - P */
         {
-            val = (int32)((intptr_t)symbol_addr + (intptr_t)reloc_addend);
+            if (reloc_type == R_RISCV_PCREL_HI20) {
+                val = (int32)((intptr_t)symbol_addr + (intptr_t)reloc_addend
+                              - (intptr_t)addr);
+            }
+            else {
+                val = (int32)((intptr_t)symbol_addr + (intptr_t)reloc_addend);
+            }
 
             CHECK_RELOC_OFFSET(sizeof(uint32));
-            if (val != ((intptr_t)symbol_addr + (intptr_t)reloc_addend)) {
-                goto fail_addr_out_of_range;
+            if (reloc_type == R_RISCV_PCREL_HI20) {
+                if (val
+                    != ((intptr_t)symbol_addr + (intptr_t)reloc_addend
+                        - (intptr_t)addr)) {
+                    goto fail_addr_out_of_range;
+                }
+            }
+            else {
+                if (val != ((intptr_t)symbol_addr + (intptr_t)reloc_addend)) {
+                    goto fail_addr_out_of_range;
+                }
             }
 
             addr = target_section_addr + reloc_offset;
@@ -386,13 +405,27 @@ apply_relocation(AOTModule *module, uint8 *target_section_addr,
             break;
         }
 
-        case R_RISCV_LO12_I:
+        case R_RISCV_LO12_I:       /* S + A */
+        case R_RISCV_PCREL_LO12_I: /* S - P */
         {
-            val = (int32)((intptr_t)symbol_addr + (intptr_t)reloc_addend);
+            if (reloc_type == R_RISCV_PCREL_LO12_I) {
+                /* A = 0 */
+                val = (int32)((intptr_t)symbol_addr - (intptr_t)addr);
+            }
+            else {
+                val = (int32)((intptr_t)symbol_addr + (intptr_t)reloc_addend);
+            }
 
             CHECK_RELOC_OFFSET(sizeof(uint32));
-            if (val != (intptr_t)symbol_addr + (intptr_t)reloc_addend) {
-                goto fail_addr_out_of_range;
+            if (reloc_type == R_RISCV_PCREL_LO12_I) {
+                if (val != (intptr_t)symbol_addr - (intptr_t)addr) {
+                    goto fail_addr_out_of_range;
+                }
+            }
+            else {
+                if (val != (intptr_t)symbol_addr + (intptr_t)reloc_addend) {
+                    goto fail_addr_out_of_range;
+                }
             }
 
             addr = target_section_addr + reloc_offset;
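
`R_RISCV_PCREL_HI20` computes `S + A - P`, and `R_RISCV_PCREL_LO12_I` supplies the matching low part. The code that writes the value into the instructions is outside this excerpt. The following sketch shows only the usual way a 32-bit offset is split between an `auipc`/`addi` pair. It assumes the standard RISC-V encoding, in which the 12-bit immediate is sign-extended, so the high part is rounded by adding 0x800 first. All function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Upper 20 bits for auipc/lui, biased by 0x800 so that the
 * sign-extended low 12 bits add back to the original value. */
static uint32_t
riscv_hi20(int32_t value)
{
    return (((uint32_t)value + 0x800u) >> 12) & 0xfffffu;
}

/* Low 12 bits as an I-type immediate: sign-extended from bit 11. */
static int32_t
riscv_lo12(int32_t value)
{
    int32_t lo = (int32_t)((uint32_t)value & 0xfffu);
    return lo >= 0x800 ? lo - 0x1000 : lo;
}

/* What the instruction pair adds up to: (hi20 << 12) + sign_extend(lo12). */
static int32_t
riscv_combine(uint32_t hi20, int32_t lo12)
{
    uint32_t sum;

    assert(hi20 <= 0xfffffu);
    sum = (hi20 << 12) + (uint32_t)lo12; /* wraps modulo 2^32 */
    return (int32_t)sum; /* assumes two's complement conversion */
}
```

Splitting an offset of 0xfff produces `hi20 = 1` and `lo12 = -1`. Without the 0x800 bias, the high part would be 0 and the pair would compute -1.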

+ 26 - 4
core/iwasm/aot/arch/aot_reloc_x86_32.c

@@ -5,12 +5,19 @@
 
 #include "aot_reloc.h"
 
+/* clang-format off */
+#if !defined(BH_PLATFORM_WINDOWS)
 #define R_386_32 1    /* Direct 32 bit  */
 #define R_386_PC32 2  /* PC relative 32 bit */
 #define R_386_PLT32 4 /* 32-bit address ProcedureLinkageTable */
-#define R_386_TLS_GD_32                      \
-    24 /*  Direct 32 bit for general dynamic \
-           thread local data */
+#define R_386_TLS_GD_32 24 /* Direct 32 bit for general dynamic
+                              thread local data */
+#else
+#define IMAGE_REL_I386_DIR32 6 /* The target's 32-bit VA */
+#define IMAGE_REL_I386_REL32 20 /* The 32-bit relative displacement
+                                   to the target */
+#endif
+/* clang-format on */
 
 #if !defined(_WIN32) && !defined(_WIN32_)
 /* clang-format off */
@@ -48,6 +55,12 @@ __umoddi3(uint64 a, uint64 b)
 }
 #endif
 
+static uint64
+__aulldiv(uint64 a, uint64 b)
+{
+    return a / b;
+}
+
 /* clang-format off */
 static SymbolMap target_sym_map[] = {
     REG_COMMON_SYMBOLS
@@ -55,7 +68,8 @@ static SymbolMap target_sym_map[] = {
     REG_SYM(__divdi3),
     REG_SYM(__udivdi3),
     REG_SYM(__moddi3),
-    REG_SYM(__umoddi3)
+    REG_SYM(__umoddi3),
+    REG_SYM(__aulldiv)
 };
 /* clang-format on */
 
@@ -112,9 +126,13 @@ apply_relocation(AOTModule *module, uint8 *target_section_addr,
                  int32 symbol_index, char *error_buf, uint32 error_buf_size)
 {
     switch (reloc_type) {
+#if !defined(BH_PLATFORM_WINDOWS)
         case R_386_32:
 #if WASM_ENABLE_STATIC_PGO != 0
         case R_386_TLS_GD_32:
+#endif
+#else
+        case IMAGE_REL_I386_DIR32:
 #endif
         {
             intptr_t value;
@@ -127,12 +145,16 @@ apply_relocation(AOTModule *module, uint8 *target_section_addr,
             break;
         }
 
+#if !defined(BH_PLATFORM_WINDOWS)
         /*
          * Handle R_386_PLT32 like R_386_PC32 since it should be able to reach
          * any 32 bit address
          */
         case R_386_PLT32:
         case R_386_PC32:
+#else
+        case IMAGE_REL_I386_REL32:
+#endif
         {
             int32 value;
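
On Windows, the x86-32 loader routes `IMAGE_REL_I386_DIR32` into the same case as `R_386_32`, and `IMAGE_REL_I386_REL32` into the same case as `R_386_PC32`. Most of the case bodies are outside this excerpt, so the following sketch only assumes the conventional forms: an absolute value `S + A` and a PC-relative value `S + A - P`, each stored as 32 bits at the relocation offset. Addresses are modeled as `uint32_t` so the example runs on any host. `patch_abs32` and `patch_rel32` are hypothetical names:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Absolute 32-bit value: S + A. */
static void
patch_abs32(uint8_t *section, uint32_t offset, uint32_t sym, int32_t addend)
{
    uint32_t value = sym + (uint32_t)addend;

    assert(section != NULL);
    /* memcpy avoids unaligned stores; writes in host byte order,
     * which on x86 is the little-endian layout the code expects. */
    memcpy(section + offset, &value, sizeof(value));
}

/* PC-relative 32-bit value: S + A - P, with P = address being patched. */
static void
patch_rel32(uint8_t *section, uint32_t section_addr, uint32_t offset,
            uint32_t sym, int32_t addend)
{
    uint32_t p = section_addr + offset;
    int32_t value = (int32_t)(sym + (uint32_t)addend - p);

    assert(section != NULL);
    memcpy(section + offset, &value, sizeof(value));
}
```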
 

+ 19 - 0
core/iwasm/compilation/aot.h

@@ -19,6 +19,25 @@ extern "C" {
 #define AOT_FUNC_PREFIX "aot_func#"
 #endif
 
+#ifndef AOT_FUNC_INTERNAL_PREFIX
+#define AOT_FUNC_INTERNAL_PREFIX "aot_func_internal#"
+#endif
+
+#ifndef AOT_STACK_SIZES_NAME
+#define AOT_STACK_SIZES_NAME "aot_stack_sizes"
+#endif
+extern const char *aot_stack_sizes_name;
+
+#ifndef AOT_STACK_SIZES_ALIAS_NAME
+#define AOT_STACK_SIZES_ALIAS_NAME "aot_stack_sizes_alias"
+#endif
+extern const char *aot_stack_sizes_alias_name;
+
+#ifndef AOT_STACK_SIZES_SECTION_NAME
+#define AOT_STACK_SIZES_SECTION_NAME ".aot_stack_sizes"
+#endif
+extern const char *aot_stack_sizes_section_name;
+
 typedef InitializerExpression AOTInitExpr;
 typedef WASMType AOTFuncType;
 typedef WASMExport AOTExport;

+ 95 - 2
core/iwasm/compilation/aot_compiler.c

@@ -2761,6 +2761,16 @@ aot_compile_wasm(AOTCompContext *comp_ctx)
             aot_handle_llvm_errmsg("failed to addIRModule", err);
             return false;
         }
+
+        if (comp_ctx->stack_sizes != NULL) {
+            LLVMOrcJITTargetAddress addr;
+            if ((err = LLVMOrcLLLazyJITLookup(comp_ctx->orc_jit, &addr,
+                                              aot_stack_sizes_name))) {
+                aot_handle_llvm_errmsg("failed to look up stack_sizes", err);
+                return false;
+            }
+            comp_ctx->jit_stack_sizes = (uint32 *)addr;
+        }
     }
 
     return true;
@@ -2815,6 +2825,55 @@ aot_emit_llvm_file(AOTCompContext *comp_ctx, const char *file_name)
     return true;
 }
 
+static bool
+aot_move_file(const char *dest, const char *src)
+{
+    FILE *dfp = fopen(dest, "w");
+    FILE *sfp = fopen(src, "r");
+    size_t rsz;
+    char buf[128];
+    bool success = false;
+
+    if (dfp == NULL || sfp == NULL) {
+        LOG_DEBUG("open error %s %s", dest, src);
+        goto fail;
+    }
+    do {
+        rsz = fread(buf, 1, sizeof(buf), sfp);
+        if (rsz > 0) {
+            size_t wsz = fwrite(buf, 1, rsz, dfp);
+            if (wsz < rsz) {
+                LOG_DEBUG("write error");
+                goto fail;
+            }
+        }
+        if (rsz < sizeof(buf)) {
+            if (ferror(sfp)) {
+                LOG_DEBUG("read error");
+                goto fail;
+            }
+        }
+    } while (rsz > 0);
+    success = true;
+fail:
+    if (dfp != NULL) {
+        if (fclose(dfp)) {
+            LOG_DEBUG("close error");
+            success = false;
+        }
+        if (!success) {
+            (void)unlink(dest);
+        }
+    }
+    if (sfp != NULL) {
+        (void)fclose(sfp);
+    }
+    if (success) {
+        (void)unlink(src);
+    }
+    return success;
+}
+
 bool
 aot_emit_object_file(AOTCompContext *comp_ctx, char *file_name)
 {
@@ -2830,7 +2889,25 @@ aot_emit_object_file(AOTCompContext *comp_ctx, char *file_name)
         int ret;
 
         if (comp_ctx->external_llc_compiler) {
+            const char *stack_usage_flag = "";
             char bc_file_name[64];
+            char su_file_name[65]; /* See the comment below */
+
+            if (comp_ctx->stack_usage_file != NULL) {
+                /*
+                 * Note: we know the caller uses 64 byte buffer for
+                 * file_name. It will get 1 byte longer because we
+                 * replace ".o" with ".su".
+                 */
+                size_t len = strlen(file_name);
+                bh_assert(len + 1 <= sizeof(su_file_name));
+                bh_assert(len > 3);
+                bh_assert(file_name[len - 2] == '.');
+                bh_assert(file_name[len - 1] == 'o');
+                snprintf(su_file_name, sizeof(su_file_name), "%.*s.su",
+                         (int)(len - 2), file_name);
+                stack_usage_flag = " -fstack-usage";
+            }
 
             if (!aot_generate_tempfile_name("wamrc-bc", "bc", bc_file_name,
                                             sizeof(bc_file_name))) {
@@ -2842,8 +2919,8 @@ aot_emit_object_file(AOTCompContext *comp_ctx, char *file_name)
                 return false;
             }
 
-            snprintf(cmd, sizeof(cmd), "%s %s -o %s %s",
-                     comp_ctx->external_llc_compiler,
+            snprintf(cmd, sizeof(cmd), "%s%s %s -o %s %s",
+                     comp_ctx->external_llc_compiler, stack_usage_flag,
                      comp_ctx->llc_compiler_flags ? comp_ctx->llc_compiler_flags
                                                   : "-O3 -c",
                      file_name, bc_file_name);
@@ -2858,6 +2935,22 @@ aot_emit_object_file(AOTCompContext *comp_ctx, char *file_name)
                                    "with external LLC compiler.");
                 return false;
             }
+            if (comp_ctx->stack_usage_file != NULL) {
+                /*
+                 * move the temporary .su file to the specified location.
+                 *
+                 * Note: the former is automatically inferred from the output
+                 * filename (file_name here) by clang.
+                 *
+                 * Note: the latter might be user-specified.
+                 * (wamrc --stack-usage=<file>)
+                 */
+                if (!aot_move_file(comp_ctx->stack_usage_file, su_file_name)) {
+                    aot_set_last_error("failed to move su file.");
+                    (void)unlink(su_file_name);
+                    return false;
+                }
+            }
         }
         else if (comp_ctx->external_asm_compiler) {
             char asm_file_name[64];
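
When a stack usage file is requested and an external LLC compiler is used, the code derives the name of the `.su` file that clang writes. It replaces the trailing `.o` of `file_name` with `.su` using the `"%.*s.su"` format. Because the result is one byte longer than the input, `su_file_name` is 65 bytes for a 64-byte `file_name`. The following is a minimal standalone sketch of that derivation. `derive_su_name` is hypothetical, and it performs the suffix and length checks at run time instead of with `bh_assert`:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: turn "foo.o" into "foo.su".
 * Returns false if `obj` does not end in ".o" or if the result
 * (strlen(obj) + 1 characters) does not fit in `out`.
 */
static bool
derive_su_name(const char *obj, char *out, size_t out_size)
{
    size_t len;
    int written;

    assert(obj != NULL && out != NULL);
    len = strlen(obj);
    if (len < 3 || obj[len - 2] != '.' || obj[len - 1] != 'o')
        return false;
    /* Print all but the trailing ".o", then append ".su". */
    written = snprintf(out, out_size, "%.*s.su", (int)(len - 2), obj);
    return written >= 0 && (size_t)written < out_size;
}
```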

+ 337 - 1
core/iwasm/compilation/aot_emit_aot_file.c

@@ -140,6 +140,10 @@ typedef struct AOTObjectData {
     AOTSymbolList symbol_list;
     AOTRelocationGroup *relocation_groups;
     uint32 relocation_group_count;
+
+    const char *stack_sizes_section_name;
+    uint32 stack_sizes_offset;
+    uint32 *stack_sizes;
 } AOTObjectData;
 
 #if 0
@@ -1634,7 +1638,31 @@ aot_emit_object_data_section_info(uint8 *buf, uint8 *buf_end, uint32 *p_offset,
         EMIT_STR(data_section->name);
         offset = align_uint(offset, 4);
         EMIT_U32(data_section->size);
-        EMIT_BUF(data_section->data, data_section->size);
+        if (obj_data->stack_sizes_section_name != NULL
+            && !strcmp(obj_data->stack_sizes_section_name,
+                       data_section->name)) {
+            uint32 ss_offset = obj_data->stack_sizes_offset;
+            uint32 ss_size =
+                obj_data->func_count * sizeof(*obj_data->stack_sizes);
+            LOG_VERBOSE("Replacing stack_sizes in %s section, offset %" PRIu32
+                        ", size %" PRIu32,
+                        obj_data->stack_sizes_section_name, ss_offset, ss_size);
+            bh_assert(ss_offset + ss_size <= data_section->size);
+            /* 0 .. ss_offset */
+            if (ss_offset > 0) {
+                EMIT_BUF(data_section->data, ss_offset);
+            }
+            /* ss_offset .. ss_offset+ss_size */
+            EMIT_BUF(obj_data->stack_sizes, ss_size);
+            /* ss_offset+ss_size .. data_section->size */
+            if (data_section->size > ss_offset + ss_size) {
+                EMIT_BUF(data_section->data + ss_offset + ss_size,
+                         data_section->size - (ss_offset + ss_size));
+            }
+        }
+        else {
+            EMIT_BUF(data_section->data, data_section->size);
+        }
     }
 
     if (offset - *p_offset
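
For the section that holds `aot_stack_sizes`, the emitter writes three pieces instead of patching LLVM's section buffer in place: the bytes before `stack_sizes_offset`, the computed `stack_sizes` copy, and the rest of the section. According to the note in `aot_resolve_stack_sizes`, LLVM's buffer may be a read-only mapping. The following is a minimal sketch of the same three-part copy into a plain output buffer. `emit_with_replacement` is hypothetical, and `memcpy` stands in for `EMIT_BUF`:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy `section` (section_size bytes) to `out`, substituting
 * `replacement` (repl_size bytes) at repl_offset.
 * `out` must hold at least section_size bytes. Returns bytes written.
 */
static size_t
emit_with_replacement(uint8_t *out, const uint8_t *section,
                      uint32_t section_size, uint32_t repl_offset,
                      const void *replacement, uint32_t repl_size)
{
    uint32_t tail;
    size_t n = 0;

    /* The replacement must lie inside the section (overflow-safe form). */
    assert(repl_size <= section_size
           && repl_offset <= section_size - repl_size);
    tail = repl_offset + repl_size;

    if (repl_offset > 0) { /* 0 .. repl_offset: original bytes */
        memcpy(out, section, repl_offset);
        n += repl_offset;
    }
    if (repl_size > 0) { /* repl_offset .. tail: replacement */
        memcpy(out + n, replacement, repl_size);
        n += repl_size;
    }
    if (section_size > tail) { /* tail .. section_size: original bytes */
        memcpy(out + n, section + tail, section_size - tail);
        n += section_size - tail;
    }
    return n;
}
```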
@@ -2305,6 +2333,7 @@ is_data_section(AOTObjectData *obj_data, LLVMSectionIteratorRef sec_itr,
             || (!strcmp(section_name, ".rdata")
                 && get_relocations_count(sec_itr, &relocation_count)
                 && relocation_count > 0)
+            || !strcmp(section_name, aot_stack_sizes_section_name)
             || (obj_data->comp_ctx->enable_llvm_pgo
                 && (!strncmp(section_name, "__llvm_prf_cnts", 15)
                     || !strncmp(section_name, "__llvm_prf_data", 15)
@@ -2418,6 +2447,298 @@ aot_resolve_object_data_sections(AOTObjectData *obj_data)
     return true;
 }
 
+static bool
+read_stack_usage_file(const AOTCompContext *comp_ctx, const char *filename,
+                      uint32 *sizes, uint32 count)
+{
+    FILE *fp = NULL;
+    if (filename == NULL) {
+        aot_set_last_error("no stack usage file is specified.");
+        return false;
+    }
+    fp = fopen(filename, "r");
+    if (fp == NULL) {
+        LOG_ERROR("failed to open stack usage file: %s", filename);
+        goto fail;
+    }
+    /*
+     * the file consists of lines like:
+     *
+     * WASM Module:aot_func#9  72  static
+     */
+    const char *aot_func_prefix = AOT_FUNC_PREFIX;
+    const char *aot_func_internal_prefix = AOT_FUNC_INTERNAL_PREFIX;
+    uint32 precheck_found = 0;
+    uint32 precheck_stack_size_max = 0;
+    uint32 precheck_stack_size_min = UINT32_MAX;
+    uint32 found = 0;
+    while (true) {
+        const char *prefix;
+        char line[100];
+        char *cp = fgets(line, sizeof(line), fp);
+        char *fn;
+        char *colon;
+        uintmax_t func_idx;
+        uintmax_t sz;
+        int ret;
+
+        if (cp == NULL) {
+            break;
+        }
+        /*
+         * Note: strrchr (not strchr) because a module name can contain
+         * colons.
+         */
+        colon = strrchr(cp, ':');
+        if (colon == NULL) {
+            goto fail;
+        }
+        fn = strstr(colon, aot_func_prefix);
+        if (fn != NULL) {
+            prefix = aot_func_prefix;
+        }
+        else {
+            fn = strstr(colon, aot_func_internal_prefix);
+            if (fn == NULL) {
+                LOG_ERROR("failed to parse stack usage line: %s", cp);
+                goto fail;
+            }
+            prefix = aot_func_internal_prefix;
+        }
+        ret = sscanf(fn + strlen(prefix), "%ju %ju static", &func_idx, &sz);
+        if (ret != 2) {
+            goto fail;
+        }
+        if (sz > UINT32_MAX) {
+            goto fail;
+        }
+        if (func_idx > UINT32_MAX) {
+            goto fail;
+        }
+        if (func_idx >= count) {
+            goto fail;
+        }
+        if (prefix == aot_func_prefix) {
+            if (sz < precheck_stack_size_min) {
+                precheck_stack_size_min = sz;
+            }
+            if (sz > precheck_stack_size_max) {
+                precheck_stack_size_max = sz;
+            }
+            precheck_found++;
+            continue;
+        }
+        sizes[func_idx] = sz;
+        found++;
+    }
+    fclose(fp);
+    if (precheck_found != count) {
+        LOG_ERROR("%" PRIu32 " precheck entries found while %" PRIu32
+                  " entries are expected",
+                  precheck_found, count);
+        return false;
+    }
+    if (found != count) {
+        /*
+         * LLVM seems to eliminate calls to an empty function
+         * (and eliminate the function) even if it's marked noinline.
+         */
+        LOG_VERBOSE("%" PRIu32 " entries found while %" PRIu32
+                    " entries are expected. Maybe LLVM optimization eliminated "
+                    "some functions.",
+                    found, count);
+    }
+    if (precheck_stack_size_min != precheck_stack_size_max) {
+        /*
+         * Note: this is too strict.
+         *
+         * Actually, the stack consumption of the precheck functions
+         * can depend on their function types.
+         * That is, depending on various factors including
+         * calling conventions and compilers, a function with many
+         * parameters can consume more stack, even if it merely does
+         * a tail-call to another function.
+         */
+        bool musttail = aot_target_precheck_can_use_musttail(comp_ctx);
+        if (musttail) {
+            LOG_WARNING(
+                "precheck functions use variable amount of stack. (%" PRIu32
+                " - %" PRIu32 ")",
+                precheck_stack_size_min, precheck_stack_size_max);
+        }
+        else {
+            LOG_VERBOSE("precheck functions use %" PRIu32 " - %" PRIu32
+                        " bytes of stack.",
+                        precheck_stack_size_min, precheck_stack_size_max);
+        }
+    }
+    else {
+        LOG_VERBOSE("precheck functions use %" PRIu32 " bytes of stack.",
+                    precheck_stack_size_max);
+    }
+    if (precheck_stack_size_max >= 1024) {
+        LOG_WARNING("precheck functions themselves consume relatively large "
+                    "amount of stack (%" PRIu32
+                    "). Please ensure the runtime has large enough "
+                    "WASM_STACK_GUARD_SIZE.",
+                    precheck_stack_size_max);
+    }
+    return true;
+fail:
+    if (fp != NULL)
+        fclose(fp);
+    aot_set_last_error("failed to read stack usage file.");
+    return false;
+}
+
+static bool
+aot_resolve_stack_sizes(AOTCompContext *comp_ctx, AOTObjectData *obj_data)
+{
+    LLVMSectionIteratorRef sec_itr = NULL;
+    LLVMSymbolIteratorRef sym_itr;
+    const char *name;
+
+    if (!(sym_itr = LLVMObjectFileCopySymbolIterator(obj_data->binary))) {
+        aot_set_last_error("llvm get symbol iterator failed.");
+        return false;
+    }
+
+    while (!LLVMObjectFileIsSymbolIteratorAtEnd(obj_data->binary, sym_itr)) {
+        if ((name = LLVMGetSymbolName(sym_itr))
+            && !strcmp(name, aot_stack_sizes_alias_name)) {
+            uint64 sz = LLVMGetSymbolSize(sym_itr);
+            if (sz != sizeof(uint32) * obj_data->func_count) {
+                aot_set_last_error("stack_sizes had unexpected size.");
+                goto fail;
+            }
+            uint64 addr = LLVMGetSymbolAddress(sym_itr);
+            if (!(sec_itr =
+                      LLVMObjectFileCopySectionIterator(obj_data->binary))) {
+                aot_set_last_error("llvm get section iterator failed.");
+                goto fail;
+            }
+            LLVMMoveToContainingSection(sec_itr, sym_itr);
+            const char *sec_name = LLVMGetSectionName(sec_itr);
+            LOG_VERBOSE("stack_sizes found in section %s offset %" PRIu64 ".",
+                        sec_name, addr);
+            if (strcmp(sec_name, aot_stack_sizes_section_name) || addr != 0) {
+                aot_set_last_error(
+                    "stack_sizes found at an unexpected location.");
+                goto fail;
+            }
+            /*
+             * Note: We can't always modify stack_sizes in-place.
+             * E.g. when WAMRC_LLC_COMPILER is used, LLVM sometimes uses
+             * read-only mmap of the temporary file to back
+             * LLVMGetSectionContents.
+             */
+            const uint32 *ro_stack_sizes =
+                (const uint32 *)(LLVMGetSectionContents(sec_itr) + addr);
+            uint32 i;
+            for (i = 0; i < obj_data->func_count; i++) {
+                /* Note: -1 == AOT_NEG_ONE from aot_create_stack_sizes */
+                if (ro_stack_sizes[i] != (uint32)-1) {
+                    aot_set_last_error("unexpected data in stack_sizes.");
+                    goto fail;
+                }
+            }
+            if (addr > UINT32_MAX) {
+                aot_set_last_error("too large stack_sizes offset.");
+                goto fail;
+            }
+            /*
+             * Record section/offset and construct a copy of stack_sizes.
+             * aot_emit_object_data_section_info will emit this copy.
+             */
+            obj_data->stack_sizes_section_name = sec_name;
+            obj_data->stack_sizes_offset = addr;
+            obj_data->stack_sizes = wasm_runtime_malloc(
+                obj_data->func_count * sizeof(*obj_data->stack_sizes));
+            if (obj_data->stack_sizes == NULL) {
+                aot_set_last_error("failed to allocate memory.");
+                goto fail;
+            }
+            uint32 *stack_sizes = obj_data->stack_sizes;
+            for (i = 0; i < obj_data->func_count; i++) {
+                stack_sizes[i] = (uint32)-1;
+            }
+            if (!read_stack_usage_file(comp_ctx, comp_ctx->stack_usage_file,
+                                       stack_sizes, obj_data->func_count)) {
+                goto fail;
+            }
+            for (i = 0; i < obj_data->func_count; i++) {
+                const AOTFuncContext *func_ctx = comp_ctx->func_ctxes[i];
+                bool musttail = aot_target_precheck_can_use_musttail(comp_ctx);
+                unsigned int stack_consumption_to_call_wrapped_func =
+                    musttail ? 0
+                             : aot_estimate_stack_usage_for_function_call(
+                                 comp_ctx, func_ctx->aot_func->func_type);
+
+                /*
+                 * LLVM seems to eliminate calls to an empty function
+                 * (and eliminate the function) even if it's marked noinline.
+                 *
+                 * Note: -1 == AOT_NEG_ONE from aot_create_stack_sizes
+                 */
+                if (stack_sizes[i] == (uint32)-1) {
+                    if (func_ctx->stack_consumption_for_func_call != 0) {
+                        /*
+                         * This happens if a function calling another
+                         * function has been optimized out.
+                         *
+                         * for example,
+                         *
+                         *   (func $func
+                         *     (local i32)
+                         *     local.get 0
+                         *     if
+                         *       call $another
+                         *     end
+                         *   )
+                         */
+                        LOG_VERBOSE("AOT func#%" PRIu32
+                                    " had call(s) but eliminated?",
+                                    i);
+                    }
+                    else {
+                        LOG_VERBOSE("AOT func#%" PRIu32 " eliminated?", i);
+                    }
+                    stack_sizes[i] = 0;
+                }
+                else {
+                    LOG_VERBOSE("AOT func#%" PRIu32 " stack_size %u + %" PRIu32
+                                " + %u",
+                                i, stack_consumption_to_call_wrapped_func,
+                                stack_sizes[i],
+                                func_ctx->stack_consumption_for_func_call);
+                    if (UINT32_MAX - stack_sizes[i]
+                        < func_ctx->stack_consumption_for_func_call) {
+                        aot_set_last_error("stack size overflow.");
+                        goto fail;
+                    }
+                    stack_sizes[i] += func_ctx->stack_consumption_for_func_call;
+                    if (UINT32_MAX - stack_sizes[i]
+                        < stack_consumption_to_call_wrapped_func) {
+                        aot_set_last_error("stack size overflow.");
+                        goto fail;
+                    }
+                    stack_sizes[i] += stack_consumption_to_call_wrapped_func;
+                }
+            }
+            LLVMDisposeSectionIterator(sec_itr);
+            LLVMDisposeSymbolIterator(sym_itr);
+            return true;
+        }
+        LLVMMoveToNextSymbol(sym_itr);
+    }
+    aot_set_last_error("stack_sizes not found.");
+fail:
+    if (sec_itr)
+        LLVMDisposeSectionIterator(sec_itr);
+    LLVMDisposeSymbolIterator(sym_itr);
+    return false;
+}
+
 static bool
 aot_resolve_functions(AOTCompContext *comp_ctx, AOTObjectData *obj_data)
 {
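
`read_stack_usage_file` reads the `-fstack-usage` output one line at a time. Each line looks like `WASM Module:aot_func#9  72  static`. The function finds the last colon, because module names may contain colons. It then looks for the `aot_func#` prefix (precheck wrappers) or the `aot_func_internal#` prefix (function bodies) and reads the index and byte count with `sscanf`. The following is a minimal sketch of the per-line parse. It assumes the default prefixes from `aot.h`, and `parse_su_line` is a hypothetical name:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumed to match the defaults in aot.h. */
#define AOT_FUNC_PREFIX "aot_func#"
#define AOT_FUNC_INTERNAL_PREFIX "aot_func_internal#"

/*
 * Hypothetical helper: parse one line such as
 *   "WASM Module:aot_func#9  72  static"
 * On success, *is_precheck is true for aot_func# (precheck wrapper)
 * entries and false for aot_func_internal# (function body) entries.
 */
static bool
parse_su_line(const char *line, uint32_t func_count, bool *is_precheck,
              uint32_t *func_idx, uint32_t *size)
{
    const char *colon, *fn;
    uintmax_t idx, sz;

    assert(line != NULL && is_precheck != NULL && func_idx != NULL
           && size != NULL);
    /* Last colon, since the module name itself may contain colons. */
    colon = strrchr(line, ':');
    if (colon == NULL)
        return false;
    if ((fn = strstr(colon, AOT_FUNC_PREFIX)) != NULL) {
        *is_precheck = true;
        fn += strlen(AOT_FUNC_PREFIX);
    }
    else if ((fn = strstr(colon, AOT_FUNC_INTERNAL_PREFIX)) != NULL) {
        *is_precheck = false;
        fn += strlen(AOT_FUNC_INTERNAL_PREFIX);
    }
    else {
        return false;
    }
    /* Function index, then frame size in bytes. */
    if (sscanf(fn, "%ju %ju", &idx, &sz) != 2)
        return false;
    if (idx >= func_count || sz > UINT32_MAX)
        return false;
    *func_idx = (uint32_t)idx;
    *size = (uint32_t)sz;
    return true;
}
```

`"aot_func#"` is not a substring of `"aot_func_internal#"`, so testing the shorter prefix first does not misclassify internal entries.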
@@ -2429,6 +2750,10 @@ aot_resolve_functions(AOTCompContext *comp_ctx, AOTObjectData *obj_data)
     /* allocate memory for aot function */
     obj_data->func_count = comp_ctx->comp_data->func_count;
     if (obj_data->func_count) {
+        if ((comp_ctx->enable_stack_bound_check
+             || comp_ctx->enable_stack_estimation)
+            && !aot_resolve_stack_sizes(comp_ctx, obj_data))
+            return false;
         total_size = (uint32)sizeof(AOTObjectFunc) * obj_data->func_count;
         if (!(obj_data->funcs = wasm_runtime_malloc(total_size))) {
             aot_set_last_error("allocate memory for functions failed.");
@@ -2611,6 +2936,15 @@ aot_resolve_object_relocation_group(AOTObjectData *obj_data,
                 + align_uint(obj_data->text_unlikely_size, 4);
         }
 
+        /*
+         * Note: aot_stack_sizes_section_name section only contains
+         * stack_sizes table.
+         */
+        if (!strcmp(relocation->symbol_name, aot_stack_sizes_name)) {
+            /* discard const */
+            relocation->symbol_name = (char *)aot_stack_sizes_section_name;
+        }
+
         if (obj_data->comp_ctx->enable_llvm_pgo
             && (!strcmp(relocation->symbol_name, "__llvm_prf_cnts")
                 || !strcmp(relocation->symbol_name, "__llvm_prf_data"))) {
@@ -2922,6 +3256,8 @@ aot_obj_data_destroy(AOTObjectData *obj_data)
                                   obj_data->relocation_group_count);
     if (obj_data->symbol_list.len)
         destroy_relocation_symbol_list(&obj_data->symbol_list);
+    if (obj_data->stack_sizes)
+        wasm_runtime_free(obj_data->stack_sizes);
     wasm_runtime_free(obj_data);
 }
 

+ 33 - 156
core/iwasm/compilation/aot_emit_function.c

@@ -366,143 +366,6 @@ fail:
 #endif /* end of (WASM_ENABLE_DUMP_CALL_STACK != 0) \
                  || (WASM_ENABLE_PERF_PROFILING != 0) */
 
-static bool
-record_stack_usage(AOTCompContext *comp_ctx, AOTFuncContext *func_ctx,
-                   uint32 callee_cell_num)
-{
-    LLVMBasicBlockRef block_curr = LLVMGetInsertBlock(comp_ctx->builder);
-    LLVMBasicBlockRef block_update;
-    LLVMBasicBlockRef block_after_update;
-    LLVMValueRef callee_local_size, new_sp, cmp;
-    LLVMValueRef native_stack_top_min;
-    LLVMTypeRef ptrdiff_type;
-    if (comp_ctx->pointer_size == sizeof(uint64_t)) {
-        ptrdiff_type = I64_TYPE;
-    }
-    else {
-        ptrdiff_type = I32_TYPE;
-    }
-
-    /*
-     * new_sp = last_alloca - callee_local_size;
-     * if (*native_stack_top_min_addr > new_sp) {
-     *    *native_stack_top_min_addr = new_sp;
-     * }
-     */
-
-    if (!(callee_local_size = LLVMConstInt(
-              ptrdiff_type, -(int64_t)callee_cell_num * 4, true))) {
-        aot_set_last_error("llvm build const failed.");
-        return false;
-    }
-    if (!(new_sp = LLVMBuildInBoundsGEP2(comp_ctx->builder, INT8_TYPE,
-                                         func_ctx->last_alloca,
-                                         &callee_local_size, 1, "new_sp"))) {
-        aot_set_last_error("llvm build gep failed");
-        return false;
-    }
-    if (!(native_stack_top_min = LLVMBuildLoad2(
-              comp_ctx->builder, OPQ_PTR_TYPE,
-              func_ctx->native_stack_top_min_addr, "native_stack_top_min"))) {
-        aot_set_last_error("llvm build load failed");
-        return false;
-    }
-    if (!(cmp = LLVMBuildICmp(comp_ctx->builder, LLVMIntULT, new_sp,
-                              native_stack_top_min, "cmp"))) {
-        aot_set_last_error("llvm build icmp failed.");
-        return false;
-    }
-
-    if (!(block_update = LLVMAppendBasicBlockInContext(
-              comp_ctx->context, func_ctx->func, "block_update"))) {
-        aot_set_last_error("llvm add basic block failed.");
-        return false;
-    }
-    if (!(block_after_update = LLVMAppendBasicBlockInContext(
-              comp_ctx->context, func_ctx->func, "block_after_update"))) {
-        aot_set_last_error("llvm add basic block failed.");
-        return false;
-    }
-    LLVMMoveBasicBlockAfter(block_update, block_curr);
-    LLVMMoveBasicBlockAfter(block_after_update, block_update);
-
-    if (!LLVMBuildCondBr(comp_ctx->builder, cmp, block_update,
-                         block_after_update)) {
-        aot_set_last_error("llvm build cond br failed.");
-        return false;
-    }
-
-    LLVMPositionBuilderAtEnd(comp_ctx->builder, block_update);
-    if (!LLVMBuildStore(comp_ctx->builder, new_sp,
-                        func_ctx->native_stack_top_min_addr)) {
-        aot_set_last_error("llvm build store failed");
-        return false;
-    }
-    if (!LLVMBuildBr(comp_ctx->builder, block_after_update)) {
-        aot_set_last_error("llvm build br failed.");
-        return false;
-    }
-
-    LLVMPositionBuilderAtEnd(comp_ctx->builder, block_after_update);
-    return true;
-}
-
-static bool
-check_stack_boundary(AOTCompContext *comp_ctx, AOTFuncContext *func_ctx,
-                     uint32 callee_cell_num)
-{
-    LLVMBasicBlockRef block_curr = LLVMGetInsertBlock(comp_ctx->builder);
-    LLVMBasicBlockRef check_stack;
-    LLVMValueRef callee_local_size, stack_bound, cmp;
-
-    if (!(callee_local_size = I32_CONST(callee_cell_num * 4))) {
-        aot_set_last_error("llvm build const failed.");
-        return false;
-    }
-
-    if (!(stack_bound = LLVMBuildInBoundsGEP2(
-              comp_ctx->builder, INT8_TYPE, func_ctx->native_stack_bound,
-              &callee_local_size, 1, "stack_bound"))) {
-        aot_set_last_error("llvm build inbound gep failed.");
-        return false;
-    }
-
-    if (!(check_stack = LLVMAppendBasicBlockInContext(
-              comp_ctx->context, func_ctx->func, "check_stack"))) {
-        aot_set_last_error("llvm add basic block failed.");
-        return false;
-    }
-
-    LLVMMoveBasicBlockAfter(check_stack, block_curr);
-
-    if (!(cmp = LLVMBuildICmp(comp_ctx->builder, LLVMIntULT,
-                              func_ctx->last_alloca, stack_bound, "cmp"))) {
-        aot_set_last_error("llvm build icmp failed.");
-        return false;
-    }
-
-    if (!aot_emit_exception(comp_ctx, func_ctx, EXCE_NATIVE_STACK_OVERFLOW,
-                            true, cmp, check_stack)) {
-        return false;
-    }
-
-    LLVMPositionBuilderAtEnd(comp_ctx->builder, check_stack);
-    return true;
-}
-
-static bool
-check_stack(AOTCompContext *comp_ctx, AOTFuncContext *func_ctx,
-            uint32 callee_cell_num)
-{
-    if (comp_ctx->enable_stack_estimation
-        && !record_stack_usage(comp_ctx, func_ctx, callee_cell_num))
-        return false;
-    if (comp_ctx->enable_stack_bound_check
-        && !check_stack_boundary(comp_ctx, func_ctx, callee_cell_num))
-        return false;
-    return true;
-}
-
 /**
  * Check whether the app address and its buffer are inside the linear memory,
  * if no, throw exception
@@ -610,6 +473,30 @@ check_app_addr_and_convert(AOTCompContext *comp_ctx, AOTFuncContext *func_ctx,
     return true;
 }
 
+static void
+aot_estimate_and_record_stack_usage_for_function_call(
+    const AOTCompContext *comp_ctx, AOTFuncContext *caller_func_ctx,
+    const AOTFuncType *callee_func_type)
+{
+    unsigned int size;
+
+    if (!(comp_ctx->enable_stack_bound_check
+          || comp_ctx->enable_stack_estimation)) {
+        return;
+    }
+
+    size =
+        aot_estimate_stack_usage_for_function_call(comp_ctx, callee_func_type);
+    /*
+     * only record the max value, assuming that LLVM emits machine code
+     * which rewinds the stack before making the next call in the
+     * function.
+     */
+    if (caller_func_ctx->stack_consumption_for_func_call < size) {
+        caller_func_ctx->stack_consumption_for_func_call = size;
+    }
+}
+
 bool
 aot_compile_op_call(AOTCompContext *comp_ctx, AOTFuncContext *func_ctx,
                     uint32 func_idx, bool tail_call)
@@ -620,7 +507,6 @@ aot_compile_op_call(AOTCompContext *comp_ctx, AOTFuncContext *func_ctx,
     uint32 ext_ret_cell_num = 0, cell_num = 0;
     AOTFuncContext **func_ctxes = comp_ctx->func_ctxes;
     AOTFuncType *func_type;
-    AOTFunc *aot_func;
     LLVMTypeRef *param_types = NULL, ret_type;
     LLVMTypeRef ext_ret_ptr_type;
     LLVMValueRef *param_values = NULL, value_ret = NULL, func;
@@ -628,7 +514,6 @@ aot_compile_op_call(AOTCompContext *comp_ctx, AOTFuncContext *func_ctx,
     LLVMValueRef ext_ret, ext_ret_ptr, ext_ret_idx;
     int32 i, j = 0, param_count, result_count, ext_ret_count;
     uint64 total_size;
-    uint32 callee_cell_num;
     uint8 wasm_ret_type;
     uint8 *ext_ret_types = NULL;
     const char *signature = NULL;
@@ -658,6 +543,8 @@ aot_compile_op_call(AOTCompContext *comp_ctx, AOTFuncContext *func_ctx,
         func_type =
             func_ctxes[func_idx - import_func_count]->aot_func->func_type;
     }
+    aot_estimate_and_record_stack_usage_for_function_call(comp_ctx, func_ctx,
+                                                          func_type);
 
     /* Get param cell number */
     param_cell_num = func_type->param_cell_num;
@@ -885,15 +772,17 @@ aot_compile_op_call(AOTCompContext *comp_ctx, AOTFuncContext *func_ctx,
         else {
             if (func_ctxes[func_idx - import_func_count] == func_ctx) {
                 /* recursive call */
-                func = func_ctx->func;
+                func = func_ctx->precheck_func;
             }
             else {
                 if (!comp_ctx->is_jit_mode) {
-                    func = func_ctxes[func_idx - import_func_count]->func;
+                    func =
+                        func_ctxes[func_idx - import_func_count]->precheck_func;
                 }
                 else {
 #if !(WASM_ENABLE_FAST_JIT != 0 && WASM_ENABLE_LAZY_JIT != 0)
-                    func = func_ctxes[func_idx - import_func_count]->func;
+                    func =
+                        func_ctxes[func_idx - import_func_count]->precheck_func;
 #else
                     /* JIT tier-up, load func ptr from func_ptrs[func_idx] */
                     LLVMValueRef func_ptr, func_idx_const;
@@ -938,13 +827,6 @@ aot_compile_op_call(AOTCompContext *comp_ctx, AOTFuncContext *func_ctx,
             }
         }
 
-        aot_func = func_ctxes[func_idx - import_func_count]->aot_func;
-        callee_cell_num =
-            aot_func->param_cell_num + aot_func->local_cell_num + 1;
-
-        if (!check_stack(comp_ctx, func_ctx, callee_cell_num))
-            goto fail;
-
 #if LLVM_VERSION_MAJOR >= 14
         llvm_func_type = func_ctxes[func_idx - import_func_count]->func_type;
 #endif
@@ -1213,6 +1095,8 @@ aot_compile_op_call_indirect(AOTCompContext *comp_ctx, AOTFuncContext *func_ctx,
     CHECK_LLVM_CONST(ftype_idx_const);
 
     func_type = comp_ctx->comp_data->func_types[type_idx];
+    aot_estimate_and_record_stack_usage_for_function_call(comp_ctx, func_ctx,
+                                                          func_type);
     func_param_count = func_type->param_count;
     func_result_count = func_type->result_count;
 
@@ -1564,13 +1448,6 @@ aot_compile_op_call_indirect(AOTCompContext *comp_ctx, AOTFuncContext *func_ctx,
     /* Translate call non-import block */
     LLVMPositionBuilderAtEnd(comp_ctx->builder, block_call_non_import);
 
-    if (!check_stack(comp_ctx, func_ctx,
-                     param_cell_num + ext_cell_num
-                         + 1
-                         /* Reserve some local variables */
-                         + 16))
-        goto fail;
-
     /* Load function pointer */
     if (!(func_ptr = LLVMBuildInBoundsGEP2(comp_ctx->builder, OPQ_PTR_TYPE,
                                            func_ctx->func_ptrs, &func_idx, 1,
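This file's change drops the per-call `check_stack` code. Instead, each caller records the largest call-site estimate returned by `aot_estimate_stack_usage_for_function_call`, which this commit adds to aot_llvm.c.

Below is a minimal standalone C sketch of that arithmetic for a non-xtensa target. It assumes `enum value_type`, `struct caller_ctx`, `cell_num`, `align_up`, `estimate_call_stack` and `record_call_stack_usage` as hypothetical stand-ins. They are not WAMR APIs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical value-type tags; only their 4-byte cell counts matter here. */
enum value_type { VT_I32, VT_I64, VT_F32, VT_F64 };

/* Hypothetical stand-in for the caller's AOTFuncContext field. */
struct caller_ctx {
    unsigned int stack_consumption_for_func_call;
};

/* i32/f32 use one 4-byte cell, i64/f64 use two. */
static unsigned int
cell_num(enum value_type type)
{
    return (type == VT_I64 || type == VT_F64) ? 2 : 1;
}

/* Round v up to a multiple of the power-of-two a. */
static unsigned int
align_up(unsigned int v, unsigned int a)
{
    return (v + a - 1) & ~(a - 1);
}

/* Mirrors the non-xtensa arithmetic of the estimation function. */
static unsigned int
estimate_call_stack(unsigned int pointer_size, const enum value_type *params,
                    unsigned int param_count, unsigned int result_count)
{
    unsigned int size = pointer_size; /* exec_env */
    unsigned int i, nb;

    for (i = 0; i < param_count; i++) {
        nb = cell_num(params[i]) * 4;
        if (nb < pointer_size)
            nb = pointer_size; /* small values are widened to pointer size */
        size = align_up(size, nb) + nb;
    }
    /* one result pointer for every result after the first */
    for (i = 1; i < result_count; i++)
        size = align_up(size, pointer_size) + pointer_size;
    /* return address */
    size = align_up(size, pointer_size) + pointer_size;
    /* slack for arch-dependent extras such as 16-byte alignment */
    return size + 16;
}

/* Keep only the largest per-call estimate, as the caller-side helper does. */
static void
record_call_stack_usage(struct caller_ctx *caller, unsigned int size)
{
    assert(caller != NULL);
    if (caller->stack_consumption_for_func_call < size)
        caller->stack_consumption_for_func_call = size;
}
```

Consider a 64-bit target calling a function of type (i32, i64) -> i32. The sketch yields 48 bytes:

- 8 for exec_env
- 8 + 8 for the widened, aligned parameters
- 8 for the return address
- 16 of slack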

+ 1 - 0
core/iwasm/compilation/aot_emit_function.h

@@ -29,6 +29,7 @@ aot_compile_op_ref_is_null(AOTCompContext *comp_ctx, AOTFuncContext *func_ctx);
 bool
 aot_compile_op_ref_func(AOTCompContext *comp_ctx, AOTFuncContext *func_ctx,
                         uint32 func_idx);
+
 #ifdef __cplusplus
 } /* end of extern "C" */
 #endif

+ 686 - 87
core/iwasm/compilation/aot_llvm.c

@@ -14,6 +14,15 @@
 #include "debug/dwarf_extractor.h"
 #endif
 
+static bool
+create_native_symbol(const AOTCompContext *comp_ctx, AOTFuncContext *func_ctx);
+static bool
+create_native_stack_bound(const AOTCompContext *comp_ctx,
+                          AOTFuncContext *func_ctx);
+static bool
+create_native_stack_top_min(const AOTCompContext *comp_ctx,
+                            AOTFuncContext *func_ctx);
+
 LLVMTypeRef
 wasm_type_to_llvm_type(const AOTLLVMTypes *llvm_types, uint8 wasm_type)
 {
@@ -38,17 +47,474 @@ wasm_type_to_llvm_type(const AOTLLVMTypes *llvm_types, uint8 wasm_type)
     return NULL;
 }
 
+static LLVMValueRef
+aot_add_llvm_func1(const AOTCompContext *comp_ctx, LLVMModuleRef module,
+                   uint32 func_index, uint32 param_count, LLVMTypeRef func_type,
+                   const char *prefix)
+{
+    char func_name[48];
+    LLVMValueRef func;
+    LLVMValueRef local_value;
+    uint32 i, j;
+
+    /* Add LLVM function */
+    snprintf(func_name, sizeof(func_name), "%s%d", prefix, func_index);
+    if (!(func = LLVMAddFunction(module, func_name, func_type))) {
+        aot_set_last_error("add LLVM function failed.");
+        return NULL;
+    }
+
+    j = 0;
+    local_value = LLVMGetParam(func, j++);
+    LLVMSetValueName(local_value, "exec_env");
+
+    /* Set parameter names */
+    for (i = 0; i < param_count; i++) {
+        local_value = LLVMGetParam(func, j++);
+        LLVMSetValueName(local_value, "");
+    }
+
+    return func;
+}
+
+/*
+ * create a basic func_ctx enough to call aot_emit_exception.
+ *
+ * that is:
+ * - exec_env
+ * - aot_inst
+ * - native_symbol (if is_indirect_mode)
+ */
+static bool
+create_basic_func_context(const AOTCompContext *comp_ctx,
+                          AOTFuncContext *func_ctx)
+{
+    LLVMValueRef aot_inst_offset = I32_TWO, aot_inst_addr;
+
+    /* Save the parameters for fast access */
+    func_ctx->exec_env = LLVMGetParam(func_ctx->func, 0);
+
+    /* Get aot inst address, the layout of exec_env is:
+       exec_env->next, exec_env->prev, exec_env->module_inst, and argv_buf */
+    if (!(aot_inst_addr = LLVMBuildInBoundsGEP2(
+              comp_ctx->builder, OPQ_PTR_TYPE, func_ctx->exec_env,
+              &aot_inst_offset, 1, "aot_inst_addr"))) {
+        aot_set_last_error("llvm build in bounds gep failed");
+        goto fail;
+    }
+
+    /* Load aot inst */
+    if (!(func_ctx->aot_inst = LLVMBuildLoad2(comp_ctx->builder, OPQ_PTR_TYPE,
+                                              aot_inst_addr, "aot_inst"))) {
+        aot_set_last_error("llvm build load failed");
+        goto fail;
+    }
+
+    if (comp_ctx->is_indirect_mode
+        && !create_native_symbol(comp_ctx, func_ctx)) {
+        goto fail;
+    }
+
+    return true;
+fail:
+    return false;
+}
+
+/*
+ * return whether the "precheck" wrapper function can use tail call optimization
+ */
+bool
+aot_target_precheck_can_use_musttail(const AOTCompContext *comp_ctx)
+{
+    if (!strcmp(comp_ctx->target_arch, "xtensa")) {
+        /*
+         * xtensa windowed ABI doesn't have tail call optimization.
+         *
+         * Note: as of writing this, the xtensa version of LLVM
+         * simply ignores the musttail attribute.
+         * https://github.com/espressif/llvm-project/pull/73
+         */
+        return false;
+    }
+    if (!strcmp(comp_ctx->target_arch, "riscv32")
+        || !strcmp(comp_ctx->target_arch, "riscv64")) {
+        /*
+         * REVISIT: actually, riscv can use tail call optimization
+         * in some cases. I (yamamoto) don't know the exact conditions
+         * though.
+         */
+        return false;
+    }
+    /*
+     * x86-64/i386: true
+     *
+     * others: assume true for now
+     */
+    return true;
+}
+
+unsigned int
+aot_estimate_stack_usage_for_function_call(const AOTCompContext *comp_ctx,
+                                           const AOTFuncType *callee_func_type)
+{
+    /*
+     * Estimate how much stack is necessary to make a function call.
+     * This does not include the stack consumption of the callee function.
+     *
+     * For precise estimation, ideally this function needs to be
+     * target-specific.
+     * However, this implementation aims to be target-independent,
+     * allowing a small overestimation, which is probably OK for our purposes
+     * (overflow detection and memory profiling).
+     * On the other hand, an underestimation should be avoided as it
+     * can cause more serious problems like silent data corruption.
+     *
+     * Assumptions:
+     *
+     * - the first result is returned in a register.
+     *
+     * - all parameters, including exec_env and pointers to non-first
+     *   results, are passed on the stack.
+     *   (this is a bit more pessimistic than many real calling conventions,
+     *   where some parameters are passed in registers.)
+     *
+     * - an N-byte value needs N-byte alignment on the stack.
+     *
+     * - a value smaller than a pointer is extended to pointer size.
+     *   (e.g. 4-byte values are extended to 8 bytes on x86-64.)
+     */
+
+    const unsigned int param_count = callee_func_type->param_count;
+    const unsigned int result_count = callee_func_type->result_count;
+    unsigned int size = 0;
+    unsigned int i;
+    unsigned int nb;
+
+    if (!strcmp(comp_ctx->target_arch, "xtensa")) {
+        /*
+         * In the xtensa windowed ABI, outgoing arguments are already
+         * included in the callee's stack frame size, which equals
+         * the operand of the ENTRY instruction and what LLVM
+         * MFI->getStackSize returns.
+         */
+        return 0;
+    }
+
+    /* exec_env */
+    size = comp_ctx->pointer_size;
+
+    /* parameters */
+    for (i = 0; i < param_count; i++) {
+        nb = wasm_value_type_cell_num(callee_func_type->types[i]) * 4;
+        if (nb < comp_ctx->pointer_size) {
+            nb = comp_ctx->pointer_size;
+        }
+        size = align_uint(size, nb) + nb;
+    }
+
+    /* pointers to results */
+    nb = comp_ctx->pointer_size;
+    for (i = 1; i < result_count; i++) {
+        size = align_uint(size, nb) + nb;
+    }
+
+    /* return address */
+    nb = comp_ctx->pointer_size;
+    size = align_uint(size, nb) + nb;
+
+    /*
+     * some extra for possible arch-dependent things like
+     * 16-byte alignment for x86_64.
+     */
+    size += 16;
+    return size;
+}
+
+/*
+ * a "precheck" function performs a few things before calling wrapped_func.
+ *
+ * - update native_stack_top_min if necessary
+ * - stack overflow check (trap if the stack would overflow)
+ */
+static LLVMValueRef
+aot_add_precheck_function(AOTCompContext *comp_ctx, LLVMModuleRef module,
+                          uint32 func_index, uint32 orig_param_count,
+                          LLVMTypeRef func_type, LLVMValueRef wrapped_func)
+{
+    LLVMValueRef precheck_func;
+    LLVMBasicBlockRef begin = NULL;
+    LLVMBasicBlockRef check_top_block = NULL;
+    LLVMBasicBlockRef update_top_block = NULL;
+    LLVMBasicBlockRef stack_bound_check_block = NULL;
+    LLVMBasicBlockRef call_wrapped_func_block = NULL;
+    LLVMValueRef *params = NULL;
+
+    precheck_func =
+        aot_add_llvm_func1(comp_ctx, module, func_index, orig_param_count,
+                           func_type, AOT_FUNC_PREFIX);
+    if (!precheck_func) {
+        goto fail;
+    }
+    begin = LLVMAppendBasicBlockInContext(comp_ctx->context, precheck_func,
+                                          "begin");
+    check_top_block = LLVMAppendBasicBlockInContext(
+        comp_ctx->context, precheck_func, "check_top_block");
+    if (comp_ctx->enable_stack_estimation) {
+        update_top_block = LLVMAppendBasicBlockInContext(
+            comp_ctx->context, precheck_func, "update_top_block");
+        if (!update_top_block) {
+            goto fail;
+        }
+    }
+    stack_bound_check_block = LLVMAppendBasicBlockInContext(
+        comp_ctx->context, precheck_func, "stack_bound_check_block");
+    call_wrapped_func_block = LLVMAppendBasicBlockInContext(
+        comp_ctx->context, precheck_func, "call_wrapped_func");
+    if (!begin || !check_top_block || !stack_bound_check_block
+        || !call_wrapped_func_block) {
+        goto fail;
+    }
+    LLVMBuilderRef b = comp_ctx->builder;
+    LLVMPositionBuilderAtEnd(b, begin);
+
+    /* create a temporary minimum func_ctx */
+    AOTFuncContext tmp;
+    AOTFuncContext *func_ctx = &tmp;
+    memset(func_ctx, 0, sizeof(*func_ctx));
+    func_ctx->func = precheck_func;
+    func_ctx->module = module;
+    func_ctx->aot_func = comp_ctx->comp_data->funcs[func_index];
+#if WASM_ENABLE_DEBUG_AOT != 0
+    func_ctx->debug_func = NULL;
+#endif
+    if (!create_basic_func_context(comp_ctx, func_ctx))
+        goto fail;
+    if (comp_ctx->enable_stack_bound_check
+        && !create_native_stack_bound(comp_ctx, func_ctx))
+        goto fail;
+    if (comp_ctx->enable_stack_estimation
+        && !create_native_stack_top_min(comp_ctx, func_ctx)) {
+        goto fail;
+    }
+
+    unsigned int param_count = LLVMCountParams(precheck_func);
+    uint64 sz = param_count * sizeof(LLVMValueRef);
+    params = wasm_runtime_malloc(sz);
+    if (params == NULL) {
+        goto fail;
+    }
+    LLVMGetParams(precheck_func, params);
+
+    const bool is_64bit = comp_ctx->pointer_size == sizeof(uint64);
+    LLVMTypeRef uintptr_type;
+    if (is_64bit)
+        uintptr_type = I64_TYPE;
+    else
+        uintptr_type = I32_TYPE;
+
+    /*
+     * load the stack pointer
+     */
+    LLVMValueRef sp_ptr = LLVMBuildAlloca(b, I32_TYPE, "sp_ptr");
+    if (!sp_ptr) {
+        goto fail;
+    }
+    LLVMValueRef sp = LLVMBuildPtrToInt(b, sp_ptr, uintptr_type, "sp");
+    if (!sp) {
+        goto fail;
+    }
+
+    /*
+     * load the value for this wrapped function from the stack_sizes array
+     */
+    LLVMValueRef func_index_const = I32_CONST(func_index);
+    LLVMValueRef sizes =
+        LLVMBuildBitCast(b, comp_ctx->stack_sizes, INT32_PTR_TYPE, "sizes");
+    if (!sizes) {
+        goto fail;
+    }
+    LLVMValueRef sizep = LLVMBuildInBoundsGEP2(b, I32_TYPE, sizes,
+                                               &func_index_const, 1, "sizep");
+    if (!sizep) {
+        goto fail;
+    }
+    LLVMValueRef size32 = LLVMBuildLoad2(b, I32_TYPE, sizep, "size32");
+    if (!size32) {
+        goto fail;
+    }
+    LLVMValueRef size;
+    if (is_64bit) {
+        size = LLVMBuildZExt(b, size32, uintptr_type, "size");
+        if (!size) {
+            goto fail;
+        }
+    }
+    else {
+        size = size32;
+    }
+    /*
+     * calculate new sp
+     */
+    LLVMValueRef underflow =
+        LLVMBuildICmp(b, LLVMIntULT, sp, size, "underflow");
+    if (!underflow) {
+        goto fail;
+    }
+    LLVMValueRef new_sp = LLVMBuildSub(b, sp, size, "new_sp");
+    if (!new_sp) {
+        goto fail;
+    }
+    if (!LLVMBuildBr(b, check_top_block)) {
+        goto fail;
+    }
+
+    LLVMPositionBuilderAtEnd(b, check_top_block);
+    if (comp_ctx->enable_stack_estimation) {
+        /*
+         * load native_stack_top_min from the exec_env
+         */
+        LLVMValueRef top_min =
+            LLVMBuildLoad2(b, OPQ_PTR_TYPE, func_ctx->native_stack_top_min_addr,
+                           "native_stack_top_min");
+        if (!top_min) {
+            goto fail;
+        }
+        LLVMValueRef top_min_int = LLVMBuildPtrToInt(
+            b, top_min, uintptr_type, "native_stack_top_min_int");
+        if (!top_min_int) {
+            goto fail;
+        }
+
+        bh_assert(update_top_block);
+
+        /*
+         * update native_stack_top_min if
+         * new_sp = sp - size < native_stack_top_min
+         *
+         * Note: unless the stack has already overflowed in this exec_env,
+         * native_stack_bound <= native_stack_top_min
+         */
+        LLVMValueRef cmp_top =
+            LLVMBuildICmp(b, LLVMIntULT, new_sp, top_min_int, "cmp_top");
+        if (!cmp_top) {
+            goto fail;
+        }
+        cmp_top = LLVMBuildOr(b, underflow, cmp_top, "cmp_top2");
+        if (!cmp_top) {
+            goto fail;
+        }
+        if (!LLVMBuildCondBr(b, cmp_top, update_top_block,
+                             call_wrapped_func_block)) {
+            aot_set_last_error("llvm build cond br failed.");
+            goto fail;
+        }
+
+        /*
+         * update native_stack_top_min
+         */
+        LLVMPositionBuilderAtEnd(b, update_top_block);
+        LLVMValueRef new_sp_ptr =
+            LLVMBuildIntToPtr(b, new_sp, INT8_PTR_TYPE, "new_sp_ptr");
+        if (!new_sp_ptr) {
+            goto fail;
+        }
+        if (!LLVMBuildStore(b, new_sp_ptr,
+                            func_ctx->native_stack_top_min_addr)) {
+            goto fail;
+        }
+        if (!LLVMBuildBr(b, stack_bound_check_block)) {
+            goto fail;
+        }
+    }
+    else {
+        if (!LLVMBuildBr(b, stack_bound_check_block)) {
+            goto fail;
+        }
+    }
+
+    LLVMPositionBuilderAtEnd(b, stack_bound_check_block);
+    if (comp_ctx->enable_stack_bound_check) {
+        /*
+         * trap if new_sp < native_stack_bound
+         */
+        LLVMValueRef bound_int = LLVMBuildPtrToInt(
+            b, func_ctx->native_stack_bound, uintptr_type, "bound_base_int");
+        if (!bound_int) {
+            goto fail;
+        }
+        LLVMValueRef cmp =
+            LLVMBuildICmp(b, LLVMIntULT, new_sp, bound_int, "cmp");
+        if (!cmp) {
+            goto fail;
+        }
+        cmp = LLVMBuildOr(b, underflow, cmp, "cmp2");
+        if (!cmp) {
+            goto fail;
+        }
+        /* todo: @llvm.expect.i1(i1 %cmp, i1 0) */
+        if (!aot_emit_exception(comp_ctx, func_ctx, EXCE_NATIVE_STACK_OVERFLOW,
+                                true, cmp, call_wrapped_func_block))
+            goto fail;
+    }
+    else {
+        if (!LLVMBuildBr(b, call_wrapped_func_block)) {
+            goto fail;
+        }
+    }
+
+    /*
+     * call the wrapped function
+     * use a tail-call if possible
+     */
+    LLVMPositionBuilderAtEnd(b, call_wrapped_func_block);
+    const char *name = "tail_call";
+    LLVMTypeRef ret_type = LLVMGetReturnType(func_type);
+    if (ret_type == VOID_TYPE) {
+        name = "";
+    }
+    LLVMValueRef retval =
+        LLVMBuildCall2(b, func_type, wrapped_func, params, param_count, name);
+    if (!retval) {
+        goto fail;
+    }
+    wasm_runtime_free(params);
+    params = NULL;
+    if (aot_target_precheck_can_use_musttail(comp_ctx)) {
+        LLVMSetTailCallKind(retval, LLVMTailCallKindMustTail);
+    }
+    else {
+        LLVMSetTailCallKind(retval, LLVMTailCallKindTail);
+    }
+    if (ret_type == VOID_TYPE) {
+        if (!LLVMBuildRetVoid(b)) {
+            goto fail;
+        }
+    }
+    else {
+        if (!LLVMBuildRet(b, retval)) {
+            goto fail;
+        }
+    }
+
+    return precheck_func;
+fail:
+    if (params != NULL) {
+        wasm_runtime_free(params);
+    }
+    aot_set_last_error("failed to build precheck wrapper function.");
+    return NULL;
+}
+
 /**
  * Add LLVM function
  */
 static LLVMValueRef
-aot_add_llvm_func(const AOTCompContext *comp_ctx, LLVMModuleRef module,
+aot_add_llvm_func(AOTCompContext *comp_ctx, LLVMModuleRef module,
                   const AOTFuncType *aot_func_type, uint32 func_index,
-                  LLVMTypeRef *p_func_type)
+                  LLVMTypeRef *p_func_type, LLVMValueRef *p_precheck_func)
 {
     LLVMValueRef func = NULL;
     LLVMTypeRef *param_types, ret_type, func_type;
-    LLVMValueRef local_value;
     LLVMTypeRef func_type_wrapper;
     LLVMValueRef func_wrapper;
     LLVMBasicBlockRef func_begin;
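The precheck wrapper's control flow can be modeled in plain C. Its blocks are begin, check_top_block, update_top_block, stack_bound_check_block and call_wrapped_func.

This is a sketch under stated assumptions. `struct stack_state` stands in for the exec_env fields behind native_stack_bound and native_stack_top_min. `stack_sizes` stands in for the global per-function size array. `sp` plays the role of the wrapper's own alloca address, which the generated code uses as its stack-pointer approximation. `struct stack_state` and `precheck` are hypothetical names, not part of WAMR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the exec_env fields the precheck reads. */
struct stack_state {
    uintptr_t native_stack_bound;   /* lowest address the stack may reach */
    uintptr_t native_stack_top_min; /* deepest new_sp recorded so far */
};

/*
 * Plain-C model of the IR built by aot_add_precheck_function.
 * Returns false where the generated code would raise
 * EXCE_NATIVE_STACK_OVERFLOW instead of calling the wrapped function.
 */
static bool
precheck(struct stack_state *st, const uint32_t *stack_sizes,
         uint32_t func_index, uintptr_t sp, bool enable_estimation,
         bool enable_bound_check)
{
    /* zero-extended to pointer width, like the LLVMBuildZExt on 64-bit */
    uintptr_t size = stack_sizes[func_index];
    bool underflow = sp < size;   /* sp - size would wrap around */
    uintptr_t new_sp = sp - size; /* unsigned wrap-around is well defined */

    /* check_top_block / update_top_block */
    if (enable_estimation
        && (underflow || new_sp < st->native_stack_top_min))
        st->native_stack_top_min = new_sp;

    /* stack_bound_check_block */
    if (enable_bound_check
        && (underflow || new_sp < st->native_stack_bound))
        return false;

    /* call_wrapped_func: the real wrapper tail-calls the wrapped function */
    return true;
}
```

In the underflow case, the model still stores the wrapped new_sp into native_stack_top_min before reporting the overflow. The generated IR behaves the same way, since `underflow` is OR-ed into both comparisons.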
@@ -101,21 +567,44 @@ aot_add_llvm_func(const AOTCompContext *comp_ctx, LLVMModuleRef module,
         goto fail;
     }
 
-    /* Add LLVM function */
-    snprintf(func_name, sizeof(func_name), "%s%d", AOT_FUNC_PREFIX, func_index);
-    if (!(func = LLVMAddFunction(module, func_name, func_type))) {
-        aot_set_last_error("add LLVM function failed.");
+    bh_assert(func_index < comp_ctx->func_ctx_count);
+    bh_assert(LLVMGetReturnType(func_type) == ret_type);
+    const char *prefix = AOT_FUNC_PREFIX;
+    const bool need_precheck =
+        comp_ctx->enable_stack_bound_check || comp_ctx->enable_stack_estimation;
+    if (need_precheck) {
+        /*
+         * REVISIT: this probably breaks the Windows HW bound check
+         * (the RtlAddFunctionTable stuff)
+         */
+        prefix = AOT_FUNC_INTERNAL_PREFIX;
+    }
+    if (!(func = aot_add_llvm_func1(comp_ctx, module, func_index,
+                                    aot_func_type->param_count, func_type,
+                                    prefix)))
         goto fail;
-    }
 
-    j = 0;
-    local_value = LLVMGetParam(func, j++);
-    LLVMSetValueName(local_value, "exec_env");
-
-    /* Set parameter names */
-    for (i = 0; i < aot_func_type->param_count; i++) {
-        local_value = LLVMGetParam(func, j++);
-        LLVMSetValueName(local_value, "");
+    if (need_precheck) {
+        if (!comp_ctx->is_jit_mode)
+            LLVMSetLinkage(func, LLVMInternalLinkage);
+        unsigned int kind =
+            LLVMGetEnumAttributeKindForName("noinline", strlen("noinline"));
+        LLVMAttributeRef attr_noinline =
+            LLVMCreateEnumAttribute(comp_ctx->context, kind, 0);
+        LLVMAddAttributeAtIndex(func, LLVMAttributeFunctionIndex,
+                                attr_noinline);
+
+        LLVMValueRef precheck_func = aot_add_precheck_function(
+            comp_ctx, module, func_index, aot_func_type->param_count, func_type,
+            func);
+        if (!precheck_func)
+            goto fail;
+        LLVMAddAttributeAtIndex(precheck_func, LLVMAttributeFunctionIndex,
+                                attr_noinline);
+        *p_precheck_func = precheck_func;
+    }
+    else {
+        *p_precheck_func = func;
     }
 
     if (p_func_type)
@@ -454,27 +943,6 @@ create_local_variables(const AOTCompData *comp_data,
         }
     }
 
-    if (comp_ctx->enable_stack_bound_check
-        || comp_ctx->enable_stack_estimation) {
-        if (aot_func_type->param_count + func->local_count > 0) {
-            func_ctx->last_alloca = func_ctx->locals[aot_func_type->param_count
-                                                     + func->local_count - 1];
-            if (!(func_ctx->last_alloca =
-                      LLVMBuildBitCast(comp_ctx->builder, func_ctx->last_alloca,
-                                       INT8_PTR_TYPE, "stack_ptr"))) {
-                aot_set_last_error("llvm build bit cast failed.");
-                return false;
-            }
-        }
-        else {
-            if (!(func_ctx->last_alloca = LLVMBuildAlloca(
-                      comp_ctx->builder, INT8_TYPE, "stack_ptr"))) {
-                aot_set_last_error("llvm build alloca failed.");
-                return false;
-            }
-        }
-    }
-
     return true;
 }
 
@@ -904,6 +1372,90 @@ create_func_ptrs(const AOTCompContext *comp_ctx, AOTFuncContext *func_ctx)
     return true;
 }
 
+const char *aot_stack_sizes_name = AOT_STACK_SIZES_NAME;
+const char *aot_stack_sizes_alias_name = AOT_STACK_SIZES_ALIAS_NAME;
+const char *aot_stack_sizes_section_name = AOT_STACK_SIZES_SECTION_NAME;
+
+static bool
+aot_create_stack_sizes(const AOTCompData *comp_data, AOTCompContext *comp_ctx)
+{
+    LLVMValueRef stack_sizes, *values, array, alias;
+    LLVMTypeRef stack_sizes_type;
+#if LLVM_VERSION_MAJOR <= 13
+    LLVMTypeRef alias_type;
+#endif
+    uint64 size;
+    uint32 i;
+
+    stack_sizes_type = LLVMArrayType(I32_TYPE, comp_data->func_count);
+    if (!stack_sizes_type) {
+        aot_set_last_error("failed to create stack_sizes type.");
+        return false;
+    }
+
+    stack_sizes =
+        LLVMAddGlobal(comp_ctx->module, stack_sizes_type, aot_stack_sizes_name);
+    if (!stack_sizes) {
+        aot_set_last_error("failed to create stack_sizes global.");
+        return false;
+    }
+
+    size = sizeof(LLVMValueRef) * comp_data->func_count;
+    if (size >= UINT32_MAX || !(values = wasm_runtime_malloc((uint32)size))) {
+        aot_set_last_error("allocate memory failed.");
+        return false;
+    }
+
+    for (i = 0; i < comp_data->func_count; i++) {
+        /*
+         * This value is a placeholder, which will be replaced
+         * after the corresponding functions are compiled.
+         *
+         * Don't use zeros because LLVM can optimize them to
+         * zeroinitializer.
+         */
+        values[i] = I32_NEG_ONE;
+    }
+
+    array = LLVMConstArray(I32_TYPE, values, comp_data->func_count);
+    wasm_runtime_free(values);
+    if (!array) {
+        aot_set_last_error("failed to create stack_sizes initializer.");
+        return false;
+    }
+    LLVMSetInitializer(stack_sizes, array);
+
+    /*
+     * create an alias so that aot_resolve_stack_sizes can find it.
+     */
+#if LLVM_VERSION_MAJOR > 13
+    alias = LLVMAddAlias2(comp_ctx->module, stack_sizes_type, 0, stack_sizes,
+                          aot_stack_sizes_alias_name);
+#else
+    alias_type = LLVMPointerType(stack_sizes_type, 0);
+    if (!alias_type) {
+        aot_set_last_error("failed to create alias type.");
+        return false;
+    }
+    alias = LLVMAddAlias(comp_ctx->module, alias_type, stack_sizes,
+                         aot_stack_sizes_alias_name);
+#endif
+    if (!alias) {
+        aot_set_last_error("failed to create stack_sizes alias.");
+        return false;
+    }
+
+    /*
+     * make the original symbol internal. we mainly use this version to
+     * avoid creating extra relocations in the precheck functions.
+     */
+    LLVMSetLinkage(stack_sizes, LLVMInternalLinkage);
+    LLVMSetSection(stack_sizes, aot_stack_sizes_section_name);
+    comp_ctx->stack_sizes_type = stack_sizes_type;
+    comp_ctx->stack_sizes = stack_sizes;
+    return true;
+}
+
 /**
  * Create function compiler context
  */
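`aot_create_stack_sizes` creates one 32-bit slot per function. Each slot is pre-filled with -1 (`I32_NEG_ONE`), and the source comment says these placeholders are replaced after the corresponding functions are compiled.

The sketch below models only that data layout in plain C. It does not model how the compiler patches the LLVM global. `STACK_SIZE_UNRESOLVED`, `create_stack_sizes`, `resolve_stack_size` and `all_stack_sizes_resolved` are hypothetical helpers. The sentinel reuses the same bit pattern as the placeholder.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sentinel: same bit pattern as the I32_NEG_ONE placeholder. */
#define STACK_SIZE_UNRESOLVED UINT32_MAX

/* One 32-bit slot per function, pre-filled with the placeholder. */
static uint32_t *
create_stack_sizes(uint32_t func_count)
{
    uint32_t *sizes = malloc(sizeof(uint32_t) * (func_count ? func_count : 1));
    uint32_t i;

    if (!sizes)
        return NULL;
    for (i = 0; i < func_count; i++)
        sizes[i] = STACK_SIZE_UNRESOLVED;
    return sizes;
}

/* Patch one slot once the function's frame size is known. */
static void
resolve_stack_size(uint32_t *sizes, uint32_t func_count, uint32_t func_index,
                   uint32_t frame_size)
{
    assert(func_index < func_count);
    sizes[func_index] = frame_size;
}

/* True once every placeholder has been replaced. */
static bool
all_stack_sizes_resolved(const uint32_t *sizes, uint32_t func_count)
{
    uint32_t i;

    for (i = 0; i < func_count; i++)
        if (sizes[i] == STACK_SIZE_UNRESOLVED)
            return false;
    return true;
}
```

The model uses a non-zero sentinel because 0 can be a real frame size. In the real code, the source comment gives a different reason for avoiding zeros: LLVM could fold an all-zero initializer into `zeroinitializer`.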
@@ -917,7 +1469,6 @@ aot_create_func_context(const AOTCompData *comp_data, AOTCompContext *comp_ctx,
     WASMFunction *wasm_func = module->functions[func_index];
     AOTBlock *aot_block;
     LLVMTypeRef int8_ptr_type;
-    LLVMValueRef aot_inst_offset = I32_TWO, aot_inst_addr;
     uint64 size;
 
     /* Allocate memory for the function context */
@@ -935,9 +1486,9 @@ aot_create_func_context(const AOTCompData *comp_data, AOTCompContext *comp_ctx,
     func_ctx->module = comp_ctx->module;
 
     /* Add LLVM function */
-    if (!(func_ctx->func =
-              aot_add_llvm_func(comp_ctx, func_ctx->module, aot_func_type,
-                                func_index, &func_ctx->func_type))) {
+    if (!(func_ctx->func = aot_add_llvm_func(
+              comp_ctx, func_ctx->module, aot_func_type, func_index,
+              &func_ctx->func_type, &func_ctx->precheck_func))) {
         goto fail;
     }
 
@@ -956,22 +1507,7 @@ aot_create_func_context(const AOTCompData *comp_data, AOTCompContext *comp_ctx,
     /* Add local variables */
     LLVMPositionBuilderAtEnd(comp_ctx->builder, aot_block->llvm_entry_block);
 
-    /* Save the pameters for fast access */
-    func_ctx->exec_env = LLVMGetParam(func_ctx->func, 0);
-
-    /* Get aot inst address, the layout of exec_env is:
-       exec_env->next, exec_env->prev, exec_env->module_inst, and argv_buf */
-    if (!(aot_inst_addr = LLVMBuildInBoundsGEP2(
-              comp_ctx->builder, OPQ_PTR_TYPE, func_ctx->exec_env,
-              &aot_inst_offset, 1, "aot_inst_addr"))) {
-        aot_set_last_error("llvm build in bounds gep failed");
-        goto fail;
-    }
-
-    /* Load aot inst */
-    if (!(func_ctx->aot_inst = LLVMBuildLoad2(comp_ctx->builder, OPQ_PTR_TYPE,
-                                              aot_inst_addr, "aot_inst"))) {
-        aot_set_last_error("llvm build load failed");
+    if (!create_basic_func_context(comp_ctx, func_ctx)) {
         goto fail;
     }
 
@@ -980,28 +1516,12 @@ aot_create_func_context(const AOTCompData *comp_data, AOTCompContext *comp_ctx,
         goto fail;
     }
 
-    /* Get native stack boundary address */
-    if (comp_ctx->enable_stack_bound_check
-        && !create_native_stack_bound(comp_ctx, func_ctx)) {
-        goto fail;
-    }
-    if (comp_ctx->enable_stack_estimation
-        && !create_native_stack_top_min(comp_ctx, func_ctx)) {
-        goto fail;
-    }
-
     /* Get auxiliary stack info */
     if (wasm_func->has_op_set_global_aux_stack
         && !create_aux_stack_info(comp_ctx, func_ctx)) {
         goto fail;
     }
 
-    /* Get native symbol list */
-    if (comp_ctx->is_indirect_mode
-        && !create_native_symbol(comp_ctx, func_ctx)) {
-        goto fail;
-    }
-
     /* Create local variables */
     if (!create_local_variables(comp_data, comp_ctx, func_ctx, func)) {
         goto fail;
@@ -1070,6 +1590,11 @@ aot_create_func_contexts(const AOTCompData *comp_data, AOTCompContext *comp_ctx)
     uint64 size;
     uint32 i;
 
+    if ((comp_ctx->enable_stack_bound_check
+         || comp_ctx->enable_stack_estimation)
+        && !aot_create_stack_sizes(comp_data, comp_ctx))
+        return NULL;
+
     /* Allocate memory */
     size = sizeof(AOTFuncContext *) * (uint64)comp_data->func_count;
     if (size >= UINT32_MAX
@@ -1483,6 +2008,55 @@ fail:
     return ret;
 }
 
+static void
+jit_stack_size_callback(void *user_data, const char *name, size_t namelen,
+                        size_t stack_size)
+{
+    AOTCompContext *comp_ctx = user_data;
+    /*
+     * Note: the longest name we care about is
+     * something like "aot_func_internal#4294967295".
+     */
+    char buf[64];
+    uint32 func_idx;
+    const AOTFuncContext *func_ctx;
+    bool musttail;
+    unsigned int stack_consumption_to_call_wrapped_func;
+    unsigned int call_size;
+    int ret;
+
+    bh_assert(comp_ctx != NULL);
+    bh_assert(comp_ctx->jit_stack_sizes != NULL);
+
+    if (namelen >= sizeof(buf)) {
+        LOG_DEBUG("too long name: %.*s", (int)namelen, name);
+        return;
+    }
+    /* ensure NUL termination */
+    bh_memcpy_s(buf, sizeof(buf), name, namelen);
+    buf[namelen] = 0;
+
+    ret = sscanf(buf, AOT_FUNC_INTERNAL_PREFIX "%" SCNu32, &func_idx);
+    if (ret != 1) {
+        return;
+    }
+
+    bh_assert(func_idx < comp_ctx->func_ctx_count);
+    func_ctx = comp_ctx->func_ctxes[func_idx];
+    call_size = func_ctx->stack_consumption_for_func_call;
+    musttail = aot_target_precheck_can_use_musttail(comp_ctx);
+    stack_consumption_to_call_wrapped_func =
+        musttail ? 0
+                 : aot_estimate_stack_usage_for_function_call(
+                     comp_ctx, func_ctx->aot_func->func_type);
+    LOG_VERBOSE("func %.*s stack %u + %zu + %u", (int)namelen, name,
+                stack_consumption_to_call_wrapped_func, stack_size, call_size);
+
+    /* Note: -1 == AOT_NEG_ONE from aot_create_stack_sizes */
+    bh_assert(comp_ctx->jit_stack_sizes[func_idx] == (uint32)-1);
+    comp_ctx->jit_stack_sizes[func_idx] = stack_size + call_size;
+}
+
 static bool
 orc_jit_create(AOTCompContext *comp_ctx)
 {
@@ -1498,6 +2072,10 @@ orc_jit_create(AOTCompContext *comp_ctx)
         goto fail;
     }
 
+    if (comp_ctx->enable_stack_bound_check || comp_ctx->enable_stack_estimation)
+        LLVMOrcLLJITBuilderSetCompileFuncitonCreatorWithStackSizesCallback(
+            builder, jit_stack_size_callback, comp_ctx);
+
     err = LLVMOrcJITTargetMachineBuilderDetectHost(&jtmb);
     if (err != LLVMErrorSuccess) {
         aot_handle_llvm_errmsg(
@@ -1688,14 +2266,6 @@ aot_create_comp_context(const AOTCompData *comp_data, aot_comp_option_t option)
     if (option->is_jit_mode) {
         comp_ctx->is_jit_mode = true;
 
-        /* Create TargetMachine */
-        if (!create_target_machine_detect_host(comp_ctx))
-            goto fail;
-
-        /* Create LLJIT Instance */
-        if (!orc_jit_create(comp_ctx))
-            goto fail;
-
 #ifndef OS_ENABLE_HW_BOUND_CHECK
         comp_ctx->enable_bound_check = true;
         /* Always enable stack boundary check if `bounds-checks`
@@ -1715,6 +2285,14 @@ aot_create_comp_context(const AOTCompData *comp_data, aot_comp_option_t option)
         comp_ctx->enable_stack_bound_check = false;
 #endif
 #endif
+
+        /* Create TargetMachine */
+        if (!create_target_machine_detect_host(comp_ctx))
+            goto fail;
+
+        /* Create LLJIT Instance */
+        if (!orc_jit_create(comp_ctx))
+            goto fail;
     }
     else {
         /* Create LLVM target machine */
@@ -2037,6 +2615,19 @@ aot_create_comp_context(const AOTCompData *comp_data, aot_comp_option_t option)
                 (option->stack_bounds_checks == 1) ? true : false;
         }
 
+        if ((comp_ctx->enable_stack_bound_check
+             || comp_ctx->enable_stack_estimation)
+            && option->stack_usage_file == NULL) {
+            if (!aot_generate_tempfile_name(
+                    "wamrc-su", "su", comp_ctx->stack_usage_temp_file,
+                    sizeof(comp_ctx->stack_usage_temp_file)))
+                goto fail;
+            comp_ctx->stack_usage_file = comp_ctx->stack_usage_temp_file;
+        }
+        else {
+            comp_ctx->stack_usage_file = option->stack_usage_file;
+        }
+
         os_printf("Create AoT compiler with:\n");
         os_printf("  target:        %s\n", comp_ctx->target_arch);
         os_printf("  target cpu:    %s\n", cpu);
@@ -2095,7 +2686,7 @@ aot_create_comp_context(const AOTCompData *comp_data, aot_comp_option_t option)
         if (!(comp_ctx->target_machine = LLVMCreateTargetMachineWithOpts(
                   target, triple_norm, cpu, features, opt_level,
                   LLVMRelocStatic, code_model, false,
-                  option->stack_usage_file))) {
+                  comp_ctx->stack_usage_file))) {
             aot_set_last_error("create LLVM target machine failed.");
             goto fail;
         }
@@ -2165,6 +2756,7 @@ aot_create_comp_context(const AOTCompData *comp_data, aot_comp_option_t option)
         aot_set_last_error("create LLVM target data layout failed.");
         goto fail;
     }
+    LLVMSetModuleDataLayout(comp_ctx->module, target_data_ref);
     comp_ctx->pointer_size = LLVMPointerSize(target_data_ref);
     LLVMDisposeTargetData(target_data_ref);
 
@@ -2238,6 +2830,10 @@ aot_destroy_comp_context(AOTCompContext *comp_ctx)
     if (!comp_ctx)
         return;
 
+    if (comp_ctx->stack_usage_file == comp_ctx->stack_usage_temp_file) {
+        (void)unlink(comp_ctx->stack_usage_temp_file);
+    }
+
     if (comp_ctx->target_machine)
         LLVMDisposeTargetMachine(comp_ctx->target_machine);
 
@@ -2533,8 +3129,8 @@ aot_checked_addr_list_destroy(AOTFuncContext *func_ctx)
 }
 
 bool
-aot_build_zero_function_ret(AOTCompContext *comp_ctx, AOTFuncContext *func_ctx,
-                            AOTFuncType *func_type)
+aot_build_zero_function_ret(const AOTCompContext *comp_ctx,
+                            AOTFuncContext *func_ctx, AOTFuncType *func_type)
 {
     LLVMValueRef ret = NULL;
 
@@ -2573,9 +3169,12 @@ aot_build_zero_function_ret(AOTCompContext *comp_ctx, AOTFuncContext *func_ctx,
         return false;
     }
 #if WASM_ENABLE_DEBUG_AOT != 0
-    LLVMMetadataRef return_location =
-        dwarf_gen_func_ret_location(comp_ctx, func_ctx);
-    LLVMInstructionSetDebugLoc(ret, return_location);
+    /* debug_func is NULL for precheck function */
+    if (func_ctx->debug_func != NULL) {
+        LLVMMetadataRef return_location =
+            dwarf_gen_func_ret_location(comp_ctx, func_ctx);
+        LLVMInstructionSetDebugLoc(ret, return_location);
+    }
 #endif
     return true;
 }
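A minimal C sketch of the bookkeeping `jit_stack_size_callback` performs. It copies the non-NUL-terminated symbol name into a bounded buffer, matches the `aot_func_internal#N` pattern with `sscanf`, and stores `stack_size + call_size` into a slot pre-filled with `(uint32)-1`. The prefix string is inferred from the comment in the callback. `sketch_ctx`, `sketch_record_stack_size` and the per-function `call_sizes` array are hypothetical stand-ins for `AOTCompContext`/`AOTFuncContext`. Unlike the original, which asserts, the sketch returns `false` on a duplicate or out-of-range index.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumed prefix, inferred from "aot_func_internal#4294967295". */
#define SKETCH_FUNC_INTERNAL_PREFIX "aot_func_internal#"
/* Sentinel meaning "no size recorded yet" (AOT_NEG_ONE in the original). */
#define SKETCH_UNSET ((uint32_t)-1)

/* Hypothetical stand-in for the relevant parts of AOTCompContext. */
typedef struct {
    uint32_t func_count;
    uint32_t *stack_sizes;      /* one slot per function, SKETCH_UNSET at start */
    const uint32_t *call_sizes; /* per-function stack_consumption_for_func_call */
} sketch_ctx;

static bool
sketch_record_stack_size(sketch_ctx *ctx, const char *name, size_t namelen,
                         size_t stack_size)
{
    char buf[64];
    uint32_t func_idx;

    assert(ctx != NULL && ctx->stack_sizes != NULL);

    /* The name arrives as (pointer, length) and is not NUL-terminated. */
    if (namelen >= sizeof(buf))
        return false;
    memcpy(buf, name, namelen);
    buf[namelen] = '\0';

    /* Ignore every symbol that does not use the internal prefix. */
    if (sscanf(buf, SKETCH_FUNC_INTERNAL_PREFIX "%" SCNu32, &func_idx) != 1)
        return false;
    if (func_idx >= ctx->func_count
        || ctx->stack_sizes[func_idx] != SKETCH_UNSET)
        return false;

    /* Frame size reported by codegen plus the cost of making a call. */
    ctx->stack_sizes[func_idx] =
        (uint32_t)(stack_size + ctx->call_sizes[func_idx]);
    return true;
}
```

Only names with the internal prefix are recorded, so sizes reported for other symbols such as `aot_func#N` are ignored.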

+ 21 - 5
core/iwasm/compilation/aot_llvm.h

@@ -153,9 +153,8 @@ typedef struct AOTMemInfo {
 typedef struct AOTFuncContext {
     AOTFunc *aot_func;
     LLVMValueRef func;
+    LLVMValueRef precheck_func;
     LLVMTypeRef func_type;
-    /* LLVM module for this function, note that in LAZY JIT mode,
-       each aot function belongs to an individual module */
     LLVMModuleRef module;
     AOTBlockStack block_stack;
 
@@ -167,7 +166,6 @@ typedef struct AOTFuncContext {
     LLVMValueRef aux_stack_bound;
     LLVMValueRef aux_stack_bottom;
     LLVMValueRef native_symbol;
-    LLVMValueRef last_alloca;
     LLVMValueRef func_ptrs;
 
     AOTMemInfo *mem_info;
@@ -184,6 +182,9 @@ typedef struct AOTFuncContext {
 #if WASM_ENABLE_DEBUG_AOT != 0
     LLVMMetadataRef debug_func;
 #endif
+
+    unsigned int stack_consumption_for_func_call;
+
     LLVMValueRef locals[1];
 } AOTFuncContext;
 
@@ -380,6 +381,11 @@ typedef struct AOTCompContext {
     /* LLVM floating-point exception behavior metadata */
     LLVMValueRef fp_exception_behavior;
 
+    /* a global array to store stack sizes */
+    LLVMTypeRef stack_sizes_type;
+    LLVMValueRef stack_sizes;
+    uint32 *jit_stack_sizes; /* for JIT */
+
     /* LLVM data types */
     AOTLLVMTypes basic_types;
     LLVMTypeRef exec_env_type;
@@ -408,6 +414,9 @@ typedef struct AOTCompContext {
      * file for some architecture (such as arc) */
     const char *external_asm_compiler;
     const char *asm_compiler_flags;
+
+    const char *stack_usage_file;
+    char stack_usage_temp_file[64];
 } AOTCompContext;
 
 enum {
@@ -511,8 +520,8 @@ void
 aot_checked_addr_list_destroy(AOTFuncContext *func_ctx);
 
 bool
-aot_build_zero_function_ret(AOTCompContext *comp_ctx, AOTFuncContext *func_ctx,
-                            AOTFuncType *func_type);
+aot_build_zero_function_ret(const AOTCompContext *comp_ctx,
+                            AOTFuncContext *func_ctx, AOTFuncType *func_type);
 
 LLVMValueRef
 aot_call_llvm_intrinsic(const AOTCompContext *comp_ctx,
@@ -556,6 +565,13 @@ bool
 aot_set_cond_br_weights(AOTCompContext *comp_ctx, LLVMValueRef cond_br,
                         int32 weights_true, int32 weights_false);
 
+bool
+aot_target_precheck_can_use_musttail(const AOTCompContext *comp_ctx);
+
+unsigned int
+aot_estimate_stack_usage_for_function_call(const AOTCompContext *comp_ctx,
+                                           const AOTFuncType *callee_func_type);
+
 #ifdef __cplusplus
 } /* end of extern "C" */
 #endif
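The two new `AOTCompContext` fields encode ownership by pointer identity. `stack_usage_file` points either at a caller-supplied path or at the context's own `stack_usage_temp_file` buffer, and `aot_destroy_comp_context` unlinks the file only in the latter case. Below is a minimal POSIX sketch of that pattern. `mkstemp` and the `/tmp/wamrc-su-XXXXXX` template stand in for `aot_generate_tempfile_name`, whose implementation is not part of this diff. `sketch_ctx`, `sketch_init` and `sketch_destroy` are hypothetical names, and the sketch always wants a stack-usage file, unlike the original, which only creates one when stack checking or estimation is enabled.

```c
#define _POSIX_C_SOURCE 200809L /* for mkstemp() and unlink() */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical, trimmed stand-in for the two new AOTCompContext fields. */
typedef struct {
    const char *stack_usage_file;   /* path handed on to LLVM */
    char stack_usage_temp_file[64]; /* storage used only when we own the file */
} sketch_ctx;

/* Use the caller's path if there is one, otherwise create a temp file. */
static int
sketch_init(sketch_ctx *ctx, const char *user_file)
{
    int fd;

    assert(ctx != NULL);
    if (user_file != NULL) {
        ctx->stack_usage_file = user_file;
        return 0;
    }
    snprintf(ctx->stack_usage_temp_file, sizeof(ctx->stack_usage_temp_file),
             "%s", "/tmp/wamrc-su-XXXXXX");
    fd = mkstemp(ctx->stack_usage_temp_file); /* fills in XXXXXX in place */
    if (fd < 0)
        return -1;
    close(fd);
    ctx->stack_usage_file = ctx->stack_usage_temp_file;
    return 0;
}

/* Remove the file only if it is the one this context created. */
static void
sketch_destroy(sketch_ctx *ctx)
{
    assert(ctx != NULL);
    if (ctx->stack_usage_file == ctx->stack_usage_temp_file)
        (void)unlink(ctx->stack_usage_temp_file);
}
```

Comparing pointers rather than strings avoids a separate "owned" flag: a user-supplied path can never alias the context's internal buffer.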

+ 12 - 2
core/iwasm/compilation/aot_llvm_extra.cpp

@@ -5,6 +5,8 @@
 
 #include <llvm/Passes/StandardInstrumentations.h>
 #include <llvm/Support/Error.h>
+#include <llvm/ADT/None.h>
+#include <llvm/ADT/Optional.h>
 #include <llvm/ADT/SmallVector.h>
 #include <llvm/ADT/Twine.h>
 #include <llvm/ADT/Triple.h>
@@ -233,7 +235,11 @@ aot_apply_llvm_new_pass_manager(AOTCompContext *comp_ctx, LLVMModuleRef module)
     PTO.SLPVectorization = true;
     PTO.LoopUnrolling = true;
 
-    Optional<PGOOptions> PGO = None;
+#if LLVM_VERSION_MAJOR >= 16
+    Optional<PGOOptions> PGO = std::nullopt;
+#else
+    Optional<PGOOptions> PGO = llvm::None;
+#endif
     if (comp_ctx->enable_llvm_pgo) {
         /* Disable static counter allocation for value profiler,
            it will be allocated by runtime */
@@ -348,12 +354,16 @@ aot_apply_llvm_new_pass_manager(AOTCompContext *comp_ctx, LLVMModuleRef module)
         FPM.addPass(LoadStoreVectorizerPass());
 
         if (comp_ctx->enable_llvm_pgo || comp_ctx->use_prof_file) {
-            LICMOptions licm_opt;
             /* LICM pass: loop invariant code motion, attempting to remove
                as much code from the body of a loop as possible. Experiments
                show it is good to enable it when pgo is enabled. */
+#if LLVM_VERSION_MAJOR >= 15
+            LICMOptions licm_opt;
             FPM.addPass(
                 createFunctionToLoopPassAdaptor(LICMPass(licm_opt), true));
+#else
+            FPM.addPass(createFunctionToLoopPassAdaptor(LICMPass(), true));
+#endif
         }
 
         /*

+ 44 - 0
core/iwasm/compilation/aot_llvm_extra2.cpp

@@ -4,7 +4,14 @@
  */
 
 #include <llvm-c/TargetMachine.h>
+#include <llvm/ADT/None.h>
+#include <llvm/ADT/Optional.h>
+#include <llvm/IR/Instructions.h>
+#if LLVM_VERSION_MAJOR >= 14
 #include <llvm/MC/TargetRegistry.h>
+#else
+#include <llvm/Support/TargetRegistry.h>
+#endif
 #include <llvm/Target/TargetMachine.h>
 
 #include "bh_assert.h"
@@ -16,7 +23,11 @@ convert(LLVMRelocMode reloc_mode)
 {
     switch (reloc_mode) {
         case LLVMRelocDefault:
+#if LLVM_VERSION_MAJOR >= 16
+            return std::nullopt;
+#else
             return llvm::None;
+#endif
         case LLVMRelocStatic:
             return llvm::Reloc::Static;
         case LLVMRelocPIC:
@@ -31,7 +42,11 @@ convert(LLVMRelocMode reloc_mode)
             return llvm::Reloc::ROPI_RWPI;
     }
     bh_assert(0);
+#if LLVM_VERSION_MAJOR >= 16
+    return std::nullopt;
+#else
     return llvm::None;
+#endif
 }
 
 static llvm::CodeGenOpt::Level
@@ -57,10 +72,18 @@ convert(LLVMCodeModel code_model, bool *jit)
     *jit = false;
     switch (code_model) {
         case LLVMCodeModelDefault:
+#if LLVM_VERSION_MAJOR >= 16
+            return std::nullopt;
+#else
             return llvm::None;
+#endif
         case LLVMCodeModelJITDefault:
             *jit = true;
+#if LLVM_VERSION_MAJOR >= 16
+            return std::nullopt;
+#else
             return llvm::None;
+#endif
         case LLVMCodeModelTiny:
             return llvm::CodeModel::Tiny;
         case LLVMCodeModelSmall:
@@ -73,7 +96,11 @@ convert(LLVMCodeModel code_model, bool *jit)
             return llvm::CodeModel::Large;
     }
     bh_assert(0);
+#if LLVM_VERSION_MAJOR >= 16
+    return std::nullopt;
+#else
     return llvm::None;
+#endif
 }
 
 LLVMTargetMachineRef
@@ -106,3 +133,20 @@ LLVMCreateTargetMachineWithOpts(LLVMTargetRef ctarget, const char *triple,
                                                      opts, rm, cm, ol, jit);
     return reinterpret_cast<LLVMTargetMachineRef>(targetmachine);
 }
+
+/* https://reviews.llvm.org/D153107 */
+#if LLVM_VERSION_MAJOR < 17
+using namespace llvm;
+
+LLVMTailCallKind
+LLVMGetTailCallKind(LLVMValueRef Call)
+{
+    return (LLVMTailCallKind)unwrap<CallInst>(Call)->getTailCallKind();
+}
+
+void
+LLVMSetTailCallKind(LLVMValueRef Call, LLVMTailCallKind kind)
+{
+    unwrap<CallInst>(Call)->setTailCallKind((CallInst::TailCallKind)kind);
+}
+#endif

+ 17 - 0
core/iwasm/compilation/aot_llvm_extra2.h

@@ -3,6 +3,7 @@
  * SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
  */
 
+#include <llvm/Config/llvm-config.h>
 #include <llvm-c/TargetMachine.h>
 
 LLVM_C_EXTERN_C_BEGIN
@@ -14,4 +15,20 @@ LLVMCreateTargetMachineWithOpts(LLVMTargetRef ctarget, const char *triple,
                                 LLVMCodeModel code_model,
                                 bool EmitStackSizeSection,
                                 const char *StackUsageOutput);
+
+/* https://reviews.llvm.org/D153107 */
+#if LLVM_VERSION_MAJOR < 17
+typedef enum {
+    LLVMTailCallKindNone = 0,
+    LLVMTailCallKindTail = 1,
+    LLVMTailCallKindMustTail = 2,
+    LLVMTailCallKindNoTail = 3,
+} LLVMTailCallKind;
+
+LLVMTailCallKind
+LLVMGetTailCallKind(LLVMValueRef CallInst);
+void
+LLVMSetTailCallKind(LLVMValueRef CallInst, LLVMTailCallKind kind);
+#endif
+
 LLVM_C_EXTERN_C_END

+ 31 - 5
core/iwasm/compilation/aot_orc_extra.cpp

@@ -8,6 +8,8 @@
 #include "llvm-c/OrcEE.h"
 #include "llvm-c/TargetMachine.h"
 
+#include "llvm/ADT/None.h"
+#include "llvm/ADT/Optional.h"
 #include "llvm/ExecutionEngine/Orc/JITTargetMachineBuilder.h"
 #include "llvm/ExecutionEngine/Orc/LLJIT.h"
 #include "llvm/ExecutionEngine/Orc/ObjectTransformLayer.h"
@@ -155,13 +157,29 @@ PartitionFunction(GlobalValueSet Requested)
             const char *wrapper;
             uint32 prefix_len = strlen(AOT_FUNC_PREFIX);
 
+            LOG_DEBUG("requested func %s", gvname);
             /* Convert "aot_func#n_wrapper" to "aot_func#n" */
-            if (strstr(gvname, AOT_FUNC_PREFIX)
-                && (wrapper = strstr(gvname + prefix_len, "_wrapper"))) {
+            if (strstr(gvname, AOT_FUNC_PREFIX)) {
                 char buf[16] = { 0 };
                 char func_name[64];
                 int group_stride, i, j;
-
+                int num;
+
+                /*
+                 * if the jit wrapper (which has "_wrapper" suffix in
+                 * the name) is requested, compile others in the group too.
+                 * otherwise, only compile the requested one.
+                 * (and possibly the corresponding wrapped function,
+                 * which has AOT_FUNC_INTERNAL_PREFIX.)
+                 */
+                wrapper = strstr(gvname + prefix_len, "_wrapper");
+                if (wrapper != NULL) {
+                    num = WASM_ORC_JIT_COMPILE_THREAD_NUM;
+                }
+                else {
+                    num = 1;
+                    wrapper = strchr(gvname + prefix_len, 0);
+                }
                 bh_assert(wrapper - (gvname + prefix_len) > 0);
                 /* Get AOT function index */
                 bh_memcpy_s(buf, (uint32)sizeof(buf), gvname + prefix_len,
@@ -171,10 +189,18 @@ PartitionFunction(GlobalValueSet Requested)
                 group_stride = WASM_ORC_JIT_BACKEND_THREAD_NUM;
 
                 /* Compile some functions each time */
-                for (j = 0; j < WASM_ORC_JIT_COMPILE_THREAD_NUM; j++) {
+                for (j = 0; j < num; j++) {
+                    Function *F1;
                     snprintf(func_name, sizeof(func_name), "%s%d",
                              AOT_FUNC_PREFIX, i + j * group_stride);
-                    Function *F1 = M->getFunction(func_name);
+                    F1 = M->getFunction(func_name);
+                    if (F1) {
+                        LOG_DEBUG("compile func %s", func_name);
+                        GVsToAdd.push_back(cast<GlobalValue>(F1));
+                    }
+                    snprintf(func_name, sizeof(func_name), "%s%d",
+                             AOT_FUNC_INTERNAL_PREFIX, i + j * group_stride);
+                    F1 = M->getFunction(func_name);
                     if (F1) {
                         LOG_DEBUG("compile func %s", func_name);
                         GVsToAdd.push_back(cast<GlobalValue>(F1));
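The reworked `PartitionFunction` compiles a whole group only when a `_wrapper` symbol is requested, and it now adds the matching `AOT_FUNC_INTERNAL_PREFIX` function alongside each `AOT_FUNC_PREFIX` one. The C sketch below lists the candidate names for one request. It assumes the prefixes are `aot_func#` and `aot_func_internal#`, that the function index is parsed with `atoi` from the digits after the prefix (those lines fall outside the hunk shown), and it uses made-up values for the two thread-count constants. Unlike the real pass, it requires the prefix at the start of the name and does not check whether each function exists in the module.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_FUNC_PREFIX "aot_func#"
#define SKETCH_FUNC_INTERNAL_PREFIX "aot_func_internal#"
/* Hypothetical values for WASM_ORC_JIT_COMPILE_THREAD_NUM and
   WASM_ORC_JIT_BACKEND_THREAD_NUM. */
#define SKETCH_COMPILE_THREAD_NUM 4
#define SKETCH_BACKEND_THREAD_NUM 2

/* Write the names to compile for `gvname` into `out`; return how many. */
static int
sketch_partition(const char *gvname, char out[][64], int max_out)
{
    size_t prefix_len = strlen(SKETCH_FUNC_PREFIX);
    const char *digits, *end;
    char buf[16] = { 0 };
    int num, i, j, n = 0;

    assert(gvname != NULL && out != NULL);
    if (strncmp(gvname, SKETCH_FUNC_PREFIX, prefix_len) != 0)
        return 0;
    digits = gvname + prefix_len;

    end = strstr(digits, "_wrapper");
    if (end != NULL) {
        num = SKETCH_COMPILE_THREAD_NUM; /* wrapper requested: whole group */
    }
    else {
        num = 1; /* plain function requested: just this one */
        end = digits + strlen(digits);
    }
    if (end <= digits || (size_t)(end - digits) >= sizeof(buf))
        return 0;
    memcpy(buf, digits, (size_t)(end - digits));
    i = atoi(buf);

    for (j = 0; j < num && n + 2 <= max_out; j++) {
        int idx = i + j * SKETCH_BACKEND_THREAD_NUM;
        /* AOT_FUNC_PREFIX name and its AOT_FUNC_INTERNAL_PREFIX counterpart. */
        snprintf(out[n++], 64, "%s%d", SKETCH_FUNC_PREFIX, idx);
        snprintf(out[n++], 64, "%s%d", SKETCH_FUNC_INTERNAL_PREFIX, idx);
    }
    return n;
}
```

With these values, requesting `aot_func#3_wrapper` yields indices 3, 5, 7 and 9 (stride 2), while requesting `aot_func#3` yields only index 3 and its internal counterpart.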

+ 5 - 0
core/iwasm/compilation/aot_orc_extra.h

@@ -71,5 +71,10 @@ LLVMOrcLLLazyJITGetIRTransformLayer(LLVMOrcLLLazyJITRef J);
 LLVMOrcObjectTransformLayerRef
 LLVMOrcLLLazyJITGetObjTransformLayer(LLVMOrcLLLazyJITRef J);
 
+void
+LLVMOrcLLJITBuilderSetCompileFuncitonCreatorWithStackSizesCallback(
+    LLVMOrcLLLazyJITBuilderRef Builder,
+    void (*cb)(void *, const char *, size_t, size_t), void *cb_data);
+
 LLVM_C_EXTERN_C_END
 #endif

+ 145 - 0
core/iwasm/compilation/aot_orc_extra2.cpp

@@ -0,0 +1,145 @@
+/*
+ * Copyright (C) 2023 Midokura Japan KK.  All rights reserved.
+ * SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+ */
+
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+
+#include "llvm/ExecutionEngine/Orc/CompileUtils.h"
+#include "llvm/ExecutionEngine/Orc/LLJIT.h"
+#include "llvm/IR/LegacyPassManager.h"
+#include "llvm/Object/ObjectFile.h"
+#include "llvm/Support/SmallVectorMemoryBuffer.h"
+#include "llvm/CodeGen/Passes.h"
+#include "llvm/CodeGen/MachineFrameInfo.h"
+#include "llvm/CodeGen/MachineFunctionPass.h"
+
+#include "aot_orc_extra.h"
+#include "bh_log.h"
+
+typedef void (*cb_t)(void *, const char *, size_t, size_t);
+
+class MyCompiler : public llvm::orc::IRCompileLayer::IRCompiler
+{
+  public:
+    MyCompiler(llvm::orc::JITTargetMachineBuilder JTMB, cb_t cb, void *cb_data);
+    llvm::Expected<llvm::orc::SimpleCompiler::CompileResult> operator()(
+        llvm::Module &M) override;
+
+  private:
+    llvm::orc::JITTargetMachineBuilder JTMB;
+
+    cb_t cb;
+    void *cb_data;
+};
+
+MyCompiler::MyCompiler(llvm::orc::JITTargetMachineBuilder JTMB, cb_t cb,
+                       void *cb_data)
+  : IRCompiler(llvm::orc::irManglingOptionsFromTargetOptions(JTMB.getOptions()))
+  , JTMB(std::move(JTMB))
+  , cb(cb)
+  , cb_data(cb_data)
+{}
+
+class PrintStackSizes : public llvm::MachineFunctionPass
+{
+  public:
+    PrintStackSizes(cb_t cb, void *cb_data);
+    bool runOnMachineFunction(llvm::MachineFunction &MF) override;
+    static char ID;
+
+  private:
+    cb_t cb;
+    void *cb_data;
+};
+
+PrintStackSizes::PrintStackSizes(cb_t cb, void *cb_data)
+  : MachineFunctionPass(ID)
+  , cb(cb)
+  , cb_data(cb_data)
+{}
+
+char PrintStackSizes::ID = 0;
+
+bool
+PrintStackSizes::runOnMachineFunction(llvm::MachineFunction &MF)
+{
+    auto name = MF.getName();
+    auto MFI = &MF.getFrameInfo();
+    size_t sz = MFI->getStackSize();
+    cb(cb_data, name.data(), name.size(), sz);
+    return false;
+}
+
+class MyPassManager : public llvm::legacy::PassManager
+{
+  public:
+    void add(llvm::Pass *P) override;
+};
+
+void
+MyPassManager::add(llvm::Pass *P)
+{
+    // a hack to avoid having a copy of the whole addPassesToEmitMC.
+    // we want to add PrintStackSizes before FreeMachineFunctionPass.
+    if (P->getPassName() == "Free MachineFunction") {
+        return;
+    }
+    llvm::legacy::PassManager::add(P);
+}
+
+// a modified copy from llvm/lib/ExecutionEngine/Orc/CompileUtils.cpp
+llvm::Expected<llvm::orc::SimpleCompiler::CompileResult>
+MyCompiler::operator()(llvm::Module &M)
+{
+    auto TM = cantFail(JTMB.createTargetMachine());
+    llvm::SmallVector<char, 0> ObjBufferSV;
+
+    {
+        llvm::raw_svector_ostream ObjStream(ObjBufferSV);
+
+        MyPassManager PM;
+        llvm::MCContext *Ctx;
+        if (TM->addPassesToEmitMC(PM, Ctx, ObjStream))
+            return llvm::make_error<llvm::StringError>(
+                "Target does not support MC emission",
+                llvm::inconvertibleErrorCode());
+        PM.add(new PrintStackSizes(cb, cb_data));
+        dynamic_cast<llvm::legacy::PassManager *>(&PM)->add(
+            llvm::createFreeMachineFunctionPass());
+        PM.run(M);
+    }
+
+#if LLVM_VERSION_MAJOR > 13
+    auto ObjBuffer = std::make_unique<llvm::SmallVectorMemoryBuffer>(
+        std::move(ObjBufferSV),
+        M.getModuleIdentifier() + "-jitted-objectbuffer",
+        /*RequiresNullTerminator=*/false);
+#else
+    auto ObjBuffer = std::make_unique<llvm::SmallVectorMemoryBuffer>(
+        std::move(ObjBufferSV),
+        M.getModuleIdentifier() + "-jitted-objectbuffer");
+#endif
+
+    return std::move(ObjBuffer);
+}
+
+DEFINE_SIMPLE_CONVERSION_FUNCTIONS(llvm::orc::LLLazyJITBuilder,
+                                   LLVMOrcLLLazyJITBuilderRef)
+
+void
+LLVMOrcLLJITBuilderSetCompileFuncitonCreatorWithStackSizesCallback(
+    LLVMOrcLLLazyJITBuilderRef Builder,
+    void (*cb)(void *, const char *, size_t, size_t), void *cb_data)
+{
+    auto b = unwrap(Builder);
+    b->setCompileFunctionCreator(
+        [cb, cb_data](llvm::orc::JITTargetMachineBuilder JTMB)
+            -> llvm::Expected<
+                std::unique_ptr<llvm::orc::IRCompileLayer::IRCompiler>> {
+            return std::make_unique<MyCompiler>(
+                MyCompiler(std::move(JTMB), cb, cb_data));
+        });
+}

+ 14 - 11
core/iwasm/compilation/debug/dwarf_extractor.cpp

@@ -114,7 +114,7 @@ destroy_dwarf_extractor(dwar_extractor_handle_t handle)
 }
 
 LLVMMetadataRef
-dwarf_gen_file_info(AOTCompContext *comp_ctx)
+dwarf_gen_file_info(const AOTCompContext *comp_ctx)
 {
     dwar_extractor *extractor;
     int units_number;
@@ -191,7 +191,7 @@ dwarf_gen_mock_vm_info(AOTCompContext *comp_ctx)
 #endif
 
 LLVMMetadataRef
-dwarf_gen_comp_unit_info(AOTCompContext *comp_ctx)
+dwarf_gen_comp_unit_info(const AOTCompContext *comp_ctx)
 {
     dwar_extractor *extractor;
     int units_number;
@@ -257,7 +257,7 @@ lldb_get_basic_type_encoding(BasicType basic_type)
 }
 
 static LLVMMetadataRef
-lldb_type_to_type_dbi(AOTCompContext *comp_ctx, SBType &type)
+lldb_type_to_type_dbi(const AOTCompContext *comp_ctx, SBType &type)
 {
     LLVMMetadataRef type_info = NULL;
     BasicType basic_type = type.GetBasicType();
@@ -282,8 +282,9 @@ lldb_type_to_type_dbi(AOTCompContext *comp_ctx, SBType &type)
 }
 
 static LLVMMetadataRef
-lldb_function_to_function_dbi(AOTCompContext *comp_ctx, SBSymbolContext &sc,
-                              AOTFuncContext *func_ctx)
+lldb_function_to_function_dbi(const AOTCompContext *comp_ctx,
+                              SBSymbolContext &sc,
+                              const AOTFuncContext *func_ctx)
 {
     SBFunction function(sc.GetFunction());
     const char *function_name = function.GetName();
@@ -388,7 +389,8 @@ lldb_function_to_function_dbi(AOTCompContext *comp_ctx, SBSymbolContext &sc,
 }
 
 LLVMMetadataRef
-dwarf_gen_func_info(AOTCompContext *comp_ctx, AOTFuncContext *func_ctx)
+dwarf_gen_func_info(const AOTCompContext *comp_ctx,
+                    const AOTFuncContext *func_ctx)
 {
     LLVMMetadataRef func_info = NULL;
     dwar_extractor *extractor;
@@ -417,8 +419,8 @@ dwarf_gen_func_info(AOTCompContext *comp_ctx, AOTFuncContext *func_ctx)
 }
 
 void
-dwarf_get_func_name(AOTCompContext *comp_ctx, AOTFuncContext *func_ctx,
-                    char *name, int len)
+dwarf_get_func_name(const AOTCompContext *comp_ctx,
+                    const AOTFuncContext *func_ctx, char *name, int len)
 {
     LLVMMetadataRef func_info = NULL;
     dwar_extractor *extractor;
@@ -448,8 +450,8 @@ dwarf_get_func_name(AOTCompContext *comp_ctx, AOTFuncContext *func_ctx,
 }
 
 LLVMMetadataRef
-dwarf_gen_location(AOTCompContext *comp_ctx, AOTFuncContext *func_ctx,
-                   uint64_t vm_offset)
+dwarf_gen_location(const AOTCompContext *comp_ctx,
+                   const AOTFuncContext *func_ctx, uint64_t vm_offset)
 {
     LLVMMetadataRef location_info = NULL;
     dwar_extractor *extractor;
@@ -487,7 +489,8 @@ dwarf_gen_location(AOTCompContext *comp_ctx, AOTFuncContext *func_ctx,
 }
 
 LLVMMetadataRef
-dwarf_gen_func_ret_location(AOTCompContext *comp_ctx, AOTFuncContext *func_ctx)
+dwarf_gen_func_ret_location(const AOTCompContext *comp_ctx,
+                            const AOTFuncContext *func_ctx)
 {
     LLVMMetadataRef func_info = NULL;
     dwar_extractor *extractor;

+ 10 - 8
core/iwasm/compilation/debug/dwarf_extractor.h

@@ -30,24 +30,26 @@ dwar_extractor_handle_t
 create_dwarf_extractor(aot_comp_data_t comp_data, char *file_name);
 
 LLVMMetadataRef
-dwarf_gen_file_info(AOTCompContext *comp_ctx);
+dwarf_gen_file_info(const AOTCompContext *comp_ctx);
 
 LLVMMetadataRef
-dwarf_gen_comp_unit_info(AOTCompContext *comp_ctx);
+dwarf_gen_comp_unit_info(const AOTCompContext *comp_ctx);
 
 LLVMMetadataRef
-dwarf_gen_func_info(AOTCompContext *comp_ctx, AOTFuncContext *func_ctx);
+dwarf_gen_func_info(const AOTCompContext *comp_ctx,
+                    const AOTFuncContext *func_ctx);
 
 LLVMMetadataRef
-dwarf_gen_location(AOTCompContext *comp_ctx, AOTFuncContext *func_ctx,
-                   uint64_t vm_offset);
+dwarf_gen_location(const AOTCompContext *comp_ctx,
+                   const AOTFuncContext *func_ctx, uint64_t vm_offset);
 
 LLVMMetadataRef
-dwarf_gen_func_ret_location(AOTCompContext *comp_ctx, AOTFuncContext *func_ctx);
+dwarf_gen_func_ret_location(const AOTCompContext *comp_ctx,
+                            const AOTFuncContext *func_ctx);
 
 void
-dwarf_get_func_name(AOTCompContext *comp_ctx, AOTFuncContext *func_ctx,
-                    char *name, int len);
+dwarf_get_func_name(const AOTCompContext *comp_ctx,
+                    const AOTFuncContext *func_ctx, char *name, int len);
 
 #ifdef __cplusplus
 }

+ 25 - 20
core/iwasm/interpreter/wasm_loader.c

@@ -9255,10 +9255,12 @@ re_scan:
 #if (WASM_ENABLE_WAMR_COMPILER != 0) || (WASM_ENABLE_JIT != 0)
             case WASM_OP_SIMD_PREFIX:
             {
-                opcode = read_uint8(p);
+                uint32 opcode1;
+
+                opcode1 = read_uint8(p);
                 /* follow the order of enum WASMSimdEXTOpcode in wasm_opcode.h
                  */
-                switch (opcode) {
+                switch (opcode1) {
                     /* memory instruction */
                     case SIMD_v128_load:
                     case SIMD_v128_load8x8_s:
@@ -9276,7 +9278,7 @@ re_scan:
 
                         read_leb_uint32(p, p_end, align); /* align */
                         if (!check_simd_memory_access_align(
-                                opcode, align, error_buf, error_buf_size)) {
+                                opcode1, align, error_buf, error_buf_size)) {
                             goto fail;
                         }
 
@@ -9295,7 +9297,7 @@ re_scan:
 
                         read_leb_uint32(p, p_end, align); /* align */
                         if (!check_simd_memory_access_align(
-                                opcode, align, error_buf, error_buf_size)) {
+                                opcode1, align, error_buf, error_buf_size)) {
                             goto fail;
                         }
 
@@ -9351,7 +9353,7 @@ re_scan:
                         uint8 pop_type[] = { VALUE_TYPE_I32, VALUE_TYPE_I32,
                                              VALUE_TYPE_I32, VALUE_TYPE_I64,
                                              VALUE_TYPE_F32, VALUE_TYPE_F64 };
-                        POP_AND_PUSH(pop_type[opcode - SIMD_i8x16_splat],
+                        POP_AND_PUSH(pop_type[opcode1 - SIMD_i8x16_splat],
                                      VALUE_TYPE_V128);
                         break;
                     }
@@ -9396,22 +9398,23 @@ re_scan:
 
                         CHECK_BUF(p, p_end, 1);
                         lane = read_uint8(p);
-                        if (!check_simd_access_lane(opcode, lane, error_buf,
+                        if (!check_simd_access_lane(opcode1, lane, error_buf,
                                                     error_buf_size)) {
                             goto fail;
                         }
 
-                        if (replace[opcode - SIMD_i8x16_extract_lane_s]) {
+                        if (replace[opcode1 - SIMD_i8x16_extract_lane_s]) {
                             if (!(wasm_loader_pop_frame_ref(
                                     loader_ctx,
-                                    replace[opcode - SIMD_i8x16_extract_lane_s],
+                                    replace[opcode1
+                                            - SIMD_i8x16_extract_lane_s],
                                     error_buf, error_buf_size)))
                                 goto fail;
                         }
 
                         POP_AND_PUSH(
                             VALUE_TYPE_V128,
-                            push_type[opcode - SIMD_i8x16_extract_lane_s]);
+                            push_type[opcode1 - SIMD_i8x16_extract_lane_s]);
                         break;
                     }
 
@@ -9512,7 +9515,7 @@ re_scan:
 
                         read_leb_uint32(p, p_end, align); /* align */
                         if (!check_simd_memory_access_align(
-                                opcode, align, error_buf, error_buf_size)) {
+                                opcode1, align, error_buf, error_buf_size)) {
                             goto fail;
                         }
 
@@ -9520,14 +9523,14 @@ re_scan:
 
                         CHECK_BUF(p, p_end, 1);
                         lane = read_uint8(p);
-                        if (!check_simd_access_lane(opcode, lane, error_buf,
+                        if (!check_simd_access_lane(opcode1, lane, error_buf,
                                                     error_buf_size)) {
                             goto fail;
                         }
 
                         POP_V128();
                         POP_I32();
-                        if (opcode < SIMD_v128_store8_lane) {
+                        if (opcode1 < SIMD_v128_store8_lane) {
                             PUSH_V128();
                         }
 #if WASM_ENABLE_JIT != 0 || WASM_ENABLE_WAMR_COMPILER != 0
@@ -9543,7 +9546,7 @@ re_scan:
 
                         read_leb_uint32(p, p_end, align); /* align */
                         if (!check_simd_memory_access_align(
-                                opcode, align, error_buf, error_buf_size)) {
+                                opcode1, align, error_buf, error_buf_size)) {
                             goto fail;
                         }
 
@@ -9900,7 +9903,7 @@ re_scan:
                             snprintf(error_buf, error_buf_size,
                                      "WASM module load failed: "
                                      "invalid opcode 0xfd %02x.",
-                                     opcode);
+                                     opcode1);
                         }
                         goto fail;
                     }
@@ -9913,15 +9916,17 @@ re_scan:
 #if WASM_ENABLE_SHARED_MEMORY != 0
             case WASM_OP_ATOMIC_PREFIX:
             {
-                opcode = read_uint8(p);
+                uint32 opcode1;
+
+                opcode1 = read_uint8(p);
 #if WASM_ENABLE_FAST_INTERP != 0
-                emit_byte(loader_ctx, opcode);
+                emit_byte(loader_ctx, opcode1);
 #endif
-                if (opcode != WASM_OP_ATOMIC_FENCE) {
+                if (opcode1 != WASM_OP_ATOMIC_FENCE) {
                     CHECK_MEMORY();
                     read_leb_uint32(p, p_end, align);      /* align */
                     read_leb_uint32(p, p_end, mem_offset); /* offset */
-                    if (!check_memory_align_equal(opcode, align, error_buf,
+                    if (!check_memory_align_equal(opcode1, align, error_buf,
                                                   error_buf_size)) {
                         goto fail;
                     }
@@ -9932,7 +9937,7 @@ re_scan:
 #if WASM_ENABLE_JIT != 0 || WASM_ENABLE_WAMR_COMPILER != 0
                 func->has_memory_operations = true;
 #endif
-                switch (opcode) {
+                switch (opcode1) {
                     case WASM_OP_ATOMIC_NOTIFY:
                         POP2_AND_PUSH(VALUE_TYPE_I32, VALUE_TYPE_I32);
                         break;
@@ -10048,7 +10053,7 @@ re_scan:
                     default:
                         set_error_buf_v(error_buf, error_buf_size,
                                         "%s %02x %02x", "unsupported opcode",
-                                        0xfe, opcode);
+                                        0xfe, opcode1);
                         goto fail;
                 }
                 break;
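Across these loader hunks, the sub-opcode that follows the 0xfd (SIMD) and 0xfe (atomic) prefixes moves out of the shared `opcode` variable into a block-local `uint32 opcode1`. In the WebAssembly binary format, the sub-opcode after 0xfd is a LEB128-encoded u32 and can exceed 0xff (relaxed-SIMD opcodes start at 0x100). A dedicated 32-bit variable therefore avoids truncation, and the outer `opcode` is no longer overwritten. The sketch below decodes a prefix byte plus a LEB128 u32 sub-opcode. `read_leb_u32` and `read_prefixed_opcode` are hypothetical helpers, not WAMR's `read_leb_uint32` macro. Note that the atomic branch in this diff still reads its sub-opcode with `read_uint8`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Decode an unsigned LEB128 value that must fit in 32 bits.
   Advances *pp on success; returns false on truncated or oversized input. */
static bool
read_leb_u32(const uint8_t **pp, const uint8_t *end, uint32_t *out)
{
    const uint8_t *p = *pp;
    uint32_t result = 0;
    unsigned shift = 0;

    while (p < end) {
        uint8_t byte = *p++;
        /* The 5th byte may only carry the top 4 bits of a u32. */
        if (shift == 28 && (byte & 0x70) != 0)
            return false;
        result |= (uint32_t)(byte & 0x7f) << shift;
        if ((byte & 0x80) == 0) {
            *pp = p;
            *out = result;
            return true;
        }
        shift += 7;
        if (shift > 28) /* continuation bit set on the 5th byte */
            return false;
    }
    return false; /* input ended mid-value */
}

/* Read a prefix byte (e.g. 0xfd) followed by a LEB128 u32 sub-opcode.
   The sub-opcode gets its own uint32_t, so values above 0xff survive. */
static bool
read_prefixed_opcode(const uint8_t **pp, const uint8_t *end, uint8_t *prefix,
                     uint32_t *sub_opcode)
{
    if (*pp >= end)
        return false;
    *prefix = **pp;
    (*pp)++;
    return read_leb_u32(pp, end, sub_opcode);
}
```

For the byte sequence `0xfd 0x80 0x02`, the sketch yields prefix 0xfd and sub-opcode 256. Storing that value in a `uint8` would silently turn it into 0.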

+ 8 - 19
core/iwasm/interpreter/wasm_runtime.c

@@ -2097,16 +2097,6 @@ wasm_instantiate(WASMModule *module, bool is_sub_inst,
     }
 #endif
 
-#if WASM_ENABLE_WASI_NN != 0
-    if (!is_sub_inst) {
-        if (!(module_inst->e->wasi_nn_ctx = wasi_nn_initialize())) {
-            set_error_buf(error_buf, error_buf_size,
-                          "wasi nn initialization failed");
-            goto fail;
-        }
-    }
-#endif
-
 #if WASM_ENABLE_DEBUG_INTERP != 0
     if (!is_sub_inst) {
         /* Add module instance into module's instance list */
@@ -2265,11 +2255,8 @@ wasm_deinstantiate(WASMModuleInstance *module_inst, bool is_sub_inst)
         wasm_runtime_free(module_inst->e->c_api_func_imports);
 
 #if WASM_ENABLE_WASI_NN != 0
-    if (!is_sub_inst) {
-        WASINNContext *wasi_nn_ctx = module_inst->e->wasi_nn_ctx;
-        if (wasi_nn_ctx)
-            wasi_nn_destroy(wasi_nn_ctx);
-    }
+    if (!is_sub_inst)
+        wasi_nn_destroy(module_inst);
 #endif
 
     wasm_runtime_free(module_inst);
@@ -3048,12 +3035,14 @@ wasm_interp_dump_call_stack(struct WASMExecEnv *exec_env, bool print, char *buf,
 
         /* function name not exported, print number instead */
         if (frame.func_name_wp == NULL) {
-            line_length = snprintf(line_buf, sizeof(line_buf), "#%02d $f%d\n",
-                                   n, frame.func_index);
+            line_length =
+                snprintf(line_buf, sizeof(line_buf),
+                         "#%02" PRIu32 " $f%" PRIu32 "\n", n, frame.func_index);
         }
         else {
-            line_length = snprintf(line_buf, sizeof(line_buf), "#%02d %s\n", n,
-                                   frame.func_name_wp);
+            line_length =
+                snprintf(line_buf, sizeof(line_buf), "#%02" PRIu32 " %s\n", n,
+                         frame.func_name_wp);
         }
 
         if (line_length >= sizeof(line_buf)) {
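The `wasm_interp_dump_call_stack` hunk replaces `%d` with `PRIu32` from `<inttypes.h>`, which is the portable conversion for `uint32_t`, and keeps the `line_length >= sizeof(line_buf)` truncation check. Here is a minimal standalone sketch of the same formatting. `format_frame` is a hypothetical helper, and it assumes that both the frame number and the function index are `uint32_t`, as the `PRIu32` specifiers imply.

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Format one call-stack line: "#NN $fIDX" when the function has no
   exported name, "#NN name" otherwise. Returns false if the output
   did not fit in buf. */
static bool
format_frame(char *buf, size_t size, uint32_t n, uint32_t func_index,
             const char *func_name)
{
    int len;

    if (func_name == NULL)
        len = snprintf(buf, size, "#%02" PRIu32 " $f%" PRIu32 "\n", n,
                       func_index);
    else
        len = snprintf(buf, size, "#%02" PRIu32 " %s\n", n, func_name);

    /* snprintf returns the length it wanted to write: a value >= size
       means truncation, and a negative value means an encoding error. */
    return len >= 0 && (size_t)len < size;
}
```

With a 4-byte buffer, `"#03 $f42\n"` is truncated and the helper reports failure. This is the case that the original `line_length >= sizeof(line_buf)` branch handles.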

+ 0 - 4
core/iwasm/interpreter/wasm_runtime.h

@@ -241,10 +241,6 @@ typedef struct WASMModuleInstanceExtra {
         && WASM_ENABLE_LAZY_JIT != 0)
     WASMModuleInstance *next;
 #endif
-
-#if WASM_ENABLE_WASI_NN != 0
-    WASINNContext *wasi_nn_ctx;
-#endif
 } WASMModuleInstanceExtra;
 
 struct AOTFuncPerfProfInfo;

+ 28 - 0
core/iwasm/libraries/libc-uvwasi/FindLIBUV.cmake

@@ -0,0 +1,28 @@
+# Copyright (C) 2023 Intel Corporation.  All rights reserved.
+# SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+
+# Find libuv library
+# This module defines
+#  LIBUV_FOUND, if false, do not try to link to libuv
+#  LIBUV_LIBRARIES
+#  LIBUV_INCLUDE_DIR, where to find uv.h
+
+find_path(LIBUV_INCLUDE_DIR NAMES uv.h)
+find_library(LIBUV_LIBRARIES NAMES uv libuv)
+
+include(FindPackageHandleStandardArgs)
+
+find_package_handle_standard_args(
+  LIBUV
+  FOUND_VAR LIBUV_FOUND
+  REQUIRED_VARS
+    LIBUV_LIBRARIES
+    LIBUV_INCLUDE_DIR
+)
+
+if(WIN32)
+  list(APPEND LIBUV_LIBRARIES iphlpapi)
+  list(APPEND LIBUV_LIBRARIES psapi)
+  list(APPEND LIBUV_LIBRARIES userenv)
+  list(APPEND LIBUV_LIBRARIES ws2_32)
+endif()

+ 25 - 0
core/iwasm/libraries/libc-uvwasi/FindUVWASI.cmake

@@ -0,0 +1,25 @@
+# Copyright (C) 2023 Intel Corporation.  All rights reserved.
+# SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+
+# Find libuvwasi library
+# This module defines
+#  UVWASI_FOUND, if false, do not try to link to libuvwasi
+#  UVWASI_LIBRARIES
+#  UVWASI_INCLUDE_DIR, where to find headers
+
+find_path(UVWASI_INCLUDE_DIR NAMES uvwasi.h wasi_serdes.h wasi_types.h PATH_SUFFIXES uvwasi)
+find_library(UVWASI_LIBRARIES NAMES uvwasi_a)
+
+include(FindPackageHandleStandardArgs)
+
+find_package_handle_standard_args(
+  UVWASI
+  FOUND_VAR UVWASI_FOUND
+  REQUIRED_VARS
+    UVWASI_LIBRARIES
+    UVWASI_INCLUDE_DIR
+)
+
+if(UVWASI_FOUND)
+  set(UVWASI_INCLUDE_DIR ${UVWASI_INCLUDE_DIR}/uvwasi)
+endif()

+ 42 - 25
core/iwasm/libraries/libc-uvwasi/libc_uvwasi.cmake

@@ -9,36 +9,53 @@ add_definitions (-DWASM_ENABLE_LIBC_WASI=1 -DWASM_ENABLE_UVWASI=1)
 
 include(FetchContent)
 
+# Point CMake at the custom modules to find libuv and uvwasi
+list(APPEND CMAKE_MODULE_PATH "${CMAKE_CURRENT_LIST_DIR}")
+
 ## libuv
-FetchContent_Declare(
-    libuv
-    GIT_REPOSITORY https://github.com/libuv/libuv.git
-    GIT_TAG ${LIBUV_VERSION}
-)
-FetchContent_GetProperties(libuv)
-if (NOT libuv_POPULATED)
-    message("-- Fetching libuv ..")
-    FetchContent_Populate(libuv)
-    include_directories("${libuv_SOURCE_DIR}/include")
-    add_subdirectory(${libuv_SOURCE_DIR} ${libuv_BINARY_DIR} EXCLUDE_FROM_ALL)
-    set (UV_A_LIBS uv_a)
-    set_target_properties(uv_a PROPERTIES POSITION_INDEPENDENT_CODE 1)
+find_package(LIBUV QUIET)
+if (LIBUV_FOUND)
+    include_directories(${LIBUV_INCLUDE_DIR})
+else()
+    FetchContent_Declare(
+        libuv
+        GIT_REPOSITORY https://github.com/libuv/libuv.git
+        GIT_TAG ${LIBUV_VERSION}
+    )
+    FetchContent_GetProperties(libuv)
+    if (NOT libuv_POPULATED)
+        message("-- Fetching libuv ..")
+        FetchContent_Populate(libuv)
+        include_directories("${libuv_SOURCE_DIR}/include")
+        add_subdirectory(${libuv_SOURCE_DIR} ${libuv_BINARY_DIR} EXCLUDE_FROM_ALL)
+        set (LIBUV_LIBRARIES uv_a)
+        set_target_properties(uv_a PROPERTIES POSITION_INDEPENDENT_CODE 1)
+    endif()
 endif()
 
 ## uvwasi
-FetchContent_Declare(
-    uvwasi
-    GIT_REPOSITORY https://github.com/nodejs/uvwasi.git
-    GIT_TAG main
-)
-FetchContent_GetProperties(uvwasi)
-if (NOT uvwasi_POPULATED)
-    message("-- Fetching uvwasi ..")
-    FetchContent_Populate(uvwasi)
-    include_directories("${uvwasi_SOURCE_DIR}/include")
-    add_subdirectory(${uvwasi_SOURCE_DIR} ${uvwasi_BINARY_DIR} EXCLUDE_FROM_ALL)
+find_package(UVWASI QUIET)
+if (UVWASI_FOUND)
+    include_directories(${UVWASI_INCLUDE_DIR})
+else()
+    FetchContent_Declare(
+        uvwasi
+        GIT_REPOSITORY https://github.com/nodejs/uvwasi.git
+        GIT_TAG main
+    )
+    FetchContent_GetProperties(uvwasi)
+    if (NOT uvwasi_POPULATED)
+        message("-- Fetching uvwasi ..")
+        FetchContent_Populate(uvwasi)
+        include_directories("${uvwasi_SOURCE_DIR}/include")
+        add_subdirectory(${uvwasi_SOURCE_DIR} ${uvwasi_BINARY_DIR} EXCLUDE_FROM_ALL)
+        set (UVWASI_LIBRARIES uvwasi_a)
+        set_target_properties(uvwasi_a PROPERTIES POSITION_INDEPENDENT_CODE 1)
+    endif()
 endif()
 
-file (GLOB_RECURSE source_all ${LIBC_WASI_DIR}/*.c ${uvwasi_SOURCE_DIR}/src/*.c)
+set (UV_A_LIBS ${LIBUV_LIBRARIES} ${UVWASI_LIBRARIES})
+
+file (GLOB_RECURSE source_all ${LIBC_WASI_DIR}/*.c)
 
 set (LIBC_WASI_SOURCE ${source_all})

+ 2 - 0
core/iwasm/libraries/libc-wasi/libc_wasi_wrapper.c

@@ -56,11 +56,13 @@ typedef struct WASIContext *wasi_ctx_t;
 wasi_ctx_t
 wasm_runtime_get_wasi_ctx(wasm_module_inst_t module_inst);
 
+#if WASM_ENABLE_THREAD_MGR != 0
 static inline uint64_t
 min_uint64(uint64_t a, uint64_t b)
 {
     return a > b ? b : a;
 }
+#endif
 
 static inline uint32_t
 min_uint32(uint32_t a, uint32_t b)

+ 8 - 4
core/iwasm/libraries/wasi-nn/README.md

@@ -55,8 +55,10 @@ Tests: passed!
 
 ```
 docker run \
-    -v $PWD/core/iwasm/libraries/wasi-nn/test:/assets wasi-nn-cpu \
-    --dir=/assets \
+    -v $PWD/core/iwasm/libraries/wasi-nn/test:/assets \
+    -v $PWD/core/iwasm/libraries/wasi-nn/test/models:/models \
+    wasi-nn-cpu \
+    --dir=/ \
     --env="TARGET=cpu" \
     /assets/test_tensorflow.wasm
 ```
@@ -66,8 +68,10 @@ docker run \
 ```
 docker run \
     --runtime=nvidia \
-    -v $PWD/core/iwasm/libraries/wasi-nn/test:/assets wasi-nn-nvidia-gpu \
-    --dir=/assets \
+    -v $PWD/core/iwasm/libraries/wasi-nn/test:/assets \
+    -v $PWD/core/iwasm/libraries/wasi-nn/test/models:/models \
+    wasi-nn-nvidia-gpu \
+    --dir=/ \
     --env="TARGET=gpu" \
     /assets/test_tensorflow.wasm
 ```

+ 23 - 26
core/iwasm/libraries/wasi-nn/cmake/Findtensorflow_lite.cmake

@@ -7,35 +7,32 @@ find_library(TENSORFLOW_LITE
 )
 
 if(NOT EXISTS ${TENSORFLOW_LITE})
-    if (NOT EXISTS "${WAMR_ROOT_DIR}/core/deps/tensorflow-src")
-        execute_process(COMMAND ${WAMR_ROOT_DIR}/core/deps/install_tensorflow.sh
-                        RESULT_VARIABLE TENSORFLOW_RESULT
-        )
-    else ()
-        message("Tensorflow is already downloaded.")
-    endif()
-    set(TENSORFLOW_SOURCE_DIR "${WAMR_ROOT_DIR}/core/deps/tensorflow-src")
-
-    if (WASI_NN_ENABLE_GPU EQUAL 1)
+  if(NOT EXISTS "${WAMR_ROOT_DIR}/core/deps/tensorflow-src")
+    execute_process(
+      COMMAND "${WAMR_ROOT_DIR}/core/deps/install_tensorflow.sh"
+      RESULT_VARIABLE TENSORFLOW_RESULT
+    )
+  else()
+    message("Tensorflow is already downloaded.")
+  endif()
+
+  set(TENSORFLOW_SOURCE_DIR "${WAMR_ROOT_DIR}/core/deps/tensorflow-src")
+
+  if(WASI_NN_ENABLE_GPU EQUAL 1)
     # Tensorflow specific:
     # * https://www.tensorflow.org/lite/guide/build_cmake#available_options_to_build_tensorflow_lite
     set (TFLITE_ENABLE_GPU ON)
-    endif ()
+  endif()
 
-    include_directories (${CMAKE_CURRENT_BINARY_DIR}/flatbuffers/include)
-    include_directories (${TENSORFLOW_SOURCE_DIR})
-    add_subdirectory(
-        "${TENSORFLOW_SOURCE_DIR}/tensorflow/lite"
-        "${CMAKE_CURRENT_BINARY_DIR}/tensorflow-lite" EXCLUDE_FROM_ALL) 
+  add_subdirectory(
+    "${TENSORFLOW_SOURCE_DIR}/tensorflow/lite"
+    "${CMAKE_CURRENT_BINARY_DIR}/tensorflow-lite"
+    EXCLUDE_FROM_ALL
+  )  
 
-else()
-    find_path(TENSORFLOW_LITE_INCLUDE_DIR
-    NAMES tensorflow/lite/interpreter.h
-    )
-    find_path(FLATBUFFER_INCLUDE_DIR
-    NAMES flatbuffers/flatbuffers.h
-    )
-    include_directories (${TENSORFLOW_LITE_INCLUDE_DIR})
-    include_directories (${FLATBUFFER_INCLUDE_DIR})    
-endif()
+  set(TENSORFLOW_LITE_INCLUDE_DIR "${TENSORFLOW_SOURCE_DIR}")
+  set(FLATBUFFER_INCLUDE_DIR "${CMAKE_CURRENT_BINARY_DIR}/flatbuffers/include")
 
+  include_directories(${TENSORFLOW_LITE_INCLUDE_DIR})
+  include_directories(${FLATBUFFER_INCLUDE_DIR})
+endif()

+ 46 - 0
core/iwasm/libraries/wasi-nn/cmake/iwasm_helper.cmake

@@ -0,0 +1,46 @@
+# Copyright (C) 2019 Intel Corporation.  All rights reserved.
+# SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+
+if (NOT DEFINED WAMR_BUILD_PLATFORM)
+  string (TOLOWER ${CMAKE_HOST_SYSTEM_NAME} WAMR_BUILD_PLATFORM)
+endif ()
+
+set (CMAKE_SHARED_LIBRARY_LINK_C_FLAGS "")
+set (CMAKE_SHARED_LIBRARY_LINK_CXX_FLAGS "")
+
+set (CMAKE_C_STANDARD 99)
+
+if (NOT DEFINED WAMR_BUILD_TARGET)
+  if (CMAKE_SYSTEM_PROCESSOR MATCHES "^(arm64|aarch64)")
+    set (WAMR_BUILD_TARGET "AARCH64")
+  elseif (CMAKE_SYSTEM_PROCESSOR STREQUAL "riscv64")
+    set (WAMR_BUILD_TARGET "RISCV64")
+  elseif (CMAKE_SIZEOF_VOID_P EQUAL 8)
+    set (WAMR_BUILD_TARGET "X86_64")
+  elseif (CMAKE_SIZEOF_VOID_P EQUAL 4)
+    set (WAMR_BUILD_TARGET "X86_32")
+  else ()
+    message(SEND_ERROR "Unsupported build target platform!")
+  endif ()
+endif ()
+
+set (CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -Wall -Wextra -Wformat -Wformat-security -Wshadow -Wno-unused-parameter")
+
+set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wall -Wextra -Wformat -Wformat-security -Wno-unused")
+
+if (WAMR_BUILD_TARGET MATCHES "X86_.*" OR WAMR_BUILD_TARGET STREQUAL "AMD_64")
+  if (NOT (CMAKE_C_COMPILER MATCHES ".*clang.*" OR CMAKE_C_COMPILER_ID MATCHES ".*Clang"))
+    set (CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -mindirect-branch-register")
+  endif ()
+endif ()
+
+set (WAMR_BUILD_INTERP 1)
+set (WAMR_BUILD_AOT 1)
+set (WAMR_BUILD_JIT 0)
+set (WAMR_BUILD_LIBC_WASI 1)
+set (WAMR_BUILD_FAST_INTERP 1)
+
+if (NOT (CMAKE_C_COMPILER MATCHES ".*clang.*" OR CMAKE_C_COMPILER_ID MATCHES ".*Clang"))
+  set (CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -Wl,--gc-sections")
+endif ()
+set (CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -Wall -Wextra -Wformat -Wformat-security")
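`iwasm_helper.cmake` picks a default `WAMR_BUILD_TARGET` from `CMAKE_SYSTEM_PROCESSOR` and `CMAKE_SIZEOF_VOID_P`. To make the order of those checks explicit, here is an illustrative C transliteration. `default_wamr_target` is a hypothetical function; CMake evaluates the real logic at configure time. As written, any 64-bit host that is neither arm64/aarch64 nor riscv64 (ppc64le, for example) falls through to `X86_64`.

```c
#include <stddef.h>
#include <string.h>

/* Mirror of the WAMR_BUILD_TARGET default: processor stands for
   CMAKE_SYSTEM_PROCESSOR, sizeof_void_p for CMAKE_SIZEOF_VOID_P.
   Returns NULL where the CMake script would raise SEND_ERROR. */
static const char *
default_wamr_target(const char *processor, int sizeof_void_p)
{
    /* MATCHES "^(arm64|aarch64)" is anchored only at the start,
       so it behaves as a prefix test. */
    if (strncmp(processor, "arm64", 5) == 0
        || strncmp(processor, "aarch64", 7) == 0)
        return "AARCH64";
    /* STREQUAL is an exact comparison. */
    if (strcmp(processor, "riscv64") == 0)
        return "RISCV64";
    /* Everything else is decided by pointer size alone. */
    if (sizeof_void_p == 8)
        return "X86_64";
    if (sizeof_void_p == 4)
        return "X86_32";
    return NULL;
}
```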

+ 22 - 0
core/iwasm/libraries/wasi-nn/cmake/wasi_nn.cmake

@@ -0,0 +1,22 @@
+# Copyright (C) 2019 Intel Corporation.  All rights reserved.
+# SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+
+list(APPEND CMAKE_MODULE_PATH ${CMAKE_CURRENT_LIST_DIR})
+
+# Find tensorflow-lite
+find_package(tensorflow_lite REQUIRED)
+
+set(WASI_NN_ROOT_DIR ${CMAKE_CURRENT_LIST_DIR}/..)
+
+include_directories (${WASI_NN_ROOT_DIR}/include)
+include_directories (${WASI_NN_ROOT_DIR}/src)
+include_directories (${WASI_NN_ROOT_DIR}/src/utils)
+
+set (
+  WASI_NN_SOURCES
+  ${WASI_NN_ROOT_DIR}/src/wasi_nn.c
+  ${WASI_NN_ROOT_DIR}/src/wasi_nn_tensorflowlite.cpp
+  ${WASI_NN_ROOT_DIR}/src/utils/wasi_nn_app_native.c
+)
+
+set (WASI_NN_LIBS tensorflow-lite)

+ 58 - 0
core/iwasm/libraries/wasi-nn/external/CMakeLists.txt

@@ -0,0 +1,58 @@
+# Copyright (C) 2019 Intel Corporation.  All rights reserved.
+# SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+
+cmake_minimum_required(VERSION 3.16)
+project(wasi-nn C CXX)
+
+set(CMAKE_POSITION_INDEPENDENT_CODE ON)
+
+set(WAMR_ROOT_DIR ${CMAKE_CURRENT_LIST_DIR}/../../../../..)
+set(WASI_NN_ROOT_DIR ${CMAKE_CURRENT_LIST_DIR}/..)
+
+if(NOT CMAKE_BUILD_TYPE)
+  set(CMAKE_BUILD_TYPE Debug)
+endif()
+
+#### libvmlib ####
+# NOTE: we build vmlib as a shared library here so that it can be
+# shared between iwasm and native libraries.
+include(${WASI_NN_ROOT_DIR}/cmake/iwasm_helper.cmake)
+include(${WAMR_ROOT_DIR}/build-scripts/runtime_lib.cmake)
+
+add_library(vmlib SHARED ${WAMR_RUNTIME_LIB_SOURCE})
+
+# iwasm
+include(${SHARED_DIR}/utils/uncommon/shared_uncommon.cmake)
+set(RUNTIME_SOURCE_ALL
+  ${WAMR_ROOT_DIR}/product-mini/platforms/${WAMR_BUILD_PLATFORM}/main.c
+  ${UNCOMMON_SHARED_SOURCE}
+)
+
+add_executable(iwasm ${RUNTIME_SOURCE_ALL})
+target_link_libraries(iwasm vmlib -lpthread -lm -ldl)
+
+#### TensorFlow ####
+
+include(${WASI_NN_ROOT_DIR}/cmake/wasi_nn.cmake)
+
+#### WASI-NN ####
+
+include_directories(
+  ${WAMR_ROOT_DIR}/core/iwasm/include
+  ${WAMR_ROOT_DIR}/core/shared/utils
+  ${WAMR_ROOT_DIR}/core/shared/platform/linux
+)
+
+add_library(wasi-nn SHARED
+  ${WASI_NN_SOURCES}
+)
+
+# Add `get_native_lib` symbol
+target_compile_definitions(wasi-nn PUBLIC
+  WASI_NN_SHARED
+)
+
+target_link_libraries(wasi-nn
+  ${WASI_NN_LIBS}
+  vmlib
+)

+ 13 - 0
core/iwasm/libraries/wasi-nn/external/README.md

@@ -0,0 +1,13 @@
+# wasi-nn as shared library
+
+Example on how to create libwasi-nn (external library) instead of embedding wasi-nn inside iwasm
+
+From folder `core/iwasm/libraries/wasi-nn/test`, build the test and run
+
+```sh
+../external/build/iwasm \
+    --dir=. \
+    --env="TARGET=cpu" \
+    --native-lib=../external/build/libwasi-nn.so \
+    test_tensorflow.wasm 
+```
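With `WASI_NN_SHARED` defined, `wasi_nn.c` exports `get_native_lib`. That function reports a module name and a `NativeSymbol` array, which lets `iwasm` load the library through `--native-lib` as in the command above. The sketch below illustrates that contract in isolation. `DemoNativeSymbol` is a hypothetical stand-in that mirrors the four fields of WAMR's `NativeSymbol` (`symbol`, `func_ptr`, `signature`, `attachment`). `demo_add`, `demo_get_native_lib`, and the `"demo_module"` name are illustrative assumptions. The sketch does not call `dlopen`; the host side is reduced to a call through a function pointer of the same shape.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in with the same fields as WAMR's NativeSymbol. */
typedef struct {
    const char *symbol;    /* import name seen by the wasm module */
    void *func_ptr;        /* native implementation */
    const char *signature; /* WAMR signature string, e.g. "(ii)i" */
    void *attachment;      /* optional user data */
} DemoNativeSymbol;

/* Native functions receive the exec_env first; it is not part of the
   signature string. "(ii)i" = two i32 params, i32 result. */
static int32_t
demo_add(void *exec_env, int32_t a, int32_t b)
{
    (void)exec_env;
    return a + b;
}

static DemoNativeSymbol demo_symbols[] = {
    { "demo_add", (void *)demo_add, "(ii)i", NULL },
};

/* Library side: same shape as get_native_lib in wasi_nn.c. */
uint32_t
demo_get_native_lib(char **p_module_name, DemoNativeSymbol **p_native_symbols)
{
    *p_module_name = "demo_module";
    *p_native_symbols = demo_symbols;
    return (uint32_t)(sizeof(demo_symbols) / sizeof(demo_symbols[0]));
}

/* Host side: the type a loader would cast the dlsym() result to. */
typedef uint32_t (*get_native_lib_fn)(char **, DemoNativeSymbol **);
```

In a real host, the function pointer would come from `dlsym(handle, "get_native_lib")` after `dlopen`, and the returned array would then be registered with the runtime, for example via `wasm_runtime_register_natives`. Treat that flow as an assumption about `iwasm`, not a description of its code. Also note that in this diff `get_wasi_nn_export_apis` now initializes the context map and returns 0 if that fails, so a loader should treat a zero count as "nothing to register".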

+ 0 - 0
core/iwasm/libraries/wasi-nn/wasi_nn.h → core/iwasm/libraries/wasi-nn/include/wasi_nn.h


+ 3 - 0
core/iwasm/libraries/wasi-nn/wasi_nn_types.h → core/iwasm/libraries/wasi-nn/include/wasi_nn_types.h

@@ -6,6 +6,9 @@
 #ifndef WASI_NN_TYPES_H
 #define WASI_NN_TYPES_H
 
+#include <stdint.h>
+#include <stdbool.h>
+
 /**
  * ERRORS
  *

+ 133 - 63
core/iwasm/libraries/wasi-nn/src/wasi_nn.c

@@ -9,16 +9,18 @@
 #include <assert.h>
 #include <errno.h>
 #include <string.h>
+#include <stdint.h>
 
 #include "wasi_nn.h"
+#include "wasi_nn_private.h"
 #include "wasi_nn_app_native.h"
-#include "logger.h"
 #include "wasi_nn_tensorflowlite.hpp"
+#include "logger.h"
 
 #include "bh_platform.h"
 #include "wasm_export.h"
-#include "wasm_runtime.h"
-#include "aot_runtime.h"
+
+#define HASHMAP_INITIAL_SIZE 20
 
 /* Definition of 'wasi_nn.h' structs in WASM app format (using offset) */
 
@@ -51,6 +53,119 @@ static api_function lookup[] = {
       tensorflowlite_get_output }
 };
 
+static HashMap *hashmap;
+
+static void
+wasi_nn_ctx_destroy(WASINNContext *wasi_nn_ctx);
+
+/* Get wasi-nn context from module instance */
+
+static uint32
+hash_func(const void *key)
+{
+    // fnv1a_hash
+    const uint32 FNV_PRIME = 16777619;
+    const uint32 FNV_OFFSET_BASIS = 2166136261U;
+
+    uint32 hash = FNV_OFFSET_BASIS;
+    const unsigned char *bytes = (const unsigned char *)key;
+
+    for (size_t i = 0; i < sizeof(uintptr_t); ++i) {
+        hash ^= bytes[i];
+        hash *= FNV_PRIME;
+    }
+
+    return hash;
+}
+
+static bool
+key_equal_func(void *key1, void *key2)
+{
+    return key1 == key2;
+}
+
+static void
+key_destroy_func(void *key1)
+{}
+
+static void
+value_destroy_func(void *value)
+{
+    wasi_nn_ctx_destroy((WASINNContext *)value);
+}
+
+static WASINNContext *
+wasi_nn_initialize_context()
+{
+    NN_DBG_PRINTF("Initializing wasi-nn context");
+    WASINNContext *wasi_nn_ctx =
+        (WASINNContext *)wasm_runtime_malloc(sizeof(WASINNContext));
+    if (wasi_nn_ctx == NULL) {
+        NN_ERR_PRINTF("Error when allocating memory for WASI-NN context");
+        return NULL;
+    }
+    wasi_nn_ctx->is_model_loaded = false;
+    tensorflowlite_initialize(&wasi_nn_ctx->tflite_ctx);
+    return wasi_nn_ctx;
+}
+
+static bool
+wasi_nn_initialize()
+{
+    NN_DBG_PRINTF("Initializing wasi-nn");
+    hashmap = bh_hash_map_create(HASHMAP_INITIAL_SIZE, true, hash_func,
+                                 key_equal_func, key_destroy_func,
+                                 value_destroy_func);
+    if (hashmap == NULL) {
+        NN_ERR_PRINTF("Error while initializing hashmap");
+        return false;
+    }
+    return true;
+}
+
+static WASINNContext *
+wasm_runtime_get_wasi_nn_ctx(wasm_module_inst_t instance)
+{
+    WASINNContext *wasi_nn_ctx =
+        (WASINNContext *)bh_hash_map_find(hashmap, (void *)instance);
+    if (wasi_nn_ctx == NULL) {
+        wasi_nn_ctx = wasi_nn_initialize_context();
+        if (wasi_nn_ctx == NULL)
+            return NULL;
+        bool ok =
+            bh_hash_map_insert(hashmap, (void *)instance, (void *)wasi_nn_ctx);
+        if (!ok) {
+            NN_ERR_PRINTF("Error while storing context");
+            wasi_nn_ctx_destroy(wasi_nn_ctx);
+            return NULL;
+        }
+    }
+    NN_DBG_PRINTF("Returning ctx");
+    return wasi_nn_ctx;
+}
+
+static void
+wasi_nn_ctx_destroy(WASINNContext *wasi_nn_ctx)
+{
+    if (wasi_nn_ctx == NULL) {
+        NN_ERR_PRINTF(
+            "Error when deallocating memory. WASI-NN context is NULL");
+        return;
+    }
+    NN_DBG_PRINTF("Freeing wasi-nn");
+    NN_DBG_PRINTF("-> is_model_loaded: %d", wasi_nn_ctx->is_model_loaded);
+    NN_DBG_PRINTF("-> current_encoding: %d", wasi_nn_ctx->current_encoding);
+    tensorflowlite_destroy(wasi_nn_ctx->tflite_ctx);
+    wasm_runtime_free(wasi_nn_ctx);
+}
+
+void
+wasi_nn_destroy(wasm_module_inst_t instance)
+{
+    WASINNContext *wasi_nn_ctx = wasm_runtime_get_wasi_nn_ctx(instance);
+    wasi_nn_ctx_destroy(wasi_nn_ctx);
+}
+
 /* Utils */
 
 static bool
@@ -64,36 +179,13 @@ is_encoding_implemented(graph_encoding encoding)
 static error
 is_model_initialized(WASINNContext *wasi_nn_ctx)
 {
-    if (!wasi_nn_ctx->is_initialized) {
+    if (!wasi_nn_ctx->is_model_loaded) {
         NN_ERR_PRINTF("Model not initialized.");
         return runtime_error;
     }
     return success;
 }
 
-WASINNContext *
-wasm_runtime_get_wasi_nn_ctx(wasm_module_inst_t instance)
-{
-    WASINNContext *wasi_nn_ctx = NULL;
-#if WASM_ENABLE_INTERP != 0
-    if (instance->module_type == Wasm_Module_Bytecode) {
-        NN_DBG_PRINTF("Getting ctx from WASM");
-        WASMModuleInstance *module_inst = (WASMModuleInstance *)instance;
-        wasi_nn_ctx = ((WASMModuleInstanceExtra *)module_inst->e)->wasi_nn_ctx;
-    }
-#endif
-#if WASM_ENABLE_AOT != 0
-    if (instance->module_type == Wasm_Module_AoT) {
-        NN_DBG_PRINTF("Getting ctx from AOT");
-        AOTModuleInstance *module_inst = (AOTModuleInstance *)instance;
-        wasi_nn_ctx = ((AOTModuleInstanceExtra *)module_inst->e)->wasi_nn_ctx;
-    }
-#endif
-    bh_assert(wasi_nn_ctx != NULL);
-    NN_DBG_PRINTF("Returning ctx");
-    return wasi_nn_ctx;
-}
-
 /* WASI-NN implementation */
 
 error
@@ -131,7 +223,7 @@ wasi_nn_load(wasm_exec_env_t exec_env, graph_builder_array_wasm *builder,
     NN_DBG_PRINTF("wasi_nn_load finished with status %d [graph=%d]", res, *g);
 
     wasi_nn_ctx->current_encoding = encoding;
-    wasi_nn_ctx->is_initialized = true;
+    wasi_nn_ctx->is_model_loaded = true;
 
 fail:
     // XXX: Free intermediate structure pointers
@@ -250,39 +342,6 @@ wasi_nn_get_output(wasm_exec_env_t exec_env, graph_execution_context ctx,
     return res;
 }
 
-/* Non-exposed public functions */
-
-WASINNContext *
-wasi_nn_initialize()
-{
-    NN_DBG_PRINTF("Initializing wasi-nn");
-    WASINNContext *wasi_nn_ctx =
-        (WASINNContext *)wasm_runtime_malloc(sizeof(WASINNContext));
-    if (wasi_nn_ctx == NULL) {
-        NN_ERR_PRINTF("Error when allocating memory for WASI-NN context");
-        return NULL;
-    }
-    wasi_nn_ctx->is_initialized = true;
-    wasi_nn_ctx->current_encoding = 3;
-    tensorflowlite_initialize(&wasi_nn_ctx->tflite_ctx);
-    return wasi_nn_ctx;
-}
-
-void
-wasi_nn_destroy(WASINNContext *wasi_nn_ctx)
-{
-    if (wasi_nn_ctx == NULL) {
-        NN_ERR_PRINTF(
-            "Error when deallocating memory. WASI-NN context is NULL");
-        return;
-    }
-    NN_DBG_PRINTF("Freeing wasi-nn");
-    NN_DBG_PRINTF("-> is_initialized: %d", wasi_nn_ctx->is_initialized);
-    NN_DBG_PRINTF("-> current_encoding: %d", wasi_nn_ctx->current_encoding);
-    tensorflowlite_destroy(wasi_nn_ctx->tflite_ctx);
-    wasm_runtime_free(wasi_nn_ctx);
-}
-
 /* Register WASI-NN in WAMR */
 
 /* clang-format off */
@@ -299,8 +358,19 @@ static NativeSymbol native_symbols_wasi_nn[] = {
 };
 
 uint32_t
-get_wasi_nn_export_apis(NativeSymbol **p_libc_wasi_apis)
+get_wasi_nn_export_apis(NativeSymbol **p_native_symbols)
 {
-    *p_libc_wasi_apis = native_symbols_wasi_nn;
+    if (!wasi_nn_initialize())
+        return 0;
+    *p_native_symbols = native_symbols_wasi_nn;
     return sizeof(native_symbols_wasi_nn) / sizeof(NativeSymbol);
 }
+
+#if defined(WASI_NN_SHARED)
+uint32_t
+get_native_lib(char **p_module_name, NativeSymbol **p_native_symbols)
+{
+    *p_module_name = "wasi_nn";
+    return get_wasi_nn_export_apis(p_native_symbols);
+}
+#endif
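The `wasi_nn.c` hunk replaces the per-instance `wasi_nn_ctx` field with a global map from module instance to `WASINNContext`, created lazily on first lookup. The sketch below shows that get-or-create pattern with a small chained hash table in plain C. `DemoCtx`, `ctx_get_or_create`, `ctx_destroy`, and `ctx_count` are hypothetical names, and the sketch does not use WAMR's `bh_hash_map` API.

Two details deliberately differ from the committed code as it reads in this diff:

- `hash_func` casts `key` to a byte pointer and reads `sizeof(uintptr_t)` bytes *at* that address. It therefore hashes the leading bytes of the instance struct rather than the pointer value, which only works while those bytes stay unchanged.
- `wasi_nn_destroy` goes through the get-or-create helper and frees the context without removing its map entry, which appears to leave a stale value behind.

The sketch hashes the pointer value and unlinks the entry on destroy.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for WASINNContext. */
typedef struct {
    bool is_model_loaded;
} DemoCtx;

#define NBUCKETS 16

typedef struct Entry {
    const void *key; /* module instance pointer, compared by identity */
    DemoCtx *ctx;
    struct Entry *next;
} Entry;

static Entry *buckets[NBUCKETS];

/* 32-bit FNV-1a over the bytes of the pointer value itself. */
static uint32_t
ptr_hash(const void *key)
{
    uintptr_t v = (uintptr_t)key;
    const unsigned char *bytes = (const unsigned char *)&v;
    uint32_t hash = 2166136261U; /* FNV offset basis */

    for (size_t i = 0; i < sizeof(v); ++i) {
        hash ^= bytes[i];
        hash *= 16777619U; /* FNV prime */
    }
    return hash;
}

/* Return the context for key, creating it on first use (NULL on OOM). */
static DemoCtx *
ctx_get_or_create(const void *key)
{
    Entry **slot = &buckets[ptr_hash(key) % NBUCKETS];

    for (Entry *e = *slot; e != NULL; e = e->next)
        if (e->key == key)
            return e->ctx;

    Entry *fresh = malloc(sizeof(*fresh));
    DemoCtx *ctx = calloc(1, sizeof(*ctx)); /* is_model_loaded = false */
    if (fresh == NULL || ctx == NULL) {
        free(fresh);
        free(ctx);
        return NULL;
    }
    fresh->key = key;
    fresh->ctx = ctx;
    fresh->next = *slot;
    *slot = fresh;
    return ctx;
}

/* Unlink and free the entry for key; a missing key is a no-op. */
static void
ctx_destroy(const void *key)
{
    Entry **pp = &buckets[ptr_hash(key) % NBUCKETS];

    while (*pp != NULL) {
        Entry *e = *pp;
        if (e->key == key) {
            *pp = e->next; /* no stale pointer stays in the table */
            free(e->ctx);
            free(e);
            return;
        }
        pp = &e->next;
    }
}

/* Number of live contexts, for checking cleanup. */
static size_t
ctx_count(void)
{
    size_t n = 0;
    for (size_t b = 0; b < NBUCKETS; ++b)
        for (Entry *e = buckets[b]; e != NULL; e = e->next)
            ++n;
    return n;
}
```

Keying on pointer identity matches `key_equal_func` in the diff. The cost is that a freed instance's address can be reused by a new instance, which is why removing the entry at deinstantiation matters.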

+ 3 - 8
core/iwasm/libraries/wasi-nn/src/wasi_nn_private.h

@@ -7,25 +7,20 @@
 #define WASI_NN_PRIVATE_H
 
 #include "wasi_nn_types.h"
+#include "wasm_export.h"
 
 typedef struct {
-    bool is_initialized;
+    bool is_model_loaded;
     graph_encoding current_encoding;
     void *tflite_ctx;
 } WASINNContext;
 
-/**
- * @brief Initialize wasi-nn
- *
- */
-WASINNContext *
-wasi_nn_initialize();
 /**
  * @brief Destroy wasi-nn on app exists
  *
  */
 
 void
-wasi_nn_destroy(WASINNContext *wasi_nn_ctx);
+wasi_nn_destroy(wasm_module_inst_t instance);
 
 #endif

+ 1 - 2
core/iwasm/libraries/wasi-nn/src/wasi_nn_tensorflowlite.cpp

@@ -7,9 +7,8 @@
 #include "wasi_nn_tensorflowlite.hpp"
 #include "logger.h"
 
-#include "bh_common.h"
 #include "bh_platform.h"
-#include "platform_common.h"
+#include "wasm_export.h"
 
 #include <tensorflow/lite/interpreter.h>
 #include <tensorflow/lite/kernels/register.h>

+ 0 - 173
core/iwasm/libraries/wasi-nn/test/CMakeLists.txt

@@ -1,173 +0,0 @@
-# Copyright (C) 2019 Intel Corporation.  All rights reserved.
-# SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-
-cmake_minimum_required (VERSION 2.9)
-
-project (iwasm)
-
-set (CMAKE_VERBOSE_MAKEFILE OFF)
-# Reset default linker flags
-set (CMAKE_C_STANDARD 99)
-set (CMAKE_CXX_STANDARD 14)
-set (CMAKE_SHARED_LIBRARY_LINK_C_FLAGS "")
-set (CMAKE_SHARED_LIBRARY_LINK_CXX_FLAGS "")
-
-if (NOT DEFINED WAMR_BUILD_PLATFORM)
-  set (WAMR_BUILD_PLATFORM "linux")
-endif ()
-
-# Set WAMR_BUILD_TARGET, currently values supported:
-# "X86_64", "AMD_64", "X86_32", "AARCH64[sub]", "ARM[sub]", "THUMB[sub]",
-# "MIPS", "XTENSA", "RISCV64[sub]", "RISCV32[sub]"
-if (NOT DEFINED WAMR_BUILD_TARGET)
-  if (CMAKE_SYSTEM_PROCESSOR MATCHES "^(arm64|aarch64)")
-    set (WAMR_BUILD_TARGET "AARCH64")
-  elseif (CMAKE_SYSTEM_PROCESSOR STREQUAL "riscv64")
-    set (WAMR_BUILD_TARGET "RISCV64")
-  elseif (CMAKE_SIZEOF_VOID_P EQUAL 8)
-    # Build as X86_64 by default in 64-bit platform
-    set (WAMR_BUILD_TARGET "X86_64")
-  elseif (CMAKE_SIZEOF_VOID_P EQUAL 4)
-    # Build as X86_32 by default in 32-bit platform
-    set (WAMR_BUILD_TARGET "X86_32")
-  else ()
-    message(SEND_ERROR "Unsupported build target platform!")
-  endif ()
-endif ()
-
-if (NOT CMAKE_BUILD_TYPE)
-  set(CMAKE_BUILD_TYPE Release)
-endif ()
-
-if (NOT DEFINED WAMR_BUILD_INTERP)
-  # Enable Interpreter by default
-  set (WAMR_BUILD_INTERP 1)
-endif ()
-
-if (NOT DEFINED WAMR_BUILD_AOT)
-  # Enable AOT by default.
-  set (WAMR_BUILD_AOT 1)
-endif ()
-
-if (NOT DEFINED WAMR_BUILD_JIT)
-  # Disable JIT by default.
-  set (WAMR_BUILD_JIT 0)
-endif ()
-
-if (NOT DEFINED WAMR_BUILD_FAST_JIT)
-  # Disable Fast JIT by default
-  set (WAMR_BUILD_FAST_JIT 0)
-endif ()
-
-if (NOT DEFINED WAMR_BUILD_LIBC_BUILTIN)
-  # Enable libc builtin support by default
-  set (WAMR_BUILD_LIBC_BUILTIN 1)
-endif ()
-
-if (NOT DEFINED WAMR_BUILD_LIBC_WASI)
-  # Enable libc wasi support by default
-  set (WAMR_BUILD_LIBC_WASI 1)
-endif ()
-
-if (NOT DEFINED WAMR_BUILD_FAST_INTERP)
-  # Enable fast interpreter
-  set (WAMR_BUILD_FAST_INTERP 1)
-endif ()
-
-if (NOT DEFINED WAMR_BUILD_MULTI_MODULE)
-  # Disable multiple modules by default
-  set (WAMR_BUILD_MULTI_MODULE 0)
-endif ()
-
-if (NOT DEFINED WAMR_BUILD_LIB_PTHREAD)
-  # Disable pthread library by default
-  set (WAMR_BUILD_LIB_PTHREAD 0)
-endif ()
-
-if (NOT DEFINED WAMR_BUILD_MINI_LOADER)
-  # Disable wasm mini loader by default
-  set (WAMR_BUILD_MINI_LOADER 0)
-endif ()
-
-if (NOT DEFINED WAMR_BUILD_SIMD)
-  # Enable SIMD by default
-  set (WAMR_BUILD_SIMD 1)
-endif ()
-
-if (NOT DEFINED WAMR_BUILD_REF_TYPES)
-  # Disable reference types by default
-  set (WAMR_BUILD_REF_TYPES 0)
-endif ()
-
-if (NOT DEFINED WAMR_BUILD_DEBUG_INTERP)
-  # Disable Debug feature by default
-  set (WAMR_BUILD_DEBUG_INTERP 0)
-endif ()
-
-if (WAMR_BUILD_DEBUG_INTERP EQUAL 1)
-  set (WAMR_BUILD_FAST_INTERP 0)
-  set (WAMR_BUILD_MINI_LOADER 0)
-  set (WAMR_BUILD_SIMD 0)
-endif ()
-
-set (WAMR_ROOT_DIR ${CMAKE_CURRENT_SOURCE_DIR}/../../../../..)
-
-include (${WAMR_ROOT_DIR}/build-scripts/runtime_lib.cmake)
-add_library(vmlib ${WAMR_RUNTIME_LIB_SOURCE})
-
-set (CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -Wl,--gc-sections -pie -fPIE")
-
-set (CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -Wall -Wextra -Wformat -Wformat-security -Wshadow")
-# set (CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -Wconversion -Wsign-conversion")
-
-set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wall -Wextra -Wformat -Wformat-security -Wno-unused")
-
-if (WAMR_BUILD_TARGET MATCHES "X86_.*" OR WAMR_BUILD_TARGET STREQUAL "AMD_64")
-  if (NOT (CMAKE_C_COMPILER MATCHES ".*clang.*" OR CMAKE_C_COMPILER_ID MATCHES ".*Clang"))
-    set (CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -mindirect-branch-register")
-    set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -mindirect-branch-register")
-    # UNDEFINED BEHAVIOR, refer to https://en.cppreference.com/w/cpp/language/ub
-    if(CMAKE_BUILD_TYPE STREQUAL "Debug" AND NOT WAMR_BUILD_JIT EQUAL 1)
-      set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -fsanitize=undefined \
-                                          -fno-sanitize=bounds,bounds-strict,alignment \
-                                          -fno-sanitize-recover")
-      set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fsanitize=undefined \
-                                              -fno-sanitize=bounds,bounds-strict,alignment \
-                                              -fno-sanitize-recover")
-    endif()
-  else ()
-    # UNDEFINED BEHAVIOR, refer to https://en.cppreference.com/w/cpp/language/ub
-    if(CMAKE_BUILD_TYPE STREQUAL "Debug" AND NOT WAMR_BUILD_JIT EQUAL 1)
-      set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -fsanitize=undefined \
-                                          -fno-sanitize=bounds,alignment \
-                                          -fno-sanitize-recover")
-      set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fsanitize=undefined \
-                                              -fno-sanitize=bounds,alignment \
-                                              -fno-sanitize-recover")
-    endif()
-  endif ()
-endif ()
-
-# The following flags are to enhance security, but it may impact performance,
-# we disable them by default.
-#if (WAMR_BUILD_TARGET MATCHES "X86_.*" OR WAMR_BUILD_TARGET STREQUAL "AMD_64")
-#  set (CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -ftrapv -D_FORTIFY_SOURCE=2")
-#endif ()
-#set (CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -fstack-protector-strong --param ssp-buffer-size=4")
-#set (CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -Wl,-z,noexecstack,-z,relro,-z,now")
-
-include (${SHARED_DIR}/utils/uncommon/shared_uncommon.cmake)
-
-add_executable (iwasm ${WAMR_ROOT_DIR}/product-mini/platforms/${WAMR_BUILD_PLATFORM}/main.c ${UNCOMMON_SHARED_SOURCE})
-
-install (TARGETS iwasm DESTINATION bin)
-
-target_link_libraries (iwasm vmlib ${LLVM_AVAILABLE_LIBS} ${UV_A_LIBS} ${TENSORFLOW_LIB} -lm -ldl -lpthread)
-
-add_library (libiwasm SHARED ${WAMR_RUNTIME_LIB_SOURCE})
-
-install (TARGETS libiwasm DESTINATION lib)
-
-set_target_properties (libiwasm PROPERTIES OUTPUT_NAME iwasm)
-
-target_link_libraries (libiwasm ${LLVM_AVAILABLE_LIBS} ${UV_A_LIBS} -lm -ldl -lpthread)

+ 5 - 2
core/iwasm/libraries/wasi-nn/test/Dockerfile.compile

@@ -5,8 +5,11 @@ FROM ubuntu:20.04
 
 ENV DEBIAN_FRONTEND=noninteractive
 
+# hadolint ignore=DL3008
 RUN apt-get update && apt-get install -y \
-    cmake build-essential git wget python3.10 python3-pip
+  cmake build-essential git wget python3.10 python3-pip --no-install-recommends \
+  && apt-get clean -y \
+  && rm -rf /var/lib/apt/lists/*
 
 ARG WASI_SDK_VER=19
 RUN wget -c --progress=dot:giga https://github.com/WebAssembly/wasi-sdk/releases/download/wasi-sdk-${WASI_SDK_VER}/wasi-sdk-${WASI_SDK_VER}.0-linux.tar.gz -P /opt \
@@ -18,6 +21,6 @@ WORKDIR /wasi-nn/test
 
 COPY core/iwasm/libraries/wasi-nn/test/requirements.txt .
 
-RUN pip3 install -r requirements.txt && rm requirements.txt
+RUN pip3 install --no-cache-dir -r requirements.txt && rm requirements.txt
 
 ENTRYPOINT [ "bash", "./build.sh" ]

+ 14 - 5
core/iwasm/libraries/wasi-nn/test/Dockerfile.cpu

@@ -5,23 +5,32 @@ FROM ubuntu:20.04 AS base
 
 ENV DEBIAN_FRONTEND=noninteractive
 
+# hadolint ignore=DL3008
 RUN apt-get update && apt-get install -y \
-    cmake build-essential git
+  cmake build-essential git --no-install-recommends
 
 WORKDIR /home/wamr
 
 COPY . .
 
-WORKDIR /home/wamr/core/iwasm/libraries/wasi-nn/test/build
+WORKDIR /home/wamr/product-mini/platforms/linux/build
+
+# hadolint ignore=DL3008
+RUN apt-get install -y wget ca-certificates --no-install-recommends \
+  && mkdir /usr/local/share/ca-certificates/cacert.org \
+  && wget -qP /usr/local/share/ca-certificates/cacert.org http://www.cacert.org/certs/root.crt http://www.cacert.org/certs/class3.crt \
+  && update-ca-certificates \
+  && git config --global http.sslCAinfo /etc/ssl/certs/ca-certificates.crt
 
 RUN cmake \
   -DWAMR_BUILD_WASI_NN=1 \
   ..
 
-RUN make -j $(grep -c ^processor /proc/cpuinfo)
+RUN make -j "$(grep -c ^processor /proc/cpuinfo)"
 
 FROM ubuntu:22.04
 
-COPY --from=base /home/wamr/core/iwasm/libraries/wasi-nn/test/build/iwasm /run/iwasm
+COPY --from=base /home/wamr/product-mini/platforms/linux/build/libvmlib.so /libvmlib.so
+COPY --from=base /home/wamr/product-mini/platforms/linux/build/iwasm /iwasm
 
-ENTRYPOINT [ "/run/iwasm" ]
+ENTRYPOINT [ "/iwasm" ]

+ 21 - 11
core/iwasm/libraries/wasi-nn/test/Dockerfile.nvidia-gpu

@@ -5,28 +5,37 @@ FROM ubuntu:20.04 AS base
 
 ENV DEBIAN_FRONTEND=noninteractive
 
+# hadolint ignore=DL3008
 RUN apt-get update && apt-get install -y \
-    cmake build-essential git
+    cmake build-essential git --no-install-recommends
 
 WORKDIR /home/wamr
 
 COPY . .
 
-WORKDIR /home/wamr/core/iwasm/libraries/wasi-nn/test/build
+WORKDIR /home/wamr/product-mini/platforms/linux/build
+
+# hadolint ignore=DL3008
+RUN apt-get install -y wget ca-certificates --no-install-recommends \
+  && mkdir /usr/local/share/ca-certificates/cacert.org \
+  && wget -qP /usr/local/share/ca-certificates/cacert.org http://www.cacert.org/certs/root.crt http://www.cacert.org/certs/class3.crt \
+  && update-ca-certificates \
+  && git config --global http.sslCAinfo /etc/ssl/certs/ca-certificates.crt
 
 RUN cmake \
-  -DWAMR_BUILD_WASI_NN=1 \
-  -DWASI_NN_ENABLE_GPU=1 \
-  ..
+    -DWAMR_BUILD_WASI_NN=1 \
+    -DWASI_NN_ENABLE_GPU=1 \
+    ..
 
-RUN make -j $(grep -c ^processor /proc/cpuinfo)
+RUN make -j "$(grep -c ^processor /proc/cpuinfo)"
 
 FROM nvidia/cuda:11.3.0-runtime-ubuntu20.04
 
+# hadolint ignore=DL3008
 RUN apt-get update && apt-get install -y --no-install-recommends \
-        ocl-icd-libopencl1 \
-        ocl-icd-opencl-dev \
-        clinfo && \
+    ocl-icd-libopencl1 \
+    ocl-icd-opencl-dev \
+    clinfo && \
     rm -rf /var/lib/apt/lists/*
 
 RUN mkdir -p /etc/OpenCL/vendors && \
@@ -35,6 +44,7 @@ RUN mkdir -p /etc/OpenCL/vendors && \
 ENV NVIDIA_VISIBLE_DEVICES=all
 ENV NVIDIA_DRIVER_CAPABILITIES=compute,utility
 
-COPY --from=base /home/wamr/core/iwasm/libraries/wasi-nn/test/build/iwasm /run/iwasm
+COPY --from=base /home/wamr/product-mini/platforms/linux/build/libvmlib.so /libvmlib.so
+COPY --from=base /home/wamr/product-mini/platforms/linux/build/iwasm /iwasm
 
-ENTRYPOINT [ "/run/iwasm" ]
+ENTRYPOINT [ "/iwasm" ]

+ 52 - 38
core/iwasm/libraries/wasi-nn/test/Dockerfile.vx-delegate

@@ -6,22 +6,33 @@ FROM ubuntu:20.04 AS base
 ENV DEBIAN_FRONTEND=noninteractive
 
 
+# hadolint ignore=DL3008
 RUN apt-get update && apt-get install -y \
-    cmake build-essential git curl libssl-dev python3
-
+    cmake build-essential git curl libssl-dev python3 --no-install-recommends \
+    && apt-get clean -y \
+    && rm -rf /var/lib/apt/lists/*
+
+# hadolint ignore=DL3008
+RUN apt-get update && apt-get install -y wget ca-certificates --no-install-recommends \
+    && apt-get clean -y \
+    && rm -rf /var/lib/apt/lists/* \
+    && mkdir /usr/local/share/ca-certificates/cacert.org \
+    && wget -qP /usr/local/share/ca-certificates/cacert.org http://www.cacert.org/certs/root.crt http://www.cacert.org/certs/class3.crt \
+    && update-ca-certificates \
+    && git config --global http.sslCAinfo /etc/ssl/certs/ca-certificates.crt
 
 # Build TensorFlow Lite VX delegate default built for x86-64 simulator
 WORKDIR /tmp
-RUN git clone https://github.com/VeriSilicon/TIM-VX.git tim-vx
-RUN git clone https://github.com/VeriSilicon/tflite-vx-delegate.git
-RUN git clone https://github.com/tensorflow/tensorflow.git
+RUN git clone https://github.com/VeriSilicon/TIM-VX.git tim-vx \
+    && git clone https://github.com/VeriSilicon/tflite-vx-delegate.git \
+    && git clone https://github.com/tensorflow/tensorflow.git
 
 
 # Build TIM-VX
 WORKDIR /tmp/tim-vx/host_build
-RUN cmake -DCMAKE_INSTALL_PREFIX=/usr/local  ../
-RUN make -j$(grep -c ^processor /proc/cpuinfo)
-RUN make install
+RUN cmake -DCMAKE_INSTALL_PREFIX=/usr/local  ../ \
+    && make -j "$(grep -c ^processor /proc/cpuinfo)" \
+    && make install
 
 WORKDIR /tmp/tim-vx
 #RUN mkdir -p prebuilt-sdk/x86_64_linux/lib/include 
@@ -31,22 +42,23 @@ WORKDIR /tmp/tim-vx
 # Build TensorFlow Lite
 WORKDIR /tmp/tensorflow/build
 RUN cmake \
-  -DBUILD_SHARED_LIBS=ON=on \
-  -DTFLITE_ENABLE_RUY=on \
-  -DTFLITE_ENABLE_NNAPI=off \
-  -DTFLITE_ENABLE_XNNPACK=on \
-  -DTFLITE_ENABLE_EXTERNAL_DELEGATE=on \
-  ../tensorflow/lite/
-RUN make -j$(grep -c ^processor /proc/cpuinfo)
-RUN make install
-RUN cp --no-preserve=ownership -d lib*.so* /usr/local/lib
-RUN cp -r --no-preserve=ownership -d flatbuffers/include/flatbuffers /usr/local/include
+    -DBUILD_SHARED_LIBS=ON=on \
+    -DTFLITE_ENABLE_RUY=on \
+    -DTFLITE_ENABLE_NNAPI=off \
+    -DTFLITE_ENABLE_XNNPACK=on \
+    -DTFLITE_ENABLE_EXTERNAL_DELEGATE=on \
+    ../tensorflow/lite/
+RUN make -j "$(grep -c ^processor /proc/cpuinfo)" \
+    && make install \
+    && cp --no-preserve=ownership -d lib*.so* /usr/local/lib \ 
+    && cp -r --no-preserve=ownership -d flatbuffers/include/flatbuffers /usr/local/include
 # install header files
-RUN install -d /usr/local/include/tensorflow/lite && \
-    cd /tmp/tensorflow/tensorflow/lite && \
-    cp --parents \
-        $(find . -name "*.h*") \
-        /usr/local/include/tensorflow/lite
+RUN install -d /usr/local/include/tensorflow/lite 
+WORKDIR /tmp/tensorflow/tensorflow/lite 
+# hadolint ignore=SC2046
+RUN cp --parents \
+    $(find . -name "*.h*") \
+    /usr/local/include/tensorflow/lite
 # install version.h from core
 RUN install -d /usr/local/include/tensorflow/core/public && \
     cp /tmp/tensorflow/tensorflow/core/public/version.h /usr/local/include/tensorflow/core/public
@@ -55,21 +67,22 @@ RUN install -d /usr/local/include/tensorflow/core/public && \
 # Build Vx Delegate default built for x86-64 simulator
 WORKDIR /tmp/tflite-vx-delegate/build
 RUN cmake \
-   -DBUILD_SHARED_LIBS=ON \
-   -DFETCHCONTENT_SOURCE_DIR_TENSORFLOW=/tmp/tensorflow \
-   -DTFLITE_LIB_LOC=/usr/local/lib/libtensorflow-lite.so \
-   -DTIM_VX_INSTALL=/usr/local \
-   -DCMAKE_INSTALL_PREFIX=/usr/  \
-   ../
-RUN make vx_delegate -j$(grep -c ^processor /proc/cpuinfo)
-RUN make install
-RUN cp --no-preserve=ownership -d lib*.so* /usr/lib
+    -DBUILD_SHARED_LIBS=ON \
+    -DFETCHCONTENT_SOURCE_DIR_TENSORFLOW=/tmp/tensorflow \
+    -DTFLITE_LIB_LOC=/usr/local/lib/libtensorflow-lite.so \
+    -DTIM_VX_INSTALL=/usr/local \
+    -DCMAKE_INSTALL_PREFIX=/usr/  \
+    ../
+RUN make vx_delegate -j "$(grep -c ^processor /proc/cpuinfo)" \
+    && make install \
+    && cp --no-preserve=ownership -d lib*.so* /usr/lib
 # install header files
-RUN install -d /usr/local/include/tensorflow-lite-vx-delegate && \
-    cd  /tmp/tflite-vx-delegate/ && \
-    cp --parents \
-        $(find . -name "*.h*") \
-        /usr/local/include/tensorflow-lite-vx-delegate
+RUN install -d /usr/local/include/tensorflow-lite-vx-delegate
+WORKDIR /tmp/tflite-vx-delegate/
+# hadolint ignore=SC2046
+RUN cp --parents \
+    $(find . -name "*.h*") \
+    /usr/local/include/tensorflow-lite-vx-delegate
 
 ENV VIVANTE_SDK_DIR=/tmp/tim-vx/prebuilt-sdk/x86_64_linux/
 ENV VSIMULATOR_CONFIG=czl
@@ -84,6 +97,7 @@ COPY . .
 
 WORKDIR /home/wamr/core/iwasm/libraries/wasi-nn/test/build
 
+# hadolint ignore=SC2086
 RUN cmake \
     -DCMAKE_LIBRARY_PATH=${CMAKE_LIBRARY_PATH}:/usr/local/lib/ \
     -DCMAKE_INCLUDE_PATH=${CMAKE_INCLUDE_PATH}:/usr/local/include/ \
@@ -92,7 +106,7 @@ RUN cmake \
     -DWASI_NN_EXT_DELEGATE_PATH="/usr/lib/libvx_delegate.so" \
     ..
 
-RUN make -j $(grep -c ^processor /proc/cpuinfo)
+RUN make -j "$(grep -c ^processor /proc/cpuinfo)"
 
 RUN cp /home/wamr/core/iwasm/libraries/wasi-nn/test/build/iwasm /run/iwasm
 

+ 1 - 1
core/iwasm/libraries/wasi-nn/test/build.sh

@@ -7,7 +7,7 @@
     -Wl,--allow-undefined \
     -Wl,--strip-all,--no-entry \
     --sysroot=/opt/wasi-sdk/share/wasi-sysroot \
-    -I.. -I../src/utils \
+    -I../include -I../src/utils \
     -o test_tensorflow.wasm \
     test_tensorflow.c utils.c
 

+ 8 - 11
core/iwasm/libraries/wasi-nn/test/test_tensorflow.c

@@ -20,7 +20,7 @@ test_sum(execution_target target)
 
     uint32_t output_size = 0;
     float *output = run_inference(target, input.input_tensor, input.dim,
-                                  &output_size, "/assets/models/sum.tflite", 1);
+                                  &output_size, "./models/sum.tflite", 1);
 
     assert(output_size == 1);
     assert(fabs(output[0] - 300.0) < EPSILON);
@@ -38,7 +38,7 @@ test_max(execution_target target)
 
     uint32_t output_size = 0;
     float *output = run_inference(target, input.input_tensor, input.dim,
-                                  &output_size, "/assets/models/max.tflite", 1);
+                                  &output_size, "./models/max.tflite", 1);
 
     assert(output_size == 1);
     assert(fabs(output[0] - 24.0) < EPSILON);
@@ -56,9 +56,8 @@ test_average(execution_target target)
     input_info input = create_input(dims);
 
     uint32_t output_size = 0;
-    float *output =
-        run_inference(target, input.input_tensor, input.dim, &output_size,
-                      "/assets/models/average.tflite", 1);
+    float *output = run_inference(target, input.input_tensor, input.dim,
+                                  &output_size, "./models/average.tflite", 1);
 
     assert(output_size == 1);
     assert(fabs(output[0] - 12.0) < EPSILON);
@@ -76,9 +75,8 @@ test_mult_dimensions(execution_target target)
     input_info input = create_input(dims);
 
     uint32_t output_size = 0;
-    float *output =
-        run_inference(target, input.input_tensor, input.dim, &output_size,
-                      "/assets/models/mult_dim.tflite", 1);
+    float *output = run_inference(target, input.input_tensor, input.dim,
+                                  &output_size, "./models/mult_dim.tflite", 1);
 
     assert(output_size == 9);
     for (int i = 0; i < 9; i++)
@@ -96,9 +94,8 @@ test_mult_outputs(execution_target target)
     input_info input = create_input(dims);
 
     uint32_t output_size = 0;
-    float *output =
-        run_inference(target, input.input_tensor, input.dim, &output_size,
-                      "/assets/models/mult_out.tflite", 2);
+    float *output = run_inference(target, input.input_tensor, input.dim,
+                                  &output_size, "./models/mult_out.tflite", 2);
 
     assert(output_size == 8);
     // first tensor check

+ 0 - 22
core/iwasm/libraries/wasi-nn/wasi_nn.cmake

@@ -1,22 +0,0 @@
-# Copyright (C) 2019 Intel Corporation.  All rights reserved.
-# SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-
-list(APPEND CMAKE_MODULE_PATH ${CMAKE_CURRENT_LIST_DIR}/cmake)
-
-# Find tensorflow-lite
-find_package(tensorflow_lite REQUIRED)
-
-set (WASI_NN_DIR ${CMAKE_CURRENT_LIST_DIR})
-
-include_directories (${WASI_NN_DIR})
-include_directories (${WASI_NN_DIR}/src)
-include_directories (${WASI_NN_DIR}/src/utils)
-
-set (
-    LIBC_WASI_NN_SOURCE
-    ${WASI_NN_DIR}/src/wasi_nn.c
-    ${WASI_NN_DIR}/src/wasi_nn_tensorflowlite.cpp
-    ${WASI_NN_DIR}/src/utils/wasi_nn_app_native.c
-)
-
-set (TENSORFLOW_LIB tensorflow-lite)

+ 1 - 0
core/shared/mem-alloc/ems/ems_alloc.c

@@ -564,6 +564,7 @@ gc_realloc_vo_internal(void *vheap, void *ptr, gc_size_t size, const char *file,
                         os_mutex_unlock(&heap->lock);
                         return NULL;
                     }
+                    hmu_mark_pinuse(hmu_next);
                 }
                 os_mutex_unlock(&heap->lock);
                 return obj_old;

+ 4 - 1
core/shared/mem-alloc/ems/ems_gc_internal.h

@@ -214,13 +214,16 @@ set_hmu_normal_node_next(hmu_normal_node_t *node, hmu_normal_node_t *next)
 #if defined(_MSC_VER)
 __pragma(pack(push, 1));
 #define __attr_packed
+#define __attr_aligned(a)
 #elif defined(__GNUC__) || defined(__clang__)
 #define __attr_packed __attribute__((packed))
+#define __attr_aligned(a) __attribute__((aligned(a)))
 #else
 #error "packed attribute isn't used to define struct hmu_tree_node"
 #endif
 #else /* else of UINTPTR_MAX == UINT64_MAX */
 #define __attr_packed
+#define __attr_aligned(a)
 #endif
 
 typedef struct hmu_tree_node {
@@ -229,7 +232,7 @@ typedef struct hmu_tree_node {
     struct hmu_tree_node *right;
     struct hmu_tree_node *parent;
     gc_size_t size;
-} __attr_packed hmu_tree_node_t;
+} __attr_packed __attr_aligned(4) hmu_tree_node_t;
 
 #if UINTPTR_MAX == UINT64_MAX
 #if defined(_MSC_VER)

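The `ems_gc_internal.h` change above keeps `hmu_tree_node_t` packed but adds `__attr_aligned(4)`. On GCC/Clang this raises the type's alignment to 4 bytes, while the MSVC and 32-bit branches define the macro as empty. The sketch below shows the effect on a hypothetical `demo_node_*` struct with a 4-byte header, three pointers and a 4-byte size. It is an illustration of the attribute combination only, not the real WAMR definition:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for hmu_tree_node_t (illustrative layout only).
   "packed" removes member padding and drops the type's alignment to 1. */
struct demo_node_packed {
    uint32_t header;
    struct demo_node_packed *left;
    struct demo_node_packed *right;
    struct demo_node_packed *parent;
    uint32_t size;
} __attribute__((packed));

/* Same layout, mirroring `__attr_packed __attr_aligned(4)`: members stay
   packed (left is still at offset 4), but the type is 4-byte aligned. */
struct demo_node_packed_aligned {
    uint32_t header;
    struct demo_node_packed_aligned *left;
    struct demo_node_packed_aligned *right;
    struct demo_node_packed_aligned *parent;
    uint32_t size;
} __attribute__((packed)) __attribute__((aligned(4)));
```

With `aligned(4)` the compiler may assume every node starts on a 4-byte boundary, whereas with `packed` alone it must assume any address is possible.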
+ 1 - 0
core/shared/platform/android/platform_internal.h

@@ -27,6 +27,7 @@
 #include <sched.h>
 #include <errno.h>
 #include <netinet/in.h>
+#include <sys/epoll.h>
 #include <sys/types.h>
 #include <sys/stat.h>
 #include <sys/mman.h>

+ 2 - 0
core/shared/platform/linux/platform_internal.h

@@ -63,6 +63,7 @@ typedef sem_t korp_sem;
 
 #define bh_socket_t int
 
+#if WASM_DISABLE_WRITE_GS_BASE == 0
 #if defined(BUILD_TARGET_X86_64) || defined(BUILD_TARGET_AMD_64)
 #define os_writegsbase(base_addr)                                 \
     do {                                                          \
@@ -76,6 +77,7 @@ typedef sem_t korp_sem;
     _writegsbase_u64(((uint64)(uintptr_t)base_addr))
 #endif
 #endif
+#endif
 
 #if WASM_DISABLE_HW_BOUND_CHECK == 0
 #if defined(BUILD_TARGET_X86_64) || defined(BUILD_TARGET_AMD_64)            \

+ 8 - 1
doc/build_wamr.md

@@ -98,7 +98,7 @@ cmake -DWAMR_BUILD_PLATFORM=linux -DWAMR_BUILD_TARGET=ARM
 
 #### **Disable boundary check with hardware trap**
 - **WAMR_DISABLE_HW_BOUND_CHECK**=1/0, default to enable if not set and supported by platform
-> Note: by default only platform linux/darwin/android/windows/vxworks 64-bit will enable the boundary check with hardware trap feature, and the wamrc tool will generate AOT code without boundary check instructions in all 64-bit targets except SGX to improve performance. The boundary check includes linear memory access boundary and native stack access boundary, if `WAMR_DISABLE_STACK_HW_BOUND_CHECK` below isn't set.
+> Note: by default, only the [linux/darwin/android/windows/vxworks 64-bit](https://github.com/bytecodealliance/wasm-micro-runtime/blob/5fb5119239220b0803e7045ca49b0a29fe65e70e/core/shared/platform/linux/platform_internal.h#L81) platforms enable the boundary check with hardware trap feature; on 32-bit platforms it is automatically disabled even when the flag is set to 0. To improve performance, the wamrc tool generates AOT code without boundary check instructions for all 64-bit targets except SGX. The boundary check covers linear memory accesses, and also native stack accesses if `WAMR_DISABLE_STACK_HW_BOUND_CHECK` below isn't set.
 
 #### **Disable native stack boundary check with hardware trap**
 - **WAMR_DISABLE_STACK_HW_BOUND_CHECK**=1/0, default to enable if not set and supported by platform, same as `WAMR_DISABLE_HW_BOUND_CHECK`.
@@ -198,6 +198,13 @@ Currently we only profile the memory consumption of module, module_instance and
 - **WAMR_BUILD_STACK_GUARD_SIZE**=n, default to N/A if not set.
 > Note: By default, the stack guard size is 1K (1024) or 24K (if uvwasi enabled).
 
+### **Disable writing the linear memory base address to the x86 GS segment register**
+- **WAMR_DISABLE_WRITE_GS_BASE**=1/0, default to enable if not set and supported by platform
+> Note: by default, only the [linux x86-64](https://github.com/bytecodealliance/wasm-micro-runtime/blob/5fb5119239220b0803e7045ca49b0a29fe65e70e/core/shared/platform/linux/platform_internal.h#L67) platform enables this feature; on 32-bit platforms it is automatically disabled even when the flag is set to 0. On linux x86-64, writing the linear memory base address to the x86 GS segment register may be used to speed up linear memory access for LLVM AOT/JIT when the `--enable-segue=[<flags>]` option is passed to `wamrc` or `iwasm`.
+
+### **Enable running a PGO (Profile-Guided Optimization) instrumented AOT file**
+- **WAMR_BUILD_STATIC_PGO**=1/0, default to disable if not set
+
 **Combination of configurations:**
 
 We can combine the configurations. For example, if we want to disable interpreter, enable AOT and WASI, we can run command:

+ 74 - 0
doc/perf_tune.md

@@ -0,0 +1,74 @@
+# Tune the performance of running wasm/aot files
+
+There are several ways to tune performance:
+
+## 1. Use `wasm-opt` tool
+
+Download the [binaryen release](https://github.com/WebAssembly/binaryen/releases), and use the `wasm-opt` tool in it to optimize the wasm file, for example:
+
+```bash
+wasm-opt -O4 -o test_opt.wasm test.wasm
+```
+
+## 2. Enable `simd128` option when compiling wasm source files
+
+WebAssembly [128-bit SIMD](https://github.com/WebAssembly/simd) is supported by WAMR on x86-64 and aarch64 targets, and enabling it when compiling wasm source files may greatly improve performance. For [wasi-sdk](https://github.com/WebAssembly/wasi-sdk) and [emsdk](https://github.com/emscripten-core/emsdk), add the `-msimd128` flag to `clang` and `emcc/em++`:
+
+```bash
+/opt/wasi-sdk/bin/clang -msimd128 -O3 -o <wasm_file> <c/c++ source files>
+
+emcc -msimd128 -O3 -o <wasm_file> <c/c++ source files>
+```
+
+## 3. Enable segue optimization for wamrc when generating the aot file
+
+[Segue](https://plas2022.github.io/files/pdf/SegueColorGuard.pdf) is an optimization technique that uses an x86 segment register to store the WebAssembly linear memory base address. This removes most of the cost of the SFI (Software-based Fault Isolation) base addition and frees up a general purpose register. In this way it may:
+- Improve the performance of JIT/AOT
+- Reduce the footprint of JIT/AOT, the JIT/AOT code generated is smaller
+- Reduce the compilation time of JIT/AOT
+
+Currently it is supported on linux x86-64; developers can use `--enable-segue=[<flags>]` for wamrc:
+```bash
+wamrc --enable-segue -o aot_file wasm_file
+# or
+wamrc --enable-segue=[<flags>] -o aot_file wasm_file
+```
+`flags` can be any of i32.load, i64.load, f32.load, f64.load, v128.load, i32.store, i64.store, f32.store, f64.store and v128.store, separated by commas, e.g. `--enable-segue=i32.load,i64.store`. A bare `--enable-segue` enables all flags.
+
+> Note: in most cases, using `--enable-segue` is enough, but in some cases `--enable-segue=<flags>` may be better. For example, for the CoreMark benchmark, `--enable-segue=i32.store` may lead to better performance than `--enable-segue`.
+
+## 4. Enable segue optimization for iwasm when running wasm file
+
+Similar to the segue optimization for wamrc, run:
+```bash
+# iwasm must be built with LLVM JIT enabled
+iwasm --enable-segue wasm_file
+# or
+iwasm --enable-segue=[<flags>] wasm_file
+```
+
+## 5. Use the AOT static PGO method
+
+LLVM PGO (Profile-Guided Optimization) allows the compiler to better optimize code for how it actually runs. WAMR supports AOT static PGO, which is currently tested on Linux x86-64 and x86-32. The basic steps are:
+
+1. Use `wamrc --enable-llvm-pgo -o <aot_file_of_pgo> <wasm_file>` to generate an instrumented aot file.
+
+2. Compile iwasm with `cmake -DWAMR_BUILD_STATIC_PGO=1` and run `iwasm --gen-prof-file=<raw_profile_file> <aot_file_of_pgo>` to generate the raw profile file.
+
+> Note: directly dumping raw profile data to the file system may be unsupported in some environments. Developers can instead dump the profile data into a memory buffer and try outputting it through another channel (e.g. uart or socket):
+```C
+uint32_t
+wasm_runtime_get_pgo_prof_data_size(wasm_module_inst_t module_inst);
+
+uint32_t
+wasm_runtime_dump_pgo_prof_data_to_buf(wasm_module_inst_t module_inst, char *buf, uint32_t len);
+```
+
+3. Install or compile the `llvm-profdata` tool; refer to [here](../tests/benchmarks/README.md#install-llvm-profdata) for details.
+
+4. Run `llvm-profdata merge -output=<profile_file> <raw_profile_file>` to merge the raw profile file into the profile file.
+
+5. Run `wamrc --use-prof-file=<profile_file> -o <aot_file> <wasm_file>` to generate the optimized aot file.
+
+6. Run the optimized aot_file: `iwasm <aot_file>`.
+
+Developers can refer to the `test_pgo.sh` file under each benchmark folder for more details, e.g. the [test_pgo.sh](../tests/benchmarks/coremark/test_pgo.sh) of the CoreMark benchmark.

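The `--enable-segue=[<flags>]` syntax described in `doc/perf_tune.md` above takes a comma-separated subset of ten load/store names, and a bare `--enable-segue` means all of them. The following is a small C sketch of how such a list could be turned into a bitmask. The function name `parse_segue_flags`, the `SEGUE_*` bit values and the treatment of a NULL/empty list are assumptions for illustration. This is not wamrc's actual option-parsing code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bit assignments; wamrc's internal encoding may differ. */
enum segue_flag {
    SEGUE_I32_LOAD = 1u << 0,  SEGUE_I64_LOAD = 1u << 1,
    SEGUE_F32_LOAD = 1u << 2,  SEGUE_F64_LOAD = 1u << 3,
    SEGUE_V128_LOAD = 1u << 4, SEGUE_I32_STORE = 1u << 5,
    SEGUE_I64_STORE = 1u << 6, SEGUE_F32_STORE = 1u << 7,
    SEGUE_F64_STORE = 1u << 8, SEGUE_V128_STORE = 1u << 9,
};
#define SEGUE_ALL 0x3FFu

/* Index i in this table corresponds to bit (1u << i) above. */
static const char *const segue_names[] = {
    "i32.load",  "i64.load",  "f32.load",  "f64.load",  "v128.load",
    "i32.store", "i64.store", "f32.store", "f64.store", "v128.store",
};

/* Parse the text after "--enable-segue=". NULL or "" selects every flag
   (bare --enable-segue). Returns false on an unknown or empty name. */
static bool
parse_segue_flags(const char *list, uint32_t *out)
{
    uint32_t mask = 0;

    if (list == NULL || *list == '\0') {
        *out = SEGUE_ALL;
        return true;
    }
    while (*list) {
        const char *comma = strchr(list, ',');
        size_t n = comma ? (size_t)(comma - list) : strlen(list);
        bool found = false;

        for (uint32_t i = 0; i < sizeof(segue_names) / sizeof(segue_names[0]);
             i++) {
            if (strlen(segue_names[i]) == n
                && strncmp(list, segue_names[i], n) == 0) {
                mask |= 1u << i;
                found = true;
                break;
            }
        }
        if (!found)
            return false;
        list += n;
        if (*list == ',')
            list++; /* skip the separator */
    }
    *out = mask;
    return true;
}
```

For example, `parse_segue_flags("i32.load,i64.store", &mask)` sets exactly the `SEGUE_I32_LOAD` and `SEGUE_I64_STORE` bits, and a misspelled name such as `i32.lod` makes it return `false`.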
+ 1 - 1
product-mini/platforms/darwin/CMakeLists.txt

@@ -34,7 +34,7 @@ if (NOT CMAKE_BUILD_TYPE)
   set(CMAKE_BUILD_TYPE Release)
 endif ()
 
-set(CMAKE_CXX_STANDARD 14)
+set(CMAKE_CXX_STANDARD 17)
 
 if (NOT DEFINED WAMR_BUILD_INTERP)
   # Enable Interpreter by default

+ 1 - 1
product-mini/platforms/freebsd/CMakeLists.txt

@@ -34,7 +34,7 @@ if (NOT CMAKE_BUILD_TYPE)
   set(CMAKE_BUILD_TYPE Release)
 endif ()
 
-set(CMAKE_CXX_STANDARD 14)
+set(CMAKE_CXX_STANDARD 17)
 
 if (NOT DEFINED WAMR_BUILD_INTERP)
   # Enable Interpreter by default

+ 1 - 1
product-mini/platforms/ios/CMakeLists.txt

@@ -41,7 +41,7 @@ if (NOT CMAKE_BUILD_TYPE)
   set(CMAKE_BUILD_TYPE Release)
 endif ()
 
-set(CMAKE_CXX_STANDARD 14)
+set(CMAKE_CXX_STANDARD 17)
 
 if (NOT DEFINED WAMR_BUILD_INTERP)
   # Enable Interpreter by default

+ 17 - 0
product-mini/platforms/linux-sgx/CMakeLists.txt

@@ -89,6 +89,11 @@ if (NOT DEFINED WAMR_BUILD_SGX_IPFS)
   set (WAMR_BUILD_SGX_IPFS 0)
 endif ()
 
+if (NOT DEFINED WAMR_BUILD_STATIC_PGO)
+  # Disable static PGO by default
+  set (WAMR_BUILD_STATIC_PGO 0)
+endif ()
+
 set (CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -Wl,--gc-sections")
 set (CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -std=gnu11 -ffunction-sections -fdata-sections \
                                      -Wall -Wno-unused-parameter -Wno-pedantic \
@@ -107,6 +112,18 @@ add_custom_command (
 
 add_custom_target (vmlib_untrusted ALL DEPENDS libvmlib_untrusted.a)
 
+if ((WAMR_BUILD_STATIC_PGO EQUAL 1) AND (WAMR_BUILD_AOT EQUAL 1))
+    execute_process(
+        COMMAND bash -c "sed -i -E 's/^WAMR_BUILD_STATIC_PGO = 0/WAMR_BUILD_STATIC_PGO = 1/g' ${CMAKE_CURRENT_SOURCE_DIR}/enclave-sample/Makefile"
+        OUTPUT_VARIABLE cmdOutput
+    )
+else()
+    execute_process(
+        COMMAND bash -c "sed -i -E 's/^WAMR_BUILD_STATIC_PGO = 1/WAMR_BUILD_STATIC_PGO = 0/g' ${CMAKE_CURRENT_SOURCE_DIR}/enclave-sample/Makefile"
+        OUTPUT_VARIABLE cmdOutput
+    )
+endif()
+
 if (DEFINED WAMR_BUILD_GLOBAL_HEAP_POOL)
   execute_process(
       COMMAND bash -c "sed -i -E 's/^WAMR_BUILD_GLOBAL_HEAP_POOL = .*/WAMR_BUILD_GLOBAL_HEAP_POOL = ${WAMR_BUILD_GLOBAL_HEAP_POOL}/g' ${CMAKE_CURRENT_SOURCE_DIR}/enclave-sample/Makefile"

+ 81 - 1
product-mini/platforms/linux-sgx/enclave-sample/App/App.cpp

@@ -232,6 +232,9 @@ print_help()
     printf("                         for example:\n");
     printf("                           --addr-pool=1.2.3.4/15,2.3.4.5/16\n");
     printf("  --max-threads=n        Set maximum thread number per cluster, default is 4\n");
+#if WASM_ENABLE_STATIC_PGO != 0
+    printf("  --gen-prof-file=<path> Generate LLVM PGO (Profile-Guided Optimization) profile file\n");
+#endif
     printf("  --version              Show version information\n");
     return 1;
 }
@@ -294,6 +297,10 @@ typedef enum EcallCmd {
     CMD_SET_WASI_ARGS,        /* wasm_runtime_set_wasi_args() */
     CMD_SET_LOG_LEVEL,        /* bh_log_set_verbose_level() */
     CMD_GET_VERSION,          /* wasm_runtime_get_version() */
+#if WASM_ENABLE_STATIC_PGO != 0
+    CMD_GET_PGO_PROF_BUF_SIZE,  /* wasm_runtime_get_pgo_prof_data_size() */
+    CMD_DUMP_PGO_PROF_BUF_DATA, /* wasm_runtime_dump_pgo_prof_data_to_buf() */
+#endif
 } EcallCmd;
 
 static void
@@ -598,6 +605,64 @@ get_version(uint64_t *major, uint64_t *minor, uint64_t *patch)
     *patch = ecall_args[2];
 }
 
+#if WASM_ENABLE_STATIC_PGO != 0
+static void
+dump_pgo_prof_data(void *module_inst, const char *path)
+{
+    char *buf;
+    uint32_t len;
+    FILE *file;
+
+    uint64_t ecall_args[1];
+    ecall_args[0] = (uint64_t)(uintptr_t)module_inst;
+    if (SGX_SUCCESS
+        != ecall_handle_command(g_eid, CMD_GET_PGO_PROF_BUF_SIZE,
+                                (uint8_t *)ecall_args, sizeof(ecall_args))) {
+        printf("Call ecall_handle_command() failed.\n");
+        return;
+    }
+    if (!(len = ecall_args[0])) {
+        printf("failed to get LLVM PGO profile data size\n");
+        return;
+    }
+
+    if (!(buf = (char *)malloc(len))) {
+        printf("allocate memory failed\n");
+        return;
+    }
+
+    uint64_t ecall_args_2[3];
+    ecall_args_2[0] = (uint64_t)(uintptr_t)module_inst;
+    ecall_args_2[1] = (uint64_t)(uintptr_t)buf;
+    ecall_args_2[2] = len;
+    if (SGX_SUCCESS
+        != ecall_handle_command(g_eid, CMD_DUMP_PGO_PROF_BUF_DATA,
+                                (uint8_t *)ecall_args_2,
+                                sizeof(ecall_args_2))) {
+        printf("Call ecall_handle_command() failed.\n");
+        free(buf);
+        return;
+    }
+    if (!(len = ecall_args_2[0])) {
+        printf("failed to dump LLVM PGO profile data\n");
+        free(buf);
+        return;
+    }
+
+    if (!(file = fopen(path, "wb"))) {
+        printf("failed to create file %s", path);
+        free(buf);
+        return;
+    }
+    fwrite(buf, len, 1, file);
+    fclose(file);
+
+    free(buf);
+
+    printf("LLVM raw profile file %s was generated.\n", path);
+}
+#endif
+
 int
 main(int argc, char *argv[])
 {
@@ -619,6 +684,9 @@ main(int argc, char *argv[])
     const char *addr_pool[8] = { NULL };
     uint32_t addr_pool_size = 0;
     uint32_t max_thread_num = 4;
+#if WASM_ENABLE_STATIC_PGO != 0
+    const char *gen_prof_file = NULL;
+#endif
 
     if (enclave_init(&g_eid) < 0) {
         std::cout << "Fail to initialize enclave." << std::endl;
@@ -718,6 +786,13 @@ main(int argc, char *argv[])
                 return print_help();
             max_thread_num = atoi(argv[0] + 14);
         }
+#if WASM_ENABLE_STATIC_PGO != 0
+        else if (!strncmp(argv[0], "--gen-prof-file=", 16)) {
+            if (argv[0][16] == '\0')
+                return print_help();
+            gen_prof_file = argv[0] + 16;
+        }
+#endif
         else if (!strncmp(argv[0], "--version", 9)) {
             uint64_t major = 0, minor = 0, patch = 0;
             get_version(&major, &minor, &patch);
@@ -779,6 +854,11 @@ main(int argc, char *argv[])
     else
         app_instance_main(wasm_module_inst, argc, argv);
 
+#if WASM_ENABLE_STATIC_PGO != 0
+    if (gen_prof_file)
+        dump_pgo_prof_data(wasm_module_inst, gen_prof_file);
+#endif
+
     ret = 0;
 
     /* Deinstantiate module */
@@ -836,7 +916,7 @@ wamr_pal_create_process(struct wamr_pal_create_process_args *args)
     int stdoutfd = -1;
     int stderrfd = -1;
 
-    int argc = 2;
+    const int argc = 2;
     char *argv[argc] = { (char *)"./iwasm", (char *)args->argv[0] };
 
     uint8_t *wasm_files_buf = NULL;

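`dump_pgo_prof_data()` above and the enclave handlers exchange arguments through an array of `uint64_t` slots: the module instance pointer, the buffer pointer and the length go in, and the byte count comes back in slot 0. The sketch below shows that packing convention with hypothetical helper names (`pack_dump_args`, `unpack_dump_args`). The enclave-side `handle_cmd_get_pro_prof_buf_data()` reads the slots through pointer casts such as `*(uint32 *)args++`, which relies on a little-endian layout. This version instead uses explicit integer conversions, as an assumption about one portable way to write it:

```c
#include <assert.h>
#include <stdint.h>

/* Host side: store two pointers and a length in fixed 64-bit slots,
   as dump_pgo_prof_data() does with ecall_args_2[3]. */
static void
pack_dump_args(uint64_t args[3], void *module_inst, char *buf, uint32_t len)
{
    args[0] = (uint64_t)(uintptr_t)module_inst;
    args[1] = (uint64_t)(uintptr_t)buf;
    args[2] = len;
}

/* Enclave side: recover the values with integer casts instead of
   reinterpreting each slot through a pointer of a different type. */
static void
unpack_dump_args(const uint64_t args[3], void **module_inst, char **buf,
                 uint32_t *len)
{
    *module_inst = (void *)(uintptr_t)args[0];
    *buf = (char *)(uintptr_t)args[1];
    *len = (uint32_t)args[2]; /* the length is 32-bit on both sides */
}
```

Because every slot is 64 bits wide, the same array layout works whether the host and enclave use 32-bit or 64-bit pointers.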
+ 42 - 0
product-mini/platforms/linux-sgx/enclave-sample/Enclave/Enclave.cpp

@@ -49,6 +49,10 @@ typedef enum EcallCmd {
     CMD_SET_WASI_ARGS,        /* wasm_runtime_set_wasi_args() */
     CMD_SET_LOG_LEVEL,        /* bh_log_set_verbose_level() */
     CMD_GET_VERSION,          /* wasm_runtime_get_version() */
+#if WASM_ENABLE_STATIC_PGO != 0
+    CMD_GET_PGO_PROF_BUF_SIZE,  /* wasm_runtime_get_pgo_prof_data_size() */
+    CMD_DUMP_PGO_PROF_BUF_DATA, /* wasm_runtime_dump_pgo_prof_data_to_buf() */
+#endif
 } EcallCmd;
 
 typedef struct EnclaveModule {
@@ -597,6 +601,36 @@ handle_cmd_get_version(uint64 *args, uint32 argc)
     args[2] = patch;
 }
 
+#if WASM_ENABLE_STATIC_PGO != 0
+static void
+handle_cmd_get_pgo_prof_buf_size(uint64 *args, int32 argc)
+{
+    wasm_module_inst_t module_inst = *(wasm_module_inst_t *)args;
+    uint32 buf_len;
+
+    bh_assert(argc == 1);
+
+    buf_len = wasm_runtime_get_pgo_prof_data_size(module_inst);
+    args[0] = buf_len;
+}
+
+static void
+handle_cmd_get_pro_prof_buf_data(uint64 *args, int32 argc)
+{
+    uint64 *args_org = args;
+    wasm_module_inst_t module_inst = *(wasm_module_inst_t *)args++;
+    char *buf = *(char **)args++;
+    uint32 len = *(uint32 *)args++;
+    uint32 bytes_dumped;
+
+    bh_assert(argc == 3);
+
+    bytes_dumped =
+        wasm_runtime_dump_pgo_prof_data_to_buf(module_inst, buf, len);
+    args_org[0] = bytes_dumped;
+}
+#endif
+
 void
 ecall_handle_command(unsigned cmd, unsigned char *cmd_buf,
                      unsigned cmd_buf_size)
@@ -647,6 +681,14 @@ ecall_handle_command(unsigned cmd, unsigned char *cmd_buf,
         case CMD_GET_VERSION:
             handle_cmd_get_version(args, argc);
             break;
+#if WASM_ENABLE_STATIC_PGO != 0
+        case CMD_GET_PGO_PROF_BUF_SIZE:
+            handle_cmd_get_pgo_prof_buf_size(args, argc);
+            break;
+        case CMD_DUMP_PGO_PROF_BUF_DATA:
+            handle_cmd_get_pro_prof_buf_data(args, argc);
+            break;
+#endif
         default:
             LOG_ERROR("Unknown command %d\n", cmd);
             break;

+ 3 - 2
product-mini/platforms/linux-sgx/enclave-sample/Makefile

@@ -15,6 +15,7 @@ WAMR_BUILD_SGX_IPFS = 0
 WAMR_BUILD_LIB_RATS = 0
 WAMR_BUILD_GLOBAL_HEAP_POOL = 0
 WAMR_BUILD_GLOBAL_HEAP_SIZE = 10485760
+WAMR_BUILD_STATIC_PGO = 0
 
 VMLIB_BUILD_DIR ?= $(CURDIR)/../build
 LIB_RATS_SRC ?= $(VMLIB_BUILD_DIR)/_deps/librats-build
@@ -65,7 +66,7 @@ ifeq ($(WAMR_BUILD_LIB_RATS), 1)
 	App_Include_Paths += -I$(LIB_RATS_INCLUDE_DIR)
 endif
 
-App_C_Flags := $(SGX_COMMON_CFLAGS) -fPIC -Wno-attributes $(App_Include_Paths)
+App_C_Flags := $(SGX_COMMON_CFLAGS) -fPIC -Wno-attributes $(App_Include_Paths) -DWASM_ENABLE_STATIC_PGO=$(WAMR_BUILD_STATIC_PGO)
 
 # Three configuration modes - Debug, prerelease, release
 #   Debug - Macro DEBUG enabled.
@@ -134,7 +135,7 @@ ifeq ($(WAMR_BUILD_LIB_RATS), 1)
 	Enclave_Include_Paths += -I$(LIB_RATS_INCLUDE_DIR) -I$(SGX_SSL)/include
 endif
 
-Enclave_C_Flags := $(SGX_COMMON_CFLAGS) -nostdinc -fvisibility=hidden -fpie -fstack-protector $(Enclave_Include_Paths) -DWASM_GLOBAL_HEAP_SIZE=$(WAMR_BUILD_GLOBAL_HEAP_SIZE) -DWASM_ENABLE_GLOBAL_HEAP_POOL=$(WAMR_BUILD_GLOBAL_HEAP_POOL) -DWASM_ENABLE_LIB_RATS=$(WAMR_BUILD_LIB_RATS)
+Enclave_C_Flags := $(SGX_COMMON_CFLAGS) -nostdinc -fvisibility=hidden -fpie -fstack-protector $(Enclave_Include_Paths) -DWASM_GLOBAL_HEAP_SIZE=$(WAMR_BUILD_GLOBAL_HEAP_SIZE) -DWASM_ENABLE_GLOBAL_HEAP_POOL=$(WAMR_BUILD_GLOBAL_HEAP_POOL) -DWASM_ENABLE_LIB_RATS=$(WAMR_BUILD_LIB_RATS) -DWASM_ENABLE_STATIC_PGO=$(WAMR_BUILD_STATIC_PGO)
 ifeq ($(SPEC_TEST), 1)
 	Enclave_C_Flags += -DWASM_ENABLE_SPEC_TEST=1
 else

+ 3 - 3
product-mini/platforms/linux/CMakeLists.txt

@@ -16,7 +16,7 @@ set (CMAKE_SHARED_LIBRARY_LINK_C_FLAGS "")
 set (CMAKE_SHARED_LIBRARY_LINK_CXX_FLAGS "")
 
 set (CMAKE_C_STANDARD 99)
-set (CMAKE_CXX_STANDARD 14)
+set (CMAKE_CXX_STANDARD 17)
 
 # Set WAMR_BUILD_TARGET, currently values supported:
 # "X86_64", "AMD_64", "X86_32", "AARCH64[sub]", "ARM[sub]", "THUMB[sub]",
@@ -155,7 +155,7 @@ set_target_properties (iwasm PROPERTIES POSITION_INDEPENDENT_CODE ON)
 
 install (TARGETS iwasm DESTINATION bin)
 
-target_link_libraries (iwasm vmlib ${LLVM_AVAILABLE_LIBS} ${UV_A_LIBS} -lm -ldl -lpthread)
+target_link_libraries (iwasm vmlib ${LLVM_AVAILABLE_LIBS} ${UV_A_LIBS} ${WASI_NN_LIBS} -lm -ldl -lpthread)
 
 add_library (libiwasm SHARED ${WAMR_RUNTIME_LIB_SOURCE})
 
@@ -163,4 +163,4 @@ install (TARGETS libiwasm DESTINATION lib)
 
 set_target_properties (libiwasm PROPERTIES OUTPUT_NAME iwasm)
 
-target_link_libraries (libiwasm ${LLVM_AVAILABLE_LIBS} ${UV_A_LIBS} -lm -ldl -lpthread)
+target_link_libraries (libiwasm ${LLVM_AVAILABLE_LIBS} ${UV_A_LIBS} ${WASI_NN_LIBS} -lm -ldl -lpthread)

+ 22 - 0
samples/mem_allocator/CMakeLists.txt

@@ -0,0 +1,22 @@
+# Copyright (C) 2023 Midokura Japan KK.  All rights reserved.
+# SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+
+cmake_minimum_required(VERSION 3.0)
+project(mem_allocator_create)
+
+string (TOLOWER ${CMAKE_HOST_SYSTEM_NAME} WAMR_BUILD_PLATFORM)
+if(APPLE)
+  add_definitions(-DBH_PLATFORM_DARWIN)
+endif()
+
+set(WAMR_BUILD_INTERP 1)
+set(WAMR_BUILD_LIBC_BUILTIN 0)
+
+set(WAMR_ROOT_DIR ${CMAKE_CURRENT_SOURCE_DIR}/../..)
+include(${WAMR_ROOT_DIR}/build-scripts/runtime_lib.cmake)
+
+add_library(vmlib ${WAMR_RUNTIME_LIB_SOURCE})
+
+add_executable(mem_alloc_test main.c)
+
+target_link_libraries(mem_alloc_test vmlib -lm -lpthread)

+ 58 - 0
samples/mem_allocator/main.c

@@ -0,0 +1,58 @@
+/*
+ * Copyright (C) 2023 Midokura Japan KK.  All rights reserved.
+ * SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+ */
+
+#include <stdio.h>
+#include <stdint.h>
+#include <stdlib.h>
+
+#include "mem_alloc.h"
+
+char store[1000];
+
+int
+main(int argc, char **argv)
+{
+    mem_allocator_t a = mem_allocator_create(store, sizeof(store));
+    uint8_t *p;
+    uint8_t *p2;
+
+    p = mem_allocator_malloc(a, 256);
+    printf("%p\n", p);
+    if (p == NULL) {
+        exit(1);
+    }
+    p = mem_allocator_realloc(a, p, 256 + 12);
+    printf("%p\n", p);
+    if (p == NULL) {
+        exit(1);
+    }
+
+    /*
+     * write some values to confuse the ems allocator.
+     *
+     * hmu = p + 256
+     * hmu_set_ut(hmu, HMU_FC)
+     * hmu_set_size(hmu, 256)
+     * hmu_set_free_size(hmu)
+     */
+    *(uint32_t *)(p + 256) = (1 << 30) | 0x20;
+    *(uint32_t *)(p + 256 + 12 - 4) = 12;
+
+    p2 = mem_allocator_malloc(a, 256);
+    printf("%p\n", p2);
+    if (p2 == NULL) {
+        exit(1);
+    }
+    mem_allocator_free(a, p2);
+
+    p2 = mem_allocator_malloc(a, 256);
+    printf("%p\n", p2);
+    if (p2 == NULL) {
+        exit(1);
+    }
+    mem_allocator_free(a, p2);
+
+    mem_allocator_free(a, p);
+}
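The sample forges a header word at `p + 256` by hand. Below is a minimal Python sketch of how that word breaks down. It assumes the layout implied by the constant `(1 << 30) | 0x20` and the comment above it: a 2-bit unit type at bit 30, with `HMU_FC` taken to be 1, and the chunk size stored in 8-byte units in the low bits (256 / 8 = 0x20). The 27-bit size width and the helper names `encode_header`/`decode_header` are assumptions for illustration. They are not the ems allocator's actual macros.

```python
# Hypothetical helpers mirroring the header word main.c writes by hand.
# Assumed layout (inferred from the sample, not from the ems headers):
#   bits 30..31 : unit type (HMU_FC assumed to be 1)
#   bits 0..26  : chunk size in 8-byte units
HMU_FC = 1      # assumed value of the "free chunk" unit type
UT_SHIFT = 30   # assumed position of the 2-bit unit-type field
SIZE_BITS = 27  # assumed width of the size field


def encode_header(unit_type, size):
    """Pack a unit type and a byte size into one 32-bit header word."""
    if size % 8:
        raise ValueError("size must be a multiple of 8")
    units = size >> 3
    if units >= (1 << SIZE_BITS):
        raise ValueError("size too large for the header")
    return ((unit_type & 0x3) << UT_SHIFT) | units


def decode_header(word):
    """Return (unit_type, size_in_bytes) for a header word."""
    unit_type = (word >> UT_SHIFT) & 0x3
    size = (word & ((1 << SIZE_BITS) - 1)) << 3
    return unit_type, size


print(hex(encode_header(HMU_FC, 256)))  # 0x40000020
```

With these assumptions, the value the sample stores decodes back to a free chunk of 256 bytes.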

+ 1 - 1
samples/wasm-c-api/CMakeLists.txt

@@ -16,7 +16,7 @@ if(NOT CMAKE_BUILD_TYPE)
   set(CMAKE_BUILD_TYPE Release)
 endif()
 
-set(CMAKE_CXX_STANDARD 14)
+set(CMAKE_CXX_STANDARD 17)
 ################  runtime settings  ################
 
 string (TOLOWER ${CMAKE_HOST_SYSTEM_NAME} WAMR_BUILD_PLATFORM)

+ 6 - 1
test-tools/wamr-ide/VSCode-Extension/.gitignore

@@ -4,4 +4,9 @@ node_modules
 .vscode-test/
 *.vsix
 package-lock.json
-src/test
+.vscode
+resource/debug/**
+!resource/debug/darwin/.placeholder
+!resource/debug/linux/.placeholder
+!resource/debug/windows/.placeholder
+resource/test/test.wasm

+ 1 - 0
test-tools/wamr-ide/VSCode-Extension/.npmrc

@@ -0,0 +1 @@
+engine-strict=true

+ 11 - 0
test-tools/wamr-ide/VSCode-Extension/.vscode/launch.json

@@ -10,6 +10,17 @@
             "args": ["--extensionDevelopmentPath=${workspaceFolder}"],
             "outFiles": ["${workspaceFolder}/out/**/*.js"],
             "preLaunchTask": "${defaultBuildTask}"
+        },
+        {
+            "name": "Launch Extension Tests",
+            "type": "extensionHost",
+            "request": "launch",
+            "runtimeExecutable": "${execPath}",
+            "args": [
+                "--extensionDevelopmentPath=${workspaceFolder}",
+                "--extensionTestsPath=${workspaceFolder}/out/test/suite/index"
+            ],
+            "outFiles": ["${workspaceFolder}/out/test/**/*.js"]
         }
     ]
 }

+ 8 - 0
test-tools/wamr-ide/VSCode-Extension/.vscodeignore

@@ -9,3 +9,11 @@ out/test/**
 **/.eslintrc.json
 **/*.map
 **/*.ts
+
+src
+
+resource/test
+resource/debug/**
+!resource/debug/darwin/.placeholder
+!resource/debug/linux/.placeholder
+!resource/debug/windows/.placeholder

+ 747 - 0
test-tools/wamr-ide/VSCode-Extension/formatters/rust.py

@@ -0,0 +1,747 @@
+'''
+Copyright (c) 2016 Vadim Chugunov
+SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+'''
+
+from __future__ import print_function, division
+import sys
+import logging
+import lldb
+import weakref
+
+if sys.version_info[0] == 2:
+    # python2-based LLDB accepts utf8-encoded ascii strings only.
+    def to_lldb_str(s): return s.encode('utf8', 'backslashreplace') if isinstance(s, unicode) else s
+    range = xrange
+else:
+    to_lldb_str = str
+
+log = logging.getLogger(__name__)
+
+module = sys.modules[__name__]
+rust_category = None
+
+
+def initialize_category(debugger, internal_dict):
+    global module, rust_category
+
+    rust_category = debugger.CreateCategory('Rust')
+    # rust_category.AddLanguage(lldb.eLanguageTypeRust)
+    rust_category.SetEnabled(True)
+
+    attach_summary_to_type(tuple_summary_provider, r'^\(.*\)$', True)
+    attach_synthetic_to_type(MsvcTupleSynthProvider, r'^tuple\$?<.+>$',
+                             True)  # *-windows-msvc uses this name since 1.47
+
+    attach_synthetic_to_type(StrSliceSynthProvider, '&str')
+    attach_synthetic_to_type(StrSliceSynthProvider, 'str*')
+    attach_synthetic_to_type(StrSliceSynthProvider, 'str')  # *-windows-msvc uses this name since 1.5?
+
+    attach_synthetic_to_type(StdStringSynthProvider, '^(collections|alloc)::string::String$', True)
+    attach_synthetic_to_type(StdVectorSynthProvider, r'^(collections|alloc)::vec::Vec<.+>$', True)
+    attach_synthetic_to_type(StdVecDequeSynthProvider,
+                             r'^(collections|alloc::collections)::vec_deque::VecDeque<.+>$', True)
+
+    attach_synthetic_to_type(MsvcEnumSynthProvider, r'^enum\$<.+>$', True)
+    attach_synthetic_to_type(MsvcEnum2SynthProvider, r'^enum2\$<.+>$', True)
+
+    attach_synthetic_to_type(SliceSynthProvider, r'^&(mut *)?\[.*\]$', True)
+    attach_synthetic_to_type(MsvcSliceSynthProvider, r'^(mut *)?slice\$?<.+>.*$', True)
+
+    attach_synthetic_to_type(StdCStringSynthProvider, '^(std|alloc)::ffi::c_str::CString$', True)
+    attach_synthetic_to_type(StdCStrSynthProvider, '^&?(std|core)::ffi::c_str::CStr$', True)
+
+    attach_synthetic_to_type(StdOsStringSynthProvider, 'std::ffi::os_str::OsString')
+    attach_synthetic_to_type(StdOsStrSynthProvider, '^&?std::ffi::os_str::OsStr', True)
+
+    attach_synthetic_to_type(StdPathBufSynthProvider, 'std::path::PathBuf')
+    attach_synthetic_to_type(StdPathSynthProvider, '^&?std::path::Path', True)
+
+    attach_synthetic_to_type(StdRcSynthProvider, r'^alloc::rc::Rc<.+>$', True)
+    attach_synthetic_to_type(StdRcSynthProvider, r'^alloc::rc::Weak<.+>$', True)
+    attach_synthetic_to_type(StdArcSynthProvider, r'^alloc::(sync|arc)::Arc<.+>$', True)
+    attach_synthetic_to_type(StdArcSynthProvider, r'^alloc::(sync|arc)::Weak<.+>$', True)
+    attach_synthetic_to_type(StdMutexSynthProvider, r'^std::sync::mutex::Mutex<.+>$', True)
+
+    attach_synthetic_to_type(StdCellSynthProvider, r'^core::cell::Cell<.+>$', True)
+    attach_synthetic_to_type(StdRefCellSynthProvider, r'^core::cell::RefCell<.+>$', True)
+    attach_synthetic_to_type(StdRefCellBorrowSynthProvider, r'^core::cell::Ref<.+>$', True)
+    attach_synthetic_to_type(StdRefCellBorrowSynthProvider, r'^core::cell::RefMut<.+>$', True)
+
+    attach_synthetic_to_type(StdHashMapSynthProvider, r'^std::collections::hash::map::HashMap<.+>$', True)
+    attach_synthetic_to_type(StdHashSetSynthProvider, r'^std::collections::hash::set::HashSet<.+>$', True)
+
+    attach_synthetic_to_type(GenericEnumSynthProvider, r'^core::option::Option<.+>$', True)
+    attach_synthetic_to_type(GenericEnumSynthProvider, r'^core::result::Result<.+>$', True)
+    attach_synthetic_to_type(GenericEnumSynthProvider, r'^alloc::borrow::Cow<.+>$', True)
+
+    if 'rust' in internal_dict.get('source_languages', []):
+        lldb.SBDebugger.SetInternalVariable('target.process.thread.step-avoid-regexp',
+                                            '^<?(std|core|alloc)::', debugger.GetInstanceName())
+
+
+def attach_synthetic_to_type(synth_class, type_name, is_regex=False):
+    global module, rust_category
+    # log.debug('attaching synthetic %s to "%s", is_regex=%s', synth_class.__name__, type_name, is_regex)
+    synth = lldb.SBTypeSynthetic.CreateWithClassName(__name__ + '.' + synth_class.__name__)
+    synth.SetOptions(lldb.eTypeOptionCascade)
+    rust_category.AddTypeSynthetic(lldb.SBTypeNameSpecifier(type_name, is_regex), synth)
+
+    def summary_fn(valobj, dict): return get_synth_summary(synth_class, valobj, dict)
+    # LLDB accesses summary fn's by name, so we need to create a unique one.
+    summary_fn.__name__ = '_get_synth_summary_' + synth_class.__name__
+    setattr(module, summary_fn.__name__, summary_fn)
+    attach_summary_to_type(summary_fn, type_name, is_regex)
+
+
+def attach_summary_to_type(summary_fn, type_name, is_regex=False):
+    global module, rust_category
+    # log.debug('attaching summary %s to "%s", is_regex=%s', summary_fn.__name__, type_name, is_regex)
+    summary = lldb.SBTypeSummary.CreateWithFunctionName(__name__ + '.' + summary_fn.__name__)
+    summary.SetOptions(lldb.eTypeOptionCascade)
+    rust_category.AddTypeSummary(lldb.SBTypeNameSpecifier(type_name, is_regex), summary)
+
+
+# 'get_summary' is annoyingly not a part of the standard LLDB synth provider API.
+# This trick allows us to share data extraction logic between synth providers and their sibling summary providers.
+def get_synth_summary(synth_class, valobj, dict):
+    try:
+        obj_id = valobj.GetIndexOfChildWithName('$$object-id$$')
+        summary = RustSynthProvider.synth_by_id[obj_id].get_summary()
+        return to_lldb_str(summary)
+    except Exception as e:
+        log.exception('%s', e)
+        raise
+
+
+# Chained GetChildMemberWithName lookups
+def gcm(valobj, *chain):
+    for name in chain:
+        valobj = valobj.GetChildMemberWithName(name)
+    return valobj
+
+
+# Get a pointer out of core::ptr::Unique<T>
+def read_unique_ptr(valobj):
+    pointer = valobj.GetChildMemberWithName('pointer')
+    if pointer.TypeIsPointerType():  # Between 1.33 and 1.63 pointer was just *const T
+        return pointer
+    return pointer.GetChildAtIndex(0)
+
+
+def string_from_ptr(pointer, length):
+    if length <= 0:
+        return u''
+    error = lldb.SBError()
+    process = pointer.GetProcess()
+    data = process.ReadMemory(pointer.GetValueAsUnsigned(), length, error)
+    if error.Success():
+        return data.decode('utf8', 'replace')
+    else:
+        log.error('ReadMemory error: %s', error.GetCString())
+
+
+def get_template_params(type_name):
+    params = []
+    level = 0
+    start = 0
+    for i, c in enumerate(type_name):
+        if c == '<':
+            level += 1
+            if level == 1:
+                start = i + 1
+        elif c == '>':
+            level -= 1
+            if level == 0:
+                params.append(type_name[start:i].strip())
+        elif c == ',' and level == 1:
+            params.append(type_name[start:i].strip())
+            start = i + 1
+    return params
+
+
+def obj_summary(valobj, unavailable='{...}'):
+    summary = valobj.GetSummary()
+    if summary is not None:
+        return summary
+    summary = valobj.GetValue()
+    if summary is not None:
+        return summary
+    return unavailable
+
+
+def sequence_summary(children, maxsize=32):
+    s = ''
+    for child in children:
+        if len(s) > 0:
+            s += ', '
+        s += obj_summary(child)
+        if len(s) > maxsize:
+            s += ', ...'
+            break
+    return s
+
+
+def tuple_summary(obj, skip_first=0):
+    fields = [obj_summary(obj.GetChildAtIndex(i)) for i in range(skip_first, obj.GetNumChildren())]
+    return '(%s)' % ', '.join(fields)
+
+
+# ----- Summaries -----
+
+def tuple_summary_provider(valobj, dict={}):
+    return tuple_summary(valobj)
+
+
+# ----- Synth providers ------
+
+
+class RustSynthProvider(object):
+    synth_by_id = weakref.WeakValueDictionary()
+    next_id = 0
+
+    def __init__(self, valobj, dict={}):
+        self.valobj = valobj
+        self.obj_id = RustSynthProvider.next_id
+        RustSynthProvider.synth_by_id[self.obj_id] = self
+        RustSynthProvider.next_id += 1
+
+    def update(self):
+        return True
+
+    def has_children(self):
+        return False
+
+    def num_children(self):
+        return 0
+
+    def get_child_at_index(self, index):
+        return None
+
+    def get_child_index(self, name):
+        if name == '$$object-id$$':
+            return self.obj_id
+
+        try:
+            return self.get_index_of_child(name)
+        except Exception as e:
+            log.exception('%s', e)
+            raise
+
+    def get_summary(self):
+        return None
+
+
+class ArrayLikeSynthProvider(RustSynthProvider):
+    '''Base class for providers that represent array-like objects'''
+
+    def update(self):
+        self.ptr, self.len = self.ptr_and_len(self.valobj)  # type: ignore
+        self.item_type = self.ptr.GetType().GetPointeeType()
+        self.item_size = self.item_type.GetByteSize()
+
+    def ptr_and_len(self, obj):
+        pass  # abstract
+
+    def num_children(self):
+        return self.len
+
+    def has_children(self):
+        return True
+
+    def get_child_at_index(self, index):
+        try:
+            if not 0 <= index < self.len:
+                return None
+            offset = index * self.item_size
+            return self.ptr.CreateChildAtOffset('[%s]' % index, offset, self.item_type)
+        except Exception as e:
+            log.exception('%s', e)
+            raise
+
+    def get_index_of_child(self, name):
+        return int(name.lstrip('[').rstrip(']'))
+
+    def get_summary(self):
+        return '(%d)' % (self.len,)
+
+
+class StdVectorSynthProvider(ArrayLikeSynthProvider):
+    def ptr_and_len(self, vec):
+        return (
+            read_unique_ptr(gcm(vec, 'buf', 'ptr')),
+            gcm(vec, 'len').GetValueAsUnsigned()
+        )
+
+    def get_summary(self):
+        return '(%d) vec![%s]' % (self.len, sequence_summary((self.get_child_at_index(i) for i in range(self.len))))
+
+
+class StdVecDequeSynthProvider(RustSynthProvider):
+    def update(self):
+        self.ptr = read_unique_ptr(gcm(self.valobj, 'buf', 'ptr'))
+        self.cap = gcm(self.valobj, 'buf', 'cap').GetValueAsUnsigned()
+
+        head = gcm(self.valobj, 'head').GetValueAsUnsigned()
+
+        # rust 1.67 changed from a head, tail implementation to a head, length impl
+        # https://github.com/rust-lang/rust/pull/102991
+        vd_len = gcm(self.valobj, 'len')
+        if vd_len.IsValid():
+            self.len = vd_len.GetValueAsUnsigned()
+            self.startptr = head
+        else:
+            tail = gcm(self.valobj, 'tail').GetValueAsUnsigned()
+            self.len = (head - tail) % self.cap if self.cap else 0  # the ring may have wrapped (head < tail)
+            self.startptr = tail
+
+        self.item_type = self.ptr.GetType().GetPointeeType()
+        self.item_size = self.item_type.GetByteSize()
+
+    def num_children(self):
+        return self.len
+
+    def has_children(self):
+        return True
+
+    def get_child_at_index(self, index):
+        try:
+            if not 0 <= index < self.num_children():
+                return None
+            offset = ((self.startptr + index) % self.cap) * self.item_size
+            return self.ptr.CreateChildAtOffset('[%s]' % index, offset, self.item_type)
+        except Exception as e:
+            log.exception('%s', e)
+            raise
+
+    def get_index_of_child(self, name):
+        return int(name.lstrip('[').rstrip(']'))
+
+    def get_summary(self):
+        return '(%d) VecDeque[%s]' % (self.num_children(), sequence_summary((self.get_child_at_index(i) for i in range(self.num_children()))))
+
+##################################################################################################################
+
+
+class SliceSynthProvider(ArrayLikeSynthProvider):
+    def ptr_and_len(self, vec):
+        return (
+            gcm(vec, 'data_ptr'),
+            gcm(vec, 'length').GetValueAsUnsigned()
+        )
+
+    def get_summary(self):
+        return '(%d) &[%s]' % (self.len, sequence_summary((self.get_child_at_index(i) for i in range(self.len))))
+
+
+class MsvcSliceSynthProvider(SliceSynthProvider):
+    def get_type_name(self):
+        tparams = get_template_params(self.valobj.GetTypeName())
+        return '&[' + tparams[0] + ']'
+
+
+# Base class for *String providers
+class StringLikeSynthProvider(ArrayLikeSynthProvider):
+    def get_child_at_index(self, index):
+        ch = ArrayLikeSynthProvider.get_child_at_index(self, index)
+        ch.SetFormat(lldb.eFormatChar)
+        return ch
+
+    def get_summary(self):
+        # Limit string length to 1000 characters to cope with uninitialized values whose
+        # length field contains garbage.
+        strval = string_from_ptr(self.ptr, min(self.len, 1000))
+        if strval is None:
+            return None
+        if self.len > 1000:
+            strval += u'...'
+        return u'"%s"' % strval
+
+
+class StrSliceSynthProvider(StringLikeSynthProvider):
+    def ptr_and_len(self, valobj):
+        return (
+            gcm(valobj, 'data_ptr'),
+            gcm(valobj, 'length').GetValueAsUnsigned()
+        )
+
+
+class StdStringSynthProvider(StringLikeSynthProvider):
+    def ptr_and_len(self, valobj):
+        vec = gcm(valobj, 'vec')
+        return (
+            read_unique_ptr(gcm(vec, 'buf', 'ptr')),
+            gcm(vec, 'len').GetValueAsUnsigned()
+        )
+
+
+class StdCStringSynthProvider(StringLikeSynthProvider):
+    def ptr_and_len(self, valobj):
+        vec = gcm(valobj, 'inner')
+        return (
+            gcm(vec, 'data_ptr'),
+            gcm(vec, 'length').GetValueAsUnsigned() - 1
+        )
+
+
+class StdOsStringSynthProvider(StringLikeSynthProvider):
+    def ptr_and_len(self, valobj):
+        vec = gcm(valobj, 'inner', 'inner')
+        tmp = gcm(vec, 'bytes')  # Windows OSString has an extra layer
+        if tmp.IsValid():
+            vec = tmp
+        return (
+            read_unique_ptr(gcm(vec, 'buf', 'ptr')),
+            gcm(vec, 'len').GetValueAsUnsigned()
+        )
+
+
+class FFISliceSynthProvider(StringLikeSynthProvider):
+    def ptr_and_len(self, valobj):
+        process = valobj.GetProcess()
+        slice_ptr = valobj.GetLoadAddress()
+        data_ptr_type = valobj.GetTarget().GetBasicType(lldb.eBasicTypeChar).GetPointerType()
+        # Unsized slice objects have incomplete debug info, so here we just assume standard slice
+        # reference layout: [<pointer to data>, <data size>]
+        error = lldb.SBError()
+        pointer = valobj.CreateValueFromAddress('data', slice_ptr, data_ptr_type)
+        length = process.ReadPointerFromMemory(slice_ptr + process.GetAddressByteSize(), error)
+        return pointer, length
+
+
+class StdCStrSynthProvider(FFISliceSynthProvider):
+    def ptr_and_len(self, valobj):
+        ptr, len = FFISliceSynthProvider.ptr_and_len(self, valobj)
+        return (ptr, len - 1)  # drop the terminating '\0'
+
+
+class StdOsStrSynthProvider(FFISliceSynthProvider):
+    pass
+
+
+class StdPathBufSynthProvider(StdOsStringSynthProvider):
+    def ptr_and_len(self, valobj):
+        return StdOsStringSynthProvider.ptr_and_len(self, gcm(valobj, 'inner'))
+
+
+class StdPathSynthProvider(FFISliceSynthProvider):
+    pass
+
+##################################################################################################################
+
+
+class DerefSynthProvider(RustSynthProvider):
+    deref = lldb.SBValue()
+
+    def has_children(self):
+        return self.deref.MightHaveChildren()
+
+    def num_children(self):
+        return self.deref.GetNumChildren()
+
+    def get_child_at_index(self, index):
+        return self.deref.GetChildAtIndex(index)
+
+    def get_index_of_child(self, name):
+        return self.deref.GetIndexOfChildWithName(name)
+
+    def get_summary(self):
+        return obj_summary(self.deref)
+
+# Base for Rc and Arc
+
+
+class StdRefCountedSynthProvider(DerefSynthProvider):
+    weak = 0
+    strong = 0
+
+    def get_summary(self):
+        if self.weak != 0:
+            s = '(refs:%d,weak:%d) ' % (self.strong, self.weak)
+        else:
+            s = '(refs:%d) ' % self.strong
+        if self.strong > 0:
+            s += obj_summary(self.deref)
+        else:
+            s += '<disposed>'
+        return s
+
+
+class StdRcSynthProvider(StdRefCountedSynthProvider):
+    def update(self):
+        inner = read_unique_ptr(gcm(self.valobj, 'ptr'))
+        self.strong = gcm(inner, 'strong', 'value', 'value').GetValueAsUnsigned()
+        self.weak = gcm(inner, 'weak', 'value', 'value').GetValueAsUnsigned()
+        if self.strong > 0:
+            self.deref = gcm(inner, 'value')
+            self.weak -= 1  # There's an implicit weak reference communally owned by all the strong pointers
+        else:
+            self.deref = lldb.SBValue()
+        self.deref.SetPreferSyntheticValue(True)
+
+
+class StdArcSynthProvider(StdRefCountedSynthProvider):
+    def update(self):
+        inner = read_unique_ptr(gcm(self.valobj, 'ptr'))
+        self.strong = gcm(inner, 'strong', 'v', 'value').GetValueAsUnsigned()
+        self.weak = gcm(inner, 'weak', 'v', 'value').GetValueAsUnsigned()
+        if self.strong > 0:
+            self.deref = gcm(inner, 'data')
+            self.weak -= 1  # There's an implicit weak reference communally owned by all the strong pointers
+        else:
+            self.deref = lldb.SBValue()
+        self.deref.SetPreferSyntheticValue(True)
+
+
+class StdMutexSynthProvider(DerefSynthProvider):
+    def update(self):
+        self.deref = gcm(self.valobj, 'data', 'value')
+        self.deref.SetPreferSyntheticValue(True)
+
+
+class StdCellSynthProvider(DerefSynthProvider):
+    def update(self):
+        self.deref = gcm(self.valobj, 'value', 'value')
+        self.deref.SetPreferSyntheticValue(True)
+
+
+class StdRefCellSynthProvider(DerefSynthProvider):
+    def update(self):
+        self.deref = gcm(self.valobj, 'value', 'value')
+        self.deref.SetPreferSyntheticValue(True)
+
+    def get_summary(self):
+        borrow = gcm(self.valobj, 'borrow', 'value', 'value').GetValueAsSigned()
+        s = ''
+        if borrow < 0:
+            s = '(borrowed:mut) '
+        elif borrow > 0:
+            s = '(borrowed:%d) ' % borrow
+        return s + obj_summary(self.deref)
+
+
+class StdRefCellBorrowSynthProvider(DerefSynthProvider):
+    def update(self):
+        self.deref = gcm(self.valobj, 'value', 'pointer').Dereference()
+        self.deref.SetPreferSyntheticValue(True)
+
+##################################################################################################################
+
+
+class EnumSynthProvider(RustSynthProvider):
+    variant = lldb.SBValue()
+    summary = ''
+    skip_first = 0
+
+    def has_children(self):
+        return self.variant.MightHaveChildren()
+
+    def num_children(self):
+        return self.variant.GetNumChildren() - self.skip_first
+
+    def get_child_at_index(self, index):
+        return self.variant.GetChildAtIndex(index + self.skip_first)
+
+    def get_index_of_child(self, name):
+        return self.variant.GetIndexOfChildWithName(name) - self.skip_first
+
+    def get_summary(self):
+        return self.summary
+
+
+class GenericEnumSynthProvider(EnumSynthProvider):
+    def update(self):
+        dyn_type_name = self.valobj.GetTypeName()
+        variant_name = dyn_type_name[dyn_type_name.rfind(':')+1:]
+        self.variant = self.valobj
+
+        if self.variant.IsValid() and self.variant.GetNumChildren() > self.skip_first:
+            if self.variant.GetChildAtIndex(self.skip_first).GetName() in ['0', '__0']:
+                self.summary = variant_name + tuple_summary(self.variant)
+            else:
+                self.summary = variant_name + '{...}'
+        else:
+            self.summary = variant_name
+
+
+class MsvcTupleSynthProvider(RustSynthProvider):
+    def update(self):
+        tparams = get_template_params(self.valobj.GetTypeName())
+        self.type_name = '(' + ', '.join(tparams) + ')'
+
+    def has_children(self):
+        return self.valobj.MightHaveChildren()
+
+    def num_children(self):
+        return self.valobj.GetNumChildren()
+
+    def get_child_at_index(self, index):
+        child = self.valobj.GetChildAtIndex(index)
+        return child.CreateChildAtOffset(str(index), 0, child.GetType())
+
+    def get_index_of_child(self, name):
+        return int(name)
+
+    def get_summary(self):
+        return tuple_summary(self.valobj)
+
+    def get_type_name(self):
+        return self.type_name
+
+
+class MsvcEnumSynthProvider(EnumSynthProvider):
+    is_tuple_variant = False
+
+    def update(self):
+        tparams = get_template_params(self.valobj.GetTypeName())
+        if len(tparams) == 1:  # Regular enum
+            discr = gcm(self.valobj, 'discriminant')
+            self.variant = gcm(self.valobj, 'variant' + str(discr.GetValueAsUnsigned()))
+            variant_name = discr.GetValue()
+        else:  # Niche enum
+            dataful_min = int(tparams[1])
+            dataful_max = int(tparams[2])
+            dataful_var = tparams[3]
+            discr = gcm(self.valobj, 'discriminant')
+            if dataful_min <= discr.GetValueAsUnsigned() <= dataful_max:
+                self.variant = gcm(self.valobj, 'dataful_variant')
+                variant_name = dataful_var
+            else:
+                variant_name = discr.GetValue()
+
+        self.type_name = tparams[0]
+
+        if self.variant.IsValid() and self.variant.GetNumChildren() > self.skip_first:
+            if self.variant.GetChildAtIndex(self.skip_first).GetName() == '__0':
+                self.is_tuple_variant = True
+                self.summary = variant_name + tuple_summary(self.variant, skip_first=self.skip_first)
+            else:
+                self.summary = variant_name + '{...}'
+        else:
+            self.summary = variant_name
+
+    def get_child_at_index(self, index):
+        child = self.variant.GetChildAtIndex(index + self.skip_first)
+        if self.is_tuple_variant:
+            return child.CreateChildAtOffset(str(index), 0, child.GetType())
+        else:
+            return child
+
+    def get_index_of_child(self, name):
+        if self.is_tuple_variant:
+            return int(name)
+        else:
+            return self.variant.GetIndexOfChildWithName(name) - self.skip_first
+
+    def get_type_name(self):
+        return self.type_name
+
+
+class MsvcEnum2SynthProvider(EnumSynthProvider):
+    is_tuple_variant = False
+
+    def update(self):
+        tparams = get_template_params(self.valobj.GetTypeName())
+        self.type_name = tparams[0]
+
+    def get_child_at_index(self, index):
+        return self.valobj.GetChildAtIndex(index)
+
+    def get_index_of_child(self, name):
+        return self.valobj.GetIndexOfChildWithName(name)
+
+    def get_type_name(self):
+        return self.type_name
+
+
+##################################################################################################################
+
+
+class StdHashMapSynthProvider(RustSynthProvider):
+    def update(self):
+        self.initialize_table(gcm(self.valobj, 'base', 'table'))
+
+    def initialize_table(self, table):
+        assert table.IsValid()
+
+        if table.type.GetNumberOfTemplateArguments() > 0:
+            item_ty = table.type.GetTemplateArgumentType(0)
+        else:  # we must be on windows-msvc - try to look up item type by name
+            table_ty_name = table.GetType().GetName()  # "hashbrown::raw::RawTable<ITEM_TY>"
+            item_ty_name = get_template_params(table_ty_name)[0]
+            item_ty = table.GetTarget().FindTypes(item_ty_name).GetTypeAtIndex(0)
+
+        if item_ty.IsTypedefType():
+            item_ty = item_ty.GetTypedefedType()
+
+        inner_table = table.GetChildMemberWithName('table')
+        if inner_table.IsValid():
+            self.initialize_hashbrown_v2(inner_table, item_ty)  # 1.52 <= std_version
+        else:
+            if not table.GetChildMemberWithName('data'):
+                self.initialize_hashbrown_v2(table, item_ty)  # ? <= std_version < 1.52
+            else:
+                self.initialize_hashbrown_v1(table, item_ty)  # 1.36 <= std_version < ?
+
+    def initialize_hashbrown_v2(self, table, item_ty):
+        self.num_buckets = gcm(table, 'bucket_mask').GetValueAsUnsigned() + 1
+        ctrl_ptr = gcm(table, 'ctrl', 'pointer')
+        ctrl = ctrl_ptr.GetPointeeData(0, self.num_buckets)
+        # Buckets are stored immediately below `ctrl` (at lower addresses), in reverse order.
+        start_addr = ctrl_ptr.GetValueAsUnsigned() - item_ty.GetByteSize() * self.num_buckets
+        buckets_ty = item_ty.GetArrayType(self.num_buckets)
+        self.buckets = self.valobj.CreateValueFromAddress('data', start_addr, buckets_ty)
+        error = lldb.SBError()
+        self.valid_indices = []
+        for i in range(self.num_buckets):
+            if ctrl.GetUnsignedInt8(error, i) & 0x80 == 0:
+                self.valid_indices.append(self.num_buckets - 1 - i)
+
+    def initialize_hashbrown_v1(self, table, item_ty):
+        self.num_buckets = gcm(table, 'bucket_mask').GetValueAsUnsigned() + 1
+        ctrl_ptr = gcm(table, 'ctrl', 'pointer')
+        ctrl = ctrl_ptr.GetPointeeData(0, self.num_buckets)
+        buckets_ty = item_ty.GetArrayType(self.num_buckets)
+        self.buckets = gcm(table, 'data', 'pointer').Dereference().Cast(buckets_ty)
+        error = lldb.SBError()
+        self.valid_indices = []
+        for i in range(self.num_buckets):
+            if ctrl.GetUnsignedInt8(error, i) & 0x80 == 0:
+                self.valid_indices.append(i)
+
+    def has_children(self):
+        return True
+
+    def num_children(self):
+        return len(self.valid_indices)
+
+    def get_child_at_index(self, index):
+        bucket_idx = self.valid_indices[index]
+        item = self.buckets.GetChildAtIndex(bucket_idx)
+        return item.CreateChildAtOffset('[%d]' % index, 0, item.GetType())
+
+    def get_index_of_child(self, name):
+        return int(name.lstrip('[').rstrip(']'))
+
+    def get_summary(self):
+        return 'size=%d, capacity=%d' % (self.num_children(), self.num_buckets)
+
+
+class StdHashSetSynthProvider(StdHashMapSynthProvider):
+    def update(self):
+        table = gcm(self.valobj, 'base', 'map', 'table')  # std_version >= 1.48
+        if not table.IsValid():
+            table = gcm(self.valobj, 'map', 'base', 'table')  # std_version < 1.48
+        self.initialize_table(table)
+
+    def get_child_at_index(self, index):
+        bucket_idx = self.valid_indices[index]
+        item = self.buckets.GetChildAtIndex(bucket_idx).GetChildAtIndex(0)
+        return item.CreateChildAtOffset('[%d]' % index, 0, item.GetType())
+
+##################################################################################################################
+
+
+def __lldb_init_module(debugger_obj, internal_dict):
+    log.info('Initializing')
+    initialize_category(debugger_obj, internal_dict)
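Both `initialize_hashbrown_*` variants above decide which buckets hold live entries by scanning hashbrown's control bytes. A byte with the high bit set marks an EMPTY (`0xFF`) or DELETED (`0x80`) slot, and a byte with the high bit clear marks a FULL slot. The variant that ends just before `initialize_hashbrown_v1` (its `def` line is outside this excerpt) maps control byte `i` to bucket `num_buckets - 1 - i`, while `initialize_hashbrown_v1` maps it to bucket `i`. A minimal TypeScript sketch of the same scan, assuming the control bytes are already available as a `Uint8Array`; the function names are hypothetical and not part of the formatter:

```typescript
// Hypothetical helper mirroring the control-byte scan in rust.py.
// A control byte with the high bit (0x80) clear marks an occupied (FULL) bucket.
function occupiedBuckets(ctrl: Uint8Array, reversed: boolean): number[] {
    const numBuckets = ctrl.length;
    const indices: number[] = [];
    for (let i = 0; i < numBuckets; i++) {
        if ((ctrl[i] & 0x80) === 0) {
            // reversed layout: control byte i describes bucket numBuckets - 1 - i
            indices.push(reversed ? numBuckets - 1 - i : i);
        }
    }
    return indices;
}

// Mirrors get_summary(): number of live entries vs. total bucket count (bucket_mask + 1).
function summary(ctrl: Uint8Array, reversed: boolean): string {
    return `size=${occupiedBuckets(ctrl, reversed).length}, capacity=${ctrl.length}`;
}
```

With eight control bytes of which five are FULL, `summary` yields `size=5, capacity=8`, the same shape the integration test expects for `map`. Note that "capacity" here is the bucket count, as in the Python `self.num_buckets`.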

+ 10 - 5
test-tools/wamr-ide/VSCode-Extension/package.json

@@ -6,10 +6,12 @@
     },
     "displayName": "WAMR-IDE",
     "description": "An Integrated Development Environment for WASM",
-    "version": "1.2.1",
+    "version": "1.2.2",
     "engines": {
-        "vscode": "^1.59.0"
+        "vscode": "^1.59.0",
+        "node": ">=16.0.0"
     },
+    "engineStrict": true,
     "categories": [
         "Other"
     ],
@@ -235,6 +237,7 @@
         "prettier-format-apply": "prettier --config .prettierrc.json 'src/**/*.ts' --write"
     },
     "devDependencies": {
+        "@types/chai": "^4.3.5",
         "@types/glob": "^7.1.3",
         "@types/mocha": "^8.2.2",
         "@types/node": "14.x",
@@ -243,12 +246,14 @@
         "@types/yauzl": "^2.10.0",
         "@typescript-eslint/eslint-plugin": "^4.26.0",
         "@typescript-eslint/parser": "^4.26.0",
+        "@vscode/debugprotocol": "^1.61.0",
+        "@vscode/test-electron": "^2.3.3",
+        "chai": "^4.3.7",
         "eslint": "^7.32.0",
         "glob": "^7.1.7",
-        "mocha": "^8.4.0",
+        "mocha": "^10.2.0",
         "prettier": "2.5.1",
-        "typescript": "^4.3.2",
-        "vscode-test": "^1.5.2"
+        "typescript": "^4.3.2"
     },
     "dependencies": {
         "@vscode/webview-ui-toolkit": "^0.8.4",

+ 2 - 0
test-tools/wamr-ide/VSCode-Extension/resource/test/build.sh

@@ -0,0 +1,2 @@
+# compile with debug symbols and no optimization
+rustc --target wasm32-wasi ./test.rs -g -C opt-level=0

+ 35 - 0
test-tools/wamr-ide/VSCode-Extension/resource/test/test.rs

@@ -0,0 +1,35 @@
+/*
+ * Copyright (C) 2019 Intel Corporation. All rights reserved.
+ * SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+ */
+
+use std::collections::HashMap;
+use std::collections::VecDeque;
+use std::cell::RefCell;
+
+fn main() {
+    let mut vector = Vec::from([1, 2, 3, 4]);
+    vector.push(12);
+
+    let mut map: HashMap<&str, f64> = HashMap::from([
+        ("Mercury", 0.4),
+        ("Venus", 0.7),
+        ("Earth", 1.0),
+        ("Mars", 1.5),
+    ]);
+    map.insert("Venus", 2.5);
+    map.insert("Sun", 312.2);
+
+    let string = "this is a string";
+
+    let tmp = String::from("hello world");
+    let slice = &tmp[1..5];
+
+    let mut deque = VecDeque::from([1, 2, 3]);
+    deque.push_back(4);
+    deque.push_back(5);
+
+    let ref_cell = RefCell::new(5);
+
+    println!("Hello, world!"); // BP_MARKER_1
+}

+ 29 - 5
test-tools/wamr-ide/VSCode-Extension/src/debugConfigurationProvider.ts

@@ -6,23 +6,47 @@
 import * as vscode from 'vscode';
 import * as os from 'os';
 
+/* see https://github.com/llvm/llvm-project/tree/main/lldb/tools/lldb-vscode#attaching-settings */
+export interface WasmDebugConfig {
+    type: string,
+    name: string,
+    request: string,
+    program?: string,
+    pid?: string,
+    stopOnEntry?: boolean,
+    waitFor?: boolean,
+    initCommands?: string[],
+    preRunCommands?: string[],
+    stopCommands?: string[],
+    exitCommands?: string[],
+    terminateCommands?: string[],
+    attachCommands?: string[]
+}
+
 export class WasmDebugConfigurationProvider
     implements vscode.DebugConfigurationProvider {
-    private wasmDebugConfig = {
+    private wasmDebugConfig: WasmDebugConfig = {
         type: 'wamr-debug',
         name: 'Attach',
         request: 'attach',
         stopOnEntry: true,
-        initCommands: os.platform() === 'win32' || os.platform() === 'darwin' ?
-            /* linux and windows has different debug configuration */
-            ['platform select remote-linux'] :
-            undefined,
         attachCommands: [
             /* default port 1234 */
             'process connect -p wasm connect://127.0.0.1:1234',
         ]
     };
 
+    constructor(extensionPath: string) {
+        this.wasmDebugConfig.initCommands = [
+            /* Add rust formatters -> https://lldb.llvm.org/use/variable.html */
+            `command script import ${extensionPath}/formatters/rust.py`
+        ];
+
+        if (os.platform() === 'win32' || os.platform() === 'darwin') {
+            this.wasmDebugConfig.initCommands.push('platform select remote-linux');
+        }
+    }
+
     public resolveDebugConfiguration(
         _: vscode.WorkspaceFolder | undefined,
         debugConfiguration: vscode.DebugConfiguration,
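After this change, the constructor and the integration test's `config` assemble the same `initCommands` list: import the Rust formatters first, then add `platform select remote-linux` on Windows and macOS. A hypothetical pure helper (not part of the extension) that both could share, sketched under the assumption that the platform string comes from `os.platform()`:

```typescript
// Hypothetical helper: builds the LLDB init commands used by the WAMR debug configuration.
// `platform` is expected to be a value as returned by os.platform() ("linux", "darwin", "win32", ...).
function buildInitCommands(extensionPath: string, platform: string): string[] {
    const commands = [
        // load the Rust data formatters (see https://lldb.llvm.org/use/variable.html)
        `command script import ${extensionPath}/formatters/rust.py`,
    ];
    // as in the constructor above, only Windows and macOS hosts select the remote-linux platform
    if (platform === 'win32' || platform === 'darwin') {
        commands.push('platform select remote-linux');
    }
    return commands;
}
```

The constructor could then assign `buildInitCommands(extensionPath, os.platform())`, and the test could reuse it instead of duplicating the platform check.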

+ 6 - 6
test-tools/wamr-ide/VSCode-Extension/src/extension.ts

@@ -40,7 +40,7 @@ let isWasmProject = false;
 export async function activate(context: vscode.ExtensionContext) {
     const extensionPath = context.extensionPath;
     const osPlatform = os.platform();
-    const wamrVersion = getWAMRExtensionVersion(context);
+    const wamrVersion = getWAMRExtensionVersion(context.extensionPath);
     const typeMap = new Map<string, string>();
     const scriptMap = new Map<string, string>();
     /* set relative path of build.bat|sh script */
@@ -170,7 +170,7 @@ export async function activate(context: vscode.ExtensionContext) {
     }
 
     /* register debug configuration */
-    wasmDebugConfigProvider = new WasmDebugConfigurationProvider();
+    wasmDebugConfigProvider = new WasmDebugConfigurationProvider(context.extensionPath);
 
     vscode.debug.registerDebugConfigurationProvider(
         'wamr-debug',
@@ -409,13 +409,13 @@ export async function activate(context: vscode.ExtensionContext) {
 
             /* we should check again whether the user installed lldb, as this can be skipped during activation */
             try {
-                if (!isLLDBInstalled(context)) {
+                if (!isLLDBInstalled(context.extensionPath)) {
                     /**NOTE - if users select to skip install,
                      *        we should return rather than continue
                      *        the execution
                      */
                     if (
-                        (await promptInstallLLDB(context)) ===
+                        (await promptInstallLLDB(context.extensionPath)) ===
                         SelectionOfPrompt.skip
                     ) {
                         return;
@@ -772,8 +772,8 @@ export async function activate(context: vscode.ExtensionContext) {
     );
 
     try {
-        if (!isLLDBInstalled(context)) {
-            await promptInstallLLDB(context);
+        if (!isLLDBInstalled(context.extensionPath)) {
+            await promptInstallLLDB(context.extensionPath);
         }
 
         if (

+ 33 - 0
test-tools/wamr-ide/VSCode-Extension/src/test/runTest.ts

@@ -0,0 +1,33 @@
+/*
+ * Copyright (C) 2019 Intel Corporation. All rights reserved.
+ * SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+ */
+
+import * as path from 'path';
+import * as os from 'os';
+
+import { runTests } from '@vscode/test-electron';
+
+async function main() {
+	try {
+		// The folder containing the Extension Manifest package.json
+		// Passed to `--extensionDevelopmentPath`
+		const extensionDevelopmentPath = path.resolve(__dirname, '../../');
+
+		// The path to the extension test script
+		// Passed to --extensionTestsPath
+		const extensionTestsPath = path.resolve(__dirname, './suite/index');
+
+		// Download VS Code, unzip it and run the integration test
+		await runTests({
+			extensionDevelopmentPath,
+			extensionTestsPath,
+			launchArgs: ['--user-data-dir', `${os.tmpdir()}`]
+		});
+	} catch (err) {
+		console.error('Failed to run tests', err);
+		process.exit(1);
+	}
+}
+
+main();

+ 183 - 0
test-tools/wamr-ide/VSCode-Extension/src/test/suite/extension.test.ts

@@ -0,0 +1,183 @@
+/*
+ * Copyright (C) 2019 Intel Corporation. All rights reserved.
+ * SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+ */
+
+import {DebugProtocol} from '@vscode/debugprotocol';
+import {after, before, test, suite} from 'mocha';
+import {assert} from 'chai';
+import * as vscode from 'vscode';
+import * as cp from 'child_process';
+import * as path from "path";
+import * as os from 'os';
+import {WasmDebugConfig, WasmDebugConfigurationProvider} from "../../debugConfigurationProvider";
+import {EXTENSION_PATH, clearAllBp, setBpAtMarker, compileRustToWasm} from "./utils";
+import {downloadLldb, isLLDBInstalled} from '../../utilities/lldbUtilities';
+
+suite('Unit Tests', function () {
+    test('DebugConfigurationProvider init commands', function () {
+        const testExtensionPath = "/test/path/";
+        const provider = new WasmDebugConfigurationProvider(testExtensionPath);
+
+        assert.includeMembers(
+            // eslint-disable-next-line @typescript-eslint/no-non-null-assertion
+            provider.getDebugConfig().initCommands!,
+            [`command script import ${testExtensionPath}/formatters/rust.py`],
+            "Debugger init commands did not contain the command that imports the Rust formatters"
+        );
+    });
+
+    test('DebugConfigurationProvider resolve configuration', function () {
+        const testExtensionPath = "/test/path/";
+        const provider = new WasmDebugConfigurationProvider(testExtensionPath);
+
+        const actual = provider.resolveDebugConfiguration(undefined, {
+            type: "wamr-debug",
+            name: "Attach",
+            request: "attach",
+            initCommands: [],
+            attachCommands: [
+                'process connect -p wasm connect://123.456.789.1:1237',
+            ]
+        });
+
+        assert.deepEqual(
+            actual,
+            {
+                type: "wamr-debug",
+                name: "Attach",
+                request: "attach",
+                stopOnEntry: true,
+                initCommands: [],
+                attachCommands: [
+                    'process connect -p wasm connect://123.456.789.1:1237',
+                ]
+            },
+            "Configuration did not match the expected configuration after calling resolveDebugConfiguration()"
+        );
+    });
+});
+
+suite('Integration Tests', function () {
+    let debuggerProcess: cp.ChildProcessWithoutNullStreams;
+    const port = 1239;
+    const downloadTimeout = 60 * 1000;
+
+    before(async function () {
+        // timeout of 20 seconds
+        this.timeout(20 * 1000);
+        // Download LLDB if necessary. It should already be available in CI, so this is only needed for local runs.
+        if (!isLLDBInstalled(EXTENSION_PATH)) {
+            this.timeout(downloadTimeout);
+            console.log("Downloading LLDB. This might take a moment...");
+            await downloadLldb(EXTENSION_PATH);
+            assert.isTrue(isLLDBInstalled(EXTENSION_PATH), "LLDB was not installed correctly");
+        }
+
+        compileRustToWasm();
+
+        const platform = os.platform();
+        assert.isTrue(platform === "darwin" || platform === "linux", `Tests do not support your platform: ${platform}`);
+        const iWasmPath = path.resolve(`${EXTENSION_PATH}/../../../product-mini/platforms/${platform}/build/iwasm`);
+        const testWasmFilePath = `${EXTENSION_PATH}/resource/test/test.wasm`;
+
+        debuggerProcess = cp.spawn(
+            iWasmPath,
+            [`-g=127.0.0.1:${port}`, testWasmFilePath],
+            {}
+        );
+
+        debuggerProcess.stderr.on('data', (data) => {
+            console.log(`Error from debugger process: ${data}`);
+        });
+    });
+
+    after(async function () {
+        await vscode.debug.stopDebugging();
+        debuggerProcess.kill();
+    });
+
+    test('Rust formatters', async function () {
+        // timeout of 1 minute
+        this.timeout(60 * 1000);
+        clearAllBp();
+        setBpAtMarker(`${EXTENSION_PATH}/resource/test/test.rs`, "BP_MARKER_1");
+
+        const getVariables = new Promise<DebugProtocol.Variable[]>((resolve, reject) => {
+            vscode.debug.registerDebugAdapterTrackerFactory("wamr-debug", {
+                createDebugAdapterTracker: function () {
+                    return {
+                        // The debug adapter has sent a Debug Adapter Protocol message to the editor.
+                        onDidSendMessage: (message: DebugProtocol.ProtocolMessage) => {
+                            if (message.type === "response") {
+                                const m = message as DebugProtocol.Response;
+                                if (m.command === "variables") {
+                                    const res = m as DebugProtocol.VariablesResponse;
+                                    resolve(res.body.variables);
+                                }
+                            }
+                        },
+                        onError: (error: Error) => {
+                            reject("An error occurred before vscode reached the breakpoint: " + error);
+                        },
+                        onExit: (code: number | undefined) => {
+                            reject(`Debugger exited before vscode reached the breakpoint with code: ${code}`);
+                        },
+                    };
+                }
+            });
+        });
+
+        const config: WasmDebugConfig = {
+            type: "wamr-debug",
+            request: "attach",
+            name: "Attach Debugger",
+            stopOnEntry: false,
+            initCommands: [
+                `command script import ${EXTENSION_PATH}/formatters/rust.py`
+            ],
+            attachCommands: [
+                `process connect -p wasm connect://127.0.0.1:${port}`
+            ]
+        };
+
+        if (os.platform() === 'win32' || os.platform() === 'darwin') {
+            config.initCommands?.push('platform select remote-linux');
+        }
+
+        try {
+            await vscode.debug.startDebugging(undefined, config);
+        } catch (e) {
+            assert.fail("Could not connect to debug adapter");
+        }
+
+        // wait until VS Code has reached the breakpoint and has requested the variables.
+        const variables = await getVariables;
+        const namesToVariables = variables.reduce((acc: { [name: string]: DebugProtocol.Variable }, c) => {
+            if (c.evaluateName) {
+                acc[c.evaluateName] = c;
+            }
+            return acc;
+        }, {});
+
+        assert.includeMembers(Object.keys(namesToVariables), ["vector", "map", "string", "slice", "deque", "ref_cell"], "The Debugger did not return all expected debugger variables.");
+
+        // Vector
+        assert.equal(namesToVariables["vector"].value, " (5) vec![1, 2, 3, 4, 12]", "The Vector summary string looks different than expected");
+
+        // Map
+        assert.equal(namesToVariables["map"].value, " size=5, capacity=8", "The Map summary string looks different than expected");
+
+        // String
+        assert.equal(namesToVariables["string"].value, " \"this is a string\"", "The String summary string looks different than expected");
+
+        // Slice
+        assert.equal(namesToVariables["slice"].value, " \"ello\"", "The Slice summary string looks different than expected");
+
+        // Deque
+        assert.equal(namesToVariables["deque"].value, " (5) VecDeque[1, 2, 3, 4, 5]", "The Deque summary string looks different than expected");
+
+        // RefCell
+        assert.equal(namesToVariables["ref_cell"].value, " 5", "The RefCell summary string looks different than expected");
+    });
+});
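The test indexes the DAP `variables` response by `evaluateName` before asserting on each summary string. A standalone sketch of that step, using a hypothetical minimal `Variable` interface that covers only the fields the test reads (the real type is `DebugProtocol.Variable` from `@vscode/debugprotocol`) and a `Map` instead of an object literal:

```typescript
// Minimal subset of DebugProtocol.Variable used by the test (hypothetical, for illustration).
interface Variable {
    name: string;
    value: string;
    evaluateName?: string;
}

// Index variables by evaluateName, skipping entries the adapter left without one.
function indexByEvaluateName(variables: Variable[]): Map<string, Variable> {
    const byName = new Map<string, Variable>();
    for (const v of variables) {
        if (v.evaluateName) {
            byName.set(v.evaluateName, v);
        }
    }
    return byName;
}

// Return the expected names that are missing, so a failing assertion can list them.
function missingNames(byName: Map<string, Variable>, expected: string[]): string[] {
    return expected.filter(name => !byName.has(name));
}
```

Reporting the missing names, rather than only asserting inclusion, would make a failure in the formatter setup easier to diagnose.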

+ 42 - 0
test-tools/wamr-ide/VSCode-Extension/src/test/suite/index.ts

@@ -0,0 +1,42 @@
+/*
+ * Copyright (C) 2019 Intel Corporation. All rights reserved.
+ * SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+ */
+
+import * as path from 'path';
+import * as Mocha from 'mocha';
+import * as glob from 'glob';
+
+export function run(): Promise<void> {
+	// Create the mocha test
+	const mocha = new Mocha({
+		ui: 'tdd'
+	});
+
+	const testsRoot = path.resolve(__dirname, '..');
+
+	return new Promise((c, e) => {
+		glob('**/**.test.js', { cwd: testsRoot }, (err, files) => {
+			if (err) {
+				return e(err);
+			}
+
+			// Add files to the test suite
+			files.forEach(f => mocha.addFile(path.resolve(testsRoot, f)));
+
+			try {
+				// Run the mocha test
+				mocha.run(failures => {
+					if (failures > 0) {
+						e(new Error(`${failures} tests failed.`));
+					} else {
+						c();
+					}
+				});
+			} catch (err) {
+				console.error(err);
+				e(err);
+			}
+		});
+	});
+}

+ 43 - 0
test-tools/wamr-ide/VSCode-Extension/src/test/suite/utils.ts

@@ -0,0 +1,43 @@
+/*
+ * Copyright (C) 2019 Intel Corporation. All rights reserved.
+ * SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+ */
+
+import {assert} from 'chai';
+import * as vscode from 'vscode';
+import {Range, SourceBreakpoint} from "vscode";
+import * as fs from "fs";
+import * as path from 'path';
+import * as cp from 'child_process';
+
+export const EXTENSION_PATH = path.resolve(`${__dirname}/../../..`);
+
+// clears all set breakpoints
+export function clearAllBp(): void {
+    vscode.debug.removeBreakpoints(vscode.debug.breakpoints);
+}
+
+// Inserts a breakpoint in a file at the first occurrence of bpMarker
+export function setBpAtMarker(file: string, bpMarker: string): void {
+    const uri = vscode.Uri.file(file);
+    const data = fs.readFileSync(uri.fsPath, "utf8");
+    const line = data.split("\n").findIndex(l => l.includes(bpMarker));
+    assert.notStrictEqual(line, -1, "Could not find breakpoint marker in source file");
+    const position = new vscode.Position(line, 0);
+    const bp = new SourceBreakpoint(new vscode.Location(uri, new Range(position, position)), true);
+    vscode.debug.addBreakpoints([bp]);
+}
+
+// compiles resource/test/test.rs to resource/test/test.wasm
+export function compileRustToWasm(): void {
+    const testResourceFolder = `${EXTENSION_PATH}/resource/test`;
+    // compile with debug symbols and no optimization
+    const cmd = `rustc --target wasm32-wasi ${testResourceFolder}/test.rs -g -C opt-level=0 -o ${testResourceFolder}/test.wasm`;
+
+    try {
+        cp.execSync(cmd, {stdio: [null, null, process.stderr]});
+    } catch (e) {
+        assert.fail(`Compilation of example rust file failed with error: ${e}`);
+    }
+    assert.isTrue(fs.existsSync(`${testResourceFolder}/test.wasm`), "Could not find the WASM file to run the debugger on.");
+}
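`compileRustToWasm` builds a single shell command string, so a checkout path containing spaces would be split by the shell. One alternative, sketched here as an assumption rather than the project's approach, is to build the `rustc` arguments as an array and pass them to Node's `child_process.execFileSync`, which does not spawn a shell by default. The helper name is hypothetical; the flags are the ones from the original command (recent Rust toolchains name this target `wasm32-wasip1`).

```typescript
// Hypothetical helper: rustc arguments equivalent to the command in compileRustToWasm(),
// returned as an argument vector so that paths with spaces need no shell quoting.
function rustcArgs(testResourceFolder: string): string[] {
    return [
        '--target', 'wasm32-wasi',                // WASI target used by the original command
        `${testResourceFolder}/test.rs`,          // input source
        '-g',                                     // emit debug symbols
        '-C', 'opt-level=0',                      // disable optimization
        '-o', `${testResourceFolder}/test.wasm`,  // output module
    ];
}
```

It could then be invoked as `cp.execFileSync('rustc', rustcArgs(testResourceFolder), {stdio: [null, null, process.stderr]})`, keeping the existing `try`/`assert.fail` handling.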

+ 2 - 2
test-tools/wamr-ide/VSCode-Extension/src/utilities/dockerUtilities.ts

@@ -102,7 +102,7 @@ export async function checkIfDockerImagesExist(
 ): Promise<boolean> {
     try {
         /* the tag of images is equal to extension's version */
-        const imageTag = getWAMRExtensionVersion(context);
+        const imageTag = getWAMRExtensionVersion(context.extensionPath);
         await execShell(
             `docker image inspect wasm-debug-server:${imageTag} wasm-toolchain:${imageTag}`
         );
@@ -115,7 +115,7 @@ export async function checkIfDockerImagesExist(
 function getDockerImagesDownloadUrl(
     context: vscode.ExtensionContext
 ): string[] {
-    const wamrVersion = getWAMRExtensionVersion(context);
+    const wamrVersion = getWAMRExtensionVersion(context.extensionPath);
     const wamrReleaseUrl = `https://github.com/bytecodealliance/wasm-micro-runtime/releases/download/WAMR`;
 
     return [

+ 15 - 10
test-tools/wamr-ide/VSCode-Extension/src/utilities/lldbUtilities.ts

@@ -36,14 +36,14 @@ function getLLDBUnzipFilePath(destinationFolder: string, filename: string) {
 }
 
 export function getWAMRExtensionVersion(
-    context: vscode.ExtensionContext
+    extensionPath: string
 ): string {
     // eslint-disable-next-line @typescript-eslint/no-var-requires
-    return require(path.join(context.extensionPath, 'package.json')).version;
+    return require(path.join(extensionPath, 'package.json')).version;
 }
 
-function getLLDBDownloadUrl(context: vscode.ExtensionContext): string {
-    const wamrVersion = getWAMRExtensionVersion(context);
+function getLLDBDownloadUrl(extensionPath: string): string {
+    const wamrVersion = getWAMRExtensionVersion(extensionPath);
     const lldbOsUrlSuffix = LLDB_OS_DOWNLOAD_URL_SUFFIX_MAP[os.platform()];
 
     if (!lldbOsUrlSuffix) {
@@ -53,8 +53,7 @@ function getLLDBDownloadUrl(context: vscode.ExtensionContext): string {
     return `https://github.com/bytecodealliance/wasm-micro-runtime/releases/download/WAMR-${wamrVersion}/wamr-lldb-${wamrVersion}-${lldbOsUrlSuffix}.zip`;
 }
 
-export function isLLDBInstalled(context: vscode.ExtensionContext): boolean {
-    const extensionPath = context.extensionPath;
+export function isLLDBInstalled(extensionPath: string): boolean {
     const lldbOSDir = os.platform();
     const lldbBinaryPath = path.join(
         extensionPath,
@@ -67,9 +66,8 @@ export function isLLDBInstalled(context: vscode.ExtensionContext): boolean {
 }
 
 export async function promptInstallLLDB(
-    context: vscode.ExtensionContext
+    extensionPath: string
 ): Promise<SelectionOfPrompt> {
-    const extensionPath = context.extensionPath;
 
     const response = await vscode.window.showWarningMessage(
         'No LLDB instance found. Setup now?',
@@ -81,7 +79,15 @@ export async function promptInstallLLDB(
         return response;
     }
 
-    const downloadUrl = getLLDBDownloadUrl(context);
+    await downloadLldb(extensionPath);
+
+    return SelectionOfPrompt.setUp;
+}
+
+export async function downloadLldb(
+    extensionPath: string
+): Promise<void> {
+    const downloadUrl = getLLDBDownloadUrl(extensionPath);
     const destinationDir = os.platform();
 
     if (!downloadUrl) {
@@ -115,5 +121,4 @@ export async function promptInstallLLDB(
 
     // Remove the bundle.zip
     fs.unlinkSync(lldbZipPath);
-    return SelectionOfPrompt.setUp;
 }
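With `extensionPath` now a plain string, the URL logic no longer needs a `vscode.ExtensionContext`. The release asset URL follows the template in `getLLDBDownloadUrl`; a pure sketch of that template, assuming the version and OS suffix are passed in. The contents of `LLDB_OS_DOWNLOAD_URL_SUFFIX_MAP` are not shown in this diff, so any suffix used with it is a placeholder, and the function name is hypothetical:

```typescript
// Hypothetical pure version of getLLDBDownloadUrl(): the version would come from
// package.json (getWAMRExtensionVersion) and the suffix from LLDB_OS_DOWNLOAD_URL_SUFFIX_MAP.
function lldbDownloadUrl(wamrVersion: string, osSuffix: string | undefined): string | undefined {
    if (!osSuffix) {
        // platform without a prebuilt LLDB bundle
        return undefined;
    }
    const releaseBase = 'https://github.com/bytecodealliance/wasm-micro-runtime/releases/download';
    return `${releaseBase}/WAMR-${wamrVersion}/wamr-lldb-${wamrVersion}-${osSuffix}.zip`;
}
```

Because the version is read from `package.json`, bumping the extension to `1.2.2` also changes which LLDB release asset is downloaded.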

+ 4 - 0
tests/benchmarks/coremark/README.md

@@ -19,3 +19,7 @@ And then run `./build.sh` to build the source code, file `coremark.exe`, `corema
 Run `./run.sh` to test the benchmark, the native mode, iwasm aot mode and iwasm interpreter mode will be tested respectively.
 
 Run `./test_pgo.sh` to test the benchmark with AOT static PGO (Profile-Guided Optimization) enabled, please refer [here](../README.md#install-llvm-profdata) to install tool `llvm-profdata` and build `iwasm` with `cmake -DWAMR_BUILD_STATIC_PGO=1`.
+
+- For Linux, build `iwasm` with `cmake -DWAMR_BUILD_STATIC_PGO=1`, then run `./test_pgo.sh` to test the benchmark with AOT static PGO (Profile-Guided Optimization) enabled.
+
+- For Linux-SGX, build `iwasm` the same way with `cmake -DWAMR_BUILD_STATIC_PGO=1`, then run `make` in the `enclave-sample` directory, and finally run `./test_pgo.sh --sgx` to test the benchmark.

+ 7 - 2
tests/benchmarks/coremark/test_pgo.sh

@@ -5,8 +5,13 @@
 
 PLATFORM=$(uname -s | tr A-Z a-z)
 
-IWASM="../../../product-mini/platforms/${PLATFORM}/build/iwasm"
-WAMRC="../../../wamr-compiler/build/wamrc"
+if [ "$1" = "--sgx" ] && [ "$PLATFORM" = "linux" ]; then
+    IWASM="../../../product-mini/platforms/${PLATFORM}-sgx/enclave-sample/iwasm"
+    WAMRC="../../../wamr-compiler/build/wamrc -sgx"
+else
+    IWASM="../../../product-mini/platforms/${PLATFORM}/build/iwasm"
+    WAMRC="../../../wamr-compiler/build/wamrc"
+fi
 
 if [ ! -e "coremark.wasm" ]; then
     echo "coremark.wasm doesn't exist, please run build.sh first"
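Each `test_pgo.sh` now picks its binaries from the first argument: with `--sgx` on Linux it uses the `linux-sgx` enclave build of `iwasm` and passes `-sgx` to `wamrc`; otherwise (including `--sgx` on a non-Linux host) it silently falls back to the regular build. The same decision, sketched in TypeScript for illustration only (the scripts themselves stay in shell; the function and field names are hypothetical):

```typescript
interface PgoTools {
    iwasm: string;    // runtime binary
    wamrc: string[];  // AOT compiler path followed by its extra flags
}

// Mirrors the if/else added to test_pgo.sh: `--sgx` only takes effect on Linux.
function selectPgoTools(platform: string, firstArg: string | undefined): PgoTools {
    const root = '../../..';
    if (firstArg === '--sgx' && platform === 'linux') {
        return {
            iwasm: `${root}/product-mini/platforms/${platform}-sgx/enclave-sample/iwasm`,
            wamrc: [`${root}/wamr-compiler/build/wamrc`, '-sgx'],
        };
    }
    return {
        iwasm: `${root}/product-mini/platforms/${platform}/build/iwasm`,
        wamrc: [`${root}/wamr-compiler/build/wamrc`],
    };
}
```

Keeping `-sgx` as a separate array element makes explicit what the shell version achieves through word splitting, which only works if the script expands `$WAMRC` unquoted.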

+ 7 - 2
tests/benchmarks/dhrystone/test_pgo.sh

@@ -5,8 +5,13 @@
 
 PLATFORM=$(uname -s | tr A-Z a-z)
 
-IWASM="../../../product-mini/platforms/${PLATFORM}/build/iwasm"
-WAMRC="../../../wamr-compiler/build/wamrc"
+if [ "$1" = "--sgx" ] && [ "$PLATFORM" = "linux" ]; then
+    IWASM="../../../product-mini/platforms/${PLATFORM}-sgx/enclave-sample/iwasm"
+    WAMRC="../../../wamr-compiler/build/wamrc -sgx"
+else
+    IWASM="../../../product-mini/platforms/${PLATFORM}/build/iwasm"
+    WAMRC="../../../wamr-compiler/build/wamrc"
+fi
 
 if [ ! -e "dhrystone.wasm" ]; then
     echo "dhrystone.wasm doesn't exist, please run build.sh first"

+ 4 - 0
tests/benchmarks/jetstream/README.md

@@ -29,3 +29,7 @@ And then run `./build.sh` to build the source code, the folder `out` will be cre
 Run `./run_aot.sh` to test the benchmark, the native mode and iwasm aot mode will be tested for each workload, and the file `report.txt` will be generated.
 
 Run `./test_pgo.sh` to test the benchmark with AOT static PGO (Profile-Guided Optimization) enabled, please refer [here](../README.md#install-llvm-profdata) to install tool `llvm-profdata` and build `iwasm` with `cmake -DWAMR_BUILD_STATIC_PGO=1`.
+
+- For Linux, build `iwasm` with `cmake -DWAMR_BUILD_STATIC_PGO=1`, then run `./test_pgo.sh` to test the benchmark with AOT static PGO (Profile-Guided Optimization) enabled.
+
+- For Linux-SGX, build `iwasm` the same way with `cmake -DWAMR_BUILD_STATIC_PGO=1`, then run `make` in the `enclave-sample` directory, and finally run `./test_pgo.sh --sgx` to test the benchmark.

+ 7 - 2
tests/benchmarks/jetstream/test_pgo.sh

@@ -9,8 +9,13 @@ REPORT=$CUR_DIR/report.txt
 TIME=/usr/bin/time
 
 PLATFORM=$(uname -s | tr A-Z a-z)
-IWASM_CMD=$CUR_DIR/../../../product-mini/platforms/${PLATFORM}/build/iwasm
-WAMRC_CMD=$CUR_DIR/../../../wamr-compiler/build/wamrc
+if [ "$1" = "--sgx" ] && [ "$PLATFORM" = "linux" ]; then
+    IWASM_CMD="$CUR_DIR/../../../product-mini/platforms/${PLATFORM}-sgx/enclave-sample/iwasm"
+    WAMRC_CMD="$CUR_DIR/../../../wamr-compiler/build/wamrc -sgx"
+else
+    IWASM_CMD="$CUR_DIR/../../../product-mini/platforms/${PLATFORM}/build/iwasm"
+    WAMRC_CMD="$CUR_DIR/../../../wamr-compiler/build/wamrc"
+fi
 
 BENCH_NAME_MAX_LEN=20
 

+ 6 - 0
tests/benchmarks/libsodium/README.md

@@ -18,6 +18,12 @@ And then run `./build.sh` to build the source code, the libsodium source code wi
 
 Run `./run_aot.sh` to test the benchmark, the native mode and iwasm aot mode will be tested respectively.
 
+Run `./test_pgo.sh` to test the benchmark with AOT static PGO (Profile-Guided Optimization) enabled. See [the benchmarks README](../README.md#install-llvm-profdata) for how to install the `llvm-profdata` tool, and build `iwasm` with `cmake -DWAMR_BUILD_STATIC_PGO=1`.
+
+- For Linux, build `iwasm` with `cmake -DWAMR_BUILD_STATIC_PGO=1`, then run `./test_pgo.sh` to test the benchmark with AOT static PGO (Profile-Guided Optimization) enabled.
+
+- For Linux-SGX, build `iwasm` the same way with `cmake -DWAMR_BUILD_STATIC_PGO=1`, then run `make` in the `enclave-sample` directory, and finally run `./test_pgo.sh --sgx` to test the benchmark.
+
 # Others
 
 Refer to [Performance of WebAssembly runtimes in 2023](https://00f.net/2023/01/04/webassembly-benchmark-2023) for more about the performance comparison of wasm runtimes on running the libsodium benchmarks.

+ 7 - 2
tests/benchmarks/libsodium/test_pgo.sh

@@ -19,8 +19,13 @@ PLATFORM=$(uname -s | tr A-Z a-z)
 
 readonly OUT_DIR=$PWD/libsodium/zig-out/bin
 readonly REPORT=$PWD/report.txt
-readonly IWASM_CMD=$PWD/../../../product-mini/platforms/${PLATFORM}/build/iwasm
-readonly WAMRC_CMD=$PWD/../../../wamr-compiler/build/wamrc
+if [ "$1" = "--sgx" ] && [ "$PLATFORM" = "linux" ]; then
+    readonly IWASM_CMD="$PWD/../../../product-mini/platforms/${PLATFORM}-sgx/enclave-sample/iwasm"
+    readonly WAMRC_CMD="$PWD/../../../wamr-compiler/build/wamrc -sgx"
+else
+    readonly IWASM_CMD="$PWD/../../../product-mini/platforms/${PLATFORM}/build/iwasm"
+    readonly WAMRC_CMD="$PWD/../../../wamr-compiler/build/wamrc"
+fi
 readonly TIME=/usr/bin/time
 
 BENCH_NAME_MAX_LEN=20

+ 6 - 0
tests/benchmarks/polybench/README.md

@@ -19,3 +19,9 @@ And then run `./build.sh` to build the source code, the folder `out` will be cre
 Run `./run_aot.sh` to test the benchmark, the native mode and iwasm aot mode will be tested for each workload, and the file `report.txt` will be generated.
 
 Run `./run_interp.sh` to test the benchmark, the native mode and iwasm interpreter mode will be tested for each workload, and the file `report.txt` will be generated.
+
+Run `./test_pgo.sh` to test the benchmark with AOT static PGO (Profile-Guided Optimization) enabled. See [the benchmarks README](../README.md#install-llvm-profdata) for how to install the `llvm-profdata` tool, and build `iwasm` with `cmake -DWAMR_BUILD_STATIC_PGO=1`.
+
+- For Linux, build `iwasm` with `cmake -DWAMR_BUILD_STATIC_PGO=1`, then run `./test_pgo.sh` to test the benchmark with AOT static PGO (Profile-Guided Optimization) enabled.
+
+- For Linux-SGX, build `iwasm` the same way with `cmake -DWAMR_BUILD_STATIC_PGO=1`, then run `make` in the `enclave-sample` directory, and finally run `./test_pgo.sh --sgx` to test the benchmark.

+ 7 - 2
tests/benchmarks/polybench/test_pgo.sh

@@ -9,8 +9,13 @@ REPORT=$CUR_DIR/report.txt
 TIME=/usr/bin/time
 
 PLATFORM=$(uname -s | tr A-Z a-z)
-IWASM_CMD=$CUR_DIR/../../../product-mini/platforms/${PLATFORM}/build/iwasm
-WAMRC_CMD=$CUR_DIR/../../../wamr-compiler/build/wamrc
+if [ "$1" = "--sgx" ] && [ "$PLATFORM" = "linux" ]; then
+    IWASM_CMD="$CUR_DIR/../../../product-mini/platforms/${PLATFORM}-sgx/enclave-sample/iwasm"
+    WAMRC_CMD="$CUR_DIR/../../../wamr-compiler/build/wamrc -sgx"
+else
+    IWASM_CMD="$CUR_DIR/../../../product-mini/platforms/${PLATFORM}/build/iwasm"
+    WAMRC_CMD="$CUR_DIR/../../../wamr-compiler/build/wamrc"
+fi
 
 BENCH_NAME_MAX_LEN=20
 

+ 4 - 0
tests/benchmarks/sightglass/README.md

@@ -21,3 +21,7 @@ Run `./run_aot.sh` to test the benchmark, the native mode and iwasm aot mode wil
 Run `./run_interp.sh` to test the benchmark, the native mode and iwasm interpreter mode will be tested for each workload, and the file `report.txt` will be generated.
 
 Run `./test_pgo.sh` to test the benchmark with AOT static PGO (Profile-Guided Optimization) enabled, please refer [here](../README.md#install-llvm-profdata) to install tool `llvm-profdata` and build `iwasm` with `cmake -DWAMR_BUILD_STATIC_PGO=1`.
+
+- For Linux, build `iwasm` with `cmake -DWAMR_BUILD_STATIC_PGO=1`, then run `./test_pgo.sh` to test the benchmark with AOT static PGO (Profile-Guided Optimization) enabled.
+
+- For Linux-SGX, build `iwasm` the same way with `cmake -DWAMR_BUILD_STATIC_PGO=1`, then run `make` in the `enclave-sample` directory, and finally run `./test_pgo.sh --sgx` to test the benchmark.

Some files were not shown because too many files have changed in this diff