
Wasm r2 rtesting #127875

Draft
davidwrighton wants to merge 46 commits into dotnet:main from davidwrighton:WasmR2Rtesting


@davidwrighton
Member

No description provided.

davidwrighton and others added 30 commits April 23, 2026 15:03
…nk mappings

This adds a new ReadyToRun fixup that enables mapping UTF-8 strings to
pregenerated code thunks embedded in R2R images. The fixup is placed in
the eager imports section and processed at module load time.

Changes across all layers:

Format definition:
- Add READYTORUN_FIXUP_InjectStringThunks = 0x39 to readytorun.h and
  ReadyToRunConstants.cs
- Bump R2R minor version from 5 to 6 in all three locations

Runtime (CoreCLR VM):
- Refactor StringThunkSHashTraits from wasm/helpers.cpp into shared
  stringthunkhash.h header, available to all platforms
- Add pregeneratedstringthunks.cpp/.h with global hash table using
  copy-on-write CAS pattern for lock-free concurrent reads
- InitializePregeneratedStringThunkHash() called at EE startup
- LookupPregeneratedThunkByString() API returns PCODE or NULL
- ProcessInjectStringThunksFixup() handles the fixup in
  LoadDynamicInfoEntry, merging new entries with existing ones

Crossgen2 compiler:
- Add abstract StringDiscoverableAssemblyStubNode (derives from
  AssemblyStubNode) with LookupString property; instances register
  themselves via OnMarked
- Add InjectStringThunksSignature that collects all registered stubs
  at emission time and encodes them as (UTF8 string, RVA) pairs
- Root the InjectStringThunks import eagerly in NodeFactory
- Sort stubs by LookupString for deterministic compilation

Tooling and documentation:
- Add r2rdump parser case for InjectStringThunks signatures
- Update readytorun-format.md with fixup table entry and format spec

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
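
As an illustration of the "copy-on-write CAS pattern for lock-free concurrent reads" mentioned in the commit above, here is a minimal sketch. It is not the runtime's code: `Pcode`, `ThunkMap`, `g_thunks`, `LookupThunk`, and `InjectThunks` are hypothetical names, `std::unordered_map` stands in for the runtime's hash table, and old snapshots are deliberately leaked because safe reclamation is out of scope here.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for the runtime's PCODE type; 0 plays the role of NULL.
using Pcode = std::uintptr_t;
using ThunkMap = std::unordered_map<std::string, Pcode>;

// Currently published table. Readers only ever load this pointer.
static std::atomic<const ThunkMap*> g_thunks{new ThunkMap()};

// Lock-free read: returns 0 when the string is unknown.
Pcode LookupThunk(const std::string& key) {
    const ThunkMap* snapshot = g_thunks.load(std::memory_order_acquire);
    auto it = snapshot->find(key);
    return it == snapshot->end() ? 0 : it->second;
}

// Writer: copy the current table, merge the new entries, publish with CAS.
// Existing keys win over new ones (emplace does not overwrite).
// Old snapshots are leaked in this sketch because a reader may still hold them.
void InjectThunks(const std::vector<std::pair<std::string, Pcode>>& entries) {
    const ThunkMap* current = g_thunks.load(std::memory_order_acquire);
    for (;;) {
        auto* updated = new ThunkMap(*current);
        for (const auto& e : entries) updated->emplace(e.first, e.second);
        if (g_thunks.compare_exchange_weak(current, updated,
                                           std::memory_order_acq_rel,
                                           std::memory_order_acquire)) {
            return;
        }
        delete updated;  // lost the race; 'current' now holds the newer table
    }
}
```

Readers never block: they see either the old table or the new one, and a writer that loses the race rebuilds its copy from the newer table.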
Instead of unconditionally rooting the InjectStringThunks import, store
it on the NodeFactory and have each StringDiscoverableAssemblyStubNode
declare a dependency on it via ComputeNonRelocationBasedDependencies.
The import is only pulled into the graph when at least one stub is marked.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Change GetSignature to return (WasmFuncType, string) where the string
is a compact serialization of the signature:

Return type: 'v' (void), 'i'/'l'/'f'/'d'/'V' (primitives), 'S<N>'
(struct by ref with N bytes).

Hidden params (this, retbuf, generic context, async continuation):
'i' or 'l' based on pointer size.

Explicit params: 'i'/'l'/'f'/'d'/'V' (by value), 'S<N>' (by ref),
'e' (empty struct, not emitted to WasmFuncType).

Suffix 'p' indicates SP and PE params are generated (managed calls).

Add IsEmptyStruct helper (stub returning false) for detecting empty
structs by field count per the BasicCABI spec. Handle empty structs
for both parameters ('e' encoding) and returns (treated as void).
See dotnet#127361.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Introduce WasmSignature readonly struct implementing IEquatable and
IComparable. Equality and comparison are based on the signature string
(with Debug.Assert that FuncType agrees when strings match). This
enables sorting and deduplication of signatures by string alone.

Update WasmLowering.GetSignature to return WasmSignature and update
callers in WasmObjectWriter and ReadyToRunCodegenNodeFactory.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
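
The string-keyed equality and ordering described for WasmSignature can be sketched as follows. This is a hypothetical C++ analogue, not the C# struct from the PR: `funcTypeId` is an invented stand-in for the lowered WasmFuncType, and `SortAndDeduplicate` is an illustrative helper.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical analogue of the WasmSignature value type.
struct WasmSignature {
    std::string signatureString;  // compact encoding, e.g. "viip"
    int funcTypeId;               // invented stand-in for the WasmFuncType

    bool operator==(const WasmSignature& other) const {
        bool same = signatureString == other.signatureString;
        // Mirrors the Debug.Assert: equal strings must imply equal func types.
        assert(!same || funcTypeId == other.funcTypeId);
        return same;
    }
    bool operator<(const WasmSignature& other) const {
        return signatureString < other.signatureString;
    }
};

// Sorting and deduplication need only the string, as the commit notes.
std::vector<WasmSignature> SortAndDeduplicate(std::vector<WasmSignature> sigs) {
    std::sort(sigs.begin(), sigs.end());
    sigs.erase(std::unique(sigs.begin(), sigs.end()), sigs.end());
    return sigs;
}
```

For example, the input `{"viip", "iip", "viip"}` comes back as `{"iip", "viip"}`.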
- WasmImportThunk now takes WasmSignature and uses it for mangled name
  and comparison operations
- WasmImportThunkPortableEntrypoint uses static WasmSignature values
- RaiseSignature rewritten to parse signature string instead of WasmFuncType
- Added CompilerTypeSystemContext.Wasm.cs with GetValueTupleStructOfSize
  cache using tree-based ValueTuple construction
- Unmanaged calling convention flag set when 'p' suffix is absent
- Roundtrip assert: raised signature re-lowered must equal original
- Cache first empty struct found during lowering for 'e' roundtrip

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Instead of iterating the wasm-level _typeNode params, iterate the
raised MethodSignature. This enables:
- Indirect struct args: zero-fill the transition block slot on store,
  and pass the original byref local directly on restore
- Empty struct args: skip entirely (no wasm local exists)
- Made WasmLowering.IsEmptyStruct public for cross-file access

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…Site api.

Co-authored-by: Copilot <copilot@github.com>
The 'this' parameter is now encoded with a distinct 'T' character
instead of 'i'/'l'. On raise, 'T' sets HasThis on the MethodSignature
rather than adding an explicit parameter. This enables proper
roundtripping and allows ArgIterator to correctly compute offsets
(e.g. GetRetBuffArgOffset with hasThis).

Also fix build errors in CorInfoImpl.ReadyToRun.cs: qualify
LoweringFlags and cast getCallConv() to int.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add WasmR2RToInterpreterThunkNode, a StringDiscoverableAssemblyStubNode
that captures arguments into a transition block and dispatches to the
interpreter via READYTORUN_HELPER_InitInstClass.

Key details:
- Thunk keyed by WasmSignature, discoverable by 'I'-prefixed signature string
- Arguments area is 16-byte aligned; TransitionBlock is 8-byte aligned
- Indirect struct args copied with memory.copy + memory.fill padding
- Stack pointer global saved/restored around helper call
- V128 return uses 16-byte aligned buffer; others use 8-byte i64 store

Also adds memory.copy, memory.fill, and i64.const WASM instructions,
and updates WasmImportThunk to use memory.fill for indirect struct
zero-filling.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
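
The 16-byte and 8-byte alignment requirements in the thunk commit above come down to power-of-two rounding. A minimal sketch follows; `ReserveArgumentsArea` and the downward-growing stack layout are assumptions made for illustration, not the thunk's actual frame layout.

```cpp
#include <cassert>
#include <cstdint>

// Round 'value' up to the next multiple of 'alignment' (a power of two).
constexpr std::uint32_t AlignUp(std::uint32_t value, std::uint32_t alignment) {
    return (value + alignment - 1) & ~(alignment - 1);
}

// Round 'value' down to a multiple of 'alignment' (a power of two).
constexpr std::uint32_t AlignDown(std::uint32_t value, std::uint32_t alignment) {
    return value & ~(alignment - 1);
}

// Hypothetical carve-out on a downward-growing stack: reserve 'argBytes'
// below 'sp' and return a 16-byte aligned base for the arguments area.
constexpr std::uint32_t ReserveArgumentsArea(std::uint32_t sp, std::uint32_t argBytes) {
    return AlignDown(sp - AlignUp(argBytes, 16), 16);
}
```

With `sp = 0x1008` and 20 bytes of arguments, the size rounds up to 32 and the base lands at `0xFE0`.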
…all site dependency

- Add WasmInterpreterToR2RThunkNode: a StringDiscoverableAssemblyStubNode that
  bridges from interpreter calling convention to R2R compiled functions. Uses
  ArgIterator offsets (minus TransitionBlock size) to locate args in the
  interpreter buffer, sets up a TERMINATE_R2R_STACK_WALK frame, and dispatches
  via call_indirect.
- Fix retbuf detection in both WasmR2RToInterpreterThunkNode and
  WasmInterpreterToR2RThunkNode to check SignatureString[0] == 'S' instead of
  using ArgIterator.HasRetBuffArg/GetRetBuffArgOffset. The R2R-to-interpreter
  thunk now passes the retbuf wasm local directly.
- Add factory cache and accessor for WasmInterpreterToR2RThunk on
  ReadyToRunCodegenNodeFactory.
- Fix recordCallSite TODO: wire up WasmR2RToInterpreterThunk dependency.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add AddAdditionalDependency helper for lazily adding to _additionalDependencies
- Move WasmR2RToInterpreterThunk from AddPrecodeFixup to AddAdditionalDependency
  in recordCallSite
- Add WasmInterpreterToR2RThunk dependency for every compiled managed
  non-UnmanagedCallersOnly method on Wasm, using GetSignature(MethodDesc)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <copilot@github.com>
- Replace ValueTuple-based struct size construction with a cache of real
  struct types encountered during GetSignature. ValueTuples have auto
  layout which causes padding, making roundtrip size assertions fail.
  The cache uses a locked Dictionary for thread safety.

- Fix RaiseSignature to skip the hidden retbuf pointer parameter when
  the return type is a struct (S<N> encoding). Previously it was included
  in the raised MethodSignature parameters, causing GetSignature to emit
  a duplicate retbuf pointer on re-encoding.

- Fix WasmImportThunk to handle 'this' pointer correctly: store/restore
  it separately before the explicit parameter loop, and start
  wasmLocalIndex past both 'this' and retbuf locals.

- Fix WasmImportThunkPortableEntrypoint to strip IsUnmanagedCallersOnly
  flag when computing thunk signatures, since thunks always use managed
  calling convention.

- Fix DelayLoadHelperImport to skip creating WasmImportThunk for
  GenericLookupSignature on WASM, as these are eager fixups that don't
  need import thunks.

- Fix WasmR2RToInterpreterThunkNode to skip 'this' and retbuf wasm
  locals before iterating explicit parameters.

- Skip creating R2R-to-interpreter thunks for unmanaged call sites,
  as they don't go through interpreter transitions.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The ForceSigWalk method had two bugs in its Wasm-specific path for
accounting unnamed arguments (this, retbuf, generic context, etc.)
when no named arguments are present:

1. The check 'maxOffset == 0' could never be true because maxOffset
   is initialized to OffsetOfArgs (8 on Wasm32). Changed to compare
   against OffsetOfArgs.

2. The fallback 'maxOffset = _wasmOfsStack' was incorrect because
   _wasmOfsStack is relative to OffsetOfArgs, but maxOffset is an
   absolute offset. Changed to 'OffsetOfArgs + _wasmOfsStack'.

These bugs caused GCRefMapBuilder to allocate a zero-length fake
stack for methods with only unnamed arguments (e.g. parameterless
instance methods), leading to IndexOutOfRangeException when writing
the 'this' pointer GC ref at ThisOffset.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
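
The two ForceSigWalk bugs described above can be reproduced in a reduced model. The sketch below is hypothetical: only `OffsetOfArgs = 8` comes from the commit text, the function names are invented, and the `_wasmOfsStack` value of 4 used in the example (one pointer-sized slot for `this` on Wasm32) is an assumption.

```cpp
#include <cassert>

// From the commit text: OffsetOfArgs is 8 on Wasm32.
constexpr int kOffsetOfArgs = 8;

// Old logic: maxOffset starts at kOffsetOfArgs, so '== 0' never fires.
int MaxOffsetBuggy(int maxOffset, int wasmOfsStack) {
    if (maxOffset == 0) maxOffset = wasmOfsStack;
    return maxOffset;
}

// Fixed logic: compare against kOffsetOfArgs, and convert the relative
// wasmOfsStack into an absolute offset before assigning it.
int MaxOffsetFixed(int maxOffset, int wasmOfsStack) {
    if (maxOffset == kOffsetOfArgs) maxOffset = kOffsetOfArgs + wasmOfsStack;
    return maxOffset;
}

// Bytes of fake stack a GC ref map builder would allocate in this model.
int FakeStackBytes(int maxOffset) { return maxOffset - kOffsetOfArgs; }
```

For a parameterless instance method in this model, the old logic yields a zero-length fake stack (`FakeStackBytes(MaxOffsetBuggy(8, 4)) == 0`), while the fixed logic leaves room for the `this` slot (`FakeStackBytes(MaxOffsetFixed(8, 4)) == 4`).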
- Replace 'n' encoding with 'S' for multi-field structs passed by ref
- Add hardcoded struct sizes for QCallModule (8), QCallAssembly (8),
  GCHeapHardLimitInfo (64) so signatures produce S<N> format
- Add ParseSignatureTokens tokenizer to handle multi-char S<N> tokens
- Add Token-based API (TokenToNativeType/TokenToNameType/TokenToArgType)
- Update InterpToNativeGenerator to use token-based parsing
- Unknown struct types log a diagnostic at High importance

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- TokenToNameType returns the full S<N> token (e.g. S8, S64) so
  generated function names encode the struct size
- ArgsWithSlotOffsets computes running slot indices: structs consume
  max(size/8, 1) slots instead of always 1
- Add TokenToSlotCount helper
- Remove IsBlittable gate from TypeToChar — multi-field structs are
  always passed by pointer, matching crossgen2 WasmLowering behavior

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
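
The multi-character `S<N>` tokens and the `max(size/8, 1)` slot rule from the two commits above can be sketched as below. These are hypothetical C++ re-implementations that borrow the names `ParseSignatureTokens` and `TokenToSlotCount` for readability; `SlotOffsets` is an invented helper, and none of this is the code from the PR.

```cpp
#include <cassert>
#include <cctype>
#include <cstdlib>
#include <string>
#include <vector>

// Split a signature string into tokens. Every token is one character,
// except 'S', which also consumes the decimal size that follows it.
std::vector<std::string> ParseSignatureTokens(const std::string& sig) {
    std::vector<std::string> tokens;
    for (std::size_t i = 0; i < sig.size();) {
        std::size_t start = i++;
        if (sig[start] == 'S') {
            while (i < sig.size() &&
                   std::isdigit(static_cast<unsigned char>(sig[i]))) {
                ++i;
            }
        }
        tokens.push_back(sig.substr(start, i - start));
    }
    return tokens;
}

// Structs consume max(size / 8, 1) slots; every other token uses one slot.
int TokenToSlotCount(const std::string& token) {
    if (token.size() > 1 && token[0] == 'S') {
        int size = std::atoi(token.c_str() + 1);
        return size / 8 > 1 ? size / 8 : 1;
    }
    return 1;
}

// Running slot index at which each token starts.
std::vector<int> SlotOffsets(const std::vector<std::string>& tokens) {
    std::vector<int> offsets;
    int next = 0;
    for (const auto& t : tokens) {
        offsets.push_back(next);
        next += TokenToSlotCount(t);
    }
    return offsets;
}
```

For the token sequence `i S64 f S8 d`, the slot offsets are 0, 1, 9, 10, 11: the 64-byte struct takes eight slots and the 8-byte struct takes one.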
- helpers.cpp: Refactor GetSignatureKey to support S<N> struct tokens,
  LowerTypeHandle for recursive single-field unwrapping, caller prefix
  parameter ('M' for calli, 'I' for PE-to-interpreter)
- helpers.cpp: Use 'T' for this pointer encoding (was 'i')
- WasmLowering.cs: Remove redundant hidden retbuf pointer from signature
  string (implied by S<N> return type)
- RaiseSignature: Remove hasReturnBuffer skip logic (no longer in string)
- SignatureMapper.cs: Use 'T' for this pointer, add T to token maps
- InterpToNativeGenerator.cs: Add 'M' prefix to g_wasmThunks entries
- clr-abi.md: Document Type Lowering and Signature String Encoding spec

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…tion

Replace dynamic alloca-based initial buffer sizing with a fixed 64-byte
stack buffer. Fall back to alloca only when S<N> tokens make the key
exceed the initial buffer.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
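
A minimal sketch of the "fixed buffer first, larger buffer only when needed" idea from the commit above. Note the swap: the commit falls back to `alloca`, while this sketch uses a heap-backed `std::vector` as the fallback to stay portable. `BuildKey` and `kInitialKeyBuffer` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

constexpr std::size_t kInitialKeyBuffer = 64;  // fixed stack buffer size

// Concatenate tokens into a key. '*usedFallback' reports whether the key
// outgrew the fixed buffer and the fallback allocation was needed.
std::string BuildKey(const std::vector<std::string>& tokens, bool* usedFallback) {
    char fixed[kInitialKeyBuffer];
    std::size_t needed = 0;
    for (const auto& t : tokens) needed += t.size();

    std::vector<char> heap;  // stands in for the alloca fallback
    char* buffer = fixed;
    *usedFallback = needed > sizeof(fixed);
    if (*usedFallback) {
        heap.resize(needed);
        buffer = heap.data();
    }

    std::size_t pos = 0;
    for (const auto& t : tokens) {
        std::memcpy(buffer + pos, t.data(), t.size());
        pos += t.size();
    }
    return std::string(buffer, pos);
}
```

Short keys such as `MiS8` stay in the 64-byte buffer. Only signatures with many `S<N>` tokens take the fallback path.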
…EntryPointThunk

Both functions now check the process-startup thunk cache first, then
fall back to LookupPregeneratedThunkByString for thunks injected via
READYTORUN_FIXUP_InjectStringThunks.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
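
The lookup order described above (startup cache first, injected R2R thunks second) amounts to a two-level probe. A hypothetical sketch, with `Pcode`, `ThunkTable`, and `LookupThunkWithFallback` invented for illustration:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

using Pcode = std::uintptr_t;  // hypothetical stand-in; 0 means "not found"
using ThunkTable = std::unordered_map<std::string, Pcode>;

// Probe the process-startup cache first, then the table of injected thunks.
Pcode LookupThunkWithFallback(const ThunkTable& startupCache,
                              const ThunkTable& injectedThunks,
                              const std::string& key) {
    auto hit = startupCache.find(key);
    if (hit != startupCache.end()) return hit->second;
    hit = injectedThunks.find(key);
    return hit != injectedThunks.end() ? hit->second : 0;
}
```

When both tables contain the key, the startup cache entry wins in this sketch.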
When a MethodDesc's PortableEntryPoint is initialized before the R2R module
containing its thunk is loaded, the method is tracked on a per-LoaderAllocator
SArray and resolved later when new R2R thunks are injected.

- Add TrySetInterpreterThunk CAS-based thunk installation on PortableEntryPoint
- Track pending methods per-LoaderAllocator using SArray<MethodDesc*> with
  NULL-compaction on resolve
- Single global lock (s_pendingThunkResolutionLock) protects both the LA
  registry and per-LA pending arrays, keeping LAs alive during scans
- Registration flag on LoaderAllocator avoids duplicate list scans
- Unregistration in LoaderAllocator::Destroy for correct collectible cleanup
- LookupThunk/LookupPortableEntryPointThunk now also check R2R thunk hash
- Remove stale WASM-TODO comments

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
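
The CAS-based installation and NULL-compaction steps listed above can be sketched as follows. This is a reduced, hypothetical model: `Method` and its `wanted` field are invented, `std::vector` stands in for `SArray<MethodDesc*>`, and the global lock that the commit describes is omitted for brevity.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstdint>
#include <vector>

using Pcode = std::uintptr_t;  // hypothetical stand-in; 0 means "no thunk"

// Hypothetical reduced method record: only the thunk slot matters here.
struct Method {
    std::atomic<Pcode> interpreterThunk{0};
    Pcode wanted = 0;  // thunk that became available for this method, or 0
};

// Install the thunk only if none is set yet. Returns true if this call won.
bool TrySetInterpreterThunk(Method& m, Pcode thunk) {
    Pcode expected = 0;
    return m.interpreterThunk.compare_exchange_strong(expected, thunk);
}

// Resolve every entry whose thunk is now available, null those entries out,
// then compact the array by removing the nulls in one pass.
void ResolvePending(std::vector<Method*>& pending) {
    for (Method*& m : pending) {
        if (m->wanted != 0) {
            TrySetInterpreterThunk(*m, m->wanted);
            m = nullptr;
        }
    }
    pending.erase(std::remove(pending.begin(), pending.end(),
                              static_cast<Method*>(nullptr)),
                  pending.end());
}
```

A second `TrySetInterpreterThunk` call on an already-resolved method returns false and leaves the installed thunk untouched.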
…bleBase, not by the image base.

Co-authored-by: Copilot <copilot@github.com>
Add FlagPendingThunkResolution on DynamicMethodDesc to track whether the
method is already in the pending thunk resolution list. The flag is set/cleared
using interlocked operations under s_pendingThunkResolutionLock, preventing
unbounded growth from re-used LCG methods.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
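
A sketch of how an interlocked flag can keep a reused method from being queued twice, as the commit above describes. The bit value, the `Method` record, and the helper names are hypothetical, and the lock that the commit mentions is again left out.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <vector>

constexpr std::uint32_t kFlagPendingThunkResolution = 0x1;  // hypothetical bit

struct Method {
    std::atomic<std::uint32_t> flags{0};
};

// Queue the method only if its pending flag was previously clear.
bool AddPendingOnce(std::vector<Method*>& pending, Method& m) {
    std::uint32_t old = m.flags.fetch_or(kFlagPendingThunkResolution);
    if (old & kFlagPendingThunkResolution) return false;  // already queued
    pending.push_back(&m);
    return true;
}

// Clear the flag once the method has been resolved and removed from the list.
void MarkResolved(Method& m) {
    m.flags.fetch_and(~kFlagPendingThunkResolution);
}
```

Calling `AddPendingOnce` repeatedly for the same method grows the list by at most one entry until `MarkResolved` clears the flag.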
The pregenerated string thunk hash table, lookup, and pending resolution
are only used on WASM. Guard them with TARGET_WASM, providing no-op stubs
for InitializePregeneratedStringThunkHash and ProcessInjectStringThunksFixup
on other platforms so callers remain unchanged.

Also adds FlagPendingThunkResolution on DynamicMethodDesc with interlocked
set/clear to prevent duplicate pending entries from reused LCG methods.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Instead of logging a message and producing an invalid signature, emit
WASM0067 error and return null so the build fails with a clear diagnostic
pointing at the missing entry in s_knownStructSizes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
davidwrighton and others added 16 commits May 1, 2026 11:46
- Adjust when the interpreter thunks are attached. Previously, attaching them triggered unsafe recursion in the type loader; now they are attached in GetMultiCallableAddr and when ExternalMethodFixupWorker finishes.
  - Adjust the lock to respect the new type load dependency of the signature walk
  - This should cover the existing R2R usage, as R2R code does not directly dispatch on virtual functions
  - It also causes more interpreter thunk resolution than necessary, since the interpreter codepath calls GetMultiCallableAddr often. That could possibly be tweaked to take a special path when the acquired pointer is used directly to dispatch to more interpreted code.
- Fix dependency generation for InterpreterToR2R thunks
  - Both for the WasmTypeNode of the thunk
  - And for referencing the WasmInterpreterToR2R thunks
- Fix InjectStringThunksSignature to use a table index relative to the tableBase. Add a new reloc to make that possible
Co-authored-by: Copilot <copilot@github.com>
…that it doesn't trigger on unmanaged entrypoints
- Adjust the pending portable entrypoint thunk logic to be a MethodDesc property, not a DynamicMethodDesc property
- Handle TypedByReference in LowerTypeHandle

Co-authored-by: Copilot <copilot@github.com>
R2RDump previously could not read Webcil files (the format used for
managed assemblies in WebAssembly environments). This adds a
WebcilImageReader that implements IBinaryImageReader for the Webcil
format, enabling R2RDump to dump headers, methods, and section
contents from Webcil-format R2R images.

Changes:
- New WebcilImageReader.cs implementing IBinaryImageReader
- ReadyToRunReader detects Webcil format (after MachO, before PE)
- DumpModel handles Webcil in reference assembly loading
- Program.cs maps OperatingSystem.Unknown to TargetOS.Linux for Webcil
- ReadyToRunMethod gracefully handles null PEReader (Webcil has no PE)
- ILCompiler.Reflection.ReadyToRun.csproj includes shared Webcil.cs

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace the PEReader ImageReader property with a GetSectionData(int rva)
method that returns a BlobReader. This decouples the interface from
PEReader, enabling non-PE formats (Webcil) to provide section data.

Implementations:
- StandaloneAssemblyMetadata: delegates to PEReader.GetSectionData
- ManifestAssemblyMetadata: same with null-guard
- WebcilAssemblyMetadata: resolves RVA via WebcilImageReader sections
- SimpleAssemblyMetadata (tests): delegates to PEReader.GetSectionData

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
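
Resolving an RVA through a section table, as the Webcil implementation of GetSectionData is described as doing, generally looks like the sketch below. The `Section` field names and `RvaToFileOffset` are illustrative assumptions and do not reflect the actual Webcil header layout or the R2RDump code.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical section record; field names are illustrative only.
struct Section {
    std::uint32_t virtualAddress;
    std::uint32_t virtualSize;
    std::uint32_t fileOffset;
};

// Find the section containing 'rva' and translate it to a file offset.
// Returns false when no section covers the RVA.
bool RvaToFileOffset(const std::vector<Section>& sections, std::uint32_t rva,
                     std::uint32_t& fileOffset) {
    for (const Section& s : sections) {
        if (rva >= s.virtualAddress && rva - s.virtualAddress < s.virtualSize) {
            fileOffset = s.fileOffset + (rva - s.virtualAddress);
            return true;
        }
    }
    return false;
}
```

A reader positioned at the resulting offset plays the role of the returned BlobReader. An RVA at or past the end of a section is treated as unmapped.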
Implement a full WASM instruction disassembler that decodes WebAssembly
binary format into WAT-style text output. This enables the --disasm flag
in R2RDump to work with Webcil/WASM R2R images.

- Add WasmDisassembler.cs with complete opcode tables for all standard
  WASM instructions (control, parametric, variable, table, memory,
  numeric, conversion, sign-extension, reference types) plus 0xFC
  (bulk memory/saturating truncation), 0xFB (GC), and 0xFD (SIMD)
  prefixed opcodes
- Add WebcilImageReader.GetWasmFunctionBody() to parse the WASM module's
  type, function, and code sections to extract function info including
  type signature and local declarations
- Integrate into TextDumper.DumpWasmDisasm() to print parameters and
  locals with their local indices, result types, and disassembled
  instructions

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
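
Decoding WebAssembly's binary format relies on LEB128 variable-length integers for indices, sizes, and immediates. A minimal unsigned decoder sketch follows; `ReadULeb128` is a hypothetical helper, not a function from WasmDisassembler.cs, and it does not reject unused high bits in the fifth byte.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Decode an unsigned 32-bit LEB128 value starting at 'pos' and advance 'pos'
// past it. Returns false on truncated input or an encoding longer than
// five bytes.
bool ReadULeb128(const std::vector<std::uint8_t>& bytes, std::size_t& pos,
                 std::uint32_t& value) {
    std::uint32_t result = 0;
    for (int shift = 0; shift < 35; shift += 7) {
        if (pos >= bytes.size()) return false;
        std::uint8_t b = bytes[pos++];
        result |= static_cast<std::uint32_t>(b & 0x7F) << shift;
        if ((b & 0x80) == 0) {  // high bit clear marks the last byte
            value = result;
            return true;
        }
    }
    return false;
}
```

The byte sequence `E5 8E 26` decodes to 624485, and a lone `80` is rejected as truncated.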
WebcilAssemblyMetadata was not retaining a reference to the pinned
metadata byte array passed to its constructor. After
GetStandaloneAssemblyMetadata returned, the array could be collected
by the GC despite being allocated on the Pinned Object Heap, since
no live reference existed. This caused an AccessViolationException
when MetadataReader accessed the freed memory on larger files like
system.private.corelib.wasm.

Fix: store the metadata byte array in a field to keep it rooted for
the lifetime of the MetadataReader.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
WASM R2R images should use .wasm extension instead of .dll. Update
CLRTest.CrossGen.targets to:
- Set output extension to .wasm for both composite and non-composite
  modes in bash and batch scripts when CrossGen2OutputFormat is 'wasm'
- Pass -f flag to crossgen2 in batch scripts (matching bash behavior)

Also set CrossGen2OutputFormat=wasm in Directory.Build.props for
browser target OS so all tests targeting wasm use the correct format.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Labels

area-crossgen2-coreclr only use for closed issues
