Remove unsafe code from System.Linq.MaxMin#127845
Conversation
This comment was marked as resolved.
This comment was marked as resolved.
15bf50d to
8837a91
Compare
This comment was marked as resolved.
This comment was marked as resolved.
4453202 to
9a4b7b1
Compare
647f299 to
4eeec93
Compare
This comment was marked as resolved.
This comment was marked as resolved.
|
@EgorBot -arm -amd using System;
using System.Linq;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
BenchmarkSwitcher.FromAssembly(typeof(Bench).Assembly).Run(args);
public class Bench
{
private int[] _ints = default!;
private long[] _longs = default!;
private byte[] _bytes = default!;
[Params(2, 4, 16, 32, 200, 1000)]
public int Length { get; set; }
[GlobalSetup]
public void Setup()
{
var rng = new Random(42);
_ints = new int[Length];
_longs = new long[Length];
_bytes = new byte[Length];
for (int i = 0; i < Length; i++)
{
_ints[i] = rng.Next();
_longs[i] = ((long)rng.Next() << 32) | (uint)rng.Next();
_bytes[i] = (byte)rng.Next(256);
}
}
[Benchmark] public int IntMax() => _ints.Max();
[Benchmark] public int IntMin() => _ints.Min();
[Benchmark] public long LongMax() => _longs.Max();
[Benchmark] public byte ByteMax() => _bytes.Max();
} |
|
PTAL @MihaZupan @tannergooding Had to mark the method NoInlining due to inliner issues I want to address separately. Currently it adds 0.5ns call overhead for small inputs (but only for trivial benchmark wrappers where the entire thing used to inline, in the real world it's very unlikely to happen) Otherwise, no extra bound checks in this impl in codegen. benchmark: #127845 (comment) and results: EgorBot/Benchmarks#186 Follow ups:
|
MihaZupan
left a comment
There was a problem hiding this comment.
GitHub reviews are broken atm, so here's the comment I wanted to post:
Does this do anything since it's on the interface?
In this method we currently still run out of time budget, [AI] gave inliner a hint that we should inline this 14b method even we when already ran out of time budget. Otherwise I see non-inline calls in the codegen. Also something I want to clean up in a separate inliner change. Although, in this case we just need to increase the budget based on some heuristics like Vector.IsSupported branches. |
|
I meant specifically about adding the attribute on the interface's abstract definition itself. |
| [MethodImpl(MethodImplOptions.AggressiveInlining)] | ||
| public static bool Compare(T left, T right) => left > right; |
There was a problem hiding this comment.
Are these not within the "always inline" threshold already?
I would've though this is like 5-6 bytes of IL and so something that we should trivially believe is in budget, particularly for primitive T
Maybe it being an op_GreaterThan constrained call is what's throwing things off?
There was a problem hiding this comment.
yeah this one was 14 bytes of IL, and we reduce "always inline" to 12 bytes once we ran out of time budget.
I removed this AggressiveInlining as it's no longer relevant with #127845 (comment)
There was a problem hiding this comment.
Interesting, I wonder if we should special case some of the IL sizes here like this one where it's namely big because of the constrained. <token> prefix.
We could of course special case all the generic math calls for the primitives as well, but it'd be nice if we didn't need to do that and have the inliner "do the right thing".
There was a problem hiding this comment.
Maybe in T1 we could do something like "inline everything <= 12 bytes" and then look at IL to make a heuristic for everything <= otherBound", such as if it is a single call or other special cases that are "likely profitable" but slightly larger.
4d1b2d4 to
a8ebe80
Compare
|
@MihaZupan @tannergooding so I decided to take another approach: if we already happen to inline a method with |
|
@EgorBot -arm -amd using System;
using System.Linq;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
BenchmarkSwitcher.FromAssembly(typeof(Bench).Assembly).Run(args);
public class Bench
{
private int[] _ints = default!;
private long[] _longs = default!;
private byte[] _bytes = default!;
[Params(2, 4, 16, 32, 200, 1000)]
public int Length { get; set; }
[GlobalSetup]
public void Setup()
{
var rng = new Random(42);
_ints = new int[Length];
_longs = new long[Length];
_bytes = new byte[Length];
for (int i = 0; i < Length; i++)
{
_ints[i] = rng.Next();
_longs[i] = ((long)rng.Next() << 32) | (uint)rng.Next();
_bytes[i] = (byte)rng.Next(256);
}
}
[Benchmark] public int IntMax() => _ints.Max();
[Benchmark] public int IntMin() => _ints.Min();
[Benchmark] public long LongMax() => _longs.Max();
[Benchmark] public byte ByteMax() => _bytes.Max();
} |
|
PTAL @AndyAyersMS @dotnet/jit-contrib inliner changes please. I need this in other places as well, my work to replace unsafe code with safe too often runs into "out of time budget". Functions with SIMD are typically very rare outside of BCL so I don't expect any impact on users codebases. |
|
benchmarks seem happy without all the MethodImpl hacks EgorBot/Benchmarks#187 |
| case NI_IsSupported_False: | ||
| { | ||
| assert(sig->numArgs == 0); | ||
| impInlineRoot()->m_inlineStrategy->NoteHardwareIntrinsicCheckObserved(); |
There was a problem hiding this comment.
Do we want to make this observation for the false case, since presumably everything under that path is "dead code"?
| CORINFO_CLASS_HANDLE typeArgHnd; | ||
| CorInfoType simdBaseJitType; | ||
|
|
||
| impInlineRoot()->m_inlineStrategy->NoteHardwareIntrinsicCheckObserved(); |
There was a problem hiding this comment.
Similar question here, do we want to make this observation in the case the only path is for an unsupported type? Will this negatively impact for example when T is System.__Canon and so all the intrinsics paths are dead/throwing?
| MAX_OVER_BUDGET_INTRINSIC_INLINES = 50 | ||
| MAX_OVER_BUDGET_INTRINSIC_INLINES = 50, | ||
|
|
||
| // When the root method or an already-imported inlinee references a |
There was a problem hiding this comment.
We still don't have a way to adjust how much budget was estimated to be used vs actually got used in the case most imported code is dead, right?
There was a problem hiding this comment.
We do make this adjustement:
runtime/src/coreclr/jit/flowgraph.cpp
Lines 570 to 583 in 4ffb643
This ultimately feeds into the budget update.
Note
This PR is AI-generated.
This PR does several things:
MinMaxInteger(3) is needed because we run out of inliner budget in the safe version: https://gist.github.com/EgorBo/278e4bc6795cc829b9e129d7c5932f14
Benchmark
Summary — speedup of PR vs
main(>1 = PR faster, <1 = PR slower). Full results: EgorBot/Benchmarks#187.x64 (AMD EPYC, AVX-512) — significant wins from inliner-budget fixes + more efficient Vector256/512 → scalar reduction:
arm64 (Apple M2, Vector128) — mostly flat; minor regressions at small/medium lengths