You hit build. Coffee mug in hand, you take a sip by the window; the output pane should show success. But lately, that sip is cold before the spinner stops. Something is slow. And more often than not, the bottleneck isn't your source code: it's the compiler itself. Roslyn, the .NET compiler platform, is a pipeline: parse, bind, analyze, emit. Each stage chews cycles. This article isn't about theory. It's about finding the stage that eats your build time, and what to do about it.
We'll cover profiling approaches, comparison criteria, trade-offs, and a mini-FAQ. No fake experts. No made-up stats. Just practical steps for anyone who manages a .NET build.
Who Needs to Profile Roslyn — and Why Now
A community mentor says: however confident you feel, rehearse the failure case once before you ship the change.
Signs your build is compiler-bound, not I/O-bound
You watch the build log crawl. Packages restore in seconds, NuGet cache hits like clockwork, Git fetch finishes before your coffee cools, yet the whole pipeline still takes twelve minutes. That's the first clue. Most teams blame disk speed or network latency, but I have seen builds where Roslyn alone eats 73% of wall-clock time. The giveaway: CPU pegs at 100% on one core while memory stays flat. Disk queues near zero. Network idle. If your build graph shows long bars on the dotnet build phase with no I/O wait, you're looking at a compiler bottleneck, not a storage one.
The tricky bit is that MSBuild hides Roslyn inside a task. You see Csc or Vbc in the log, but the profile looks like a black box. A colleague once told me: "I thought our build was slow because of 200 project references. Turned out it was one file that caused the compiler to re-check every overload resolution three times." That single file added forty seconds. The wrong kind of investigation costs teams weeks of optimizing the wrong layer.
You cannot fix what you cannot see inside the compiler — and Roslyn's internal queues are invisible to MSBuild's timeline.
— senior devops engineer, .NET monorepo staff
Why waiting until the build breaks is too late
Builds rarely fail from slow compilation. They just bleed time. A two-second regression per project across 400 projects? That's thirteen minutes nobody notices until the Friday deploy misses the cutoff. Then the panic starts: "What changed?" But the diff is three hundred commits deep. The catch is that compile-time regressions creep in silently. Roslyn doesn't throw warnings when binding suddenly takes longer. No yellow flags. No alerts. The problem surfaces only when your CI timeout hits forty-five minutes and the ticket comes in.
I have seen three teams this year who only profiled after a .NET SDK upgrade increased their build time by 40%. That's reactive. The cost: one sprint lost to bisecting commits, another sprint to revert the SDK bump, and a permanent distrust of minor version updates. Meanwhile, the actual regression was in a third-party analyzer that added a SyntaxWalker on every file; a ten-minute profiling session would have caught it on day one. Most teams skip this because "it's only a few seconds", until those seconds compound.
Honestly, you don't have a build that's broken. You have a build that silently steals your team's time. That's worse. It's invisible.
The cost of ignoring compile-time regressions
Let's be concrete. A single analyzer that adds 800 ms to every compilation in a 50-project solution wastes forty seconds per developer per full build. Five developers, ten builds a day: that's about thirty-three minutes of cumulative waiting daily. Per week, nearly three hours. Per quarter, the team loses roughly a full work week, and nobody filed a bug because each individual wait felt like "just the way it is." That hurts. Not just velocity, but flow. Context-switching while waiting for a build to finish destroys focus more than the raw time suggests.
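The arithmetic above is easy to rerun with your own numbers. The sketch below is only a back-of-the-envelope calculator: it assumes projects compile one after another (no parallelism), a five-day week, and a thirteen-week quarter, and every variable name is illustrative.

```csharp
using System;

// Inputs taken from the paragraph above; replace them with your own measurements.
const double analyzerOverheadSeconds = 0.8; // extra time per compilation
const int projects = 50;
const int developers = 5;
const int buildsPerDeveloperPerDay = 10;

// Assumptions that are NOT in the article: a 5-day week and a 13-week quarter.
const int workDaysPerWeek = 5;
const int weeksPerQuarter = 13;

// Assumes projects compile serially, so the per-project overhead simply adds up.
double secondsPerBuild = analyzerOverheadSeconds * projects;
double minutesPerDay = secondsPerBuild * developers * buildsPerDeveloperPerDay / 60.0;
double hoursPerWeek = minutesPerDay * workDaysPerWeek / 60.0;
double hoursPerQuarter = hoursPerWeek * weeksPerQuarter;

Console.WriteLine($"Per build:   {secondsPerBuild:F0} s");
Console.WriteLine($"Per day:     {minutesPerDay:F1} min (whole team)");
Console.WriteLine($"Per week:    {hoursPerWeek:F1} h (whole team)");
Console.WriteLine($"Per quarter: {hoursPerQuarter:F1} h (whole team)");
```

With parallel project builds the real wait is smaller than the serial sum, so treat the result as an upper bound.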
What usually breaks first is not the average build time but the tail latency. One project that suddenly takes twice as long because of a new InternalsVisibleTo or a massive generic expansion. The team blames the CI agent and adds more vCPUs, which a single hot compilation often cannot put to use, and the problem persists. The real fix? Profile the compiler, find the hotspot, and ship a targeted analyzer suppression or a source generator rewrite. The payoff is immediate.
So who needs to profile Roslyn right now? Anyone whose build runs more than five minutes and whose team says "I don't know why it's slow." That's most .NET teams shipping production software. Not there yet? You will be. Better to catch the regression at thirty seconds than at three minutes, because by then the damage is already done.
Three Ways to Profile Roslyn: From Quick to Deep
PerfView: the heavy lifter for ETW traces
Start here if your build is already painful and you need the whole story: every JIT event, every garbage collection pause, every loader lock contention. PerfView consumes Event Tracing for Windows (ETW) and spills out a firehose of data. I have watched teams stare at its tree view for an hour before realizing the compile itself is fine; the delay was a NuGet restore stealing CPU. The setup is brutal: you download the tool, run it as admin, and the UI looks like it was designed for a 1998 server monitor. But the granularity is unmatched: you can drill into individual method JIT times, see which analyzer is blocking the pipeline, and separate disk I/O from actual compilation. The trade-off hits you after the trace stops: the .etl file can be 2 GB for a medium solution, and interpreting it demands a working knowledge of CLR internals. Most engineers bail at the 20-minute mark. That said, for a full-system view, including what the OS scheduler is doing to your CPU, nothing else cuts this deep.
"PerfView gave me a stack trace of the lock contention that was killing our CI. I saw it within 10 minutes."
— Senior build engineer, after a 4-hour unblocking session
dotnet-trace: lightweight, cross-platform, but limited
dotnet-trace feels like the sensible middle child. You install the global tool, run dotnet-trace collect --providers Microsoft-Windows-DotNETRuntime against your MSBuild process, and within seconds you have a trace. No admin rights required, which is a huge win in locked-down CI environments. The catch is scope: you see only .NET events. File I/O on the SDK? Invisible. Anti-virus scanning your temp directories? Invisible. What you do get is a clean timeline of GC pressure, JIT time, and module loading, which is exactly what you need when you suspect the compiler, not the build host, is the bottleneck. I have used this to catch a custom analyzer that was reprocessing the same syntax tree three times for every file. The limitation stings when the problem lives at the OS boundary: you will see a wall of "wait" time but no clue what holds the lock. For cross-platform teams, there is no real alternative; PerfView is Windows-only. Just expect to combine it with ps or top for the full picture.
Wrong tool for the job? A teammate once spent two days blaming Roslyn when the real culprit was a 200 MB .editorconfig file being parsed repeatedly. dotnet-trace showed zero CPU in the compiler; the time was all I/O. You need that negative result sometimes.
Stopwatch injection: precise but invasive
Sometimes you know exactly which phase stinks. You have a hunch: the nullable analysis, the nullable flow state, or (your least favorite) the binding of attribute arguments. So you crack open the source, or you wrap the analyzer entry point, and you slap a Stopwatch around the suspect call. The precision is surgical: microsecond granularity, zero overhead outside the instrumented path. The pain is maintenance. You fork the compiler, apply patches, rebuild, replace the SDK, and pray you remember to revert before the next sprint. I have seen a team ship a hacked Microsoft.CodeAnalysis.CSharp.dll to staging because they forgot to clean up. The upside: you get numbers you can trust, not aggregated averages. The downside: you are now maintaining a fork of the C# compiler. That is a risk most organizations cannot afford unless the perf regression is costing them real money; think hundreds of developer-hours per week. For a one-shot investigation, it can yield the answer in an afternoon.
Most teams skip this method. That is usually correct. But when the other tools show nothing, when PerfView says "idle" and dotnet-trace says "waiting", the stopwatch reveals the truth: a single method consuming 40% of wall time because of a repeated string allocation hidden inside a hot loop. That is a fix worth the fork.
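As a rough illustration of stopwatch injection, here is a minimal sketch of a timing wrapper you might place around a suspect call in your own analyzer or generator. The Timed helper and SuspectWork are hypothetical names invented for this example, and SuspectWork merely sleeps to stand in for real compiler work.

```csharp
using System;
using System.Collections.Concurrent;
using System.Diagnostics;
using System.Threading;

// Accumulated Stopwatch ticks per label. A concurrent dictionary is used
// because analyzers may invoke the instrumented code from several threads.
var totals = new ConcurrentDictionary<string, long>();

// Hypothetical helper: runs the body, adds its elapsed ticks to the label's total.
T Timed<T>(string label, Func<T> body)
{
    long start = Stopwatch.GetTimestamp();
    try
    {
        return body();
    }
    finally
    {
        long elapsed = Stopwatch.GetTimestamp() - start;
        totals.AddOrUpdate(label, elapsed, (_, sum) => sum + elapsed);
    }
}

// Placeholder for the call you suspect; it is NOT a Roslyn API.
int SuspectWork()
{
    Thread.Sleep(20);
    return 42;
}

int result = Timed("suspect", () => SuspectWork());

foreach (var entry in totals)
{
    double ms = entry.Value * 1000.0 / Stopwatch.Frequency;
    Console.WriteLine($"{entry.Key}: {ms:F1} ms");
}
```

Accumulating totals per label, rather than logging every call, keeps the instrumentation cheap enough that it does not dominate the numbers it reports.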
What to Compare: Criteria That Actually Matter
Overhead During Profiling vs. Normal Build
The first axis to weigh is cost. Every profiler steals CPU cycles, memory, or I/O bandwidth from the compiler it's observing. Some tools, like sampling profilers, add maybe 2–5% overhead. Others, especially instrumentation-based ones, can triple build time. I've watched a team confidently enable ETW tracing on every developer machine, only to see their incremental builds balloon from twelve seconds to forty. That's not profiling anymore; that's punishment. You want to know where Roslyn spends its cycles, not force it to spend new ones just for your curiosity. The trade-off: lighter tools might miss short-lived allocations or rare code paths. Heavier tools expose everything but break your flow. Most teams skip this calculation until their CI pipeline starts timing out.
Granularity: Function-Level vs. Phase-Level
Not all granularity is useful granularity. Function-level profiling hands you a flame graph of every method in the compiler: SyntaxTree.GetRoot, BoundNode.IsEquivalent, Emit.PEModuleBuilder.EmitMetadata. That's drowning in detail. Phase-level profiling, by contrast, tells you "binding took 34% of compile time" or "emit was 22%." Cleaner, but is it actionable? The catch is that Roslyn's phases interact unpredictably. I've seen a slow parse phase mask an even slower type-check phase because the profiler's granularity didn't untangle them. You need enough resolution to spot the bottleneck's name, but not so much that you're sifting through thousands of call stacks. Start phase-level; drill function-level only when you know the phase that hurts.
Ease of Setup in CI Pipelines
Honestly, choose a tool that passes the "will this work on a Wednesday-midnight CI auto-retry" test. That weeds out half the options immediately.
Trade-Offs at the Compiler Level: Precision vs. Pain
ETW Traces Give You Everything, and That's Exactly the Problem
Event Tracing for Windows dumps raw compiler events at kernel speed. You get method entries, memory allocations, JIT decisions, GC pauses: the entire firehose. I once collected a 45-second trace from a medium-sized solution and ended up with 2.3 million events. Two-point-three million. The data is so complete that finding the actual bottleneck feels like searching for a specific grain of sand on a beach while the tide keeps rolling in. Most teams stop here: they open the .etl file, see the wall of timelines, and close it again. That's not profiling; that's hoarding.
The trade-off is brutal. ETW gives you everything the OS can see about Roslyn's behavior, but the signal-to-noise ratio is terrible when you don't know what to filter. You'll spend more time writing TraceEvent scripts or WPA presets than actually looking at the compiler's hot path. And if you're profiling on a CI agent without a dedicated performance analyst? The trace sits in blob storage forever. The catch is that raw fidelity without a narrowing strategy is just expensive confusion.
Stopwatch Injection Pinpoints Code — But You Recompile to Get It
Dropping Stopwatch.StartNew() around suspected regions is surgical. You target CSharpCompilation.Create(), you instrument the bind phase, you time the emit stage, and suddenly you know that GetSemanticModel accounts for 47% of wall time. That's actionable. But here's the pain point: every stopwatch requires a recompilation of the compiler itself (or at least the analyzer project). You rebuild, you redeploy to your local NuGet cache, you restart the IDE, you repro the slow build. I have seen teams do this ten times in one afternoon trying to isolate a single regression. Each iteration burns 3 to 8 minutes depending on hardware. That hurts.
The precision is undeniable: you own exactly what you measure. But the development loop becomes punishing. And stopwatches only measure what you remember to wrap. Miss one nested call? The numbers add up to less than total time, and you're left wondering where the missing 20% went. It's a microscope with a very narrow field of view. One team I worked with inserted 80 stopwatches across the compiler pipeline and still couldn't account for 15% of build time; it turned out there was a contention spike in an internal ConcurrentDictionary they hadn't instrumented.
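One cheap guard against that blind spot is to compare the sum of your instrumented regions with the total wall time on every run. The sketch below uses invented numbers and region names purely for illustration; it only works if the regions you sum do not overlap or nest.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical measurements in milliseconds; none of these come from a real trace.
double totalWallMs = 60_000;
var instrumentedMs = new Dictionary<string, double>
{
    ["parse"] = 9_000,
    ["bind"] = 27_000,
    ["emit"] = 15_000,
};

// Only valid if the regions are disjoint: nested stopwatches would be double-counted.
double accountedMs = instrumentedMs.Values.Sum();
double unaccountedMs = totalWallMs - accountedMs;
double unaccountedPercent = unaccountedMs / totalWallMs * 100.0;

Console.WriteLine($"Accounted:   {accountedMs:F0} ms");
Console.WriteLine($"Unaccounted: {unaccountedMs:F0} ms ({unaccountedPercent:F0}% of wall time)");
```

A large unaccounted share is a signal to add coverage (or switch tools) before trusting any single stopwatch reading.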
Sampling Profilers Miss Short Events, and Roslyn Has Plenty
Sampling profilers (dotnet-trace, perf on Linux) take snapshots of the call stack every millisecond or so. For workloads with sustained hot loops, they're fine. But Roslyn's pipeline is full of micro-bursts: symbol table lookups that finish in 150 microseconds, syntax trivia processing that completes in 400. Any single call is far shorter than a 1 ms sampling interval, so the profiler never observes an individual call from start to finish; it only accumulates statistical hits, spread across whatever callers happened to be on the stack. In a short trace, the profiler can report close to 0% of time in symbol resolution, which is misleading. It just rarely caught the thread in the act.
What usually breaks first is the diagnosis of short but frequent operations. Imagine 50,000 calls to LookupNamespace, each taking 200 µs: that's 10 seconds total, but a sampling profiler gives you no call counts and scatters those samples across many call sites, so no single stack looks hot. The result: you tune a method that was fine while the real issue hides in plain sight. Sampling trades depth for breadth, and the blind spot is exactly where the compiler's most intricate work happens. If you rely solely on statistical stacks, you will ship a slow build and not know why.
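The numbers in that example are worth working through once. This small sketch only restates the arithmetic from the paragraph above (the call count, per-call duration, and sampling interval are the article's illustrative figures, not measurements) and shows how short each call is relative to one sampling tick.

```csharp
using System;

// Illustrative figures from the text above.
const int calls = 50_000;
const double perCallMicroseconds = 200;
const double samplingIntervalMilliseconds = 1.0;

// Total time spent across all calls, in seconds.
double totalSeconds = calls * perCallMicroseconds / 1_000_000.0;

// How much of a single sampling interval one call occupies.
double fractionOfInterval = (perCallMicroseconds / 1000.0) / samplingIntervalMilliseconds;

Console.WriteLine($"Total time in the method: {totalSeconds:F1} s");
Console.WriteLine($"One call spans {fractionOfInterval:P0} of a sampling interval");
```

Each call is over long before the next sample is due, which is why call counts and per-call durations have to come from instrumentation or event tracing rather than from sampled stacks.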
'We profiled with sampling for two weeks and found nothing. Switched to ETW with a specific keyword filter and found a memory leak in the first fifteen minutes.'
— A friend who maintains a custom Roslyn analyzer at a trading firm, after he stopped trusting defaults
The asymmetry is the point. No single method covers the full pipeline without hiding something critical. ETW gives you the ocean but drowns you. Stopwatches give you surgical accuracy at the expense of a slow iteration loop. Sampling gives you a comfortable overview that misses the compiler's true shape. Most teams pick one tool and declare victory. That's a mistake. The real answer is layered profiling: start with sampling to find the coarse region, switch to ETW to see system-level contention, then drop stopwatches to confirm the exact line. Anything less and you're guessing, and guessing about compiler internals costs you a day every week, forever.
After You Pick a Tool: Step-by-Step Profiling
An experienced operator says the trade-off is speed now versus rework later — most shops lose on rework.
Step 1: Establish a baseline with a no-op change
Most teams skip this, then stare at a flame graph with no frame of reference. You need a 'zero' run. Create a branch, add a single empty line to your entry point, and build. That's it. Record the wall-clock time and the Roslyn phase percentages from your chosen profiler. I have seen teams panic over a two-second regression that was actually just ambient machine noise: disk thrash, antivirus scans, a colleague's Docker container screaming. The baseline catches that. Run it three times. Median, not mean. One outlier can lie.
The catch is that 'clean build' and 'incremental build' baselines differ wildly. A no-op change on an incremental build won't touch most phases, so you'll see only a narrow slice of the pipeline. For a meaningful baseline, force a full rebuild with /p:BuildProjectReferences=false on a single project first. Small scope. Honest data. Then scale up.
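To see why the median is the safer summary for a three-run baseline, consider this minimal sketch. The three timings are invented for illustration (two quiet runs and one that hit background noise), and Median is a small local helper, not a library call.

```csharp
using System;
using System.Linq;

// Local helper: median of a non-empty set of measurements.
double Median(double[] values)
{
    if (values.Length == 0) throw new ArgumentException("need at least one run");
    double[] sorted = values.OrderBy(v => v).ToArray();
    int mid = sorted.Length / 2;
    return sorted.Length % 2 == 1
        ? sorted[mid]
        : (sorted[mid - 1] + sorted[mid]) / 2.0;
}

// Hypothetical wall-clock build times in seconds; the third run hit an antivirus scan.
double[] runs = { 41.8, 42.3, 57.9 };

double median = Median(runs);
double mean = runs.Average();

Console.WriteLine($"Median: {median:F1} s");
Console.WriteLine($"Mean:   {mean:F1} s (pulled up by the outlier)");
```

Record the median as the baseline and keep the raw runs alongside it, so a later regression can be compared against the spread as well as the middle value.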
Step 2: Capture a trace during a clean build
You have your tool: PerfView, dotnet-trace, BenchmarkDotNet, whatever survived the trade-off gauntlet from section four. Now: trace the right thing. A common pitfall: profiling while Visual Studio is running background IntelliSense, NuGet restore, or that browser with 60 tabs. Close it. Seriously: Roslyn shares the machine with your IDE, and capturing a trace that includes UI thread jitter will point fingers at the wrong phase.
Launch dotnet-trace collect --profile gc-verbose (or your equivalent) right before you execute dotnet build. Capture only the build process ID. Stop recording the moment the build exits. Why? Because the 'cool-down' phase, where Roslyn unloads assemblies, generates noise that looks like work. It's not. You'll waste a day investigating a phantom bottleneck in metadata cleanup that nobody hits in normal usage.
Pro tip: add DOTNET_SYSTEM_NET_HTTP_USESOCKETSHTTPHANDLER=0 on Windows if you see suspicious network waits. Not always relevant, but when it is, it'll save you an hour of head-scratching.
Step 3: Identify the top phase by inclusive time
Open the trace. Sort by inclusive time, the sum of a method's own work plus everything it calls. The phase eating 40% of the wall clock is rarely where you think. I once watched a team fixate on 'bind' while their trace showed 62% of time in syntax tree cloning, a silent byproduct of a poorly written source generator that regenerated the entire file for every incremental change. Inclusive time doesn't lie about parents. Exclusive time tells you the leaf cost. You need both.
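The difference between inclusive and exclusive time is easiest to see on a tiny call tree. The sketch below uses a made-up tree with invented node names and timings (they are not real Roslyn frames) where the leaf cost hides under an innocent-looking parent.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical call tree: node -> (exclusive/self time in ms, child nodes).
var tree = new Dictionary<string, (double SelfMs, string[] Children)>
{
    ["Compile"] = (5.0, new[] { "Parse", "Bind", "Emit" }),
    ["Parse"] = (120.0, Array.Empty<string>()),
    ["Bind"] = (40.0, new[] { "RunGenerators" }),
    ["RunGenerators"] = (10.0, new[] { "CloneSyntaxTree" }),
    ["CloneSyntaxTree"] = (620.0, Array.Empty<string>()),
    ["Emit"] = (205.0, Array.Empty<string>()),
};

// Inclusive time = the node's own time plus the inclusive time of everything it calls.
double Inclusive(string node) =>
    tree[node].SelfMs + tree[node].Children.Sum(child => Inclusive(child));

double total = Inclusive("Compile");
foreach (string node in tree.Keys)
{
    Console.WriteLine(
        $"{node,-16} inclusive {Inclusive(node) / total:P0}  exclusive {tree[node].SelfMs / total:P0}");
}
```

Here "Bind" looks expensive by inclusive time, but almost all of that cost belongs to the cloning leaf beneath it; sorting by one column alone would send you to the wrong method.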
Honestly, if parsing dominates (frames such as Microsoft.CodeAnalysis.CSharp.CSharpSyntaxTree.ParseText or similar), your files are too large or your analyzer is requesting re-parses on every text change. If the emit phase glows red, your assembly is bloated with IL from generated constants. Each phase has a personality. Learn it.
That sounds fine until you discover a 'phase' labeled Microsoft.CodeAnalysis.Workspace.Desktop eating 15% of a CI build. Workspace? You're not even editing files. That's a tooling artifact from MSBuild loading an analyzer project reference. Remove it from the build graph. Instant win.
'We chased a 'binding bottleneck' for two sprints. Turned out our custom IOperation analyzer was visiting every node four times because we forgot to cache the semantic model.'
— Senior engineer during a .NET 8 migration post-mortem
Step 4: Drill into analyzers or source generators
Here's where the profiler earns its keep. Once you know the phase (say, 'Semantic Model' or 'CompilationUnit'), filter by your analyzer or generator assembly name. Most profilers let you search callers. Do that. If your analyzer shows 70% of its time in RegisterSyntaxNodeAction callbacks, you are probably registering for far more syntax kinds than you need and filtering inside the callback. Classic oversight. Fix: register for SyntaxKind.InvocationExpression only, not every kind of node. The difference can be a 5x speedup in the analyzer itself. Not theoretical: I have seen builds drop from 12 minutes to 8 on that single change.
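A narrowly registered analyzer might look roughly like the sketch below. It assumes the Microsoft.CodeAnalysis.CSharp package is referenced; the analyzer name, the diagnostic ID DEMO0001, and the "ToList" check are all placeholders invented for illustration, not a recommended rule.

```csharp
using System.Collections.Immutable;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;
using Microsoft.CodeAnalysis.Diagnostics;

[DiagnosticAnalyzer(LanguageNames.CSharp)]
public sealed class NarrowInvocationAnalyzer : DiagnosticAnalyzer
{
    // Hypothetical rule: the ID, title, and category are placeholders.
    private static readonly DiagnosticDescriptor Rule = new DiagnosticDescriptor(
        id: "DEMO0001",
        title: "Example invocation rule",
        messageFormat: "Invocation of '{0}' flagged by the example rule",
        category: "Performance",
        defaultSeverity: DiagnosticSeverity.Info,
        isEnabledByDefault: true);

    public override ImmutableArray<DiagnosticDescriptor> SupportedDiagnostics
        => ImmutableArray.Create(Rule);

    public override void Initialize(AnalysisContext context)
    {
        // Skip generated code and allow the host to run callbacks in parallel.
        context.ConfigureGeneratedCodeAnalysis(GeneratedCodeAnalysisFlags.None);
        context.EnableConcurrentExecution();

        // Only invocation nodes reach the callback; every other kind is never dispatched.
        context.RegisterSyntaxNodeAction(AnalyzeInvocation, SyntaxKind.InvocationExpression);
    }

    private static void AnalyzeInvocation(SyntaxNodeAnalysisContext context)
    {
        var invocation = (InvocationExpressionSyntax)context.Node;

        // Cheap syntactic filter first, before touching the semantic model.
        if (invocation.Expression is MemberAccessExpressionSyntax member
            && member.Name.Identifier.ValueText == "ToList")
        {
            context.ReportDiagnostic(
                Diagnostic.Create(Rule, invocation.GetLocation(), "ToList"));
        }
    }
}
```

The design choice is to let the host do the filtering by kind and to keep the callback itself syntactic where possible, so semantic model queries happen only for the handful of nodes that survive the cheap check.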
Source generators require another lens. Look for patterns: regenerating the entire output file when only one attribute changed; repeated calls to AdditionalTexts for every file; or (the painful one) holding references to Compilation objects across generator runs. That kills incrementalism. The trace will show a wall of GeneratorDriver.RunGenerators without any 'Cached' markers. The fix? Implement IIncrementalGenerator properly, not the old ISourceGenerator wrapper. Yes, it's more code. Your build will thank you.
What usually breaks first is the cache invalidation step. If your generator doesn't implement IEquatable on its input models, Roslyn cannot tell that an input is unchanged and treats every build as fresh. The trace then shows the generator re-running everything on every build, with nothing served from cache. That's not a profiler problem; that's a code problem the profile exposes. Don't shoot the messenger.
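A minimal sketch of a cache-friendly incremental generator is shown below. It assumes a recent Microsoft.CodeAnalysis.CSharp package (one that provides ForAttributeWithMetadataName) and a project that can compile records, which on netstandard2.0 typically needs an IsExternalInit polyfill. The attribute name Demo.GenerateAttribute, the ClassModel record, and the generated text are all hypothetical.

```csharp
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// A record compares by value, so an unchanged class yields an equal model
// and the pipeline can reuse the previous output instead of regenerating it.
public sealed record ClassModel(string Namespace, string Name);

[Generator]
public sealed class DemoGenerator : IIncrementalGenerator
{
    public void Initialize(IncrementalGeneratorInitializationContext context)
    {
        // Extract only small, value-equal data; never carry symbols,
        // syntax nodes, or the Compilation itself into the model.
        IncrementalValuesProvider<ClassModel> models = context.SyntaxProvider
            .ForAttributeWithMetadataName(
                "Demo.GenerateAttribute", // hypothetical marker attribute
                predicate: static (node, _) => node is ClassDeclarationSyntax,
                transform: static (ctx, _) => new ClassModel(
                    ctx.TargetSymbol.ContainingNamespace.ToDisplayString(),
                    ctx.TargetSymbol.Name));

        // One output per model: a change to one class regenerates one file.
        context.RegisterSourceOutput(models, static (spc, model) =>
            spc.AddSource(
                $"{model.Name}.g.cs",
                $"// generated for {model.Namespace}.{model.Name}\n"));
    }
}
```

Note that a record's generated equality only helps if every member is itself value-equal; an array or ImmutableArray member compares by reference and would quietly defeat the cache again.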
Risks of Profiling Wrong or Not at All
Optimizing the wrong phase, and watching your effort evaporate
I have seen teams spend two weeks micro-optimizing syntax tree creation, only to discover their real bottleneck was a single, poorly written analyzer running after the tree was built. Classic waste. The trap is seductively simple: you run a profiler, it shows a hot function, you optimize it, and the build time barely flinches. Why? Because Roslyn's pipeline is a waterfall (syntax, then binding, then emit), and a hotspot in one phase might only account for 12% of total time while phase ordering, I/O stalls, or analyzer overhead eat the rest. Profiling without a baseline, meaning a full trace with timestamps for each phase, is like fixing a car engine before checking if the fuel tank is empty. You need to know which phase dominates before touching a single line. The catch? Most profilers default to method-level sampling, not pipeline-level breakdowns. That means you can shave 40% off method X in the binder phase and reduce overall build time by maybe 3%. That hurts.
Over-aggressive sampling in CI: your numbers lie
You slap a profiler onto your CI build. Sampling interval: 1 millisecond. Great resolution, except now the profiler itself causes enough overhead to shift the bottleneck from your code to the profiler's thread-management routines. I've watched a 15-second build balloon to 45 seconds under aggressive sampling. The resulting flame graph? A lovely portrait of the profiler's own allocator, not your actual compile time. Most teams miss this: they treat profiling as a passive operation, like a thermometer, when it's really a load, and heavier sampling adds more noise than signal. The trade-off is brutal: high precision during development, low overhead in CI. You can't have both. What usually breaks first is the decision to run a profiler on every PR. Don't. Profile once, snapshot the baseline, then deploy targeted, low-overhead instrumentation (for example the compiler's own analyzer timing report, enabled with the ReportAnalyzer MSBuild property, rather than a third-party sampler) to alert on regressions. Otherwise your CI numbers will tell a story about the profiler, not about Roslyn.
“We optimized a method that took 200ms down to 12ms. Build time dropped by zero seconds. We had been fighting the wrong phase.”
— Lead engineer, after a wasted sprint. The real culprit was analyzer load in the semantic model.
Dismissing cold-start effects until the time budget explodes
Cold start, the first build after a reboot or cache flush, can be 2-3x slower than a warm build. If you profile only warm builds, you'll optimize for the happy path and leave your team facing an angry Monday-morning spike every time someone pulls from main. The risk isn't just wasted optimization effort; it's misleading prioritization. You might “fix” a warm-build issue that saves 500ms while ignoring a cold-start JIT issue that costs 8 seconds every single time Roslyn loads. And cold-start effects compound: analyzer initialization, assembly loading, type-resolution caches. All of them hit hardest when nothing is warm. Here's a concrete anecdote: we fixed a 6-second cold-start delay by preloading analyzer assemblies in a background thread during build-system startup, not by touching any Roslyn pipeline code. The fix was invisible in warm profiles. That's the point. Profile on a cold machine. Profile after a git clean -xdf. Profile at 9 AM on a Monday, not just after the third build of the day. Ignore this, and your “optimized” build will still feel slow to everyone who doesn't live inside your profiler session.
Frequently Asked Questions About Roslyn Profiling
Why does the first build always take longer?
It's not your imagination, and it's not random. The first build after opening Visual Studio or running dotnet build from a cold terminal pays the full startup cost: loading the compiler, JITting its IL, resolving all metadata references from disk, and warming up caches that subsequent builds reuse. Roslyn itself is a .NET application; its first invocation triggers all the same overhead your project does. The second build feels faster because the OS has already paged in the compiler binaries, the metadata cache holds assembly identities, and the analyzer driver doesn't re-parse unchanged files. You'll see this clearly if you profile the first build in isolation: the CreateProjectInstance phase and initial binding pass consume 40–60% more wall time than any subsequent run. That's not a bug; it's amortized cost. But if every build hurts like the first one, something's wrong: your profiler will show repeated metadata loading or analyzer re-initialization, which points to a cache-eviction issue or an MSBuild node restart.
Should I profile with or without the IDE attached?
Short answer: without, and then with, if you must. The IDE injects diagnostic hooks, Roslyn analyzer host processes, and background compilation threads that don't exist in a command-line build. I have seen teams waste two days chasing an emit-phase spike that turned out to be the IDE's IntelliSense re-compiling an unrelated project. Profile from dotnet build first. That gives you the clean signal: pure compiler output, no editor noise. The catch is that real-world pain often only appears in the IDE: source generators that behave under the CLI but deadlock inside Visual Studio, or analyzer performance that degrades when the editor fires SolutionCrawler events. So run two profiles: one CLI, one with devenv.exe attached and the solution loaded. Compare the flame graphs side-by-side. If the hot paths diverge, you've found an interaction bug, not a compiler bottleneck.
How do I profile source generators specifically?
Source generators are the hardest to isolate because they run inside the compiler's pipeline; you can't attach a profiler to ISourceGenerator.Execute directly without instrumenting Roslyn itself. What usually works: add a timer wrapper inside your generator's Execute method that writes to System.Console.Error or a log file. Crude, yes, but it works on the first try. The pitfall: that instrumentation changes execution timing (the observer effect), and if your generator relies on callbacks driven by the host (a syntax receiver, for example), the wall-clock time you measure includes the host's invocation pattern, not just your code. For deeper insight, use the Roslyn SDK's built-in GeneratorTimingInfo (available in .NET 8+), which you can read from the generator driver after a run. It gives you structured data you can dump after the build. I've used it to find generators that re-parse syntax trees on every incremental step; the fix was a one-line cache that cut emit time by 30%. Don't assume the profiler will show generator work as a separate thread; it's often folded into the frames under Compilation.GetDeclarationDiagnostics.
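One way to read those timings outside a full build is to drive the generator yourself in a small harness. The sketch below assumes the Microsoft.CodeAnalysis.CSharp package in a version that exposes GeneratorDriver.GetTimingInfo(); HelloGenerator is a trivial stand-in written for this example, and the one-class compilation is a placeholder for your real inputs.

```csharp
using System;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

// Placeholder compilation; in practice you would load your real syntax trees and references.
var compilation = CSharpCompilation.Create(
    "TimingDemo",
    new[] { CSharpSyntaxTree.ParseText("class C { }") });

// Run the generator through the driver, exactly once.
GeneratorDriver driver = CSharpGeneratorDriver.Create(new HelloGenerator());
driver = driver.RunGenerators(compilation);

// Timing data collected by the driver for the run above.
GeneratorDriverTimingInfo timing = driver.GetTimingInfo();
Console.WriteLine($"All generators: {timing.ElapsedTime.TotalMilliseconds:F2} ms");
foreach (GeneratorTimingInfo entry in timing.GeneratorTimes)
{
    string name = entry.Generator.GetGeneratorType().Name;
    Console.WriteLine($"{name}: {entry.ElapsedTime.TotalMilliseconds:F2} ms");
}

// Trivial stand-in generator; replace it with the generator you want to measure.
public sealed class HelloGenerator : IIncrementalGenerator
{
    public void Initialize(IncrementalGeneratorInitializationContext context)
    {
        context.RegisterPostInitializationOutput(
            static ctx => ctx.AddSource("Hello.g.cs", "// hello from the demo generator\n"));
    }
}
```

A harness like this measures the generator in isolation, so the numbers exclude IDE and MSBuild effects; treat them as a lower bound on what a real build pays.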
'Profile the generator in isolation first. Attach the full solution only after the hot loop is clean.'
— rule I stole from a .NET tooling engineer after one too many false leads
What if I see spikes in the emit phase?
Emit, the phase where Roslyn writes IL, PDB, and metadata to disk, is usually flat and fast. Spikes there mean something is thrashing. Check three things in order: disk I/O contention (is your output folder on a network share or a slow spinning disk?), large embedded resources (binaries in <EmbeddedResource> that get re-serialized every build), or, the sneakiest, the serializer hitting a deep generic type graph that blows up the number of method bodies. That sounds niche, but I fixed a build that took 17 seconds in emit alone; the culprit was a generated type with 400+ nested generics that made the metadata writer allocate and sort tens of thousands of tokens. The fix: flatten the type hierarchy. If the spike appears only in Debug configuration, check whether <DebugType> is set to full; that forces full PDB generation that can balloon on large assemblies. Swap to portable and measure again.
One more thing: emit spikes often mask the real problem. A profiler that shows 80% of time in emit might actually be waiting on a lock held by an analyzer that finished early; the time just gets attributed to the phase that runs next. That's why I always cross-check with a .NET trace showing thread states: green for running, orange for waiting. If emit is orange, it's not emitting. It's blocked. Different fix entirely.