Knowledge Base/Profiler/Resources

User Guide

Thomas Høst Andersen
posted this on August 02, 2011 11:43

Intro

EQATEC Profiler can performance-profile almost anything .NET:

  • Full .NET 2.0 and above
  • Windows Phone 7, both Silverlight and XNA apps
  • Silverlight 2.0 and above
  • NET CF 2.0 and above (WinCE, Windows Mobile)
  • Xbox XNA, ASP.NET, .NET services

The only thing it doesn't profile is .NET 1.x. No other profilers in the World works on this many .NET platforms. It's also quite easy to use. You usually just point and click a few times:

First you select a .NET application, then you build and run a profiled version of it, and afterwards you can view the performance snapshot and compare snapshots to see your optimization's effect.

That's it, really. Most often you should just follow the yellow buttons.

Example walk-through

The profiler comes with a demo application (Help > Load example application) which you can try out immediately.

For a more elaborate example we have profiled the command-line version, EQATECProfilerCmd.exe - so what you see here is actually a real example of how we use the profiler to optimize itself.

Step 1: Build

The Build-tab is where you generate the profiled version of an application. The profiled version will have some additional timing-instructions injected into it. Technically, this is called instrumentation. Here, we have browsed to the folder for EQATECProfilerCmd.

Building a profiled application with EQATEC Profiler
  • Browse to, or simply drag-drop a folder on this tab, to select the application's assemblies.
  • Normally you will always want to include (profile) all the application's assembly-files.
  • If you wish to profile a signed assembly you must re-sign it afterwards. Alternatives are skip (don't profile this assembly) or strip (don't resign, which will likely cause trouble).
  • Press shift-delete to remove an app from the "most recently used" drop-down list.
  • You can control snapshot and method-profiling programatically, too. See "Profiler API" further down.
  • The command-line version, EQATECProfilerCmd, can e.g. be used in your autobuild.
  • If more that one executable is found you must manually move to the Run-tab and select which one to run. If only one exe is profiled you can use the yellow Run-button to do that.

You can change the default build-options in Tools > Default Profiling Options, or change options just for this one app. All settings for profiling this particular app is stored in the file app.eqconfig in the application's directory. It's a plain xml-file.

You can load/save options from other config-files, too, so e.g. various project-members can have each their preferred profiling-options. These can also be put under version-control. Note, that they must, however, reside in the app's folder.

A note on Silverlight apps: A Silverlight application's xap-file is always hosted in a webpage. Most often you have therefore created a web-project with some customized hosting webpage (html or aspx) in a ClientBin-folder. You will typically want to point the profiler to that ClientBin-folder, and not to the Silverlight-project's bin\Debug-folder. Also, you can surely profile just a single stand-alone xap-file, e.g. one downloaded from the web, but without an associated hosting webpage the profiler can't run the app. Think of it this way: point the profiler at a folder that has a xap you can already view in a webpage.

Build options

You will normally not need to change the build options.

Options for profiling an application with EQATEC Profiler
  • Output folder: This is where you profiled app ends up. Change this if e.g. you are not allowed to write to the application's folder, or just want the instrumented version somewhere else. Note that on e.g Vista you are probably not allowed to write to C:\Program Files, so you must choose a different output-folder if you wish to profile an installed application.
  • Full/CF apps: This is where snapshot-reports are stored. Change this if you want them to end up somewhere else, e.g. on a network drive or on a memory card.
  • Tiny methods: Is always best to skip; see "Overhead and performance" further down. Change this if you wish to check that the method is really called (use callcount only for that) or if you simply wish these methods to be reported just like any other method (use full info, but don't place too much trust in the timing numbers).
  • Runtime logging: The runtime-module can print out diagnostics, mostly useful for debugging the profiler itself. You may want to not print anything at all (for e.g. a device with no writeable disk) or print detailed info (for e.g. debugging the profiler).
  • App/Profiler connectivity: Communication between the profiler and profiled application (for taking live snapshots etc) is by plain HTTP over sockets. The profiled application will connect to the profiler at the specified port.

For folders you may use ordinary environment variables in the settings, e.g. $(SystemRoot). One special pseudo-variable has been added: $(AppDir) is the path to the unprofiled application itself.

Step 2: Run

The Run-tab is where you run and control your profiled application. It's also where you have an overview of the snapshot-reports that are generated. Just run any regular application (for services etc see below) and it will automatically dump a profiling report upon exit.

The Run-tab is where you run and control your profiled application. It's also where you have an overview of the snapshot-reports that are generated. Just run any regular application (for services etc see below) and it will automatically dump a profiling report upon exit.

Run and control a profiled application with EQATEC Profiler
  • "Take snapshot" will generate a profiling-report and also reset all timing counters to 0, while "Reset counters" will only reset the timing counters.
  • "Shutdown app" will terminate a running .NET CF app (may take up to 10 sec).
  • If the profiler is connected to a .NET CF app that is writing a report, the report will be transferred into your "full .NET report-folder" and deleted on the device afterwards.
  • Command-line args are remembered in the registry on a per-app basis.
  • The listview monitors the report-output-folder. Select one report to view it or two to compare them.

More about running applications

The profiler can only start .NET executables and Silverlight apps by itself. If you have one or more DLLs representing a service or ASP.NET application then you will have to put those DLLs "into play" manually:

App type Run? Comments
Full .NET
Full .NET
yes Just press "Run app" to start the executable.
Silverlight
Silverlight
yes Just press "Run app" to load the application in a web-server or plain html-page.
.NET CF
.NET CF
yes Just press "Run app" to deploy and run the .NET CF app to any of your installed devices or emulators.
WP7
Windows Phone 7
yes Press "Run app" to deploy and run the Windows Phone 7 app to any of your installed devices or emulators. (Note: unlike for .NETCF the profiler does not wait for the app to finish)
Service
.NET service
no You must manually stop the running service, copy the instrumented service-dll to the appropriate location, and start the service again.
ASP.NET
ASP.NET
no After profiling the asp.net-files and making sure they end up where you want them you must manually open a browser and load the corresponding webpage. See also Profiling ASP.NET websites with EQATEC Profiler.

Once any instrumented set of DLLs are running you can control them via the profiler and get reports just like for a ordinary standalone full .NET app. It's just the actual loading of them, to get them started, that must be done by hand as described above.

Running .NET CF apps

The profiler can deploy and run .NET CF apps to any installed device or emulator. You must pick a device and target directory:

Deploy a .NET CF application with EQATEC Profiler Deploy a .NET CF application with EQATEC Profiler
  • Platform: The communication module is native code. Therefore the profiler must know the correct CPU platform. For some platforms this info is not stored on your PC, but you can press "Detect" to actually connect to the device and detect it.
  • If you pick the wrong platform then profiler/app communication will simply not work, so you can't take snapshots and it can't even tell you nicely about this problem. Therefore, the profiled app will at runtime throw an exception along the lines of "The DLL 'DeviceAgentTransport.dll' does not match this device platform".
  • Target dir: The profiler will examine the app's parent folders for project files containing hints about the target directory, so this will usually be filled out automatically. Feel free to change it.
  • By default only the profiled assemblies are copied to the device, to save time. You may optionally choose to deploy all files in the app dir and its subfolders.
  • Deployment is pretty device/platform-specific. If something does not work well for you then don't hesitate to contact us so we can improve it.

Running Windows Phone 7 apps

The profiler can also deploy and run Windows Phone 7 (WP7) Silverlight apps to any installed device or emulator:

Deploy a Windows Phone 7 application with EQATEC Profiler

 

  • For now, the Zune player must be running to make a connection to a physical WP7 device. This is not a profiler-specific requirement, but also goes for even Visual Studio 2010.

 

Step 3: View

In the View-tab you view a profiling-report and hopefully find the app's bottlenecks.

You can browse the executed methods by selecting them in the upper list or in the lower call-graph view. Just follow the highest total time to find performance bottlenecks.

View a profiling report generated by the EQATEC Profiler
  • The upper summary show the grand total for each method: how many times it was called during this run and how long that took.
  • The lower call graph show the details of the selected method. The neighbor columns, "called by" and "called into", will show numbers related to the particular calls into and from the selected method.
  • The yellow bars show how much time each caller spends calling the selected method. Some caller's calls may be cheap while others are expensive. All the yellow bars add up to 100%.
  • The red bars show how much time is spent internally in the selected method and by the methods it calls. Those figures add up to 100%.
  • If a method is still running its timing will be shown with a '+' in front of it. This only happens if you take snapshots for a live application. What it means is that currently the reported internal execution time may be too low, as the latest call to the method has not yet completed.
  • Navigate backwards/forwards in the call-graph. The entire background acts as a back/forward-button, making it really easy to hit.

In our example we saw that AssemblyScout.ProcessDirectory() took most of the time. So we optimized it and 20 minutes later we were ready to do a new profiling and compare the before/after runs to see the performance improvement.

Step 4: Compare

In the Compare-tab you can compare two profiling-reports and see the effect of your optimization efforts.

This feature is a bit special because judging "improvements" by comparing two runs of an application is pretty complex, even by hand. Who's to say what's really important? If you profile a game twice then the total runtime of many methods will be different, but just because e.g. Main took twice as long may not mean anything. If a method was called 10 times before, and now only 5 times, is that an improvement? If a method was called 10,000 times before and now only 10 times, is it reasonable to compare the average time? Well - maybe.

We figured there were two ways to deal with this complexity:

  1. Give you lots of information and let you sort it all out yourself, or
  2. Figure out what's really most important and show only that

Naturally, we settled for option 2. Some nuances may be lost, some guesses may fall short in certain situations, but generally we hope the comparison-mechanism will usually give you helpful advice. Let's take a look at comparing the before/after reports for the EQATECProfilerCmd optimization example:

View a profiling report generated by the EQATEC Profiler
  • The confidence is an attempt to help you judge how much "trust" you should place in the reported speedup of this particular method. Let's look at a few examples. If some method - let's say Main - goes from 20 sec to 5 sec with the exact same input and behavior, then surely you've sped it up by 4 times. That's pretty evident. But what if your mouse-move handler that is called 10,000 times in the first run and only 10 times in the next goes from an average time of 0.5ms to 0.4ms? Is that really a reliable speedup of 25%? Probably not, as it might just be due to slightly different scheduling or variations in what the handler does with these particular mouse-coordinates or initial CLR JIT-overhead or whatever. Also, the profiler's timing can be too coarse for such small methods. Bottom line, you shouldn't trust this reported speedup nearly as much as the very reliably measured decrease of 15 sec for the "big" Main method. Long runtimes and roughly similar callcounts seems inherently the "most reliable", and short miliseconds runtimes or vastly different number of calls seems the "least reliable", wouldn't you say? Thus, attributing high confidence to long runtimes and high callcount-similarity is the guiding principle behind the rating. But please remember: the confidence is not "an absolute truth", but merely a hint.
  • The speedup is the ratio between the time before and after. It's what you will tell your teammates and project lead: "I optimized our app and got a speedup of 7 times". If a method took 20 sec before and now only takes 5 sec then the speedup is 4 times. Speedups of very small numbers are not very reliable.
  • The Diff, Old, and New Avg times are just that - the difference in average time of this particular method before and after. The total-time is not shown, because it's ultimately the average time (maybe not of this method itself, but of some higher caller) that says something about the change in performance.
  • The Diff, Old, and New Calls show the differences in calls. Sometimes this is interesting.
Apart from the columns there are a few other noteworthy issues:
  • By default only info about "significant changes" is shown. This is helpful in weeding out the many methods that has not really changed between the two runs. A change is only significant if: the method took at least 0.5ms either before or after; and the average time difference is larger than 0.05ms; and the speedup change is at least 2%.
  • Don't worry about which report is "Report 1" or "Report 2"; the viewer will automatically find out which one is the oldest and newest.
  • You can browse to any particular method in the old or new report, both from right-click context-menu or the buttons beneath the list.

So, how did our optimization go? Well, by replacing one single method-call from GetAssembly to GetAssemblyManifest and change a few other lines, that particular phase of the profiler was sped up 2.43 times. The entire profiling was sped up 1.52 times, which was okay for a 20-minute effort.

Overhead and performance

We have found that a profiled application typically is 30% larger than the original and runs 30% slower. However, this can vary a lot and "the overhead" is of course much more complex. Here is the full story. Knowing this can help you understand certain profiling-situations.

The code-overhead of a profiled app consists of two parts:

  • Base runtime-module: fixed overhead of about 100 Kb (about 60KB for .NET CF)
  • Per-method overhead: instrumenting a profiled method typically adds 40 bytes, and storing its name also adds some bytes, like 10-20% of the name's length in bytes (method-names are stored compressed).

A small app will therefore grow very much in total, maybe several hundred percent, due to the added runtime-module. On the bright side an app may also shrink in size, because instrumentation optimizes branching in the existing code.

The runtime-overhead of timing a profiled method typically requires executing an additional 80 CIL assembly-instructions. That is why it is highly recommended that tiny methods (simple set/get of variables, 3-8 CIL instructions long) should not be profiled. It hardly makes sense to spend 10-20 times as much timetiming a method than just executing it.

This means that a computing-intensive app, or an app calling a lot of small, profiled methods, can run very, very slow. Several orders of magnitude slower, in fact. A profiled fractal-generator app might run 5-10 times slower when profiled. On the other hand, many regular apps we have tested have run with only a slight hint of overhead during "normal use", so this is indeed an area of great variation.

Profiler API

If your application includes a reference to the runtime-module it may programmatically control certain profiling aspects. Generic versions of the runtime-modules are included in the installation, in folder RuntimeDLL, specifically for the purpose of being used as a reference. You will most likely want to copy that DLL to your project and place it under version control, if you use it.

The programming API allow you to take snapshots at runtime, clear the counters, examine the profiling-settings etc. Please explore the EQATEC.Profiler.Runtime namespace for the full API. Here is an example:

Use the runtime-API to dump snapshots at will

You may also add attributes to prevent methods from being profiled or appearing in the runtime-report. Profiled apps normally run fast enough, but it's there if you really need it:

Use attributes for finer profiling-control

"HideAtRuntime" means that this method's activities will not be reported at runtime. "SkipInstrumentation" avoids profiling this method entirely. If you specify the attribute for an class it takes effect on every method in that class.

Known limitations

This is a list of known limitations and problems in the current version.

  • Blocking methods, such as Read(), will be counted as the total time, including the time the thread spends being descheduled and waiting for the call to complete. For now, you will have to recognize and judge situations like this manually.
  • No debug information is available for the profiled assemblies. It means that you cannot debug the profiled versions - but you would probably not like to do that, anyway.
  • Only defined methods are currently profiled, not referenced ones. So all the methods that your application itself defines will be profiled, but not System.* etc.
  • Windows Phone 7 has several limitations:
    • Constructors are not timed due to a WP7 IL-verifier problem
    • The profiler cannot at present take an automatic snapshot when the WP7 app closes, so you must take snapshots manually during the execution.
    • WP7 is very new technology; the APIs may change and potentially break the profiler. We'll strive to stay ahead and appreciate any WP7-feedback from you all.
  • Silverlight apps are only profiled with a resolution of about 16ms. The granularity is an inherent limitation of Silverlight-timers.

Troubleshooting

Sometimes things goes wrong. You can tweak the troubleshooting features available in the profiler in Tools > Profiler troubleshooting:

Troubleshooting the EQATEC Profiler

If something goes wrong then we suggest you do the following:

  1. If you have a build/run problem then you can enable "trace-level" of the printed output, which will tell you more about what's happening.
  2. If you get an exception during build/run then the complete stack-trace can be enabled, which may give further clues.
  3. If your running, profiled app misbehaves then examine the generated profiler.log logfile. By default it lives in C:\Windows\Temp\EQATECProfilerLogs and log only errors, so it is usually empty.
  4. If (really rare) you get no error-output in the logfile then try directing all logging to System.Diagnostics.Debug as well, which will be displayed e.g. in Visual Studio.

Also, don't hesitate to use our forum. It is often (depending on the problem) useful for us if you include the appropriate build/run output, possibly with trace-level enabled, and also include the stack-trace in reported exceptions. We sometimes get really important clues just looking at that output.

 
Topic is closed for comments