Nsight Compute CLI

NVIDIA Nsight Compute Command Line Interface user manual.

1. Introduction

NVIDIA Nsight Compute CLI (nv-nsight-cu-cli) provides a non-interactive way to profile applications from the command line. It can print the results directly on the command line or store them in a report file. It can also be used to simply launch the target application (see General for details) and later attach with NVIDIA Nsight Compute or another nv-nsight-cu-cli instance.

2. Quickstart

  1. Launch the target application with the command line profiler
    The command line profiler launches the target application, instruments the target API, and collects profile results for the specified kernels. To collect the default set of data for all kernel launches in the target application, launch:
    $ nv-nsight-cu-cli CuVectorAddMulti.exe
    The application runs in instrumented mode, and a profile result is created for each kernel launch. The results are written by default to profile.nsight-cuprof. Each line of output from the compute profiler starts with ==PROF==. The other lines are output from the application itself.
    [Vector addition of 1144477 elements]
    Copy input data from the host memory to the CUDA device
    CUDA kernel launch A with 4471 blocks of 256 threads
    ==PROF== Profiling -    1
    CUDA kernel launch B with 4471 blocks of 256 threads
    ==PROF== Profiling -    2
    Copy output data from the CUDA device to the host memory
    Done
    ==PROF== Report: profile.nsight-cuprof

    Additional options are available to specify which kernels should be collected. -c limits the number of kernel launches collected. -s skips the given number of kernels before data collection starts. -k filters the kernels by a regex match of their names. -o specifies the output filename for the report file. --kernel-id filters kernels by context, stream, name and invocation, similar to nvprof. To limit what is collected for each kernel launch, select the desired *.section files by their identifier using --section. See the usage options shown with --help for more details.

  2. Changing command line output

    Besides storing results in a report file, the command line profiler can print results using different pages. Those pages correspond to the respective pages in the UI’s report. By default, the Details page is printed if no explicit output file is specified. To select a different page, or to print results in addition to storing them in an explicit file, use the --page=<Page> option. Currently, the following pages are supported: details, raw.

  3. Open the report in the UI

    The UI executable is called nv-nsight-cu. A shortcut with this name is located in the base directory of the NVIDIA Nsight Compute installation. The actual executable is located in the folder host\windows-desktop-win7-wgl-x64 on Windows or host/linux-desktop-glibc_2_11_3-glx-x64 on Linux. When executed, a Qt UI window should come up. Close the Connection dialog and open the report file through File > Open or by dragging the report file into NVIDIA Nsight Compute.

    The report opens in a new document window. For more information about the report, see the Profiler Report section in the user guide for collecting profile information through NVIDIA Nsight Compute.
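
The ==PROF== prefix described in step 1 makes it possible to separate profiler messages from the application's own output when post-processing a captured console log. The following Python sketch illustrates this on the sample output from step 1. The function name split_output and the idea of holding the console text in a string are assumptions made for illustration; they are not part of the tool.

```python
# Sample console output copied from step 1 of the Quickstart.
SAMPLE_OUTPUT = """\
[Vector addition of 1144477 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch A with 4471 blocks of 256 threads
==PROF== Profiling -    1
CUDA kernel launch B with 4471 blocks of 256 threads
==PROF== Profiling -    2
Copy output data from the CUDA device to the host memory
Done
==PROF== Report: profile.nsight-cuprof
"""

PROFILER_PREFIX = "==PROF=="


def split_output(text):
    """Return (profiler_lines, application_lines) based on the ==PROF== prefix."""
    profiler_lines, application_lines = [], []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith(PROFILER_PREFIX):
            # Drop the prefix and keep only the profiler message.
            profiler_lines.append(stripped[len(PROFILER_PREFIX):].strip())
        elif stripped:
            # Every other non-empty line comes from the application itself.
            application_lines.append(stripped)
    return profiler_lines, application_lines


profiler_lines, application_lines = split_output(SAMPLE_OUTPUT)
print("profiler:", profiler_lines)
print("application:", application_lines)
```

The sketch only relies on the prefix convention stated in step 1; it makes no assumption about the wording of the individual profiler messages.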

3. Usage

3.1. Modes

Modes change the fundamental behavior of the command line profiler. Depending on which mode is chosen, different Command Line Options become available. For example, Launch options are invalid if the Attach mode is selected.

  • Launch-and-attach: The target application is launched on the local system with the tool's injection libraries. Depending on which profiling options are chosen, selected kernels in the application are profiled and the results printed to the console or stored in a report file. The tool exits once the target application finishes or crashes and once all results are processed.

    This is the default mode.

  • Launch: The target application is launched on the local system with the tool's injection libraries. As soon as the first intercepted API call is reached (commonly cuInit()), all application threads are suspended. The application now expects a tool to attach for profiling. You can attach using NVIDIA Nsight Compute or using the command line profiler's Attach mode.

  • Attach: The tool tries to connect to a target application launched previously using NVIDIA Nsight Compute or using the command line profiler's Launch mode. The tool can attach to a target on the local system or using a remote connection. You can choose to connect to a specific process by its process ID or to the first attachable process on the specified system.
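
As a rough illustration of how the three modes map to command lines, the Python sketch below assembles argument lists for each of them. It assumes that --mode and --hostname accept their value in the same --option=value form that this manual shows for --page; the helper names and the ./my_app path are hypothetical. The sketch only builds and prints the commands and does not run the tool.

```python
import shlex

TOOL = "nv-nsight-cu-cli"  # executable name as given in the Introduction


def default_command(app_args):
    """Launch-and-attach is the default mode, so no --mode option is needed."""
    return [TOOL, *app_args]


def launch_command(app_args):
    """Launch mode: start the target and suspend it until a tool attaches."""
    return [TOOL, "--mode=launch", *app_args]


def attach_command(hostname="localhost"):
    """Attach mode: connect to a previously launched target on `hostname`."""
    return [TOOL, "--mode=attach", f"--hostname={hostname}"]


app = ["./my_app"]  # hypothetical target application
for cmd in (default_command(app), launch_command(app), attach_command()):
    # shlex.join quotes arguments so the line can be pasted into a shell.
    print(shlex.join(cmd))
```

Note that the attach command carries no application arguments, matching the rule that Launch options are not valid in Attach mode.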

3.2. Output Pages

The command line profiler supports printing results to the console using various pages. Each page has an equivalent in NVIDIA Nsight Compute's Profiler Report. In the command line profiler, they are slightly adapted to fit console output. To select a page, use the --page option. By default, the details page is used. Note that if --page is not used but --export is, no results will be printed to the console.

  • Details: This page represents NVIDIA Nsight Compute's Details page. For each profiled kernel launch, it prints each collected section as a three-column table, followed by any rule results applied to this section. Rule results not associated with any section are printed after the kernel’s sections.

    The first section table column shows the metric name. If the metric was given a label in the section, it is used instead. The second column shows the metric unit, if available. The third column shows the metric value. Both metric unit and value are automatically adjusted to the most fitting order of magnitude. By default, only metrics defined in section headers are shown. This can be changed by passing the --details-all option on the command line.

    Some metrics will show multiple values, separated by ";", e.g. memory_l2_transactions_global Kbytes 240; 240; 240; 240; 240. Those are instanced metrics, which have one value per represented instance. An instance can be a streaming multiprocessor, an assembly source line, etc.

  • Raw: This page represents NVIDIA Nsight Compute's Raw page. For each profiled kernel launch, it prints all collected metrics as a three-column table. Besides metrics from sections, this includes automatically collected metrics such as device attributes and kernel launch information.

    The first column shows the metric name. The second and third columns show the metric unit and value, respectively. Both metric unit and value are automatically adjusted to the most fitting order of magnitude. No unresolved regex: or group: metrics are included.
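
When console output is post-processed, instanced metrics need to be split into their per-instance values. The Python sketch below parses the example row quoted in the Details description. It assumes that the row has a non-empty unit column and that the three columns are separated by whitespace; both helper names are hypothetical.

```python
def parse_instanced_value(value_text):
    """Split an instanced value such as '240; 240; 240' into a list of floats."""
    return [float(part) for part in value_text.split(";") if part.strip()]


def parse_row(row):
    """Split a 'name unit value' row into its three columns.

    Assumes a unit is present; rows without a unit would need other handling.
    """
    name, unit, value_text = row.split(None, 2)
    return name, unit, parse_instanced_value(value_text)


name, unit, values = parse_row(
    "memory_l2_transactions_global Kbytes 240; 240; 240; 240; 240"
)
print(name, unit, len(values), "instances, total", sum(values))
```

A metric with a single value passes through the same code path and yields a one-element list.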

3.3. Profile Import

Using the --import option, saved reports can be imported into the command line profiler. When using this flag, most other options, except those listed under Output, are not available.

4. Command Line Options

4.1. General

Table 1. General Command Line Options
Option Description Default
h,help Show help message  
v,version Show version information  
mode Select the mode of interaction with the target application
  • launch-and-attach: Launch the target application and immediately attach for profiling.
  • launch: Launch the target application and suspend in the first intercepted API call, wait for tool to attach.
  • attach: Attach to a previously launched application to which no other tool is attached.
launch-and-attach

4.2. Launch

Table 2. Launch Command Line Options
Option Description Default
injection-path-64 Override the default path for the injection libraries. The injection libraries are used by the tools to intercept relevant APIs (like CUDA or NVTX).  

4.3. Attach

Table 3. Attach Command Line Options
Option Description Default
hostname Set the hostname or IP address for connecting to the machine on which the target application is running. When attaching to a local target application, use localhost. localhost
list-processes List all attachable processes on the target system.  

4.4. Profile

Table 4. Profile Command Line Options
Option Description Default
k,kernel-regex Set the regular expression to use when matching kernel names. If the kernel name does not match the expression, it will be ignored for profiling. If this option is not given, no filtering on kernel names will occur.  
kernel-regex-base Set the basis for --kernel-regex. Options are:
  • function: Function name without parameters, templates etc.
  • demangled: Demangled function name, including parameters, templates, etc.
  • mangled: Mangled function name.
 
kernel-id Set the identifier to use for matching kernels. If the kernel does not match the identifier, it will be ignored for profiling. The identifier must be of the following format: <context>:<stream>:<kernel>:<invocation>, where <context> is the CUDA context ID or NVTX name, <stream> is the CUDA stream ID or NVTX name, <kernel> is the kernel name, and <invocation> is the N’th invocation of this kernel function.  
launch-skip-before-match Set the number of profile launches to skip before starting to profile. The count is incremented for all launches. 0
s,launch-skip Set the number of kernel launches to skip before starting to profile kernels. 0
c,launch-count Set the number of kernel launches to profile.  
section-folder Add a non-recursive search path for .section files. Section files in this folder will be made available to the --section option. If no --section-folder options are given, the target-specific ProfileSectionTemplates folder is added by default.
section-folder-recursive Add a recursive search path for .section files. Section files in this folder and all folders below will be made available to the --section option. If no --section-folder options are given, the target-specific ProfileSectionTemplates folder is added by default.
section Add a section identifier to collect. If no --section options are given, all section files in the target-specific ProfileSectionTemplates folder are collected.
list-sections List all sections found in the searched section folders and exit.  
query-metrics List all the metrics for the devices on the system. This can be controlled using the --devices and --chips options.  
chips Specify the chips, separated by commas, for which to list metrics.  
list-metrics List all metrics collected from active sections. If the list of active sections is restricted using the --section option, only metrics from those sections will be listed.  
list-rules List all rules found in the searched section folders and exit.  
apply-rules Apply all active and applicable rules to each profiling result.  
rule Add a rule identifier to apply. Implies --apply-rules. If no --rule options are given, all applicable rules in the target-specific ProfileSectionTemplates folder are applied.
devices List the GPU devices to enable profiling on, separated by comma.  
metrics List metrics to be profiled, separated by comma. If no --section options are given, only the temporary section containing all metrics listed using this option is collected. If --section options are given in addition to --metrics, all metrics from those sections and from --metrics are collected.  
nvtx Enable NVTX support for tools.  
profile-from-start Set whether the application should be profiled from the beginning. Allowed values:
  • on/off
  • yes/no
  • true/false
  • 1/0
yes
disable-profiler-start-stop Disable profiler start/stop. When enabled, cu(da)ProfilerStart/Stop API calls are ignored.  
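
To show how several of the Profile options in Table 4 combine, the Python sketch below builds an argument list from a kernel identifier, skip and count limits, sections and metrics. It assumes that these options accept values in the --option=value form shown for --page, and that --section may be repeated once per identifier. The helper names and the context, stream, kernel and invocation values are hypothetical; the metric name is the one used as an example in the Details page description. The command is only printed, not executed.

```python
import shlex

TOOL = "nv-nsight-cu-cli"  # executable name as given in the Introduction


def make_kernel_id(context, stream, kernel, invocation):
    """Format an identifier as <context>:<stream>:<kernel>:<invocation>."""
    return f"{context}:{stream}:{kernel}:{invocation}"


def build_profile_command(app_args, kernel_regex=None, kernel_id=None,
                          launch_skip=0, launch_count=None,
                          sections=(), metrics=()):
    """Assemble an argument list from a subset of the Profile options."""
    cmd = [TOOL]
    if kernel_regex is not None:
        cmd.append(f"--kernel-regex={kernel_regex}")
    if kernel_id is not None:
        cmd.append(f"--kernel-id={kernel_id}")
    if launch_skip:
        # 0 is the documented default, so it is omitted.
        cmd.append(f"--launch-skip={launch_skip}")
    if launch_count is not None:
        cmd.append(f"--launch-count={launch_count}")
    # Assumption: one --section option per section identifier.
    cmd.extend(f"--section={name}" for name in sections)
    if metrics:
        # --metrics takes a single comma-separated list.
        cmd.append("--metrics=" + ",".join(metrics))
    return cmd + list(app_args)


cmd = build_profile_command(
    ["./CuVectorAddMulti.exe"],
    kernel_id=make_kernel_id(1, 7, "vectorAdd", 2),  # hypothetical values
    launch_skip=1,
    launch_count=2,
    metrics=["memory_l2_transactions_global"],
)
print(shlex.join(cmd))
```

Because no --section option is passed here, only the temporary section containing the listed metrics would be collected, as described for the metrics option.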

4.5. Output

Table 5. Output Command Line Options
Option Description Default
csv Use comma-separated values as console output.  
i,import Set the input file for reading the profile results  
o,export Specify an output file for storing the profile report. If --export is set and no --page option is given, no profile results will be printed on the console.
f,force-overwrite Force overwriting all output files. By default, the profiler won't overwrite existing output files and shows an error instead.
page Select the report page to print console output for. Available pages are:
  • details Show results grouped as sections, include rule results. Some metrics that are collected by default (e.g. device attributes) are omitted if not specified explicitly in any section or using --metrics.
  • raw Show all collected metrics by kernel launch.
details If no --page option is given and --export is set, no results are printed to the console output.
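
The interaction between --page and --export can be summarized as a small decision function. The Python sketch below encodes the rule stated in Table 5: an explicit page is always printed, an export without a page prints nothing, and otherwise the details page is used. The function name console_page is hypothetical and the code models the documented behavior only; it does not call the tool.

```python
VALID_PAGES = ("details", "raw")  # pages listed for the --page option


def console_page(page=None, export=None):
    """Return the page printed to the console, or None if nothing is printed."""
    if page is not None:
        if page not in VALID_PAGES:
            raise ValueError(f"unsupported page: {page!r}")
        # An explicit --page is printed even when --export is also set.
        return page
    if export is not None:
        # --export without --page: results go to the report file only.
        return None
    # Neither option given: the details page is the default.
    return "details"


print(console_page())                               # no options
print(console_page(export="my_report"))             # export only
print(console_page(page="raw", export="my_report")) # both options
```

The report name my_report is a placeholder; any value passed as export has the same effect on console output.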

Notices

Notice

ALL NVIDIA DESIGN SPECIFICATIONS, REFERENCE BOARDS, FILES, DRAWINGS, DIAGNOSTICS, LISTS, AND OTHER DOCUMENTS (TOGETHER AND SEPARATELY, "MATERIALS") ARE BEING PROVIDED "AS IS." NVIDIA MAKES NO WARRANTIES, EXPRESSED, IMPLIED, STATUTORY, OR OTHERWISE WITH RESPECT TO THE MATERIALS, AND EXPRESSLY DISCLAIMS ALL IMPLIED WARRANTIES OF NONINFRINGEMENT, MERCHANTABILITY, AND FITNESS FOR A PARTICULAR PURPOSE.

Information furnished is believed to be accurate and reliable. However, NVIDIA Corporation assumes no responsibility for the consequences of use of such information or for any infringement of patents or other rights of third parties that may result from its use. No license is granted by implication or otherwise under any patent rights of NVIDIA Corporation. Specifications mentioned in this publication are subject to change without notice. This publication supersedes and replaces all other information previously supplied. NVIDIA Corporation products are not authorized as critical components in life support devices or systems without express written approval of NVIDIA Corporation.

Trademarks

NVIDIA and the NVIDIA logo are trademarks or registered trademarks of NVIDIA Corporation in the U.S. and other countries. Other company and product names may be trademarks of the respective companies with which they are associated.