[ Previous | Next | Table of Contents | Index | Library Home | Legal | Search ]

Commands Reference, Volume 5


tprof Command

Purpose

Reports CPU usage.

Syntax

tprof Program

tprof [ -m ] [ -v ] { -k [ -e ] | -s | -j Java Class | -p Program | -t Process_ID | [ -x Command ] | [ -i Trace_File  [ -n Gennames_File ] [ -C { all | List } ] ]}

Notes:

Description

The tprof command reports CPU usage for individual programs and the system as a whole. This command is a useful tool for anyone with a JAVA, C, C++, or FORTRAN program that might be CPU-bound and who wants to know which sections of the program are most heavily using the CPU. The tprof command also reports the fraction of time the CPU is idle. These reports can be useful in determining CPU usage in a global sense.

The tprof command specifies the user program to be profiled, executes the user program, and then produces a set of files containing reports.

In the operating system, an interrupt occurs periodically to allow a "housekeeping" kernel routine to run. This housekeeping occurs 100 times per second. When the tprof command is invoked, the housekeeping kernel routine records the process ID and the address of the instruction executing when the interrupt occurred. With both the instruction address and process ID, the tprof analysis routines can charge CPU time to processes and threads, to subprograms, and even to source lines of programs. Charging CPU time to source program lines is called microprofiling.

Note: Microprofiling can only be performed on code compiled with the -g flag. It is also necessary for the source code to reside in the working directory, which is the directory in which the tprof command is invoked. The proftest working directory is used in the examples for this command.

For subroutine-level profiling, the tprof command can be run without modifying the executable program. You do not have to recompile with special compiler flags or linker options. This means you can obtain a subroutine profile of any executable module that has already been built. However, as previously stated, recompilation is required to get a microprofile.

The files containing the reports are left in the working directory. All files created by the tprof command are prefixed by ___ (two underscores). Prefixing with underscores allows the tprof command user to easily identify all tprof-created files. It is recommended that a working directory be created specifically for microprofiling and that copies of executables and source files be placed (or linked) in that directory. In the examples, the working directory is proftest.

In its simplest form, the tprof command is issued as follows:

tprof -x program

When using the -x Command flag, one of the -s, -k, -j, or -p flags must be specified to get the specific profiling report. If none of these flags are specified, only the summary report is produced. Using the -t flag limits the report to information about the specified process, which is named in the Process_Id parameter.

The -i Trace_File flag allows for offline processing by tprof of trace data files created by the system trace command. The -n flag allows you to specify a Gennames_File to be used when processing an offline file. These flags are useful when it is necessary to post-process a trace file from a remote machine or perform the trace data collection at one time and post-process it at another time. It is very important that the gennames file is created immediately after the trace command has completed. The flags are also useful when system loading is high and trace hooks are being missed by tprof. The offline option relieves this problem. Trace hooks relevant to tprof must be collected by the trace command and are specified by the trace -J flag. The gennames command is then executed to collect mapping information for tprof. Once trace and gennames have executed, trcrpt -r must be executed on the trace logfile and redirected to another file. At this point an adjusted trace logfile and a Gennames_File is fed into tprof.

Example:

trace -af -T 10000000 -L 10000000 -o trace.out -j 234,106,10C,134,139,00A,465

(The program is either already running or has begun.)

trcoff

gennames > gennames.out

trcstop

trcrpt -r   trace.out > trace.rpt

Then invoke the tprof command with at least the -i and -n flags:

tprof -i trace.rpt -n gennames.out -s -k -e

The trace hook numbers above are defined as follows:

HKWD_TRACE_SYNC      0x000
HKWD_TRACE_TRCON     0x001
HKWD_TRACE_TRCOFF    0x002
HKWD_TRACE_HEADER    0x003
HKWD_TRACE_LWRAP     0x005
HKWD_TRACE_TWRAP     0x006
HKWD_KERN_PROF       0x234
HKWD_KERN_DISPATCH   0x106
HKWD_KERN_IDLE       0x10C
HKWD_SYSC_EXECVE     0x134
HKWD_SYSC_FORK       0x139
HKWD_TRACE_UTIL      0x00A
HKWD_SYSC_CRTHREAD   0x465

Note: Review the /usr/lpp/perfagent/README.perfagent.tools file for the latest information on changes to the performance analysis tools.

If multiple CPU trace files are created by the trace daemon, the base name of the file should be entered after the -i flag. The -C flag could be used to specify the CPUs to run the tprof command against and to generate per CPU tprof command reports. When the -i trc and -C all are specified together and there are 5 CPU trace files, the tprof command processes the trace files trc-0, trc-1, trc-2, trc-3, and trc-4.

Consider the following example:

tprof -i trc -n gennames_file -C all
tprof -i trc -n gennames_file -C  0,1,2,3,4

In the case where multiple CPU IDs are specified, an individual report file is created for each trace file and a summary file is produced. In conjunction with the -i and -C flags, the -n flag must be used. This allows you to specify a Gennames file to be used when processing offline files. This is important because Gennames files save program loading information that may change over time.

The following is an example of a 2 CPU system:

trace -C all -af -T 10000000 -L 10000000 -o trace.out -j 234,106,10C,134,139,00A,465

(The program is either already running or has begun.)

trcoff

gennames > gennames.out

trcstop

/* A trcrpt -r must be run on each trace file created including the anchor file */
trcrpt -r   trace.out > trace.rpt
trcrpt -r   trace.out-0 > trace.rpt-0
trcrpt -r   trace.out-1 > trace.rpt-1
 
tprof -C all -i trace.rpt -n gennames.out -ske

Reports

A summary report with the suffix .all is always produced. If no program is specified, the report is named __prof.all. This report contains an estimate of the amount of CPU time spent in each process that was executing while the tprof command was monitoring the system. If the sample program is specified, the report is named __sample.all. This report contains an estimate of the amount of CPU time spent in the sample program that was executing while the tprof command was monitoring the system. This report also contains an estimate of the amount of CPU time spent in each subprogram of the sample program. The summary report shows the amount of time the CPU was idle as well as the amount of time spent in the kernel. The tprof command reports CPU time in ticks, where 100 ticks equals 1 second.

Example 1 Report A shows an example summary report. The first section lists the CPU time estimates for each process. The second section of the report summarizes the first by combining the reports for multiple occurrences of each process into a single line.

A third section of the report is produced only if there are User ticks associated with the program being profiled. This section of the report notes the subprogram names and the number of ticks they collected. The source file from which the subprogram was compiled is also given as well as the length, in bytes, of the subprogram.

When microprofiling occurs, two more reports are produced only if there are User ticks associated with the program being microprofiled. The two reports are left as additional files in the working directory by the tprof command:

__t.routine_source
Profiled source-statement listing, one for each routine in the source file.

__h.source
Hottest source-statement listing, one for each source file.

Example 2 Reports A and B are the result of microprofiling an example program.

As previously stated, if microprofiling is desired, copies of the source files (usually suffixed with .c, .h, or .f) must be placed in the working directory. Additionally, executable files which need to be microprofiled, should be compiled with the -g flag.

When microprofiling occurs, there is one __t.routine_source file for each routine in each source file compiled with the -g flag. The source line number and number of ticks associated with that source line are to the left of the microprofile text. Count totals for each routine are at the end of the function. Counts should not be associated with a comment line, data declaration, or macro definition.

Note: tprof cannot report correct source line information if a .c file is included in another .c file.

The tprof command uses the following tools to create reports:

Shared Libraries

When you run executable modules built with shared libraries, the tprof command reports an estimate of the aggregate CPU time spent in shared code. When invoked with the -s flag, the tprof command resolves the time spent in shared libraries to the subprogram level.

It provides a report on CPU time spent in shared libraries, and resolves that time to the subprogram level for each shared library incurring ticks.

Note: The -v flag with the -m flag turns on verbose microprofiling which lists the files that cannot be microprofiled.

Using the information gathered in this report, it is possible to microprofile a single shared library subprogram to the source-statement level. If microprofiling of shared libraries is desired, a copy of the associated source file must be placed in the current working directory in which the tprof command is invoked. The object file needing to be microprofiled should be compiled with the -g flag and re-archived into the library it belongs to or compile the whole library with -g flag. The re-compiled or re-archived library should be placed in its original path.

Multiple Runs to Obtain Greater Accuracy

The degree to which CPU activity can be resolved is determined by the number of samples captured and the degree to which "hot spots" dominate. While a program with a few "hot spots" can be profiled with relatively few samples, less-frequently executed sections of the program will not be visible in the profiling reports unless more samples are captured. In cases where user programs run less than a minute, there may be insufficient resolution to have a high degree of confidence in the estimates.

A simple solution is to repeatedly execute the user program or script until you achieve the degree of resolution you need. The longer a program is run, the finer the degree of resolution of the profile. If you doubt the accuracy of a profile, run the tprof command several times and compare the resulting profiles.

In order to profile a program or script that is executed multiple times using different sets of arguments, you must invoke the tprof command using a script that executes the desired program accordingly. The following is an example of a script called lin5 that executes the lin program five times:

./lin; ./lin; ./lin; ./lin; ./lin   # execute program lin 5 times

The tprof command is then invoked with:

tprof -p ./lin -x ./lin5

Another example can be found in the script DOIT, which executes the lin program n times (where n is an argument to DOIT).

How tprof Capabilities Differ from prof and gprof

The most significant difference between these three commands is that the tprof command provides microprofiling, while the gprof and prof commands do not.

The prof and gprof commands are standard, supported profiling tools on many UNIX systems, including this operating system. Both the prof and gprof commands provide subprogram profiling and exact counts of the number of times every subprogram is called. The gprof command also provides a very useful "call graph" showing the number of times each subprogram was called by a specific parent and the number of times each subprogram called a child. The tprof command provides neither subprogram call counts nor call graph information.

Like the tprof command, both the prof and gprof commands obtain their CPU consumption estimates for each subprogram by sampling the address counter of the user program at the rate of 100 times per second.

If only subprogram-level profiling is desired, the prof and gprof commands require users to relink their programs using a special linker flag. In order to get subprogram call counts, both the prof and gprof commands require users to both recompile and relink their programs by using the appropriate compile or link flag (-p or -pg, respectively). This may prove to be an impediment to users who do not have the ability to recompile or relink applications that need to be profiled. The tprof command requires recompilation (with the -g flag) only if source statement profiling is required. The tprof command will profile subprograms in any C or FORTRAN executable without recompilation.

The CPU times of user programs relinked for the prof command are the same as the run times of the unprofiled programs. The CPU times of user programs recompiled and relinked for the gprof command are increased by as little as a few percent to a factor of three or more. The CPU times of user programs profiled under the tprof command are the same as the run times of the unprofiled programs.

Also, the tprof command provides a CPU usage summary of all processes active during the execution of the profiled user program, which neither the prof nor gprof command provides.

Notes: CPU activity is sampled at a rate of 100 samples per second. This means that estimates of CPU consumption for each subprogram may not be sufficiently accurate for short-running programs. However, the accuracy is sufficient for programs and processes that run several minutes or more. The system trace facility is used by the tprof command. Only one user can be using the system trace facility at a time. Thus, only one tprof command can be active in the system at a time. The tprof command is a CPU-activity profiler. It does not profile other system resources, such as disks or memory.

-C all | List Creates per CPU reports and a cumulative report if the trace daemon was also executed with the -C flag. Multi-CPU trace file processing is only available in offline mode. CPU numbers should be separated with a comma if you give a list (for example, 0,1,2).

Note: The -C flag must be given in conjunction with the -i and -n flag.
-e Profiles the kernel extension if kernel profiling ( -k ) is specified.

Note: If -m flag is used in conjunction with -e flag, then the microprofiling on kernel extension is performed.
-i Trace_File Input trace file for offline processing.

Note: The -i and -x flags may not be specified at the same time.
-k Profiles the kernel.
-m

Enables microprofiling.

Note: Java applications will not be microprofiled.

-j Java Class

Profiles the specified Java Class.

-n Gennames_File  Specifies a Gennames_ File to be used in conjunction with the -i flag

Note: Do not specify a Gennames_File with the -n flag without providing a Trace input file with the -i flag.
-p Program Profiles the user program.

Note: If -m flag is used in conjunction with -p flag, then the microprofiling on the user program is performed. The -j and -p flags are mutually exclusive.
-s Profiles shared libraries.

Note: If -m flag is used in conjunction with -s flag, then the microprofiling on shared library is performed.
-t Process_Id Profiles the process whose process id matches the specified Process_Id.
-v Specifies verbose mode. If microprofiling of shared libraries or kernel extensions is enabled, the -v flag produces a warning message if the executable / object file is not compiled with -g flag (or) if tprof is not able to read the executable / object file.
-x Command Allows the execution of an arbitrary Command. Profiling is activated if one of the following flags is specified: -s, -k or -p.

Note: If you do not enter a path with the specified command, then the -x flag searches your PATH environment variable to find the command since it uses the system call to run the command. To guarantee that the correct command runs, always specify the path to the command. When the program name (-p flag) and the command name (-x flag) are the same and do not have full or relative paths specified, then the tprof command automatically ads the path information by searching the environment PATH variable.

Examples

1. Profiling a C Program

Note: The following C program does nothing useful. It simply wastes CPU cycles. The program is used only for illustrative purposes.

/* Array Incrementer -- Version 1 */
#include <stdlib.h>
#define Asize 1024
#define RowDim InnerIndex
#define ColDim OuterIndex
main()
{
   int Increment;
   int OuterIndex;
   int InnerIndex;
   int big [Asize][Asize];
   /* Initialize every byte of the array to 0x01 */
   for (OuterIndex=0; OuterIndex<Asize; OuterIndex++)
   {
      for (InnerIndex=0; InnerIndex<Asize; InnerIndex++)
         big [RowDim][ColDim] = 0x01010101;
   }
   Increment = rand();
   /* Increment every element in the array */
   for (OuterIndex=0; OuterIndex<Asize; OuterIndex++)
   {
      for (InnerIndex=0; InnerIndex<Asize; InnerIndex++)
      {
         big [RowDim][ColDim] += Increment;
         if (big [RowDim][ColDim] <0)
            printf("negative number. %d\n", big[RowDim][Coldim]);
      }
   }
   printf("version 1 check num: %d\n", big[rand()%Asize][rand()%Asize]);
   return(0);
} 

In this example, version1.c is the source file and version1 will be the program. The source program, version1.c, already exists in a directory called proftest.

  1. Go to the subdirectory proftest.

    cd proftest
    
  2. Compile version1.c.

    xlc -o version1 version1.c
    
  3. Profile and run the program version1.

    tprof -p ./version1 -x ./version1
    
  4. Examine the profiling output report in the __version1.all file.

An example of the__version1.all file is found in Example 1 Report. Of course, the __version1.all file that you create is certain to be different from the example __version1.all reported shown here.

Example 1 Report A

          Process      PID      TID    Total   Kernel     User   Shared    Other
          =======      ===      ===    =====   ======     ====   ======    =====
       ./version1     4524     5313      287        8      278        1        0
          =======      ===      ===    =====   ======     ====   ======    =====
            Total                        287        8      278        1        0
 
          Process     FREQ    Total   Kernel     User   Shared    Other
          =======      ===    =====   ======     ====   ======    =====
       ./version1        1      287        8      278        1        0
          =======      ===    =====   ======     ====   ======    =====
            Total        1      287        8      278        1        0
 
   Total Ticks For ./version1 (USER) = 278
 
 Subroutine                Ticks  %   Source   Address Bytes
 =============             ===== ==== ======== ======== ======
 .main                       278 60.4 version1.c 10000340 218
 

Process names and process identification (PID) numbers can be reused during normal operation. When a process forks another process, the forked process inherits the process name of the original process. When a process execs another process, the execed process inherits the PID of the original process.

The first section of the report shows the number of ticks consumed by, or on behalf of, the specified process and its thread (if no process is specified, then this section shows all processes that is running during tprof monitoring). The columns give tick counts for Kernel Code, User Code, and Shared Code. The name of the program is version1, so the version1 process (with process ID number 4524) is the process being profiled. This process/program collected a total of 287 ticks: 8 ticks for services performed for it in the kernel's address space, 278 ticks in the user's address space, and 1 ticks in the shared library space. Other is a catchall category that normally is 0.

The second section of the report summarizes the results by program, regardless of process ID. It shows the number (FREQ) of different processes that ran each program at some point.

The third section of the report shows the number of ticks used by each function in the executable and the percentage of the total runs' CPU ticks that each function's ticks represent.

Note: If there are no ticks in the User column for the program being profiled, then the third section of the report is not produced.

Sampling occurs at the rate of 100 per second. Each tick corresponds to 1/100 sec or 10 msec. Thus, in Example 1 Report A, version1 is estimated to have consumed 2.87 seconds: 2.78 in user space, and 0.08 seconds in kernel (system) space.

2. Microprofiling a C Program

The example of profiling contains summary information at the subprogram level. In this example, we provide CPU-usage profiling at the microprofiling level. Additional information must be provided to the tprof command in the working directory in order to make the correspondences between execution-time estimates and source statements:

Profile:

tprof -mp ./version1 -x ./version1     # invoke the profiler, telling
                                       # it to execute and profile "version1"

The output file__version1.all should be nearly identical to that obtained from the previous steps. Of course, the report that summarizes all processes active during the execution of the version1 program will change. You will also find the __t.main_version1.c file, which contains profiled source for main in source, and the __h.version1.c file, which contains the "hottest" lines in the source.

Note: If there are no ticks in the User column in the __program.all file for the program being profiled, then the __t.routine_source and __h.source files are not produced.

Example 2 Reports A and B contain results from the microprofiling example shown above. Example 2 Report A shows the contents of the __t.main_version1.c file. Example 2 Report B shows the contents of the __h.version1.c file.

Example 2 Report A

Ticks Profile for main in ./version1.c
 
   Line   Ticks   Source
 
    13      1        for (OuterIndex=0; OuterIndex<Asize; OuterIndex++)
    14      -        {
    15     17           for (InnerIndex=0; InnerIndex<Asize; InnerIndex++)
    16    105              big [RowDim][ColDim] = 0x01010101;
    17      -        }
    18      -        Increment = rand();
    19      -        /* Increment every element in the array */
    20      -        for (OuterIndex=0; OuterIndex<Asize; OuterIndex++)
    21      -        {
    22     21           for (InnerIndex=0; InnerIndex<Asize; InnerIndex++)
    23      -           {
    24     30              big [RowDim][ColDim] += Increment;
    25    104              if (big [RowDim][ColDim] <0)
    26      -                 printf("negative number. %d\n", big[RowDim][ColDim
]);
    27      -           }
    28      -        }
    29      -        printf("version 1 check num: %d\n", big[rand()%Asize][rand(
)%Asize]);
    30      -        return(0);
    31      -     }
 
 278 Total Ticks for main in ./version1.c

Example 2 Report B

 Hot Line Profile for ./version1.c
 
 
   Line   Ticks
 
    16    105
    25    104
    24     30
    22     21
    15     17
    13      1

3. Profiling a FORTRAN Program with Multiple Source Files

This example uses two FORTRAN source files, linless.f and daxpy.f, which already exist in a directory called proftest.

  1. Go to the subdirectory proftest.

    cd proftest
    
  2. Compile lin2 as the program file using the two source files, linless.f and daxpy.f.

    xlf -o lin2 linless.f daxpy.f
    
  3. Profile and run the program lin2.

    tprof -p ./lin2 -x ./lin2
    
  4. Examine the profiling output report in the __lin2.all file.

4. Profiling a Shell Script

Sometimes you want to profile a script of sequential executions of programs and shell commands rather than a single program. In this example, the DOIT script runs a program n times, with both the program and n specified as arguments in the script.

# DOIT Shell Script
#!/usr/bin/ksh
i=$2
command=$1
while [ $i -gt 0 ]
do
let i=i-1
$command
done

For example, to run the program lin three times, enter:

DOIT ./lin 3

In this example, the DOIT script and the FORTRAN source file called lin.f, already exists in a directory called proftest.

  1. Go to the subdirectory proftest.

    cd proftest
    
  2. Compile lin.f as the executable file lin unoptimized with the -g flag.

    xlf -g -o lin lin.f
    
  3. Profile program lin using -p flag, and specify the name of the script and any script arguments to run using the -x flag.

    tprof -p ./lin -x DOIT ./lin 3
    

    This provides statement-level profiling

  4. Examine the profiling output report in the __lin.all file.

5. Profiling a Java Application

This example uses a Java Class named myclass. This class exists in the proftest directory, which is already included in the CLASSPATH.

Example Java Program myclass.java: 
public class myclass 
{
          public static void main(String args[]) {
                 int i;
                 for(i=0;i < 100;i++) {
                 }
                 System.out.print("Hello World\n");
         } 
}
 

  1. Go to the proftest subdirectory by entering:
    cd proftest
    
  2. Compile the java program by entering:
    javac myclass.java 
    
  3. Profile and run the java xlass by entering:
    tprof -j myclass -x java -Xrunjpa myclass
    
  4. Examine the profiling output report in the __myclass.all file.
    Note: The tprof java profiler agent always generates a trace file no matter whether you run the java application via tprof or independently (with -Xrunjpa). tprof uses this trace file to profile the java classes in both real time and offline mode. When starting the java application, make sure to specify the -Xrunjpa. For huge java applications, the applications should be started with a -noclassgc flag, which disables the garbage collection of classes. If this flag is not used, then tprof profiling information for java classes may not be accurate.

    An example of the generated __myclass.all is shown below:

             Process    PID    TID    Total  Kernel  User  Shared  Other  Java
             =======    ===    ===    =====  ======  ====  ======  =====  ====
                java  23852  18985      343      28     0     171      0   140
                java  23852  18985        1       1     0       0      0     0
             =======    ===    ===    =====  ======  ====  ======  =====  ====
               Total                    344      29     0     171      0   140
     
    Segment ::  2  21
     
             Process   FREQ  Total   Kernel   User   Shared    Other   Java
             =======   ====  =====   ======   ====   ======    =====   ====
                java      2    344       29      0      171        0    140
             =======   ====  =====   ======   ====   ======    =====   ====
               Total      2    344       29      0      171        0    140
     
    Total System Ticks: 649 (used to calculate function level CPU)
     
    Total JAVA = 144 (ticks accumulated in Java Segment)
     
      Total ticks for myclass (JAVA) = 140
     
    Class Name                        Ticks  %   Source           Class ID
    ==========                        ===== ==== ======           ========
    myclass                             140 21.6 myclass.java     3008a750
     
       Profile: myclass ( myclass.java )
     
    Method Name                       Ticks  %    Method ID  Load Addr       Size
    ===========                       ===== ====  =========  =========       ====
    main [([Ljava/lng/String;)V]        140 21.6   34b09edc   345b2a2c        32c
    
Note: All the method names are displayed with their signatures in brackets ([]).

Messages

If your system displays the following message:

/dev/systrace: device busy or trcon: TRCON:no such device

This means the trace facility is already in use. Stop your program and try again after typing trcstop, stops the trace.

Note: This command is valid on both the POWER-based platform and the Itanium-based platform. And it can read both types of tracefiles, 64-bit and 32-bit, regardless of the OS (32-bit or 64-bit).

Related Information

The cc command, gprof command, prof command, stripnm command, and trace daemon.


[ Previous | Next | Table of Contents | Index | Library Home | Legal | Search ]