
Resource Monitoring and Control Guide and Reference


Using the Monitoring Application

This chapter describes planning for monitoring your system, tracking system events, and using and modifying the predefined scripts, expressions, commands, and responses packaged with this application. These predefined elements and how to use them are described in detail in Components Provided for Monitoring.


Planning What to Monitor in Your System

First, select conditions to monitor that would have a severe impact on your system. These conditions might include:

When you have determined the resource problems you want to monitor, review the predefined conditions and identify the conditions you want to use. They can be displayed in the Monitoring application in the Conditions plug-in. See Getting Started with the Monitoring Application and Components Provided for Monitoring for more information. Use the lscondition command to view all conditions from the command line.

If a predefined condition deviates from your requirements in some way, you can edit it, use it as a template to create your own customized condition, or create your own condition.

Once you have selected conditions for monitoring, you need to plan one or more responses to be taken for the event and the optional rearm event.


Planning How to Respond to Detected Conditions

A set of predefined responses comes installed with your system (see Predefined Responses). Each response has one or more actions associated with it. Each action can be activated or deactivated to fit your particular work environment and schedule.

The Responses plug-in's Response Properties dialog has an Add Action option where you can choose predefined actions for responding to an event or rearm event. You can also specify a command or a script to be run as an action.

The predefined actions are:

You can also write your own commands to correct or mitigate conditions using the Run program option.

You might specify different actions based on when the monitored condition occurs. For example, you could have one set of actions to respond to a condition during working hours and another set to respond to a condition on nights and weekends. To be notified of events when you are away from your terminal, your actions must include e-mail, broadcasting, or logging. You can also view events in the Events plug-in.


Getting Started with the Monitoring Application

This section describes how to start using the Monitoring application. You can use the Web-based System Manager or the command line to do the following:

Note:
See the Monitoring online help for detailed task information.

How to Start Monitoring Your System

The following scenario demonstrates how to start monitoring your system using the Web-based System Manager user interface. Once you are familiar with the procedure, you can create customized conditions and responses, and take advantage of the more advanced Monitoring features.

  1. Start the Web-based System Manager graphical user interface by entering wsm on the command line.
  2. In the Navigation Area, select the host machine you want to monitor.
  3. In the Contents Area, double-click the Monitoring icon. The contents area of the Monitoring application displays the following plug-ins:
  4. To view predefined conditions and select a condition for monitoring, double-click the Conditions icon.
  5. Click the Details toolbar icon to show each predefined condition in detail.
  6. Select a condition and click the Start Monitoring toolbar icon.
  7. In the Edit Responses to Start Monitoring dialog, select a response from Other available responses.
  8. Click the left arrow.
  9. Click OK.
  10. A message displays that monitoring has been started. Click OK.

How to Associate a Response with a Condition

To associate a response with a condition using the Web-based System Manager, do the following:

  1. Select the condition, click the Properties toolbar icon, and click Responses to Condition...
  2. In the Edit Responses to Condition dialog, select the response you want to use from Other available responses.
  3. Click the left arrow. Monitoring will start if any of the responses are checked.
  4. Click OK.

To associate a response with a condition from the command line, use the following commands (see the man pages or the AIX Commands Reference for detailed usage information):

  1. Use the lsresponse command to list all responses.
  2. Use the mkcondresp command to associate a condition with a response without starting monitoring. Use the startcondresp command to associate a condition with a response and begin monitoring immediately.

How to View Events

To view events in the Web-based System Manager, select the Events plug-in. You can also view and sort events using the Audit Log.

To view events from the command line, use the lsaudrec command to view the audit log. You can use the logevent predefined script to log events to a file.

How to Stop Monitoring

To stop monitoring from Web-based System Manager, select the condition and click the Stop Monitoring toolbar icon.

To stop monitoring from the command line, use the stopcondresp command.

How to Monitor Your System Using the Command Line Interface

The following scenarios demonstrate the most frequently performed monitoring tasks from the command line interface. See the AIX Commands Reference at http://www.ibm.com/servers/aix/library or the command man pages for detailed usage information.

  1. To list the conditions in your system, enter lscondition. Output is similar to:
    Name                           Monitoring Status
    "/tmp space used"              "Not monitored"
    "/var space used"              "Monitored"
    (more conditions listed...)
    
  2. To list the responses available in the system, enter lsresponse. Output is similar to:
    Name
    "Critical notification"
    "Warning notification"
    "Informational notification"
    "Remove unwanted files"
    (more responses listed...) 
    
  3. To list responses associated with a condition, use the lscondresp command. For example, to list the responses associated with the condition "/tmp space used", enter lscondresp "/tmp space used". Output is similar to:
    Condition            Response                     State
    "/tmp space used"    "Broadcast event on-shift"   Active
    "/tmp space used"    "E-mail root anytime"        Not Active
     
    
  4. To start monitoring a condition, one or more responses need to be specified for the condition. For example, to start monitoring the condition "/tmp space used" using the response "critical notification" and "remove unwanted files," enter:
    startcondresp "/tmp space used" "critical notification" "remove unwanted files"
    
  5. You can either stop monitoring a condition completely, or stop monitoring a condition with specific responses. To stop monitoring the condition "/tmp space used" completely, enter:
    stopcondresp "/tmp space used"
    

    To stop monitoring the condition "/tmp space used" with a specific response, "critical notification," enter:

    stopcondresp "/tmp space used" "critical notification"
    
  6. You can copy a condition to use as a template for a new condition. For example, to create a new condition "my test condition" from an existing condition "/tmp space used," enter:
    mkcondition -c "/tmp space used" "my test condition"
    
  7. To view events or the actions taken in response to the events, enter:
    lsaudrec
    

For a complete list of predefined commands, scripts, and utilities, see Predefined Commands, Scripts, Utilities, and Files.
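
If you want to post-process the listing from step 1 in a script, the quoted two-column layout is easy to split. The following is a minimal Python sketch, not part of the product: it assumes the output keeps the layout shown above (one header line, then two double-quoted fields per row), and the sample text and the function name parse_lscondition are illustrative only.

```python
import shlex

# Sample text modeled on the lscondition output shown in step 1 (illustrative only).
SAMPLE_OUTPUT = '''\
Name                           Monitoring Status
"/tmp space used"              "Not monitored"
"/var space used"              "Monitored"
'''


def parse_lscondition(text):
    """Return {condition name: monitoring status} from lscondition-style text."""
    statuses = {}
    for line in text.splitlines():
        # Data rows start with a double quote; the header row does not.
        if not line.lstrip().startswith('"'):
            continue
        fields = shlex.split(line)  # honors the double quotes around each field
        if len(fields) == 2:
            name, status = fields
            statuses[name] = status
    return statuses


conditions = parse_lscondition(SAMPLE_OUTPUT)
monitored = [name for name, status in conditions.items() if status == "Monitored"]
print(monitored)
```

In practice the text would come from running lscondition on the monitored host rather than from a literal string.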


Tracking Monitoring Activity

For viewing information about monitoring events, rearm events, actions, and errors that have occurred, see the following table:


Available information: The Monitoring application's Events plug-in lets you view a list of all the events, rearm events, and error events that have occurred during the current Web-based System Manager session.
How to find it: To view events for your current session:
  1. In Web-based System Manager, select Monitoring.
  2. In the Events plug-in, select the All Events toolbar icon.

Available information: You can view a list of current events, which are the most recent events that are currently true for their respective monitored conditions. Note: In the current events view, you only see the latest event that is true for each monitored condition.
How to find it: To view current events for your current session:
  1. In Web-based System Manager, select Monitoring.
  2. In the Events plug-in, select the Current Events toolbar icon.

Available information: If you have specified a notification mechanism such as logging as an action for an event, the log file will receive an entry each time that action is taken. Entries are logged during the entire period in which the condition is monitored, whether or not Web-based System Manager is running. You can browse the log file without running Web-based System Manager.
How to find it: To view your log file, use the alog command.

Available information: The Web-based System Manager Session Log contains all Monitoring messages issued during the current Web-based System Manager session that did not require a user response.
How to find it: To view the Session Log:
  1. In Web-based System Manager, select Console from the menu bar.
  2. Select Session Log.

Available information: The audit log is a system-wide facility for recording information about the operation of the system. It can include information about normal operation as well as errors. Logging activity occurs independent of Web-based System Manager and continues whether or not a session is active.
How to find it: To view the Audit Log in Web-based System Manager:
  1. In Web-based System Manager, select Monitoring.
  2. In the Events plug-in, select the Audit Log toolbar icon.
To view the Audit Log from a command line, issue lsaudrec.


Using the Audit Log to Track Monitoring Activity

Audit log records include the following:

The administrator can use the audit log to track activity that may not be visible otherwise because the activity is related to subsystems running in the background. The audit log is accessible from Web-based System Manager or the command line.

To list audit log records, use the Audit Log toolbar icon in the Events plug-in or the lsaudrec command. To remove records, use the Audit Log toolbar icon in the Events plug-in or the rmaudrec command. For details, see the Monitoring application online help or the command man pages. Commands are also documented in the AIX Commands Reference at http://www.ibm.com/servers/aix/library.


Writing Your Own Scripts

You can write your own scripts to use as actions for responses. The AIX Commands Reference contains information about predefined scripts that are provided with the Event Response resource manager. The following scripts are provided: logevent, notifyevent, and wallevent. You can also use existing operating system commands and user-written scripts in the definition of an action.

Using Predefined Response Scripts

The logevent, notifyevent, and wallevent scripts are examples of the types of actions that system administrators can use to respond to events. The logevent script appends a formatted string containing the specifics of an event to a user-specified file. Only the latest 65536 bytes are kept in the file. When the file size reaches its maximum, the oldest logged event is overwritten by the newest event. The alog command is used to read the user-specified log file. The notifyevent script captures the event information and sends the event information via UNIX mail to a specified userid. The wallevent script broadcasts a message to all users who are logged in.

For a full description of these scripts, see the man pages or the AIX Commands Reference.

You can use these scripts as-is or treat them as templates by copying and modifying them to create new scripts that suit your needs. For example, to use the wallevent script as a template for a page event command, do the following:

  1. Copy the wallevent script at /usr/sbin/rsct/bin/wallevent to a new script file and rename it, for example, to pageevent.
  2. Replace the wall command with the program for your pager.
Note:
These predefined scripts are in the /usr/sbin/rsct/bin directory. Because the Monitoring application relies on these scripts, do not modify the original scripts.

For a command to run in response to an event or a rearm event defined by a condition, the command must be included as an action in an Event Response resource. When an Event Response resource is defined, specify the entire path name for a script that is used within an action.

This is set up implicitly for you when you use the Monitoring application as follows:

  1. In the Responses plug-in, select a response and click the Properties toolbar icon.
  2. In the Response Properties dialog, click the Add button to display the Add Action dialog.
  3. Select the Run program option.
  4. Enter the fully qualified name of the script.

Test any scripts or commands that you have created or modified before you use them as actions in production.

Using Event Response Environment Variables

Once the Event Response resource manager (ERRM) has subscribed to RMC to monitor a condition and that condition occurs, the ERRM executes commands in the user's operating system environment. The Event Response resource contains a list of commands to be executed. Before each command is run, the following environment variables are established for the command to use (see Event Response Resource Manager for a detailed description of the ERRM):

(See Resource Handle for a definition and an example of a resource handle.)


Using Expressions

The information in this section is for advanced users who want to:

Permissible data types and operators are described and the order of precedence for the operators is included. RMC uses these functions to match a selection string against the persistent attributes of a resource and to implement the evaluation of an event expression or a rearm expression.

An expression is similar to a C language statement or the WHERE clause of a SQL query. It is composed of variables, operators, and constants. The C and SQL syntax styles may be intermixed within a single expression. The following table relates the SQL terminology to RMC terminology:

RMC               SQL
attribute name    column name
select string     WHERE clause
operators         predicates, logical connectives
resource class    table

SQL Restrictions

For SQL syntax, the following restrictions apply:

Supported Base Data Types

The term variable is used in this context to mean the column name or attribute name in an expression. Variables and constants in an expression may be one of the following data types that are supported by the RMC subsystem:

Symbolic Name         Description
CT_INT32              Signed 32-bit integer
CT_UINT32             Unsigned 32-bit integer
CT_INT64              Signed 64-bit integer
CT_UINT64             Unsigned 64-bit integer
CT_FLOAT32            32-bit floating point
CT_FLOAT64            64-bit floating point
CT_CHAR_PTR           Null-terminated string
CT_BINARY_PTR         Binary data - arbitrary-length block of data
CT_RSRC_HANDLE_PTR    Resource handle - an identifier for a resource that is unique over space and time (20 bytes)

Structured Data Types

In addition to the base data types, aggregates of the base data types may be used as well. The first aggregate data type is similar to a structure in C in that it can contain multiple fields of different data types. This aggregate data type is referred to as structured data (SD). The individual fields in the structured data are referred to as structured data elements or simply elements. Each element of a structured data type may have a different data type, which can be one of the base types in the preceding table or any of the array types discussed in the next section, except for the structured data array.

The second aggregate data type is an array. An array contains zero or more values of the same data type, such as an array of CT_INT32 values. Each of the array types has an associated enumeration value (CT_INT32_ARRAY, CT_UINT32_ARRAY). Structured data may also be defined as an array but is restricted to have the same elements in every entry of the array.

Data Types That Can Be Used for Literal Values

Literal values can be specified for each of the base data types as follows:

Array
An array or list of values may be specified by enclosing variables or literal values, or both, within braces {} or parentheses () and separating each element of the list with a comma. For example: { 1, 2, 3, 4, 5 } or ( "abc", "def", "ghi" )

Entries of an array can be accessed by specifying a subscript as in the C programming language. The index corresponding to the first element of the array is always zero; for example, List[2] references the third element of the array named List. Only one subscript is allowed. It may be a variable, a constant, or an expression that produces an integer result. For example, if List is an integer array, then List[2]+4 produces the sum of 4 and the current value of the third entry of the array.

Binary Data
A binary constant is defined by a sequence of hexadecimal values, separated by white space. All hexadecimal values comprising the binary data constant are enclosed in double quotation marks. Each hexadecimal value includes an even number of hexadecimal digits, and each pair of hexadecimal digits represents a byte within the binary value. For example:
"0xabcd 0x01020304050607090a0b0c0d0e0f1011121314"
 

Character Strings
A string is specified by a sequence of characters surrounded by single or double quotation marks (you can have any number of characters, including none). Any character may be used within the string except the null '\0' character. Double quotation marks and backslashes may be included in strings by preceding them with the backslash character.

Floating Types
These types can be specified by the following syntax:

Integer Types
These types can be specified in decimal, octal, or hexadecimal format. Any value that begins with the digits 1-9 and is followed by zero or more decimal digits (0-9) is interpreted as a decimal value. A decimal value is negated by preceding it with the character '-'. Octal constants are specified by the digit 0 followed by 1 or more digits in the range 0-7. Hexadecimal constants are specified by a leading 0 followed by the letter x (uppercase or lowercase) and then followed by a sequence of one or more digits in the range 0-9 or characters in the range a-f (uppercase or lowercase).

Resource Handle
A fixed-size entity that consists of two 16-bit and four 32-bit words of data. A literal resource handle is specified by a group of six hexadecimal integers. The first two values represent 16-bit integers and the remaining four each represent a 32-bit word. Each of the six integers is separated by white space. The group is surrounded by double quotation marks. The following is an example of a resource handle:
"0x4018 0x0001 0x00000000 0x0069684c 0x00519686 0xaf7060fc"
 

Structured Data
Structured data values can be referenced only through variables. Nevertheless, the RMC command-line interface displays structured data (SD) values and accepts them as input when a resource is defined or changed. A literal SD is a sequence of literal values, as defined in Data Types That Can Be Used for Literal Values, that are separated by commas and enclosed in square brackets. For example, ['abc',1,{3,4,5}] specifies an SD that consists of three elements: (a) the string 'abc', (b) the integer value 1, and (c) the three-element array {3,4,5}.
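
The literal formats described above for integers, binary data, and resource handles can be sketched in Python. This is an illustrative reading of the rules in this section, not the RMC parser: the function names are hypothetical, the functions take the text between the surrounding double quotation marks, a bare 0 is accepted as octal zero by assumption (the rules as written do not cover it), and the big-endian byte order used to pack the resource handle is also an assumption.

```python
import re
import struct


def parse_int_literal(text):
    """Parse a decimal, octal, or hexadecimal integer literal."""
    if re.fullmatch(r"-?[1-9][0-9]*", text):
        return int(text, 10)          # decimal, optionally negated with '-'
    if re.fullmatch(r"0[0-7]*", text):
        return int(text, 8)           # octal; bare "0" accepted by assumption
    if re.fullmatch(r"0[xX][0-9a-fA-F]+", text):
        return int(text, 16)          # hexadecimal
    raise ValueError(f"not an integer literal: {text!r}")


def parse_binary_constant(text):
    """Convert whitespace-separated hexadecimal values into bytes."""
    chunks = []
    for token in text.split():
        # Each value needs an even number of hexadecimal digits (one pair per byte).
        if not re.fullmatch(r"0[xX](?:[0-9a-fA-F]{2})+", token):
            raise ValueError(f"bad hexadecimal value: {token!r}")
        chunks.append(bytes.fromhex(token[2:]))
    return b"".join(chunks)


def parse_resource_handle(text):
    """Pack six hexadecimal integers (two 16-bit, four 32-bit) into 20 bytes."""
    tokens = text.split()
    if len(tokens) != 6:
        raise ValueError("a resource handle has exactly six values")
    for token in tokens:
        if not re.fullmatch(r"0[xX][0-9a-fA-F]+", token):
            raise ValueError(f"bad hexadecimal value: {token!r}")
    values = [int(token, 16) for token in tokens]
    # ">HHIIII": big-endian (assumed), two unsigned 16-bit then four unsigned 32-bit words.
    return struct.pack(">HHIIII", *values)


handle = parse_resource_handle(
    "0x4018 0x0001 0x00000000 0x0069684c 0x00519686 0xaf7060fc"
)
print(len(handle))
```

struct.pack raises struct.error if any of the six values does not fit its 16-bit or 32-bit slot, which gives a basic range check.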

Variable names refer to values that are not part of the expression but are accessed during the execution of the expression. For example, when RMC processes an expression, the variable names are replaced by the corresponding persistent or dynamic attributes of each resource.

Entries of an array may be accessed by specifying a subscript as in 'C'. The index corresponding to the first element of the array is always 0 (for example, List[2] refers to the third element of the array named List). Only one subscript is allowed. It may be a variable, a constant, or an expression that produces an integer result. A subscripted value may be used wherever the base data type of the array is used. For example, if List is an integer array, then "List[2]+4" produces the sum of 4 and the current value of the third entry of the array.

The elements of a structured data value can be accessed by using the following syntax:

<variable name>.<element name>

For example, a.b

The variable name is the name of the table column or resource attribute, and the element name is the name of the element within the structured data value. Either or both names may be followed by a subscript if the name is an array. For example, a[10].b refers to the element named b of the 11th entry of the structured data array called a. Similarly, a[10].b[3] refers to the fourth element of the array that is an element called b within the same structured data array entry a[10].

How Variable Names Are Handled

Variable names refer to values that are not part of an expression but are accessed during the execution of the expression. When used to select a resource, the variable name is a persistent attribute. When used to generate an event, the variable name is a dynamic attribute. When used to select audit records, the variable name is the name of a field within the audit record.

A variable name is restricted to include only 7-bit ASCII characters that are alphanumeric (a-z, A-Z, 0-9) or the underscore character (_). The name must begin with an alphabetic character. When the expression is used by the RMC subsystem for an event or a rearm event, the name can have a suffix that is the '@' character followed by 'P', which refers to the previous observation.
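
The naming rule above can be checked with a short regular expression. The following Python sketch is only an illustration of that rule; the function name split_variable is hypothetical, and IntList is borrowed from the examples later in this chapter.

```python
import re

# Alphabetic first character, then 7-bit ASCII alphanumerics or underscores,
# with an optional "@P" suffix that refers to the previous observation.
VARIABLE_NAME = re.compile(r"(?P<base>[A-Za-z][A-Za-z0-9_]*)(?P<previous>@P)?")


def split_variable(name):
    """Return (base name, refers to previous observation) or raise ValueError."""
    match = VARIABLE_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"not a valid variable name: {name!r}")
    return match.group("base"), match.group("previous") is not None


print(split_variable("IntList"))
print(split_variable("IntList@P"))
```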

Operators That Can Be Used in Expressions

Constants and variables may be combined by an operator to produce a result that in turn may be used with another operator. The resulting data type of the expression must be a scalar integer or floating-point value. If the result is zero, the expression is considered to be FALSE; otherwise, it is TRUE.

Note:
Blanks are optional around operators and operands unless their omission causes an ambiguity. An ambiguity typically occurs only with the word form of operator (that is, AND, OR, IN, LIKE, etc). With these operators, a blank or separator, such as a parenthesis or bracket, is required to distinguish the word operator from an operand. For example, aANDb is ambiguous. Is this intended to be the variable name aANDb or is it intended to be the variable names a, b combined with the operator AND? It is actually interpreted by the application as a single variable name aANDb. With non-word operators (for example, +, -, =, &&, etc.) this ambiguity does not exist, and therefore, blanks are optional.

The set of operators that can be used in strings is summarized in the following table:

Operator                Description             Left Data Types  Right Data Types  Example                                         Notes
+                       Addition                Integer, float   Integer, float    "1+2" results in 3                              None
-                       Subtraction             Integer, float   Integer, float    "1.0-2.0" results in -1.0                       None
*                       Multiplication          Integer, float   Integer, float    "2*3" results in 6                              None
/                       Division                Integer, float   Integer, float    "2/3" results in 1                              None
-                       Unary minus             None             Integer, float    "-abc"                                          None
+                       Unary plus              None             Integer, float    "+abc"                                          None
..                      Range                   Integers         Integers          "1..3" results in 1,2,3                         Shorthand for all integers between and including the two values
%                       Modulo                  Integers         Integers          "10%2" results in 0                             None
|                       Bitwise OR              Integers         Integers          "2|4" results in 6                              None
&                       Bitwise AND             Integers         Integers          "3&2" results in 2                              None
~                       Bitwise complement      None             Integers          "~0x0000ffff" results in 0xffff0000             None
^                       Exclusive OR            Integers         Integers          "0x0000aaaa^0x0000ffff" results in 0x00005555   None
>>                      Right shift             Integers         Integers          "0x0fff>>4" results in 0x00ff                   None
<<                      Left shift              Integers         Integers          "0x0ffff<<4" results in 0xffff0                 None
==, =                   Equality                All but SDs      All but SDs       "2==2" results in 1; "2=2" results in 1         Result is true (1) or false (0)
!=, <>                  Inequality              All but SDs      All but SDs       "2!=2" results in 0; "2<>2" results in 0        Result is true (1) or false (0)
>                       Greater than            Integer, float   Integer, float    "2>3" results in 0                              Result is true (1) or false (0)
>=                      Greater than or equal   Integer, float   Integer, float    "4>=3" results in 1                             Result is true (1) or false (0)
<                       Less than               Integer, float   Integer, float    "4<3" results in 0                              Result is true (1) or false (0)
<=                      Less than or equal      Integer, float   Integer, float    "2<=3" results in 1                             Result is true (1) or false (0)
=~                      Pattern match           Strings          Strings           "abc"=~"a.*" results in 1                       Right operand is interpreted as an extended regular expression
!~                      Not pattern match       Strings          Strings           "abc"!~"a.*" results in 0                       Right operand is interpreted as an extended regular expression
=?, LIKE, like          SQL pattern match       Strings          Strings           "abc"=? "a%" results in 1                       Right operand is interpreted as a SQL pattern
!?, NOT LIKE, not like  Not SQL pattern match   Strings          Strings           "abc"!? "a%" results in 0                       Right operand is interpreted as a SQL pattern
|<, IN, in              Contains any            All but SDs      All but SDs       "{1..5}|<{2,10}" results in 1                   Result is true (1) if left operand contains any value from right operand
><, NOT IN, not in      Contains none           All but SDs      All but SDs       "{1..5}><{2,10}" results in 0                   Result is true (1) if left operand contains no value from right operand
&<                      Contains all            All but SDs      All but SDs       "{1..5}&<{2,10}" results in 0                   Result is true (1) if left operand contains all values from right operand
||, OR, or              Logical OR              Integers         Integers          "(1<2)||(2>4)" results in 1                     Result is true (1) or false (0)
&&, AND, and            Logical AND             Integers         Integers          "(1<2)&&(2>4)" results in 0                     Result is true (1) or false (0)
!, NOT, not             Logical NOT             None             Integers          "!(2==4)" results in 1                          Result is true (1) or false (0)
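
The range and containment operators can be modeled in a few lines of Python. This is a sketch of the semantics given in the Notes column, not the RMC implementation, and the function names are hypothetical.

```python
def expand_range(low, high):
    """Model the .. operator: all integers between and including the two values."""
    return list(range(low, high + 1))


def contains_any(left, right):
    """Model |< (IN): 1 if left contains any value from right, else 0."""
    return int(any(value in left for value in right))


def contains_none(left, right):
    """Model >< (NOT IN): 1 if left contains no value from right, else 0."""
    return int(not any(value in left for value in right))


def contains_all(left, right):
    """Model &<: 1 if left contains all values from right, else 0."""
    return int(all(value in left for value in right))


left = expand_range(1, 5)   # {1..5}
right = [2, 10]             # {2,10}
print(contains_any(left, right), contains_none(left, right), contains_all(left, right))
```

With {1..5} on the left and {2,10} on the right, only the contains-any test succeeds under these definitions, because 2 is present in the left operand and 10 is not.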

When integers of different signs or size are operands of an operator, standard C style casting is implicitly performed. When an expression with multiple operators is evaluated, the operations are performed in the order defined by the precedence of the operator. The default precedence can be overridden by enclosing the portion or portions of the expression to be evaluated first in parentheses (). For example, in the expression "1+2*3", multiplication is normally performed before addition to produce a result of 7. To evaluate the addition operator first, use parentheses as follows: "(1+2)*3". This produces a result of 9. The default precedence rules are shown in the following table. All operators in the same table cell have the same or equal precedence.

Operators             Description
.                     Structured data element separator
~                     Bitwise complement
!, NOT, not           Logical not
-                     Unary minus
+                     Unary plus
*                     Multiplication
/                     Division
%                     Modulo
+                     Addition
-                     Subtraction
<<                    Left shift
>>                    Right shift
<                     Less than
<=                    Less than or equal
>                     Greater than
>=                    Greater than or equal
==                    Equality
!=                    Inequality
=?, LIKE, like        SQL match
!?                    SQL not match
=~                    Regular expression match
!~                    Regular expression not match
?=                    Regular expression match (compat)
|<, IN, in            Contains any
><, NOT IN, not in    Contains none
&<                    Contains all
&                     Bitwise AND
^                     Bitwise exclusive OR
|                     Bitwise inclusive OR
&&                    Logical AND
||                    Logical OR
,                     List separator

Pattern Matching

Two types of pattern matching are supported: extended regular expressions and patterns that are compatible with the standard SQL LIKE predicate. A SQL LIKE pattern may include the following special characters:

Examples of Expressions

Some examples of the types of expressions that can be constructed follow:

  1. The following expressions match all rows or resources that have a name that begins with 'tr' and ends with '0', where 'Name' indicates the column or attribute that is to be used in the evaluation:
    Name =~'tr.*0'

    Name LIKE 'tr%0'

  2. The following expressions evaluate to TRUE for all rows or resources that contain 1, 3, 5, 6, or 7 in the column or attribute that is called IntList, which is an array:
    IntList|<{1,3,5..7}

    IntList in (1,3,5..7)

  3. The following expression combines the previous two so that all rows and resources that have a name beginning with 'tr' and ending with '0' and have 1, 3, 5, 6, or 7 in the IntList column or attribute will match:
    (Name LIKE "tr%0")&&(IntList|<(1,3,5..7))

    (Name=~'tr.*0') AND (IntList IN {1,3,5..7})
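
The combined expression in example 3 can be traced against sample data with a short Python sketch. It is an illustration only, not how RMC evaluates expressions. It assumes the standard SQL LIKE meanings of % (any sequence of characters) and _ (any single character), and it assumes that a pattern must match the whole string. The sample rows and function names are hypothetical.

```python
import re


def like_to_regex(pattern):
    """Translate a SQL LIKE pattern into an equivalent regular expression."""
    parts = []
    for char in pattern:
        if char == "%":
            parts.append(".*")            # % matches any sequence of characters
        elif char == "_":
            parts.append(".")             # _ matches any single character
        else:
            parts.append(re.escape(char)) # everything else matches literally
    return "".join(parts)


def name_matches(name):
    """Model: Name LIKE 'tr%0' (whole-string match is an assumption)."""
    return re.fullmatch(like_to_regex("tr%0"), name) is not None


def intlist_matches(values):
    """Model: IntList|<{1,3,5..7}"""
    wanted = {1, 3, *range(5, 8)}         # 5..7 expands to 5, 6, 7
    return any(value in wanted for value in values)


# Hypothetical rows, each with a Name string and an IntList array.
rows = [
    {"Name": "tr0", "IntList": [2, 6]},
    {"Name": "tr10", "IntList": [2, 4]},
    {"Name": "eth0", "IntList": [1]},
]

selected = [
    row["Name"]
    for row in rows
    if name_matches(row["Name"]) and intlist_matches(row["IntList"])
]
print(selected)
```

Only the first row satisfies both conditions: the second has a matching name but none of the listed integers, and the third has a listed integer but a name that does not match.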
    

