syschk Command

For Windows: Not Supported

The syschk BASIC program checks a system running under UNIX to detect abnormal situations.

The syschk command starts a phantom process that periodically checks whether the system is behaving normally. If a system parameter goes beyond a threshold defined by the system administrator, a message is sent to one or more users and, optionally, an entry is put in the errors file. All elements are optional. The following elements are monitored:

UNIX Swap

When syschk is started, it always ensures that the UNIX swap (or paging) space is at least twice the physical memory. It then periodically checks that swap usage does not exceed a predefined level (90% of the total available space by default).

Total System CPU

The percentage of CPU time spent in system mode (kernel, drivers, and so on) must stay below a predetermined level (25% by default).
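On Linux, one way to compute the system-mode share of CPU time is to read the aggregate cpu line of /proc/stat twice, one sampling period apart, and compare the counter deltas. This page names sar as one of the tools syschk relies on, so the following is only a minimal sketch of the arithmetic under that assumption, not syschk's own method; parse_cpu_line and system_cpu_pct are hypothetical names.

```python
from typing import List


def parse_cpu_line(line: str) -> List[int]:
    """Parse the aggregate 'cpu' line of /proc/stat into its tick counters."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line from /proc/stat")
    return [int(value) for value in fields[1:]]


def system_cpu_pct(before: List[int], after: List[int]) -> float:
    """Percentage of CPU time spent in system mode between two readings.

    Counter order: user nice system idle iowait irq softirq steal guest guest_nice.
    Guest time is already included in user/nice, so only the first eight
    counters are summed to avoid counting it twice.
    """
    deltas = [new - old for old, new in zip(before, after)]
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return deltas[2] * 100.0 / total  # index 2 is the 'system' counter


syscpu_limit = 25.0  # the documented default for syscpu
before = parse_cpu_line("cpu 100 0 50 800 50 0 0 0")  # illustrative readings
after = parse_cpu_line("cpu 130 0 80 1020 70 0 0 0")
pct = system_cpu_pct(before, after)
print(f"system CPU {pct:.1f}%", "abnormal" if pct > syscpu_limit else "ok")
```

On a live system, the two strings would be the first line of /proc/stat captured at the start and end of the sampling period.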

Runaway Processes

Each time a sample is taken, the CPU time of the active processes is checked. If a process has consumed more than a predefined percentage of the sampling period, a warning is issued. For example, if the sampling period is 10 minutes (600 seconds) and a process consumed more than 5% of this time (30 seconds, which is an enormous amount of CPU), the process is probably in an abnormal tight CPU loop. syschk displays the UNIX status of the process (ps) and, if it is a D3 process, the result of a where. If a process exceeds the limit more than three times in a row, reporting of the error stops. This ensures that the system administrator does not receive constant messages for a process that is, for example, running a large report. If the process is still running after nine samples, reporting resumes for three more samples, and the reporting cycle starts over.
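The runaway rule above combines a threshold (a percentage of the sampling period) with a reporting cycle. The following minimal sketch shows both; it is not syschk's code. The names are hypothetical, and the nine-sample cycle (report the first three consecutive exceedances, stay silent until nine samples have passed, then start over) is an assumed reading of the description.

```python
REPORTS_PER_CYCLE = 3  # consecutive exceedances that are reported
CYCLE_LENGTH = 9       # assumed: samples per reporting cycle


def exceeds_limit(cpu_seconds: float, sampling_seconds: float, proccpu_pct: float) -> bool:
    """True if the process used more than proccpu_pct percent of the sampling period."""
    return cpu_seconds > sampling_seconds * proccpu_pct / 100.0


def should_report(consecutive_exceedances: int) -> bool:
    """Report the first three exceedances of each cycle and stay silent for the rest."""
    if consecutive_exceedances < 1:
        return False
    return (consecutive_exceedances - 1) % CYCLE_LENGTH < REPORTS_PER_CYCLE


# The example from the text: a 600-second period and a 5% limit give a 30-second threshold.
print(exceeds_limit(31, 600, 5))
print(exceeds_limit(30, 600, 5))  # not reported: the process must use *more than* 30 s
print([n for n in range(1, 19) if should_report(n)])
```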

UNIX File System Usage

The state of a predetermined list of UNIX file systems is checked to make sure they do not get full (used over 90%). This prevents UNIX crashes caused by critical file systems filling up. By default, only the root file system (/) is checked.

Overflow Usage

When the used D3 overflow exceeds a predetermined percentage of the total overflow space, a message is generated. The default level is 90%. Overflow is reported only once a day, at the first sample taken after noon.
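The once-a-day rule for overflow can be expressed as two independent tests: is the threshold exceeded, and is this the first sample after noon on a day with no report yet. This is a minimal sketch with hypothetical function names; it assumes a sample taken exactly at 12:00 counts as "after noon".

```python
from datetime import date, datetime
from typing import Optional


def overflow_exceeded(used: float, total: float, ovfusg_pct: float = 90.0) -> bool:
    """True when used overflow is above ovfusg_pct percent of the total overflow space."""
    return total > 0 and used * 100.0 / total > ovfusg_pct


def overflow_report_due(now: datetime, last_report: Optional[date]) -> bool:
    """True for the first sample at or after 12:00 (assumed) on a day with no report yet."""
    return now.hour >= 12 and last_report != now.date()


last_report: Optional[date] = None
for sample_time in (datetime(2024, 3, 11, 11, 30), datetime(2024, 3, 11, 12, 0),
                    datetime(2024, 3, 11, 12, 30), datetime(2024, 3, 12, 12, 0)):
    if overflow_exceeded(95, 100) and overflow_report_due(sample_time, last_report):
        print("overflow message at", sample_time)
        last_report = sample_time.date()
```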

Basic Usage

Monitors the usage of the FlashBASIC basic area. When the used basic space exceeds a predetermined percentage of the total basic space, a message is generated. The default level is 90%. To examine the basic space in more detail, use the shpstat command.

The syschk command must be run from the dm account. Only one instance of a phantom process running syschk is supported at any given time.

Without any argument, the syschk command reports whether syschk is currently running and displays when it was started, the parameters it is using, and some current system parameter values. With one or more arguments, syschk creates a phantom process and returns immediately to TCL. Arguments can be specified in any order.

keyword{=value}

The argument specifies which system parameters to control:

sampling=[sec|hh:mm:ss]

Specifies the sampling delay. The delay can be expressed either in seconds or in the format hh:mm:ss. If sampling is not specified, the default is 30 minutes.
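Both accepted forms of sampling reduce to a number of seconds. A minimal sketch of that conversion follows; parse_sampling is a hypothetical helper, not part of syschk.

```python
from typing import Optional

DEFAULT_SAMPLING_SECONDS = 30 * 60  # documented default: 30 minutes


def parse_sampling(value: Optional[str]) -> int:
    """Convert a sampling= value ("600" or "00:10:00") to seconds.

    Raises ValueError for malformed input such as "10:00" or "abc".
    """
    if value is None or value == "":
        return DEFAULT_SAMPLING_SECONDS
    if ":" in value:
        hours, minutes, seconds = (int(part) for part in value.split(":"))
        return hours * 3600 + minutes * 60 + seconds
    return int(value)


print(parse_sampling("00:30:00"), parse_sampling("600"), parse_sampling(None))
```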

notify=[user{,...},!n{,...},/dev/ttyXX{,...},*|off]

Specifies the list of D3 users, D3 port numbers, or UNIX devices to notify in case of an abnormal situation. The recipients can be specified as explicit D3 user-IDs, D3 port numbers in decimal prefixed by an exclamation point (!), UNIX devices prefixed by a forward slash (/), or any combination of these. If an asterisk (*) is used, all users logged on to the system are notified. If a period (.) is used, it is replaced by the tty name. If off is specified, notification is disabled. If notify is not specified, or if the specified users are not logged on when the anomaly occurs, a message is sent to dm or SYSPROG. If a UNIX device is specified, it must exist and be writable when syschk is started.
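The prefixes described above determine how each notify entry is interpreted. This minimal sketch classifies a notify value into (kind, target) pairs. classify_notify and the kind labels are hypothetical, and current_tty stands in for whatever tty name syschk substitutes for a period.

```python
from typing import List, Tuple


def classify_notify(value: str, current_tty: str) -> List[Tuple[str, str]]:
    """Split a notify= value into (kind, target) pairs; [] means notification is off."""
    if value.strip() == "off":
        return []
    targets: List[Tuple[str, str]] = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate empty items such as "bob,,!0"
        if entry == "*":
            targets.append(("all-users", "*"))  # everyone logged on
        elif entry == ".":
            targets.append(("tty", current_tty))  # period replaced by the tty name
        elif entry.startswith("!"):
            targets.append(("port", str(int(entry[1:]))))  # decimal D3 port number
        elif entry.startswith("/"):
            targets.append(("device", entry))  # UNIX device path
        else:
            targets.append(("user", entry))  # D3 user-ID
    return targets


print(classify_notify("/dev/tty0,bob,!0", "/dev/tty3"))
```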

syscpu{= percentage {%}}

Specifies the maximum percentage of total CPU time that UNIX is allowed to spend in system mode. If no percentage is specified, the default is 25%.

proccpu{= percentage {%}}

Specifies the maximum percentage of CPU a single process is allowed to use. If no percentage is specified, the default is 25%. This trigger point can be difficult to evaluate. For example, on a system with only one active user running a CPU-intensive FlashBASIC program, that process takes 100% of the CPU, since no other process is running. To avoid false alarms, select a sufficiently long sampling period: it is probably unusual for a process to use 100% of the CPU for 15 minutes.

swapusg{= percentage {%}}

Specifies the acceptable swap usage at any given time. The percentage is the amount of swap actually in use. If no percentage is specified, swap usage above 90% of the total swap space is considered abnormal.

ovfusg{= percentage {%}}

Specifies the acceptable overflow usage at any given time. The percentage is the amount of overflow actually in use. If no percentage is specified, overflow usage above 90% of the total D3 space is considered abnormal. Errors are reported only once a day.

basicusg{= percentage {%}}

Specifies the acceptable basic usage at any given time. The percentage is the amount of basic space actually in use. If no percentage is specified, basic usage above 90% of the total basic space is considered abnormal.
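The five percentage keywords share the same form: the value and the trailing % are both optional, and a bare keyword falls back to its default. A minimal sketch of that parsing, using the defaults stated above; parse_percent_keyword and DEFAULT_PERCENT are hypothetical names.

```python
from typing import Tuple

# Defaults as stated in this page for each percentage keyword.
DEFAULT_PERCENT = {
    "syscpu": 25.0,
    "proccpu": 25.0,
    "swapusg": 90.0,
    "ovfusg": 90.0,
    "basicusg": 90.0,
}


def parse_percent_keyword(arg: str) -> Tuple[str, float]:
    """Parse 'keyword', 'keyword=N' or 'keyword=N%' into (keyword, percentage)."""
    keyword, _, value = arg.partition("=")
    if keyword not in DEFAULT_PERCENT:
        raise ValueError(f"not a percentage keyword: {keyword!r}")
    value = value.strip()
    if value.endswith("%"):
        value = value[:-1]  # the % sign is optional
    if not value:
        return keyword, DEFAULT_PERCENT[keyword]
    return keyword, float(value)


for arg in ("syscpu=10%", "swapusg=60", "proccpu"):
    print(parse_percent_keyword(arg))
```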

diskusg= filesystem{,filesystem,..}

Specifies the list of UNIX file systems that should never get full (over 90%). The list should always include the root file system (/). Depending on the system, the /usr and /tmp file systems may also have to be included. Some UNIX systems are not able to boot with a full root file system. If diskusg is not specified, the root file system (/) is the only UNIX file system checked.
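The diskusg check amounts to computing the used percentage of each listed file system and flagging those over 90%. The page says syschk uses some form of df. This minimal sketch instead uses Python's shutil.disk_usage, so its figures may differ slightly from df, which excludes blocks reserved for root when it computes Use%. full_filesystems is a hypothetical name.

```python
import shutil
from typing import List, Tuple


def full_filesystems(mount_points: List[str], limit_pct: float = 90.0) -> List[Tuple[str, float]]:
    """Return (mount point, used %) for each file system above limit_pct.

    A missing or unreachable mount point raises OSError, roughly the
    situation the "Cannot get FS" message describes.
    """
    over = []
    for mount in mount_points:
        usage = shutil.disk_usage(mount)
        if usage.total == 0:
            continue  # pseudo file systems report no capacity
        used_pct = usage.used * 100.0 / usage.total
        if used_pct > limit_pct:
            over.append((mount, round(used_pct, 1)))
    return over


print(full_filesystems(["/"]))
```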

log

Logs messages in the errors file. This keyword is equivalent to the (l) option.

nolog

Does not log messages in the errors file. This keyword is equivalent to not having the (l) option. It supersedes the (l) option.

When started as a phantom, syschk runs indefinitely, until the system is shut down. To stop it, use syschk stop.

Syntax

syschk keyword{=value} {...} {(options}
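The syntax rules in this page (keywords in any order, single-letter options after an opening parenthesis, and nolog superseding the (l) option) can be sketched as a small tokenizer. This is a hypothetical illustration of how the arguments combine, not syschk's parser. It assumes whitespace-separated words and an optional closing parenthesis.

```python
from typing import Dict, Optional, Set, Tuple


def parse_syschk_args(command_line: str) -> Tuple[Dict[str, Optional[str]], Set[str]]:
    """Split a syschk command line into {keyword: value or None} and a set of option letters."""
    words = command_line.split()
    if words and words[0] == "syschk":
        words = words[1:]
    keywords: Dict[str, Optional[str]] = {}
    options: Set[str] = set()
    for word in words:
        if word.startswith("("):
            options.update(word.strip("()"))  # "(lq" -> {"l", "q"}
        else:
            name, sep, value = word.partition("=")
            keywords[name] = value if sep else None
    return keywords, options


def logging_enabled(keywords: Dict[str, Optional[str]], options: Set[str]) -> bool:
    """nolog supersedes the (l) option; otherwise log or (l) turns logging on."""
    if "nolog" in keywords:
        return False
    return "log" in keywords or "l" in options


kw, opts = parse_syschk_args("syschk sampling=00:30:00 syscpu=10% log (q")
print(kw, opts, logging_enabled(kw, opts))
```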

Parameter(s)

start

Starts the syschk command running as a phantom with the same arguments as the last time it was started. If necessary, the running process is stopped before it is restarted. If syschk was never started, a set of defaults is applied.

edit

Enters the Update Processor to allow editing of the arguments used by the start command. If any edits are made to the arguments, the syschk process must be stopped and restarted for the edits to take effect.

now

Instructs the syschk phantom process to take a sample immediately, and sends the anomaly messages, if any, to the terminal requesting the sample, instead of the normal notify mechanism.

stop

Stops the syschk command running as a phantom.

options

f

Starts syschk in the foreground on the current process instead of as a phantom process. With this option, the only ways to stop the process are a break/end or a logoff.

l

Logs a short summary of the error messages to the errors file. The initial swap space check is always logged. This option can also be specified with the log keyword.

q

Quiet option. Suppresses the user messages (for example, started and stopped).

v

Verbose option. Can be used when syschk runs in the foreground instead of as a phantom.

 

CAUTION

syschk uses UNIX system commands such as sar, ps, and others. If the process that executes this command does not have root privileges, it may be necessary to change the permissions of the required utilities for these commands to work. If syschk fails to access a system parameter, an error message such as Cannot get swap usage is sent to the system administrator (see the error messages below). Usually, setting the set-user-ID and set-group-ID bits gives the tools adequate permissions. For example, to run sar as a nonroot user on AIX:

chmod 06755 /usr/sbin/sar

chmod 06755 /usr/lib/sa/sadc

This command can invoke some very intrusive UNIX commands. Do not run it with a sampling period shorter than a few minutes.

The UNIX parameters are obtained through a variety of more or less platform-dependent UNIX commands, and on a given UNIX version there are often several ways of obtaining the same information. Therefore, the results shown by syschk may differ from those provided by other UNIX tools, and may be very difficult to compare with results from another system. For example, a given application may cause syschk to report heavy CPU usage on one UNIX platform and not on another, all other factors being equal (if that is possible).

When a UNIX device is specified in the notify parameter, the messages are written asynchronously to the device, even if it is not logged on to either UNIX or D3. It is a good idea to specify the UNIX console as a notify device.

Error messages include:

Cannot get swap usage: syschk

Could not get the swap information. These are the UNIX commands used:

AIX: 'lsps -a'

Linux: !cat /proc/meminfo |grep Swap

NOTE—The Linux command used is case sensitive.

Cannot get real memory: syschk

Could not get the amount of physical memory. These are the UNIX commands used:

AIX: 'lsattr -l sys0 -E'

Linux: !cat /proc/meminfo |grep MemTotal

NOTE—The Linux command used is case sensitive.

Cannot get proc list: syschk

Could not get information on the currently active processes. On all systems, this is obtained with ps -ef.

Cannot get FS: syschk

Could not get information on the UNIX file systems. On all systems, this information is obtained with some form of df. The most common cause is an unreachable remote file system.
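On Linux, the swap and memory figures come from /proc/meminfo, whose fields include MemTotal, SwapTotal and SwapFree (values in kB). This minimal sketch parses that text and applies the two swap rules in this page: swap of at least twice the physical memory, and used swap below the swapusg limit. Function names are hypothetical. The sample values are illustrative, chosen to match the 128 Mb total and 76 Mb used in the example output.

```python
from typing import Dict

# Illustrative /proc/meminfo excerpt (kB); not taken from a real system.
SAMPLE_MEMINFO = """MemTotal:          65536 kB
MemFree:           12288 kB
SwapTotal:        131072 kB
SwapFree:          53248 kB
"""


def parse_meminfo(text: str) -> Dict[str, int]:
    """Map each /proc/meminfo field name to its first numeric value."""
    values: Dict[str, int] = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            values[key.strip()] = int(fields[0])
    return values


def swap_report(info: Dict[str, int], swapusg_pct: float = 90.0) -> Dict[str, object]:
    """Apply the swap sizing rule and the swap usage threshold."""
    swap_total = info["SwapTotal"]
    swap_used = swap_total - info["SwapFree"]
    used_pct = swap_used * 100.0 / swap_total if swap_total else 0.0
    return {
        "swap_at_least_twice_ram": swap_total >= 2 * info["MemTotal"],
        "used_pct": used_pct,
        "over_limit": used_pct > swapusg_pct,
    }


print(swap_report(parse_meminfo(SAMPLE_MEMINFO), swapusg_pct=60.0))
```

With these sample values, 76 Mb of 128 Mb is about 59% used, which stays under the 60% limit from the example command.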

Example(s)

This command starts a phantom process that checks the system every 30 minutes for system CPU usage above 10%, swap usage above 60%, and a runaway-process limit of 10%. In case of an anomaly, a message is sent to the UNIX terminal /dev/tty0, to the D3 user bob if he is logged on, and to line 0 whether or not it is logged on, and a short message is logged in the errors file. The other parameters are left at their default values.

syschk sampling=00:30:00 syscpu=10% notify=/dev/tty0,bob,!0 swapusg=60% proccpu=10% log

Checks whether syschk is running.

syschk

syschk is running on port 132

Started on 03/11/94 at 08:20:21

Current running parameters:

Sampling period 00:30:00

Notify list /dev/tty0 bob !0

Maximum system CPU % 10

Maximum CPU % per process 25

Maximum % of swap 60

Maximum % of overflow 90

UNIX file systems /

Log messages (0=no;1=yes) 1

Current System Status:

User CPU usage 3%

System CPU usage 11%

Waiting for IO 82%

Idle CPU 4%

Total swap space 128 Mb

Used swap space 76 Mb (59%)

Stops the phantom running syschk as a background process, suppressing the "stopped" message. This command could be included in the user-shutdown macro.

syschk stop (q

Restarts the syschk phantom with the same parameters.

syschk start

Edits the syschk command line to change the arguments. The command line is edited in the Update Processor.

syschk edit

See Also

buffers Command, Performance Monitoring, shpstat Command, system-coldstart Macro