syschk command

The syschk BASIC program checks a system running under UNIX to detect abnormal situations.

For Windows: Not Supported

Syntax

 
  syschk keyword{=value} {...} {(options}

Parameter(s)

start Starts the syschk command running as a phantom with the same argument as the last time it was started. If necessary, the process is stopped before it is restarted. If the process was never started, a set of defaults is applied.
edit Enters the Update processor to allow editing the arguments used by the start command. Note that if any edits are made to the arguments, the syschk process needs to be stopped and restarted for the edits to take effect.
now Instructs the syschk phantom process to take a sample immediately, and sends the anomaly messages, if any, to the terminal requesting the sample, instead of the normal notify mechanism.
stop Stops the syschk command running as a phantom.
options
a Logs a short summary of all error messages to the errors file.

The initial swap space check is always logged. This option can also be specified by using the logall keyword.

f Starts syschk in foreground on the current process, as opposed to a phantom process. With this option, the only way to stop the process is to do a break/end, or a logoff.
l Logs a short summary of the error messages to the errors file.

The initial swap space check is always logged. This option can also be specified by using the log keyword.

q Quiet option suppresses the user messages (for example, started, stopped).
v Verbose option can be used if syschk runs in the foreground, instead of a phantom.
CAUTION:
The syschk command uses UNIX system commands, such as sar, ps, and some others. If the process that executes this command does not have root privilege, it might be necessary to change the permissions of the required utilities to make these commands work.

If the syschk command fails to access a system parameter, an error message, such as: Cannot get swap usage, is sent to the system administrator. See the following error messages. Usually, changing the set user-ID bit gives adequate permissions to the tools.

For example, to run sar as non-root on AIX:

 chmod 06755 /usr/sbin/sar
 chmod 06755 /usr/lib/sa/sadc

The swap usage is obtained with these commands:

 AIX: 'lsps -a'
 Linux: !cat /proc/meminfo |grep Swap

This command can use some very intrusive UNIX commands. Do not run it with a sampling period less than a few minutes.

Description

The UNIX parameters are obtained through a variety of more or less UNIX-dependent commands. Often, on a given UNIX version, there might be several ways of obtaining the information. Therefore, results shown by syschk might differ from results provided by other UNIX tools and be very difficult to compare to results given by another system.

For example, a given application might prompt the syschk command to complain about heavy CPU usage on one UNIX platform and not on another, all other factors being equal (if this is possible).

When a UNIX device is specified in the notify parameter, the messages are written asynchronously on the device, even if it is not logged on to either UNIX or D3. It is a good idea to specify the UNIX console as a notify device.

The syschk command starts a phantom process that periodically checks if the system is behaving normally. If a system parameter goes beyond a threshold, defined by the system administrator, a message is sent to one or more users and (optionally) an entry is put in the errors file. All checks are optional. The following elements are checked:

UNIX Swap

When syschk is started, it always ensures that the UNIX swap (or paging) space is at least equal to twice the physical memory.

It then periodically checks that the swap usage does not go beyond a predefined level (90% of the total available space by default).
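The two swap checks can be illustrated with a minimal Python sketch. This is not the syschk implementation: the function name swap_checks and its parameters are hypothetical, and the sizes are assumed to be already known in megabytes.

```python
def swap_checks(physical_mb, swap_total_mb, swap_used_mb, max_used_pct=90):
    """Return warning strings for the two swap checks (illustrative only)."""
    warnings = []
    # Start-up check: swap space should be at least twice the physical memory.
    if swap_total_mb < 2 * physical_mb:
        warnings.append("swap space is less than twice physical memory")
    # Periodic check: used swap must not go beyond the configured percentage.
    if swap_total_mb > 0:
        used_pct = 100 * swap_used_mb / swap_total_mb
        if used_pct > max_used_pct:
            warnings.append(f"swap usage {used_pct:.0f}% exceeds {max_used_pct}%")
    return warnings
```

With 128 Mb of swap, 76 Mb used, and a 60% limit, the usage is about 59%, so no warning is produced.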

Total System CPU

The percentage of CPU time spent in system mode (kernel, drivers, and so on) must stay below a predetermined level (25% by default).

Runaway Processes

Each time a sample is taken, the CPU time of the active processes is checked.

If a process has consumed more than a predefined percentage of the sampling period, a warning is issued. For example: If the sampling period is 10 minutes (600 seconds), and if the process consumed more than 5% of this time (30 seconds, which is an enormous amount of CPU), this process is probably in an abnormal tight CPU loop.

The syschk command displays the UNIX status (ps) and the result of a where if it is a D3 process. If a process exceeds the limit more than three times consecutively, the reporting of the error stops. This is to ensure that the system administrator does not receive constant messages, for example, for a process that is running a large report.

If the process is still running after nine samples, then the reporting restarts three more times, and the reporting cycle restarts.
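The threshold arithmetic and the reporting cycle can be sketched in Python as follows. This is one possible reading of the description, not the actual syschk logic: it assumes a nine-sample cycle in which only the first three consecutive samples over the limit are reported. The function names and parameters are hypothetical.

```python
def cpu_limit_seconds(sampling_seconds, max_pct):
    """CPU seconds a process may consume during one sampling period."""
    return sampling_seconds * max_pct / 100


def should_report(consecutive_hits, report_run=3, cycle=9):
    """Decide whether to report a process over the limit.

    consecutive_hits counts the consecutive samples (starting at 1) in
    which the process exceeded the limit. Assumption: the first
    report_run samples of every cycle are reported, the rest are silent.
    """
    return (consecutive_hits - 1) % cycle < report_run
```

Under this reading, samples 1 to 3 are reported, samples 4 to 9 are silent, and samples 10 to 12 are reported again.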

UNIX File System Usage

The state of a predetermined list of UNIX file systems is checked to ensure that the file systems do not get full (used over 90%).

This is to prevent UNIX crashes due to the filling up of critical file systems. By default, only the root file system (/) is checked.

Overflow Usage

When the used D3 overflow exceeds a predetermined percentage of the total overflow space, a message is generated.

The default level is 90%. Overflow is reported only once a day, at the first sample taken after noon.
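The once-a-day rule can be pictured with a short Python sketch. It is an illustration under stated assumptions, not the syschk code: the function name overflow_report_due is hypothetical, and the caller is assumed to remember the date of the last overflow report.

```python
from datetime import date, datetime


def overflow_report_due(now, last_report_day):
    """Return True if an overflow report may be sent for this sample.

    now is the datetime of the current sample; last_report_day is the
    date of the previous overflow report, or None if there was none.
    """
    # Report only at the first sample taken after noon, once per day.
    return now.hour >= 12 and last_report_day != now.date()
```

With a 30-minute sampling period, the first sample at or after 12:00 reports, and every later sample on the same day is skipped.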

Basic Usage

Monitors the usage of the FlashBASIC basic area.

When the used basic space exceeds a predetermined percentage of the total basic space, a message is generated. The default level is 90%. To examine the basic space in more detail, use the shpstat command.

The syschk command must be run on the dm account. Only one instance of a phantom process running syschk is supported at any given time.

Without any argument, the syschk command reports whether syschk is currently running and displays when it was started, the parameters, and some of the current system parameter values. With one or more arguments, syschk creates a phantom process and returns immediately to TCL. The arguments specify which system parameters to check and can be given in any order.

 keyword{=value}
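As an illustration of the keyword{=value} form, the following Python sketch splits an argument list into keywords and optional values. It is not how syschk parses its command line; the function name parse_arguments is hypothetical.

```python
def parse_arguments(tokens):
    """Map each keyword to its value, or to None when no value is given."""
    args = {}
    for token in tokens:
        # Split on the first "=" only; keywords such as log carry no value.
        keyword, sep, value = token.partition("=")
        args[keyword] = value if sep else None
    return args
```

Because the result is keyed by keyword, the order of the arguments does not matter.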

Sampling Delay

The delay can be expressed in seconds or in this format: hh:mm:ss. If sampling is not specified, the default is 30 minutes.

 sampling=[sec|hh:mm:ss]
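The two accepted forms are equivalent, as this small Python sketch shows. The function name parse_sampling is hypothetical and the code only illustrates the conversion; it is not taken from syschk.

```python
def parse_sampling(value):
    """Convert a sampling value (seconds or hh:mm:ss) to seconds."""
    parts = value.split(":")
    if len(parts) == 1:
        # Plain number of seconds, for example "600".
        return int(parts[0])
    if len(parts) == 3:
        hours, minutes, seconds = (int(part) for part in parts)
        return hours * 3600 + minutes * 60 + seconds
    raise ValueError(f"expected sec or hh:mm:ss, got {value!r}")
```

For example, sampling=00:30:00 and sampling=1800 both describe the 30-minute default.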

D3 Users, D3 Port Numbers, UNIX Devices Lists

Specifies the list of D3 users, D3 port numbers, or UNIX devices to notify when an abnormal situation occurs. The targets can be specified as a list of explicit D3 user-IDs, D3 port numbers (in decimal, prefixed by an !), UNIX devices (prefixed by a /), or any combination.

 notify=[user{,..},!n{,..},/dev/ttyXX{,...},*|off]

  • If an asterisk is used, all users logged on the system are notified.
  • If a period is used, it is replaced by the tty name.
  • If off is specified, notification is disabled.
  • If a UNIX device is specified, it must exist and be writable when syschk is started.
  • If notify is not specified, or if the specified users are not logged on at the time the anomaly occurs, a message is sent to dm or SYSPROG.
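The prefix rules for the notify list can be sketched in Python as follows. This is an illustration only: the function name parse_notify and the returned dictionary layout are hypothetical, and the period shorthand for the tty name is not handled.

```python
def parse_notify(value):
    """Classify the comma-separated notify targets by their prefix."""
    targets = {"users": [], "ports": [], "devices": [], "all": False, "off": False}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        if item == "off":
            targets["off"] = True              # notification disabled
        elif item == "*":
            targets["all"] = True              # every logged-on user
        elif item.startswith("!"):
            targets["ports"].append(int(item[1:]))   # D3 port number, decimal
        elif item.startswith("/"):
            targets["devices"].append(item)    # UNIX device path
        else:
            targets["users"].append(item)      # D3 user-ID
    return targets
```

Applied to /dev/tty0,bob,!0, the sketch yields one device, one user, and port 0.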

Maximum Percentage of Total CPU: System Mode

The following specifies the maximum percentage of total CPU usage UNIX is allowed to spend in System mode. If % is not specified, the default is 25%.

 syscpu{=percentage{%}}

Maximum Percentage of CPU: Process

The following specifies the maximum percentage of CPU a process is allowed to take. If % is not specified, the default is 25%. This trigger point might be difficult to evaluate.

 proccpu{=percentage{%}}

For example, on a system with only one active user running a CPU-intensive FlashBASIC program, the process takes 100% of the CPU, since there is no other running process. To avoid false alarms, select a sufficiently long sampling period. It is probably unusual for a process to use 100% of the CPU for 15 minutes.

Acceptable Swap Usage

The following specifies the acceptable swap usage at any given time. The percentage is the amount of swap actually used. If % is not specified, a swap usage above 90% of the total swap is considered abnormal.

 swapusg{=percentage{%}}

Acceptable Overflow Usage

The following specifies the acceptable overflow usage at any given time. The percentage is the amount of overflow actually used. If % is not specified, an overflow usage above 90% of the total D3 space is considered abnormal. Errors are reported only once a day.

 ovfusg{=percentage{%}}

Acceptable Basic Usage

The following specifies the acceptable basic usage at any given time. The percentage is the amount of basic space actually used. If % is not specified, a basic usage above 90% of the total basic space is considered abnormal.

 basicusg{=percentage{%}}

UNIX File System List

The following specifies the list of UNIX file systems that should never get full (over 90%). The list should always include the root file system (/).

Depending on the system, the /usr and /tmp file systems might have to be included. Some UNIX systems are not able to boot with a full root file system. If not specified, the root file system (/) is the only UNIX file system checked.

 diskusg=filesystem{,filesystem,..}
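The 90% rule can be illustrated in Python. Note the assumptions: syschk obtains its figures from some form of df, whereas this sketch uses Python's shutil.disk_usage, so the percentages may differ slightly. The function names used_percent and full_filesystems are hypothetical.

```python
import shutil


def used_percent(total, used):
    """Percentage of a file system that is in use."""
    return 100 * used / total if total else 0.0


def full_filesystems(paths, limit_pct=90, usage=shutil.disk_usage):
    """Return the paths whose usage is above limit_pct.

    usage is any callable that returns an object with total and used
    attributes; it defaults to shutil.disk_usage.
    """
    full = []
    for path in paths:
        stats = usage(path)
        if used_percent(stats.total, stats.used) > limit_pct:
            full.append(path)
    return full
```

A call such as full_filesystems(["/", "/usr", "/tmp"]) corresponds to diskusg=/,/usr,/tmp.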

Log Messages

The log keyword logs messages in the errors file. This keyword is equivalent to the l option.

 log

Do Not Log Messages

The nolog keyword does not log messages in the errors file. This keyword is equivalent to not having the l option, and it supersedes the l option.

 nolog

Logall Messages

The logall keyword logs all messages in the errors file. This keyword is equivalent to the a option.

If the l option is used, some warnings are not logged to the errors file. This is done to help alleviate the need to constantly clear or delete items from the errors file. The a option will cause the syschk command to log all warnings to the errors file, including those warnings not logged by the l option.

  logall

Stop syschk

When started as a phantom, the syschk command runs indefinitely, until the system is shut down. To stop it, use the syschk stop command.

Error Message Types

Could not get the swap information. These are the UNIX commands used:

 AIX: 'lsps -a'
 Linux: !cat /proc/meminfo |grep Swap

Note: The Linux command used is case sensitive.

 Cannot get swap usage: syschk

Could not get the amount of physical memory. These are the UNIX commands used:

 AIX: 'lsattr -l sys0 -E'
 Linux: !cat /proc/meminfo |grep Memtotal

Note: The Linux command used is case sensitive.

 Cannot get real memory: syschk

Could not get information on the currently active processes. On all systems, this is obtained by a ps -ef.

 Cannot get proc list: syschk

Could not get information on the UNIX file systems. On all systems, this information is obtained by some form of df. The most common errors involve a remote file system, which is unreachable.

 Cannot get FS: syschk

Example(s)

Example 1

This example starts a phantom process to check the system every 30 minutes for a CPU system usage above 10%, a swap usage above 60% and a process runaway limit of 10%.

In case of anomaly, a message is sent to the:
  • UNIX terminal /dev/tty0
  • D3 user bob if he is logged on
  • Line 0, whether it is logged on or not
A short message is also logged in the errors file. The other parameters are left to their default values.
 syschk sampling=00:30:00 syscpu=10%
 notify=/dev/tty0,bob,!0
 swapusg=60% proccpu=10% log

Example 2

This example logs syschk messages to the errors file and uses syschk edit to set the check period to 30 seconds. After starting syschk with the A option, run the following program:

 fin=1
 loop while fin do
 repeat

You will see messages about CPU usage logged every 30 seconds.

Example 3

This example checks whether syschk is running:

 syschk
 syschk is running on port 132
 Started on 03/11/94 at 08:20:21
 Current running parameters:
      Sampling period 00:30:00
      Notify list /dev/tty0 bob !0
      Maximum system CPU % 10
      Maximum CPU % per process 25
      Maximum % of swap 60
      Maximum % of overflow 90
      UNIX file systems /
      Log messages (0=no;1=yes) 1
 Current System Status:
      User CPU usage 3%
      System CPU usage 11%
      Waiting for IO 82%
      Idle CPU 4%
      Total swap space 128 Mb
      Used swap space 76 Mb (59%)

Example 4

This example stops the phantom running syschk as a background process, suppressing the message stopped. This command could be included in the user-shutdown macro.

 syschk stop (q

Example 5

This example restarts the syschk phantom with the same parameters.

 syschk start

Example 6

This example edits the syschk command line to change the arguments. Use the Update processor command to edit the command line.

 syschk edit