The syschk BASIC program checks a system running under UNIX to detect abnormal situations.
For Windows: Not Supported
syschk keyword{=value} {...} {(options}
start | Starts the syschk command running as a phantom with the same arguments as the last time it was started. If necessary, the running process is stopped before it is restarted. If syschk has never been started, a set of defaults is applied. |
edit | Enters the Update processor to edit the arguments used by the start command. If any edits are made, the syschk process must be stopped and restarted for them to take effect. |
now | Instructs the syschk phantom process to take a sample immediately and to send the anomaly messages, if any, to the terminal that requested the sample instead of through the normal notify mechanism. |
stop | Stops the syschk phantom process. |
options | a | Logs a short summary of all error messages to the errors file. The initial swap space check is always logged. This option is equivalent to the logall keyword. |
f | Starts syschk in the foreground on the current process instead of as a phantom process. With this option, the only way to stop the process is a break/end or a logoff. |
l | Logs a short summary of the error messages to the errors file; some warnings are not logged (use the a option to log them). The initial swap space check is always logged. This option is equivalent to the log keyword. |
q | Quiet option: suppresses the user messages (for example, started, stopped). |
v | Verbose option; applies when syschk runs in the foreground instead of as a phantom. |
If the syschk command fails to access a system parameter, an error message such as Cannot get swap usage is sent to the system administrator (see the error messages below). Usually, setting the set-user-ID bit on the UNIX tools that syschk runs gives them adequate permissions.
For example, to run sar as non-root on AIX:
chmod 06755 /usr/sbin/sar
chmod 06755 /usr/lib/sa/sadc
The swap information, for instance, is read with 'lsps -a' on AIX and with !cat /proc/meminfo | grep Swap on Linux.
Some of the UNIX commands that syschk runs are very intrusive. Do not use a sampling period shorter than a few minutes.
The UNIX parameters are obtained through a variety of more or less UNIX-dependent commands, and a given UNIX version often offers several ways of obtaining the same information. Therefore, results shown by syschk might differ from those provided by other UNIX tools, and can be very difficult to compare with results from another system.
For example, a given application might cause syschk to report heavy CPU usage on one UNIX platform but not on another, even when all other factors are equal.
When a UNIX device is specified in the notify parameter, messages are written asynchronously to the device, even if the device is not logged on to either UNIX or D3. Specifying the UNIX console as a notify device is a good idea.
The syschk command starts a phantom process that periodically checks whether the system is behaving normally. If a system parameter exceeds a threshold defined by the system administrator, a message is sent to one or more users and, optionally, an entry is written to the errors file. The following elements are checked; each check is optional:
UNIX Swap | When syschk is started, it always checks that the UNIX swap (or paging) space is at least twice the physical memory. It then periodically checks that swap usage does not exceed a predefined level (90% of the total available space by default). |
Total System CPU | Percentage of the CPU spent in system mode (Kernel, drivers, and so on) must stay below a predetermined level (25% by default). |
Runaway Processes | Each time a sample is taken, the CPU time of the active processes is checked. If a process has consumed more than a predefined percentage of the sampling period, a warning is issued. For example, if the sampling period is 10 minutes (600 seconds) and a process consumed more than 5% of this time (30 seconds, which is an enormous amount of CPU), the process is probably in an abnormal tight CPU loop. The syschk command displays the UNIX status of the process (ps) and, if it is a D3 process, the result of a where. If a process exceeds the limit more than three times consecutively, the error is no longer reported. This prevents the system administrator from receiving constant messages, for example, about a process that is running a large report. If the process is still running after nine samples, the error is reported three more times, and the reporting cycle restarts. |
UNIX File System Usage | A predetermined list of UNIX file systems is checked to ensure that none of them gets full (over 90% used). This prevents UNIX crashes caused by critical file systems filling up. By default, only the root file system (/) is checked. |
Overflow Usage | When the used D3 overflow exceeds a predetermined percentage of the total overflow space, a message is generated. The default level is 90%. Overflow is reported only once a day, at the first sample taken after noon. |
Basic Usage | Monitors the usage of the FlashBASIC basic area. When the used basic space exceeds a predetermined percentage of the total basic space, a message is generated. The default level is 90%. To examine the basic space in more detail, use the shpstat command. |
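The runaway-process reporting cycle can be modeled with a few lines of Python. This is a minimal sketch, not the syschk implementation, and it assumes that "after nine samples" counts from the first consecutive over-limit sample, so reports are issued on samples 1-3, 10-12, 19-21, and so on. The names REPORTS_PER_CYCLE, CYCLE_LENGTH, and should_report are hypothetical.

```python
# Hypothetical model of the runaway-process reporting cycle (not syschk code).
# Assumption: the nine-sample cycle is counted from the first consecutive
# over-limit sample, so reports fall on samples 1-3, 10-12, 19-21, ...

REPORTS_PER_CYCLE = 3  # consecutive over-limit samples that are reported
CYCLE_LENGTH = 9       # samples after which reporting restarts (assumed)


def should_report(consecutive_exceedances: int) -> bool:
    """Return True if the n-th consecutive over-limit sample (1-based) is reported."""
    if consecutive_exceedances < 1:
        return False  # the process is within the limit: nothing to report
    position_in_cycle = (consecutive_exceedances - 1) % CYCLE_LENGTH
    return position_in_cycle < REPORTS_PER_CYCLE


if __name__ == "__main__":
    reported = [n for n in range(1, 22) if should_report(n)]
    print(reported)  # [1, 2, 3, 10, 11, 12, 19, 20, 21]
```

Under the other possible reading (nine silent samples after the third report), CYCLE_LENGTH would be 12 instead of 9.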
The syschk command must be run from the dm account. Only one syschk phantom process is supported at any given time.
Without any arguments, the syschk command reports whether syschk is currently running and displays when it was started, its parameters, and some current system parameter values. With one or more arguments, syschk creates a phantom process and returns immediately to TCL. The arguments specify which system parameters to check and can be given in any order.
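Because the arguments can appear in any order, a command line such as the one in Example 1 can be read as a set of keyword{=value} pairs plus the option letters that follow the opening parenthesis. The Python sketch below, built around a hypothetical helper parse_syschk_args, only illustrates that structure; it does not reproduce how D3 actually parses TCL input (for example, case handling is not modeled).

```python
from typing import Dict, Optional, Set, Tuple


def parse_syschk_args(command_line: str) -> Tuple[Dict[str, Optional[str]], Set[str]]:
    """Split a syschk command line into {keyword: value} and a set of option letters."""
    text = command_line.strip()
    options = ""
    if "(" in text:
        # Everything after the first "(" is the option string, e.g. "(q" or "(al)".
        text, _, options = text.partition("(")
        options = options.strip().rstrip(")")
    words = text.split()
    if words and words[0] == "syschk":
        words = words[1:]  # drop the command name itself
    keywords: Dict[str, Optional[str]] = {}
    for word in words:
        key, sep, value = word.partition("=")
        keywords[key] = value if sep else None  # bare keywords such as log or stop
    return keywords, set(options)


if __name__ == "__main__":
    print(parse_syschk_args("syschk stop (q"))  # ({'stop': None}, {'q'})
```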
keyword{=value}
Sampling Delay
The delay can be expressed either as a number of seconds or in the format hh:mm:ss. If sampling is not specified, the default is 30 minutes.
sampling=[sec|hh:mm:ss]
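The two accepted formats can be illustrated with a small Python sketch. parse_sampling is a hypothetical helper; it assumes that a bare number means seconds, that the hh:mm:ss form has exactly three fields, and that an omitted value means the 30-minute default.

```python
from typing import Optional

DEFAULT_SAMPLING_SECONDS = 30 * 60  # default when sampling is not specified


def parse_sampling(value: Optional[str]) -> int:
    """Convert a sampling value ("900" or "00:15:00") to seconds."""
    if not value:
        return DEFAULT_SAMPLING_SECONDS
    if ":" in value:
        # hh:mm:ss form; unpacking raises ValueError unless there are 3 parts
        hours, minutes, seconds = (int(part) for part in value.split(":"))
        if hours < 0 or not (0 <= minutes < 60 and 0 <= seconds < 60):
            raise ValueError(f"invalid hh:mm:ss value: {value!r}")
        return hours * 3600 + minutes * 60 + seconds
    return int(value)  # plain number of seconds
```

With this reading, sampling=00:30:00 and sampling=1800 describe the same 30-minute period.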
D3 Users, D3 Port Numbers, UNIX Devices Lists
Specifies the D3 users, D3 port numbers, or UNIX devices to notify when an abnormal situation is detected. The list can contain explicit D3 user-IDs, D3 port numbers in decimal prefixed by ! (for example, !0), UNIX devices prefixed by / (for example, /dev/tty0), or any combination of these.
notify=[user{,...},!n{,...},/dev/ttyXX{,...},*|off]
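The prefix rules for the notify list can be expressed as a simple classifier. This Python sketch (NotifyTargets and parse_notify are hypothetical names) sorts each comma-separated entry by its first character; the * and off forms are not described in detail here, so the sketch deliberately rejects them rather than guess their meaning.

```python
from typing import List, NamedTuple


class NotifyTargets(NamedTuple):
    users: List[str]    # explicit D3 user-IDs, such as bob
    ports: List[int]    # D3 port numbers, written as !n in decimal
    devices: List[str]  # UNIX devices, written as /dev/...


def parse_notify(value: str) -> NotifyTargets:
    """Classify each comma-separated notify entry by its prefix."""
    if value.strip() in ("*", "off"):
        # The meaning of * and off is not modeled by this sketch.
        raise ValueError(f"{value!r} is not handled by this sketch")
    users: List[str] = []
    ports: List[int] = []
    devices: List[str] = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        if entry.startswith("!"):
            ports.append(int(entry[1:]))  # decimal port number after "!"
        elif entry.startswith("/"):
            devices.append(entry)
        else:
            users.append(entry)
    return NotifyTargets(users, ports, devices)
```

Applied to the notify list of Example 1, parse_notify("/dev/tty0,bob,!0") yields one user (bob), one port (0), and one device (/dev/tty0).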
Maximum Percentage of Total CPU: System Mode
The following specifies the maximum percentage of total CPU time that UNIX is allowed to spend in system mode. If no percentage is specified, the default is 25%.
syscpu{=percentage{%}}
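The percentage keywords (syscpu, proccpu, swapusg, ovfusg, basicusg) share the same shape: an optional value with an optional % sign and a documented default. A minimal Python sketch, assuming that a bare number is read the same way as one with a trailing %; parse_percentage and SYSCPU_DEFAULT are hypothetical names.

```python
from typing import Optional


def parse_percentage(value: Optional[str], default: float) -> float:
    """Read a threshold such as "10%" or "10"; fall back to the default if omitted."""
    if value is None or not value.strip():
        return default
    text = value.strip()
    if text.endswith("%"):
        text = text[:-1]  # the % sign is optional
    percentage = float(text)
    if not 0.0 <= percentage <= 100.0:
        raise ValueError(f"percentage out of range: {value!r}")
    return percentage


SYSCPU_DEFAULT = 25.0  # documented default for syscpu
```

For example, parse_percentage("10%", SYSCPU_DEFAULT) gives 10.0 and parse_percentage(None, SYSCPU_DEFAULT) gives 25.0; the usage keywords would pass 90.0 as their default instead.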
Maximum Percentage of CPU: Process
The following specifies the maximum percentage of CPU that a single process is allowed to take during a sampling period. If no percentage is specified, the default is 25%. This trigger point might be difficult to evaluate.
proccpu{=percentage{%}}
For example, on a system with only one active user running a CPU-intensive FlashBASIC program, that process takes 100% of the CPU, since no other process is running. To avoid false alarms, select a sufficiently long sampling period. It is probably unusual for a process to use 100% of the CPU for 15 minutes.
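The arithmetic behind proccpu, and the reason a longer sampling period reduces false alarms, can be sketched in Python. This is a hypothetical model: it assumes the CPU time a process consumed during one sample is already known (how syschk measures it is not described here) and uses the 25% default. A short CPU burst is diluted by a long sample, while a process that is genuinely stuck at 100% is flagged whatever the period.

```python
def cpu_percent_of_sample(cpu_seconds: float, sampling_seconds: float) -> float:
    """CPU time a process used during one sample, as a percentage of the sample."""
    if sampling_seconds <= 0:
        raise ValueError("sampling period must be positive")
    return 100.0 * cpu_seconds / sampling_seconds


def is_runaway(cpu_seconds: float, sampling_seconds: float, limit_pct: float = 25.0) -> bool:
    """True if the process consumed more than limit_pct of the sampling period."""
    return cpu_percent_of_sample(cpu_seconds, sampling_seconds) > limit_pct


if __name__ == "__main__":
    # A 60-second CPU burst: flagged with a 2-minute sample, not with a 5-minute one.
    print(is_runaway(60, 120))  # True  (50%)
    print(is_runaway(60, 300))  # False (20%)
```

The 10-minute example from the Runaway Processes description corresponds to cpu_percent_of_sample(30, 600), which is 5%.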
Acceptable Swap Usage
The following specifies the acceptable swap usage at any given time, as a percentage of the total swap space actually used. If no percentage is specified, swap usage above 90% of the total swap space is considered abnormal.
swapusg{=percentage{%}}
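On Linux, the figures behind swapusg, and behind the start-up check that swap is at least twice the physical memory, are available in /proc/meminfo. The Python sketch below parses text in that format and applies both checks. The function names and the sample values (chosen to match the 128 Mb / 76 Mb figures shown in Example 3) are illustrative assumptions; this is not how syschk itself computes the figures on every platform.

```python
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style lines ("SwapTotal:  131072 kB") into {name: kB}."""
    values: Dict[str, int] = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            values[name.strip()] = int(fields[0])
    return values


def swap_size_ok(mem: Dict[str, int]) -> bool:
    """Start-up check: swap space should be at least twice the physical memory."""
    return mem["SwapTotal"] >= 2 * mem["MemTotal"]


def swap_usage_pct(mem: Dict[str, int]) -> float:
    """Percentage of the total swap space actually in use."""
    total = mem["SwapTotal"]
    if total == 0:
        raise ValueError("no swap space configured")
    return 100.0 * (total - mem["SwapFree"]) / total


def swap_usage_abnormal(mem: Dict[str, int], limit_pct: float = 90.0) -> bool:
    """True if swap usage is above the swapusg threshold."""
    return swap_usage_pct(mem) > limit_pct


SAMPLE = """MemTotal:          65536 kB
SwapTotal:        131072 kB
SwapFree:          53248 kB
"""
```

With SAMPLE, 76 Mb of 128 Mb of swap is in use (59.375%), so swapusg=60% would not raise a warning.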
Acceptable Overflow Usage
The following specifies the acceptable overflow usage at any given time, as a percentage of the overflow actually used. If no percentage is specified, overflow usage above 90% of the total overflow space is considered abnormal. Errors are reported only once a day.
ovfusg{=percentage{%}}
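The once-a-day rule for overflow can be modeled as a small stateful check. This Python sketch assumes that only the first sample taken at or after 12:00 each day is evaluated, so if usage is below the limit at that sample and rises later the same day, nothing is reported until the next day; that reading, and the class name DailyOverflowCheck, are assumptions.

```python
from datetime import date, datetime, time
from typing import Optional

NOON = time(12, 0)


class DailyOverflowCheck:
    """Evaluates overflow usage only at the first sample taken after noon each day."""

    def __init__(self, limit_pct: float = 90.0) -> None:
        self.limit_pct = limit_pct
        self.last_checked: Optional[date] = None

    def sample(self, now: datetime, used_pct: float) -> bool:
        """Return True if an overflow message should be generated for this sample."""
        if now.time() < NOON:
            return False  # morning samples never report overflow
        if self.last_checked == now.date():
            return False  # today's check has already been made
        self.last_checked = now.date()
        return used_pct > self.limit_pct
```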
Acceptable Basic Usage
The following specifies the acceptable basic usage at any given time, as a percentage of the basic space actually used. If no percentage is specified, basic usage above 90% of the total basic space is considered abnormal.
basicusg{=percentage{%}}
UNIX File System List
The following specifies the list of UNIX file systems that should never get full (over 90% used). The list should always include the root file system (/).
Depending on the system, the /usr and /tmp file systems might also have to be included. Some UNIX systems cannot boot with a full root file system. If diskusg is not specified, the root file system (/) is the only one checked.
diskusg=filesystem{,filesystem,..}
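A Python sketch of the file system check, using the standard shutil.disk_usage call instead of df; parse_diskusg and full_file_systems are hypothetical names. Percentages from this sketch can read slightly lower than the Use% column of df, which typically divides by used plus available space rather than by the total size (reserved blocks are counted differently), another instance of the tool-to-tool differences noted earlier.

```python
import shutil
from typing import Dict, List, Optional


def parse_diskusg(value: Optional[str]) -> List[str]:
    """Split diskusg=fs1,fs2,...; default to the root file system only."""
    if not value:
        return ["/"]
    return [fs.strip() for fs in value.split(",") if fs.strip()]


def full_file_systems(mount_points: List[str], limit_pct: float = 90.0) -> Dict[str, float]:
    """Return {mount point: percent used} for each file system above limit_pct."""
    over_limit: Dict[str, float] = {}
    for mount in mount_points:
        usage = shutil.disk_usage(mount)  # named tuple: total, used, free (bytes)
        used_pct = 100.0 * usage.used / usage.total
        if used_pct > limit_pct:
            over_limit[mount] = used_pct
    return over_limit


if __name__ == "__main__":
    print(full_file_systems(parse_diskusg(None)))  # {} unless / is over 90% used
```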
Log Messages
The log keyword logs messages to the errors file. It is equivalent to the l option.
log
Do Not Log Messages
The nolog keyword disables logging to the errors file. It is equivalent to omitting the l option, and it supersedes the l option.
nolog
Logall Messages
The logall keyword logs all messages to the errors file. It is equivalent to the a option.
With the l option, some warnings are not logged to the errors file, which reduces the need to constantly clear or delete items from it. The a option causes syschk to log all warnings to the errors file, including those that the l option does not log.
logall
Stop syschk
When started as a phantom, the syschk command runs indefinitely, until the system is shut down. To stop it, use the syschk stop command.
Cannot get swap usage: syschk
Could not get the swap information. These are the UNIX commands used: AIX: 'lsps -a'; Linux: !cat /proc/meminfo | grep Swap
Cannot get real memory: syschk
Could not get the size of the real (physical) memory. These are the UNIX commands used: AIX: 'lsattr -l sys0 -E'; Linux: !cat /proc/meminfo | grep MemTotal
Cannot get proc list: syschk
Could not get information on the currently active processes. On all systems, this information is obtained with ps -ef.
Cannot get FS: syschk
Could not get information on the UNIX file systems. On all systems, this information is obtained with some form of df. The most common cause is an unreachable remote file system.
Example 1
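This example starts a phantom process that checks the system every 30 minutes for system CPU usage above 10%, swap usage above 60%, and runaway processes using more than 10% of the sampling period. Messages are sent to the device /dev/tty0, the user bob, and port 0, and are logged to the errors file.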
syschk sampling=00:30:00 syscpu=10% notify=/dev/tty0,bob,!0 swapusg=60% proccpu=10% log
Example 2
This example logs syschk messages to the errors file and uses syschk edit to set the sampling period to 30 seconds. After starting syschk with the a option, run the following program, which loops indefinitely to consume CPU:
fin = 1
* fin never changes, so this loop runs until it is interrupted
loop while fin do repeat
Messages about CPU usage are logged every 30 seconds.
Example 3
This example checks whether syschk is running:
syschk
syschk is running on port 132
Started on 03/11/94 at 08:20:21
Current running parameters:
Sampling period 00:30:00
Notify list /dev/tty0 bob !0
Maximum system CPU % 10
Maximum CPU % per process 25
Maximum % of swap 60
Maximum % of overflow 90
UNIX file systems /
Log messages (0=no;1=yes) 1
Current System Status:
User CPU usage 3%
System CPU usage 11%
Waiting for IO 82%
Idle CPU 4%
Total swap space 128 Mb
Used swap space 76 Mb (59%)
Example 4
This example stops the syschk phantom process, suppressing the stopped message. This command could be included in the user-shutdown macro.
syschk stop (q
Example 5
This example restarts the syschk phantom with the same parameters.
syschk start
Example 6
This example edits the syschk arguments in the Update processor. Stop and restart syschk for the changes to take effect.
syschk edit