Variable Name | Variable Type | Function / Description |
admin | string | Email user name of the person to be notified when the system is rebooting; the default is "root". Assumes the sendmail program is installed and configured correctly. |
allocatable-memory | integer | This is similar to the older min-memory configuration, but actively tests for a given number of allocatable memory pages (typically 4kB/page on x86 hardware). Zero to disable test. |
change | integer | Time limit (in seconds) for a specified file time-stamp to age. Must come after the corresponding 'file' entry. |
file | string/R | The path/name of a file to be checked for existence and (if 'change' is given) for age. |
heartbeat-file | string | Name of the file for diagnostic heartbeat writes; a time_t value (in ASCII) is recorded on each write to the watchdog device. |
heartbeat-stamps | integer | Number of entries in debug heartbeat file. |
interface | string/R | Name of interface (such as eth0) in /proc/net/dev to check for incoming (RX) bytes. |
interval | integer | Time interval (seconds) between polling for system health. Default is 1, but should not be more than [watchdog timeout]-2 seconds. |
log-dir | string | Path for watchdog log directory where the heartbeat file is usually kept, and where the files for re-directing test/repair scripts are kept. Default is /var/log/watchdog |
logtick | integer | Number of polling intervals between periodic "verbose" status messages. Default is 1 (i.e. every poll event). |
max-load-1 | integer | Limit on the 1-minute load-average before a reboot is triggered. Set to zero to ignore this test. |
max-load-5 | integer | Limit on the 5-minute load-average before a reboot is triggered. Set to zero to ignore this test. |
max-load-15 | integer | Limit on the 15-minute load-average before a reboot is triggered. Set to zero to ignore this test. |
max-temperature | integer | Temperature limit (in Celsius) before shut-down. |
min-memory | integer | Minimum number of memory pages (typically 4kB/page on x86 hardware). Zero to disable test. |
pidfile | string/R | Path/name of a PID file related to a daemon to be monitored. |
ping | string/R | The IP address of a target for ICMP "ping" test. Must be in numeric IPv4 format such as 192.168.1.1 |
ping-count | integer | Number of ping attempts per polling interval. Must be >= 1; the default is 3 (hence, with a 1-second polling interval, the ping delay must be less than 333 ms). |
priority | integer | The scheduling priority used with a call to the sched_setscheduler() function to configure the round-robin (SCHED_RR) priority for real-time use (only applicable if 'realtime' is true). |
realtime | yes/no | This flag tells the watchdog daemon to lock its memory against paging out, and also to permit real-time scheduling. It is strongly recommended to do this! |
repair-binary | string | The path/name of a program (or bash script, etc) that is used to make a repair on failed tests (other than auto-loaded V1 test scripts). |
repair-maximum | integer | Number of repair attempts on one "object" without success before giving up and rebooting. Default is 1, and setting this to zero will allow any number of repair attempts. |
repair-timeout | integer | Time limit (seconds) for the repair action. Default is 60 and beyond this a reboot is initiated. |
retry-timeout | integer | Time limit (seconds) from the first failure on a given "object" until it is deemed bad and a repair attempted (if possible, otherwise a reboot is the action). Default is 60 seconds. |
sigterm-delay | integer | Time between the SIGTERM signal being sent to all processes and the following SIGKILL signal. Default is 5 seconds, range 2-300. |
temperature-device | string | (deprecated) This was used in V5.13 and below for the old /dev/temperature style of device. With V5.15 & V6.0, temperature-sensor is used instead and the old style is no longer supported. |
temperature-poweroff | yes/no | This flag decides if the system should power-off on overheating (default = yes), or perform a system halt and wait for Ctrl-Alt-Del reactivation (the "no" case). |
temperature-sensor | string/R | Name of the file-like device that holds temperature as an ASCII string in milli-Celsius, typically generated by the lm-sensors package. |
test-binary | string/R | The path/name of a V0 test program (or bash script, etc) used to extend the watchdog's range of health tests. NOTE: The V0 test binary should be considered 'deprecated' and used for backward compatibility only; the V1 test/repair script mode of operation should be used whenever possible. |
test-directory | string | The path name of the directory for auto-loaded V1 test/repair scripts. The default is test-directory=/etc/watchdog.d; this ability can be disabled completely by setting it to an empty string (test-directory=). If the directory is not present it is ignored in any case. |
test-timeout | integer | Time limit (seconds) for any test scripts. Default is 60. This can be set to zero to disable the time-out, however, in this case a hung program will never be actioned, though all other tests will continue normally. |
verbose | yes/no | Provides basic control of the verbosity of the status messages. Previously this was only possible via the -v / --verbose command line options. |
watchdog-device | string | The name of the device for the watchdog hardware. Default is /dev/watchdog. If this is not given (or is disabled by setting it to an empty string) the watchdog can still function, but will not be effective, as any internal watchdog fault or kernel panic will be unrecoverable. |
watchdog-timeout | integer | The timeout to set the watchdog device to. Default is 60 seconds and it is not recommended to change this without good reason. Not all watchdog hardware supports configuration, or configuration to second resolution, etc. |
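A few of the limits in the table can be checked mechanically before a configuration is deployed. The following is a minimal Python sketch, not the daemon's own parser: it assumes the file consists of "name = value" lines with "#" comments, and the function names (parse_config, check_config, ping_budget_ms) are hypothetical. It only covers the interval, ping-count, sigterm-delay, and file/change ordering rules stated above, and the ordering check is simplified to "some 'file' entry appeared earlier".

```python
# Sketch only: checks a few constraints stated in the option table.
# Assumes "name = value" lines and "#" comments; this is not the daemon's parser.

# Defaults taken from the table for the options checked here.
DEFAULTS = {"interval": 1, "watchdog-timeout": 60, "ping-count": 3, "sigterm-delay": 5}


def parse_config(text):
    """Return a list of (name, value) pairs in file order (names may repeat)."""
    entries = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or "=" not in line:
            continue
        name, value = line.split("=", 1)
        entries.append((name.strip(), value.strip()))
    return entries


def check_config(entries):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    settings = dict(DEFAULTS)
    seen_file = False
    for name, value in entries:
        if name == "file":
            seen_file = True
        elif name == "change" and not seen_file:
            # Simplification: only checks that some 'file' entry came earlier.
            problems.append("'change' must come after its 'file' entry")
        if name in DEFAULTS:
            settings[name] = int(value)

    if settings["interval"] > settings["watchdog-timeout"] - 2:
        problems.append("interval should not exceed watchdog-timeout - 2")
    if settings["ping-count"] < 1:
        problems.append("ping-count must be >= 1")
    if not 2 <= settings["sigterm-delay"] <= 300:
        problems.append("sigterm-delay must be in the range 2-300")
    return problems


def ping_budget_ms(interval_s, ping_count):
    """Time available per ping attempt within one polling interval, in ms."""
    return interval_s * 1000 / ping_count
```

With the defaults (interval 1, ping-count 3), ping_budget_ms(1, 3) gives roughly 333 ms per attempt, which matches the figure quoted in the ping-count row.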
Warning: There is currently a bug/feature whereby the order of loading the temperature sensor modules determines the abstracted names (e.g. the first module loaded becomes /sys/class/hwmon/hwmon0 and the second /sys/class/hwmon/hwmon1, etc.). If using the abstracted paths (e.g. /sys/class/hwmon/hwmon0) rather than the device paths (e.g. /sys/devices/platform/w83627ehf.2576), make sure you black-list any modules that are automatically loaded by adding a suitable entry to one of the files in /etc/modprobe.d/, and then add all modules for temperature sensing to /etc/modules, as that appears to force deterministic enumeration.
Since the new lm-sensors style of monitoring provides files in milli-Celsius, the watchdog now always works in Celsius, and the maximum temperature is set using the max-temperature configuration option.
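To illustrate the milli-Celsius convention, here is a minimal Python sketch of reading a sensor file and comparing it with a limit given in Celsius. It is an illustration only, not the daemon's code: the function names are hypothetical, the exact sensor file name under the hwmon directory depends on the hardware, and treating "strictly greater than" as overheating is an assumption.

```python
# Sketch only: converts an lm-sensors style reading (ASCII milli-Celsius)
# to Celsius and compares it with a max-temperature style limit.


def read_celsius(sensor_path):
    """Read an ASCII milli-Celsius value from a file and return degrees Celsius."""
    with open(sensor_path, "r", encoding="ascii") as handle:
        return int(handle.read().strip()) / 1000.0


def is_overheated(sensor_path, max_temperature_c):
    """Return True if the reading exceeds the limit (limit given in Celsius).

    Assumption: a reading strictly above the limit counts as overheating.
    """
    return read_celsius(sensor_path) > max_temperature_c
```

For example, a file containing 45500 reads as 45.5 degrees Celsius, so it would exceed a limit of 40 but not a limit of 90.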
http://blog.scoutapp.com/articles/2009/07/31/understanding-load-averages
In simple terms, a load average above 1 per CPU indicates that tasks are being held up due to a lack of resources, either CPU time or I/O delays. This is not a problem if it only happens at peak times of the day and/or if it is only by a modest amount (say 1-2 times the number of CPUs).
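The per-CPU reasoning can be sketched in Python as follows. This is a rough illustration, assuming a Unix-like system where os.getloadavg() is available; the function names and the three labels are hypothetical, and the thresholds of 1 and 2 per CPU simply follow the rule of thumb in the text rather than any setting used by the watchdog.

```python
import os


def load_per_cpu(load_averages=None, cpu_count=None):
    """Return the 1-, 5- and 15-minute load averages divided by the CPU count."""
    if load_averages is None:
        load_averages = os.getloadavg()  # Unix only
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1  # fall back to 1 if the count is unknown
    return tuple(value / cpu_count for value in load_averages)


def describe(per_cpu_load):
    """Classify a per-CPU load using the rule of thumb from the text."""
    if per_cpu_load <= 1.0:
        return "ok"
    if per_cpu_load <= 2.0:
        return "modest overload"
    return "heavy overload"
```

On a 4-CPU machine with load averages of 4.0, 2.0 and 1.0, load_per_cpu((4.0, 2.0, 1.0), 4) yields 1.0, 0.5 and 0.25 per CPU, all of which fall in the "ok" band. Any max-load-1, max-load-5 or max-load-15 limit should therefore be chosen with the number of CPUs in mind.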