AIX System Log Study Notes, Part 1


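Once an AIX system is in production, errors are inevitable. To deal with them, AIX provides a number of error-handling facilities and logging mechanisms that make it easier to diagnose and repair faults.
errdemon is an AIX daemon. It continuously monitors the /dev/error device file for new entries, compares each entry against the system error templates, and writes the resulting error information to the system error log.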
 
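The errdemon daemon is started automatically at system boot. It can also be started manually:
# /usr/lib/errdemon
Stop the errdemon daemon:
# /usr/lib/errstop
Check whether the daemon is running:
# ps -ef | grep errdemon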
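The check-and-start steps above can be combined into a small helper. This is only a sketch: the function names and the ERRDEMON_BIN variable are hypothetical, and the default path is the one used in these notes.

```shell
# Return success (0) if an errdemon process appears in the process list.
errdemon_running() {
    # The [e] bracket keeps grep from matching its own command line.
    ps -ef | grep '[e]rrdemon' >/dev/null
}

# Start the daemon only when it is not already running.
# ERRDEMON_BIN is a hypothetical override; it defaults to /usr/lib/errdemon.
ensure_errdemon() {
    if errdemon_running; then
        echo "errdemon is already running"
    else
        echo "errdemon is not running, starting it"
        "${ERRDEMON_BIN:-/usr/lib/errdemon}"
    fi
}
```

Calling ensure_errdemon as root should be harmless when the daemon is already up, since the start command is skipped in that case.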
 
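The AIX error log is stored in /var/adm/ras/errlog.
The following command shows the location of the error log file, the size of the log file, the memory buffer size, and related settings:
/usr/lib/errdemon -l
The following command changes the size of the log file (the value is in bytes; 2097153 is roughly 2 MB):
/usr/lib/errdemon -s 2097153
Set the size of the error log memory buffer (in bytes):
/usr/lib/errdemon -B 16384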
 
Once errors have been recorded in the log, AIX provides the errpt command to view them. Another diagnostic command, diag, is used to diagnose and analyze hardware errors, whereas errpt only prints the error entries.
1. The errpt command
# errpt --h
errpt: Not a recognized flag: -
Usage:   errpt -@ wpar_name -actgDP -s startdate -e enddate
         -N resource_name_list -S resource_class_list -R resource_type_list
         -T err_type_list -d err_class_list -j id_list -k id_list
         -J label_list -K label_list -l seq_no_list -F flags_list
         -m machine_id -n node_id -i filename -y filename -z filename
         -I filename
 
Process error log entries from the supplied file(s).
-i filename  Uses the error log file specified by the filename parameter.
-y filename  Uses the error record template file specified by the filename
             parameter.
-z filename  Uses the error logging message catalog specified by the filename
             parameter.
-I filename  Uses the diagnostics error log specified by the filename
             parameter.
 
Output formatted error log entries sorted chronologically.
Show detailed information for all error log entries:
-a         Print a detailed listing. Default is a summary listing.
-c         Concurrent mode. Display error log entries as they arrive.
-t         Print error templates instead of error log entries.
-g         Output raw ascii error record structures.
-D         Consolidate duplicate errors.
-P         Show only duplicates from the error device driver.
 
Error log entry qualifiers:
-@ wpar_name  Select entries for the wpar name.
The next two are the start and end dates:
-s startdate  Select entries posted later   than date. (MMddhhmmyy)
-e enddate    Select entries posted earlier than date. (MMddhhmmyy)
-N list       Select resource_names   in 'list'.
-S list       Select resource_classes in 'list'.
-R list       Select resource_types   in 'list'.
-T list       Select types            in 'list'.
-d list       Select classes          in 'list'.
Select by error ID:
-j list       Select ids              in 'list'.
-k list       Select ids  NOT         in 'list'.
-J list       Select labels           in 'list'.
-K list       Select labels NOT       in 'list'.
-l list       Select sequence_numbers in 'list'.
-F list       Select templates according to the value of the
              Alert, Log, or Report field.
-m machine_id Select entries for the machine id as output by uname -m.
-n node_id    Select entries for the node id    as output by uname -n.
 
'list' is a list of entries separated by commas.
Error type (severity):
error_type  = PERM,TEMP,PERF,PEND,UNKN,INFO
Error class:
error_class = H (HARDWARE), S (SOFTWARE), O (errlogger MESSAGES), U (UNDETERMINED)
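The -s and -e qualifiers expect timestamps in MMddhhmmyy form, which is easy to get wrong by hand. As a rough sketch, the date command can build them; the function names here are hypothetical and not part of AIX.

```shell
# Print midnight of the current day in errpt's MMddhhmmyy form.
errpt_today_start() {
    date +%m%d0000%y
}

# Print the current time in the same MMddhhmmyy form.
errpt_now() {
    date +%m%d%H%M%y
}

# Example (on AIX): detailed entries posted since midnight today.
# errpt -a -s "$(errpt_today_start)"
```

Note that the two-digit year comes last, after the minutes, so the value does not sort like an ordinary timestamp.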
 
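Commonly used commands:
1. List a short summary of errors
errpt | more
2. List all hardware errors
errpt -d H
3. List all software errors
errpt -d S
4. List detailed error information
errpt -a
5. Query by a specific error ID
errpt -aj ERROR_ID
6. List permanent hardware errors
errpt -T PERM -d H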
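When the summary listing is long, a per-class count gives a quick overview before drilling in with -d or -j. The sketch below assumes the usual summary layout, where the header line starts with IDENTIFIER and the error class is the fourth whitespace-separated column; count_by_class is a hypothetical helper name.

```shell
# Read errpt summary output on stdin and print "<class> <count>" per error class.
count_by_class() {
    # Skip the header line; column 4 is assumed to hold the class (H, S, O, U).
    awk '$1 != "IDENTIFIER" && NF >= 4 { n[$4]++ }
         END { for (c in n) print c, n[c] }' | sort
}

# Example (on AIX):
# errpt | count_by_class
```

The same pattern works for the type column (the third field) by changing $4 to $3.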
 
2. Handling the error log
# errclear                     Delete entries from the error log
# errstop / errdemon           Stop / start the error logging daemon
Running errclear without arguments prints its usage:
# errclear
0315-136 Number of days is required, and must be zero or greater.
Usage:
errclear -@ wpar_name -J err_label_list -K err_label_list -N resource_name_list
        -R resource_type_list -S resource_class_list -T err_type_list
        -d err_class_list -i filename -m machine_id -n node_id
        -j id_list -k id_list -l seq_no_list -y filename number_of_days
 
Delete error log entries in the specified list that are older than
number_of_days specified. Number_of_days refers to the number of twenty
four hour periods from command invocation time.
-@ wpar_name  Delete entries for the wpar name.
-J list       Select only error_labels     in 'list'.
-K list       Select only error_labels not in 'list'.
-N list       Select only resource_names   in 'list'.
-S list       Select only resource_classes in 'list'.
-R list       Select only resource_types   in 'list'.
-T list       Select only error_types      in 'list'.
-d list       Select only error_classes    in 'list'.
-i filename   Uses the error log file specified by the filename parameter.
-j list       Select only error_ids        in 'list'.
-k list       Select only error_ids  not   in 'list'.
-l list       Select sequence_numbers in 'list'.
-m machine_id Delete entries for the machine id as output by uname -m.
-n node_id    Delete entries for the node id    as output by uname -n.
-y filename   Uses the error record template file specified by the filename
              parameter.
'list' is a list of entries separated by commas.
error_type = PERM,TEMP,PERF,PEND,UNKN,INFO
error_class = H (HARDWARE), S (SOFTWARE), O (errlogger MESSAGES), U (UNDETERMINED)
 
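Commonly used errclear commands:
To delete all entries from the error log, enter:
errclear 0
To delete all software-class entries from the error log:
errclear -d S 0
To delete all hardware-class entries from the error log:
errclear -d H 0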
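Because errclear deletes entries permanently, it can help to wrap it in a function that validates the day count and offers a dry run. This is a minimal sketch: purge_errlog and the DRY_RUN variable are hypothetical names, not part of AIX.

```shell
# Delete error log entries older than the given number of days.
# With DRY_RUN=1 (a hypothetical switch for this sketch) the command is only printed.
purge_errlog() {
    days=$1
    case $days in
        # Reject an empty argument or anything that is not a whole number.
        ''|*[!0-9]*)
            echo "usage: purge_errlog number_of_days" >&2
            return 1
            ;;
    esac
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "errclear $days"
    else
        errclear "$days"
    fi
}
```

For example, with DRY_RUN=1 set, purge_errlog 30 prints the errclear command that would keep roughly the last 30 days of entries, without running it.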

Excerpted from wolf
