数日前にも、サーバが不調となり再起動をしていたが、 今日も同様と思われるような症状の末、"BUG soft locked"なるメッセージを出しながら、 CPU負荷が異常に高くなり、再び再起動となった。
以下、そのメモ。
症状その1: 婆ちゃん猫ように設置している WebCam を、ママが確認したらカメラが真っ暗で映らない。 その30分後あたりから、サーバが異様に負荷が高くなり、loadaverage=10程にまで達し、警告メールが 届く。すぐさま login をして、top などでプロセスを監視するが、カーネルに近いプロセスが異常に 負荷が高いが、その原因となったと思われる高負荷プロセスは見つからない。 その確認作業中には、以下の様な"BUG: soft lockup – CPU#2 stuck for 22s"といったメッセージが 何度か表示される。 しかたがないので、ひとまず再起動。
Message from syslogd@perrine at Jul 29 07:56:32 ... kernel:[216966.348806] BUG: soft lockup - CPU#2 stuck for 22s! [lsof:2139]
改めて、/var/log/messages を確認する。
Jul 29 07:13:19 perrine kernel: [214372.550873] usb 2-1.1: USB disconnect, device number 3 Jul 29 07:13:21 perrine kernel: [214374.787731] usb 2-1.1: new high-speed USB device number 5 using ehci-pci Jul 29 07:13:21 perrine kernel: [214374.982560] usb 2-1.1: New USB device found, idVendor=0ac8, idProduct=332d Jul 29 07:13:21 perrine kernel: [214374.982570] usb 2-1.1: New USB device strings: Mfr=1, Product=2, SerialNumber=0 Jul 29 07:13:21 perrine kernel: [214374.982577] usb 2-1.1: Product: Vega USB 2.0 Camera. Jul 29 07:13:21 perrine kernel: [214374.982583] usb 2-1.1: Manufacturer: Vimicro Corp. Jul 29 07:13:21 perrine kernel: [214374.984000] uvcvideo: Found UVC 1.00 device Vega USB 2.0 Camera. (0ac8:332d) Jul 29 07:13:21 perrine kernel: [214374.986621] input: Vega USB 2.0 Camera. as /devices/pci0000:00/0000:00:1d.0/usb2/2-1/2-1.1/2-1.1:1.0/input/input15 Jul 29 07:13:22 perrine mtp-probe: checking bus 2, device 5: "/sys/devices/pci0000:00/0000:00:1d.0/usb2/2-1/2-1.1" Jul 29 07:13:22 perrine mtp-probe: bus: 2, device: 5 was not an MTP device
ここまでが、Webカメラのメッセージ。なぜか、disconnect 後に改めて connect してる。 この直後に以下の様な kernel の異常メッセージが始まっている。
Jul 29 07:42:11 perrine rsyslogd: [origin software="rsyslogd" swVersion="7.4.2" x-pid="3103" x-info="http://www.rsyslog.com"] rsyslogd was HUPed Jul 29 07:43:23 perrine nagios3: Auto-save of retention data completed successfully. Jul 29 07:44:44 perrine kernel: [216258.276631] general protection fault: 0000 [#1] SMP Jul 29 07:44:44 perrine kernel: [216258.276666] Modules linked in: udp_diag tcp_diag inet_diag xfs libcrc32c ebtable_nat ebtables ppdev lp cpufreq_cons ervative cpufreq_stats cpufreq_userspace bnep rfcomm cpufreq_powersave nfnetlink_queue nfnetlink_log autofs4 binfmt_misc input_polldev xt_LOG xt_limit xt_tcpudp ip6table_filter ip6_tables xt_set ip_set nfnetlink iptable_filter ipt_MASQUERADE iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_ nat nf_conntrack ip_tables x_tables nfsd auth_rpcgss nfs_acl nfs lockd dns_resolver fscache sunrpc dm_crypt hwmon_vid loop fuse md_mod snd_usb_audio sn d_hda_codec_hdmi snd_hda_codec_realtek snd_usbmidi_lib snd_hda_intel snd_hda_codec snd_hwdep joydev hid_logitech snd_pcm_oss snd_mixer_oss ff_memless u vcvideo pl2303 i915 saa7134 usbserial snd_pcm snd_seq_midi videobuf2_vmalloc videobuf2_memops videobuf2_core snd_seq_midi_event snd_rawmidi snd_page_al loc tveeprom video snd_seq snd_seq_device snd_timer iTCO_wdt iTCO_vendor_support videobuf_dma_sg drm_kms_helper rc_core acpi_cpufreq drm v4l2_common co retemp videodev i2c_algo_bit media kvm_intel videobuf_core kvm mperf i2c_i801 i2c_core snd lpc_ich mfd_core intel_ips processor psmouse r8169 soundcore mii btusb bluetooth parport_pc rfkill thermal microcode parport crc16 thermal_sys serio_raw pcspkr evdev button hid_generic usbhid hid ext3 mbcache jb d usb_storage sr_mod cdrom sd_mod crc_t10dif ata_generic pata_via ata_piix crc32c_intel ehci_pci ehci_hcd libata scsi_mod usbcore usb_common dm_mirror dm_region_hash dm_log dm_mod Jul 29 07:44:44 perrine kernel: [216258.277537] CPU 1 Jul 29 07:44:44 perrine kernel: [216258.277550] Pid: 557, comm: lsof Not tainted 3.9-1-amd64 #1 Debian 3.9.8-1 BIOSTAR Group H55 HD/H55 HD Jul 29 07:44:44 perrine kernel: [216258.277592] RIP: 0010:[] [] mntget+0x9/0x1c Jul 29 07:44:44 perrine kernel: [216258.277631] RSP: 0018:ffff88004de57ea0 EFLAGS: 00010202 Jul 29 07:44:44 perrine kernel: [216258.277655] RAX: ffff88007312b020 RBX: 676c613f6b636162 RCX: 0000000000000013 Jul 29 07:44:44 perrine kernel: [216258.277687] RDX: 743d6d687469726f RSI: 0000000000000278 RDI: 676c613f6b636162 Jul 29 07:44:44 perrine kernel: [216258.277718] RBP: 0000000000000000 R08: 660000342f6f0000 R09: 0000000000000035 Jul 29 07:44:44 perrine kernel: [216258.277750] R10: 0000000000000001 R11: 99ffffcbd090ffca R12: ffff880036a56f00 Jul 29 07:44:44 perrine kernel: [216258.277782] R13: ffff88004de57ef0 R14: 00007fff48e0da10 R15: ffff8800257fe658 Jul 29 07:44:44 perrine kernel: [216258.277814] FS: 00007feaa1db1700(0000) GS:ffff880077440000(0000) knlGS:0000000000000000 Jul 29 07:44:44 perrine kernel: [216258.277849] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Jul 29 07:44:44 perrine kernel: [216258.277875] CR2: 00007feaa1dd9000 CR3: 0000000016182000 CR4: 00000000000007e0 Jul 29 07:44:44 perrine kernel: [216258.277906] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Jul 29 07:44:44 perrine kernel: [216258.277938] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Jul 29 07:44:44 perrine kernel: [216258.277969] Process lsof (pid: 557, threadinfo ffff88004de56000, task ffff880044f68770) Jul 29 07:44:44 perrine kernel: [216258.278004] Stack: Jul 29 07:44:44 perrine kernel: [216258.278015] ffff88007312b030 ffffffff81111ed7 ffff880036a56e80 ffffffff8115a0ab Jul 29 07:44:44 perrine kernel: [216258.278055] ffff8800145f3380 ffff8800257fe658 00000000ffffff9c 0000000000001000 Jul 29 07:44:44 perrine kernel: [216258.278094] ffffffff81155a57 ffffffff8111d168 676c613f6b636162 743d6d687469726f Jul 29 07:44:44 perrine kernel: [216258.278134] Call Trace: Jul 29 07:44:44 perrine kernel: [216258.278151] [] ? path_get+0xc/0x13 Jul 29 07:44:44 perrine kernel: [216258.278177] [] ? proc_fd_link+0x8b/0xab Jul 29 07:44:44 perrine kernel: [216258.278203] [] ? proc_pid_readlink+0x34/0xd2 Jul 29 07:44:44 perrine kernel: [216258.278231] [] ? touch_atime+0x72/0x12b Jul 29 07:44:44 perrine kernel: [216258.278257] [] ? sys_readlinkat+0x8e/0xd7 Jul 29 07:44:44 perrine kernel: [216258.278286] [] ? system_call_fastpath+0x16/0x1b Jul 29 07:44:44 perrine kernel: [216258.278313] Code: 14 48 8b 7b 18 be 01 00 00 00 e8 d5 ff ff ff ff 8b d0 00 00 00 5b 48 c7 c7 50 75 64 81 e9 9c cb f 3 ff 48 85 ff 53 48 89 fb 74 0e <48> 8b 7f 18 be 01 00 00 00 e8 ab ff ff ff 48 89 d8 5b c3 53 48 Jul 29 07:44:44 perrine kernel: [216258.278591] RSP Jul 29 07:44:44 perrine kernel: [216258.289487] ---[ end trace 162c0f0dadb1f4b2 ]---