1号機のシスログを2号機へリモート転送設定してからしばらくしたある日、
ついにその日がやってきました。
1号機がフリーズしたのです。
その時行っていた作業としては、sshで複数ログインして、何かコンパイルしながら別のファイルを移動
させたとか、その程度だったのですが・・・。
コマンド打って Enter を押したところ反応がなくなり、別のターミナル(PuTTY)でssh接続すると、
つながりはするがファイルアクセスできない、といった感じだでした。
そのときのsyslogを2号機側でうまく回収できた(リモート転送でうまく拾えた)ので、下記へ載せて
おきます。
Mar 17 13:06:52 rasp00 kernel: [646575.119720] INFO: task jbd2/mmcblk0p2-:43 blocked for more than 120 seconds. Mar 17 13:06:52 rasp00 kernel: [646575.119756] Not tainted 3.18.7+ #755 Mar 17 13:06:52 rasp00 kernel: [646575.119767] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Mar 17 13:06:52 rasp00 kernel: [646575.119778] jbd2/mmcblk0p2- D c055689c 0 43 2 0x00000000 Mar 17 13:06:52 rasp00 kernel: [646575.119846] [<c055689c>] (__schedule) from [<c0556c74>] (schedule+0x40/0x8c) Mar 17 13:06:52 rasp00 kernel: [646575.119873] [<c0556c74>] (schedule) from [<c0556d68>] (io_schedule+0xa8/0x110) Mar 17 13:06:52 rasp00 kernel: [646575.119899] [<c0556d68>] (io_schedule) from [<c0557654>] (bit_wait_io+0x44/0x68) Mar 17 13:06:52 rasp00 kernel: [646575.119922] [<c0557654>] (bit_wait_io) from [<c0557318>] (__wait_on_bit+0x90/0xc8) Mar 17 13:06:52 rasp00 kernel: [646575.119947] [<c0557318>] (__wait_on_bit) from [<c05573c8>] (out_of_line_wait_on_bit+0x78/0x84) Mar 17 13:06:52 rasp00 kernel: [646575.119984] [<c05573c8>] (out_of_line_wait_on_bit) from [<c016c7ec>] (__wait_on_buffer+0x30/0x38) Mar 17 13:06:52 rasp00 kernel: [646575.120021] [<c016c7ec>] (__wait_on_buffer) from [<c02112ec>] (jbd2_journal_commit_transaction+0xf74/0x1f1c) Mar 17 13:06:52 rasp00 kernel: [646575.120056] [<c02112ec>] (jbd2_journal_commit_transaction) from [<c0217754>] (kjournald2+0xdc/0x2e4) Mar 17 13:06:52 rasp00 kernel: [646575.120095] [<c0217754>] (kjournald2) from [<c003edf8>] (kthread+0xd8/0xf4) Mar 17 13:06:52 rasp00 kernel: [646575.120131] [<c003edf8>] (kthread) from [<c000e988>] (ret_from_fork+0x14/0x20) Mar 17 13:06:52 rasp00 kernel: [646575.120165] INFO: task kworker/u2:0:8096 blocked for more than 120 seconds. Mar 17 13:06:52 rasp00 kernel: [646575.120177] Not tainted 3.18.7+ #755 Mar 17 13:06:52 rasp00 kernel: [646575.120185] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Mar 17 13:06:52 rasp00 kernel: [646575.120194] kworker/u2:0 D c055689c 0 8096 2 0x00000000 Mar 17 13:06:52 rasp00 kernel: [646575.120234] Workqueue: kmmcd mmc_rescan Mar 17 13:06:52 rasp00 kernel: [646575.120265] [<c055689c>] (__schedule) from [<c0556c74>] (schedule+0x40/0x8c) Mar 17 13:06:52 rasp00 kernel: [646575.120290] [<c0556c74>] (schedule) from [<c04252e8>] (__mmc_claim_host+0xd0/0x264) Mar 17 13:06:52 rasp00 kernel: [646575.120314] [<c04252e8>] (__mmc_claim_host) from [<c042549c>] (mmc_get_card+0x20/0x24) Mar 17 13:06:52 rasp00 kernel: [646575.120345] [<c042549c>] (mmc_get_card) from [<c042d348>] (mmc_sd_detect+0x2c/0x80) Mar 17 13:06:52 rasp00 kernel: [646575.120370] [<c042d348>] (mmc_sd_detect) from [<c0427d88>] (mmc_rescan+0x118/0x53c) Mar 17 13:06:52 rasp00 kernel: [646575.120394] [<c0427d88>] (mmc_rescan) from [<c0039b3c>] (process_one_work+0x13c/0x444) Mar 17 13:06:52 rasp00 kernel: [646575.120418] [<c0039b3c>] (process_one_work) from [<c003a998>] (worker_thread+0x190/0x580) Mar 17 13:06:52 rasp00 kernel: [646575.120443] [<c003a998>] (worker_thread) from [<c003edf8>] (kthread+0xd8/0xf4) Mar 17 13:06:52 rasp00 kernel: [646575.120471] [<c003edf8>] (kthread) from [<c000e988>] (ret_from_fork+0x14/0x20) Mar 17 13:08:52 rasp00 kernel: [646695.122291] INFO: task jbd2/mmcblk0p2-:43 blocked for more than 120 seconds. Mar 17 13:08:52 rasp00 kernel: [646695.122327] Not tainted 3.18.7+ #755 Mar 17 13:08:52 rasp00 kernel: [646695.122339] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Mar 17 13:08:52 rasp00 kernel: [646695.122351] jbd2/mmcblk0p2- D c055689c 0 43 2 0x00000000 Mar 17 13:08:52 rasp00 kernel: [646695.122420] [<c055689c>] (__schedule) from [<c0556c74>] (schedule+0x40/0x8c) Mar 17 13:08:52 rasp00 kernel: [646695.122449] [<c0556c74>] (schedule) from [<c0556d68>] (io_schedule+0xa8/0x110) Mar 17 13:08:52 rasp00 kernel: [646695.122474] [<c0556d68>] (io_schedule) from [<c0557654>] (bit_wait_io+0x44/0x68) Mar 17 13:08:52 rasp00 kernel: [646695.122498] [<c0557654>] (bit_wait_io) from [<c0557318>] (__wait_on_bit+0x90/0xc8) Mar 17 13:08:52 rasp00 kernel: [646695.122523] [<c0557318>] (__wait_on_bit) from [<c05573c8>] (out_of_line_wait_on_bit+0x78/0x84) Mar 17 13:08:52 rasp00 kernel: [646695.122558] [<c05573c8>] (out_of_line_wait_on_bit) from [<c016c7ec>] (__wait_on_buffer+0x30/0x38) Mar 17 13:08:52 rasp00 kernel: [646695.122596] [<c016c7ec>] (__wait_on_buffer) from [<c02112ec>] (jbd2_journal_commit_transaction+0xf74/0x1f1c) Mar 17 13:08:52 rasp00 kernel: [646695.122632] [<c02112ec>] (jbd2_journal_commit_transaction) from [<c0217754>] (kjournald2+0xdc/0x2e4) Mar 17 13:08:52 rasp00 kernel: [646695.122672] [<c0217754>] (kjournald2) from [<c003edf8>] (kthread+0xd8/0xf4) Mar 17 13:08:52 rasp00 kernel: [646695.122706] [<c003edf8>] (kthread) from [<c000e988>] (ret_from_fork+0x14/0x20) Mar 17 13:08:52 rasp00 kernel: [646695.122740] INFO: task kworker/u2:0:8096 blocked for more than 120 seconds. Mar 17 13:08:52 rasp00 kernel: [646695.122751] Not tainted 3.18.7+ #755 Mar 17 13:08:52 rasp00 kernel: [646695.122760] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Mar 17 13:08:52 rasp00 kernel: [646695.122771] kworker/u2:0 D c055689c 0 8096 2 0x00000000 Mar 17 13:08:52 rasp00 kernel: [646695.122809] Workqueue: kmmcd mmc_rescan Mar 17 13:08:52 rasp00 kernel: [646695.122841] [<c055689c>] (__schedule) from [<c0556c74>] (schedule+0x40/0x8c) Mar 17 13:08:52 rasp00 kernel: [646695.122866] [<c0556c74>] (schedule) from [<c04252e8>] (__mmc_claim_host+0xd0/0x264) Mar 17 13:08:52 rasp00 kernel: [646695.122890] [<c04252e8>] (__mmc_claim_host) from [<c042549c>] (mmc_get_card+0x20/0x24) Mar 17 13:08:52 rasp00 kernel: [646695.122920] [<c042549c>] (mmc_get_card) from [<c042d348>] (mmc_sd_detect+0x2c/0x80) Mar 17 13:08:52 rasp00 kernel: [646695.122946] [<c042d348>] (mmc_sd_detect) from [<c0427d88>] (mmc_rescan+0x118/0x53c) Mar 17 13:08:52 rasp00 kernel: [646695.122971] [<c0427d88>] (mmc_rescan) from [<c0039b3c>] (process_one_work+0x13c/0x444) Mar 17 13:08:52 rasp00 kernel: [646695.122994] [<c0039b3c>] (process_one_work) from [<c003a998>] (worker_thread+0x190/0x580) Mar 17 13:08:52 rasp00 kernel: [646695.123021] [<c003a998>] (worker_thread) from [<c003edf8>] (kthread+0xd8/0xf4) Mar 17 13:08:52 rasp00 kernel: [646695.123049] [<c003edf8>] (kthread) from [<c000e988>] (ret_from_fork+0x14/0x20) Mar 17 13:10:52 rasp00 kernel: [646815.124882] INFO: task jbd2/mmcblk0p2-:43 blocked for more than 120 seconds. Mar 17 13:10:52 rasp00 kernel: [646815.124919] Not tainted 3.18.7+ #755 Mar 17 13:10:52 rasp00 kernel: [646815.124931] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Mar 17 13:10:52 rasp00 kernel: [646815.124944] jbd2/mmcblk0p2- D c055689c 0 43 2 0x00000000 Mar 17 13:10:52 rasp00 kernel: [646815.125012] [<c055689c>] (__schedule) from [<c0556c74>] (schedule+0x40/0x8c) Mar 17 13:10:52 rasp00 kernel: [646815.125040] [<c0556c74>] (schedule) from [<c0556d68>] (io_schedule+0xa8/0x110) Mar 17 13:10:52 rasp00 kernel: [646815.125066] [<c0556d68>] (io_schedule) from [<c0557654>] (bit_wait_io+0x44/0x68) Mar 17 13:10:52 rasp00 kernel: [646815.125092] [<c0557654>] (bit_wait_io) from [<c0557318>] (__wait_on_bit+0x90/0xc8) Mar 17 13:10:52 rasp00 kernel: [646815.125116] [<c0557318>] (__wait_on_bit) from [<c05573c8>] (out_of_line_wait_on_bit+0x78/0x84) Mar 17 13:10:52 rasp00 kernel: [646815.125152] [<c05573c8>] (out_of_line_wait_on_bit) from [<c016c7ec>] (__wait_on_buffer+0x30/0x38) Mar 17 13:10:52 rasp00 kernel: [646815.125189] [<c016c7ec>] (__wait_on_buffer) from [<c02112ec>] (jbd2_journal_commit_transaction+0xf74/0x1f1c) Mar 17 13:10:52 rasp00 kernel: [646815.125226] [<c02112ec>] (jbd2_journal_commit_transaction) from [<c0217754>] (kjournald2+0xdc/0x2e4) Mar 17 13:10:52 rasp00 kernel: [646815.125263] [<c0217754>] (kjournald2) from [<c003edf8>] (kthread+0xd8/0xf4) Mar 17 13:10:52 rasp00 kernel: [646815.125298] [<c003edf8>] (kthread) from [<c000e988>] (ret_from_fork+0x14/0x20) Mar 17 13:10:52 rasp00 kernel: [646815.125333] INFO: task kworker/u2:0:8096 blocked for more than 120 seconds. Mar 17 13:10:52 rasp00 kernel: [646815.125345] Not tainted 3.18.7+ #755 Mar 17 13:10:52 rasp00 kernel: [646815.125354] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Mar 17 13:10:52 rasp00 kernel: [646815.125363] kworker/u2:0 D c055689c 0 8096 2 0x00000000 Mar 17 13:10:52 rasp00 kernel: [646815.125402] Workqueue: kmmcd mmc_rescan Mar 17 13:10:52 rasp00 kernel: [646815.125433] [<c055689c>] (__schedule) from [<c0556c74>] (schedule+0x40/0x8c) Mar 17 13:10:52 rasp00 kernel: [646815.125461] [<c0556c74>] (schedule) from [<c04252e8>] (__mmc_claim_host+0xd0/0x264) Mar 17 13:10:52 rasp00 kernel: [646815.125485] [<c04252e8>] (__mmc_claim_host) from [<c042549c>] (mmc_get_card+0x20/0x24) Mar 17 13:10:52 rasp00 kernel: [646815.125515] [<c042549c>] (mmc_get_card) from [<c042d348>] (mmc_sd_detect+0x2c/0x80) Mar 17 13:10:52 rasp00 kernel: [646815.125541] [<c042d348>] (mmc_sd_detect) from [<c0427d88>] (mmc_rescan+0x118/0x53c) Mar 17 13:10:52 rasp00 kernel: [646815.125565] [<c0427d88>] (mmc_rescan) from [<c0039b3c>] (process_one_work+0x13c/0x444) Mar 17 13:10:52 rasp00 kernel: [646815.125590] [<c0039b3c>] (process_one_work) from [<c003a998>] (worker_thread+0x190/0x580) Mar 17 13:10:52 rasp00 kernel: [646815.125614] [<c003a998>] (worker_thread) from [<c003edf8>] (kthread+0xd8/0xf4) Mar 17 13:10:52 rasp00 kernel: [646815.125641] [<c003edf8>] (kthread) from [<c000e988>] (ret_from_fork+0x14/0x20) Mar 17 13:12:52 rasp00 kernel: [646935.127474] INFO: task kworker/u2:0:8096 blocked for more than 120 seconds. Mar 17 13:12:52 rasp00 kernel: [646935.127512] Not tainted 3.18.7+ #755 Mar 17 13:12:52 rasp00 kernel: [646935.127524] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Mar 17 13:12:52 rasp00 kernel: [646935.127536] kworker/u2:0 D c055689c 0 8096 2 0x00000000 Mar 17 13:12:52 rasp00 kernel: [646935.127587] Workqueue: kmmcd mmc_rescan Mar 17 13:12:52 rasp00 kernel: [646935.127639] [<c055689c>] (__schedule) from [<c0556c74>] (schedule+0x40/0x8c) Mar 17 13:12:52 rasp00 kernel: [646935.127667] [<c0556c74>] (schedule) from [<c04252e8>] (__mmc_claim_host+0xd0/0x264) Mar 17 13:12:52 rasp00 kernel: [646935.127692] [<c04252e8>] (__mmc_claim_host) from [<c042549c>] (mmc_get_card+0x20/0x24) Mar 17 13:12:52 rasp00 kernel: [646935.127723] [<c042549c>] (mmc_get_card) from [<c042d348>] (mmc_sd_detect+0x2c/0x80) Mar 17 13:12:52 rasp00 kernel: [646935.127749] [<c042d348>] (mmc_sd_detect) from [<c0427d88>] (mmc_rescan+0x118/0x53c) Mar 17 13:12:52 rasp00 kernel: [646935.127779] [<c0427d88>] (mmc_rescan) from [<c0039b3c>] (process_one_work+0x13c/0x444) Mar 17 13:12:52 rasp00 kernel: [646935.127804] [<c0039b3c>] (process_one_work) from [<c003a998>] (worker_thread+0x190/0x580) Mar 17 13:12:52 rasp00 kernel: [646935.127833] [<c003a998>] (worker_thread) from [<c003edf8>] (kthread+0xd8/0xf4) Mar 17 13:12:52 rasp00 kernel: [646935.127866] [<c003edf8>] (kthread) from [<c000e988>] (ret_from_fork+0x14/0x20) Mar 17 13:12:52 rasp00 kernel: [646935.127896] INFO: task rm:12997 blocked for more than 120 seconds. Mar 17 13:12:52 rasp00 kernel: [646935.127907] Not tainted 3.18.7+ #755 Mar 17 13:12:52 rasp00 kernel: [646935.127916] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Mar 17 13:12:52 rasp00 kernel: [646935.127925] rm D c055689c 0 12997 12968 0x00000001 Mar 17 13:12:52 rasp00 kernel: [646935.127962] [<c055689c>] (__schedule) from [<c0556c74>] (schedule+0x40/0x8c) Mar 17 13:12:52 rasp00 kernel: [646935.127987] [<c0556c74>] (schedule) from [<c0556d68>] (io_schedule+0xa8/0x110) Mar 17 13:12:52 rasp00 kernel: [646935.128013] [<c0556d68>] (io_schedule) from [<c0557654>] (bit_wait_io+0x44/0x68) Mar 17 13:12:52 rasp00 kernel: [646935.128038] [<c0557654>] (bit_wait_io) from [<c0557318>] (__wait_on_bit+0x90/0xc8) Mar 17 13:12:52 rasp00 kernel: [646935.128062] [<c0557318>] (__wait_on_bit) from [<c05573c8>] (out_of_line_wait_on_bit+0x78/0x84) Mar 17 13:12:52 rasp00 kernel: [646935.128094] [<c05573c8>] (out_of_line_wait_on_bit) from [<c020e660>] (do_get_write_access+0x23c/0x5f0) Mar 17 13:12:52 rasp00 kernel: [646935.128119] [<c020e660>] (do_get_write_access) from [<c020ea48>] (jbd2_journal_get_write_access+0x34/0x48) Mar 17 13:12:52 rasp00 kernel: [646935.128155] [<c020ea48>] (jbd2_journal_get_write_access) from [<c01f3d14>] (__ext4_journal_get_write_access+0x3c/0x78) Mar 17 13:12:52 rasp00 kernel: [646935.128197] [<c01f3d14>] (__ext4_journal_get_write_access) from [<c01d2a00>] (ext4_delete_entry+0xa0/0x180) Mar 17 13:12:52 rasp00 kernel: [646935.128226] [<c01d2a00>] (ext4_delete_entry) from [<c01d40f8>] (ext4_unlink+0x220/0x3a4) Mar 17 13:12:52 rasp00 kernel: [646935.128259] [<c01d40f8>] (ext4_unlink) from [<c0144e38>] (vfs_unlink+0xb4/0x1bc) Mar 17 13:12:52 rasp00 kernel: [646935.128286] [<c0144e38>] (vfs_unlink) from [<c0149804>] (do_unlinkat+0x1fc/0x21c) Mar 17 13:12:52 rasp00 kernel: [646935.128312] [<c0149804>] (do_unlinkat) from [<c0149fb4>] (SyS_unlinkat+0x28/0x3c) Mar 17 13:12:52 rasp00 kernel: [646935.128339] [<c0149fb4>] (SyS_unlinkat) from [<c000e8c0>] (ret_fast_syscall+0x0/0x48) Mar 17 13:13:32 rasp00 kernel: [646975.078279] mmc0: Card stuck in programming state! mmcblk0 card_busy_detect Mar 17 13:13:32 rasp00 kernel: [646975.089130] blk_update_request: I/O error, dev mmcblk0, sector 4628336 Mar 17 13:13:32 rasp00 kernel: [646975.089183] EXT4-fs warning (device mmcblk0p2): ext4_end_bio:317: I/O error -5 writing to inode 133990 (offset 0 size 4096 starting block 578543) Mar 17 13:13:32 rasp00 kernel: [646975.089211] Buffer I/O error on device mmcblk0p2, logical block 563182 Mar 17 13:13:32 rasp00 kernel: [646975.089339] ------------[ cut here ]------------ Mar 17 13:13:32 rasp00 kernel: [646975.089394] WARNING: CPU: 0 PID: 41 at drivers/mmc/host/bcm2835-mmc.c:476 bcm2835_mmc_transfer_dma+0x1ac/0x1e4() Mar 17 13:13:32 rasp00 kernel: [646975.089406] Modules linked in: nfnetlink_log nfnetlink bridge stp llc ipv6 snd_bcm2835 snd_pcm snd_seq snd_seq_device snd_timer snd uio_pdrv_genirq uio sg Mar 17 13:13:32 rasp00 kernel: [646975.089478] CPU: 0 PID: 41 Comm: mmcqd/0 Not tainted 3.18.7+ #755 Mar 17 13:13:32 rasp00 kernel: [646975.089536] [<c00151fc>] (unwind_backtrace) from [<c0012710>] (show_stack+0x20/0x24) Mar 17 13:13:32 rasp00 kernel: [646975.089576] [<c0012710>] (show_stack) from [<c0555b30>] (dump_stack+0x20/0x28) Mar 17 13:13:32 rasp00 kernel: [646975.089611] [<c0555b30>] (dump_stack) from [<c0022ef4>] (warn_slowpath_common+0x7c/0x9c) Mar 17 13:13:32 rasp00 kernel: [646975.089638] [<c0022ef4>] (warn_slowpath_common) from [<c0022fd0>] (warn_slowpath_null+0x2c/0x34) Mar 17 13:13:32 rasp00 kernel: [646975.089666] [<c0022fd0>] (warn_slowpath_null) from [<c043db20>] (bcm2835_mmc_transfer_dma+0x1ac/0x1e4) Mar 17 13:13:32 rasp00 kernel: [646975.089697] [<c043db20>] (bcm2835_mmc_transfer_dma) from [<c043ee48>] (bcm2835_mmc_request+0xf8/0x138) Mar 17 13:13:32 rasp00 kernel: [646975.089729] [<c043ee48>] (bcm2835_mmc_request) from [<c0424f90>] (mmc_start_request+0xd4/0xf8) Mar 17 13:13:32 rasp00 kernel: [646975.089755] [<c0424f90>] (mmc_start_request) from [<c0425f60>] (mmc_start_req+0x318/0x3e4) Mar 17 13:13:32 rasp00 kernel: [646975.089779] [<c0425f60>] (mmc_start_req) from [<c0435680>] (mmc_blk_issue_rw_rq+0x25c/0xbd0) Mar 17 13:13:32 rasp00 kernel: [646975.089805] [<c0435680>] (mmc_blk_issue_rw_rq) from [<c04360f8>] (mmc_blk_issue_rq+0x104/0x548) Mar 17 13:13:32 rasp00 kernel: [646975.089829] [<c04360f8>] (mmc_blk_issue_rq) from [<c04374d8>] (mmc_queue_thread+0xf0/0x188) Mar 17 13:13:32 rasp00 kernel: [646975.089864] [<c04374d8>] (mmc_queue_thread) from [<c003edf8>] (kthread+0xd8/0xf4) Mar 17 13:13:32 rasp00 kernel: [646975.089891] [<c003edf8>] (kthread) from [<c000e988>] (ret_from_fork+0x14/0x20) Mar 17 13:13:32 rasp00 kernel: [646975.089905] ---[ end trace 414103c321ca4717 ]--- Mar 17 13:13:32 rasp00 kernel: [646975.090016] mmcblk0: timed out sending r/w cmd command, card status 0x400e00 Mar 17 13:13:32 rasp00 kernel: [646975.090121] Unable to handle kernel NULL pointer dereference at virtual address 00000028 (リモート転送で拾うことができたのはここまで。このあと1号機は無反応となってしまいました。)
最初のほうに mmcblk0p2- などとあるので、やはりというか、SDカードへのアクセスに失敗し、
そのままだんだんとI/Oがたまっていき、カーネルが死んでしまったものと思われます。
これがどなたか詳しい方の目に留まり、さらに安定したシステムになることを願っています(他力本願モード:-)。