iMx
2010-06-19 09:36:42 UTC
Good Morning,
I'm somewhat new to the DTrace world, but very keen to learn more! I have a small home NAS box that uses a Silicon Image 3124 RAID card purely as an interface card; however, I've been noticing some undesirable side effects under relatively heavy I/O, and there is an acknowledged bug in the si3124 kernel driver. I'm running OpenSolaris b134, and from checking the source code there hasn't been a change in the si3124 driver for some time, so this likely affects a lot of Solaris releases.
There are 5 disks in 1 zpool: 3 on the si3124 card and 2 on the motherboard. Of the 3 disks on the si3124, under load (e.g. a scrub) one disk will hit 100 %w when monitored with 'iostat -xcn 1' and block for 4-5 seconds; iostat output is at the bottom of this post. This has been confirmed as a bug on the Oracle bug site (I can add myself as an interested party, but don't seem to be able to add comments?) and was also kindly confirmed by Mark Logan at Oracle on the storage-discuss mailing list.
http://bugs.opensolaris.org/bugdatabase/view_bug.do;jsessionid=622a4828714cd0382000b9764e83?bug_id=6796802
However, can someone help me put together some DTrace statements/scripts to help pinpoint this further? I've been playing around with a few, but what I ideally need is to capture what the driver is doing at the moment a disk hits 100 %w; otherwise the data I gather is clogged up with information from when the card is running fine, rather than from the 4-5 seconds of blocking. Here is what I have tried so far:
***@boss:~# dtrace -n si_\*:entry'{@[probefunc]=count();}'
dtrace: description 'si_*:entry' matched 71 probes
dtrace: aggregation size lowered to 2m
^C
si_watchdog_handler 8
si_claim_free_slot 57
si_deliver_satapkt 57
si_intr 57
si_intr_command_complete 57
si_tran_start 57
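What I think I need is something along these lines: an untested sketch (assuming the driver's fbt module name is si3124 and that a 1-second threshold is a sensible cut-off) that only records si_* calls which take abnormally long, rather than everything:

#!/usr/sbin/dtrace -s
/*
 * Untested sketch: report any si3124 driver function whose
 * entry-to-return time exceeds one second, with a kernel stack.
 * Assumes the driver's fbt module name is si3124.
 */

fbt:si3124::entry
{
        self->in[probefunc] = timestamp;
}

fbt:si3124::return
/self->in[probefunc] && timestamp - self->in[probefunc] > 1000000000/
{
        printf("%s took %d ms\n", probefunc,
            (timestamp - self->in[probefunc]) / 1000000);
        stack();
}

fbt:si3124::return
{
        self->in[probefunc] = 0;
}

Thinking about it, though, if the command is issued in si_deliver_satapkt and only completed later from si_intr_command_complete, the delay may never show up inside any single function; matching issue to completion via the packet pointer would probably be the next step.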
Does anyone have any ideas on how to move forward? I'm keen to be able to help others who have this si3124 card and see the same problem, and pinpointing what's causing it would be a start.
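One other rough idea I've been considering (again untested) is to use the stable io provider to flag individual I/Os that take longer than a second, keyed on the buf pointer. That should at least confirm which device is stalling and whether it's reads or writes:

#!/usr/sbin/dtrace -s
/*
 * Untested sketch: report any block I/O that takes longer than
 * one second to complete, using the stable io provider.
 * arg0 in io:::start and io:::done is the address of the buf_t,
 * so it can be used to match an I/O's start with its completion.
 */

io:::start
{
        iostart[arg0] = timestamp;
}

io:::done
/iostart[arg0] && timestamp - iostart[arg0] > 1000000000/
{
        printf("%s: %d KB %s blocked for %d ms\n",
            args[1]->dev_statname,
            args[0]->b_bcount / 1024,
            args[0]->b_flags & B_READ ? "read" : "write",
            (timestamp - iostart[arg0]) / 1000000);
}

io:::done
{
        iostart[arg0] = 0;
}

If I've read the io provider docs correctly, dev_statname reports the sd instance name (e.g. sd1) rather than the cXtYdZ name, but that is easy enough to map back with plain 'iostat -x'.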
Many thanks,
iMx
#####
cpu
us sy wt id
20 12 0 68
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
32.9 56.8 1085.1 6107.6 0.3 0.2 3.3 1.8 7 10 c0d0
22.9 98.7 765.9 6011.8 0.1 0.1 0.6 1.2 3 10 c4d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c3t0d0
25.9 17.0 829.7 1735.3 0.6 0.1 14.1 2.8 9 12 c3t1d0
27.9 18.0 925.5 1703.4 0.6 0.1 13.5 2.9 10 13 c3t2d0
20.9 19.9 765.9 1898.8 0.7 0.2 16.4 4.2 12 17 c3t3d0
cpu
us sy wt id
3 27 0 70
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
2.0 287.8 64.2 23927.6 0.3 0.3 1.2 1.1 12 18 c0d0
2.0 268.8 64.2 24024.2 0.5 0.3 1.7 1.3 14 19 c4d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c3t0d0
0.0 25.1 0.0 3100.7 9.6 0.2 381.2 6.0 100 15 c3t1d0
0.0 23.1 0.0 2844.0 9.6 0.1 417.1 6.3 100 14 c3t2d0
1.0 229.6 64.2 28160.5 4.3 0.5 18.5 2.3 53 52 c3t3d0
cpu
us sy wt id
0 1 0 99
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c0d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c3t0d0
0.0 0.0 0.0 0.0 10.0 0.0 0.0 0.0 100 0 c3t1d0
0.0 0.0 0.0 0.0 10.0 0.0 0.0 0.0 100 0 c3t2d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c3t3d0
cpu
us sy wt id
0 0 0 100
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c0d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c3t0d0
0.0 0.0 0.0 0.0 10.0 0.0 0.0 0.0 100 0 c3t1d0
0.0 0.0 0.0 0.0 10.0 0.0 0.0 0.0 100 0 c3t2d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c3t3d0
cpu
us sy wt id
0 1 0 99
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c0d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c3t0d0
0.0 0.0 0.0 0.0 10.0 0.0 0.0 0.0 100 0 c3t1d0
0.0 0.0 0.0 0.0 10.0 0.0 0.0 0.0 100 0 c3t2d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c3t3d0
cpu
us sy wt id
1 1 0 98
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c0d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c3t0d0
0.0 0.0 0.0 0.0 10.0 0.0 0.0 0.0 100 0 c3t1d0
0.0 0.0 0.0 0.0 10.0 0.0 0.0 0.0 100 0 c3t2d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c3t3d0
cpu
us sy wt id
1 6 0 93
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
1.0 19.0 64.0 63.5 0.0 0.0 0.1 1.1 0 2 c0d0
1.0 17.0 64.0 63.0 0.0 0.0 0.2 1.1 0 2 c4d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c3t0d0
1.0 222.0 64.0 25216.9 8.4 0.8 37.5 3.7 95 81 c3t1d0
1.0 225.0 64.0 25504.9 8.4 0.8 36.9 3.6 93 81 c3t2d0
0.0 17.0 0.0 63.0 0.0 0.0 0.4 1.8 0 3 c3t3d0
cpu
us sy wt id
21 4 0 75
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
38.0 77.0 1216.1 274.0 0.2 0.1 1.7 1.2 4 8 c0d0
37.0 57.0 1216.1 274.0 0.2 0.1 2.6 1.1 4 7 c4d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c3t0d0
21.0 66.0 672.0 274.0 0.2 0.1 2.6 0.9 3 8 c3t1d0
16.0 60.0 544.0 274.0 0.3 0.1 3.4 1.5 6 11 c3t2d0
39.0 59.0 1280.1 274.0 0.2 0.1 2.3 0.9 6 9 c3t3d0