Discussion:
Using dtrace to pinpoint a kernel driver problem
iMx
2010-06-19 09:36:42 UTC
Good Morning,

I'm somewhat new to the dtrace world, but very keen to learn more! I have a small home NAS box that uses a Silicon Image 3124 RAID card, purely as an interface card, but I've been noticing some undesirable side effects under relatively heavy IO - there is an acknowledged bug in the si3124 kernel driver. I'm running OpenSolaris b134; from checking the source code there hasn't been a change to the si3124 driver for some time, so this likely affects a lot of Solaris releases.

There are 5 disks in 1 zpool: 3 are on the si3124 card and 2 are on the motherboard. Of the 3 disks on the si3124, under load (i.e. a scrub) one disk will hit 100 %w when monitored with 'iostat -xcn 1' and block for 4-5 seconds - iostat output is at the bottom of this post. This has been confirmed as a bug on the Oracle bug site (I can add myself as an interested party, but don't seem to be able to add comments?) and also kindly confirmed by Mark Logan at Oracle on the storage-discuss mailing list.

http://bugs.opensolaris.org/bugdatabase/view_bug.do;jsessionid=622a4828714cd0382000b9764e83?bug_id=6796802

However, can someone help me put together some dtrace statements/scripts to pinpoint this further? I've been playing around with a few, but what I ideally need is to capture what the driver is doing when a disk hits 100 %w; otherwise the data I gather is clogged up with information from when the card is running fine, rather than from the 4-5 seconds of blocking.


***@boss:~# dtrace -n si_\*:entry'{@[probefunc]=count();}'
dtrace: description 'si_*:entry' matched 71 probes
dtrace: aggregation size lowered to 2m
^C

si_watchdog_handler 8
si_claim_free_slot 57
si_deliver_satapkt 57
si_intr 57
si_intr_command_complete 57
si_tran_start 57
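
Something along these lines is roughly what I'm after, I think - an untested sketch based on the io provider examples in the DTrace guide, which should at least flag the individual I/Os that take more than a second to complete (the 1 second threshold is just a guess), though it still wouldn't tell me what the driver itself is doing during that window:

#!/usr/sbin/dtrace -s

#pragma D option quiet

/* Record the start time of each I/O, keyed by device and block number. */
io:::start
{
        start[args[0]->b_edev, args[0]->b_blkno] = timestamp;
}

/* Report any I/O that took more than 1 second to complete. */
io:::done
/start[args[0]->b_edev, args[0]->b_blkno] &&
 timestamp - start[args[0]->b_edev, args[0]->b_blkno] > 1000000000/
{
        printf("%Y  %s  %d KB  took %d ms\n", walltimestamp,
            args[1]->dev_statname, args[0]->b_bcount / 1024,
            (timestamp - start[args[0]->b_edev, args[0]->b_blkno]) / 1000000);
}

/* Clean up the saved start time once the I/O has completed. */
io:::done
{
        start[args[0]->b_edev, args[0]->b_blkno] = 0;
}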

Anyone have any ideas on how to move forward? I'm keen to be able to help others with this si3124 card who are also seeing the problem; pinpointing what's causing it would be a start...

Many thanks,

iMx

#####

cpu
us sy wt id
20 12 0 68
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
32.9 56.8 1085.1 6107.6 0.3 0.2 3.3 1.8 7 10 c0d0
22.9 98.7 765.9 6011.8 0.1 0.1 0.6 1.2 3 10 c4d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c3t0d0
25.9 17.0 829.7 1735.3 0.6 0.1 14.1 2.8 9 12 c3t1d0
27.9 18.0 925.5 1703.4 0.6 0.1 13.5 2.9 10 13 c3t2d0
20.9 19.9 765.9 1898.8 0.7 0.2 16.4 4.2 12 17 c3t3d0
cpu
us sy wt id
3 27 0 70
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
2.0 287.8 64.2 23927.6 0.3 0.3 1.2 1.1 12 18 c0d0
2.0 268.8 64.2 24024.2 0.5 0.3 1.7 1.3 14 19 c4d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c3t0d0
0.0 25.1 0.0 3100.7 9.6 0.2 381.2 6.0 100 15 c3t1d0
0.0 23.1 0.0 2844.0 9.6 0.1 417.1 6.3 100 14 c3t2d0
1.0 229.6 64.2 28160.5 4.3 0.5 18.5 2.3 53 52 c3t3d0
cpu
us sy wt id
0 1 0 99
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c0d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c3t0d0
0.0 0.0 0.0 0.0 10.0 0.0 0.0 0.0 100 0 c3t1d0
0.0 0.0 0.0 0.0 10.0 0.0 0.0 0.0 100 0 c3t2d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c3t3d0
cpu
us sy wt id
0 0 0 100
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c0d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c3t0d0
0.0 0.0 0.0 0.0 10.0 0.0 0.0 0.0 100 0 c3t1d0
0.0 0.0 0.0 0.0 10.0 0.0 0.0 0.0 100 0 c3t2d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c3t3d0
cpu
us sy wt id
0 1 0 99
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c0d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c3t0d0
0.0 0.0 0.0 0.0 10.0 0.0 0.0 0.0 100 0 c3t1d0
0.0 0.0 0.0 0.0 10.0 0.0 0.0 0.0 100 0 c3t2d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c3t3d0
cpu
us sy wt id
1 1 0 98
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c0d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c4d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c3t0d0
0.0 0.0 0.0 0.0 10.0 0.0 0.0 0.0 100 0 c3t1d0
0.0 0.0 0.0 0.0 10.0 0.0 0.0 0.0 100 0 c3t2d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c3t3d0
cpu
us sy wt id
1 6 0 93
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
1.0 19.0 64.0 63.5 0.0 0.0 0.1 1.1 0 2 c0d0
1.0 17.0 64.0 63.0 0.0 0.0 0.2 1.1 0 2 c4d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c3t0d0
1.0 222.0 64.0 25216.9 8.4 0.8 37.5 3.7 95 81 c3t1d0
1.0 225.0 64.0 25504.9 8.4 0.8 36.9 3.6 93 81 c3t2d0
0.0 17.0 0.0 63.0 0.0 0.0 0.4 1.8 0 3 c3t3d0
cpu
us sy wt id
21 4 0 75
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
38.0 77.0 1216.1 274.0 0.2 0.1 1.7 1.2 4 8 c0d0
37.0 57.0 1216.1 274.0 0.2 0.1 2.6 1.1 4 7 c4d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c3t0d0
21.0 66.0 672.0 274.0 0.2 0.1 2.6 0.9 3 8 c3t1d0
16.0 60.0 544.0 274.0 0.3 0.1 3.4 1.5 6 11 c3t2d0
39.0 59.0 1280.1 274.0 0.2 0.1 2.3 0.9 6 9 c3t3d0
Ryan Johnson
2010-06-21 08:25:18 UTC
Post by iMx
I've been playing around with a few, but what I ideally need is to capture what the driver is doing when a disk hits 100 %w; otherwise the data I gather is clogged up with information from when the card is running fine, rather than from the 4-5 seconds of blocking.
As a starting point, it seems like you'd want to know which underlying
operation was taking so much time. There are three possibilities:
1. A single function could be taking forever to complete
2. A pair of functions (acquire/release, start/finish) could take a long
time to complete
3. A set of functions is being called too many times, and the cumulative
time is long

Normal profiling should identify #3 fairly quickly (count up how often a
profile probe catches each function, and output/reset the result every
few seconds).
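
A rough, untested sketch of what I mean - count which kernel function the
CPU is sitting in at each profile tick, then dump and clear the counts
every five seconds so you can compare an interval that contains a stall
against one that doesn't:

#!/usr/sbin/dtrace -s

/* Sample the kernel program counter ~997 times per second per CPU
   and count which function it falls in. */
profile-997
/arg0/
{
        @hot[func(arg0)] = count();
}

/* Print and reset the counts every 5 seconds. */
tick-5sec
{
        printf("\n---- %Y ----\n", walltimestamp);
        printa(@hot);
        clear(@hot);
}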

#1 (and #2 if you know what pair to look for...) can be identified by
saving a timestamp at the start and spamming stdout whenever
finish-start is over some threshold.
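
For example (untested, and just assuming the si_* fbt probes you already
matched are the ones of interest - the 100ms threshold is arbitrary),
something like this would complain about any single driver call that
takes too long:

#!/usr/sbin/dtrace -s

#pragma D option quiet

/* Stamp every si_* entry with a thread-local start time... */
fbt::si_*:entry
{
        self->ts = timestamp;
}

/* ...and complain on return if the call took more than 100ms. */
fbt::si_*:return
/self->ts && timestamp - self->ts > 100000000/
{
        printf("%s took %d ms\n", probefunc, (timestamp - self->ts) / 1000000);
}

/* Clear the start time on every return. */
fbt::si_*:return
{
        self->ts = 0;
}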

If you've got a #2 case on your hands, there's almost certainly some #1
which contains it (even if in userspace), so you could start with that
and drill down.

Start with that and see what suggests itself from there. You could also
imagine all kinds of mixing and matching... track profile information
whenever a suspected function is in-progress... use speculative tracing
to print info only when it turns out that your threshold conditions have
been met, etc.
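
For the speculative-tracing idea, an untested sketch (I'm picking
si_tran_start from your earlier output purely for illustration - the real
problem may well not be a single synchronous call at all): speculatively
record every si_* call while a command is being started, commit the
buffer only if the whole call exceeded some threshold, and discard it
otherwise:

#!/usr/sbin/dtrace -s

#pragma D option quiet

/* Open a speculation each time the driver starts a command. */
fbt::si_tran_start:entry
{
        self->spec = speculation();
        self->ts = timestamp;
}

/* Record every si_* call into the speculative buffer while one is open. */
fbt::si_*:entry
/self->spec/
{
        speculate(self->spec);
        printf("+%d ns  %s\n", timestamp - self->ts, probefunc);
}

/* Keep the recorded trace only if the call took more than 100ms... */
fbt::si_tran_start:return
/self->spec && timestamp - self->ts > 100000000/
{
        commit(self->spec);
        self->spec = 0;
        self->ts = 0;
}

/* ...otherwise throw it away. */
fbt::si_tran_start:return
/self->spec/
{
        discard(self->spec);
        self->spec = 0;
        self->ts = 0;
}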

Hope that helps,
Ryan
