Check the archives for Jon Haslam's post on DTrace being sluggish. The original poster also had a T5440. The reason he gave is that DTrace allocates 4MB per CPU, and the same again if you are using aggregations.
In your case this means 2GB of allocations. Although you have loads of RAM, it still has to be managed. If your system has been running for a long time, there may not be many large pages left. In the worst case 2GB is a huge number of 8K pages, and that's a lot of work!
So your run queues are growing because DTrace has given the system quite a lot of extra work to do. Jon suggests that you could tune the buffer size down, but that you may begin to see drops as a result.
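To put rough numbers on that, here is a back-of-the-envelope sketch in Python. The CPU count of 256 is an assumption (it is what a fully populated T5440 would report, and it is the count that makes 4MB per CPU, doubled for aggregations, come out at 2GB). The 512KB figure is purely a hypothetical smaller buffer for comparison, not a recommendation. The function names are made up for illustration.

```python
# Rough estimate of DTrace buffer memory: a sketch, not a measurement.
KB = 1024
MB = 1024 * KB


def dtrace_buffer_bytes(ncpus, per_cpu_buffer=4 * MB, aggregations=True):
    """One buffer per CPU, plus a second per-CPU buffer when aggregations are used."""
    buffers_per_cpu = 2 if aggregations else 1
    return ncpus * per_cpu_buffer * buffers_per_cpu


def pages_needed(total_bytes, page_size=8 * KB):
    """Worst case: everything is backed by small pages (ceiling division)."""
    return -(-total_bytes // page_size)


NCPUS = 256  # assumption: hardware threads visible on a fully populated T5440

# 4MB per CPU, and the same again for aggregations.
default_total = dtrace_buffer_bytes(NCPUS)
print(default_total // MB, "MB,", pages_needed(default_total), "pages of 8K")

# Hypothetical smaller buffer of 512KB per CPU, for comparison only.
smaller_total = dtrace_buffer_bytes(NCPUS, per_cpu_buffer=512 * KB)
print(smaller_total // MB, "MB,", pages_needed(smaller_total), "pages of 8K")
```

As far as I know the knobs involved are the bufsize and aggsize options (e.g. -x bufsize=512k on the dtrace command line), but check the documentation for your release, and expect drops if you go too small.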
Your simple profiling script should be fine. Obviously extreme profile frequencies will have a greater probe effect, but I've never considered even 997Hz that extreme (although you do have one of the slowest SPARC machines on the planet)!
I think your issue is more to do with initialising DTrace. It may have settled down if you had let it, but I don't blame you for pulling the plug. That isn't what DTrace claims on the tin. It's just that on sane hardware there will be less of an impact.
p.s. in cases like this, nothing beats a bit of good old vmstat, mpstat, iostat -xnz and (less old) prstat -Lm ... say 12 samples with 5 second interval of each
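If you want to grab all four in one go, something like the following Python sketch would do it. The helper names and the output file names are made up for illustration, and it assumes the four tools are on your PATH (i.e. you are running it on the Solaris box itself).

```python
import subprocess


def sampling_commands(interval=5, count=12):
    """Build the argv lists for the four samplers: <tool> [flags] interval count."""
    tail = [str(interval), str(count)]
    return [
        ["vmstat"] + tail,
        ["mpstat"] + tail,
        ["iostat", "-xnz"] + tail,
        ["prstat", "-Lm"] + tail,
    ]


def collect(interval=5, count=12):
    """Run all samplers side by side, each writing to <tool>.out in the current directory."""
    running = []
    for argv in sampling_commands(interval, count):
        out = open(argv[0] + ".out", "w")
        proc = subprocess.Popen(argv, stdout=out, stderr=subprocess.STDOUT)
        running.append((proc, out))
    # Wait for every sampler to finish, then close its output file.
    for proc, out in running:
        proc.wait()
        out.close()
```

Call collect() once before starting DTrace and once while it is running, then compare the two sets of output files.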
Post by tester
Thanks Mike.
Post by Mike Gerdts
Is it unresponsive only when dtrace is running or normally?
It becomes unresponsive after starting DTrace.
Post by Mike Gerdts
With recent releases of Solaris, I've found systems to be quite responsive with a load average that is many times higher than the number of CPUs (as seen by mpstat - 128 for the typical dual processor T5140/T5240).
The system is a T5440. 256G RAM.
Post by Mike Gerdts
It seems highly unlikely that the problem is related to being short on CPU (again, only at about 12% CPU utilization).
vmstat reports more than 95% CPU free. Core utilization is between 2-3%.
Post by Mike Gerdts
If it is unresponsive or sluggish before you start dtrace, I would
No. It gets sluggish after DTrace is started.
Post by Mike Gerdts
- The machine is short on RAM and is paging. Use vmstat to diagnose. Look at the "b" column (blocked on I/O) and paging related columns such as sr (scan rate). You would see things as being extremely sluggish (e.g. when executing a command) because the disk reads needed to load the commands and related libraries are getting queued behind the IO requests for paging.
There's plenty of RAM (~240G).
Post by Mike Gerdts
- The network is having troubles. Look for a duplex mismatch or
kstat -p e1000g | nawk '$NF != 0 && $0 ~ /(err|drop|fail)/'
- There is some other I/O problem. Does iostat -En show hard errors on any disk? Does "iostat -xzn 1" show svc_time + wsvc_time over 20ms? How many I/Os are queued and active?
Your question is performance - but you jumped to the conclusion that dtrace would tell you the answer. It may, but there are likely other tools that will be helpful with a lot less effort and less system impact. perf-discuss may be a better list to ask for more help.
We were checking application performance when we engaged this script to find where the hot spots were; we had to Ctrl-C dtrace because of its behavior.
Even now on an idle server (same system) here is what I see, although it is not that unresponsive now (very little load to start with):
Total: 162 processes, 2058 lwps, load averages: 0.79, 2.87, 2.26
Total: 162 processes, 2058 lwps, load averages: 1.03, 2.89, 2.27
After Dtrace
Total: 161 processes, 2057 lwps, load averages: 20.61, 6.88, 3.61
Total: 161 processes, 2057 lwps, load averages: 38.40, 10.76, 4.93
Total: 161 processes, 2057 lwps, load averages: 35.38, 10.59, 4.91
This time I was able to get some output from the script; otherwise, under load, I have not seen any script output. Now you can imagine the state of the system if the initial load was 10-15.
Please let me know if you need more details.
--
This message posted from opensolaris.org
_______________________________________________
dtrace-discuss mailing list