Discussion:
CPU dispatcher and buffer questions
Scott Shurr
2011-07-01 13:30:33 UTC
Permalink
Hello,
I have a customer who has some dtrace questions. I am guessing that
someone knows the answer to these, so I am asking here. Here are the
questions:

**********
In this document we describe our assumptions about how DTrace uses its
memory. Most of these assumptions derive from [1]. We want these
assumptions to be validated by a DTrace expert from Oracle. This
validation is necessary to give us confidence that DTrace can execute
for a long period of time (on the order of weeks) alongside our software,
without introducing errors such as "dynamic variable drops". In addition,
we describe a problem we are experiencing with our DTrace script, for
which we would like your support.

[1] Sun Microsystems, Inc., "Solaris Dynamic Tracing Guide", September 2008.

Quotes from the Solaris Dynamic Tracing Guide [1], with our interpretation:
. "Each time the variable self->read is referenced in your D program,
the data object referenced is the one associated with the operating
system thread that was executing when the corresponding DTrace probe fired."
o Interpretation: Per CPU there is a dispatcher that has its own
thread, which executes the sched:::on-cpu and sched:::off-cpu probes.
. "At that time, the ring buffer is consumed and processed. dtrace
processes each ring buffer in CPU order. Within a CPU's buffer, trace
records will be displayed in order from oldest to youngest."
o Interpretation: There is a principal buffer per CPU.

1) Impact on Business
We have a number of assumptions that we would like to verify about DTrace.

2) What is the OS version and the kernel patch level of the system?
SunOS nlvdhe321 5.10 Generic_141444-09 sun4v sparc SUNW,T5240

3) What is the Firmware level of the system?
SP firmware 3.0.10.2.b
SP firmware build number: 56134
SP firmware date: Tue May 25 13:02:56 PDT 2010
SP filesystem version: 0.1.22
**********
Thanks



Scott Shurr | Solaris and Network Domain, Global Systems Support
Email: Scott.Shurr-QHcLZuEGTsvQT0dZR+***@public.gmane.org
Phone: 781-442-1352
Oracle Global Customer Services

Log, update, and monitor your Service Request online using My Oracle
Support <http://www.oracle.com/us/support/044752.html>
Jim Mauro
2011-07-01 14:58:17 UTC
Permalink
I'm not sure I understand what is being asked here, but I'll take a shot...

Note that it is virtually impossible to write a piece of software that is
guaranteed to have sufficient space to buffer a given amount of data when
the rate and size of the data flow are unknown. This is one of the
robustness features of DTrace: it's smart enough to know that, and smart
enough to let the user know when data cannot be buffered.

Yes, buffers are allocated per-CPU. There are several buffer types, depending
on the dtrace invocation. Minimally, principal buffers are allocated per CPU
when a dtrace consumer (dtrace(1M)) is executed. Read:
http://wikis.sun.com/display/DTrace/Buffers+and+Buffering

The "self->read" syntax denotes a thread-local variable, one of several
variable types available in DTrace. The prefix defines the variable's scope:
each kernel thread that is on a CPU when the probe(s) fire gets its own
copy of a "self->" variable.
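As an illustration of that scoping (a generic example, not the customer's script), a common thread-local variable pattern measures per-thread latency:

```d
/* Sketch: time read(2) calls per thread. Each thread that fires
 * syscall::read:entry gets its own copy of self->ts. */
syscall::read:entry
{
        self->ts = timestamp;
}

syscall::read:return
/self->ts/
{
        @lat["read latency (ns)"] = quantize(timestamp - self->ts);
        self->ts = 0;   /* clear it so the dynamic variable is released */
}
```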

There is only one kernel dispatcher, not one per CPU. There are per-CPU run
queues managed by the dispatcher.

As for running a DTrace script for hours/days/weeks, I have never been down that
road. It is theoretically possible of course, and seems to be a good use of
speculative buffers or a ring buffer policy.

We cannot guarantee it will execute without errors ("dynamic variable drops", etc.).
We can guarantee you'll know when errors occur.

How can such guarantees be made with a dynamic tool like dtrace?
Does your customer know up-front how much data will be traced/processed/
consumed, and at what rate?

Read this:
http://blogs.oracle.com/bmc/resource/dtrace_tips.pdf

Thanks
/jim
_______________________________________________
dtrace-discuss mailing list
Scott Shurr
2011-07-11 15:20:39 UTC
Permalink
My customer still thinks this is a bug:

**********
We think that DTrace is not working as designed. Therefore we wanted to
report a bug for DTrace. Thank you for the answers from the mailing list,
but they do not solve our problem.

The DTrace script that we describe in the document produces "dynamic
variable drops" if the system is heavily loaded for a longer period of
time (around 24 hours). We assume that this is not the intended behavior
of DTrace.

We use a DTrace script with thread-local variables (self->) and still
get "dynamic variable drops". DTrace has to run for long periods of time
in our systems, where system utilization can be high. When "dynamic
variable drops" occur, the trace is incorrect. As described in the
previous document, we think our script is correct, and therefore we
think DTrace is behaving incorrectly. We would like a solution for this.

The previous document contains some assumptions that we had to make
about DTrace, but which we could verify against the DTrace documentation.
In our script we use thread-local variables (self->). We assume that
these variables result in at most one variable per hardware thread (which
is a part of a CPU). Because the thread-local variable on a CPU can be
reused in consecutive firings of the probes, and we use no other variables,
we assume that our script is not filling the dynamic variable space.
But DTrace appears to be doing something else that causes the dynamic
variable space to fill up (on a loaded system, after around 24 hours),
because we still get "dynamic variable drops". This makes the DTrace
solution unreliable. We would like to see a solution for this problem.

I have attached the version of the DTrace script that we use. This
script depends upon a process for its configuration. After running this
script for approximately 24 hours on a heavily loaded machine, we get
dynamic variable drops.
**********

It is my belief that this is not a bug, but I need something more to
give the customer to convince him of this. I've attached his script CSET.d
Thanks



Jim Mauro
2011-07-11 15:36:17 UTC
Permalink
Your customer is wrong. There is no way for DTrace (or any other piece of software)
to guarantee sufficient buffer space all the time, every time.

You/we cannot make any guarantees to the customer that a given DTrace script
will never run out of space for dynamic variables (thread-local variables,
associative array variables). There are things that can be done to minimize
the possibility of dynamic variable drops:
1. Clear unused dynamic variables in the D by assigning a value of zero.
2. Increase the space allocated for dynamic variables. The default is 1MB:
#pragma D option dynvarsize=2m
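As a sketch of both mitigations together (the probes here are illustrative, not the customer's script):

```d
/* Mitigation 2: enlarge the dynamic variable space (the default is 1MB). */
#pragma D option dynvarsize=8m

syscall::open:entry
{
        /* Thread-local variables consume dynamic variable space... */
        self->in_open = 1;
}

syscall::open:return
/self->in_open/
{
        /* Mitigation 1: ...so assign zero when done, releasing the space. */
        self->in_open = 0;
}
```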

Send your customer a link to the slides posted previously.
Buy your customer a copy of "DTrace: Dynamic Tracing in Oracle Solaris,
Mac OS X, and FreeBSD". Actually, buy them two copies.

Please note also that among thousands (tens of thousands?) of DTrace users
using DTrace every day, we see very few complaints such as this. In fact,
with the exception of the early days of DTrace, we never see them anymore,
because we understand the error and we work around it. If your customer is
unwilling or unable to work within the constraints of this powerful tool, there's
nothing anyone can do about that. We can't change the laws of the universe
because your customer doesn't understand the complexities of dynamic
instrumentation.

Thanks
/jim
#!/usr/sbin/dtrace -s
/*----------------------------------------------------------------------------|
| |
| DTrace script |
| |
|-----------------------------------------------------------------------------|
|
| Ident : CSET.d
| Description : This script traces thread executions
|
| History
| yyyy-mm-dd : <swchg> <author> [<platform> <release>]
| 2011-03-16 : Initial creation tbijlsma
| 2011-03-16 : <SWCHG00371518> <tbijlsma>
|
|-----------------------------------------------------------------------------|
| |
| Copyright (c) 2011, ASML Holding N.V. (including affiliates). |
| All rights reserved |
| |
|----------------------------------------------------------------------------*/
/* Do not provide additional output*/
#pragma D option quiet
/* Store the printf values in a circular buffer*/
#pragma D option bufpolicy=ring
/* Necessary to signal CSET.c */
#pragma D option destructive
uint64_t Stimercorrection; /* Offset set by the pid$1::configDtrace:entry probe*/
int *setOne; /* Pointer to an int with value one */
/* Initially sets Stimercorrection to zero
*/
BEGIN
{
Stimercorrection = 0;
}
/* This probe sets the moment that CSET calls the function "configDtrace" as
* time 0 and signals the function, by setting the value of arg0 to 1
*/
pid$1::configDtrace:entry
{
setOne = alloca(sizeof(int));
*setOne = 1;
Stimercorrection = timestamp;
copyout(setOne,arg0,sizeof(int));
}
/* When a SW thread starts executing, we store the start time */
sched:::on-cpu
/execname!="sched" && Stimercorrection/
{
self->startTime = timestamp - Stimercorrection;
}
/* When a SW thread stops executing at a processor, print information for the
* event in the ring buffer
*/
sched:::off-cpu
/self->startTime && execname!="sched" && Stimercorrection/
{
printf("%d|%d|%i|%i|%i|%s\n",
self->startTime, /* start time*/
timestamp - Stimercorrection,
cpu, /*number of CPU*/
tid, /*current thread*/
pid, /*process id*/
execname /*process name*/
);
self->startTime = 0;
}
/* If the main task calls the function retrieveTrace, we stop the DTrace script
* and the values in the ring buffer will be printed
*/
pid$1::retrieveTrace:entry
{
exit(0);
}
/* Stop the DTrace script if kill -9 or kill -15 has been sent to CSET.
 * arg0 contains the PID of the signaled process and arg1 the signal number
 */
syscall::kill:entry
/arg0==$1 && ( arg1==9 || arg1==15 )/
{
exit(0);
}
/* Finally we print "EOT" to mark the end of a trace
*/
END
{
printf("EOT\n");
}
Nico Williams
2011-07-11 17:04:56 UTC
Permalink
Post by Scott Shurr
**********
We think that DTrace is not working as designed. Therefore we wanted to report a bug for DTrace. Thank you for the answers from mailing list, but these do not solve our problem.
The DTrace script that we describe in the document results in "dynamic variable drops", if the system is heavily loaded for a longer period of time (around 24 hours). We assume that this is not the intended behavior of DTrace.
This assumption is wrong. It is a principal tenet of DTrace's design
that it will not critically interfere with production services. This
means, for example, that DTrace being starved for resources will not
result in a system hanging until resources are freed. This is also
the reason for DTrace's lack of looping facilities in its D language
(and byte-compiled instruction set).

Nico
--
Nico Williams
2011-07-01 15:53:15 UTC
Permalink
Post by Scott Shurr
In this document, we will describe how we assume that DTrace uses its
memory. Most assumptions result from [1]. We want these assumptions to be
validated by a DTrace expert from Oracle. This validation is necessary to
provide us confidence that DTrace can execute for a long period of time (in
the order of weeks) along with our software, without introducing errors due
to e.g. “dynamic variable drops”. In addition, we described a problem we
experience with our DTrace script, for which we want to have support from
you.

DTrace is designed to be reliable in one sense only: it won't consume so
many resources as to kill production performance. This means that DTrace
cannot be reliable in the sense of not dropping probe data. This is
inescapable.

Nico
--
Joe Pallas
2011-07-11 18:28:06 UTC
Permalink
Post by Jim Mauro
You/we can not make any guarantees to the customer that a given DTrace script
will never run out of space for dynamic variables (thread-local variables,
associative array variables). There are things that can be done to minimize
the possibility of dynamic variable drops;
1. Clear unused dynamic variables in the D by assigning a value of zero.
2. Increase the space allocated for dynamic variables. Default is 1MB.
#pragma D option dynvarsize=2m
Jim, are you saying that if a script has a small upper bound on the number of active dynamic variables, DTrace may still run out of dynamic variable space? Because that certainly is, if not a bug, very surprising behavior [note: I wrote this before I found the undocumented information about cleanrate mentioned below].

The script provided looks entirely reasonable. It uses one thread-local variable which is set in sched:::on-cpu and cleared in sched:::off-cpu. Honestly, the only way I can imagine this growing beyond the number of hardware contexts is if there is a problem with the assumption that every thread that fires on-cpu also fires off-cpu. And if that's the case, I would indeed call that a bug.

The script doesn't do anything that isn't in several examples in the DTrace documentation. There's no reason I can see that it should consume dynamic variable space if everything is working correctly. The user's expectations are reasonable to an experienced DTrace user.

Scott, your customer's message did not include the exact complaint from dtrace. In particular, it didn't say whether the dynamic variable drops warning included "with non-empty dirty list". If it did, then the problem may be that heavy load results in so many threads being scheduled in a short time that the asynchronous dynamic variable reaper can't keep up with all the thread-local variables. Increasing the cleanrate tunable may help with this problem (as described in slide 23 of <http://blogs.oracle.com/bmc/resource/dtrace_tips.pdf>). This information is (still) not documented outside of a slide deck and the not-free book, so you may want to add an entry to CR 6221632.
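For reference, that tunable can be set from within a script; the value below is only an example (the default reaper rate is 101hz):

```d
/* Run the dynamic variable reaper more often than the default,
 * so dirty thread-local variables are reclaimed sooner under heavy load. */
#pragma D option cleanrate=303hz
```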

joe
Nico Williams
2011-07-11 19:39:41 UTC
Permalink
Post by Joe Pallas
The script provided looks entirely reasonable. It uses one thread-local variable which is set in sched:::on-cpu and cleared in sched:::off-cpu. Honestly, the only way I can imagine this growing beyond the number of hardware contexts is if there is a problem with the assumption that every thread that fires on-cpu also fires off-cpu. And if that's the case, I would indeed call that a bug.
Do your probe actions generate any trace data? If so then probe
firing rate matters. Unless your probes are using destructive ops
like chill() to the exclusion of any data-generating actions
(including system()), and to the exclusion of any copyin() and
friends, then your script can exhaust resources available to DTrace,
leading to drops.

The script you posted has a printf(), for example, which means that
the probe in question generates data that can be dropped. The buffer
into which the script's data get stored is a circular buffer -- if the
consumer (the user-land part of DTrace) does not consume the data from
that buffer fast enough then DTrace can drop traces. You can help
prevent this (but not guarantee that it won't happen) by running the
consumer at a higher priority than the victim process(es).

Nico
--
Joe Pallas
2011-07-11 21:55:06 UTC
Permalink
Thanks. Three points:

1) A clarification: "The script you posted" is not my script. I'm not Scott, who posted the script, nor am I Scott's customer, who wrote the script. I'm just a member of the DTrace user community trying to be helpful. I happen to be an Oracle employee, but my involvement with Solaris and DTrace is strictly as a user.

2) The complaint is about "dynamic variable drops," so trace data from printf overrunning the principal buffer is not the issue in this particular case.

3) The script specified bufpolicy=ring. Unless I misunderstand the documentation, trace data should not generate drops with ring buffering, because it is always supposed to overwrite the oldest data.

If I'm wrong about either of those last two points, I'm always happy to improve my understanding of DTrace.

joe
Bryan Cantrill
2011-07-11 22:32:53 UTC
Permalink
First, I confess that I'm pretty uncomfortable answering Oracle
support questions here. Oracle has very aggressively monetized the
operating system with the notion that its support was of tremendous
value; that Oracle has to essentially rely on the broader community
for that support speaks volumes about the actual value that Oracle is
providing its customers in this regard. Put more bluntly: let no one
question the vitality of illumos when it is we in the illumos
community who are fielding a support call about your operating
system...

Now, on to the meat here. The problem here is a bug in the script,
albeit one that is due to some subtle semantics: threads that exit
will fire sched:::on-cpu but never fire sched:::off-cpu. (That is, a
thread exits while on the CPU; it does not give up the CPU to die.)
This can, of course, be easily confirmed with DTrace itself; try this
experiment:

# dtrace -n 'sched:::on-cpu,sched:::off-cpu,proc:::lwp-exit
      /pid == $target/{printf("%d: %s\n", timestamp, probename)}' \
      -q -c "/usr/bin/sleep 1" | sort -n

When I run this on my illumos machine, I see:

153116965615211: on-cpu
153116965637283: off-cpu
153116965876279: on-cpu
153116966272791: off-cpu
153116966908637: on-cpu
153116966979203: off-cpu
153116967125718: on-cpu
153116967215732: off-cpu
153116967377948: on-cpu
153116967447287: off-cpu
153116967573339: on-cpu
153116968106208: off-cpu
153116968166568: on-cpu
153116968181498: off-cpu
153116968347277: on-cpu
153116968367427: off-cpu
153116980796598: on-cpu
153116980810590: off-cpu
153116980969177: on-cpu
153116981580498: off-cpu
153116981746760: on-cpu
153116981761285: off-cpu
153116981909940: on-cpu
153116981931360: off-cpu
153116996876523: on-cpu
153116996890628: off-cpu
153116997047335: on-cpu
153116999245574: off-cpu
153117999387773: on-cpu
153117999472949: lwp-exit

So the script is leaking a dynamic variable whenever a thread that has
been off CPU exits -- and it's not at all surprising that dynamic
variable drops are only seen after 24 hours and only under heavy load.
To fix this, the customer should add a clause for proc:::lwp-exit
that zeroes out their thread-local variable....
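A sketch of that fix, using the variable name from the attached CSET.d:

```d
/* Clear the thread-local variable when an LWP exits: exiting threads
 * never fire sched:::off-cpu, so without this clause each exit would
 * leak one dynamic variable. */
proc:::lwp-exit
/self->startTime/
{
        self->startTime = 0;
}
```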

- Bryan
