Discussion:
Reaping enablings on defunct providers
Bryan Cantrill
2011-07-02 18:27:14 UTC
Permalink
All,

A longstanding problem that we have had is that enablings on defunct
providers (e.g., USDT probes on dead processes) are not reaped: the
probes will exist as long as there exists an enabling for them. When
processes are turning over frequently (or when enablings are
long-running), this can clog up the probe space to the point that
DTrace probe creation will silently fail (an absolutely maddening
failure mode). This has been hit several times over the years (we
were nailed by it on our build machines at Fishworks) -- so when Theo
Schlossnagle mentioned to me that he was getting killed by this
problem in an environment with rapidly turning over Postgres
processes, I was embarrassed that I hadn't tackled it earlier. As it
turns out, it was a tad thorny for locking reasons, but a patch for
this problem is attached. We have integrated this into our bits at
Joyent (internal ticket is OS-454, "enablings on defunct providers
prevent providers from unregistering"), so you'll see this show up
soon at http://github.com/joyent/illumos-joyent -- but I wanted to
give everyone here a heads-up.

Anyway, patch is attached, with my thanks to Adam for a helpful
discussion on fasttrap's asynchronous provider retiring mechanics.
Note that Adam hasn't (yet) reviewed this, and its integration
upstream should wait until he's had a chance to look it over. Please
let me know if you have any questions or comments!

Thanks,
Bryan
Theo Schlossnagle
2011-07-02 18:37:55 UTC
Permalink
This is awesome. Thanks for taking the time to dive in and solve this one.

We do indeed have some high-turnover postgres systems that are
impossible to effectively trace due to this issue.

I'm excited to get this into a production kernel over here... in due time :-)
Post by Bryan Cantrill
All,
A longstanding problem that we have had is that enablings on defunct
providers (e.g., USDT probes on dead processes) are not reaped:  the
probes will exist as long as there exists an enabling for them.  When
processes are turning over frequently (or when enablings are
long-running), this can clog up the probe space to the point that
DTrace probe creation will silently fail (an absolutely maddening
failure mode).  This has been hit several times over the years (we
were nailed by it on our build machines at Fishworks) -- so when Theo
Schlossnagle mentioned to me that he was getting killed by this
problem in an environment with rapidly turning over Postgres
processes, I was embarrassed that I hadn't tackled it earlier.  As it
turns out, it was a tad thorny for locking reasons, but a patch for
this problem is attached.  We have integrated this into our bits at
Joyent (internal ticket is OS-454, "enablings on defunct providers
prevent providers from unregistering"), so you'll see this show up
soon at http://github.com/joyent/illumos-joyent -- but I wanted to
give everyone here a heads-up.
Anyway, patch is attached, with my thanks to Adam for  a helpful
discussion on fasttrap's asynchronous provider retiring mechanics.
Note that Adam hasn't (yet) reviewed this, and its integration
upstream should wait until he's had a chance to look it over. Please
let me know if you have any questions or comments!
       Thanks,
       Bryan
--
Theo Schlossnagle

http://omniti.com/is/theo-schlossnagle
Adam Leventhal
2011-07-12 23:38:15 UTC
Permalink
Hey Bryan,

This is great stuff, and -- as you say -- something that we've all
wanted for a long long time. The code looks great. The only addition
I'd ask for is if you considered having a case with bufpolicy=ring
and/or anonymous tracing.

I had a few questions:

Can you explain the purpose of dtrace_unregister_defunct_reap? Is the
idea to try for providers that were recently defunct and then give up
after a minute?

Should the fasttrap cleanup stuff use a taskq rather than its timeout
stuff (not asking you to do it of course)?

Can you talk a little about the approach you took? Obviously if you're
using speculative tracing or bufpolicy=ring or anonymous tracing it
limits the efficacy of what you've done. Would it have been possible
to disable the ECBs without destroying them? I assume that's horribly
facile, but I'd be interested to understand.

dtrace_buffer_t structures should be cache-size aligned since they're
per-CPU structures, yes? Would that be worth noting explicitly? And
why did you elect to have padding in two places rather than just 7
64-bit values at the end?

Thanks; again, this is really great.

Adam
Post by Bryan Cantrill
All,
A longstanding problem that we have had is that enablings on defunct
providers (e.g., USDT probes on dead processes) are not reaped:  the
probes will exist as long as there exists an enabling for them.  When
processes are turning over frequently (or when enablings are
long-running), this can clog up the probe space to the point that
DTrace probe creation will silently fail (an absolutely maddening
failure mode).  This has been hit several times over the years (we
were nailed by it on our build machines at Fishworks) -- so when Theo
Schlossnagle mentioned to me that he was getting killed by this
problem in an environment with rapidly turning over Postgres
processes, I was embarrassed that I hadn't tackled it earlier.  As it
turns out, it was a tad thorny for locking reasons, but a patch for
this problem is attached.  We have integrated this into our bits at
Joyent (internal ticket is OS-454, "enablings on defunct providers
prevent providers from unregistering"), so you'll see this show up
soon at http://github.com/joyent/illumos-joyent -- but I wanted to
give everyone here a heads-up.
Anyway, patch is attached, with my thanks to Adam for  a helpful
discussion on fasttrap's asynchronous provider retiring mechanics.
Note that Adam hasn't (yet) reviewed this, and its integration
upstream should wait until he's had a chance to look it over. Please
let me know if you have any questions or comments!
       Thanks,
       Bryan
_______________________________________________
Developer mailing list
http://lists.illumos.org/m/listinfo/developer
--
Adam Leventhal, Delphix
http://dtrace.org/blogs/ahl

275 Middlefield Road, Suite 50
Menlo Park, CA 94025
http://www.delphix.com
Bryan Cantrill
2011-07-13 00:09:25 UTC
Permalink
Hey Adam,
Post by Adam Leventhal
This is great stuff, and -- as you say -- something that we've all
wanted for a long long time. The code looks great. The only addition
I'd ask for is if you considered having a case with bufpolicy=ring
and/or anonymous tracing.
Supporting a ring buffer policy is a tad brutal, because it would
require processing the entire ring to verify that the ECB isn't in it.
There are ways we could optimize that, but they're ugly; unless
someone is burning with this use case, I'd rather punt on it.
Anonymous tracing will just work; once the enabling is claimed, the
buffer will be switched, and it will become reapable.
Post by Adam Leventhal
Can you explain the purpose of dtrace_unregister_defunct_reap? Is the
idea to try for providers that were recently defunct and then give up
after a minute?
Yes; you don't want to keep trying in perpetuity because you may be
blocked by either a ring buffer policy or speculative tracing or
something else that is managing to cache ECB state. So after a while,
you want to give up.
Post by Adam Leventhal
Should the fasttrap cleanup stuff use a taskq rather than its timeout
stuff (not asking you to do it of course)?
Eh, it's fine. ;) I don't think it would be a problem to reimplement
that as a task queue, of course, but I would wait until there was some
compelling reason to do so.
Post by Adam Leventhal
Can you talk a little about the approach you took? Obviously if you're
using speculative tracing or bufpolicy=ring or anonymous tracing it
limits the efficacy of what you've done. Would it have been possible
to disable the ECBs without destroying them? I assume that's horribly
facile, but I'd be interested to understand.
It's probably possible (as Wolfgang Thaler was fond of saying, "it's
just software, man"), but it would be additional complexity for cases
that are pretty borderline. The speculative tracing one is hard to
get excited about because I don't think such enablings would be
long-lived; ring-buffer enablings are longer lived, of course, but I
don't think they tend to include USDT probes. (Though I'm happy to
tackle this as follow-on work if someone has actually encountered this
case.) And as I mentioned above, anonymous tracing will just work.
(Though long-lived anonymous enablings of USDT probes must be really
unusual!)
Post by Adam Leventhal
dtrace_buffer_t structures should be cache-size aligned since they're
per-CPU structures, yes? Would that be worth noting explicitly? And
why did you elect to have padding in two places rather than just 7
64-bit values at the end?
It's just to reduce false sharing. I debated not padding it out; I
think it's unlikely to be a problem in practice as a given line can
only be shared by at most two CPUs, and that's generally not enough
sharing to get deeply pathological -- but the extra memory cost is
minimal, and there's value to maintaining as much CPU orthogonality as
possible with respect to probe evaluation. As for padding it out in
two places, the first pad is only there for 32-bit whereas the pad
that I added is bitness-neutral.
Post by Adam Leventhal
Thanks; again, this is really great.
Thanks for taking a look! Now on to my other wads. ;)

- Bryan
Adam Leventhal
2011-07-13 00:15:04 UTC
Permalink
Hey Bryan
Post by Bryan Cantrill
Supporting a ring buffer policy is a tad brutal, because it would
require processing the entire ring to verify that the ECB isn't in it.
 There are ways we could optimize that, but they're ugly; unless
someone is burning with this use case, I'd rather punt on it.
Anonymous tracing will just work; once the enabling is claimed, the
buffer will be switched, and it will become reapable.
I actually meant a negative test case.
Post by Bryan Cantrill
Yes; you don't want to keep trying in perpetuity because you may be
blocked by either a ring buffer policy or speculative tracing or
something else that is managing to cache ECB state.  So after a while,
you want to give up.
Would that be worth a one- or two-line comment?

Adam
--
Adam Leventhal, Delphix
http://dtrace.org/blogs/ahl

275 Middlefield Road, Suite 50
Menlo Park, CA 94025
http://www.delphix.com
Bryan Cantrill
2011-07-13 00:23:21 UTC
Permalink
Post by Adam Leventhal
Hey Bryan
Post by Bryan Cantrill
Supporting a ring buffer policy is a tad brutal, because it would
require processing the entire ring to verify that the ECB isn't in it.
 There are ways we could optimize that, but they're ugly; unless
someone is burning with this use case, I'd rather punt on it.
Anonymous tracing will just work; once the enabling is claimed, the
buffer will be switched, and it will become reapable.
I actually meant a negative test case.
As I imagine you saw, there's already one there that uses speculative
tracing and verifies that enablings are not reaped:

usr/src/cmd/dtrace/test/tst/common/usdt/tst.noreap.ksh

Having a test that uses ring buffering is certainly possible, it's
just a more complicated test to write. (I'm currently cheating a bit
by using one enabling to kick off another.) But I'm happy to add it
if you'd like to see it...
Post by Adam Leventhal
Post by Bryan Cantrill
Yes; you don't want to keep trying in perpetuity because you may be
blocked by either a ring buffer policy or speculative tracing or
something else that is managing to cache ECB state.  So after a while,
you want to give up.
Would that be worth a one- or two-line comment?
There's already the comment that it's padded out to avoid false
sharing; does it merit more than that?

- Bryan
Adam Leventhal
2011-07-13 00:25:40 UTC
Permalink
Post by Bryan Cantrill
Post by Adam Leventhal
I actually meant a negative test case.
As I imagine you saw, there's already one there that uses speculative
 usr/src/cmd/dtrace/test/tst/common/usdt/tst.noreap.ksh
Having a test that uses ring buffering is certainly possible, it's
just a more complicated test to write. (I'm currently cheating a bit
by using one enabling to kick off another.)  But I'm happy to add it
if you'd like to see it...
Yes. I saw that test case. It was just a suggestion to exercise the
other path in your code, but it's up to you.
Post by Bryan Cantrill
Post by Adam Leventhal
Would that be worth a one- or two-line comment?
There's already the comment that it's padded out to avoid false
sharing; does it merit more than that?
I guess that's sufficient, but I needed to remind myself that these
were per-CPU data structures. Your call.

Adam
--
Adam Leventhal, Delphix
http://dtrace.org/blogs/ahl

275 Middlefield Road, Suite 50
Menlo Park, CA 94025
http://www.delphix.com
Bryan Cantrill
2011-07-13 05:44:46 UTC
Permalink
Post by Adam Leventhal
Post by Bryan Cantrill
Post by Adam Leventhal
I actually meant a negative test case.
As I imagine you saw, there's already one there that uses speculative
 usr/src/cmd/dtrace/test/tst/common/usdt/tst.noreap.ksh
Having a test that uses ring buffering is certainly possible, it's
just a more complicated test to write. (I'm currently cheating a bit
by using one enabling to kick off another.)  But I'm happy to add it
if you'd like to see it...
Yes. I saw that test case. It was just a suggestion to exercise the
other path in your code, but it's up to you.
I added that test and integrated into our repo; patch is attached.
Post by Adam Leventhal
Post by Bryan Cantrill
Post by Adam Leventhal
Would that be worth a one- or two-line comment?
There's already the comment that it's padded out to avoid false
sharing; does it merit more than that?
I guess that's sufficient, but I needed to remind myself that these
were per-CPU data structures. Your call.
I'm going to leave that one as it is; in addition to the comment next
to the member, the block comment immediately above the structure also
spells that out:

/*
* DTrace Buffers
*
* Principal buffers, aggregation buffers, and speculative buffers are all
* managed with the dtrace_buffer structure. By default, this structure
* includes twin data buffers -- dtb_tomax and dtb_xamot -- that serve as the
* active and passive buffers, respectively. For speculative buffers,
* dtb_xamot will be NULL; for "ring" and "fill" buffers, dtb_xamot will point
* to a scratch buffer. For all buffer types, the dtrace_buffer structure is
* always allocated on a per-CPU basis; a single dtrace_buffer structure is
* never shared among CPUs. (That is, there is never true sharing of the
* dtrace_buffer structure; to prevent false sharing of the structure, it must
* always be aligned to the coherence granularity -- generally 64 bytes.)
*
...

Let me know if you have any other comments; otherwise, I'll assume
that you've moved on to reviewing unprivileged tick probes. ;)

- Bryan
Adam Leventhal
2011-07-13 16:32:30 UTC
Permalink
Hey Bryan,

Cool; looks good to me.

Adam
Post by Bryan Cantrill
Post by Adam Leventhal
Post by Bryan Cantrill
Post by Adam Leventhal
I actually meant a negative test case.
As I imagine you saw, there's already one there that uses speculative
 usr/src/cmd/dtrace/test/tst/common/usdt/tst.noreap.ksh
Having a test that uses ring buffering is certainly possible, it's
just a more complicated test to write. (I'm currently cheating a bit
by using one enabling to kick off another.)  But I'm happy to add it
if you'd like to see it...
Yes. I saw that test case. It was just a suggestion to exercise the
other path in your code, but it's up to you.
I added that test and integrated into our repo; patch is attached.
Post by Adam Leventhal
Post by Bryan Cantrill
Post by Adam Leventhal
Would that be worth a one- or two-line comment?
There's already the comment that it's padded out to avoid false
sharing; does it merit more than that?
I guess that's sufficient, but I needed to remind myself that these
were per-CPU data structures. Your call.
I'm going to leave that one as it is; in addition to the comment next
to the member, the block comment immediately above the structure also
/*
 * DTrace Buffers
 *
 * Principal buffers, aggregation buffers, and speculative buffers are all
 * managed with the dtrace_buffer structure.  By default, this structure
 * includes twin data buffers -- dtb_tomax and dtb_xamot -- that serve as the
 * active and passive buffers, respectively.  For speculative buffers,
 * dtb_xamot will be NULL; for "ring" and "fill" buffers, dtb_xamot will point
 * to a scratch buffer.  For all buffer types, the dtrace_buffer structure is
 * always allocated on a per-CPU basis; a single dtrace_buffer structure is
 * never shared among CPUs.  (That is, there is never true sharing of the
 * dtrace_buffer structure; to prevent false sharing of the structure, it must
 * always be aligned to the coherence granularity -- generally 64 bytes.)
 *
 ...
Let me know if you have any other comments; otherwise, I'll assume
that you've moved on to reviewing unprivileged tick probes. ;)
       - Bryan
--
Adam Leventhal, Delphix
http://dtrace.org/blogs/ahl

275 Middlefield Road, Suite 50
Menlo Park, CA 94025
http://www.delphix.com
Loading...