Discussion:
reduce dtrace overhead
Scott Feldstein
2010-06-15 17:10:28 UTC
Permalink
Hi,
I am trying to figure out how to reduce the performance impact of dtrace when run with java probes. Currently the script that i am running seems to significantly impact the performance of my java web app.

I've been trying to use the profile:::profile-1001hz probe to reduce the resolution of events processed by dtrace but have had no success (I'm probably not using it correctly).

Here is my script, any ideas how I can reduce the overhead? I am happy to reduce the resolution of events that are processed if there is a way to do that, or anything else anyone can think of.

thanks.

#!/usr/sbin/dtrace -CZs

self long start;
self short in_lock;
string package_s;
string package_e;

BEGIN
{
package_s = "com/foo/aaa";
package_e = "com/foo/zzz";
start = timestamp;
}

/*
* hotspot:::method-entry, hotspot:::method-return probe arguments:
* arg0: uintptr_t, Java thread id
* arg1: char*, a pointer to mUTF-8 string containing the name of
* the class of the method being entered
* arg2: uintptr_t, the length of the class name (in bytes)
* arg3: char*, a pointer to mUTF-8 string data which contains the
* name of the method being entered
* arg4: uintptr_t, the length of the method name (in bytes)
* arg5: char*, a pointer to mUTF-8 string data which contains the
* signature of the method being entered
* arg6: uintptr_t, the length of the signature(in bytes)
*/

hotspot$target:::method-return
/self->start && (copyin(arg1, arg2+1) >= package_s && copyin(arg1, arg2+1) <= package_e)/
{
self->str_ptr = (char*) copyin(arg3, arg4+1);
self->str_ptr[arg4] = '\0';
self->method_name = (string) self->str_ptr;

self->str_ptr = (char*) copyin(arg1, arg2+1);
self->str_ptr[arg2] = '\0';
self->class_name = (string) self->str_ptr;

@method_time[self->class_name, self->method_name] = sum((timestamp - self->start) / 1000000);
self->start = NULL;
}

hotspot$target:::method-entry
/(copyin(arg1, arg2+1) >= package_s && copyin(arg1, arg2+1) <= package_e)/
{
self->start = timestamp;

self->str_ptr = (char*) copyin(arg1, arg2+1);
self->str_ptr[arg2] = '\0';
self->class_name = (string) self->str_ptr;

self->str_ptr = (char*) copyin(arg3, arg4+1);
self->str_ptr[arg4] = '\0';
self->method_name = (string) self->str_ptr;

self->str_ptr = (char*) copyin(arg5, arg6+1);
self->str_ptr[arg6] = '\0';
self->signature = (string) self->str_ptr;

@method_count[self->class_name, self->method_name] = count();
}

END
{
printf("\nCOUNT\n======================\n");
printa(@method_count);
printf("\nTIME\n======================\n");
printa(@method_time);
exit(0);
}
--
This message posted from opensolaris.org
Ryan Johnson
2010-06-15 18:02:52 UTC
Permalink
Post by Scott Feldstein
I am trying to figure out how to reduce the performance impact of dtrace when run with java probes. Currently the script that i am running seems to significantly impact the performance of my java web app.
Userspace probes still trap to the kernel for handling, including
predicate evaluation, so the script below is essentially turning every
function into a pair of syscalls (granted, the hotspot provider doesn't
seem to provide a cheaper alternative...).
Post by Scott Feldstein
I've been trying to use the profile:::profile-1001hz probe to reduce the resolution of events processed by dtrace but have had no success (I'm probably not using it correctly).
For minimal overhead I would use something in the 100-200hz range
(5-10ms between samples) and aggregate over jstack(). It won't tell you
how many calls you're getting, but it will show clearly which functions
are eating up all the time and who called them. I haven't done Java
before, but here's what I'd do for C code:

profile-$FREQ /pid==$PID/ {
@profile[ustack()]=count();
}

Regards,
Ryan
Scott Feldstein
2010-06-15 19:29:06 UTC
Permalink
Ryan,
Thanks for the reply.

How do I apply the sampling technique that you mentioned to the method-entry and method-return blocks of code? That is my big problem. The ustack() just returns the stack as the OS sees it, not the jvm. Maybe I just need to dig through the java probes a bit more to see if i can find something with the same functionality.

thanks.
--
This message posted from opensolaris.org
Ryan Johnson
2010-06-15 20:25:51 UTC
Permalink
Post by Scott Feldstein
Ryan,
Thanks for the reply.
How do I apply the sampling technique that you mentioned to the method-entry and method-return blocks of code? That is my big problem. The ustack() just returns the stack as the OS sees it, not the jvm. Maybe I just need to dig through the java probes a bit more to see if i can find something with the same functionality.
thanks.
Hi Scott,

Profiling and entry/return probes are orthogonal approaches. You can use
both simultaneously, though that's rarely useful. Entry/return
intercepts *every* call and return, jumping to the kernel to do dtrace-y
things. Very expensive (adds ~50usec/call), especially if there are lots
of small functions. That's what the script you submitted was doing.

I'm suggesting to use profiling, where the kernel raises interrupts on
every core in the system at the specified frequency (e.g. 101hz), and
the interrupt's handler runs the dtrace probe (in my case stuffing the
stack trace in an aggregation). Much cheaper (~50usec every 1-10msec),
but you only get a sampling back instead of exact traces -- whatever
happened to be running on that core when the interrupt fired.

Also, as I mentioned before, you probably want 'jstack' instead of
'ustack' (see
http://wikis.sun.com/display/DTrace/Actions+and+Subroutines for
explanations of stack/ustack/jstack). I'd start simple, with a profile
probe aggregating over jstack, and go from there.

Hope that makes sense,
Ryan
Scott Feldstein
2010-06-15 22:15:18 UTC
Permalink
Yes this makes sense and jstack(1) looks like it is the missing piece. I'll try it out.

Initially I just didn't think about the idea that using count with the profile:::profile-xxhz probe would give me what I wanted (duh!)

thanks Ryan!
--
This message posted from opensolaris.org
Loading...