Discussion:
Can dtrace cpc provider support 2+ probes?
Jin Yao
2010-06-17 06:29:48 UTC
I wrote two dtrace scripts to measure "clk" and "rma" for the system workload
on nhm-ex (32 cores) while specjbb2005 runs.

***@shz-OS:~# ./test_clk_rma.d
......................
clk
309547
rma
32
kcpc_int
309579
......................
clk
309931
rma
57
kcpc_int
309985
......................
clk
310158
rma
25
kcpc_int
310183
^C

***@shz-OS:~# ./test_rma.d
............
rma
1531
kcpc_int
1531
............
rma
1645
kcpc_int
1645
............
rma
1537
kcpc_int
1537

I find that the "rma" count in the test_clk_rma.d output is much smaller than
the "rma" count from test_rma.d. I guess some overflow interrupts are lost when
the dtrace script test_clk_rma.d runs. But if that's true, why is most of the
time spent in user space rather than in system space (according to the output
of vmstat and mpstat)?

The script sources are below. I also changed the setting to
"dcpc-min-overflow=100;" in "/kernel/drv/dcpc.conf" before the tests.

***@shz-OS:~# cat test_clk_rma.d
#!/usr/sbin/dtrace -s

#pragma D option quiet

cpc:::cpu_clk_unhalted.ref-all-1000000
{
    @clk = count();
}

cpc:::mem_uncore_retired.remote_dram-all-100
{
    @rma = count();
}

kcpc_hw_overflow_intr:entry
{
    @kcpc_int = count();
}

tick-5s
{
    printf("......................\n");
    printf("clk");
    printa(@clk);
    trunc(@clk);
    printf("rma");
    printa(@rma);
    trunc(@rma);
    printf("kcpc_int");
    printa(@kcpc_int);
    trunc(@kcpc_int);
}

***@shz-OS:~# cat test_rma.d
#!/usr/sbin/dtrace -s

#pragma D option quiet

cpc:::mem_uncore_retired.remote_dram-all-100
{
    @rma = count();
}

kcpc_hw_overflow_intr:entry
{
    @kcpc_int = count();
}

tick-5s
{
    printf("............\n");
    printf("rma");
    printa(@rma);
    trunc(@rma);
    printf("kcpc_int");
    printa(@kcpc_int);
    trunc(@kcpc_int);
}

Can anybody give me some suggestions?

Thanks
Jin Yao
--
This message posted from opensolaris.org
Jin Yao
2010-06-18 06:57:16 UTC
Hi kuriakose,

There was no cpc sampling job running while my dtrace scripts (test_clk_rma.d
or test_rma.d) were running. I tried your test script; it didn't hook the
function core_pcbe_sample.
E.g., some pieces from the trace log:

2 4076 core_pcbe_overflow_bitmap:rdmsr 38E 17179869184(0x400000000)
2 4074 core_pcbe_overflow_bitmap:wrmsr 390 17179869184(0x400000000)
2 4072 core_pcbe_allstop:wrmsr 38F 0(0x0)
2 4072 core_pcbe_allstop:wrmsr 38F 0(0x0)
2 4073 core_pcbe_program:wrmsr 390 13835058085346934799(0xC00000070000000F)
2 4073 core_pcbe_program:wrmsr C1 281474976710555(0xFFFFFFFFFF9B)
2 4073 core_pcbe_program:wrmsr 186 5443599(0x53100F)
2 4073 core_pcbe_program:wrmsr 30B 281474975710655(0xFFFFFFF0BDBF)
2 4073 core_pcbe_program:wrmsr 38D 2816(0xB00)
2 4073 core_pcbe_program:wrmsr 38F 17179869185(0x400000001)

Another question: I checked the code of kcpc_hw_overflow_intr.
A bitmap indicates which counters have overflowed, and the function
will not reset a counter when its bit in the bitmap is not set.

bitmap = pcbe_ops->pcbe_overflow_bitmap();
if (dtrace_cpc_in_use) {
    /* Reset any counters that have overflowed */
    for (i = 0; i < ctx->kc_set->ks_nreqs; i++) {
        req = ctx->kc_set->ks_req[i];
        if (bitmap & (1 << req.kr_picnum)) {
            pcbe_ops->pcbe_configure(req.kr_picnum,
                req.kr_event, req.kr_preset,
                req.kr_flags, req.kr_nattrs,
                req.kr_attr, &(req.kr_config),
                (void *)ctx);
        }
    }

    pcbe_ops->pcbe_program(ctx);
    return (DDI_INTR_CLAIMED);
}

So it looks like the rma counter will not be reset to initial value when the clk counter overflows.
Correct me if I'm wrong.

Thanks
Jin Yao
Jonathan Haslam
2010-06-18 16:38:13 UTC
Post by Jin Yao
So it looks like the rma counter will not be reset to initial value when the clk counter overflows.
Correct me if I'm wrong.
I think that's what Kuriakose stated in his reply to your perf-discuss
posting. Yes, this looks to be a bug and I think Kuriakose is going
to file it. Obviously the workaround in the short term is to only use
a single event.

Jon.
Jin Yao
2010-06-19 13:17:40 UTC
Yes, core_pcbe_program processes all cfgs in the list one by one. In my sample
there are 2 cfgs, "clk" and "rma", in the cfg list, and core_pcbe_program
programs them all. So in kcpc_hw_overflow_intr, although the overflow bitmap
check skips the counters that have not overflowed, core_pcbe_program still
resets them. That's the problem.

Thanks
Jin Yao
Post by Jin Yao
2 4073 core_pcbe_program:wrmsr C1 281474976710555(0xFFFFFFFFFF9B)
That line shows the rma counter being reset to MAX-100.
pcbe_configure() will update the pcbe_config with the value that has to
be written to the counter. That is correctly done here for the clk
counter that overflowed.
The pcbe_config for the rma counter still has the initial value of
MAX-100, so on the subsequent pcbe_program() call when all active
counters are programmed, MAX-100 will be written to the rma counter.