Discussion:
dtrace io on zfs
John Dzilvelis
2008-08-22 02:33:03 UTC
Permalink
Hello,
I have recently started using iosnoop from the DTrace toolkit to monitor file-level I/O. It works great (!) on UFS file systems, but I am unable to see any file names when monitoring processes running on a ZFS file system. As documented, "<none>" is displayed for the file name when it cannot be determined from the file system.
Ideally I would like to gather file-level (named) I/O read/write bytes as well as read/write counts, like the output of iosnoop.
During my search of the list archives it appears that this issue came up in March 2008, but I haven't been able to determine whether there was a resolution.
If there has not been a solution to this, it would be great if someone could point me in the right direction so that I can experiment with some other probes. I've noticed that the fbt provider has many ZFS-related probes, but I am unable to find docs that describe them individually in detail.

Thanks,
johndz
Ben Rockwood
2008-08-22 11:31:34 UTC
Tracing I/O through ZFS is very complex because of the various layers
through which I/O passes. If you want to watch arbitrary file I/O
requests you'll want to stick with syscall probes and the like.
DTracing I/O at a high level is easy, same at a low level; it's the
in-between with ZFS that's tough.
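A minimal sketch of that syscall-layer approach for the original question (per-file read/write bytes and counts on ZFS) might look like the following. Untested here, and note it sums *requested* rather than completed bytes; the 10-second tick and field widths are arbitrary:

```d
#!/usr/sbin/dtrace -s
#pragma D option quiet

/* At the syscall layer the fds[] array still knows the pathname,
   even on ZFS; fi_fs lets us restrict to ZFS-backed files. */
syscall::read:entry,
syscall::write:entry
/fds[arg0].fi_fs == "zfs"/
{
        @bytes[probefunc, fds[arg0].fi_pathname] = sum(arg2);
        @counts[probefunc, fds[arg0].fi_pathname] = count();
}

tick-10sec
{
        printa("%-8s %-50s %@d bytes\n", @bytes);
        printa("%-8s %-50s %@d calls\n", @counts);
        exit(0);
}
```

This only catches read(2)/write(2); mmap'd I/O and other entry points would need additional probes.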

The fbt provider exposes every function's entry and return, so for a
given fbt:zfs:whatever probe just look for that function in the ZFS
source (cvs.opensolaris.org).
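There is no per-probe documentation for fbt; the probe names map one-to-one onto kernel functions, so listing them and then reading the corresponding source is the documentation. For example:

```
# dtrace -l -m zfs
```

prints every fbt probe in the zfs module, and each function name can then be looked up in the ZFS source.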

How you trace will depend on what you want to know, but I find that a
good starting point is to aggregate call stacks from low-level zio
functions, thus allowing me to see how we got there, e.g.:

dtrace -n 'fbt:zfs:zio_read:entry{ @[stack(20)] = count(); }'

Based on the call stack I can use other fbt probes to explore. When all
else fails I have gone so far as to trace every function in the kernel
(this works for about 1-2 seconds and produces an output file around
500M on my personal workstation at home)... not something I'd do in
production, but an amazing learning experience. ;)

benr.
Jonathan Edwards
2008-08-22 14:56:37 UTC
Yep - you won't see filenames from iosnoop on ZFS since it doesn't
call bdev_strategy() directly, which is what io:::start works with.
This came up last year with Mauro on the zfs-discuss alias. If
you're looking for file pathnames, take a look at the following example
I whipped up after that discussion for use on a project to track
timings to ZFS-based files (which can be particularly enlightening).

#!/usr/sbin/dtrace -s

#pragma D option quiet

zfs_write:entry,
zfs_read:entry,
zfs_putpage:entry,
zfs_getpage:entry
{
        self->ts = timestamp;
        self->filepath = args[0]->v_path;
}

zfs_write:return,
zfs_read:return,
zfs_putpage:return,
zfs_getpage:return
/self->ts && self->filepath/
{
        printf("%s on %s took %d nsecs\n", probefunc,
            stringof(self->filepath), timestamp - self->ts);
        self->ts = 0;
        self->filepath = 0;
}

---
For I/O sizes and counts, take a look at the arguments those functions
receive and extract the ones you're after.
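For instance, args[1] to zfs_read/zfs_write is a uio_t, so uio_resid at entry gives the requested transfer size. A sketch along those lines (untested; the v_path predicate guards against vnodes with no cached pathname):

```d
#!/usr/sbin/dtrace -s
#pragma D option quiet

/* Per-file requested bytes and operation counts through the
   ZFS vnode ops. */
zfs_read:entry,
zfs_write:entry
/args[0]->v_path != 0/
{
        @bytes[probefunc, stringof(args[0]->v_path)] =
            sum(args[1]->uio_resid);
        @counts[probefunc, stringof(args[0]->v_path)] = count();
}

END
{
        printa(@bytes);
        printa(@counts);
}
```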

---
.je
Jon Haslam
2008-08-22 15:17:18 UTC
For those that are interested there is an open bug on this:

6266202: DTrace io provider doesn't work with ZFS

Jon.
_______________________________________________
dtrace-discuss mailing list
Sergio Troiano
2010-08-19 15:29:19 UTC
Hi,
I'm working on a zfs_log monitor. My idea is to create a backup file server: the source server logs each ZFS operation, and another server then takes the log and replicates the information, copying each difference (via NFS).

My problem is that some transactions don't appear in the log. I've been reading a lot about this, thinking it might be "drops", so I changed the switchrate and the buffer size, but that didn't help.
Here is my D script:
Thanks!

#!/usr/sbin/dtrace -s
#pragma D option bufsize=10m
#pragma D option switchrate=20hz
#pragma D option quiet

zfs_mkdir:entry
{
        self->ts = timestamp;
        self->filepath = args[0]->v_path;
        self->dirname = args[1];
        self->perms = args[2]->va_mode;
}

zfs_mkdir:return
/self->filepath && self->ts && args[1] == 0/
{
        printf("%s\t%s\t%d\n", probefunc, stringof(self->filepath),
            timestamp - self->ts);
        self->filepath = 0;
        self->dirname = 0;
        self->ts = 0;
        self->perms = 0;
}

zfs_write:entry,
zfs_read:entry
{
        self->ts = timestamp;
        self->filepath = args[0]->v_path;
}

zfs_write:return,
zfs_read:return
/self->filepath && self->ts && args[1] == 0/
{
        printf("%s\t%s\t%d\n", probefunc, stringof(self->filepath),
            timestamp);
        self->filepath = 0;
        self->ts = 0;
}

zfs_create:entry
{
        self->ts = timestamp;
        self->filepath = args[0]->v_path;
        self->filename = args[1];
        self->perms = args[2]->va_mode;
}

zfs_create:return
/self->ts && self->filepath && self->filename && self->perms && args[1] == 0/
{
        printf("%s\t%s/%s\t%o\t%d\n", probefunc, stringof(self->filepath),
            stringof(self->filename), self->perms, timestamp);
        self->filepath = 0;
        self->filename = 0;
        self->ts = 0;
        self->perms = 0;
}

zfs_rmdir:entry
{
        self->ts = timestamp;
        self->filepath = args[0]->v_path;
        self->dirname = args[1];
}

zfs_rmdir:return
/self->filepath && self->dirname && self->ts && args[1] == 0/
{
        printf("%s\t%s/%s\t%d\n", probefunc, stringof(self->filepath),
            stringof(self->dirname), timestamp);
        self->filepath = 0;
        self->dirname = 0;
        self->ts = 0;
}

zfs_remove:entry
{
        self->ts = timestamp;
        self->filepath = args[0]->v_path;
        self->filename = args[1];
}

zfs_remove:return
/self->filepath && self->ts && self->filename && args[1] == 0/
{
        printf("%s\t%s/%s\t%d\n", probefunc, stringof(self->filepath),
            stringof(self->filename), timestamp);
        self->filepath = 0;
        self->filename = 0;
        self->ts = 0;
}

zfs_rename:entry
{
        self->ts = timestamp;
        self->filepathsource = args[0]->v_path;
        self->filenamesource = args[1];
        self->filepathdest = args[2]->v_path;
        self->filenamedest = args[3];
}

zfs_rename:return
/self->ts && self->filepathsource && self->filenamesource &&
    self->filepathdest && self->filenamedest && args[1] == 0/
{
        printf("%s\t%s/%s\t%s/%s\t%d\n", probefunc,
            stringof(self->filepathsource), stringof(self->filenamesource),
            stringof(self->filepathdest), stringof(self->filenamedest),
            timestamp);
        self->filepathsource = 0;
        self->filenamesource = 0;
        self->filepathdest = 0;
        self->filenamedest = 0;
        self->ts = 0;
}

zfs_symlink:entry
{
        self->ts = timestamp;
        self->filepathdest = args[0]->v_path;
        self->filenamedest = args[1];
        self->linkname = args[3];
        self->perms = args[2]->va_mode;
}

zfs_symlink:return
/self->filepathdest && self->ts && self->filenamedest && self->linkname &&
    args[1] == 0/
{
        printf("%s\t%s\t%s/%s\t%o\t%d\n", probefunc, stringof(self->linkname),
            stringof(self->filepathdest), stringof(self->filenamedest),
            self->perms, timestamp);
        self->filepathdest = 0;
        self->filenamedest = 0;
        self->linkname = 0;
        self->ts = 0;
        self->perms = 0;
}

zfs_link:entry
{
        self->ts = timestamp;
        self->filepathdest = args[0]->v_path;
        self->filenamedest = args[2];
        self->filenamesource = args[1]->v_path;
}

zfs_link:return
/self->ts && self->filepathdest && self->filenamedest &&
    self->filenamesource && args[1] == 0/
{
        printf("%s\t%s\t%s/%s\t%d\n", probefunc,
            stringof(self->filenamesource), stringof(self->filepathdest),
            stringof(self->filenamedest), timestamp);
        self->filepathdest = 0;
        self->filenamesource = 0;
        self->filenamedest = 0;
        self->ts = 0;
}

zfs_setattr:entry
{
        self->ts = timestamp;
        self->fileid = args[0]->v_path;
}

zfs_setattr:return
/self->ts && self->fileid && args[1] == 0/
{
        printf("%s\t%s\t%d\n", probefunc, stringof(self->fileid), timestamp);
        self->fileid = 0;
        self->ts = 0;
}

zfs_readdir:entry
{
        self->ts = timestamp;
        self->fileid = args[0]->v_path;
}

zfs_readdir:return
/self->ts && self->fileid && args[1] == 0/
{
        printf("%s\t%s\t%d\n", probefunc, stringof(self->fileid), timestamp);
        self->fileid = 0;
        self->ts = 0;
}
--
This message posted from opensolaris.org
Sergio Troiano
2010-08-20 19:31:06 UTC
I want to add more information:

I can see DTrace isn't logging some zfs_mkdir calls. I deleted all the predicates, but that didn't help either.
I'm using OpenSolaris; is there any difference between OpenSolaris and Solaris as far as DTrace is concerned?

I've read zfs_vnops.c looking for another function that creates directories, but I haven't found one.
Maybe there is another way to create directories (another entry point?) that my script isn't catching.

Thanks!