Solaris Internals Resource Threshold being hit

Discussion:

Robin Cotgrove

2010-10-29 16:00:06 UTC

I need some assistance and guidance in writing a DTrace script or, even better, finding an example one which would help me identify what's going on on our system. Intermittently, and we think it might be happening after about 60 days, on an E2900 (192GB, 24 cores, Solaris 10 11/06) with a fairly new patch cluster (Generic_142900-13), we suddenly hit a problem whereby processes fail to start with the error message 'resource temporarily unavailable'. This is leading to Oracle crash/startup issues.

I ran a simple du command at the time it was happening and got the following response.

‘du: No more processes: Resource temporarily unavailable’

Approximately 6500 TCP connections on server at time. 6000 unix processes. The max UNIX processes per user is set to 29995. 60GB free physical memory and no swap being used. Absolutely baffling us at mo.

Not managed to truss a failing command when it happened yet because it's so intermittent in its nature.

We've checked all the usual suspects including max processes per users and cannot find the cause. Need a way to monitor all the internal kernel resources to see what we're hitting. Suggestions please on a postcard. All welcome.

Robin Cotgrove

--
This message posted from opensolaris.org

Mike Gerdts

2010-10-29 16:58:26 UTC

Does anything get logged to /var/adm/messages?

Post by Robin Cotgrove
Approximately 6500 TCP connections on server at time. 6000 unix processes. The max UNIX processes per user is set to 29995. 60GB free physical memory and no swap being used. Absolutely baffling us at mo.

Swap may not be used, but it is certainly reserved. Note that Solaris
has multiple definitions of swap. That disk space you allocated and
called "swap" is one thing. The overall RAM and swap device backed
address space is another.

Unlike Linux (default config), Solaris does not allow memory to be
overcommitted. If something does malloc(1024 * 1024 * 1024 * 1024),
the call will fail on Solaris unless you have 1 TB of free "swap"
(memory + swap devices). On Linux, the malloc would likely succeed.
Once you actually start writing to more pages of memory than your
system has in RAM + swap devices, the Linux Out of Memory Killer will
kick in and start selecting things to kill to free up memory.
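A minimal C sketch of the reservation test described above. Written literally in C, `1024 * 1024 * 1024 * 1024` is evaluated in `int` arithmetic and overflows, so the size is built in `size_t` instead. The helper names `try_reserve` and `one_terabyte` are hypothetical, and a 64-bit build is assumed.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* 1 TB computed in size_t arithmetic (assumes size_t is at least 64 bits). */
size_t one_terabyte(void)
{
    return (size_t)1 << 40;
}

/* Hypothetical helper: returns 1 if malloc() can hand out `bytes`, else 0.
 * Only the first byte is written, so at most one page is actually touched;
 * the write also stops the compiler from optimising the allocation away. */
int try_reserve(size_t bytes)
{
    volatile char *p;

    assert(bytes > 0);  /* malloc(0) may legitimately return NULL */
    p = malloc(bytes);
    if (p == NULL) {
        perror("malloc");
        return 0;
    }
    p[0] = 0;
    free((void *)p);
    return 1;
}
```

Calling `try_reserve(one_terabyte())` would be expected to fail on a Solaris system without 1 TB of reservable swap, and would likely succeed on a Linux system with default overcommit settings.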

We can see this with two runs of /opt/DTT/Mem/swapinfo.d on my
OpenSolaris system. You can get this for Solaris 10 as part of the
DTraceToolkit.

# /opt/DTT/Mem/swapinfo.d
...
Swap _______Total 2496 MB
Swap Resv 619 MB
Swap Avail 1877 MB
Swap (Minfree) 222 MB

# /opt/DTT/Mem/swapinfo.d
...
Swap _______Total 2224 MB
Swap Resv 2047 MB
Swap Avail 176 MB
Swap (Minfree) 222 MB

One thing I just noticed - minfree does not become 176 MB as I would
have expected. Be careful with that value!

Why was there such a big difference in Avail? Because I ran this program:

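/* Save as foo.c then compile with gcc -o foo foo.c */
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>

int main(int argc, char **argv) {
    /* Ask for ~1700 MB and never touch it; the swap is reserved anyway. */
    if ( malloc(1024 * 1024 * 1700) == NULL ) {
        perror("malloc");
        exit(1);
    }
    /* Hold the reservation long enough to run swapinfo.d again. */
    sleep(5);
    exit(0);
}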

A likely scenario that would cause a database server to temporarily
reserve a lot more swap is when a new Oracle process is created. When
a process forks, memory is reserved for all of the pages of memory
that are anonymous (i.e. not an mmapped file or device), read-write,
and not shared. This is required to support the copy-on-write
mechanism used by the virtual memory system. You can use pmap to take
a look at the memory mappings of a process to get an idea of how much
space this takes.
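The rule above can be sketched in C as a sum over a process's mappings. This is only an illustration of the rule as stated in this message: `struct mapping` and `fork_reservation_bytes` are hypothetical names, and the fields would have to be filled in by hand, for example from `pmap` output.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one mapping, e.g. as read off pmap output. */
struct mapping {
    size_t bytes;      /* size of the mapping */
    int    anonymous;  /* 1 if not backed by a file or device */
    int    writable;   /* 1 if mapped read-write */
    int    shared;     /* 1 if shared between processes */
};

/* Sum the mappings that a fork would need to reserve swap for under the
 * rule stated above: anonymous, read-write, and not shared. */
size_t fork_reservation_bytes(const struct mapping *maps, size_t n)
{
    size_t total = 0;
    size_t i;

    assert(maps != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (maps[i].anonymous && maps[i].writable && !maps[i].shared)
            total += maps[i].bytes;
    }
    return total;
}
```

For a large Oracle process, the shared SGA segment would be skipped by this rule, while private heap and stack would count towards the temporary reservation.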

To look at the amount of available swap that matters, refer to the
swap column of vmstat. For things like this that are transient, you
may have trouble seeing it, even with "vmstat 1". Note that while you
are looking at vmstat output, you should always ignore the first line
of output - it is a pretty much useless average since boot. If you
need to get values at a higher resolution, you may want to adapt
swapinfo.d from the DTraceToolkit to use the profile provider to
quantize the available swap value.

Post by Robin Cotgrove
Not managed to truss a failing command when it happened yet because it's so intermittent in its nature.
We've checked all the usual suspects including max processes per users and cannot find the cause. Need a way to monitor all the internal kernel resources to see what we're hitting. Suggestions please on a postcard. All welcome.

It seems quite likely to me that you will find that the swap that is
available to reserve temporarily dips to a minuscule value. If this
is the case, adding more swap will help.

--
Mike Gerdts
http://mgerdts.blogspot.com/

Jim Mauro

2010-10-29 18:27:59 UTC

Mike is correct. Pretty much every time I've seen this, it's
VM (VM = virtual memory = swap) related.

There's a DTrace script below you can run when you hit this
problem that will show us which system call is failing with an
EAGAIN error. It is most likely fork(2) (and yes, I know printing
the errno in the return action is superfluous given we use it
in the predicate - it's me being OCD and sanity checking).

A second DTrace script further down should provide a kernel
stack trace if it is a fork(2) failure.

Or....(disk is cheap) "swap -a" (add swap space) and see if the
problem goes away.

Thanks
/jim

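#!/usr/sbin/dtrace -s

#pragma D option quiet

/* Mark every system call this thread enters. */
syscall:::entry
{
        self->flag[probefunc] = 1;
}

/* Report any system call that returns with errno 11 (EAGAIN). */
syscall:::return
/self->flag[probefunc] && errno == 11/
{
        printf("syscall: %s, arg0: %d, arg1: %d, errno: %d\n\n",
            probefunc, arg0, arg1, errno);
}

/* Clear the marker on every return so thread-local state is freed. */
syscall:::return
/self->flag[probefunc]/
{
        self->flag[probefunc] = 0;
}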

------------------------------------------------------------------------------------------------------------------------

#!/usr/sbin/dtrace -s

#pragma D option quiet

/* Record the kernel and user stacks of every fork attempt. */
syscall::forksys:entry
{
        self->flag = 1;
        @ks[stack(),ustack()] = count();
}

/* On the first failed fork, print the collected stacks and exit. */
syscall::forksys:return
/self->flag && arg0 == -1 && errno != 0/
{
        printf("fork failed, errno: %d\n",errno);
        printa(@ks);
        clear(@ks);
        exit(0);
}

_______________________________________________
dtrace-discuss mailing list

Robin Cotgrove

2010-10-29 19:50:51 UTC

Sorry guys. Swap is not the issue. We've had this confirmed by Oracle and I can clearly see there is 96GB of swap available on the system and ~50GB of main memory.

Not everything relating to forking problems is swap. We have had similar forking issues in the past: one we solved by adding a swap file, and in another case shared memory was being restricted by a Solaris project setting. File descriptor limits being hit is another good one. Max processes per user is another common one. There are lots of common reasons. This one is weird and we don't know what it is.

Like the dtrace scripts though. Very useful to make things a lot clearer for people to interpret values.

Mike Gerdts

2010-10-29 20:45:39 UTC

Post by Robin Cotgrove
Sorry guys. Swap is not the issue. We've had this confirmed by Oracle and I can clearly see there is 96GB of swap available on the system and ~50GB of main memory.

By who at Oracle? Not everyone is equally qualified. I would tend to
trust Jim Mauro (who co-wrote the books[1] on Solaris internals,
performance, & dtrace) over most of the people you will get to through
normal support channels.

1. http://www.amazon.com/Jim-Mauro/e/B001ILM8NC/

How do you know that available swap doesn't momentarily drop? I've
run into plenty of instances where a system has tens of gigabytes of
free memory but is woefully short on reservable swap (virtual memory,
as Jim approximates). Usually "vmstat 1" is helpful in observing
spikes, but as I said before this could miss very short spikes. If
you've already done this and seen that swap is unlikely to be an issue,
that would be useful to know. If you are measuring the amount
of reservable swap with "swap -l", you are doing it wrong.

I do agree that there can be other shortfalls that can cause this.
This may call for speculative tracing of stacks across the fork entry
and return calls, displaying results only when the fork fails with
EAGAIN. Jim's second script is similar to what I suggest, except that
it doesn't show the code path taken between syscall::forksys:entry and
syscall::forksys:return.

Also, I would be a little careful running the second script as is for
long periods of time if you have a lot of forksys activity with unique
stacks. I think that as it is @ks may grow rather large over time
because the successful forks are not cleared.

--
Mike Gerdts
http://mgerdts.blogspot.com/

Jim Mauro

2010-10-29 21:01:21 UTC

Thanks Mike. Good point on the script.

Indeed, use of speculative tracing would be a better
fit here. I'll see if I can get something together and
send it out.

Thanks,
/jim

James Litchfield

2010-10-29 21:29:53 UTC

This is what Oracle says about swap for 11gR2. The comment about
subtracting ISM is not correct. A simple test shows that ISM does
consume swap (even if it's not DISM). Think about what happens when a
memory segment is created (before it goes to ISM), if someone happens
to attach in non-ISM mode, and when everyone detaches from the segment
(and it ceases to be ISM). In the first and last stages swap space is
*required*, and the VM system reserves the space needed when the
segment is first created.

I would be cautious about Oracle assurances...

Jim
---

Go to the following for the full list of available Oracle books:
http://www.oracle.com/pls/db112/homepage
which links to the 11gR2 install guide
Db install guides
http://www.oracle.com/pls/db112/portal.portal_db?selected=11&frame=
which links to the following section on memory
http://download.oracle.com/docs/cd/E11882_01/install.112/e17163/pre_install.htm#sthref62
------
2.2.1 Memory Requirements
The following are the memory requirements for installing Oracle
Database 11g Release 2.
* At least 4 GB of RAM
  To determine the RAM size, enter the following command:
  # /usr/sbin/prtconf | grep "Memory size"
  If the size of the RAM is less than the required size, then you must
  install more memory before continuing.
* The following table describes the relationship between installed
  RAM and the configured swap space recommendation:
  Note: On Solaris, if you use non-swappable memory, like ISM, then you
  should deduct the memory allocated to this space from the available
  RAM before calculating swap space.

  RAM                      Swap Space
  Between 4 GB and 16 GB   Equal to the size of RAM
  More than 16 GB          16 GB

--
James Litchfield | Senior Consultant
Oracle ACS
California
Robin Cotgrove

2010-10-29 22:37:14 UTC

Post by James Litchfield
This is what Oracle says about swap for 11gR2. The comment about
subtracting ISM is not correct. A simple test shows that ISM does
consume swap (even if it's not DISM). Think about what happens when a
memory segment is created (before it goes to ISM), if someone happens
to attach in non-ISM mode, and when everyone detaches from the segment
(and it ceases to be ISM). In the first and last stages swap space is
*required*, and the VM system reserves the space needed when the
segment is first created.

I agree with you. In our case disabling the use of DISM really helped to make the platform more stable and helped with overall memory usage.

By the way, we are using Oracle 10.2.0.4. No use of Oracle 11gR2 yet.

We have 192GB of physical memory and 96GB of swap device. The SGA/PGA sizes of all the Oracle DBs fit well within the 192GB, leaving a consistent ~50GB spare. Memory consumption stays stable on the platform and doesn't go up and down. This is the nature of the Oracle DBs allocating memory at start-up.

Post by James Litchfield
I would be cautious about Oracle assurances...

Yep

Phil Harman

2010-10-29 22:55:07 UTC

Oracle often seems to recommend 1:1 (which is often not enough, especially with DISM). You don't even have 1:1.

Solaris also uses free memory as part of its swap space allocation. Locked memory, such as ISM/DISM eats free memory, and so reduces your available swap further.
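The effect of locked memory on reservable swap can be sketched with a little arithmetic in C. This is a simplified model, not the kernel's exact accounting: it assumes reservable swap is roughly the unreserved swap-device space plus whatever available memory exceeds a minimum-free floor. The function and parameter names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified model (an assumption, not the kernel's exact accounting):
 * reservable swap = unreserved disk swap + memory above a min-free floor. */
int64_t reservable_swap_mb(int64_t disk_swap_mb, int64_t disk_resv_mb,
                           int64_t avail_mem_mb, int64_t minfree_mb)
{
    int64_t from_disk;
    int64_t from_mem;

    assert(disk_resv_mb <= disk_swap_mb);
    from_disk = disk_swap_mb - disk_resv_mb;
    from_mem = avail_mem_mb - minfree_mb;
    if (from_mem < 0)
        from_mem = 0;   /* memory below the floor contributes nothing */
    return from_disk + from_mem;
}
```

In this model, locking a further 300 MB as ISM/DISM lowers `avail_mem_mb` by 300 and therefore lowers the result by the same amount, as long as available memory stays above the floor.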

You should confirm that DISM is off by running "pmap -x" against a process from each of your DBs (the shared memory should appear as "ism").

Commands like "swap -s" and good ol' "vmstat 5" are useful for monitoring swap. You should also run "echo ::memstat | mdb -k" from time to time to get a feel for how your RAM is being used (on large machines, I've seen it take up to an hour to complete, and it will hog a CPU for the duration, but it seems to have little other impact on the system).

James Litchfield

2010-10-30 06:09:58 UTC

A recent S10 kernel patch *drastically* reduced the time consumed
by ::memstat. On large systems, it will often take just a minute
or two. I just tried it on a lightly loaded 512GB M9K and it was
less than 3 minutes.

Jim
----

Post by Phil Harman
Oracle often seems to recommend 1:1 (which is often not enough, especially with DISM). You don't even have 1:1.
Solaris also uses free memory as part of its swap space allocation. Locked memory, such as ISM/DISM eats free memory, and so reduces your available swap further.
You should confirm that DISM is off by running "pmap -x" against a process from each of your DBs (the shared memory should appear as "ism")
Commands like "swap -s" and good ol' "vmstat 5" are useful for monitoring swap. You should also run "echo :: memstat | mdb -k" from time to time to get a feel for hiw your RAM is being used" (on large machines, I've seen it take up to an hour to complete, and it will hig a CPU for the duration, but it seems to have little other impact on the system).

Post by Robin Cotgrove

I agree with you. In our case disabling the use of DISM really helped to make the platform more stable and helped with overall memory usage.
By the way, we using Oracle 10.2.0.4. No use of Oracle 11gR2 yet.
We have 192GB of physical memory and 96GB of swap device. The SGA/PGA sizes of all the Oracle DB's fit well within the 192GB leaving a consistent ~50GB spare. Memory consumption stays stable on the platform and doesn't go up and down. This is the nature of the Oracle DB's allocating memory at start-up.

Post by James Litchfield
I would be cautious about Oracle assurances...

Yep

Post by James Litchfield
Jim
---

Go to the following for the full list of available Oracle books.

http://www.oracle.com/pls/db112/homepage
which links to the 11gr2 install guide
Db install guides

http://www.oracle.com/pls/db112/portal.portal_db?selected=11&frame=

which links to the following section on memory

http://download.oracle.com/docs/cd/E11882_01/install.112/e17163/pre_install.htm#sthref62

------
2.2.1 Memory Requirements
The following are the memory requirements for installing Oracle Database 11g Release 2.
*
At least 4 GB of RAM
To determine the RAM size, enter the following command:
# /usr/sbin/prtconf | grep "Memory size"
If the size of the RAM is less than the required size, then you must install more memory before continuing.
*
The following table describes the relationship between installed RAM and the configured swap space recommendation:
Note: On Solaris, if you use non-swappable memory, like ISM, then you should deduct the memory allocated to this space from the available RAM before calculating swap space.
RAM                       Swap Space
Between 4 GB and 16 GB    Equal to the size of RAM
More than 16 GB           16 GB

Thanks Mike. Good point on the script.
Indeed, use of speculative tracing would be a better fit here. I'll see if I can get something together and send it out.
Thanks,
/jim

On Fri, Oct 29, 2010 at 2:50 PM, Robin Cotgrove

Post by Robin Cotgrove
Sorry guys. Swap is not the issue. We've had this confirmed by Oracle and I can clearly see there is 96GB of swap available on the system and ~50GB of main memory.

By who at Oracle? Not everyone is equally qualified. I would tend to trust Jim Mauro (who co-wrote the books[1] on Solaris internals, performance, & dtrace) over most of the people you will get to through normal support channels.
1. http://www.amazon.com/Jim-Mauro/e/B001ILM8NC/
How do you know that available swap doesn't momentarily drop? I've run into plenty of instances where a system has tens of gigabytes of free memory but is woefully short on reservable swap (virtual memory, as Jim approximates). Usually "vmstat 1" is helpful in observing spikes, but as I said before this could miss very short spikes. If you've already done this to see that swap is unlikely to be an issue, that would be useful to know. If you are measuring the amount of reservable swap with "swap -l", you are doing it wrong.

I do agree that there can be other shortfalls that can cause this. This may call for speculative tracing of stacks across the fork entry and return calls, displaying results only when the fork fails with EAGAIN. Jim's second script is similar to what I suggest, except that it doesn't show the code path taken between syscall::forksys:entry and syscall::forksys:return.
Also, I would be a little careful running the second script as is for long periods of time if you have a lot of forksys activity with unique
large over time because the successful forks are not cleared.
--
Mike Gerdts
http://mgerdts.blogspot.com/
--
James Litchfield | Senior Consultant
Phone: +1 4082237059 | Mobile: +1 4082180790
Oracle ACS
California

Phil Harman

2010-10-29 22:39:25 UTC

Permalink

+1

I have seen many instances of this. It is trivial to add swap, but I'm simply tired of the number of times DBAs have protested "we have enough" or even tried FUD like "Oracle won't support us if we add more" (yes, I had that one within the last year). Just do it!

As has already been pointed out, Solaris has a swap reservation model. It's a bit like car insurance: you have to have it to drive on the road, but you hope you'll never need it. Solaris won't let you drive underinsured.
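
To see the reservation side of that model in numbers, the one-line summary printed by "swap -s" can be split into its parts. What follows is only a minimal sketch: it assumes the usual Solaris 10 "swap -s" layout shown in the comment, and the function name swap_summary is made up for illustration.

```shell
#!/bin/sh
# Break one line of `swap -s` output into its parts, in megabytes.
# Assumed input format (Solaris 10):
#   total: 286760k bytes allocated + 154448k reserved = 441208k used, 16338168k available
# swap_summary is a hypothetical helper name, not a Solaris command.
swap_summary() {
    awk '/^total:/ {
        alloc = $2; resv = $6; used = $9; avail = $11
        # Strip the trailing "k" unit from each figure.
        sub(/k$/, "", alloc); sub(/k$/, "", resv)
        sub(/k$/, "", used);  sub(/k$/, "", avail)
        printf "allocated=%dMB reserved=%dMB used=%dMB available=%dMB\n", alloc / 1024, resv / 1024, used / 1024, avail / 1024
    }'
}

# On a live system:
#   swap -s | swap_summary
```

The "reserved" figure is the part that never shows up as swap device activity, which is why a box can report no swap in use and still refuse a new reservation.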

This is what Oracle says about swap for 11gR2. The comment about subtracting ISM is not
correct. A simple test shows that ISM does consume swap (even if it's not DISM). Think
about what happens when a memory segment is created (before it goes to ISM), if someone
happens to attach in non-ISM mode and when everyone detaches from the segment and it
ceases to be ISM). In the first and last stage swap space is *required* and the VM system
reserves the space needed when the segment is first created.
I would be cautious about Oracle assurances...
Jim
---

go to the following for full list of available oracle book.
http://www.oracle.com/pls/db112/homepage
which links to the 11gr2 install guide
Db install guides
http://www.oracle.com/pls/db112/portal.portal_db?selected=11&frame=
which links to the following section on memory
http://download.oracle.com/docs/cd/E11882_01/install.112/e17163/pre_install.htm#sthref62
------
2.2.1 Memory Requirements
The following are the memory requirements for installing Oracle Database 11g Release 2.
*
At least 4 GB of RAM
To determine the RAM size, enter the following command:
# /usr/sbin/prtconf | grep "Memory size"
If the size of the RAM is less than the required size, then you must install more memory before continuing.
*
The following table describes the relationship between installed RAM and the configured swap space recommendation:
Note: On Solaris, if you use non-swappable memory, like ISM, then you should deduct the memory allocated to this space from the available RAM before calculating swap space.
RAM                       Swap Space
Between 4 GB and 16 GB    Equal to the size of RAM
More than 16 GB           16 GB
Thanks Mike. Good point on the script.
Indeed, use of speculative tracing would be a better
fit here. I'll see if I can get something together and
send it out.
Thanks,
/jim

Post by Mike Gerdts

Post by Robin Cotgrove
Sorry guys. Swap is not the issue. We've had this confirmed by Oracle and I can clearly see there is 96GB of swap available on the system and ~50GB of main memory.

By who at Oracle? Not everyone is equally qualified. I would tend to
trust Jim Mauro (who co-wrote the books[1] on Solaris internals,
performance, & dtrace) over most of the people you will get to through
normal support channels.
1. http://www.amazon.com/Jim-Mauro/e/B001ILM8NC/
How do you know that available swap doesn't momentarily drop? I've
run into plenty of instances where a system has tens of gigabytes of
free memory but is woefully short on reservable swap (virtual memory,
as Jim approximates). Usually "vmstat 1" is helpful in observing
spikes, but as I said before this could miss very short spikes. If
you've already done this to see that swap is unlikely to be an issue,
knowing that would be useful to know. If you are measuring the amount
of reservable swap with "swap -l", you are doing it wrong.
I do agree that there can be other shortfalls that can cause this.
This may call for speculative tracing of stacks across the fork entry
and return calls, displaying results only when the fork fails with
EAGAIN. Jim's second script is similar to what I suggest, except that
it doesn't show the code path taken between syscall::forksys:entry and
syscall::forksys:return.
Also, I would be a little careful running the second script as is for
long periods of time if you have a lot of forksys activity with unique
because the successful forks are not cleared.
--
Mike Gerdts
http://mgerdts.blogspot.com/
--
James Litchfield | Senior Consultant
Phone: +1 4082237059 | Mobile: +1 4082180790
Oracle ACS
California

Robin Cotgrove

2010-10-29 21:23:20 UTC

Permalink

On Fri, Oct 29, 2010 at 2:50 PM, Robin Cotgrove

Post by Robin Cotgrove
Sorry guys. Swap is not the issue. We've had this confirmed by Oracle and I can clearly see there is 96GB of swap available on the system and ~50GB of main memory.

Post by Mike Gerdts
By who at Oracle? Not everyone is equally qualified. I would tend to trust Jim Mauro (who co-wrote the books[1] on Solaris internals, performance, & dtrace) over most of the people you will get to through normal support channels.

Agreed. The normal support channel told us the GUDS script would be better to capture the root cause over producing a memory dump.

1. http://www.amazon.com/Jim-Mauro/e/B001ILM8NC/
How do you know that available swap doesn't momentarily drop?

Because I have been monitoring it with vmstat during the issues, and I also understand the workload on the platform well enough to know that nothing is suddenly starting with huge memory requirements. This is a VCS cluster with Oracle Database Resource Groups. DISM is not in use by the various Oracle DBs, as we ran into a bug with it some months ago. We've since patched the system, but we don't need DISM on this dev/test Oracle VCS cluster.

I've run into plenty of instances where a system has tens of gigabytes of free memory but is woefully short on reservable swap (virtual memory, as Jim approximates). Usually "vmstat 1" is helpful in observing spikes, but as I said before this could miss very short spikes. If you've already done this to see that swap is unlikely to be an issue, that would be useful to know. If you are measuring the amount of reservable swap with "swap -l", you are doing it wrong.

Agreed. I don't use it and I don't trust the output from the top utility either :-)
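
For anyone wanting to leave "vmstat 1" running and only look at the low-water mark afterwards, a small filter along these lines may help. It is a sketch under assumptions: it expects the Solaris vmstat layout, where "swap" (in KB) is the fourth column, and the name min_swap_kb is invented here. As noted in the quoted text, one-second samples can still miss very short spikes.

```shell
#!/bin/sh
# Report the lowest value seen in the "swap" column of Solaris `vmstat` output.
# Assumed layout: swap (KB) is the 4th column of each data line, e.g.
#   kthr      memory            page ...
#   r b w   swap  free  re  mf ...
# min_swap_kb is a hypothetical helper name.
min_swap_kb() {
    awk '
        $4 ~ /^[0-9]+$/ {                 # data lines only; header lines are skipped
            n++
            if (n == 1) next              # first sample is the since-boot summary
            if (min == "" || $4 + 0 < min + 0) min = $4
        }
        END { if (min != "") print min }
    '
}

# On a live system, sampling once a second for an hour:
#   vmstat 1 3600 | min_swap_kb
```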

I do agree that there can be other shortfalls that can cause this. This may call for speculative tracing of stacks across the fork entry and return calls, displaying results only when the fork fails with EAGAIN. Jim's second script is similar to what I suggest, except that it doesn't show the code path taken between syscall::forksys:entry and syscall::forksys:return.
Also, I would be a little careful running the second script as is for long periods of time if you have a lot of forksys activity with unique
large over time because the successful forks are not cleared.
--
Mike Gerdts
http://mgerdts.blogspot.com/

James Litchfield

2010-10-29 19:57:33 UTC

Permalink

I would start with adding swap. Oracle's swap recommendations are
utterly bogus.

Jim
===

Post by Jim Mauro
Mike is correct. Pretty much every time I've seen this, it's
VM (VM = virtual memory = swap) related.
There's a DTrace script below you can run when you hit this
problem that will show us which system call is failing with an
EAGAIN error. It is most likely fork(2) (and yes, I know printing
the errno in the return action is superfluous given we use it
in the predicate - it's me being OCD and sanity checking).
A second DTrace script further down should provide a kernel
stack trace if it is a fork(2) failure.
Or....(disk is cheap) "swap -a" (add swap space) and see if the
problem goes away.
Thanks
/jim
#!/usr/sbin/dtrace -s

#pragma D option quiet

/* Flag every system call on entry, per thread and per call name. */
syscall:::entry
{
        self->flag[probefunc] = 1;
}

/* Report any flagged call that returns with errno 11 (EAGAIN on Solaris). */
syscall:::return
/self->flag[probefunc] && errno == 11/
{
        printf("syscall: %s, arg0: %d, arg1: %d, errno: %d\n\n",
            probefunc, arg0, arg1, errno);
        self->flag[probefunc] = 0;
}
------------------------------------------------------------------------------------------------------------------------
#!/usr/sbin/dtrace -s

#pragma D option quiet

/* Count fork attempts, keyed by kernel and user stack. */
syscall::forksys:entry
{
        self->flag = 1;
        @ks[stack(), ustack()] = count();
}

/* Stop at the first failed fork; the aggregation is printed on exit. */
syscall::forksys:return
/self->flag && arg0 == -1 && errno != 0/
{
        printf("fork failed, errno: %d\n", errno);
        exit(0);
}

Post by Robin Cotgrove
I need some assistance and guidance in writing a DTrace script or, even better, finding an example one which would help me identify what's going on on our system. Intermittently, and we think it might be happening after about 60 days, on an E2900, 192GB, 24 core, Solaris 10 11/06 system with a fairly new patch cluster (Generic_142900-13), we suddenly hit a problem which results in processes failing to start with the error message 'resource temporarily unavailable'. This is leading to Oracle crash/startup issues.
I ran a simple du command at the time it was happening and got the following response.
du: No more processes: Resource temporarily unavailable
Approximately 6500 TCP connections on server at time. 6000 unix processes. The max UNIX processes per user is set to 29995. 60GB free physical memory and no swap being used. Absolutely baffling us at mo.
Not managed to truss a failing command when it happened yet because it's so intermittent in its nature.
We've checked all the usual suspects including max processes per users and cannot find the cause. Need a way to monitor all the internal kernel resources to see what we're hitting. Suggestions please on a postcard. All welcome.
Robin Cotgrove

Robin Cotgrove

2010-11-01 15:38:50 UTC

Permalink

Thanks Jim.

We'll run those 2 DTrace scripts if and when it happens again. The box has been rebooted and the issue (which was seen 2 days in a row) has not recurred. The workload has not changed. Virtual memory usage stays ~flat-lined all the time on the box because of the nature of the workload. The issue seems to recur every 60 days, so I think something is leaking, but I'm still convinced it's not system virtual memory.

Interesting what you say about the ISM/DISM stuff. The way we stopped DISM being used for Oracle 10g was setting the SGA MAX and TARGET to equal values in the init.ora etc., and that stopped an Oracle process per SID called ora_dism_xxxxxxxx from being started on startup. Things behaved much better once we did that. We believed we were experiencing a known Solaris bug around a DISM leak. We have since patched the system but not re-enabled the use of DISM.


Enda o'Connor - Sun Microsystems Ireland - Software Engineer

2010-11-01 16:21:09 UTC

Permalink

Post by Robin Cotgrove
Thanks Jim.
We'll run those 2 DTrace scripts if and when it happens again. The box has been rebooted and the issue (which was seen 2 days in a row) has not recurred. The workload has not changed. Virtual memory usage stays ~flat-lined all the time on the box because of the nature of the workload. The issue seems to recur every 60 days, so I think something is leaking, but I'm still convinced it's not system virtual memory.
Interesting what you say about the ISM/DISM stuff. The way we stopped DISM being used for Oracle 10g was setting the SGA MAX and TARGET to equal values in the init.ora etc., and that stopped an Oracle process per SID called ora_dism_xxxxxxxx from being started on startup. Things behaved much better once we did that. We believed we were experiencing a known Solaris bug around a DISM leak. We have since patched the system but not re-enabled the use of DISM.

Hi
Just out of interest could you check the permissions on oradism

ls -l $OH/bin/oradism

it should have root ownership and the setuid bit set.
-bash-3.00$ ls -l oradism
-rwsr-x--- 1 root oinstall 1320256 Sep 11 09:48 oradism
-bash-3.00$
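
The same check can be scripted so it is easy to repeat across several Oracle homes. This is just a rough sketch: it inspects one "ls -l" line and looks for owner root plus the setuid bit (the "s" in -rwsr-x---). The function name check_oradism_line is made up for illustration.

```shell
#!/bin/sh
# Check one `ls -l` line for oradism: expect owner root and the setuid bit,
# i.e. an "s" in the owner-execute position of the mode (-rwsr-x---).
# check_oradism_line is a hypothetical helper name.
check_oradism_line() {
    echo "$1" | awk '{
        mode = $1; owner = $3
        if (owner == "root" && substr(mode, 4, 1) == "s")
            print "ok"
        else
            print "broken: mode=" mode " owner=" owner
    }'
}

# On a live system ($OH is the Oracle home, as in the command above):
#   check_oradism_line "$(ls -l "$OH/bin/oradism")"
```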

Have seen the very odd case of people tarring an OH as the oracle user and
extracting as the oracle user and losing the root ownership/setuid bit,
which means oradism is effectively broken. I haven't read the thread in
its entirety, but certainly things like performance will suffer. Also, as
DISM is not pinned entirely in memory at all times, it must be
backed with swap.
Also, to use ISM, set SGA_MAX_SIZE to be either equal to or smaller than
the constituents of the entire SGA, not SGA_MAX.

But for DISM, swap is vital though, very very vital in the long run.

If you have the alert logs from when it was running with DISM, grep for
WARNING and see if any relate to DISM. Also, if trying DISM, check the
GRANSTATE in the x$ksmge view to see if all allocations are locked.

Also, is this x86/SPARC, what type of box, are the Oracle binaries
actually local, etc.?

Enda