Teach Yourself Oracle 8 In 21 Days
- Day 20 -
Effectively Tuning and Optimizing the Database
Performance can be one of the most fun or most frustrating aspects of administering a database. It is almost always extremely challenging, and forces you to use all your experience to isolate the bottleneck and determine a response. I find performance tuning and system optimization to be extremely rewarding and challenging; I enjoy it, and I hope you do too.
I hate to refer to performance bottlenecks as bugs or problems because those terms suggest something is broken. Sometimes bottlenecks occur because you have reached the capacity of your system, and no tuning or changes can alter that fact; other solutions must be found. The capacity planner must try to anticipate both when the system will reach capacity and what new resources are needed before this happens.
NOTE: Oracle instance and OS tuning should be performed in conjunction with application tuning. Because much of the instance tuning is based on the data-access patterns and memory usage of the application, changes to the application might result in the opportunity to retune the instance for greater optimization. A perfectly optimized Oracle instance cannot make up for a poorly tuned application. Unfortunately, this book does not cover application tuning in detail, but several good books on the subject are available. In fact, I have written just such a book: Oracle Performance Tuning and Optimization (Sams Publishing).
Performance tuning and capacity planning are closely related. They rely on and play off of each other. The capacity-planning stage assumes that you have a well-tuned system, and bases most of its recommendations on this. The system-tuning stage assumes you have a system that can support the number of users and the number of tasks being performed. The capacity-planning and system-performance engineers often work together to determine where the bottleneck is and how to resolve it.
What Is a Bottleneck?
New Term: The term bottleneck refers to the component or components that have a limiting factor on the system. Some bottlenecks result from inherent limitations of your system, but others can be fixed via parameter tuning. Many times, additional components are needed to fix bottlenecks. System tuning typically involves finding system bottlenecks and eliminating them. Examples of bottlenecks that are typically found in an Oracle system include
- Disk drive bottleneck--A disk drive can handle only so many I/Os per second before
latencies increase to unacceptable levels. The number of I/Os per second the I/O subsystem
can deliver is fixed, so if your CPU and memory system request I/Os more quickly than
the I/O subsystem can deliver them, a disk bottleneck will occur.
- Memory bottleneck--If you have inadequate memory, you won't have a good cache-hit
rate. A high cache-hit rate indicates that most of the data you need is in the Oracle
buffer cache. This reduces the number of needed I/Os.
- CPU bottleneck--In the case of a CPU bottleneck, you are running your system
as quickly as the CPUs can run.
- Network bottleneck--A network bottleneck occurs when there is so much traffic on
the network that you can't transfer data any faster. This is especially important
to backup and recovery operations performed over the network.
- Bus bottleneck--A bus bottleneck occurs when your hardware cannot transfer data
any faster across the CPU-to-memory or I/O bus. This is a hardware limitation.
- Other bottlenecks--Numerous other bottlenecks, either hardware, software, or a combination of the two, can occur.
Some bottlenecks can be fixed easily, and others are difficult to overcome. Later today you will see how to determine the location of a bottleneck and receive some hints on how to fix bottlenecks.
NOTE: Every system is different. There is no magic diagnostic tool that can tell you how to tune your system. It is an acquired skill that you will develop with practice.
Performance Tuning Methodology
Determining what, if anything, is a bottleneck in your system can be difficult. I have found that following a tuning methodology or plan can help you to reach the correct conclusion in the shortest amount of time. By tuning methodology, I mean a way of looking at performance issues and determining the root cause of those issues. Here is the performance tuning and optimization methodology I use:
- 1. Analyze the system--Determine whether a performance issue actually
exists. You might determine this by running transactions and checking response times
or by monitoring the system statistics. You should look at OS statistics as well as
Oracle statistics.
2. Determine the cause--Here you form a hypothesis about where the bottleneck is by looking at the analysis from step 1.
3. Develop a solution or a test--If you are certain you have determined where the bottleneck is, develop a solution for it. If not, develop a test to verify your hypothesis. When you develop the test, you should know what the results of the test will tell you.
4. Run the test or implement the solution--If you have hit upon a solution, implement it. If not, run your test under carefully controlled conditions. This test will be used to determine whether you are right or wrong.
5. Analyze the results--After the test has been run, you must analyze the result. Always view your results with skepticism. If something is too good to be believed, perhaps you should not believe it.
NOTE: You should know what you expect from the test you have developed. Many times a test result can provide good data if the result is positive, but no useful information if the result is negative. For example, adding another disk drive will tell you that you had a disk bottleneck if this addition increases performance, but does not tell you what the problem is if performance does not increase. These tests are still valuable, even though only one result gives you good data.
Although you don't need to use my system, it helps if you develop some system for troubleshooting.
Analyze the System
Analyzing the system gives you a good place to start. The first step in analyzing a system is to determine whether you have a problem. You should periodically monitor the following:
- Response times--Do users complain that the system seems slower? Periodically
ask your user community how the performance of the system seems to them; sometimes
end users are reluctant to tell you about potential problems until they reach a
critical state. It is a good idea to have a test query that you can run to compare
current response times with previous response times. If your test query takes significantly
longer to run than previously, you are seeing some kind of system slowdown.
- Throughput--Look at the system as a whole by examining the system performance
monitor (perfmon for NT, sar for UNIX). Compare current and past results. Is your
CPU utilization significantly higher than in previous samples? What about disk usage?
You will look at these in more detail later today.
- Your intuition--Sometimes the system just feels wrong. If you think something isn't quite right, check it out. You know your system better than anyone.
After you analyze the system, you might determine that there are no performance issues, that everything is running fine. Great! But if you do come across performance problems, you can start determining the cause and possible solutions.
Determine the Cause
The next stage is to form a hypothesis of what you think the problem is. This should be based on analysis of the system and what you know of the system's design and capacity. For example, if you have a system with five 4.3GB SCSI disk drives for datafiles, the system should be able to support up to 350 I/Os per second before you see large disk latencies. If you are driving those disk drives at 450 I/Os per second, you could hypothesize that you are having a disk I/O problem. This hypothesis should be specific enough to help lead you to a solution to the bottleneck.
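The capacity check in the example above is easy to express as a quick calculation. The following sketch is illustrative only: the per-drive limit of 70 I/Os per second and the observed rate of 450 are the example's assumed numbers, not measurements from a real system.

```python
# Hypothetical capacity check for the example above: five disk drives,
# each assumed to handle ~70 random I/Os per second before latency climbs.

def disk_io_headroom(observed_ios_per_sec, num_drives, ios_per_drive=70):
    """Return (capacity, utilization) for a set of identical drives."""
    capacity = num_drives * ios_per_drive          # e.g., 5 * 70 = 350
    utilization = observed_ios_per_sec / capacity  # > 1.0 means over capacity
    return capacity, utilization

capacity, utilization = disk_io_headroom(observed_ios_per_sec=450, num_drives=5)
print(capacity)               # 350
print(round(utilization, 2))  # 1.29 -- driving the disks ~29% past capacity
```

A utilization above 1.0 supports the hypothesis of a disk I/O bottleneck; a value well under the knee of the curve suggests the bottleneck is elsewhere.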
Develop a Solution or a Test
Develop a solution to the bottleneck or some sort of test to confirm your hypothesis. Whether you need to develop a solution or a test depends on how certain you are about having discovered the bottleneck and how expensive the solution is. If you are certain you have a disk bottleneck and have more disk drives available, it is not necessary to test your theory. But if you are not certain or think that other issues might be involved, it is a good idea to develop a test. Be sure your test will give you meaningful information.
WARNING: Remember the difference between cause and effect. You might see something that looks like a disk bottleneck when in fact a lack of memory is causing unnecessary I/O.
Anticipate Your Results
The test you devise to confirm or rule out a certain bottleneck should lead to expected results. If you are unsure what positive and negative results would indicate, the test is not worth running. I often run tests where a positive result is meaningful but a negative result tells me nothing. Such a test is still worth running because the positive result might either solve the problem or rule out a cause. Some examples of the kinds of testing you can do are shown later today.
Run the Test or Implement the Solution
Either run a test that will give you valuable information about the bottleneck or try to fix the problem. Carefully monitor the system in both cases. Even if it seems that the performance of the system has immediately and dramatically improved, you should still monitor the system and collect some data. If you run into a performance bottleneck in the future, this data can be valuable.
Analyze the Results
Monitor the system after a change has been made or a test has been run to determine whether the bottleneck has been eliminated. In the case of a capacity issue, the bottleneck might appear to have been eliminated, but the fix might be only temporary. Carefully analyze your system so you can anticipate when you will run out of capacity again.
You will sometimes find that the results of your test prove that your hypothesis was incorrect. If this is the case, return to step 2 to determine the cause of the bottleneck.
Question Your Results
Sometimes testing does not afford a valid result. This typically occurs when you make a change that you expect to either increase or decrease performance, and neither occurs. This indicates that one of two things has occurred:
- The bottleneck is somewhere else--If you have a bus bandwidth problem and add
more memory, there will be no improvement because you are still limited by the speed
of the bus.
- The test was flawed--This could occur for any number of reasons. Perhaps you modified the wrong parameter file, or perhaps the change requires a system reboot.
Be skeptical of your results, and check them carefully. It's okay if you don't believe the results and want to try another test. Only rarely does a test turn out to be flawed or do other problems occur.
Operating System Overview
If you understand the basic concepts of how the operating system works, you will have a better feel for how to monitor and tune the operating system.
Microsoft Windows NT, a relatively new operating system, has quickly gained popularity. Windows NT has file and print services similar to those offered by NetWare, but Windows NT is used primarily as an application server or client operating system. Windows NT comes in two varieties:
- NT Workstation--NT Workstation is designed for the client users and does not contain many of the management pieces that come standard with NT Server.
- NT Server--NT Server includes the management tools necessary for maintaining a server.
Windows NT, a multiprocessor operating system, can take advantage of scalable performance increases via the addition of CPUs. Windows NT is also a server operating system; there is no facility within Windows NT to provide login functionality from terminals. From the beginning, Windows NT was designed to support clients over network connections. The Windows NT operating system provides functionality such as 16-bit Windows application support and a GUI, but because of this functionality, some overhead is associated with Windows NT.
Architectural Overview of Windows NT
New Term: Windows NT is based on a microkernel architecture. In a microkernel architecture, the core kernel is small because most of the OS functionality is removed from the kernel. Because of this, the system is modular; large pieces of the OS can be easily replaced. Although a microkernel does not provide additional performance, it does provide a flexibility that OS and device-driver developers can exploit. Because of the flexible subsystem architecture, components such as file systems, hardware architecture support, and memory subsystems can easily be replaced.
In Windows NT, hardware support is provided through the Hardware Abstraction Layer (HAL). The HAL is employed by hardware vendors to support their systems' base architectures and to provide device-driver support. Because of the microkernel and HAL, Windows NT can support different architectures such as Intel, DEC Alpha, MIPS, PowerPC, and so on. The microkernel is the common code base in which each of these architectures is rooted.
Most 16-bit applications written for Windows can run on non-Intel architectures through the use of a compatibility mode driver. Applications written specifically for Windows NT can be ported to non-Intel platforms with minimal effort. However, applications that have not been written to take advantage of a particular architecture might not run optimally. Oracle is currently supported only on the Intel architecture. As with all Oracle ports, Oracle has been optimized for Windows NT and specifically optimized for Windows NT on Intel platforms.
The Windows NT architecture provides for the use of threads, sometimes known as lightweight processes. Through the use of threads instead of processes, much of the overhead associated with process switching is reduced. Threads are automatically created when the Oracle instance is started (see Table 20.1).
Table 20.1. Oracle service threads.
Thread number    Oracle process
0, 1             Oracle Service
A shadow thread is created on behalf of each user accessing the Oracle database. Remember, the shadow processes communicate with the user and interact with Oracle to carry out the user's requests. For example, if the user process requests a piece of data not already in the SGA, the shadow process is responsible for reading the data blocks from the datafiles into the SGA. Under Windows NT, these processes are invoked as kernel threads. Each of these threads has its own thread number.
UNIX, born in 1969, predates NetWare, Windows NT, and OS/2 by many years. UNIX is different from NetWare, Windows NT, and OS/2 in that it was not designed as a client/server operating system. That is not to say that UNIX is not now used as a network server, but that network service was not the original intent of the operating system.
Even though UNIX is fairly standard, a divergence in UNIX has occurred over the years. Because almost every major computer company develops and sells a version of UNIX, there are differences in the UNIX offerings from the different vendors.
The PC server market boasts three main UNIX versions: SCO UNIX, SCO UnixWare, and Solaris from SunSoft. IBM calls its UNIX offering AIX. Hewlett-Packard has a UNIX operating system called HP-UX; AT&T Global Information Solutions simply calls its product UNIX System V. SunSoft produces versions of UNIX for both Intel and Sun SPARC processors; its products are called Solaris and Solaris X86. Currently, most vendors (with the exception of SCO) base their versions of UNIX on one common core: UNIX System V Release 4, which is sometimes referred to as UNIX SVR4.
In many cases, applications are binary compatible between operating systems on the same platform, but it is not uncommon for the vendor to add special enhancements in the operating system for performance. Oracle always takes advantage of OS-specific features, even though it would be simpler to ship just one binary.
Architectural Overview of UNIX
UNIX consists of a core piece called the kernel surrounded by applications and tools. The kernel contains all hardware support, device drivers, scheduling routines, and the network stack. Unlike the microkernel architecture used in Windows NT, the UNIX kernel contains all the core operating system functionality.
UNIX is a virtual-memory operating system but is very configurable. In the UNIX operating system, not all memory is allocated and used in the same manner. Shared memory used for the Oracle SGA is treated differently from normal memory. Shared memory is allocated at boot time, is not available to general user processes, and must be allocated through shared memory system calls in the operating system.
The fact that shared memory is treated differently allows certain functions to occur. Some varieties of UNIX allow you to allocate shared memory using a 4MB memory page size. This arrangement cuts down on page entries that must be maintained by the operating system and guarantees larger chunks of contiguous memory. Shared memory is also unique in that it is nonpageable. This means that if you have enough shared memory to create the SGA at instance startup, you don't ever have to worry about the SGA being paged out.
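The idea of a named memory region that multiple processes attach to can be sketched in Python's standard library. This is only an analogue: real Oracle SGA allocation on UNIX goes through the shmget()/shmat() system calls in C, and multiprocessing.shared_memory is used here solely to illustrate the concept.

```python
# A rough analogue of UNIX System V shared-memory allocation. A second
# process could attach to the same region by name with
# SharedMemory(name=seg.name) and see the identical bytes.
from multiprocessing import shared_memory

seg = shared_memory.SharedMemory(create=True, size=4096)  # allocate region
seg.buf[0:5] = b"hello"        # write through the shared buffer
data = bytes(seg.buf[0:5])     # any attached process would read the same bytes

seg.close()                    # detach from the segment
seg.unlink()                   # remove it (like ipcrm for System V segments)
print(data)  # b'hello'
```

The explicit unlink step mirrors the UNIX behavior described above: shared segments persist independently of any single process until they are removed.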
Other features that have been added over the years to the UNIX operating system to improve database performance include realtime processing, high-speed semaphores, and asynchronous I/O. Not all these features are in every implementation of UNIX. Check with your OS vendor to see what is available for your environment.
Finding a Performance Bottleneck
Finding a performance bottleneck can be difficult. There is no proven formula or list of things to look for that immediately tells you what the problem is. Finding a performance bottleneck is an investigative task, and the more experience you get, the better at it you will be.
Knowing Your System's Limitations
To determine whether a system component is causing a performance bottleneck, ascertain the limitations of that component when possible. Doing so allows you to determine whether you are exceeding that limitation. Unfortunately, these limitations are difficult to gauge in most cases, but it is possible to characterize the component in some cases. The limitations of the I/O subsystem and the networking components are ascertainable.
The I/O Subsystem
Some pieces of the I/O subsystem, such as the disk drive, have been well tested and documented over the last few years; limitations of the controller and bus are not as well known. The disk drive is a physical piece of hardware that usually comes with a specification. This specification tells you certain things about the disk drive, such as
- Average seek time--This is the average time it takes for the disk arm to move
from one location to another. This relates to the number of random I/Os per second
that you can perform.
- Average track-to-track seek time--This is the time it takes to move to the next
adjacent track on the disk drive. This relates to the number of sequential I/Os per
second that you can perform.
- Transfer time--This is the time it takes to move the data from the disk to the
controller once it has been read.
- Rotations per second--This is how quickly the disk spins. This relates to how quickly you can rotate to the next bit of data on the disk.
A typical disk drive might have an average seek time of 9 milliseconds; for random I/Os, it takes 9 milliseconds to move to the next data. Add 1 millisecond for data transfer to make 10 milliseconds per random I/O. Divide 1 second by 10 milliseconds to see how many times per second you can perform a random seek:
Seeks = 1 Sec / 0.01 Sec/Seek = 100 Seeks
On Day 4, "Properly Sizing Your Database and Planning for Growth," you saw the "knee of the curve" theory. According to that theory, you don't want to exceed 60-70% of your available capacity; otherwise, response times increase dramatically. This is why you should not exceed 60-70 seeks per second per drive for random I/O.
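The arithmetic above can be written as a small sketch: seek time plus transfer time gives the service time per random I/O, and the knee-of-the-curve rule keeps usable throughput at 60-70% of the theoretical maximum. The 9-millisecond and 1-millisecond figures are the example's assumptions, not properties of any particular drive.

```python
# Random-I/O capacity of one drive from its seek and transfer times,
# derated by the knee-of-the-curve factor (0.65 splits the 60-70% range).

def usable_random_iops(avg_seek_ms, transfer_ms, knee=0.65):
    service_time_s = (avg_seek_ms + transfer_ms) / 1000.0  # e.g., 0.010 s
    max_iops = 1.0 / service_time_s                        # e.g., 100 seeks/sec
    return max_iops, max_iops * knee

max_iops, usable = usable_random_iops(avg_seek_ms=9, transfer_ms=1)
print(round(max_iops))  # 100
print(round(usable))    # 65 -- stay near 60-70 random I/Os per second per drive
```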
The Network Subsystem
The network subsystem is another area where you can compare your throughput against a known maximum. Ethernet networks are available in a number of different throughput levels. This is usually indicated in the name of the network hardware. A few of the network hardware specifications include
- 10BaseT--This is a 10Mb/second ethernet. With this hardware you can have a peak
throughput of 10Mb, or approximately 1.25MB/second.
- 100BaseT--This is a 100Mb/second ethernet. With this hardware you can have a peak throughput of 100Mb, or approximately 12.5MB/second.
Monitor your system to determine whether you are close to meeting or exceeding the available throughput. Remember that smaller packets are not as efficient, so if you send many small requests (common in database applications), you might not realize the full potential of the network.
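A back-of-the-envelope utilization check can make the small-packet effect concrete. The request rate, payload size, and per-packet overhead below are illustrative assumptions, not measured values.

```python
# 10BaseT moves at most 10 megabits (~1.25 MB) per second, and every
# request carries fixed packet overhead, so many small transfers waste
# a larger share of the wire.

LINK_CAPACITY_BYTES = 10_000_000 // 8   # 10 Mb/s Ethernet ~= 1.25 MB/s

def network_utilization(requests_per_sec, payload_bytes, overhead_bytes=58):
    # 58 bytes roughly approximates Ethernet + IP + TCP headers per packet
    wire_bytes = requests_per_sec * (payload_bytes + overhead_bytes)
    return wire_bytes / LINK_CAPACITY_BYTES

# 2,000 small 200-byte database requests per second:
util = network_utilization(2000, 200)
print(round(util, 3))  # 0.413 -- about 41% of the 10BaseT link
```

Note that at this rate the headers alone consume a meaningful fraction of the link, which is why raw payload throughput understates actual network load.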
A good way to determine how much network bandwidth you are using is by employing a network monitor. Network monitors can be hardware or software devices. Most operating systems do not provide any way to monitor the network from your system because your network card normally passes only packets addressed to that machine. Other packets are ignored.
Memory problems sometimes manifest themselves as other problems. For example, a lack of memory might cause excessive I/O, which will appear as an I/O problem. Of prime importance is the fact that the system is not paging or swapping. Paging and swapping occur when no physical memory is available for code processing. The operating system pages, or moves data out of memory, onto a paging area on disk. This operation can be quite expensive. If you are seeing significant paging, add more memory to the system or reduce the Oracle DB block buffer cache to free more memory.
WARNING: If your system is paging or swapping, you have a severe performance problem. No amount of tuning will make up for the performance degradation caused by this paging or swapping. If you are seeing excessive paging, get more memory. If you are at the limit of memory that your system can support, start looking for a new system.
Monitoring the System
After you know your limitations, you can determine whether you are exceeding them by monitoring the system. There are several ways to monitor the system; one of the best ways under Windows NT is with the Performance Monitor, or perfmon. On a UNIX system, you can monitor many objects with the sar utility. Most operating systems also offer a graphical monitoring tool. These tools are good, but are specific to the brand of hardware you purchase. Look in your OS documentation for instructions on how to use these tools.
Oracle provides a set of useful scripts for monitoring the performance of the RDBMS. These scripts are called UTLBSTAT (begin) and UTLESTAT (end). UTLBSTAT should be run first, followed shortly thereafter by UTLESTAT. UTLESTAT displays a large number of statistic and counter values based on the time since UTLBSTAT was run.
Select Programs | Administrative Tools (Common) | Performance Monitor to invoke perfmon. The Performance Monitor appears in chart mode, which is fine for small numbers of items (such as a trend over an extended period of time). If you are looking at a large amount of data, use report mode; in this mode, values are displayed onscreen as numerical values. Invoke report mode by selecting it via the View drop-down menu or by clicking the Report button.
WARNING: perfmon adds some CPU overhead to the system. If you extend the report interval, perfmon will run less often, thus reducing CPU overhead. Increasing the report interval also increases the accuracy of the report. I recommend setting the report interval to 15 or 30 seconds.
Looking for I/O Bottlenecks with perfmon
To use perfmon to view events in the system, you must select counters. Each of these counters represents a different event that is monitored. To view counter information, do the following:
- 1. Click the + button to add a new counter.
2. From the Add to Report screen, choose the PhysicalDisk object from the Object drop-down menu, as shown in Figure 20.1.
Choose the PhysicalDisk object.
- 3. You are presented with a number of objects from which you can select. You can monitor many useful objects, but my opinion is that the following items are most important in monitoring the disk I/O subsystem:
- Disk Reads/Sec--The number of reads per second to that disk or volume.
- Disk Writes/Sec--The number of writes per second to that disk or volume.
- Disk Transfers/Sec--The total number of reads and writes per second to that disk
or volume.
- Avg. Disk Sec/Read--The average time it takes for a read operation to occur.
- Avg. Disk Sec/Write--The average time it takes for a write operation to occur.
- Avg. Disk Sec/Transfer--The average time it takes for read and write operations
to occur.
- Avg. Disk Queue Length--The average number of I/Os outstanding in the disk I/O subsystem.
NOTE: A hardware disk array will appear to the NT system as a single disk. Divide the number of I/Os per second by the number of disk drives to get the I/Os per second per disk drive. This is not 100% accurate because the I/O balance may not be perfect, but it is the best that you can do.
Of particular note should be the number of I/Os per second per disk drive and the disk latency. A typical disk drive or disk array should complete I/Os in 20-30 milliseconds (0.020-0.030 seconds). If any volume is much higher than that (more than 0.060 seconds), you are probably seeing some sort of disk bottleneck. Consider adding more disk drives.
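The per-drive division from the note above and the latency thresholds just mentioned combine into a simple check. The counter values here are made-up examples, not output from a real perfmon session.

```python
# A hardware array shows up as one disk in perfmon, so divide total
# transfers by the number of drives, and compare latency against the
# 20-30 ms normal / 60 ms problem thresholds discussed in the text.

def check_array(transfers_per_sec, num_drives, avg_sec_per_transfer):
    per_drive = transfers_per_sec / num_drives
    bottleneck = avg_sec_per_transfer > 0.060  # latency past ~60 ms
    return per_drive, bottleneck

per_drive, bottleneck = check_array(transfers_per_sec=480,
                                    num_drives=6,
                                    avg_sec_per_transfer=0.072)
print(per_drive)   # 80.0 I/Os per second per drive
print(bottleneck)  # True -- latency suggests a disk bottleneck
```

As the note says, this division assumes the I/O load is balanced evenly across the drives, which is rarely exact in practice.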
If you are running on UNIX, you can get this information using sar or other monitoring tools.
Looking for Memory Bottlenecks with perfmon
In Windows NT, the Pages/Sec counter under the Memory objects can be used to indicate that the system is paging. If you see significant activity with this counter, you are using too much memory. If you cannot add more memory, you should reduce the memory used by Oracle and other processes or reduce the number of users on the system.
Using UTLBSTAT and UTLESTAT
Within the directory \ORANT\RDBMS80\ADMIN are a number of scripts used internally within Oracle. You ran CATPROC.SQL and CATALOG.SQL from this directory when you created a database. UTLBSTAT and UTLESTAT are also run from here.
UTLBSTAT sets up some internal tables and takes a snapshot of some of the internal Oracle counters. After you run under load for a time, UTLESTAT takes another snapshot and compares the results. UTLESTAT displays a large number of statistic and counter values based on the time since UTLBSTAT was run.
NOTE: The UTLBSTAT.SQL and UTLESTAT.SQL scripts each have a CONNECT INTERNAL string at the top. Most likely, this will not work under NT. Simply comment out this line and run the script from the INTERNAL user account. Place the word Rem at the beginning of those lines.
The UTLBSTAT and UTLESTAT scripts can be run from the SVRMGR30 utility. After you connect as INTERNAL or SYS, you can run the UTLBSTAT script with the following syntax:

SVRMGR> @\ORANT\RDBMS80\ADMIN\UTLBSTAT
After you run for a while under load, you can finish monitoring Oracle and view the results by running the UTLESTAT script with the following syntax:

SVRMGR> @\ORANT\RDBMS80\ADMIN\UTLESTAT
NOTE: The usefulness of the data retrieved from UTLBSTAT and UTLESTAT might depend on the length of time for which the data is collected. When looking at specific events, five minutes might be enough. When looking at long-term performance, you might need to run for hours or more. If you are interested in a specific event such as a long-running query, the runtime of the event will be sufficient.
The result of UTLESTAT is, in its entirety, quite long. Following is the resulting data from UTLESTAT only; the echo of the SQL statements has been removed:
LIBRARY      GETS       GETHITRATI PINS       PINHITRATI RELOADS    INVALIDATI
------------ ---------- ---------- ---------- ---------- ---------- ----------
BODY         0          1          0          1          0          0
CLUSTER      97         1          50         1          0          0
INDEX        0          1          0          1          0          0
OBJECT       0          1          0          1          0          0
PIPE         0          1          0          1          0          0
SQL AREA     120        .808       364        .874       0          0
TABLE/PROCED 235        .923       265        .879       0          0
TRIGGER      0          1          0          1          0          0

Statistic                    Total       Per Transact Per Logon   Per Second
---------------------------- ----------- ------------ ----------- -----------
DBWR buffers scanned         601         601          150.25      2.86
DBWR free buffers found      581         581          145.25      2.77
DBWR lru scans               35          35           8.75        .17
DBWR make free requests      35          35           8.75        .17
DBWR summed scan depth       601         601          150.25      2.86
DBWR timeouts                68          68           17          .32
SQL*Net roundtrips to/from   129         129          32.25       .61
background timeouts          210         210          52.5        1
buffer is not pinned count   2175        2175         543.75      10.36
buffer is pinned count       6343        6343         1585.75     30.2
bytes received via SQL*Net   11212       11212        2803        53.39
bytes sent via SQL*Net to c  8270        8270         2067.5      39.38
calls to get snapshot scn:   145         145          36.25       .69
calls to kcmgas              1           1            .25         0
calls to kcmgcs              10          10           2.5         .05
calls to kcmgrs              181         181          45.25       .86
cleanouts only - consistent  5           5            1.25        .02
cluster key scan block gets  801         801          200.25      3.81
cluster key scans            506         506          126.5       2.41
commit cleanouts             16          16           4           .08
commit cleanouts successful  16          16           4           .08
consistent gets              4557        4557         1139.25     21.7
cursor authentications       6           6            1.5         .03
db block changes             104         104          26          .5
db block gets                141         141          35.25       .67
enqueue releases             26          26           6.5         .12
enqueue requests             20          20           5           .1
execute count                144         144          36          .69
free buffer inspected        7           7            1.75        .03
free buffer requested        1383        1383         345.75      6.59
immediate (CR) block cleano  5           5            1.25        .02
logons cumulative            4           4            1           .02
messages received            38          38           9.5         .18
messages sent                39          39           9.75        .19
no work - consistent read g  4063        4063         1015.75     19.35
opened cursors cumulative    109         109          27.25       .52
parse count (hard)           23          23           5.75        .11
parse count (total)          120         120          30          .57
physical reads               1369        1369         342.25      6.52
physical writes              45          45           11.25       .21
recursive calls              2030        2030         507.5       9.67
redo blocks written          49          49           12.25       .23
redo entries                 67          67           16.75       .32
redo size                    25360       25360        6340        120.76
redo small copies            67          67           16.75       .32
redo synch writes            1           1            .25         0
redo wastage                 644         644          161         3.07
redo writes                  3           3            .75         .01
session logical reads        4698        4698         1174.5      22.37
session pga memory           645168      645168       161292      3072.23
session pga memory max       645168      645168       161292      3072.23
session uga memory           13772       13772        3443        65.58
session uga memory max       144296      144296       36074       687.12
sorts (memory)               85          85           21.25       .4
sorts (rows)                 783         783          195.75      3.73
table fetch by rowid         2148        2148         537         10.23
table fetch continued row    2           2            .5          .01
table scan blocks gotten     1085        1085         271.25      5.17
table scan rows gotten       456         456          114         2.17
table scans (long tables)    2           2            .5          .01
table scans (short tables)   17          17           4.25        .08
total file opens             5           5            1.25        .02
user calls                   139         139          34.75       .66
user commits                 1           1            .25         0
write requests               6           6            1.5         .03

Average Write Queue Length
--------------------------
0

Event Name                        Count        Total Time    Avg Time
--------------------------------- ------------ ------------- -------------
SQL*Net message from client       150          0             0
SQL*Net message to client         150          0             0
control file sequential read      68           0             0
db file scattered read            137          0             0
db file sequential read           413          0             0
file identify                     3            0             0
file open                         9            0             0
log file sync                     1            0             0

Event Name                        Count        Total Time    Avg Time
--------------------------------- ------------ ------------- -------------
control file parallel write       2            0             0
control file sequential read      2            0             0
db file parallel write            6            0             0
db file scattered read            1            0             0
db file sequential read           5            0             0
db file single write              2            0             0
log file parallel write           3            0             0
log file sync                     1            0             0
pmon timer                        70           0             0
rdbms ipc message                 250          0             0
smon timer                        1            0             0

LATCH_NAME          GETS       MISSES      HIT_RATIO   SLEEPS      SLEEPS/MISS
------------------- ---------- ----------- ----------- ----------- -----------
Active checkpoint   69         0           1           0           0
Checkpoint queue l  1488       0           1           0           0
Token Manager       22         0           1           0           0
cache buffer handl  43         0           1           0           0
cache buffers chai  10389      0           1           0           0
cache buffers lru   1451       0           1           0           0
dml lock allocatio  11         0           1           0           0
enqueue hash chain  24         0           1           0           0
enqueues            62         0           1           0           0
ktm global data     1          0           1           0           0
library cache       3099       0           1           0           0
library cache load  64         0           1           0           0
list of block allo  1          0           1           0           0
messages            536        0           1           0           0
modify parameter v  4          0           1           0           0
multiblock read ob  288        0           1           0           0
ncodef allocation   3          0           1           0           0
process allocation  1          0           1           0           0
redo allocation     115        0           1           0           0
row cache objects   1318       0           1           0           0
session allocation  56         0           1           0           0
session idle bit    288        0           1           0           0
session switching   3          0           1           0           0
shared pool         1063       0           1           0           0
sort extent pool    1          0           1           0           0
system commit numb  193        0           1           0           0
transaction alloca  7          0           1           0           0
undo global data    101        0           1           0           0

LATCH_NAME          NOWAIT_GETS     NOWAIT_MISSES    NOWAIT_HIT_RATIO
------------------- --------------- ---------------- ----------------
Token Manager       3               0                1
cache buffers chai  8643            0                1
cache buffers lru   1465            0                1
library cache       3               0                1
multiblock read ob  1               0                1
process allocation  1               0                1
row cache objects   3               0                1
vecio buf des       3               0                1

CLASS              COUNT            TIME
------------------ ---------------- ----------------
0 rows selected.

UNDO_SEGMENT TRANS_TBL_GETS TRANS_TBL_WAITS UNDO_BYTES_WRITTEN SEGMENT_SIZE_BYTES XACTS SHRINKS WRAPS
------------ -------------- --------------- ------------------ ------------------ ----- ------- -----
0            8              0               2718               1226752            0     0       0
1 row selected.
NAME VALUE -------------------------------------- -------------------------------------- background_dump_dest %RDBMS80%\trace compatible 184.108.40.206.0 control_files D:\DATABASE\ctl1dogs.ora db_block_size 2048 db_files 1024 db_name dogs dml_locks 100 log_buffer 2048, 8192 log_checkpoint_interval 10000 max_dump_file_size 10240 processes 20, 50 remote_login_passwordfile SHARED sequence_cache_hash_buckets 10 shared_pool_size 3500000 snapshot_refresh_processes 1 user_dump_dest %RDBMS80%\trace NAME GET_REQS GET_MISS SCAN_REQ SCAN_MIS MOD_REQS COUNT CUR_USAG --------------- -------- -------- -------- -------- -------- -------- -------- dc_free_extents 9 9 0 0 0 27 10 dc_rollback_seg 2 0 0 0 0 7 2 dc_users 28 0 0 0 0 14 1 dc_objects 169 16 0 0 0 144 138 dc_synonyms 2 1 0 0 0 12 1 dc_usernames 128 0 0 0 0 20 2 dc_object_ids 108 5 0 0 0 104 97 TABLE_SPACE READS BLKS_READ READ_TIME WRITES BLKS_WRT WRITE_TIME MEGABYTES ------------------------------------------------------------------------------ ---------- ---------- ---------- ---- ---- ---------- ---------- ---------- DOGS 0 0 0 0 0 0 10 SYSTEMB 555 1378 0 53 53 0 26 TABLE_SPACE FILE_NAME READS BLKS_READ READ_TIME WRITES BLKS_WRT WRITE_TIME MEGABYTES ------------------------------ ------------------------------------------------ " WIDTH="14" HEIGHT="9" ALIGN="BOTTOM" BORDER="0">; ---------- ---------- ---------- ---- ---- ---------- ---------- ---------- DOGS D:\DATABASE\DOGS1.DBF 0 0 0 0 0 0 10 SYSTEMB D:\DATABASE\SYSDOGS.ORA 555 1378 0 53 53 0 26 START_TIME END_TIME ------------------------ ------------------------ 15-jun-97 16:39:39 15-jun-97 16:43:09 BANNER ---------------------------------------------------------------- Oracle8 Server Release 220.127.116.11.2 - Beta PL/SQL Release 18.104.22.168.2 - Beta CORE Version 22.214.171.124.1 - Production TNS for 32-bit Windows: Version 126.96.36.199.0 - Beta NLSRTL Version 188.8.131.52.1 - Beta
ANALYSIS: As you can see, a wealth of data collected within Oracle can be retrieved via the UTLBSTAT and UTLESTAT SQL scripts. This data is retrieved mainly from the Oracle V$ views. These views are the window into the Oracle internal performance tables.
NOTE: When using a script such as UTLESTAT where the output data is useful, employ the spool option under SVRMGR30. This option copies the onscreen data into a file. If you specify a filename, that name will be used. Use the following syntax:
SVRMGR> SPOOL stats;
I like to divide tuning into the following components:
- Software tuning--This involves both the OS and the RDBMS.
- Hardware tuning--This includes proper system design and disk layout.
- Application tuning--This involves properly designing the application and database to take advantage of performance features and planning.
This section focuses on things that can be tuned in the Oracle RDBMS. A complete list of Oracle tuning parameters is given in Appendix B, "Oracle Tuning Parameters."
The System Global Area (SGA) contains the shared pool, the redo log buffer, and the database block buffers.
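You can check the relative sizes of these components with a simple query against the V$SGA view (a sketch; the values reported will of course differ from system to system):

```sql
-- Show the size, in bytes, of each major SGA component
-- (fixed area, variable area, database buffers, redo buffers).
SELECT name, value
  FROM v$sga;
```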
The Shared Pool
The shared pool contains the library cache, the data-dictionary cache, and the shared session area (with the multithreaded server).
The Library Cache
The library cache contains the shared SQL and PL/SQL areas. You can improve performance by increasing the cache-hit rate in the library cache and by speeding access to the library cache by holding frequently used SQL statements in cache longer.
The V$LIBRARYCACHE table contains statistics about how well you are using the library cache. The important columns to view in this table are PINS and RELOADS:
- PINS--The number of times the item in the library cache was executed.
- RELOADS--The number of times the library cache missed and the library object was reloaded.
A small number of reloads relative to the number of executions indicates a high cache-hit rate.
The Data-Dictionary Cache
The data-dictionary cache contains a set of tables and views that Oracle uses as a reference to the database. Here Oracle stores information about the logical and physical structure of the database.
To check the efficiency of the data-dictionary cache, check the cache-hit rate. Statistics for the data-dictionary cache are stored in the dynamic performance table V$ROWCACHE (the data-dictionary cache is sometimes known as the row cache). The important columns to view in this table are GETS and GETMISSES:
- GETS--The total number of requests for the particular item.
- GETMISSES--The total number of requests resulting in cache misses.
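Using these two columns, a query along the following lines (a sketch built from the columns just described) summarizes the dictionary cache-miss rate:

```sql
-- Overall data-dictionary cache-miss percentage.
SELECT SUM(gets) "Gets",
       SUM(getmisses) "Get Misses",
       100 * (SUM(getmisses) / SUM(gets)) "Miss Percent"
  FROM v$rowcache;
```

A miss percentage in the low single digits generally indicates a well-sized data-dictionary cache.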
To determine your library cache-hit ratio, use the following SQL statement or look at PINHITRATIO in the output of UTLESTAT:
SELECT SUM(reloads) "Cache Misses",
       SUM(pins) "Executions",
       100 * (SUM(reloads) / SUM(pins)) "Cache Miss Percent"
  FROM v$librarycache;
This will show you the cache-miss percent:
Cache Hit % = 100% - Cache Miss %
Some cache misses are expected, especially during startup, when the cache has not yet been populated.
Shared Session Information
In a multithreaded server configuration, the session information is also stored in the shared pool. This information includes the private SQL areas as well as sort areas. Ensure that you do not run out of memory for this shared session information.
To determine whether you need to increase space for these shared sessions, extract from the dynamic performance table V$SESSTAT both the sum of memory allocated for all sessions and the maximum amount of memory ever allocated for sessions. If the maximum amount of memory used is high, it might be necessary to increase the size of the shared pool. Because the shared pool also serves other functions, such as the library cache and the data-dictionary cache, it is a good idea to increase the size of the shared pool to accommodate this additional memory usage. If you have enough memory in your system, increase the shared pool by the maximum amount of memory used by the shared server processes. If memory is limited, instead measure the sum of memory allocated to sessions while an average number of users is connected and running, and use that figure as the basis for the amount by which to increase the shared pool.
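One way to extract these figures is to join V$SESSTAT to V$STATNAME, which maps statistic numbers to names (a sketch; the statistic names are the same ones that appear in the UTLESTAT report):

```sql
-- Current and maximum UGA memory, summed across all sessions.
SELECT sn.name, SUM(ss.value)
  FROM v$sesstat ss, v$statname sn
 WHERE ss.statistic# = sn.statistic#
   AND sn.name IN ('session uga memory', 'session uga memory max')
 GROUP BY sn.name;
```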
Database Block Buffer Cache
The most important Oracle cache in the system is probably the buffer cache. The buffer cache makes up the majority of the Oracle SGA and is used for every query and update in the system. The statistics for the buffer cache are kept in the dynamic performance table V$SYSSTAT and are included in the output from UTLESTAT. The important columns to view in this table are listed here:
- PHYSICAL READS--This indicates the total number of requests that result in a disk access; each one is a cache miss.
- DB BLOCK GETS--This indicates the number of requests for blocks in current mode. Buffers are typically retrieved in current mode for INSERT, UPDATE, and DELETE statements.
- CONSISTENT GETS--This indicates the number of requests for blocks in consistent mode. Buffers are typically retrieved in consistent mode for queries.
The sum of the values in DB BLOCK GETS and CONSISTENT GETS represents the total number of requests for data. The cache-hit ratio is determined using this formula:
Cache-hit ratio = 1 - (PHYSICAL READS / (DB BLOCK GETS + CONSISTENT GETS))
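The three input values can be pulled directly from V$SYSSTAT and plugged into this formula (a sketch):

```sql
-- Raw inputs for the buffer cache-hit ratio.
SELECT name, value
  FROM v$sysstat
 WHERE name IN ('physical reads', 'db block gets', 'consistent gets');
```

For example, 100 physical reads against 1,000 db block gets plus 9,000 consistent gets yields a cache-hit ratio of 1 - (100 / 10,000), or 99 percent.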
The block buffers are the most important area of the SGA and must be tuned because of the large effect they have on the system and the number of resources they consume.
I prefer to separate the performance-enhancement options from the general tuning of Oracle. Performance enhancements tend to be things that might or might not help your configuration and application; in fact, they might hurt. On the other hand, tuning parameters always help, based on the correct interpretation of Oracle statistics. The following sections review a few of the enhancements you have seen throughout the book.
Depending on your configuration and data-access patterns, you might be able to benefit from using a larger block size. With a larger block size, you get the benefit of less wasted space and more efficient I/O in certain conditions. Here are a few guidelines that might help you decide whether changing the size of DB_BLOCK_SIZE can help you:
- OLTP systems benefit from smaller blocks--If your application is OLTP in nature, you will not benefit from larger blocks. OLTP data typically fits well in the default block size; larger blocks unnecessarily eject blocks from the SGA.
- DSS systems benefit from larger blocks--In a DSS system in which table scans are common, retrieving more data at a time results in a performance increase.
- Larger databases benefit from larger blocks--Larger databases see a space benefit from less wastage per block.
- Databases with large rows benefit from larger blocks--If your rows are extremely large (as is sometimes the case with images or text) and don't fit in the default block, you will see a definite benefit from a larger block size.
Because unnecessarily changing the block size increases I/O overhead, this change does carry some risk. Change the block size with caution.
New Term: A cluster, sometimes called an index cluster, is an optional method of storing tables in an Oracle database. Within a cluster, multiple related tables are stored together to improve access time to the related items. Clusters are useful in cases where related data is often accessed together. The existence of a cluster is transparent to users and applications; the cluster affects only how data is stored.
A cluster can be useful for tables in which data is primarily accessed together in a join. In such situations, the reduced I/O needed to bring the additional data into the SGA and the fact that the data is already cached can be a big advantage. However, for situations in which the tables have a large number of INSERT statements or the data is not frequently accessed together, a cluster is not useful and should not be used. Do not cluster tables if full-table scans are often performed on only one of the tables in the cluster. The additional space required by the cluster and the additional I/O reduces performance.
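As a sketch (the table and column names here are hypothetical), creating an index cluster involves creating the cluster, its cluster index, and then the tables inside it:

```sql
-- Hypothetical cluster keyed on dog_id; rows that are commonly
-- joined on dog_id are stored together on disk.
CREATE CLUSTER dog_cluster (dog_id NUMBER)
    SIZE 512;

-- An index cluster requires a cluster index before rows can be inserted.
CREATE INDEX dog_cluster_idx ON CLUSTER dog_cluster;

CREATE TABLE dogs (
    dog_id  NUMBER,
    dogname VARCHAR2(40))
    CLUSTER dog_cluster (dog_id);
```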
Reduction of Fragmentation
New Term: Fragmentation occurs when pieces of the database are no longer contiguous. Fragmentation can consist of disk fragmentation or tablespace fragmentation. Both of these types of fragmentation usually affect performance. Disk fragmentation usually causes multiple I/Os to occur when one I/O would have been sufficient (for example, with chained or migrated rows). Disk fragmentation can occur when the extents that comprise the database segments are noncontiguous; this is sometimes caused by excessive dynamic growth.
Tablespace fragmentation is caused by the dropping and creating of segments. This can produce large free areas between segments, which results in the inefficient use of space and excessive disk seeks over the empty areas. Tablespace fragmentation can also prevent Oracle from taking advantage of multiblock reads.
One way to eliminate fragmentation is to export the table or tablespace data, remove and re-create the table or tablespace, and import the data. By eliminating fragmentation, you can reduce excessive I/Os and CPU usage, streamlining data access. Any overhead and unnecessary I/Os you can reduce will improve system performance.
New Term: A hash cluster is similar to a cluster except that it uses a hash function rather than an index to reference the cluster key. A hash cluster stores the data based on the result of a hash function. The hash function is a numeric function that determines the data block in the cluster based on the value of the cluster key. To achieve good performance from a hash cluster, you must meet the following criteria:
- The cluster key value must be unique.
- The majority of queries must be equality queries on the cluster key.
- The size of the table must be static (little growth occurs).
- The value of the cluster key must not change.
If you can take advantage of hashing by meeting these strict criteria, you will see good performance. Hashing is extremely efficient under the right conditions; however, using a hash cluster under the wrong conditions can degrade performance.
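A hash cluster is created much like an index cluster, except that you specify the hash parameters instead of building a cluster index (again, the names here are hypothetical):

```sql
-- Hypothetical hash cluster: rows are located by hashing dog_id.
-- HASHKEYS should approximate the number of distinct key values.
CREATE CLUSTER dog_hash (dog_id NUMBER)
    SIZE 512
    HASH IS dog_id
    HASHKEYS 1000;

CREATE TABLE dog_records (
    dog_id NUMBER,
    info   VARCHAR2(100))
    CLUSTER dog_hash (dog_id);
```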
New Term: An index, like the index in this book, is an optional structure designed to help you achieve faster access to your data. When optimally configured and used, indexes can significantly reduce I/O to the datafiles and greatly improve performance. You must first decide whether an index is appropriate for the data and access patterns in your particular system. Having decided to use an index, you must decide which columns to index. Indexing appropriately can greatly improve performance by reducing I/Os and speeding access times.
Careful planning and periodic testing with SQL Trace can lead to the effective use of indexes, with optimal performance being the outcome. Even though indexes have been built on tables in your database, they might not necessarily be used. If the SQL statement is not correctly formed, the index might be bypassed and a full-table scan might be the result. This is why the application developer and the system designer must communicate constantly throughout the development and deployment cycle. The application should be tested and the SQL statements analyzed to ensure that indexes are being used where intended. Indexes are covered in detail on Day 13, "Using Indexes and Sequences."
When performing table scans, Oracle can read more than one block at a time, thus speeding I/Os. Consequently, a larger chunk of data can be read from the disk, thus eliminating some disk seeks. The reduction of disk seeks and the reading of larger blocks reduces both I/O and CPU overhead.
The amount of data read in a multiblock read is specified by the Oracle initialization parameter DB_FILE_MULTIBLOCK_READ_COUNT. The value for this parameter should always be set high because there is rarely any disadvantage in doing so. The size of the individual I/O requests depends on DB_FILE_MULTIBLOCK_READ_COUNT and DB_BLOCK_SIZE. A good value for multiblock reads is 64KB.
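For example, with the default 2KB block size, a read count of 32 yields 64KB multiblock reads; the following is an illustrative init.ora fragment:

```
# 2,048-byte blocks x 32 blocks per read = 64KB per multiblock I/O
db_block_size = 2048
db_file_multiblock_read_count = 32
```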
Multiblock writes are similar to multiblock reads and have many of the same requirements. Multiblock writes are available through the direct path loader as well as through sorts and index creations. As with multiblock reads, the multiblock writes reduce I/O and CPU overhead by writing multiple database blocks in one larger I/O operation.
The amount of data written in a multiblock write is specified by the Oracle initialization parameter DB_FILE_MULTIBLOCK_WRITE_COUNT. The size of the individual I/O requests depends on both DB_FILE_MULTIBLOCK_WRITE_COUNT and DB_BLOCK_SIZE. As with multiblock reads, a good value is 64KB.
The Oracle Parallel Query Option
The Oracle Parallel Query option makes it possible for some Oracle functions to be processed by multiple server processes. The functions affected are queries, index creation, data loading, and recovery. For each of these functions, the general principle is the same: Keep processing while Oracle waits for I/O.
For most queries, the time spent waiting for the data to be retrieved from disk usually overshadows the amount of time actually spent processing the results. With the Parallel Query option, you can compensate for this wasted time by using several server processes to execute the query. While one process is waiting for I/Os to complete, other processes can execute. If you are running on a Symmetric Multiprocessor (SMP) computer, a cluster, or a Massively Parallel Processing (MPP) machine, you can take maximum advantage of the Parallel Query option. The amount of parallelism can be tuned with several of the Oracle initialization parameters:
- PARALLEL_DEFAULT_MAX_SCANS--This parameter specifies the maximum number of query servers to be used by default for a query. This value is used only if no value is specified in a PARALLEL hint or in the PARALLEL definition clause. It limits the number of query servers used by default when the value of PARALLEL_DEFAULT_SCANSIZE is used by the query coordinator.
- PARALLEL_DEFAULT_SCANSIZE--This parameter determines the number of query servers to be used for a particular table. The size of the table divided by PARALLEL_DEFAULT_SCANSIZE gives the number of query servers, up to PARALLEL_DEFAULT_MAX_SCANS.
- PARALLEL_MAX_SERVERS--This parameter specifies the maximum number of query servers or parallel recovery processes available for this instance.
- RECOVERY_PARALLELISM--This parameter specifies the number of processes to be used for instance or media recovery. A large value can greatly reduce instance recovery time. A value of 0 or 1 indicates that parallel recovery will not be performed and that recovery will be serial. A good value for this parameter is in the range of the number of disks you have (up to 50).
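The degree of parallelism can also be requested per statement with the PARALLEL hint mentioned above (the table name here is hypothetical):

```sql
-- Request four query servers for a full scan of the dogs table,
-- overriding the instance-wide defaults.
SELECT /*+ FULL(dogs) PARALLEL(dogs, 4) */ *
  FROM dogs;
```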
I am a big fan of the Parallel Query option. I have seen great improvements from the use of parallel queries as well as dramatic reductions in recovery time when the parallel recovery feature is used. You saw this in yesterday's lesson, "Advanced Oracle Options."
The Oracle Parallel Server Option
The Oracle Parallel Server option is one of the most innovative and impressive options available from Oracle. With the Parallel Server option, you can cluster several computers using a shared-disk subsystem and have multiple Oracle instances access the same database. If your application is suitable, you can see good scalability from adding computers.
The Oracle Parallel Server option uses a sophisticated locking mechanism in conjunction with a shared-disk subsystem to allow multiple instances to access the same data. If you have an application that can take advantage of the Oracle parallel server architecture, you should see some good performance improvements.
The two areas that can most influence the performance of your parallel server system are data partitioning and PCM lock management. Both of these can make a huge difference in the performance of your system.
- Partitioning--By properly partitioning your data to reduce lock traffic and contention for blocks between servers, you can enhance performance. Try to balance your users so that the users accessing the same tables are on the same machine; doing so can reduce contention for locks.
- PCM locks--By carefully managing the number of locks on each table, you can enhance performance. Tables with a lot of traffic between nodes should have more locks than tables with less contention. By balancing the number of locks, you can reduce overhead.
TIP: By taking advantage of read-only tablespaces where applicable, you can reduce the number of PCM locks in your system. Because read-only tablespaces do not allow updates, no locking is necessary.
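Making a tablespace read-only takes a single command (the tablespace name here is hypothetical):

```sql
-- No PCM locks are needed for data that can never change.
ALTER TABLESPACE history_data READ ONLY;

-- The tablespace can be made writable again later if necessary.
ALTER TABLESPACE history_data READ WRITE;
```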
Multiprocessor environments might benefit from tuning of the parameter SPIN_COUNT. Under normal circumstances, if a latch is not available, the process sleeps and then wakes up to try the latch again. If you are on a multiprocessor system, it is likely that the process holding the latch is currently processing on another CPU and will be finished in a short time. If you set SPIN_COUNT to a value greater than 0, the process spins while counting down from SPIN_COUNT to 0. If the latch is still not available, the process goes to sleep.
Setting SPIN_COUNT can hurt performance if you're not careful. This parameter should be set only for multiprocessor computers and should be monitored for effectiveness. A good value to try is 2,000. The value of SPIN_COUNT specifies how many times the process will spin before putting itself to sleep. Because the speed of processors varies, the time it takes to spin also varies; however, the speed of the process holding the desired resource will also vary with the speed of the processor.
OS tuning is specific and depends on the OS you are running. I will present some of the basics on tuning Windows NT and UNIX here.
NT Tuning Considerations
With Windows NT, probably the most important tuning consideration is ensuring that sufficient physical memory is available to run Oracle and the required number of server processes. Windows NT uses virtual memory, which means that Oracle and user processes can allocate an almost unlimited amount of memory through paging. If you are not careful, you might overconfigure the amount of virtual memory that you are using and exceed the amount of physical memory in the system. If this occurs, the system will begin paging and performance will be severely degraded.
In addition to the concern about memory, other areas of importance in tuning the Windows NT system include the network and I/O subsystems and the reduction in OS overhead.
Windows NT uses only virtual memory. Because of this, all memory in the system is treated equally, limited only by the size of the paging file; you must be wary of user processes consuming large amounts of memory and causing the SGA to be paged out. In Windows NT, there are no limitations on memory used for user processes, disk caching, print caching, and so on; thus, it is best to dedicate the Windows NT server either to file and print services or to application services.
NOTE: The amount of virtual memory with Windows NT is limited by the size of the paging file. If the paging file is too small, you will get warning messages and be forced to resize it and reboot.
The size of one process's virtual memory, also known as the working set size, is limited to 2GB under Windows NT 4.0 and earlier, and 3GB with the Windows NT Enterprise Edition.
If you use asynchronous I/O (AIO) and Oracle can lock down memory for AIO, the database block buffers are not swappable. Monitor the system on a regular basis to ensure that no paging is occurring in the system.
Reduce Unnecessary Memory Usage
You can take some measures to reduce the amount of memory used by Windows NT. Invoke the Control Panel's Network Settings screen and choose the Maximize Throughput for Network Applications option. This optimizes the server memory for network applications, reducing some of the file-system caching and overhead in memory management. Also remove any network protocols not needed by the system to cut down on system overhead and memory usage via the Control Panel.
Also use the Control Panel to turn off any services you are not using. Doing so reduces memory usage and CPU overhead. By reducing all the unnecessary services, you can increase the performance of the system.
To maximize performance, allocate as much memory as possible to the SGA. Use the techniques discussed on Days 9, "Managing Data," and 10, "Administering User Accounts," to determine whether the memory can best be used for the shared pool or for database block buffers. Because the SGA resides in virtual memory, it is important that you not allocate so much memory for the SGA that the system pages. The overhead of paging and swapping overshadows any benefit you might gain from a larger SGA. Remember to save memory for the user processes. You should frequently monitor your system to ensure that no paging occurs at any time.
You can easily determine the amount of memory necessary for your application on a per-user basis. Start Oracle and note the amount of available memory by using the Windows NT Performance Monitor. Monitor the Available Bytes option under Memory in the Performance Monitor. After users begin accessing the application in a typical manner, record the amount of memory again. Take the difference and divide this result by the number of users accessing the application. Multiply this per-user memory usage value by the maximum number of users who might be connected to the application to determine the amount of memory you must reserve for user connections. Be sure to add a little extra memory just in case. Be careful that a user's PGA does not consume too much system memory.
To increase the number of Oracle connections, you might also have to adjust the Oracle initialization parameter PROCESSES. This parameter should reflect the maximum number of user connections you expect to have plus the Oracle background processes. You should also include some extra processes for administrative tasks.
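For example (an illustrative init.ora fragment; the numbers are assumptions, not recommendations):

```
# 100 expected user connections + background processes + a few spares
processes = 115
```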
Minimal network tuning is necessary with Windows NT. However, you can enhance performance by prioritizing the network bindings. Do so with the Control Panel's Network Configuration utility. If you put the network protocol you use most frequently first on the list, that protocol gets highest priority. Place other network protocols in the list in the order in which you use them. Removing any protocols you do not use can also boost performance by reducing memory consumption and CPU overhead. Also be sure that the system is configured as a server in the Network Configuration screen.
It is important that performance not be bound by physical I/O rates. If you ensure that the system disks are not driven past their limits, you can avoid this. This is covered in detail on Days 14, "Using Oracle Clusters, Stored Procedures, and Database Links," and 15, "Managing Job Queues and Using Oracle Auditing."
With Windows NT, asynchronous I/O (AIO) is always enabled. There is no need to adjust any tuning parameters to ensure that AIO is enabled.
The default block size for Oracle on Windows NT is 2,048. This might be sufficient. If you have a large database or if the data access is primarily sequential, you might want to increase DB_BLOCK_SIZE to 4,096. Although the data blocks take up more space in the SGA, the number of I/Os performed is significantly reduced. It doesn't take much more overhead and time to retrieve 4KB of data from the disks than it does to get 2KB of data.
If data access is primarily sequential, you might benefit by setting the DB_BLOCK_SIZE parameter to 8,192. Because sequential access to the database will read the next block anyway, larger block sizes will read that data into the SGA before you need it. If your data access is random, you will waste space in the SGA. The value you choose for the block size affects performance either for better or worse. If you are unsure, leave the parameter at the default of 2,048.
When creating Windows NT file systems, you have two main choices: FAT and NTFS. Although each performs well in certain situations, I recommend using NTFS. NTFS provides the best level of overall performance and is the file system that Microsoft endorses.
UNIX Tuning Considerations
As with NT, the most important UNIX tuning consideration is ensuring that sufficient memory is available to run Oracle and the required number of server processes. Because UNIX is a virtual-memory operating system, you can always start more server processes; but if you are not careful, you might begin paging.
Many varieties of UNIX have made available extended features such as the Post-Wait Semaphore and asynchronous I/O to enhance Oracle performance. Other areas of importance in tuning the UNIX system include the network and I/O subsystem.
Memory is important in the UNIX operating system. As you have seen in earlier lessons, the way to increase performance is to maximize the use of the SGA to avoid costly disk I/Os. In UNIX as in all operating systems, it is important to avoid paging. Because the SGA is in shared memory in UNIX, the SGA is guaranteed not to page--but the server processes can page if too much memory is being used.
It is important to allocate enough shared memory to accommodate the SGA. If you do not have enough memory to create the SGA, the instance does not start. If this happens, you must either configure the operating system to allow more shared memory or reduce the size of the SGA. If you are using enhanced 4MB shared memory pages, you can allocate the amount of shared memory only in 4MB units.
Be careful not to allocate so much shared memory for the SGA that user and server processes page. Be sure to periodically monitor the system to ensure that no paging is occurring. To maximize memory used by the SGA, reduce all unnecessary memory used by the operating system and limit the amount of memory that can be allocated by users.
Reduce Unnecessary Memory Usage
One of the best ways to free memory for Oracle is to siphon it from the file-system buffers. These buffers are used by the OS to cache data, and a significant performance increase can occur when cached files are accessed. But Oracle uses a feature called synchronous I/O to ensure that writes to the disk do not return until the data has actually been written. Because Oracle must guarantee that I/Os have been written to the disk to ensure data integrity, OS disk-write caching cannot be performed.
By default, the number of file-system buffers is determined by the amount of memory in the system. Because Oracle bypasses the disk cache on writes and uses the SGA for reads, you don't really need a large number of file-system buffers. By reducing this number, you might see a slight performance decrease in OS operations, but any additional memory allocated to the SGA increases Oracle performance. The parameters used to tune the file-system buffers are fairly vendor dependent; they should be documented in your OS manuals.
If you reduce the unnecessary memory used for disk caching, more memory can be allocated to the SGA. Don't reduce the disk cache so far that it is difficult to run OS commands and access Oracle parameter files. Do not set the disk cache buffers to less than 600. These values represent the number of 512 byte blocks.
To maximize performance, allocate as much memory as possible to the SGA. Use the techniques discussed earlier today to determine whether memory is best used for the shared pool or for database block buffers.
In UNIX, the shared memory area used by Oracle for the SGA is usually contiguous. However, if you have multiple instances of Oracle that have started and stopped several times, the shared memory area might no longer be contiguous. If you use 4MB pages, you are guaranteed to have at least 4MB of contiguous memory.
The amount of memory allocated for shared memory is the product of two OS-tunable parameters: SHMMAX and SHMSEG. SHMMAX specifies the maximum size of a shared memory segment; SHMSEG specifies the maximum number of shared memory segments available in the system. Applications are responsible for allocating only the amount of shared memory they need and so do not waste space. In SCO UNIX and UnixWare, the parameters SHMMAX and SHMSEG are located in the UNIX parameter file /etc/conf/cf.d/stune. In Solaris, the shared memory parameters are set in the file /etc/system. Your OS administrator's manual should have more information about setting system parameters.
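On Solaris, for example, the entries in /etc/system look like the following (the values shown are illustrative only; size SHMMAX to exceed your SGA):

```
* Shared memory limits for Oracle (values are examples only)
set shmsys:shminfo_shmmax = 104857600
set shmsys:shminfo_shmseg = 10
```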
NOTE: It is much more efficient for Oracle to have one large shared memory segment than several smaller ones. Therefore, SHMMAX should have a value larger than the size of the SGA.
Most varieties of UNIX that run on the Intel platform support 4MB memory pages. By referencing your OS vendor-specific documentation, you can determine how to enable and tune this feature.
The use of 4MB pages can improve performance dramatically, especially if you are running on an Intel Pentium processor.
NOTE: In UNIX implementations that use 4MB pages, there is usually a threshold at which 4MB pages are used. This threshold is approximately 3MB. If you allocate a 2.5MB shared memory segment, you do not get a 4MB page.
The amount of memory allocated to a single user must also be tuned in the UNIX operating system. Because Oracle is treated the same as any other user, you must allocate enough memory for Oracle to use for the SGA and the server processes. By referencing your OS-specific documentation, you can see how to configure this.
Ensure that sufficient memory is available so that Oracle can allocate for the SGA. Remember to save memory for the user processes as well. You should frequently monitor your system to ensure that no paging is occurring at any time.
You can easily determine the amount of memory necessary for your application on a per-user basis. Start Oracle and note the amount of available memory with the UNIX utility sar -r. The output from sar -r consists of freemem (free memory pages) and freeswp (free swap pages). The value given in the freemem column is the number of 4KB pages available. When users begin accessing the application in a typical manner, record the amount of memory again. Take the difference and divide this result by the number of users accessing the application. This value is the per-user memory usage. Multiply this value by the maximum number of users who might connect to the application to determine the amount of memory you must reserve for user connections. Be sure to leave a little extra memory just in case.
The size of a user's PGA is not bound by any initialization parameters. Because of this, be careful that a user's PGA does not consume too much system memory.
The UNIX operating system parameters MAXUP and NPROC must also be set to allow a sufficient number of users to connect. Remember that when users connect to Oracle, an Oracle shadow process is created under the Oracle user ID. Therefore, you must increase not only the number of processes systemwide, but also the per-user process limits.
The per-user process limit is set with the OS parameter MAXUP. The maximum number of processes systemwide is set by the OS parameter NPROC. Both values are in the stune file. NPROC should be at least 50 greater than MAXUP to account for OS processes.
To increase the number of Oracle connections, you might also have to adjust the Oracle initialization parameter PROCESSES. The PROCESSES parameter should reflect the maximum number of user connections you expect to have plus the Oracle background processes. You should also include some extra processes for administrative tasks.
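A minimal sketch of the PROCESSES arithmetic follows; the counts are hypothetical examples, and the exact number of background processes depends on which options (such as ARCH or multiple DBWRs) you run:

```python
expected_user_connections = 150
background_processes = 10    # PMON, SMON, DBWR, LGWR, CKPT, etc. (assumed count)
admin_headroom = 5           # extra processes for administrative tasks

# PROCESSES = user connections + background processes + administrative slack
processes = expected_user_connections + background_processes + admin_headroom
print("PROCESSES =", processes)
```

Set the resulting value in the initialization file, and remember that the OS limits (MAXUP and NPROC) must accommodate it as well.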
With UNIX, little network tuning is usually necessary. What tuning is required is vendor specific; refer to your OS documentation for any networking tuning your platform needs.
As with all other operating systems described today, it is important to ensure that performance is not bound by physical I/O rates. Be sure that random I/Os do not exceed the physical limitations of the disk drives. Refer to Days 14 and 15 for details.
With UNIX, you have the choice of using the UNIX file system for your data storage or the RAW device interface. This choice is not always an easy one. The RAW device interface is more difficult to manage but provides a higher level of performance. File-system files are much easier to use but have more overhead associated with them.
Using the UNIX file system is easier than using RAW devices: Oracle simply creates its datafiles as ordinary files. However, Oracle must then contend with the UNIX disk-caching system and use synchronous writes to ensure that a write request does not return to the DBWR or LGWR before the data has actually been written to disk. With the UNIX file system, there is also the overhead of data being read into the UNIX disk cache and then copied to the SGA, which adds cost to every read. Finally, file-system files are not guaranteed to occupy contiguous blocks on the disk--in fact, they are almost guaranteed not to.
RAW Device Interface
The RAW device interface allows for the least amount of overhead you can achieve with UNIX I/Os. When using the RAW device interface, UNIX simply assigns a section of the disk to each RAW device. This portion of the disk is contiguous; accesses to it bypass all disk caching and file-system overhead.
RAW devices are not as easy to manage because each device appears to the operating system as one big chunk of data. Backup and recovery must be handled slightly differently because file copies do not work, and the size of a raw device cannot be changed after it is created. Backups must be done via the UNIX dd command or with a third-party backup utility that supports raw devices.
RAW devices give greater performance with less overhead and are fully supported by Oracle. Whether you use RAW devices is a decision you must make based on ease of use and increased performance.
With UNIX, asynchronous I/O (AIO) is not always enabled; you must enable it in both the OS and in Oracle. With AIO, the DBWR can manage many I/Os at once, eliminating the need for multiple DBWR processes. List I/O allows the DBWR to pass the OS a list of AIO commands, reducing the number of calls it must make.
NOTE: If asynchronous I/O is not available to you, you can compensate by adding DBWR processes. You should have one or two DBWR processes per data disk. Use the parameter DB_WRITERS to increase the number of DBWR processes.
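The rule of thumb in the note above can be sketched as a quick calculation; the disk count is a hypothetical example:

```python
data_disks = 8        # number of disks holding datafiles (example value)
dbwr_per_disk = 2     # one or two DBWR processes per data disk

# Without AIO, scale DB_WRITERS with the number of data disks.
db_writers = data_disks * dbwr_per_disk
print("DB_WRITERS =", db_writers)
```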
For some UNIX implementations, the following Oracle initialization parameters for asynchronous I/O should also be set:
- USE_ASYNC_IO--This parameter tells Oracle that the DBWR should use asynchronous I/O. Set it to TRUE.
- LGWR_USE_ASYNC_IO--This parameter tells Oracle that the LGWR should use asynchronous I/O. Set it to TRUE.
You should always use asynchronous I/O (if possible). When you use asynchronous I/O, you can keep the number of DBWR processes to one and therefore reduce process overhead.
The I/O system should be designed and implemented with the following goals in mind:
- Isolate sequential I/O--By isolating sequential I/O so that it is purely sequential to the disk, you can greatly enhance throughput. Any random I/O to these disks degrades performance. Writes to the redo log are sequential.
- Spread out random I/O--Random I/O performance can be increased by adding more disk drives to the system and spreading out the I/Os among the disks. I/Os to the datafiles are typically random (especially in OLTP systems).
If you follow these guidelines and plan your system so that your disk drives can support the amount of disk I/O demanded of them, I/O should not be a problem.
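To plan the random-I/O spread described above, you can estimate how many drives a workload needs. Both figures below are hypothetical; measure your actual I/O demand and use your drive vendor's specifications for the per-disk ceiling:

```python
import math

random_iops_needed = 500   # random I/Os per second the datafiles must sustain
iops_per_disk = 60         # rough per-drive ceiling before latency climbs

# Spread the random I/O across enough disks that no drive exceeds its ceiling.
disks_needed = math.ceil(random_iops_needed / iops_per_disk)
print("Spread the random I/O across at least", disks_needed, "disks")
```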
One of the most common areas where performance can be enhanced is the application itself. SQL statements can often be redesigned to dramatically improve performance. I recently worked on a project in which an engineer rewrote a set of queries to make more effective use of indexes; queries that had taken more than four hours to run finished in less than 10 minutes.
Here are a few things you can do to improve the application itself. My other book, Oracle Performance Tuning and Optimization (Sams Publishing), covers these topics in great detail.
- Use EXPLAIN PLAN--By using EXPLAIN PLAN, you can analyze the execution plan the optimizer has chosen for a SQL statement.
- Use SQL*Trace--By using SQL*Trace, you can analyze the execution of a SQL statement and determine any potential bottlenecks.
- Tune SQL statements--You can improve performance by rewriting SQL statements to take advantage of such features as indexes, clusters, and hash clusters.
- Use the Oracle optimizer--Take advantage of the optimizer to improve performance.
- Use procedures and packages--Use procedures, functions, and packages to improve performance.
- Provide for data integrity--Consider the importance of data integrity and how to optimally provide for it.
- Use hints--Use hints to take advantage of information you know about the data and application.
If you tune the SQL statements and application to take advantage of indexes, clusters, and other features, you should see significant improvement in performance.
Today you looked at various ways of determining whether some sort of performance bottleneck exists, and if one does, how to isolate that bottleneck and provide solutions. The lesson began by defining the term bottleneck. You were also presented with my performance tuning methodology. If you follow these steps (or steps of your own), you should be able to find the bottleneck or at least find a way to work around it.
Today's lesson also provided a brief overview of the operation and function of your OS, presented a way of looking at the system in terms of performance, and showed how to monitor the system. Much can be learned from the NT Performance Monitor and sar, and much can be learned from Oracle itself using UTLBSTAT and UTLESTAT. With these tools, you can narrow the focus of your performance investigations and determine potential changes to your system.
Tomorrow's lesson is the last of the 21 days of instruction on the ins and outs of Oracle8. This book finishes up by teaching you the specifics of the Oracle8 networking system. You will learn how to configure SQL*Net and use the Oracle Name Server. Also, you will learn how to configure and use the Oracle Web server. Other miscellaneous topics will also be covered in the final lesson of the book.
- Q What is a bottleneck?
A The term bottleneck refers to a component in the system that is slowing down the whole system.
Q What type of operating systems are NT and UNIX?
A Windows NT and UNIX are virtual memory operating systems. This means that the memory used by a single process can be larger than the physical memory in the system. When more memory is needed than is available, some of the memory is paged out.
Q What sort of things are important when troubleshooting a performance bottleneck?
A There are many important things to keep in mind, but of primary importance is that you follow a methodology. By using some sort of troubleshooting method, you can develop tests that help you resolve the issues involved.
Q Should you always believe your results?
A No. Be skeptical. If something doesn't seem right, try another test.
The workshop provides quiz questions to help you solidify your understanding of the material covered and exercises to provide you with experience in using what you've learned. The answers to the quiz questions can be found in Appendix A, "Answers."
- 1. Name some areas that could be potential bottlenecks.
2. What is the best tool to use to look at the Oracle shared-pool cache-hit ratio?
3. How many random I/Os per second per disk drive are recommended before latencies start to increase?
4. How many sequential I/Os per second per disk drive are recommended before latencies start to increase?
5. What is disk latency?
6. How do you calculate the buffer cache-hit ratio?
7. How do you know whether the shared pool is large enough?
8. What two factors make an index effective?
9. Which is run first: UTLBSTAT.SQL or UTLESTAT.SQL?
10. Why might you use a large block size?
- 1. Turn on diskperf and reboot your system. Use the Performance Monitor to look at physical disk statistics.
2. Use UTLBSTAT.SQL and UTLESTAT.SQL and analyze the results.
3. Check out your system's block buffer cache-hit ratio.
4. Determine the cache-hit ratio on your data-dictionary cache and your library cache.
© Copyright, Macmillan Computer Publishing. All rights reserved.