‏إظهار الرسائل ذات التسميات Watch. إظهار كافة الرسائل
‏إظهار الرسائل ذات التسميات Watch. إظهار كافة الرسائل

الجمعة، 15 مايو 2015

Watch Oracle DB Session Activity With The Real-Time Session Sampler

This page has been permanently moved. PleaseCLICK HERE to be redirected.

Thanks, Craig.Watch Oracle DB Session Activity With My Real-Time Session Sampler
Watching session activity is a great way to diagnose and learn about Oracle Database tuning. There are many approaches to this. I wanted something simple, useful, modifiable, no Oracle licensing
issues and that I could give away. The result is what I call the Oracle Real-Time Session Sampler (OSM: rss.sql).

The tool is simple to use.  Based on a number filtering command line inputs, it repeatedly samples active Oracle sessions and writes the output to a file in /tmp. You can do a "tail -f" on the file to watch session activity in real time!

The rss.sql tool is included in the OraPub System Monitor (OSM) toolkit (v13j), which can be downloaded HERE.

If you simply want to watch a video demo, watch below or click HERE.


The Back-Story
Over the past two months I have been creating my next OraPub Online Institute seminar about how to tune Oracle with an AWR/Statspack report using a quantitative time based approach. Yeah... I know the title is long. Technically I could have used Oracle's Active Session History view (v$active_session_history) but I didn't want anyone to worry about ASH licensing issues. And ASH is not available with Oracle Standard Edition.

The Real-Time Session Sampler is used in a few places in the online seminar where I teach about Oracle session CPU consumption and wait time. I needed something visual that would obviously convey the point I wanted to make. The Real-Time Session Sampler worked perfectly for this.

What It Does
Based on a number of command line inputs, rss.sql repeatedly samples active Oracle sessions and writes the output to file in /tmp. The script contains no dml statements. You can do a "tail -f" on the output file to see session activity in real time. You can look at all sessions, a single session, sessions that are consuming CPU or waiting or both, etc. You can even change the sample rate. For example, once every 5.0 seconds or once every 0.25 seconds! It's very flexible and it's fascinating to watch.

Here is an example of some real output.



How To Use RSS.SQL
The tool is run within SQL*Plus and the output is written to the file /tmp/rss_sql.txt. You need two windows: one to sample the sessions and other other to look at the output file. Here are the script parameter options:

rss.sql  low_sid  high_sid  low_serial  high_serial  session_state  wait_event_partial|%  sample_delay

low_sid is the low Oracle session id.
high_sid is the high Oracle session id.
low_serial is the low Oracle session's serial number.
high_serial is the high Oracle session's serial number.
session_state is the current state of the session at the moment of sampling: "cpu", "wait" or for both "%".
wait_event_partial is when the session is waiting, select the session only with this wait event. Always set this to "%" unless you want to tighten the filtering.
sample_delay is the delay between samples, in seconds.

Examples You May Want To Try
By looking at the below examples, you'll quickly grasp that this tool can be used in a variety of situations.

Situation: I want to sample a single session (sid:10 serial:50) once every five seconds.

SQL>@rss.sql  10 10 50 50 % % 5.0

Situation: I want to essentially stream a single session's (sid:10 serial:50) activity.

SQL>@rss.sql 10 10 50 50 % % 0.125

Situation: I want to see what sessions are waiting for an row level lock while sampling once every second.

SQL>@rss.sql 0 99999 0 99999 wait enq%tx%row% 1.0

Situation: I want to see which sessions are consuming CPU, while sampling once every half second.

SQL>@rss.sql 0 99999 0 99999 cpu % 0.50

Be Responsible... It's Not OraPub's Fault!
Have fun and explore...but watch out! Any time you are sample repeatedly, you run the risk of impacting the system under observation. You can reduce this risk by sampling less often (perhaps once every 5 seconds), by limiting the sessions you want to sample (not 0 to 99999) and by only select sessions in either a "cpu" or "wait" state.

A smart lower impact strategy would be to initially keep a broader selection criteria but sample less often; perhaps once every 15 seconds. Once you know what you want to look for, tighten the selection criteria and sample more frequently. If you have identified a specific session of interest, then you stream the activity (if appropriate) every half second or perhaps every quarter second.

All the best in your Oracle Database tuning work,

Craig.

Watch Oracle DB Elapsed Time and Wall Time With Parallel Query

This page has been permanently moved. PleaseCLICK HERE to be redirected.

Thanks, Craig.Watch Oracle Elapsed Time and Wall Time With Parallel Query
In my recent postings I wrote that when using the Oracle Database parallel query a SQL statement's wall time should be equal to its elapsed time divided by the number of parallel query slaves plus some overhead.

That may seem correct, but is it really true? To check I ran an experiment and posted the results here. The results are both obvious and illuminating.

If you don't want to read but just sit on the couch, have a beer and watch TV you're in luck! I took a clip from my Tuning Oracle Using An AWR Report online video seminar put it on youtube.  You can watch the video clip on YouTube HERE or simply click on the movie below.


The Math, For Review Purposes
In my previous recent postings I detailed the key time parameters; DB Time, DB CPU, non-idle wait time, elapsed time, parallelism and effective parallelism. To save you some clicking, the key parameters and their relationships are shown below.

DB Time = DB CPU + NIWT

Elapsed Time = Sum of DB Time

Wall Time = ( Elapsed Time / Parallelism ) + Parallelism Overhead

Wall Time = Elapsed Time / Effective Parallelism


Test Results: When Oracle Parallel Query was NOT involved.
If you want to see my notes, snippets, etc. they can be found in this text file HERE.

Here is the non-parallel SQL statement.
select /*+ FULL(big2) NOPARALLEL (big2) */ count(*)into i_varfrom big2 where rownum < 9000000
When the SQL statement was running, I was monitoring the session using my Realtime Session Sampler OSM tool, rss.sql. Since I knew the server process session ID and wanted to sample every second and wanted to see everything just for this session, this is the rss.sql syntax:
SQL>@rss.sql 16 16 827 827 % % 1For details on any OSM tool syntax, run the OSM menu script, osmi.sql. You can download my OSM Toolkit HERE.

The rss.sql tool output is written to a text file, which I was doing a "tail -f" on. Here is a very small snippet of the output. The columns are sample number, sample time, session SID, session serial#, Oracle username, CPU or WAIT, SQL_ID, OraPub wait category, wait event, [p1,p2,p3].


We can see the session is consuming CPU and waiting. When waiting, the wait event is "direct path read", which is asynchronous (we hope) block read requests to the IO subsystem that will NOT be buffered in the Oracle buffer cache.

Now for the timing results, which are shown in the below table. I took five samples.  It's VERY important to know that the wait time (WAIT_TIME_S), DB CPU (DB_CPU_S), and DB Time (DB_TIME_S) values are related to ONLY server process SID 16. In blazing contrast, the wall time (WALL_S), elapsed time (EL_VSQL_S), and SQL statement CPU consumption (CPU_VSQL_S) is related the entire SQL_ID statement execution.

Here are the "no parallel" experimental results.
SQL> select * from op_results; SAMPLE_NO WALL_S EL_VSQL_S CPU_VSQL_S WAIT_TIME_S DB_CPU_S DB_TIME_S---------- ---------- ---------- ---------- ----------- ---------- ---------- 1 35.480252 35.470015 9.764407 24.97 9.428506 34.152294 2 35.670021 35.659748 9.778554 25.15 9.774984 35.541861 3 35.749926 35.739473 9.774375 25.12 9.31266 34.126285 4 35.868076 35.857752 9.772321 25.32 9.345398 34.273479 5 36.193062 36.18378 9.712962 25.46 9.548465 35.499693Let's check the math. For simplicity and clarity, please allow me to round and use only sample 5.
DB_TIME_S = DB_CPU_S + WAIT_TIME_S 35.5 = 9.5 + 25.5 = 35.0The DB Time is pretty close (35.5 vs 35.0). Close enough to demonstrate the time statistic relationships.
Elapsed Time (EL_VSQL_S) = DB_TIME_S 35.5 = 34.2The Elapsed Time is off by around 4% (35.5 vs 34.2), but still closely to demonstrate the time statistic relationships.
Wall Time (WALL_S) = Elapsed Time (EL_VSQL_S) / Effective Parallelism 35.5 = 35.5 / 1Nice! The Wall Time results matched perfectly. (35.5 vs 35.5)

To summarize in a non parallel query (i.e., single server process) situation, the time math results are what we expected! (and hoped for)

Test Results: When Oracle Parallel Query WAS involved.
The only difference in the "non parallel" SQL statement above and the SQL statement below is the parallel hint. Below is the "parallel" SQL statement.
select /*+ FULL(big2) PARALLEL(big2,3) */ count(*) into i_var from big2 where rownumWhen the "parallel" SQL statement was running, because Oracle parallel query was involved resulting in multiple related Oracle sessions, when monitoring using my rss.sql tool, I need to open the session ID (and serial#) to include all sessions. I still sampled every second. Here is the rss.sql syntax:
SQL>@rss.sql 0 9999 0 9999 % % 1The tool output is written to a text file, which I was doing a "tail -f" on. Here is a very small snippet of the output. I manually inserted the blank lines to make it easier to see the different sample periods.


There is only one SQL statement being run on this idle test system. And because there is no DML involved, we don't see much background process activity. If you look closely above, sessions 168 (see third column) must be a log write process because the wait event is "log file parallel write". I checked and session 6 is a background process as well.

It's no surprise to typically see only four session involved. One session is the parallel query coordinator and the three parallel query slaves! Interestingly, the main server process session that I executed the query from is session number 16. It never showed up in any of my samples! I suspect it was "waiting" on an idle wait event and I'm only showing processes consuming CPU or waiting on a non-idle wait event. Very cool.

Now for the timing results. I took five samples.  Again, it's VERY important to know that the wait time (WAIT_TIME_S), DB CPU (DB_CPU_S), and DB Time (DB_TIME_S) values are related to ONLY calling server process, which in this case is session 16. In blazing contrast, the wall time (WALL_S), elapsed time (EL_VSQL_S), and SQL statement CPU consumption (CPU_VSQL_S) is related the entire SQL statement execution.

Here are the "parallel" experimental results.
SQL> select * from op_results; SAMPLE_NO WALL_S EL_VSQL_S CPU_VSQL_S WAIT_TIME_S DB_CPU_S DB_TIME_S---------- ---------- ---------- ---------- ----------- ---------- ---------- 1 46.305951 132.174453 19.53818 .01 4.069579 4.664083 2 46.982111 132.797536 19.371063 .02 3.809439 4.959602 3 47.79761 134.338069 19.739735 .02 4.170921 4.555491 4 45.97324 131.809249 19.397557 .01 3.790226 4.159572 5 46.053922 131.765983 19.754143 .01 4.062703 4.461175Let's check the math. So simplicity and clarity, please allow me to round and use sample 5.
DB_TIME_S = DB_CPU_S + WAIT_TIME_S 4.5 = 4.1 + 0 The DB Time shown above is kind of close... 10% off. (4.5 vs 4.1) But there is for sure timing error in my collection sript. I take the position, this is close enough to demonstrate the time statistic relationships. Now look below.
Elapsed Time (EL_VSQL_S) = DB_TIME_S 131.7 != 4.5Woah! What happened here? (131.7 vs 4.5) Actually, everything is OK (so far aways) because the DB Time is related to the session (Session ID 16), whereas the elapsed time is ALL the DB Time for ALL the processes involved in the SQL statement. Since parallel query is involved, resulting in four additional sessions (1 coordinator, 3 slaves) we would expect the elapsed time to be greater than the DB Time. Now let's look at the wall time.
Wall Time (WALL_S) = ( Elapsed Time (EL_VSQL_S) / Parallelism ) + overhead 46.1 = ( 131.8 / 3 ) + 2.2Nice! Clearly the effective parallelism is greater than 3 because there is some overhead (2.2). But the numbers makes sense because:

1. The wall time is less than the elapsed time because parallel query is involved.

2. The wall time is close to the elapsed time divided by the parallelism. And we can even see the parallelism overhead.

So it looks like our time math is correct!


Reality And The AWR Or Statspack Report
This is really important. In the SQL Statement section of any AWR or Statspack Report, you will see the total elapsed time over the snapshot interval and perhaps the average SQL ID elapsed time per execution. So what is the wall time? What are users experiencing? The short answer is, we do NOT have enough information.

To know the wall time, we need to know the parallelism situation. If you are NOT using parallel query, than based on the time math demonstrated above, the elapsed time per execution will be close to what the user experiencing (unless there is an issue outside of Oracle). However, if parallelism is involved, you can expect the wall time (i.e, user's experience) to be much less than the elapsed time per execution shown in the AWR or Statspack report.

Another way of looking at this is: If a user is reporting a query is taking 10 seconds, but the average elapsed time is showing as as 60 seconds, parallel query is probably involved. Also, as I mentioned above, never forget the average value is not always the typical value. (More? Check out my video seminar entitled, Using Skewed Data To Your Advantage HERE.)

Thanks for reading!

Craig.