Oracle Architecture and Diagnostic Facilities

Core Components to Monitor

  • System Global Area (SGA) and Program Global Area (PGA)
  • Redo logs, undo segments, and temporary tablespaces
  • Wait Events and Latches
  • Background processes (DBWR, LGWR, SMON, PMON)

Diagnostic Tools

  • Automatic Workload Repository (AWR)
  • Active Session History (ASH)
  • Oracle Enterprise Manager (OEM)
  • SQL*Plus, SQL Developer, tkprof, and trace files

Common Troubleshooting Scenarios

1. High CPU Usage by Oracle Processes

Causes often include:

  • Suboptimal execution plans (unnecessary full table scans, nested loops over large row sets)
  • Unindexed predicates in WHERE clauses or inefficient joins
  • Excessive hard parsing or non-shared cursors
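
To check whether hard parsing is a factor, one option is to compare the parse counters and then look for statements that differ only in their literals. This is a sketch that assumes SELECT access to v$sysstat and v$sql; the threshold of 20 copies is an arbitrary illustration, not a recommended value.

```sql
-- Instance-wide parse counters since startup
SELECT name, value
FROM v$sysstat
WHERE name IN ('parse count (total)', 'parse count (hard)');

-- Statements that become identical once literals are replaced by binds.
-- Many cursors under one signature suggest missing bind variables.
SELECT force_matching_signature,
       COUNT(*)    AS cursor_copies,
       MIN(sql_id) AS sample_sql_id
FROM v$sql
WHERE force_matching_signature <> 0
GROUP BY force_matching_signature
HAVING COUNT(*) > 20   -- illustrative threshold
ORDER BY cursor_copies DESC;
```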

Steps to Diagnose

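-- Top 10 cursors by cumulative CPU time (cpu_time is reported in microseconds)
SELECT sql_id, child_number, cpu_time, executions, parsing_schema_name
FROM v$sql
ORDER BY cpu_time DESC
FETCH FIRST 10 ROWS ONLY;

-- Show the plan for one cursor; supply a sql_id and child_number from the query above
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR('&sql_id', &child_number));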

Resolutions

  • Analyze execution plans with DBMS_XPLAN
  • Use SQL Profiles or SQL Plan Baselines to guide optimization
  • Refactor queries to leverage indexes and CBO hints
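
As an illustration of the SQL Plan Baseline option, the sketch below loads one plan from the cursor cache with DBMS_SPM and then lists the stored baselines. The sql_id and plan_hash_value shown are hypothetical placeholders; real values would come from v$sql.

```sql
-- Load a known-good plan from the cursor cache into a SQL plan baseline.
DECLARE
  plans_loaded PLS_INTEGER;
BEGIN
  plans_loaded := DBMS_SPM.LOAD_PLANS_FROM_CURSOR_CACHE(
                    sql_id          => 'abcd1234efgh5',  -- hypothetical sql_id
                    plan_hash_value => 1234567890);      -- hypothetical plan hash
  DBMS_OUTPUT.PUT_LINE('Plans loaded: ' || plans_loaded);
END;
/

-- Confirm the baseline exists and whether it is enabled and accepted
SELECT sql_handle, plan_name, enabled, accepted
FROM dba_sql_plan_baselines
ORDER BY created DESC;
```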

2. Blocking and Deadlocks

Deadlocks typically arise from:

  • Application-level row locking conflicts
  • Improper transaction isolation levels
  • Manual locking or explicit SELECT FOR UPDATE chains

Diagnostic Query

SELECT blocking_session, sid, serial#, wait_class, seconds_in_wait
FROM v$session
WHERE blocking_session IS NOT NULL;

SELECT * FROM dba_blockers;
SELECT * FROM dba_waiters;
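
When several sessions are queued behind one another, a hierarchical view of v$session can make the chain easier to read. The following is a minimal sketch for a single instance; on RAC the same idea would need gv$session and the blocking_instance column.

```sql
-- Root blockers appear at the left; the sessions they block are indented beneath them.
SELECT LPAD(' ', 2 * (LEVEL - 1)) || sid AS session_tree,
       serial#, username, event, seconds_in_wait
FROM v$session
START WITH blocking_session IS NULL
       AND sid IN (SELECT blocking_session FROM v$session)
CONNECT BY PRIOR sid = blocking_session;
```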

Remediation

  • Ensure consistent locking order across modules
  • Shorten transaction duration
  • Implement retry logic and catch ORA-00060 exceptions
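
The retry advice can be sketched in PL/SQL as follows. The accounts table and its columns are hypothetical, the limit of three attempts is arbitrary, and DBMS_SESSION.SLEEP assumes Oracle 18c or later (older releases would use DBMS_LOCK.SLEEP). ORA-00060 rolls back only the failing statement, so the handler rolls back the whole transaction before retrying.

```sql
DECLARE
  deadlock_detected EXCEPTION;
  PRAGMA EXCEPTION_INIT(deadlock_detected, -60);   -- map ORA-00060
  max_attempts CONSTANT PLS_INTEGER := 3;          -- illustrative limit
BEGIN
  FOR attempt IN 1 .. max_attempts LOOP
    BEGIN
      -- Touch rows in a consistent order (lowest account_id first)
      UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;
      UPDATE accounts SET balance = balance + 100 WHERE account_id = 2;
      COMMIT;
      EXIT;                                        -- success, stop retrying
    EXCEPTION
      WHEN deadlock_detected THEN
        ROLLBACK;                                  -- release locks held so far
        IF attempt = max_attempts THEN
          RAISE;                                   -- give up and surface the error
        END IF;
        DBMS_SESSION.SLEEP(attempt);               -- simple linear back-off, in seconds
    END;
  END LOOP;
END;
/
```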

3. Temp Tablespace Overuse

Operations like hash joins, sorts, and global temp tables may overconsume temp space.

Check Usage

-- Assumes an 8 KB block size; v$sort_usage is the older name for v$tempseg_usage
SELECT username, tablespace, blocks*8192/1048576 AS mb_used
FROM v$sort_usage;

SELECT * FROM dba_temp_free_space;
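
To attribute temp usage to individual sessions without hard-coding the block size, a join along these lines may help. It is a sketch that assumes access to v$tempseg_usage, v$session, and dba_tablespaces.

```sql
-- Temp usage per session, using each tablespace's actual block size
SELECT s.sid, s.serial#, s.username, s.sql_id,
       u.tablespace, u.segtype,
       ROUND(u.blocks * t.block_size / 1048576) AS mb_used
FROM v$tempseg_usage u
JOIN v$session s       ON s.saddr = u.session_addr
JOIN dba_tablespaces t ON t.tablespace_name = u.tablespace
ORDER BY mb_used DESC;
```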

Fixes

  • Increase the TEMP tablespace by adding or resizing tempfiles
  • Tune queries to avoid unnecessary disk sorts
  • Keep sorts in memory where PGA allows, and limit large result sets with FETCH FIRST n ROWS ONLY or ROWNUM (Oracle has no LIMIT clause)
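
Two of these fixes might look like the following. The tempfile path, the sizes, and the orders table are all hypothetical and would need to match the actual environment.

```sql
-- Add a second tempfile to the TEMP tablespace (path and sizes are placeholders)
ALTER TABLESPACE temp
  ADD TEMPFILE '/u01/app/oracle/oradata/ORCL/temp02.dbf'
  SIZE 4G AUTOEXTEND ON NEXT 512M MAXSIZE 16G;

-- Return only the top rows instead of sorting and fetching the whole result set
SELECT order_id, customer_id, order_total
FROM orders
ORDER BY order_total DESC
FETCH FIRST 100 ROWS ONLY;
```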

Advanced Scenarios and Long-Term Fixes

1. RAC Performance Anomalies

In Oracle RAC (Real Application Clusters), performance issues may stem from interconnect latency or high GC (Global Cache) contention.

Diagnostic

SELECT * FROM gv$ges_blocking_enqueue;
SELECT * FROM gv$gc_element;
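
A lighter-weight starting point is to rank the global cache wait events per instance. This sketch assumes that the relevant event names begin with "gc ", which is the usual naming for these waits.

```sql
-- Global cache wait events across all instances, worst first
SELECT inst_id, event, total_waits,
       ROUND(time_waited_micro / NULLIF(total_waits, 0) / 1000, 2) AS avg_wait_ms
FROM gv$system_event
WHERE event LIKE 'gc %'
ORDER BY time_waited_micro DESC
FETCH FIRST 15 ROWS ONLY;
```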

Best Practices

  • Use instance affinity and service-level load balancing
  • Avoid frequent cross-node block transfers
  • Isolate high-GC schemas to fewer nodes

2. Redo/Undo Contention

Heavy DML workloads can saturate the redo log buffer and undo segments, which typically shows up as waits such as log file sync and log buffer space.
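
One way to judge whether redo is the bottleneck is to look at the redo-related wait events and at how often the online logs switch. The queries below are a sketch; the one-day window is an arbitrary choice.

```sql
-- Redo-related waits since instance startup
SELECT event, total_waits,
       ROUND(time_waited_micro / NULLIF(total_waits, 0) / 1000, 2) AS avg_wait_ms
FROM v$system_event
WHERE event IN ('log file sync', 'log file parallel write',
                'log buffer space', 'log file switch completion')
ORDER BY time_waited_micro DESC;

-- Log switches per hour over the last day; very frequent switches suggest undersized logs
SELECT TRUNC(first_time, 'HH24') AS switch_hour, COUNT(*) AS log_switches
FROM v$log_history
WHERE first_time > SYSDATE - 1
GROUP BY TRUNC(first_time, 'HH24')
ORDER BY switch_hour;
```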

Mitigation

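  • Reduce commit frequency in batch code and keep online redo logs on fast storage; LGWR already groups concurrent commits into a single write
  • Size undo tablespaces for peak load and adjust undo_retention, which can be changed dynamically with ALTER SYSTEM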
  • Distribute batch jobs or use partition-wise DML
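
For the undo side, v$undostat gives a basis for choosing a retention value. The sketch below summarizes the retained history and then sets undo_retention; the 3600-second value is purely illustrative, and SCOPE = BOTH assumes the instance uses an spfile.

```sql
-- Longest query, auto-tuned retention, and undo-related errors in the retained window
SELECT MAX(maxquerylen)         AS longest_query_s,
       MAX(tuned_undoretention) AS tuned_retention_s,
       SUM(ssolderrcnt)         AS ora_01555_count,
       SUM(nospaceerrcnt)       AS out_of_space_count
FROM v$undostat;

-- undo_retention is dynamic, so no restart is needed
ALTER SYSTEM SET undo_retention = 3600 SCOPE = BOTH;
```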

Performance Tuning Methodology

1. Capture a Baseline

EXEC DBMS_WORKLOAD_REPOSITORY.create_snapshot();
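
A pair of snapshots taken during normal load can be preserved as a named baseline so that it is not purged with ordinary AWR retention. In this sketch the snapshot IDs 100 and 110 and the baseline name are hypothetical; real IDs come from dba_hist_snapshot.

```sql
-- Find recent snapshot IDs
SELECT snap_id, begin_interval_time, end_interval_time
FROM dba_hist_snapshot
ORDER BY snap_id DESC
FETCH FIRST 5 ROWS ONLY;

-- Keep a representative interval as a named baseline
BEGIN
  DBMS_WORKLOAD_REPOSITORY.CREATE_BASELINE(
    start_snap_id => 100,                    -- hypothetical
    end_snap_id   => 110,                    -- hypothetical
    baseline_name => 'normal_weekday_load'); -- hypothetical name
END;
/
```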

2. Compare Snapshots via AWR

SELECT * FROM dba_hist_snapshot;
-- Generate AWR report:
@$ORACLE_HOME/rdbms/admin/awrrpt.sql
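
Where running the interactive script is inconvenient, the same text report can be produced from PL/SQL through DBMS_WORKLOAD_REPOSITORY.AWR_REPORT_TEXT. The snapshot IDs 100 and 110 below are hypothetical placeholders, and the output appears only if server output is enabled in the client.

```sql
DECLARE
  l_dbid NUMBER;
  l_inst NUMBER;
BEGIN
  SELECT dbid INTO l_dbid FROM v$database;
  SELECT instance_number INTO l_inst FROM v$instance;

  -- Print the text report for the interval between the two snapshots
  FOR r IN (SELECT output
            FROM TABLE(DBMS_WORKLOAD_REPOSITORY.AWR_REPORT_TEXT(
                         l_dbid, l_inst, 100, 110)))
  LOOP
    DBMS_OUTPUT.PUT_LINE(r.output);
  END LOOP;
END;
/
```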

3. Index and Query Optimization

  • Rebuild or coalesce indexes only when measurements show significant wasted space; routine rebuilds are rarely needed for B-tree indexes
  • Use function-based indexes where needed
  • Gather statistics with DBMS_STATS
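
The last two points can be sketched together. The customers table and its email column are hypothetical; the example creates a function-based index for case-insensitive lookups and then gathers statistics on the table and its indexes.

```sql
-- Function-based index so that WHERE UPPER(email) = :v can use an index
CREATE INDEX customers_upper_email_ix ON customers (UPPER(email));

-- Gather table statistics and, via cascade, index statistics
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => USER,
    tabname          => 'CUSTOMERS',
    estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,
    cascade          => TRUE);
END;
/
```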

Conclusion

Oracle Database offers unparalleled power, but it demands precision in configuration and workload design. Issues like CPU spikes, locking, and storage contention require a layered diagnostic approach using Oracle's built-in tools and a firm grasp of execution models. By aligning application patterns with database internals and proactively tuning, teams can maintain high availability, performance, and stability across mission-critical Oracle deployments.

FAQs

1. How can I identify the top resource-consuming SQLs?

Use V$SQL and AWR reports to list the SQL statements with the highest CPU, I/O, and elapsed time. Always cross-reference them with their execution plans.
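
For a historical view, the AWR history tables can be queried directly. This is a sketch in which the snapshot range 100 to 110 is a hypothetical placeholder.

```sql
-- Top 10 statements by CPU between two AWR snapshots
SELECT sql_id,
       ROUND(SUM(cpu_time_delta) / 1e6)     AS cpu_seconds,
       ROUND(SUM(elapsed_time_delta) / 1e6) AS elapsed_seconds,
       SUM(disk_reads_delta)                AS physical_reads,
       SUM(executions_delta)                AS executions
FROM dba_hist_sqlstat
WHERE snap_id > 100 AND snap_id <= 110
GROUP BY sql_id
ORDER BY cpu_seconds DESC
FETCH FIRST 10 ROWS ONLY;
```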

2. What's the difference between logical and physical reads?

Logical reads are block requests served through the buffer cache, while physical reads fetch blocks from disk because they were not cached. A high proportion of physical reads suggests an undersized cache or large data scans.
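
In v$sql the two counters appear as buffer_gets (logical) and disk_reads (physical), so a quick comparison per statement might look like this sketch:

```sql
-- Statements doing the most physical I/O, with their logical reads for comparison
SELECT sql_id, executions, buffer_gets, disk_reads,
       ROUND(disk_reads / NULLIF(buffer_gets, 0), 3) AS physical_per_logical
FROM v$sql
ORDER BY disk_reads DESC
FETCH FIRST 10 ROWS ONLY;
```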

3. How do I resolve ORA-01555 snapshot too old?

Increase undo_retention, avoid running long queries while large DML commits against the same data, and consider a fixed-size undo tablespace that is large enough to honor the retention period.

4. Is it safe to kill a blocking session?

Only as a last resort. Investigate the root cause first: killing a session rolls back its uncommitted work, and its locks stay held until that rollback completes.
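
Before killing anything, it can help to see how much undo the blocker has generated, since that gives a rough idea of the rollback cost. In this sketch the session ID 123 and serial number 45678 are hypothetical, and the kill statement is left commented out on purpose.

```sql
-- Undo blocks used by the blocking session's open transaction
SELECT s.sid, s.serial#, s.username, t.start_time, t.used_ublk
FROM v$session s
JOIN v$transaction t ON t.ses_addr = s.saddr
WHERE s.sid = 123;   -- hypothetical blocking SID

-- If the kill is justified, supply the real 'sid,serial#' pair:
-- ALTER SYSTEM KILL SESSION '123,45678' IMMEDIATE;
```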

5. How often should statistics be gathered?

Rely on the automatic statistics job where it is enabled, and run DBMS_STATS manually after major data changes. Use DBMS_STATS.AUTO_SAMPLE_SIZE and avoid locking stats unless necessary.
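
Rather than gathering on a fixed calendar, staleness can be checked directly. The sketch below lists tables that Oracle has flagged as stale; the schema exclusion list is an illustrative assumption.

```sql
-- Tables whose statistics Oracle considers stale, oldest first
SELECT owner, table_name, last_analyzed, stale_stats
FROM dba_tab_statistics
WHERE stale_stats = 'YES'
  AND owner NOT IN ('SYS', 'SYSTEM')
ORDER BY last_analyzed NULLS FIRST;
```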