Oracle Architecture and Diagnostic Facilities
Core Components to Monitor
- System Global Area (SGA) and Program Global Area (PGA)
- Redo/Undo logs and Temporary Tablespace
- Wait Events and Latches
- Background processes (DBWR, LGWR, SMON, PMON)
Diagnostic Tools
- Automatic Workload Repository (AWR)
- Active Session History (ASH)
- Oracle Enterprise Manager (OEM)
- SQL*Plus, SQL Developer, tkprof, and trace files
Common Troubleshooting Scenarios
1. High CPU Usage by Oracle Processes
Causes often include:
- Suboptimal execution plans (full table scans, nested loops)
- Unindexed WHERE clauses or poor joins
- Excessive hard parsing or non-shared cursors
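Hard parsing often comes from statements that differ only in their literal values, so each variant gets its own cursor. The sketch below is one rough way to spot such variants in a list of SQL texts (for example, text pulled from v$sql). The function names and the regex-based normalization are illustrative assumptions, not an Oracle facility; v$sql also exposes a force_matching_signature column that serves a similar purpose inside the database.

```python
import re
from collections import defaultdict

# Literals that commonly vary between otherwise identical statements.
_STRING_LITERAL = re.compile(r"'(?:[^']|'')*'")
_NUMBER_LITERAL = re.compile(r"\b\d+(?:\.\d+)?\b")


def normalize_sql(sql_text):
    """Replace string and numeric literals with a placeholder and collapse whitespace."""
    text = _STRING_LITERAL.sub(":b", sql_text)
    text = _NUMBER_LITERAL.sub(":b", text)
    return " ".join(text.split()).lower()


def literal_variants(statements, min_count=2):
    """Group statements by normalized form; keep forms seen at least min_count times."""
    groups = defaultdict(list)
    for stmt in statements:
        groups[normalize_sql(stmt)].append(stmt)
    return {form: stmts for form, stmts in groups.items() if len(stmts) >= min_count}


if __name__ == "__main__":
    sample = [
        "SELECT * FROM orders WHERE id = 42",
        "select * from orders where id = 43",
        "SELECT name FROM customers WHERE region = 'EU'",
    ]
    for form, stmts in literal_variants(sample).items():
        print(len(stmts), "variants of:", form)
```

Any form that appears many times is a candidate for bind variables. The normalization is deliberately crude and will not handle every SQL construct (comments and quoted identifiers, for instance).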
Steps to Diagnose
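-- Top 10 cursors by cumulative CPU time (cpu_time is reported in microseconds)
SELECT sql_id, child_number, cpu_time, parsing_schema_name
FROM v$sql
ORDER BY cpu_time DESC
FETCH FIRST 10 ROWS ONLY;
-- Show the actual plan for one cursor; replace 'sql_id' and child_number with values from the query above
SELECT * FROM table(DBMS_XPLAN.DISPLAY_CURSOR('sql_id', child_number));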
Resolutions
- Analyze execution plans with DBMS_XPLAN
- Use SQL Profiles or SQL Plan Baselines to guide optimization
- Refactor queries to leverage indexes and CBO hints
2. Blocking and Deadlocks
Deadlocks typically arise from:
- Application-level row locking conflicts
- Improper transaction isolation levels
- Manual locking or explicit SELECT FOR UPDATE chains
Diagnostic Query
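-- Sessions currently blocked, with the session that blocks them
SELECT blocking_session, sid, serial#, wait_class, seconds_in_wait
FROM v$session
WHERE blocking_session IS NOT NULL;
-- Sessions holding locks that others are waiting on
SELECT * FROM dba_blockers;
-- Sessions waiting on a lock
SELECT * FROM dba_waiters;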
Remediation
- Ensure consistent locking order across modules
- Shorten transaction duration
- Implement retry logic and catch ORA-00060 exceptions
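The remediation steps above can be sketched on the application side. In this minimal example, DeadlockDetected is a hypothetical stand-in for whatever exception your database driver raises for ORA-00060, and run_with_deadlock_retry and lock_order are illustrative names. Oracle rolls back only the failing statement when it detects a deadlock, so the sketch assumes the transaction callable rolls back its own work before re-raising.

```python
import random
import time


class DeadlockDetected(Exception):
    """Hypothetical stand-in for the driver error that carries ORA-00060."""


def run_with_deadlock_retry(transaction, max_attempts=3, base_delay=0.05, sleep=time.sleep):
    """Call transaction(); on DeadlockDetected, back off and try again.

    transaction is expected to roll back its own work before raising.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return transaction()
        except DeadlockDetected:
            if attempt == max_attempts:
                raise
            # Exponential backoff with jitter so the same sessions do not collide again.
            delay = base_delay * (2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay))


def lock_order(row_ids):
    """Return row ids in one canonical order so every module locks rows the same way."""
    return sorted(set(row_ids))
```

Locking rows in the order returned by lock_order (for example, before issuing SELECT FOR UPDATE on each) removes one common source of lock cycles; the retry wrapper covers the cases that remain.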
3. Temp Tablespace Overuse
Operations like hash joins, sorts, and global temp tables may overconsume temp space.
Check Usage
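-- MB_USED assumes an 8 KB block size; change 8192 if the temp tablespace uses a different block size
SELECT username, tablespace, blocks*8192/1048576 AS mb_used FROM v$sort_usage;
-- Allocated and free space per temporary tablespace
SELECT * FROM dba_temp_free_space;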
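The MB figure in the query is plain arithmetic on the block count, and it is only correct when the block size really is 8192 bytes. A small sketch of the same conversion with the block size as a parameter follows; the function names are illustrative, and the input rows are assumed to be (username, blocks) pairs fetched by your own tooling. The actual block size can be read from dba_tablespaces.block_size.

```python
from collections import defaultdict


def temp_mb_used(blocks, block_size_bytes=8192):
    """Convert a block count to megabytes for a given block size."""
    return blocks * block_size_bytes / (1024 * 1024)


def temp_mb_by_user(rows, block_size_bytes=8192):
    """Sum temp usage in MB per user from (username, blocks) pairs."""
    totals = defaultdict(float)
    for username, blocks in rows:
        totals[username] += temp_mb_used(blocks, block_size_bytes)
    return dict(totals)


if __name__ == "__main__":
    # Made-up sample rows, for illustration only.
    print(temp_mb_by_user([("APP", 128), ("APP", 256), ("ETL", 64)]))
```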
Fixes
- Enlarge the TEMP tablespace by adding or resizing tempfiles
- Tune queries to avoid unnecessary disk sorts
- Keep sorts in memory where possible and cap large result sets with FETCH FIRST n ROWS ONLY or ROWNUM (Oracle has no LIMIT clause)
Advanced Scenarios and Long-Term Fixes
1. RAC Performance Anomalies
In Oracle RAC (Real Application Clusters), performance issues may stem from interconnect latency or high GC (Global Cache) contention.
Diagnostic
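-- Enqueues that are blocking or being blocked, across all instances
SELECT * FROM gv$ges_blocking_enqueue;
-- Global cache resources used by the buffer cache, across all instances
SELECT * FROM gv$gc_element;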
Best Practices
- Use instance affinity and service-level load balancing
- Avoid frequent cross-node block transfers
- Isolate high-GC schemas to fewer nodes
2. Redo/Undo Contention
Heavy DML workloads can saturate redo log buffers and undo segments, causing IO waits.
Mitigation
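- Reduce commit frequency in batch code (LGWR already groups concurrent commits into one write); redo compression applies only to Data Guard redo transport, not to local log writes
- Size undo tablespaces for peak DML and adjust undo_retention, which can be changed dynamically, to cover the longest-running queries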
- Distribute batch jobs or use partition-wise DML
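One way to spread a heavy DML job is to apply rows in fixed-size batches and commit once per batch instead of once per row, which reduces the number of redo log flushes the job waits on. The sketch below shows only the batching logic; apply_batch and commit are hypothetical callables that you would supply from your own database layer, and the batch size of 1000 is an arbitrary starting point rather than a recommendation.

```python
from itertools import islice


def batched(rows, batch_size):
    """Yield lists of up to batch_size items from rows."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def load_in_batches(rows, apply_batch, commit, batch_size=1000):
    """Apply rows batch by batch, committing once per batch; return the commit count."""
    commits = 0
    for batch in batched(rows, batch_size):
        apply_batch(batch)  # hypothetical: e.g. one array insert per batch
        commit()            # hypothetical: commit on the caller's connection
        commits += 1
    return commits
```

Larger batches mean fewer commits but more undo held per transaction, so the batch size is a trade-off to measure rather than a fixed rule.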
Performance Tuning Methodology
1. Capture a Baseline
EXEC DBMS_WORKLOAD_REPOSITORY.create_snapshot();
2. Compare Snapshots via AWR
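-- List available snapshots to choose a begin and end snap_id
SELECT snap_id, begin_interval_time, end_interval_time FROM dba_hist_snapshot ORDER BY snap_id;
-- Generate the AWR report from SQL*Plus
@$ORACLE_HOME/rdbms/admin/awrrpt.sql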
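Comparing two snapshots comes down to subtracting cumulative counters, since system statistics in AWR are recorded as running totals since instance startup. The sketch below illustrates that subtraction on two plain dictionaries of statistic name to value; the function names and the sample numbers are made up for illustration, and a negative difference is treated as a counter reset (for example, an instance restart between snapshots).

```python
def snapshot_delta(begin, end):
    """Per-statistic change between two snapshots of cumulative counters.

    Statistics missing from the begin snapshot are skipped; a negative
    difference is reported as None because the counter was reset.
    """
    delta = {}
    for name, end_value in end.items():
        if name in begin:
            diff = end_value - begin[name]
            delta[name] = diff if diff >= 0 else None
    return delta


def per_second(delta, elapsed_seconds):
    """Convert deltas to rates; reset counters (None) stay None."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return {name: (None if value is None else value / elapsed_seconds)
            for name, value in delta.items()}


if __name__ == "__main__":
    begin = {"physical reads": 1000, "redo size": 500000}
    end = {"physical reads": 1600, "redo size": 2300000}
    print(per_second(snapshot_delta(begin, end), 3600))
```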
3. Index and Query Optimization
- Rebuild or coalesce indexes only when measurement shows wasted space or degraded access, rather than on a fixed schedule
- Use function-based indexes where needed
- Gather statistics with DBMS_STATS
Conclusion
Oracle Database offers unparalleled power, but it demands precision in configuration and workload design. Issues like CPU spikes, locking, and storage contention require a layered diagnostic approach using Oracle's built-in tools and a firm grasp of execution models. By aligning application patterns with database internals and proactively tuning, teams can maintain high availability, performance, and stability across mission-critical Oracle deployments.
FAQs
1. How can I identify the top resource-consuming SQLs?
Use V$SQL and AWR reports to list high CPU, I/O, and elapsed-time SQL queries. Always cross-reference with execution plans.
2. What's the difference between logical and physical reads?
Logical reads fetch from buffer cache, while physical reads hit disk. High physical reads indicate cache inefficiency or large data scans.
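The relationship between the two can be expressed as a hit ratio: the fraction of logical reads that did not require a physical read. A minimal sketch follows, assuming you already have the two counts for the same interval (for example, deltas of the relevant system statistics); the function name is illustrative.

```python
def buffer_cache_hit_ratio(logical_reads, physical_reads):
    """Fraction of logical reads served from the buffer cache.

    Returns None when there were no logical reads in the interval.
    """
    if logical_reads <= 0:
        return None
    return 1.0 - physical_reads / logical_reads


if __name__ == "__main__":
    # Made-up counts for one interval.
    print(buffer_cache_hit_ratio(1000, 250))
```

A high ratio alone does not prove a workload is efficient, since a poorly written query can perform a very large number of logical reads entirely from cache.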
3. How do I resolve ORA-01555 snapshot too old?
Increase undo_retention so it covers your longest-running query, avoid running long queries while massive DML changes the same data, and size the undo tablespace explicitly so it can actually hold that much undo.
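A common rule of thumb for sizing undo is retention time multiplied by the undo generation rate and the block size. The sketch below is only that arithmetic; the function names are illustrative, the undo rate is assumed to come from your own measurements (for example, undo blocks per interval in v$undostat divided by the interval length), and the result is a starting estimate, not a guarantee against ORA-01555.

```python
def required_undo_bytes(undo_retention_s, undo_blocks_per_s, block_size_bytes=8192):
    """Estimate undo space: retention (s) x undo blocks per second x block size."""
    return undo_retention_s * undo_blocks_per_s * block_size_bytes


def required_undo_mb(undo_retention_s, undo_blocks_per_s, block_size_bytes=8192):
    """Same estimate expressed in megabytes."""
    return required_undo_bytes(undo_retention_s, undo_blocks_per_s, block_size_bytes) / (1024 * 1024)


if __name__ == "__main__":
    # Example inputs: 900 s retention, 100 undo blocks/s, 8 KB blocks.
    print(required_undo_mb(900, 100))
```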
4. Is it safe to kill a blocking session?
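Only as a last resort. Investigate the root cause first: killing a session rolls back its uncommitted transaction, which can take a long time for large changes, and its locks are not released until that rollback completes.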
5. How often should statistics be gathered?
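Let the automatic statistics job run in its maintenance windows, and gather manually with DBMS_STATS weekly or after major data changes such as bulk loads. Use DBMS_STATS.AUTO_SAMPLE_SIZE for sampling and avoid locking statistics unless necessary.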