Background: Why Sahi Pro Behaves Differently at Scale

Proxy-Based Architecture 101

Sahi Pro injects and observes browser traffic via its proxy. This allows auto-waiting, request manipulation, and robust element discovery independent of brittle CSS/XPath selectors. In enterprises, however, Sahi's proxy often chains to corporate proxies, SSL inspection devices, SSO gateways, and geolocated CDNs. Each hop modifies headers, TLS handshakes, or latency patterns, which changes how and when Sahi's waits, recorders, and playback logic see the application under test.

Implications for Modern SPAs

Single-page apps aggressively mutate the DOM. Sahi's accessor strategy is resilient but not magic: microfrontend iframes, shadow DOM, and virtualization (e.g., infinite lists) can hide target nodes or recycle them as the user scrolls. Auto-waits mask timing issues up to a point; beyond that, tests stall or race.

Architecture Deep-Dive: Subsystems and Failure Modes

1. HTTP/HTTPS Chain: Sahi Proxy → Corp Proxy → WAF/CDN → App

Failure modes: certificate trust errors, 407 proxy auth loops, inconsistent cookies due to header rewriting, and sporadic timeouts under packet inspection. Symptoms include tests passing inside lab networks but failing in CI runners; downloads hanging; recorder not capturing XHR calls.

2. Object Accessors and Smart Waits

Failure modes: elements found during record but not at playback; dynamic indices shifting under virtualization; shadow DOM boundaries making _byText and _near unreliable; stale references across route transitions.

3. Runner, Grid, and Parallelism

Failure modes: sessions collide on shared profiles; browser reuse leaks state; node saturation stretches step and page-load latencies, producing spurious timeout failures; video recording and screenshots add I/O overhead that skews timings.

4. Data-Driven and External Systems

Failure modes: Excel/CSV encoding drift (UTF-8 vs ANSI) corrupts test data; parameter injection order differs between local and CI; environment variables not propagated to distributed agents; clock skew invalidates short-lived tokens for SSO flows.

5. Reporting and Diagnostics

Failure modes: enormous report folders slow cleanup; screenshot capture on every step overwhelms storage; log rotation silently drops the precise window of failure; mixed locale settings produce unreadable timestamps across teams.

Diagnostics: From Symptom to Root Cause

A. Network and TLS Triage

Start with the path. Confirm Sahi Pro's proxy settings, chained proxy, and certificate stores across OS, Java, and the browsers under automation. Intermittent TLS failures often reflect mismatched trust stores (system vs JRE vs browser) or transparent SSL inspection devices replacing certs mid-session.

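# Check Java cacerts for Sahi's CA (JDK 9+ path; on Java 8 use $JAVA_HOME/jre/lib/security/cacerts)
"$JAVA_HOME/bin/keytool" -list -v -keystore "$JAVA_HOME/lib/security/cacerts" -storepass changeit | grep -i sahi

# Verify OS trust (Windows example)
certutil -store -user Root | findstr /I Sahi

# Request through the corp proxy with verbose output (shows proxy auth and TLS handshake; add --proxy-ntlm for NTLM proxies)
curl -v --proxy http://corp-proxy:8080 --proxy-user user:pass https://example.com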

B. Accessor Health and DOM Drift

When replay differs from record, dump the live DOM around the failing node, and compare with a capture from the recorder session. Look for changed text nodes, extra nesting wrappers, or virtualization placeholders. Validate whether the element lives inside shadow DOM or an iframe.

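// In test, snapshot markup around the failing element
// (_getHTML is conceptual; _near is a relation and must wrap an element, not a bare string)
var target = _link("Checkout");
_log(_getHTML(target, 2));
_highlight(target);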

// Assert shadow root presence via evaluation
_eval("return !!document.querySelector('my-el')?.shadowRoot");

C. Parallelism Pressure

Failures that vanish when run alone are almost always shared-state or resource exhaustion issues. Capture per-run CPU, memory, and file descriptor counts. Ensure each worker has isolated browser profiles and download directories. Identify whether retries succeed without environmental reset.

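# Linux: open-file limit for this shell, then the heaviest Chrome processes by CPU
ulimit -n
ps --no-headers -o pid,pcpu,pmem,cmd -C chrome | sort -k2 -nr | head
# Open file descriptor count per Chrome process
for p in $(pgrep chrome); do echo "$p $(ls /proc/$p/fd 2>/dev/null | wc -l)"; done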

D. Data and Locale Drift

Encoding inconsistencies surface as garbled input or selector mismatches for non-ASCII text. Diff your test data with a hex viewer and enforce BOM-less UTF-8. Align locale and time zone for runners to eliminate date/number formatting surprises that break _byText lookups.

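# Convert an ANSI (Windows-1252) export to UTF-8 (run only on files that are not already UTF-8)
iconv -f WINDOWS-1252 -t UTF-8 data.csv > data.utf8.csv
# Strip a leading UTF-8 BOM if present (GNU sed syntax)
sed -i '1s/^\xEF\xBB\xBF//' data.utf8.csv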
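A pre-run check can catch encoding drift before it reaches a test. The sketch below is Node.js (plain JavaScript, close to Sahi's own script syntax); `checkDataFile` is a hypothetical helper name. It flags a UTF-8 BOM and any byte sequence that is not valid UTF-8, which is typically how an ANSI/Windows-1252 export shows up.

```javascript
const fs = require("fs");

// Hypothetical pre-run check: returns a list of problems; an empty list means BOM-less, valid UTF-8.
function checkDataFile(path) {
  const bytes = fs.readFileSync(path);
  const problems = [];
  // A UTF-8 BOM is the byte sequence EF BB BF at the very start of the file.
  if (bytes.length >= 3 && bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) {
    problems.push("UTF-8 BOM present");
  }
  try {
    // fatal: true makes the decoder throw on malformed sequences instead of inserting U+FFFD.
    new TextDecoder("utf-8", { fatal: true }).decode(bytes);
  } catch (e) {
    problems.push("not valid UTF-8 (possibly ANSI/Windows-1252)");
  }
  return problems;
}
```

Running this over every data source as a CI step and failing the job on a non-empty result keeps the check cheap and explicit.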

Common Pitfalls and Their Root Causes

1. 407 Proxy Authentication Loops

Symptom: Recorder shows login page; playback fails with proxy auth prompts or endless redirects. Root cause: Sahi Pro's proxy chaining is set, but the corporate proxy expects NTLM/Kerberos that the headless runner does not negotiate. CI agents lack domain context or SPNs for service accounts.

2. Certificate Mismatch During HTTPS Interception

Symptom: Intermittent TLS errors on specific subdomains or during redirects. Root cause: SSL inspection inserts per-site intermediates; Sahi's CA is not trusted in a nested trust chain, or the browser profile ignores system trust.

3. Flaky Element Discovery in Virtualized Lists

Symptom: _click(_byText("Add")) passes locally but fails on grid. Root cause: Virtualization renders only the visible window; auto-scroll differs by DPI/viewport across workers; indices shift as rows recycle.

4. Shadow DOM and Microfrontends

Symptom: Elements plainly visible in the browser are invisible to accessors. Root cause: Shadow roots encapsulate nodes; iframe-in-iframe composition hides the target tree from the current context.

5. Downloads and Native Dialogs

Symptom: File downloads stall or UI prompts block automation. Root cause: Default profiles prompt for save locations; remote workers lack write permissions; security policies block blob: or data: URLs.

6. ISR-like Caching in the App Under Test

Symptom: Tests read stale data after server actions; assertions pass locally but fail in CI. Root cause: Aggressive HTTP caching or client-side state caches mask updates; Sahi's waits do not detect cache flushes unless the DOM actually changes.

Step-by-Step Playbooks

Playbook A: Stabilize Corporate Proxy and TLS

  1. Identify the path: From test worker to app, enumerate proxies and inspectors. Record IPs/ports and auth methods.
  2. Align trust: Import Sahi Pro's CA into Java, OS, and browser stores. Export corporate inspection intermediates and add to all stores.
  3. Use explicit credentials or machine trust: For NTLM/Kerberos, run agents under domain accounts; pre-register SPNs if required.
  4. Harden timeouts: Increase connect and read timeouts to absorb WAF latency; differentiate between DNS, connect, TLS handshake, and request timeouts in logs.

# Add corp intermediate to Java trust
"$JAVA_HOME/bin/keytool" -importcert -noprompt -alias corp-inspect -keystore "$JAVA_HOME/lib/security/cacerts" -storepass changeit -file corp.cer

# Import the exported Sahi CA into the Windows user root store (PowerShell)
Import-Certificate -FilePath .\sahi-root.cer -CertStoreLocation Cert:\CurrentUser\Root
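To separate DNS, connect, TLS handshake, and request timeouts in logs (step 4), one option is a small classifier over Node.js socket error codes, used for example in a pre-flight probe that CI runs before starting Sahi. The `err.code` values below are standard Node.js/OpenSSL codes; the stage names, the `REQUEST_TIMEOUT` code for your own request timer, and the `statusCode` property for a proxy's 407 are assumptions about how your probe reports errors.

```javascript
// Map Node.js network error codes to the stage of the request they usually indicate.
const STAGES = {
  dns: ["ENOTFOUND", "EAI_AGAIN"],
  connect: ["ECONNREFUSED", "EHOSTUNREACH", "ENETUNREACH", "ETIMEDOUT"],
  tls: [
    "UNABLE_TO_VERIFY_LEAF_SIGNATURE",
    "UNABLE_TO_GET_ISSUER_CERT_LOCALLY",
    "SELF_SIGNED_CERT_IN_CHAIN",
    "DEPTH_ZERO_SELF_SIGNED_CERT",
    "CERT_HAS_EXPIRED",
    "ERR_TLS_CERT_ALTNAME_INVALID",
  ],
  // Hypothetical code your own per-request timer sets once the connection is already up.
  request: ["REQUEST_TIMEOUT"],
};

function classifyNetworkError(err) {
  // Assumption: the probe attaches the HTTP status of a proxy rejection as statusCode.
  if (err && err.statusCode === 407) return "proxy-auth";
  const code = err && err.code;
  for (const [stage, codes] of Object.entries(STAGES)) {
    if (codes.includes(code)) return stage;
  }
  return "unknown";
}
```

Logging the stage alongside the raw code makes "TLS failures only on CI" visible at a glance instead of hiding behind a generic timeout.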

Playbook B: Make Accessors Deterministic

  1. Prefer role/label accessors: Target stable ARIA roles, labels, or data-attributes injected by development teams specifically for tests.
  2. Scope queries: Confine searches to a visible container with _in() or proximity via _near() to avoid duplicates.
  3. Handle virtualization: Scroll the container until the item is realized; then assert visibility.
  4. Shadow DOM: Elevate to the shadow root context before searching.

// Scoping: _in and _near are relations passed as the last argument of an accessor
_click(_link("Checkout", _in(_div("Cart"))));
_click(_cell(1, _near(_cell("SKU"))));

// Shadow DOM helper (pseudo)
_eval("return document.querySelector('my-el').shadowRoot.querySelector('button[label=\"Pay\"]')?.click()");

Playbook C: De-flake Parallel Execution

  1. Isolate profiles: Create per-run temp profiles and download dirs; never reuse across workers.
  2. Resource budgets: Cap concurrent browsers per node to keep CPU/memory headroom; disable unnecessary video/screenshots on green steps.
  3. Session cleanup: On failure, force browser kill and temp directory purge to avoid bleed-through state.
  4. Retry policy: Retries should re-provision the environment, not just re-run the script.

# Example runner invocation (conceptual: these flags are illustrative, not actual Sahi Pro options)
sahi.sh playback suite.suite -threads 4 -profile-temp -downloads "$TMPDIR/sahi/dl/$JOB_ID"
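Steps 1, 3, and 4 can be combined into one wrapper: every attempt gets a fresh profile and download directory, and those directories are purged whether the attempt passes or fails. This is a Node.js sketch with hypothetical names (`runIsolated`, `attemptFn`); how the paths reach Sahi Pro or the browser is left to your runner configuration.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical wrapper: run attemptFn up to maxAttempts times, each in freshly provisioned directories.
function runIsolated(jobId, attemptFn, maxAttempts = 2) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    // mkdtemp appends random characters, so parallel workers never share a directory.
    const root = fs.mkdtempSync(path.join(os.tmpdir(), `sahi-${jobId}-${attempt}-`));
    const env = { profileDir: path.join(root, "profile"), downloadDir: path.join(root, "dl") };
    fs.mkdirSync(env.profileDir);
    fs.mkdirSync(env.downloadDir);
    try {
      return attemptFn(env, attempt);
    } catch (err) {
      lastError = err;
    } finally {
      // Purge on success and on failure so no state bleeds into the next attempt or job.
      fs.rmSync(root, { recursive: true, force: true });
    }
  }
  throw lastError;
}
```

A real runner would await an asynchronous attempt; the synchronous form keeps the sketch short. The key design choice is that a retry re-provisions the environment rather than re-running in the one that just failed.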

Playbook D: Fix Downloads and Native Prompts

  1. Preconfigure profiles: Disable prompts, set default download directory, allow blob: URLs.
  2. Server-side headers: Ensure Content-Disposition and correct MIME types to avoid inline display.
  3. Verify completion: Poll the filesystem for .crdownload/partial artifacts to disappear before asserting.
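
// Wait for download: poll until the file exists and no .crdownload remains (assumes Rhino Java interop)
var f = "C:\\agent\\dl\\report.pdf", waited = 0;
while ((!new java.io.File(f).exists() || new java.io.File(f + ".crdownload").exists()) && waited < 20000) { _wait(500); waited += 500; }
_assertTrue(new java.io.File(f).exists());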
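On the agent side, "complete" means the target file exists and no in-progress file remains in the download directory. A Node.js sketch, assuming Chrome's `.crdownload` and Firefox's `.part` suffixes (verify against your browser versions); `downloadState` is a hypothetical helper a runner could poll between steps. It scans the whole directory because Chrome may name an in-progress file "Unconfirmed NNNN.crdownload" before it knows the final name.

```javascript
const fs = require("fs");
const path = require("path");

// Assumed in-progress suffixes (Chrome: .crdownload, Firefox: .part).
const PARTIAL_SUFFIXES = [".crdownload", ".part"];

// Hypothetical helper: returns "complete", "partial", or "missing" for fileName inside dir.
function downloadState(dir, fileName) {
  const entries = fs.existsSync(dir) ? fs.readdirSync(dir) : [];
  // Check partials first: Firefox can create an empty placeholder with the final name early.
  if (entries.some((n) => PARTIAL_SUFFIXES.some((s) => n.endsWith(s)))) return "partial";
  return entries.includes(fileName) ? "complete" : "missing";
}
```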

Playbook E: Data Integrity and Locale

  1. Normalize inputs: Enforce UTF-8 for all data sources; validate with a pre-run check.
  2. Lock locales: Set LANG and time zone for agents and test VMs; align with application expectations.
  3. Token freshness: For SSO, renew tokens per test or stub IdP; reduce clock drift with NTP.
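
# CI step example (Linux); exports apply to this shell and the processes it starts
export LANG=en_US.UTF-8
export TZ=UTC
# Check NTP peer offsets (ntpd hosts; on chrony hosts use: chronyc tracking)
ntpq -p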
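For token freshness, a runner can check a JWT's `exp` claim with an explicit clock-skew allowance before reusing it. This Node.js sketch assumes the SSO token is a JWT with a numeric `exp` (seconds since the epoch); `needsRenewal` and the 60-second default are illustrative. It only decodes the payload and does not verify the signature, so it is a freshness heuristic, not a security check.

```javascript
// Decode the payload segment of a JWT (base64url) without verifying the signature.
function jwtPayload(token) {
  const parts = token.split(".");
  if (parts.length !== 3) throw new Error("not a JWT");
  return JSON.parse(Buffer.from(parts[1], "base64url").toString("utf8"));
}

// Hypothetical helper: true if the token expires within skewSec of nowSec (seconds since the epoch).
function needsRenewal(token, nowSec = Math.floor(Date.now() / 1000), skewSec = 60) {
  const { exp } = jwtPayload(token);
  if (typeof exp !== "number") return true; // no expiry claim: renew to be safe
  return exp - skewSec <= nowSec;
}
```

The skew margin absorbs small clock drift between runners and the IdP; NTP keeps that drift small enough for the margin to stay short.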

Performance Engineering for Large Suites

Budget the Test Pyramid

Push heavy business logic to API and contract tests; keep UI tests focused on critical paths and visual contracts. Use Sahi Pro where browser fidelity is required, not as a substitute for lower-level checks.

Reduce Churn

Stabilize selectors via test-only attributes. Collaborate with frontend teams to expose data-testid/role semantics. Eliminate incidental text coupling; copy changes should not break tests.

Warm the Path

Cache static assets at the proxy or CDN. Shorten login by provisioning pre-authenticated sessions via API when business rules allow, then navigate to the target route for UI assertions.

Batch and Shard

Split suites by deterministic tags (e.g., feature, risk, runtime). Avoid random selection that hides long-running clusters. Size shards to node capacity so that average CPU stays at or below roughly 70%.
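The ~70% CPU target translates directly into a per-node browser budget. A back-of-the-envelope Node.js sketch, assuming you have measured average CPU per browser session (in cores) on your own grid; the function names and inputs are illustrative.

```javascript
// Maximum concurrent browsers on one node so average CPU stays at or below targetUtilization.
function browsersPerNode(cores, cpuPerBrowser, targetUtilization = 0.7) {
  if (cpuPerBrowser <= 0) throw new Error("cpuPerBrowser must be positive");
  return Math.floor((cores * targetUtilization) / cpuPerBrowser);
}

// Nodes required to run `shards` browser sessions concurrently under the same target.
function nodesNeeded(shards, cores, cpuPerBrowser, targetUtilization = 0.7) {
  const perNode = browsersPerNode(cores, cpuPerBrowser, targetUtilization);
  if (perNode < 1) throw new Error("node too small for one browser at this target");
  return Math.ceil(shards / perNode);
}
```

For example, an 8-core node where each browser averages 1.2 cores supports four browsers at a 70% target, so twelve concurrent shards need three such nodes.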

Observability and Forensics

Golden Signals for UI Automation

  • Latency: Step and page transition times; 95th and 99th percentiles.
  • Errors: Distinguish assertion failures, network errors, and environment errors.
  • Saturation: Browser and OS resource utilization.
  • Traffic: Outbound requests per step; spikes indicate chatty pages.
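Latency percentiles can be computed from raw step timings in a post-processing job. A nearest-rank percentile sketch in Node.js; the input is assumed to be an array of step durations in milliseconds extracted from your reports, and the function names are illustrative.

```javascript
// Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it.
function percentile(samples, p) {
  if (samples.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new Error("p must be in (0, 100]");
  const sorted = [...samples].sort((a, b) => a - b);
  // Multiply before dividing so integer inputs give an exact 1-based rank.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}

function latencySummary(durationsMs) {
  return { p95: percentile(durationsMs, 95), p99: percentile(durationsMs, 99) };
}
```

Nearest-rank always returns an observed sample, which keeps reported values traceable to a specific step in the logs.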

Artifacts That Matter

On failure, bundle: HTML snapshot, network HAR, console logs, and screenshot. Keep retention reasonable (e.g., 7-14 days) and export summaries to your APM or log aggregator. Build a small forensic viewer that correlates step logs with HAR timings and DOM snapshots.

// Approximate timing capture (conceptual): Resource Timing entries, not a full HAR (no headers or bodies)
_eval("window.performance?.getEntriesByType?.('resource')?.map(e=>({n:e.name,s:e.startTime,d:e.duration}))");

Security, SSO, and Compliance Considerations

Working with SSO and MFA

For CI, use non-interactive auth flows: OAuth client credentials, resource owner password for sandbox, or pre-issued short-lived cookies. If MFA cannot be bypassed, create a controlled test tenant with virtual MFA that rotates secrets programmatically. Document segregation of duties and access scopes.
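If the test tenant's virtual MFA uses standard TOTP (RFC 6238 with HMAC-SHA1 and 30-second steps, an assumption to confirm with your IdP), the runner can compute the current code from the enrolled secret instead of waiting for a human. A Node.js sketch using only the built-in `crypto` module; `totp` is an illustrative name. IdPs usually display the secret as Base32, which must be decoded to raw bytes first (not shown). Keep the secret in the CI secret store, never in the test script.

```javascript
const crypto = require("crypto");

// RFC 6238 TOTP: an HOTP (RFC 4226) value computed over the number of elapsed time steps.
function totp(secret, unixSeconds = Math.floor(Date.now() / 1000), digits = 6, stepSeconds = 30) {
  const counter = Math.floor(unixSeconds / stepSeconds);
  const msg = Buffer.alloc(8);
  msg.writeBigUInt64BE(BigInt(counter)); // 8-byte big-endian counter
  const hmac = crypto.createHmac("sha1", secret).update(msg).digest();
  // Dynamic truncation: the low 4 bits of the last byte choose a 4-byte window.
  const offset = hmac[hmac.length - 1] & 0x0f;
  const code =
    ((hmac[offset] & 0x7f) << 24) |
    (hmac[offset + 1] << 16) |
    (hmac[offset + 2] << 8) |
    hmac[offset + 3];
  return String(code % 10 ** digits).padStart(digits, "0");
}
```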

Certificate Lifecycle

Track expiry dates for Sahi Pro's CA and corporate intermediates. Automate renewal pipelines and distribute to all trust stores. Expired intermediates cause sudden, global failures that are hard to attribute during outages.
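Expiry tracking can start as a scheduled job that flags anything inside a renewal window. This Node.js sketch takes plain `{ alias, validTo }` records; in practice `validTo` could come from `new crypto.X509Certificate(pem).validTo` (Node.js 15.6+) for each exported certificate, after checking that `Date` parses that format on your Node.js version. The record shape, aliases, and 30-day window are assumptions.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// certs: [{ alias, validTo }] where validTo is a date string Date can parse.
// Returns certificates expiring within windowDays of now (expired ones included), soonest first.
function expiringSoon(certs, now = new Date(), windowDays = 30) {
  return certs
    .map((c) => ({ alias: c.alias, daysLeft: Math.floor((new Date(c.validTo) - now) / DAY_MS) }))
    .filter((c) => c.daysLeft <= windowDays)
    .sort((a, b) => a.daysLeft - b.daysLeft);
}
```

Wiring the non-empty result into an alert well before the window closes turns a surprise global outage into a scheduled renewal task.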

Governance: Keeping Suites Healthy Over Years

Version Pinning and Release Cadence

Lock Sahi Pro version per quarter; test upgrades on a staging grid before rolling out. Pin browser versions for test stability; only move forward when app teams validate compatibility.

Definition of Done for Tests

A test is "done" when it has stable selectors, cleanup steps, idempotent data, and runs under network throttling. Enforce this via PR templates and codeowners.

Debt Registers

Track flaky tests with owner, failure rate, and root cause hypothesis. Disable by policy after a threshold of flakes and assign to a weekly triage that either fixes or deletes.
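A minimal policy check for the debt register, in Node.js: compute each test's flake rate over a recent window and flag tests above the threshold for quarantine. Here a "flake" is assumed to mean a run that failed and then passed on retry; the record shape, the 5% threshold, and the 20-run minimum are placeholders to set by policy.

```javascript
// runs: [{ test, owner, outcome }] where outcome is "pass", "fail", or "flaky" (failed, then passed on retry).
function quarantineCandidates(runs, threshold = 0.05, minRuns = 20) {
  const stats = new Map();
  for (const r of runs) {
    const s = stats.get(r.test) || { test: r.test, owner: r.owner, total: 0, flaky: 0 };
    s.total += 1;
    if (r.outcome === "flaky") s.flaky += 1;
    stats.set(r.test, s);
  }
  return [...stats.values()]
    // Require a minimum sample so one bad night does not quarantine a new test.
    .filter((s) => s.total >= minRuns && s.flaky / s.total > threshold)
    .map((s) => ({ test: s.test, owner: s.owner, flakeRate: s.flaky / s.total }))
    .sort((a, b) => b.flakeRate - a.flakeRate || a.test.localeCompare(b.test));
}
```

The output, already sorted worst-first with owners attached, can feed the weekly triage agenda directly.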

Concrete Code Patterns

Resilient Login with Short-Lived Session

Prefer API-backed login to avoid UI churn while preserving end-to-end coverage for critical pages.

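// Pseudocode: create session via API, then attach cookie
// fetchTestToken() is a hypothetical helper that calls your test-login API and returns the token
var token = fetchTestToken("/api/test-login?user=qa&pass=...");
_navigateTo("/");                 // land on the app's origin first so the cookie is scoped to it
_createCookie("session", token);  // Sahi cookie API; confirm the signature for your version
_navigateTo("/dashboard");
_assertExists(_byText("Welcome"));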

Virtualized List Scrolling

Make the list realize the target item before clicking; assert via visibility.

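function selectSku(id){
  // Scroll until the virtualized row is rendered; bound the attempts so a missing SKU fails fast
  for (var i = 0; i < 50 && !_exists(_cell(id)); i++) {
    _scroll(_byId("list"), 0, 400);
    _wait(300);
  }
  _assertTrue(_isVisible(_cell(id)));
  _click(_cell(id));
}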

Shadow DOM Access Helper

Bridge into shadow roots when necessary; keep this in a shared lib so teams do not reinvent workarounds.

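// host and selector are spliced into single-quoted JS strings, so neither may contain single quotes
function clickInShadow(host, selector){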
  var js = "var h=document.querySelector('"+host+"');"+
           "if(!h||!h.shadowRoot)return false;"+
           "var t=h.shadowRoot.querySelector('"+selector+"');"+
           "if(!t)return false;t.click();return true;";
  return _eval(js);
}

CI/CD Integration Patterns

Deterministic Sharding

Assign tests to shards by hashing the test id; avoid time-based slicing that leads to inconsistent failure reproduction. Track shard utilization and rebalance periodically.
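Hash-based assignment keeps each test on the same shard across runs, so a failure reproduces on a known shard. A Node.js sketch using 32-bit FNV-1a over the id's characters (identical to byte-wise FNV-1a for ASCII ids); any stable hash works, and FNV-1a is chosen here only because it fits in a few lines. The function names are illustrative.

```javascript
// 32-bit FNV-1a: stable across runs, machines, and Node.js versions.
function fnv1a32(str) {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by the FNV prime mod 2^32
  }
  return hash >>> 0;
}

function shardFor(testId, shardCount) {
  return fnv1a32(testId) % shardCount;
}

// Group test ids into shardCount buckets; the same inputs always yield the same buckets.
function assignShards(testIds, shardCount) {
  const shards = Array.from({ length: shardCount }, () => []);
  for (const id of testIds) shards[shardFor(id, shardCount)].push(id);
  return shards;
}
```

Changing `shardCount` reshuffles most assignments, so treat it as part of the periodic rebalancing rather than a per-run knob.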

Fast Feedback Lanes

Gate PRs with smoke/contract tests; run the full regression nightly. Post flaky-test deltas to chat with owner mentions. Use artifact links directly in failure messages for one-click triage.

Environment Parity

Align OS images, browser versions, fonts, and locales between developer machines and CI nodes. Bake golden AMIs or containers with all trust stores preloaded and Sahi Pro configured identically.

Capacity Planning for Grids

Right-Sizing Nodes

Favor medium nodes over super-dense nodes. UI automation is latency-sensitive; contention turns small hiccups into cascading timeouts. Keep the browser count per node low enough to leave headroom for spikes and OS updates.

Cold Start Policies

Pre-start a warm pool of nodes before nightly runs. Cache dependencies (browsers, profiles) on ephemeral disks. Use rolling restarts to avoid synchronized GC pauses across the fleet.

Long-Term Solutions and Modernization

Shift Assertions Downstream

Move idempotent business assertions into API or contract tests that run on every commit. Reserve Sahi Pro runs for UX-critical flows and cross-browser checks. This reduces cost and flake while preserving confidence.

Contracted Testability

Agree with product teams on test-only contracts: data-testid, ARIA roles, stable route IDs, and fixture APIs. Codify these in your definition of ready. Fewer brittle selectors means fewer breakages at sprint boundaries.

Security-Aligned Automation

Codify proxy/cert requirements in infrastructure as code. Treat cert rotation and proxy credential renewal as release criteria. This avoids surprise outages when certs age out.

Conclusion

Sahi Pro's proxy-centric model, smart waits, and cross-browser reach are powerful, but large organizations surface integration seams that smaller setups never see: chained proxies and TLS inspection, SPA virtualization and shadow DOM, parallelism pressure, and data/locale drift. Durable stability emerges when you address these as architectural concerns, not one-off fixes. Standardize trust and networking, institutionalize resilient accessors, isolate and right-size parallel runners, and invest in observability. With those foundations, Sahi Pro becomes a predictable, cost-effective pillar in your end-to-end quality strategy.

FAQs

1. How do we stop flaky tests that only fail in CI behind the corporate proxy?

Trust alignment is the usual culprit: import Sahi Pro's CA and your corporate intermediates into Java, OS, and browser stores on the runners. Run agents under domain accounts if NTLM/Kerberos is required, and increase timeouts to absorb WAF latency.

2. What's the most reliable strategy for selectors on dynamic SPAs?

Use test-only data-* attributes, ARIA roles, and scoped searches with _in()/_near(). For virtualized lists, scroll until realized and assert visibility; for shadow DOM, switch context via helpers rather than brittle XPath.

3. Why do downloads routinely hang on remote workers?

Because default profiles prompt for destinations or lack permissions. Preconfigure profiles to auto-save to writable temp directories, ensure correct Content-Disposition headers, and wait for partial files to complete before assertions.

4. How can we cut a 3-hour regression to under an hour without losing coverage?

Shard deterministically, reduce screenshots to failure-only, right-size node density, and move non-UI assertions to API tests. Keep a smoke lane for PRs and run full suites nightly with a warm pool.

5. What governance practices keep suites healthy over the long term?

Pin Sahi Pro and browser versions, require testability contracts in the UI, maintain a flake register with owners, and treat proxy/cert rotations as first-class release work. Enforce a "definition of done" for tests that includes stability, cleanup, and idempotent data.