ASCENT-Imtiaz Ahmad | The Valuer

Ascent

Editorial notes—internal currents, reflections, and grounding thoughts. Short. Direct. From the core.

Ascent • Spring - Summer 2026 • Air

Frontier LLMs and the Future of Private Company Valuation

By Hafiz Imtiaz Ahmad | Higher Colleges of Technology | Abu Dhabi

Advanced frontier Large Language Models (LLMs) - artificial intelligence systems trained on vast amounts of text data - are no longer just "answer engines."

They have evolved into agentic production systems capable of handling complex workflows. These tools can now ingest messy valuation inputs such as PDFs (Portable Document Formats), pitch decks, loan agreements, financial statements, and emails. They build models, test assumptions, and draft professional deliverables. This shift is clear in emerging models like GPT-5.3-Codex (OpenAI, 2026a), Claude Opus 4.6 (Anthropic, 2026a), and Gemini 3 Pro (Google DeepMind, 2026).

For private company valuation, AI won't replace the appraiser. What it does is compress cycle times and raise the floor for repeatable work. Think of tasks like document parsing, normalization, comparable screening, scenario generation, sensitivity grids, and narrative drafting. This frees up human effort for judgment-heavy work: defining the standard and premise of value, assessing risk, making defensible adjustments, and explaining why your conclusion makes sense.

This aligns perfectly with the International Valuation Standards Council's (IVSC) position: technology can assist, but an IVS-compliant valuation cannot be produced by an automated model alone. Professional judgment remains essential (IVSC, 2025).

The strategic turning point involves governance. The same tools that speed up analysis also introduce new failure modes. These include hallucinated facts (where the AI confidently invents incorrect information), prompt injection through untrusted documents (malicious hidden text that tricks the AI), opaque "compaction" states (where AI summarizes data in ways humans can't verify), and reproducibility issues when models change. The providers themselves warn about these risks (OpenAI, 2026d; Google DeepMind, 2026; Anthropic, 2026c).

The future will be defined by "valuation as code + valuation as evidence." Firms that build reproducible pipelines - with structured data, versioned assumptions, model governance, audit trails, and documented human review - will turn frontier LLMs into a competitive advantage. Firms that treat them as ad hoc chat assistants? They'll face quality drift, compliance friction, and skeptical clients.

Technical Capabilities That Matter for Valuation Work

Three capability clusters drive valuation impact: multimodal ingestion, tool-using reasoning, and governable customisation (reproducibility, versioning, and controls).
Gemini 3 Pro handles text, audio, image, video, and PDF natively. It offers a 1 million token context window (meaning it can "remember" about 750,000 words at once) and substantial output capacity on Vertex AI (Google's enterprise AI platform) (Google DeepMind, 2026). Claude Opus 4.6 excels at long-context agentic work, sustaining complex tasks in large codebases (Anthropic, 2026a). GPT-5.3-Codex is designed as a "general-purpose agent" for professional computer work, with emphasis on tool use, long-running execution, and sandboxing controls (OpenAI, 2026a).

A key differentiator for valuation workflows is agent infrastructure and guardrails. GPT-5.3-Codex runs in sandboxed environments by default—cloud containers with network disabled. It includes explicit guidance on controlling internet access to prevent prompt-injection and data exfiltration (unauthorized data transfer) (OpenAI, 2026b).
Gemini 3 Pro exposes enterprise-facing controls: structured output, function calling (allowing the AI to execute specific programming functions), code execution, context caching, and the Vertex RAG (Retrieval-Augmented Generation) Engine. All highly relevant for "valuation as code" (Google DeepMind, 2026).

Explainability remains imperfect across all vendors. OpenAI's platform is moving toward encrypted artifacts where underlying logic may evolve. This creates audit-trail challenges for regulated deliverables (OpenAI, 2026c). Anthropic emphasises monitoring for agentic misuse and warns about "eager" actions—when agents act without asking first (Anthropic, 2026c).

Hallucination performance should be measured, not marketed. OpenAI frames hallucination testing via benchmarks like SimpleQA and PersonQA. Their research suggests hallucinations persist partly because training incentives reward guessing over admitting uncertainty (OpenAI, 2026e).

Google DeepMind reports factuality results for Gemini 3 Pro: 72.1% on SimpleQA Verified (Google DeepMind, 2026). That's strong by benchmark standards, but it still means ungrounded claims can slip into valuation narratives if you're not careful.

Capability Comparison Table

Practical Use Cases Across the Valuation Lifecycle

The most durable value of LLMs in private company valuation comes from three areas: (a) ingesting and transforming unstructured data, (b) repeatable computation scaffolds, and (c) narrative production with evidence links.

Gemini 3 Pro excels at "read the room" tasks - ingesting CIMs (Confidential Information Memorandums), QoE (Quality of Earnings) reports, contracts, and board decks, then extracting valuation-ready datasets (Google DeepMind, 2026). GPT-5.3-Codex runs multi-step workflows in a controlled sandbox, making it ideal for building "valuation copilots" that update models, run tests, and produce work logs (OpenAI, 2026b). Claude Opus 4.6 handles sustained agentic work with long context, though you need to watch for overly eager behavior (Anthropic, 2026a).

LLMs will change how valuation teams handle comparables. Instead of one analyst manually filtering a comp set, an agentic workflow can: (i) list candidate peers, (ii) justify inclusion or exclusion, (iii) flag contradictory evidence, and (iv) generate sensitivity sets. The human reviewer decides the final peer set and documents the judgment. This fits perfectly with standards that emphasise collecting relevant information and documenting your reasoning (ASA, 2022).

Even discount estimation (DLOC/DLOM - Discount for Lack of Control and Discount for Lack of Marketability) becomes more scalable. The model won't "know the right discount," but it can quickly compile empirical reference sets - restricted stock studies, pre-IPO (Initial Public Offering) studies, option-model inputs. It can enforce consistent assumptions across scenarios and draft the narrative while flagging where you need to exercise judgment. IVS explicitly requires that significant inputs be appropriate and justified, with quality controls covering data, assumptions, and inputs (IVSC, 2025). AI-supported discount modules with validation checks fit directly into this framework.

Use-Case Mapping Table

Workflow Integration and Valuation as Code

The operational goal is a controlled pipeline: unstructured documents → structured dataset → valuation model → narrative report. With versioned inputs, reproducible computation, and an auditable workfile. This aligns with standards that require you to document the information you relied on and the work product you used to reach conclusions (ASA, 2022).

The most important architectural patterns are:

- Retrieval-First Grounding: Use firm-approved documents (engagement files, curated market data, prior memos) as your default context. Don't rely on the model's parametric memory. Gemini 3 Pro's RAG Engine support and
- Codex's tool integration make this practical (Google DeepMind, 2026).
- Human-in-the-Loop Gates: Build in mandatory review points for (i) key facts, (ii) key assumptions, and (iii) the valuation conclusion. This matches IVSC's emphasis on professional skepticism and ASA's focus on appraiser judgment (IVSC, 2025; ASA, 2022).
- Reproducibility and Versioning: Models change. OpenAI recommends using snapshots to lock a specific model version for consistent behavior. Google's Vertex makes model versioning explicit (OpenAI, 2026f).
- State Management: OpenAI's compaction endpoint returns encrypted, opaque items that may evolve. So preserve the original evidence set. Treat compaction as convenience, not as your audit record (OpenAI, 2026c).

Risks, Limitations, and Quantified Reliability Signals

Advanced LLMs fail in ways that look persuasive. Providers' documentation highlights three risk categories that matter for valuation: hallucination/guessing, prompt injection/data exfiltration, and agentic overreach.

Gemini 3 Pro documentation admits it "may occasionally guess when information is missing." You can mitigate this with careful prompting (Google DeepMind, 2026). Codex documentation warns that enabling internet access increases risks: prompt injection, code exfiltration, malware dependencies. It even provides examples of how hidden instructions could leak sensitive data (OpenAI, 2026d).

Anthropic notes that Opus 4.6 can act "too eager" in agentic settings - taking risky actions without asking first (Anthropic, 2026c).

You can quantify these risks, but read the numbers carefully. OpenAI reports that GPT-5.2 "Thinking" produced 30% fewer error-containing responses than GPT-5.1 on de-identified queries. Its system card reports low hallucination rates with browsing enabled (OpenAI, 2026g). Encouraging, yes. But not a free pass for unverified valuation facts - especially since private company work depends on confidential, non-public inputs where browsing doesn't help.

Google DeepMind reports Gemini 3 Pro scores 72.1% on SimpleQA Verified (Google DeepMind, 2026). That's strong, but it still means meaningful error rates if you let the model answer ungrounded questions. OpenAI's research argues that hallucinations persist because evaluations reward guessing over admitting uncertainty. That's a critical warning for valuation teams drafting narratives under deadline pressure (OpenAI, 2026e).

Regulatory and IP (Intellectual Property) risks also increase with "valuation agents." Codex flags license restrictions as a risk when agents pull code or content from the internet (OpenAI, 2026d). This forces firms to choose enterprise configurations that align with confidentiality obligations (Google DeepMind, 2026).

Risk-to-Control Table for Valuation Firms

Implications for ASA, NACVA, and IVSC Professional Responsibility

Across standards bodies, the common thread is accountability, judgment, and documentation. AI is a tool inside the valuation process, not a substitute for the valuation conclusion.
IVSC's perspectives paper is explicit: while AI/ML (Artificial Intelligence/Machine Learning) models can assist, no model (including an AVM—Automated Valuation Model) can produce an IVS-compliant valuation without the valuer applying professional judgment.

This includes assessing inputs and understanding model operation and fitness for purpose (IVSC, 2025). IVS also defines an Automated Valuation Model (AVM) as a model that produces automated calculations without the valuer applying judgment over inputs or outputs. In other words, "automation" is not "valuation."

This maps cleanly onto how you should treat LLMs: as workflow accelerators that

(i) help gather and process information, (ii) generate drafts and alternatives, and
(iii) run repeatable computations. But you must (a) select and justify inputs, (b) apply skepticism, (c) evaluate reasonableness, and (d) retain the evidence trail. IVS requires quality control over data, assumptions, and inputs - including comparing against authoritative sources and eliminating stale data (IVSC, 2025).

For ASA practice, the American Society of Appraisers Business Valuation Standards require the appraiser to gather and analyze relevant information, select and apply appropriate approaches, consider appropriate discounts and premiums, and "appropriately document and retain all information relied on and the work product used" (ASA, 2022). NACVA (National Association of Certified Valuators and Analysts) practice relies on similar principles. The logic is clear: any AI-derived figure or narrative must trace back to inputs and your analysis - not just to a model output (NACVA, 2025).

The most important standards implication: your workfiles must adapt. If your valuation process includes LLM assistance, your workfile should preserve:

1. The source documents you relied on
2. The extracted datasets and tie-outs
3. The model version and configuration
4. Your human review decisions—especially around key assumptions and any discounts or premiums

This is a direct extension of existing documentation requirements, not a new compliance category (Ojewale et al., 2026).

Talent, Staffing, Pricing, and Business Model Shifts

LLMs will change valuation firms the way spreadsheets once did. They won't eliminate judgment, but they will re-price routine labor and reward process design. GPT-5.3-Codex is positioned to handle "nearly anything professionals can do on a computer" - analysing data in sheets, producing slide decks (OpenAI, 2026a). Gemini 3 Pro focuses on complex multimodal reasoning and tool use (Google DeepMind, 2026).

The near-term staffing pattern will likely be "barbell": fewer junior hours on first-pass drafting and extraction. More emphasis on (i) technical valuation engineers who build data and model pipelines, (ii) review-heavy senior analysts, and (iii) domain specialists who pressure-test assumptions. Research already shows LLMs can handle meaningful financial statement analysis under controlled settings. First-pass analysis will increasingly be automated (Kim et al., 2024).

Pricing will shift from billable hours to deliverable + assurance tiers. As generation becomes cheap, clients will pay for: (a) defensible inputs, (b) audit-ready workfiles, (c) expert testimony readiness, and (d) scenario coverage. IVS's emphasis on data and input controls reinforces this direction. Advanced models still require governance (IVSC, 2025).

Recommendations for Practitioners

Frame adoption as risk-managed modernization, not "AI experimentation." The roadmap below assumes no specific jurisdiction. In regulated contexts, push governance earlier and harder (ESMA, 2025).
Key control recommendations aligned to standards:

- Define what the model is allowed to do. Separate "drafting and extraction" from "decision outputs." An LLM can propose a discount range, but you must select and justify it with evidence and judgment. This is required by IVS and ASA.
- Build a provenance-first workfile. Store source docs, extracted tables, tie-outs, assumption sets, model runs, and reviewer approvals. This directly implements ASA documentation and IVS data controls.
- Use least-privilege agent design. Default to sandboxed execution. Restrict network access. Use allowlists and safe HTTP methods. Treat all web and document content as untrusted (OpenAI, 2026d).
- Treat compaction and hidden state as convenience, not evidence. If using server-side compaction, keep the full original record for auditability (OpenAI, 2026c).
- Measure reliability on your own benchmark set. Use domain-specific valuation tests: tie-outs, narrative-to-schedule consistency. Don't just trust general benchmark claims (OpenAI, 2026e).
- Vendor selection: prioritise enterprise controls. Choose platforms based on data governance, versioning, integration (RAG, structured output), and audit logging. Gemini 3 Pro's enterprise feature set is a useful yardstick (Google DeepMind, 2026).

Conclusion

The arrival of frontier LLMs like GPT-5.3-Codex, Claude Opus 4.6, and Gemini 3 Pro marks a genuine shift in how private company valuation work gets done. These are not incremental improvements - they represent a move from AI as a research assistant to AI as a production system that can execute multi-step workflows. But this shift demands clear thinking about where the technology helps and where it introduces new risks.

Treat these tools as process infrastructure: ingestion, validation, computation, and narrative generation, with human judgment firmly embedded at every decision point. The irreducible professional value remains judgment, accountability, and defensible documentation. This is explicitly reinforced by IVSC's position that AI models alone cannot produce IVS-compliant valuations. The appraiser's role is not diminished - it is refocused on the work that truly requires professional expertise.

The biggest technical risks are not computational errors but hallucinated facts and prompt injection - where confidently presented but incorrect information slips into your deliverables, or where malicious instructions hidden in documents compromise your workflow. These risks are real and documented by the providers themselves. Mitigation requires retrieval-first design (grounding AI outputs in your firm's documents), least-privilege access controls, and strict data provenance tracking.

Looking forward, "valuation as code"- characterised by versioned inputs, model tests, logged runs, and pinned model versions - will become the competitive baseline for serious firms.

The firms that succeed will not be those with the fanciest AI, but those with the strongest governance around how that AI is used, audited, and controlled. Your clients will increasingly pay not for pages of narrative, but for audit-ready workfiles, defensible assumptions, and expert testimony readiness. This is the future of valuation practice: faster, more scalable, but ultimately more rigorous in its demands for transparency and accountability.

References

American Society of Appraisers. (2022). Business Valuation Standards. Retrieved from
https://www.appraisers.org/docs/default-source/5---standards/bv-standards-feb-2022.pdf

Anthropic. (2026a). Introducing Claude Opus 4.6. Retrieved from https://www.anthropic.com/news/claude-opus-4-6
Anthropic. (2026b). Claude Opus 4.6 Sabotage Risk Report. Retrieved from:https://www-cdn.anthropic.com/f21d93f21602ead5cdbecb8c8e1c765759d9e232.pdf

Anthropic. (2026c). Transparency Hub: Safety Evaluation Summaries. Retrieved from https://www.anthropic.com/transparency
European Securities and Markets Authority (ESMA). (2025). Leveraging Large Language Models in Finance: ILB ESMA Turing Report. Retrieved from
https://www.esma.europa.eu/sites/default/files/2025-06/LLMs_in_finance_-
_ILB_ESMA_Turing_Report.pdf

Google DeepMind. (2026). Gemini 3 Pro Model Card and Technical Documentation.
Retrieved from https://docs.cloud.google.com/vertex-ai/generative-ai/docs/models/gemini/3-pro
Huang, J. (2024). NVIDIA GTC Keynote: Jensen Huang talks about Perplexity [Video].
YouTube.

International Valuation Standards Council (IVSC). (2025). Navigating the Rise of AI in Valuation: Opportunities, Risks, and Standards. Retrieved from https://ivsc.org/wp-
content/uploads/2025/07/Navigating-the-Rise-of-AI-in-Valuation-Opportunities-Risks-and-Standards.pdf
Kim, A., Muhn, M., & Nikolaev, V. V. (2024). Financial Statement Analysis with Large Language Models. University of Chicago Working Paper. Retrieved from
https://www.suerf.org/wp-content/uploads/2024/10/SUERF-Policy-Brief-1008_Kim-et-al.pdf
Kong, Y., Lee, H., Hwang, Y., Lopez-Lira, A., Levy, B., Mehta, D., Wen, Q., Choi, C., Lee, Y., & Zohren, S. (2026). Evaluating LLMs in finance requires explicit bias consideration. Retrieved from: https://www.arxiv.org/abs/2602.14233 (arXiv:2602.14233)

Moro-Visconti, R. (2024). Artificial Intelligence-Driven FinTech Valuation: A Scalable Multilayer Network Approach. FinTech, 3, 479–495.
Moro-Visconti, R. (2025). Valuation of Artificial Intelligence. Journal of European Real Estate Research, 1-12.
National Association of Certified Valuators and Analysts (NACVA). (2025). AI Related Standards and Ethics FAQ Library.
Ojewale, V., Suresh, H., & Venkatasubramanian, S. (2026). Audit trails for accountability in large language models. Retrieved from: https://arxiv.org/abs/2601.20727

OpenAI. (2026a). Introducing GPT-5.3-Codex. Retrieved from https://openai.com/index/introducing-gpt-5-3-codex/
OpenAI. (2026b). GPT-5.3-Codex System Card. Retrieved from https://cdn.openai.com/pdf/23eca107-a9b1-4d2c-b156-7deb4fbc697c/GPT-5-3-Codex-System-Card-02.pdf
OpenAI. (2026c). API Reference: Compaction and Responses. Retrieved from https://platform.openai.com/docs/api-reference/responses
OpenAI. (2026d). Codex Cloud Internet Access and Security. Retrieved from https://developers.openai.com/codex/cloud/internet-access/
OpenAI. (2026e). Why Language Models Hallucinate. Retrieved from https://openai.com/index/why-language-models-hallucinate/
OpenAI. (2026f). Model Versioning and Snapshots. Retrieved from https://developers.openai.com/api/docs/models/gpt-5.1
OpenAI. (2026g). Introducing GPT-5.2 and Hallucination Rates. Retrieved from https://openai.com/index/introducing-gpt-5-2/