Category: Validation & Reproducibility

Input validation, benchmark packs, golden tests, provenance, repeatable runs, model/version traceability.

  • What an audit-ready pricing run should contain

    A pricing result is easy to store.

    An audit-ready pricing run is harder.

    The difference is that a pricing result tells you what the number was. An audit-ready run tells you how that number came into existence.

    That distinction matters. In quantitative finance, a number rarely stands alone. An NPV, PV01, cashflow report, curve value, or scenario output depends on market data, portfolio inputs, instrument definitions, model assumptions, convention sets, engine versions, solver settings, scenario definitions, and validation rules.

    If those pieces are not preserved, the result may be useful for a moment, but difficult to explain later.

    An audit-ready pricing run should answer a simple set of questions:

    If the system cannot answer those questions, then the pricing run is not truly audit-ready.

    A pricing result is not enough

    Most analytics workflows naturally focus on outputs.

    That is understandable. Users usually care about the NPV, the cashflows, the sensitivities, the scenario P&L, or the aggregated portfolio report. The output is what gets reviewed, exported, shared, and acted upon.

    But the output is only the final layer.

    A pricing run is a chain of decisions and transformations. It begins with user-supplied inputs. Those inputs are parsed, validated, normalized, mapped to internal schemas, combined with market data, associated with convention sets, passed to a pricing engine, transformed into outputs, and finally displayed or exported.

    Each step can affect the result.

    That means an audit-ready run cannot consist only of the final output table. It needs to preserve the full context of the calculation.

    A report that says:

    is not enough.

    A better report can answer:

    That is the difference between a number and a traceable result.

    The run identity

    Every audit-ready pricing run needs a stable identity.

    That identity should not be a filename, a spreadsheet tab, or an informal timestamp in a folder. It should be a proper run identifier that links the inputs, configuration, execution metadata, outputs, logs, warnings, and exported artifacts.

    The run ID is the anchor.

    Without a stable run ID, teams often end up asking questions like:

    A run ID gives the organization a durable reference point. It allows users, support teams, developers, and auditors to talk about the same calculation without ambiguity.

    A useful run identity should include or link to:

    • the run ID,
    • the workspace or project,
    • the tenant or organization,
    • the user or system actor that triggered the run,
    • the run type,
    • the run status,
    • the creation time,
    • the completion time,
    • and links to input and output artifacts.

    The run identity should not depend on mutable context. If a portfolio is later corrected or a market snapshot is replaced, the original run should still point to the exact inputs it used at the time.
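
    A minimal sketch of such a run identity, assuming a frozen Python dataclass. The class name, field names, and example values are hypothetical and only mirror the list above; a real system would likely persist this record in a database.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional
import uuid


@dataclass(frozen=True)
class RunIdentity:
    """Illustrative run identity record (all field names are assumptions)."""
    tenant: str
    workspace: str
    actor: str                  # user or service account that triggered the run
    run_type: str               # e.g. "pricing" or "scenario"
    status: str                 # e.g. "queued", "completed", "failed"
    created_at: datetime
    completed_at: Optional[datetime] = None
    input_artifacts: tuple = ()   # pinned artifact references, never "latest"
    output_artifacts: tuple = ()
    run_id: str = field(default_factory=lambda: str(uuid.uuid4()))


run = RunIdentity(
    tenant="acme",
    workspace="rates-desk",
    actor="svc-nightly",
    run_type="pricing",
    status="queued",
    created_at=datetime.now(timezone.utc),
    input_artifacts=("portfolio:v3", "market-snapshot:2024-05-31"),
)
```

    Because the record is frozen, a status change would be expressed as a new copy of the record (for example via dataclasses.replace) rather than an in-place edit.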

    The input references

    The most important part of an audit-ready run is the input record.

    The system should not merely store that “a portfolio was uploaded” or “a market snapshot was used.” It should store exact references to the validated inputs.

    For a basic pricing run, that usually means:

    • portfolio reference,
    • market snapshot reference,
    • curve input reference,
    • fixing data reference,
    • convention set reference,
    • pricing configuration reference,
    • scenario set reference, if applicable.

    These references should point to immutable or versioned artifacts.

    This matters because input files evolve. A user may upload a corrected portfolio. A market snapshot may be replaced. A convention set may receive a new version. A scenario recipe may be modified.

    If the run only points to “latest portfolio” or “current conventions,” then it cannot be reliably reproduced. Reproduction requires the exact inputs, not the current inputs.

    The same principle applies to derived inputs.

    If the system builds curves from market quotes, the pricing run should preserve not only the final curve outputs but also the curve construction inputs and configuration. Otherwise, a later reviewer may see a curve but not understand how it was built.
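
    One way to make input references exact is to address artifacts by a hash of their content. The sketch below assumes a toy in-memory store; ArtifactStore and its methods are hypothetical names, not part of any particular platform.

```python
import hashlib


def content_hash(payload: bytes) -> str:
    """Derive a reference from the bytes themselves, not from a filename."""
    return "sha256:" + hashlib.sha256(payload).hexdigest()


class ArtifactStore:
    """Toy in-memory store keyed by content hash (illustrative only)."""

    def __init__(self):
        self._blobs = {}

    def put(self, payload: bytes) -> str:
        ref = content_hash(payload)
        # Identical content maps to the same reference; nothing is overwritten.
        self._blobs.setdefault(ref, payload)
        return ref

    def get(self, ref: str) -> bytes:
        return self._blobs[ref]


store = ArtifactStore()
original = b"trade_id,notional\nT1,1000000\n"
corrected = b"trade_id,notional\nT1,1500000\n"

ref_v1 = store.put(original)
run_inputs = {"portfolio": ref_v1}   # the run pins the exact artifact it used

ref_v2 = store.put(corrected)        # a later correction gets a new reference

# The original run still resolves to the original bytes.
pinned_bytes = store.get(run_inputs["portfolio"])
```

    Under this scheme a corrected upload cannot silently change what an earlier run points to, because the reference is a function of the content.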

    Validation status and input quality

    An audit-ready run should record whether the inputs were validated before pricing.

    This is especially important for uploaded portfolios and market snapshots. Financial input files are not just data. They encode assumptions. They may contain missing fields, malformed dates, unsupported instruments, ambiguous currency mappings, missing fixings, duplicated trade identifiers, or inconsistent conventions.

    A pricing engine should not be the first place where these problems are discovered.

    The run record should therefore include:

    • validation status,
    • validation timestamp,
    • schema version used for validation,
    • semantic validation checks,
    • rejected rows or trades,
    • warnings,
    • non-fatal assumptions,
    • and links to structured validation reports.

    There is an important difference between a failed validation, a warning, and an accepted assumption.

    For example:

    • A missing required maturity date should probably fail validation.
    • A missing optional desk label may produce a warning.
    • A convention fallback should be explicit and visible, not silently applied.

    The audit trail should make these distinctions clear.

    If an input problem was corrected, that correction should produce a new validated artifact and a new run. The previous run should remain intact. This avoids a common source of confusion: old reports silently changing because old inputs were overwritten.
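
    The three outcomes above (failure, warning, accepted assumption) can be kept distinct in a structured validation report. This is a minimal sketch; the field names, the status labels, and the ACT/360 fallback are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    ERROR = "error"            # fails validation
    WARNING = "warning"        # accepted, but flagged
    ASSUMPTION = "assumption"  # a fallback was applied and must stay visible


@dataclass(frozen=True)
class Finding:
    trade_id: str
    severity: Severity
    message: str


def validate_trade(trade: dict, default_day_count: str = "ACT/360") -> list:
    """Return structured findings for one trade (hypothetical rules)."""
    findings = []
    tid = trade.get("trade_id", "<unknown>")
    if not trade.get("maturity_date"):
        findings.append(Finding(tid, Severity.ERROR, "missing maturity_date"))
    if not trade.get("desk"):
        findings.append(Finding(tid, Severity.WARNING, "missing optional desk label"))
    if not trade.get("day_count"):
        findings.append(
            Finding(tid, Severity.ASSUMPTION, f"day_count fallback to {default_day_count}")
        )
    return findings


def validation_status(findings) -> str:
    severities = {f.severity for f in findings}
    if Severity.ERROR in severities:
        return "failed"
    if severities:
        return "passed_with_warnings"
    return "passed"


bad = validate_trade({"trade_id": "T1", "desk": "rates", "day_count": "30/360"})
soft = validate_trade({"trade_id": "T2", "maturity_date": "2030-06-30"})
```

    Here the first trade fails because it has no maturity date, while the second passes with one warning and one recorded assumption.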

    Convention sets

    Conventions are one of the most important parts of an audit-ready pricing run.

    They are also one of the easiest things to hide accidentally.

    A trade can be represented correctly at the structural level but still produce different results under different conventions. Day-count rules, calendars, business-day adjustments, settlement lags, compounding assumptions, roll rules, fixing calendars, and curve mappings all affect the final output.

    An audit-ready run should therefore record the convention set explicitly.

    It should not rely on internal defaults that are invisible to the user. It should not rely on whatever the current library default happens to be. It should not produce a result without recording which convention version was applied.

    The run should include:

    • convention set name,
    • convention set version,
    • market or currency scope,
    • effective date or validity range, if relevant,
    • all overrides applied to the run,
    • and the relationship between conventions and instrument definitions.

    This is not merely a technical preference. It is a trust issue.

    When a user asks why two pricing runs differ, conventions are one of the first places to look. If the run record does not contain convention information, the investigation becomes guesswork.
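
    If each run stores its resolved conventions, comparing two runs becomes a dictionary diff rather than guesswork. The sketch below assumes conventions are stored as flat key/value pairs; the keys and values are illustrative.

```python
def resolve_conventions(base: dict, overrides: dict) -> dict:
    """Apply run-level overrides on top of a versioned base convention set."""
    return {**base, **overrides}


def diff_conventions(a: dict, b: dict) -> dict:
    """Return {key: (value_in_a, value_in_b)} for every key that differs."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Hypothetical convention set, version 1.
usd_swap_v1 = {
    "day_count": "30/360",
    "business_day": "ModifiedFollowing",
    "settlement_lag_days": 2,
}

run_a = resolve_conventions(usd_swap_v1, {})
run_b = resolve_conventions(usd_swap_v1, {"day_count": "ACT/360"})

difference = diff_conventions(run_a, run_b)
```

    In this example the diff isolates the day-count override as the only convention difference between the two runs.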

    Market data and curve construction

    Market data is not a single object.

    A pricing run may depend on discount curves, forward curves, fixings, volatility surfaces, FX rates, credit curves, inflation curves, or other market inputs. Even in a narrow MVP, the market snapshot and curve construction process matter.

    An audit-ready run should preserve:

    • the market snapshot identifier,
    • upload or acquisition time,
    • valuation date,
    • quote sources, if known,
    • curve construction configuration,
    • curve nodes,
    • interpolation choices,
    • extrapolation settings,
    • bootstrapping warnings,
    • missing or stale inputs,
    • and generated curve artifacts.

    The valuation date is particularly important. Many problems that appear to be pricing differences are actually date-context differences.

    A run on the same trade and same nominal market quotes can produce a different result if the valuation date, fixing availability, settlement assumptions, or curve construction setup changes.

    That is why market context should be first-class audit data, not background state.
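
    A small worked example of the valuation-date effect, assuming continuous compounding and an ACT/365 year fraction purely for illustration. The dates, rate, and cashflow are made up.

```python
from datetime import date
import math


def discount_factor(valuation_date: date, payment_date: date, zero_rate: float) -> float:
    """exp(-r * t) with t measured as actual days / 365 (an assumed convention)."""
    year_fraction = (payment_date - valuation_date).days / 365.0
    return math.exp(-zero_rate * year_fraction)


payment = date(2025, 6, 30)
cashflow = 1_000_000.0
rate = 0.04  # the same nominal quote in both runs

# Same trade, same rate, valuation dates three calendar days apart.
pv_early = cashflow * discount_factor(date(2024, 6, 28), payment, rate)
pv_late = cashflow * discount_factor(date(2024, 7, 1), payment, rate)

pv_difference = pv_late - pv_early
```

    Nothing about the trade or the quote changed between the two values; only the valuation date did, which is why it belongs in the run record.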

    Engine and model metadata

    An audit-ready pricing run should record the engine that produced the result.

    This includes both technical and quantitative metadata.

    At minimum, the run should include:

    • analytics engine version,
    • code version or build identifier,
    • container or deployment version,
    • QuantLib version, if QuantLib is part of the engine,
    • model selection,
    • solver settings,
    • numerical tolerances,
    • random seed, if stochastic methods are used,
    • and runtime warnings.

    This is essential because software changes.

    A pricing engine may receive bug fixes, model improvements, performance optimizations, dependency upgrades, or changes to numerical settings. Any of these may affect results.

    If a result changes after a software update, the team should not be forced to guess whether the difference came from the portfolio, the market data, the conventions, or the engine. The run record should show exactly which engine version was used.

    For deterministic workflows, the ideal standard is clear:

    That statement is only meaningful if the run record stores all of those pieces.
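
    One possible way to make "same inputs, same configuration, same engine" checkable is a fingerprint over those pieces. This sketch assumes they can be serialized to JSON; the function name and the metadata keys are hypothetical.

```python
import hashlib
import json
import platform


def run_fingerprint(input_refs: dict, config: dict, engine: dict) -> str:
    """Hash a canonical JSON form so key order does not affect the result."""
    payload = json.dumps(
        {"inputs": input_refs, "config": config, "engine": engine},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


inputs = {"portfolio": "sha256:aaa", "market_snapshot": "sha256:bbb"}
config = {"solver_tolerance": 1e-10, "random_seed": 42}
engine = {"engine_version": "1.4.2", "python": platform.python_version()}

fp_original = run_fingerprint(inputs, config, engine)

# Same content supplied in a different key order yields the same fingerprint.
fp_reordered = run_fingerprint(
    {"market_snapshot": "sha256:bbb", "portfolio": "sha256:aaa"}, config, engine
)

# An engine upgrade changes the fingerprint even if inputs are untouched.
fp_upgraded = run_fingerprint(inputs, config, {**engine, "engine_version": "1.5.0"})
```

    Two runs with equal fingerprints are expected to agree for a deterministic engine; a differing fingerprint shows that some recorded piece changed, though not which one.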

    Scenario definition

    If the pricing run includes scenario analysis, the scenario definition must be part of the audit trail.

    It is not enough to say that the run used a “parallel shift” or “stress scenario.” The run should preserve the actual scenario recipe.

    That includes:

    • scenario set name,
    • scenario set version,
    • risk factors affected,
    • shift sizes,
    • shift types,
    • currencies or curves affected,
    • whether shifts are absolute or relative,
    • ordering of transformations,
    • base market snapshot,
    • and aggregation rules.

    Scenario analysis can become difficult to audit because the number of outputs grows quickly. A single portfolio may be priced under many shifts. A scenario run may fan out into many child calculations. Some may succeed, some may fail, and some may need retries.

    An audit-ready scenario run should therefore preserve both the parent run and the child runs.

    The parent run records the scenario set, base inputs, and aggregation logic. The child runs record each individual pricing calculation. The final report links back to the full structure.

    This makes it possible to answer:

    Without that structure, scenario analysis can become a collection of disconnected numbers.
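
    The need to record shift types and the ordering of transformations can be seen in a small sketch. It assumes curves are plain lists of zero rates and that a relative shift multiplies by (1 + size); both are simplifications, and the Shift class is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shift:
    curve: str
    kind: str    # "absolute" adds size; "relative" multiplies by (1 + size)
    size: float


def apply_shifts(curves: dict, shifts) -> dict:
    """Apply shifts in the given order and return new curves (inputs untouched)."""
    shifted = {name: list(rates) for name, rates in curves.items()}
    for s in shifts:
        if s.kind == "absolute":
            shifted[s.curve] = [r + s.size for r in shifted[s.curve]]
        elif s.kind == "relative":
            shifted[s.curve] = [r * (1.0 + s.size) for r in shifted[s.curve]]
        else:
            raise ValueError(f"unknown shift kind: {s.kind}")
    return shifted


base = {"USD-OIS": [0.02, 0.03]}
bump = Shift("USD-OIS", "absolute", 0.01)   # +100bp
scale = Shift("USD-OIS", "relative", 0.10)  # +10%

bump_then_scale = apply_shifts(base, [bump, scale])
scale_then_bump = apply_shifts(base, [scale, bump])
```

    The two orderings give different shifted curves from the same base and the same pair of shifts, so a recipe that omits the order is not a complete definition.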

    Outputs and reports

    Outputs should be structured, not just displayed.

    An audit-ready run should preserve machine-readable outputs and human-readable reports. The report is useful for review and communication. The structured output is useful for comparison, integration, replay, and downstream analysis.

    Depending on the run type, outputs may include:

    • trade-level NPV,
    • cashflows,
    • accrued amounts,
    • curve dependencies,
    • PV01 or DV01,
    • Greeks,
    • scenario outputs,
    • portfolio aggregations,
    • bucketed sensitivities,
    • validation warnings,
    • pricing exceptions,
    • and summary metrics.

    Every export should link back to the run that produced it.

    This is important because exports often travel outside the platform. A CSV file may be sent by email. A PDF may be shared with a client. A JSON file may be loaded into another internal system.

    If the exported file does not contain a run reference and metadata manifest, then it can become detached from its origin.
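
    A metadata manifest can be sketched as a small record that travels with the export and carries a hash of the exported bytes. The manifest keys below are assumptions, not a standard format.

```python
import hashlib


def build_manifest(run_id: str, engine_version: str, input_refs: dict,
                   export_bytes: bytes) -> dict:
    """Describe where an export came from and what it contained."""
    return {
        "run_id": run_id,
        "engine_version": engine_version,
        "input_refs": dict(input_refs),
        "export_sha256": hashlib.sha256(export_bytes).hexdigest(),
    }


def verify_export(manifest: dict, export_bytes: bytes) -> bool:
    """Check that a file still matches the manifest it was shipped with."""
    return hashlib.sha256(export_bytes).hexdigest() == manifest["export_sha256"]


csv_export = b"trade_id,npv\nT1,12345.67\n"
manifest = build_manifest(
    run_id="run-001",
    engine_version="1.4.2",
    input_refs={"portfolio": "sha256:aaa", "market_snapshot": "sha256:bbb"},
    export_bytes=csv_export,
)

untouched_ok = verify_export(manifest, csv_export)
edited_ok = verify_export(manifest, b"trade_id,npv\nT1,99999.99\n")
```

    A recipient holding both the file and the manifest can tell which run produced the export and whether the file was edited afterwards.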

    An audit-ready export should therefore contain enough metadata to say:

    Logs, warnings, and exceptions

    A clean output is not always a clean run.

    An audit-ready run should preserve logs, warnings, and exceptions in a structured way.

    This does not mean exposing every internal log line to the end user. It means recording enough diagnostic information to support later investigation.

    Useful run diagnostics include:

    • validation warnings,
    • missing input warnings,
    • curve construction warnings,
    • fallback assumptions,
    • unsupported trade messages,
    • numerical convergence issues,
    • partial failure information,
    • retry attempts,
    • timeout information,
    • and engine warnings.

    Warnings should not disappear just because the run completed.

    A completed run with warnings is different from a completed run without warnings. Users should be able to see that difference.

    This is especially important for portfolio runs. A portfolio-level summary can look successful even if a small number of trades failed, were skipped, or were priced with assumptions that require review.

    Audit-ready systems should make partial success visible.
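
    A minimal sketch of deriving a portfolio-level status from trade-level results, so that warnings and partial failures are not lost in the summary. The status labels and result fields are hypothetical.

```python
from collections import Counter


def summarize_run(trade_results: list) -> dict:
    """Roll trade-level outcomes up into a run status that shows partial success."""
    counts = Counter(r["status"] for r in trade_results)
    warning_count = sum(len(r.get("warnings", [])) for r in trade_results)

    if counts["failed"] or counts["skipped"]:
        status = "partial" if counts["priced"] else "failed"
    elif warning_count:
        status = "completed_with_warnings"
    else:
        status = "completed"

    return {
        "status": status,
        "priced": counts["priced"],
        "failed": counts["failed"],
        "skipped": counts["skipped"],
        "warnings": warning_count,
    }


results = [
    {"trade_id": "T1", "status": "priced", "npv": 1200.0},
    {"trade_id": "T2", "status": "priced", "npv": -300.0,
     "warnings": ["missing fixing, fallback applied"]},
    {"trade_id": "T3", "status": "failed", "error": "unsupported instrument"},
]

summary = summarize_run(results)
clean = summarize_run(results[:1])
with_warning = summarize_run(results[:2])
```

    With this kind of roll-up, a run where one trade out of three failed is reported as partial rather than as a plain success.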

    Access, tenant, and actor context

    A pricing run should also preserve who did what.

    At minimum, the run record should capture:

    • organization or tenant,
    • workspace,
    • user or service actor,
    • permissions context,
    • time submitted,
    • time completed,
    • and any automated process that triggered the run.

    For scheduled jobs or API-driven runs, the actor may not be a human clicking a button. It may be a service account, scheduled process, or integration.

    That distinction matters.

    If a report was created manually, users may expect one review process. If it was generated automatically through an API, they may expect another. If the run belongs to a particular workspace or client environment, access to the results must remain scoped correctly.

    Audit-readiness is not only about calculation provenance. It is also about operational control.

    Immutability

    An audit-ready run should be immutable after completion.

    That does not mean mistakes cannot be corrected. It means corrections should create new runs or new artifacts rather than overwriting the original record.

    This principle is simple: corrections add new records; they never overwrite old ones.

    If a portfolio was wrong, upload a corrected portfolio and run again. If a convention set changed, create a new version and run again. If a report needs a correction, generate a corrected report linked to the new run.

    This avoids ambiguity.

    Overwriting may feel convenient in the short term, but it creates problems later. Users may no longer know which version of a result was sent, reviewed, or used in a decision.

    Immutable run records make the history explicit.
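
    The idea can be sketched as a write-once store in which a correction is a new run that links back to the one it supersedes. RunStore and the "supersedes" field are hypothetical, and the read-only wrapper used here is shallow.

```python
from types import MappingProxyType


class RunStore:
    """Toy write-once store: a run ID can be saved exactly once."""

    def __init__(self):
        self._runs = {}

    def save(self, run_id: str, record: dict) -> None:
        if run_id in self._runs:
            raise ValueError(f"run {run_id} already exists; create a new run instead")
        # Read-only view of a copy (shallow: nested objects are not protected).
        self._runs[run_id] = MappingProxyType(dict(record))

    def get(self, run_id: str):
        return self._runs[run_id]


store = RunStore()
store.save("run-001", {"portfolio": "sha256:aaa", "npv": 1200.0})

# Attempting to overwrite the completed run is rejected.
try:
    store.save("run-001", {"portfolio": "sha256:ccc", "npv": 1350.0})
    overwrite_blocked = False
except ValueError:
    overwrite_blocked = True

# The correction is recorded as a new run that points at the old one.
store.save("run-002", {"portfolio": "sha256:ccc", "npv": 1350.0,
                       "supersedes": "run-001"})
```

    Both results remain available, and the link between them records which one replaced the other.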

    A practical audit-ready run checklist

    A pricing run is audit-ready when it contains, or links to, the following:

    Run identity

    • run ID,
    • workspace,
    • tenant or organization,
    • actor,
    • run type,
    • timestamps,
    • status.

    Input references

    • portfolio artifact,
    • market snapshot artifact,
    • curve input artifact,
    • fixing data,
    • convention set version,
    • pricing configuration,
    • scenario set, if applicable.

    Validation record

    • schema version,
    • validation status,
    • validation warnings,
    • rejected rows or trades,
    • semantic validation checks,
    • structured error report.

    Market and curve context

    • valuation date,
    • curve construction settings,
    • curve nodes,
    • interpolation and extrapolation choices,
    • bootstrapping warnings.

    Engine metadata

    • analytics engine version,
    • code or container version,
    • library versions,
    • model selection,
    • solver settings,
    • tolerances,
    • runtime warnings.

    Scenario metadata

    • scenario recipe,
    • scenario version,
    • shifts applied,
    • affected risk factors,
    • child run structure,
    • aggregation rules.

    Outputs

    • trade-level results,
    • portfolio aggregation,
    • cashflows,
    • sensitivities,
    • scenario outputs,
    • machine-readable result file,
    • human-readable export,
    • metadata manifest.

    Provenance

    • content hashes,
    • immutable artifact links,
    • logs,
    • warnings,
    • full traceability chain.

    This checklist is not bureaucracy. It is what allows a team to trust the output later.
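
    Part of the checklist can be enforced mechanically. The sketch below assumes a run record is a nested dictionary whose section and field names follow the checklist; the exact names are illustrative and only a few sections are shown.

```python
REQUIRED_FIELDS = {
    "identity": ["run_id", "workspace", "tenant", "actor", "run_type",
                 "timestamps", "status"],
    "inputs": ["portfolio", "market_snapshot", "convention_set_version",
               "pricing_configuration"],
    "validation": ["schema_version", "status"],
    "engine": ["engine_version", "library_versions", "solver_settings"],
}


def missing_items(record: dict) -> list:
    """List every required 'section.field' that is absent or empty."""
    missing = []
    for section, fields in REQUIRED_FIELDS.items():
        present = record.get(section, {})
        for name in fields:
            if present.get(name) in (None, "", [], {}):
                missing.append(f"{section}.{name}")
    return missing


record = {
    "identity": {"run_id": "run-001", "workspace": "rates-desk", "tenant": "acme",
                 "actor": "svc-nightly", "run_type": "pricing",
                 "timestamps": {"created": "2024-05-31T18:00:00Z"},
                 "status": "completed"},
    "inputs": {"portfolio": "sha256:aaa", "market_snapshot": "sha256:bbb",
               "pricing_configuration": "cfg-7"},
    "validation": {"schema_version": "2.1", "status": "passed"},
    "engine": {"engine_version": "1.4.2", "library_versions": {"numpy": "1.26"},
               "solver_settings": {"tolerance": 1e-10}},
}

gaps = missing_items(record)
```

    In this example the only gap reported is the missing convention set version, which is exactly the kind of omission that is hard to notice by eye.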

    The goal is not just compliance

    Audit-ready workflows are sometimes discussed as if they only matter for compliance or external review.

    That is too narrow.

    The same structure helps with ordinary day-to-day work.

    It helps developers debug pricing discrepancies. It helps quants compare model changes. It helps risk users understand why portfolio numbers moved. It helps support teams investigate customer questions. It helps managers know whether a report was complete, partial, corrected, or superseded.

    Auditability is not only a defensive feature.

    It is a productivity feature.

    A team that can reproduce and explain its analytics can move faster because it spends less time reconstructing what happened.

    Closing thought

    An audit-ready pricing run is not just a result table.

    It is a complete, traceable calculation record.

    It captures the identity of the run, the inputs, the validation state, the conventions, the market data, the engine version, the scenario definition, the outputs, the warnings, the actor, and the provenance chain.

    That may sound like extra structure, but in practice it reduces confusion.

    It turns pricing from a one-off calculation into a reliable workflow.

    And for teams that depend on financial analytics, that reliability is often just as important as the pricing model itself.

  • Why pricing is not the hard part: the operational gap in quant analytics

    Many teams can price a trade.

    That is not the same as having a reliable analytics workflow.

    In quantitative finance, the difficult part is often not the mathematical model in isolation. It is everything around it: the market data snapshot, the curve construction inputs, the conventions, the portfolio representation, the run configuration, the scenario definition, the output format, the audit trail, and the ability to reproduce the result later.

    A pricing library, spreadsheet, notebook, or internal script can answer a narrow question:

    A production analytics workflow has to answer a broader set of questions:

    That is the operational gap in quant analytics.

    Pricing is usually only one step in the workflow

    Pricing gets most of the attention because it is the visible output. A user uploads a trade, runs a calculation, and sees an NPV, PV01, cashflow table, or scenario result.

    But before that number appears, many things have already happened.

    The system needs to know what the trade is. It needs to parse the portfolio definition. It needs to validate required fields. It needs to understand currencies, calendars, day-count conventions, roll rules, settlement rules, compounding assumptions, fixings, curve dependencies, and scenario shifts.

    Then it needs to build or select the right curves. It needs to apply the right market snapshot. It needs to submit the job to a compute engine. It needs to handle errors, retries, and partial failures. It needs to store the result in a way that can be inspected, exported, compared, and reproduced.

    The pricing model may be mathematically sophisticated. But from an operational perspective, the pricing call is only one function inside a much larger system.

    For small teams, this is where the pain starts.

    The spreadsheet and notebook stage works — until it doesn’t

    Spreadsheets and notebooks are incredibly useful. They are flexible, fast to modify, and easy to inspect. They are often the right tools for exploration, prototyping, and one-off analysis.

    The problem is that exploratory workflows often become operational workflows by accident.

    A spreadsheet that started as a quick calculation becomes a daily risk report. A notebook that started as a research prototype becomes a recurring valuation process. A script that worked for one portfolio becomes the unofficial engine for multiple desks, clients, or internal stakeholders.

    At that point, the questions change.

    It is no longer enough that the calculation works today. The team needs to know whether the result can be trusted tomorrow, repeated next month, and explained six months later.

    That requires structure.

    Without structure, teams end up depending on fragile manual processes:

    • files copied between folders,
    • market data snapshots renamed by hand,
    • curve inputs edited in spreadsheets,
    • assumptions embedded in code,
    • scenario definitions scattered across notebooks,
    • exports generated manually,
    • and results saved without full provenance.

    None of these problems is necessarily dramatic in isolation. But together they create a workflow that is hard to scale and hard to trust.

    The hidden cost of “it priced successfully”

    A successful pricing result can still be operationally weak.

    The calculation may have completed. The output may look plausible. The NPV may be within an expected range. The report may have been delivered.

    But what does that success actually prove?

    It does not necessarily prove that the input portfolio was valid. It does not prove that the correct convention set was used. It does not prove that the same market snapshot can be recovered. It does not prove that the scenario definition was versioned. It does not prove that the result can be reproduced.

    In production analytics, “the calculation ran” is not enough.

    A better standard is:

    That is a much higher bar.

    It is also the bar that financial analytics software needs to meet if it is going to support real workflows rather than isolated calculations.

    Conventions are part of the product, not an implementation detail

    One reason quant analytics is difficult to operationalize is that many errors are quiet.

    A missing field is easy to detect. A malformed CSV is easy to reject. A failed pricing call is easy to notice.

    Convention errors are harder.

    A trade can be syntactically valid but semantically wrong. A calendar assumption can be different from what the user expected. A day-count convention can be inconsistent. A settlement rule can be implicit. A curve mapping can be ambiguous. A missing fixing can be handled in a way that is technically convenient but financially misleading.

    These issues do not always produce obvious failures. Often, they produce numbers.

    That is why convention handling cannot be treated as a hidden backend detail. It has to be visible, explicit, versioned, and included in the provenance of the run.

    A robust analytics workflow should make it clear which conventions were used and where they came from. It should avoid silent defaults wherever possible. It should prefer explicit configuration over hidden assumptions.

    For users, this matters because the question is not only whether a number was produced. The question is whether the number was produced under the right assumptions.

    Reproducibility is more than rerunning code

    Reproducibility is often discussed as if it means “use the same code again.”

    That is only part of the story.

    In analytics, reproducibility requires the full calculation context:

    • the portfolio version,
    • the market snapshot,
    • the curve inputs,
    • the convention set,
    • the model and engine version,
    • the scenario definition,
    • the solver settings,
    • the generated outputs,
    • the warnings,
    • the logs,
    • and the identity and timing of the run.

    If any of these pieces are missing, the result may be hard to reconstruct.

    This is especially important for portfolio-level analytics and scenario analysis. A single valuation is already context-dependent. A portfolio run with multiple trades, curves, market data inputs, and scenario shifts is even more so.

    Without a complete run record, a team can end up asking: “Why did this result change?” and have no reliable way to answer.

    With a complete run record, the question becomes easier to investigate. Did the portfolio change? Did the market snapshot change? Did a convention set change? Did the engine version change? Did a scenario recipe change? Did an input fail validation? Did the run complete fully or produce partial results?

    The difference is not just technical. It changes how teams work.
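
    With complete run records, those questions reduce to a field-by-field comparison. A minimal sketch, assuming each run record is a flat dictionary with the hypothetical keys shown:

```python
CONTEXT_KEYS = (
    "portfolio",
    "market_snapshot",
    "convention_set",
    "engine_version",
    "scenario_recipe",
    "validation_status",
    "completion",
)


def changed_context(run_a: dict, run_b: dict, keys=CONTEXT_KEYS) -> list:
    """Return the context fields whose recorded values differ between two runs."""
    return [k for k in keys if run_a.get(k) != run_b.get(k)]


monday = {
    "portfolio": "sha256:aaa",
    "market_snapshot": "sha256:bbb",
    "convention_set": "USD-SWAP@1",
    "engine_version": "1.4.2",
    "scenario_recipe": "parallel-100bp@3",
    "validation_status": "passed",
    "completion": "full",
}
tuesday = {**monday, "market_snapshot": "sha256:ddd", "engine_version": "1.5.0"}

candidates = changed_context(monday, tuesday)
```

    The comparison does not explain the size of the move, but it narrows the investigation to the pieces that actually changed.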

    Scenario analysis needs operational discipline

    Scenario analysis is one of the places where the operational gap becomes most visible.

    Defining a parallel shift, twist, basis move, or volatility shock is conceptually straightforward. Running that scenario consistently across a portfolio, storing the result, comparing it with prior runs, and making the output explainable is more difficult.

    A scenario should not just be a set of ad-hoc changes applied to a spreadsheet.

    It should be a reusable recipe.

    That recipe should have a name, a definition, a version, and a clear relationship to the market snapshot and portfolio being analyzed. If the same scenario is run again, the team should know whether it is truly the same scenario or a modified version.

    This becomes more important as the number of scenarios grows. A few manual shifts are manageable. A structured scenario grid across multiple portfolios, currencies, curves, and risk factors needs orchestration.

    Otherwise, the team may be able to run scenarios but not manage them.
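
    A structured scenario grid can be generated from a named, versioned recipe rather than assembled by hand. The recipe layout below is an assumption made for illustration.

```python
from itertools import product


def expand_grid(recipe: dict) -> list:
    """Expand a recipe into one scenario per (curve, shift) combination."""
    return [
        {
            "recipe": recipe["name"],
            "version": recipe["version"],
            "curve": curve,
            "shift_bp": shift_bp,
        }
        for curve, shift_bp in product(recipe["curves"], recipe["shifts_bp"])
    ]


recipe = {
    "name": "parallel-shifts",
    "version": 3,
    "curves": ["USD-OIS", "EUR-ESTR"],
    "shifts_bp": [-100, 0, 100],
}

scenarios = expand_grid(recipe)
```

    Every generated scenario carries the recipe name and version, so a later run can tell whether it used the same grid or a modified one.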

    The real product is the workflow

    A useful quant analytics platform is not just a pricing engine exposed through a web interface.

    The real product is the workflow around the engine.

    That workflow should help users move from raw inputs to validated artifacts, from calculation jobs to structured outputs, and from one-off results to repeatable analytics.

    A robust workflow should include:

    • upload and validation of market snapshots and portfolios,
    • clear error reports for rejected inputs,
    • explicit convention sets,
    • controlled curve construction,
    • asynchronous job execution,
    • scenario definitions,
    • structured outputs,
    • exports,
    • audit logs,
    • and immutable run artifacts.

    This is the difference between a calculation tool and an analytics control plane.

    The calculation tool answers a request. The analytics control plane manages the lifecycle of the request.
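
    Managing the lifecycle of a request can be sketched as a small state machine with an explicit history. The state names and allowed transitions below are hypothetical.

```python
ALLOWED_TRANSITIONS = {
    "uploaded": {"validated", "rejected"},
    "validated": {"configured"},
    "configured": {"running"},
    "running": {"completed", "failed"},
    "completed": {"exported"},
}


def advance(history: list, new_state: str) -> list:
    """Return a new history with new_state appended, if the transition is allowed."""
    current = history[-1]
    if new_state not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"cannot move from {current!r} to {new_state!r}")
    return history + [new_state]


history = ["uploaded"]
for state in ("validated", "configured", "running", "completed", "exported"):
    history = advance(history, state)

# Skipping validation is not a legal path to a running job.
try:
    advance(["uploaded"], "running")
    shortcut_blocked = False
except ValueError:
    shortcut_blocked = True
```

    Keeping the full history, rather than only the current state, is what lets a later reviewer see how a request reached its final status.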

    Why this matters for smaller teams

    Large institutions often have internal platforms, engineering teams, model governance processes, infrastructure teams, and established risk systems. Even there, operational complexity remains difficult.

    For smaller funds, fintech teams, boutique firms, treasury teams, and consulting groups, the challenge is sharper.

    They may have the quantitative knowledge to price instruments. They may have access to open-source libraries. They may have skilled engineers. But building and maintaining a reliable analytics stack around the pricing library is still a significant investment.

    That stack includes APIs, authentication, storage, validation, job orchestration, reporting, monitoring, security, tenant isolation, and deployment. It also includes the domain-specific work of making conventions, curves, portfolios, scenarios, and outputs behave consistently.

    This is the gap between a library and a platform.

    Libraries are powerful, but they are not complete operating environments. Enterprise platforms are complete, but they can be heavy, expensive, and slow to adopt.

    There is room for a middle layer: focused, API-first analytics infrastructure that helps teams operationalize pricing and risk without building the full stack themselves.

    Good analytics infrastructure should make correctness easier

    Correctness in quant analytics is not only a matter of model implementation. It is also a matter of system design.

    The system should make good behavior easier than bad behavior.

    That means validating inputs before they reach the pricing engine. It means treating user-supplied files carefully. It means avoiding hidden defaults. It means preserving artifacts rather than overwriting them. It means recording the engine version and solver settings. It means making exports traceable to the run that produced them.

    It also means designing for failure.

    Files will be malformed. Portfolios will contain missing fields. Market snapshots will be incomplete. Jobs will fail. Long-running calculations will need retries. Users will ask why two reports differ.

    A mature analytics workflow does not pretend these problems will disappear. It gives teams a structured way to handle them.
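
    As one example of handling failure in a structured way, a retry wrapper can record every attempt instead of hiding the failed ones. This is a sketch only: the function name is hypothetical, and waiting between attempts (backoff) is left out for brevity.

```python
def run_with_retries(job, max_attempts: int = 3):
    """Call job() up to max_attempts times and keep a record of each attempt."""
    attempts = []
    for number in range(1, max_attempts + 1):
        try:
            result = job()
        except Exception as exc:  # broad on purpose: every failure is recorded
            attempts.append({"attempt": number, "outcome": "error",
                             "error": repr(exc)})
        else:
            attempts.append({"attempt": number, "outcome": "ok"})
            return result, attempts
    return None, attempts


# A stand-in job that times out twice before succeeding.
calls = {"count": 0}


def flaky_job():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("engine timed out")
    return 123.45


result, attempts = run_with_retries(flaky_job)
```

    The run succeeds on the third attempt, and the two earlier timeouts stay in the record as diagnostics.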

    From “can we price it?” to “can we trust the workflow?”

    The first question in any analytics project is usually: “Can we price it?”

    That question matters. But it is not the final question.

    The more important production questions are:

    This is where many teams feel the gap.

    They are not missing a formula. They are missing an operational layer.

    Closing thought

    Pricing is important. But pricing alone is not enough.

    The hard part is turning pricing into a dependable workflow: one that accepts real inputs, handles real errors, preserves real provenance, supports real reporting, and helps teams trust the numbers they produce.

    That is the operational gap in quant analytics.

    And for many teams, closing that gap may be more valuable than adding another model or another instrument too early.

    The first step is not to build everything.

    The first step is to make the core workflow reliable: upload, validate, configure, run, explain, export, and reproduce.

    That is where quant analytics starts to become infrastructure.