A Clinical Operating System for Agentic Trial Operations

Architecture, Integration, and Operational Design

1 The Case for a Clinical Operating System

Clinical trials consume more than 60% of drug development time and cost: $1–2.6 billion (including opportunity cost) and 10–15 years per approved drug. The clinical development operations market reached $89 billion in 2024 and is projected to double to $176 billion by 2034 (Polaris Market Research, 2025). Yet the workforce running these trials is strained: thousands of positions are unfilled in the US, fully burdened employee costs exceed $200,000 and are rising, and median tenure is under 3 years. A Phase III trial involves thousands of tasks, 50+ vendors, and hundreds of staff across a dozen disconnected platforms. Complexity has simply outpaced human capacity.

When site engagement slips, the costs compound quickly. Enrollment delays cost $500,000 or more per day in lost peak sales, with blockbuster drugs losing several million per day; roughly 70% of trials are delayed; and manual document management generates the kind of rework that consumes most of inspection preparation time.

Major vendors—Medidata, Veeva, IQVIA—have responded with real improvements: document classifiers, enrollment predictors, anomaly detection. But each optimizes within its own platform. The coordination work that lives in the seams between systems—chasing documents, reconciling data, tracking who owes what—remains a human job. Past tools also struggled because they asked clinical teams to change how they work. Nytel is built to fit existing workflows, not replace them.

Nytel is designed to turn clinical trial SOPs into executable, audit-ready AI workflows—one clinical operations expert and an agentic system running trial execution around the clock, with humans supervising rather than manually driving every step.

A common question is whether clinical trials really need a purpose-built ontology when other complex industries, like aviation, manage without one. The comparison actually argues the other way. Airlines run on systems that are already fully digitized: decision logic is narrow and deterministic, encoded directly into software. Clinical trials run on regulatory text—SOPs and Working Practices in natural language, full of conditional rules (“if the site does not respond within 10 business days, escalate”), expiry constraints (“GCP training must be current within 3 years”), and cross-document dependencies (“sub-investigators on the FDA 1572 must appear on the Delegation of Authority Log”). You can’t hardcode that; you need an intermediate representation (SOP-IR) that preserves regulatory intent, stays editable by humans, and can be executed by agents. The ontology isn’t extra complexity—it’s the minimum structure needed to make a PDF into a workflow.
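The three kinds of rule quoted above could be represented roughly as follows. This is a minimal sketch, not Nytel's actual SOP-IR schema: the class names, field names, and the approximation of three years as 3 × 365 days are all assumptions made for illustration. Each rule keeps the source sentence alongside the executable check, so a human can still read and edit the regulatory intent.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class EscalationRule:
    """Conditional rule: escalate once a business-day window has elapsed."""
    source_text: str          # the SOP sentence the rule was derived from
    max_business_days: int

    def is_due(self, business_days_without_response: int) -> bool:
        return business_days_without_response >= self.max_business_days


@dataclass(frozen=True)
class ExpiryRule:
    """Expiry constraint: a document must be dated within a validity window."""
    source_text: str
    validity_days: int

    def is_current(self, issued: date, today: date) -> bool:
        return today - issued <= timedelta(days=self.validity_days)


@dataclass(frozen=True)
class CrossReferenceRule:
    """Cross-document dependency: names in one document must appear in another."""
    source_text: str

    def missing(self, required: set[str], present: set[str]) -> set[str]:
        return required - present


# Hypothetical instances built from the three example sentences.
escalate = EscalationRule(
    "if the site does not respond within 10 business days, escalate", 10)
gcp = ExpiryRule("GCP training must be current within 3 years", 3 * 365)
xref = CrossReferenceRule(
    "sub-investigators on the FDA 1572 must appear on the "
    "Delegation of Authority Log")
```

An agent would evaluate these objects rather than re-reading the PDF, while the `source_text` field preserves the link back to the SOP for audit.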

2 Architecture: One Agent or Many?

Every agent action follows the same three-step pattern: Retrieve data from clinical systems; Analyze using deterministic logic and code, not generative LLMs, so that decisions can be audited; Create regulator-ready outputs. Whether the agent is reviewing an incoming CV or drafting an escalation email, the structure is the same.
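As an illustration of the Retrieve, Analyze, Create pattern, the sketch below reviews a single CV. Everything here is hypothetical: the `CvRecord` shape, the in-memory dictionary standing in for a clinical system, and the 730-day window (an approximation of a two-year CV currency rule) are assumptions for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class CvRecord:
    """Hypothetical shape of the data retrieved for one investigator CV."""
    investigator: str
    signed_on: Optional[date]


def retrieve(store: dict, doc_id: str) -> CvRecord:
    """Retrieve: read document state (a dict stands in for a clinical system)."""
    return store[doc_id]


def analyze(cv: CvRecord, today: date, max_age_days: int = 730) -> list[str]:
    """Analyze: deterministic checks only, so each finding can be traced."""
    findings = []
    if cv.signed_on is None:
        findings.append("missing signature date")
    elif (today - cv.signed_on).days > max_age_days:
        findings.append("CV older than allowed window")
    return findings


def create(cv: CvRecord, findings: list[str]) -> str:
    """Create: a one-line review result suitable for a report."""
    if not findings:
        return f"{cv.investigator}: PASS"
    return f"{cv.investigator}: FAIL ({'; '.join(findings)})"


def run_action(store: dict, doc_id: str, today: date) -> str:
    cv = retrieve(store, doc_id)
    return create(cv, analyze(cv, today))
```

Keeping the three steps as separate functions means the Analyze step can be unit-tested and audited without touching any integration code.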

The choice between a single agent and a multi-agent pipeline is worth thinking through carefully. A single agent is fast to build and easy to deploy—no orchestration layer, no handoff contracts. The downside is that debugging is harder: when something goes wrong, you have only the final output to work from and no intermediate artifacts to inspect. A multi-agent pipeline produces structured outputs at each stage, making failures much easier to localize. The tradeoff is coordination overhead and more moving parts to maintain.

For the MVP, a single agent covering the full TMF review workflow is the right starting point. The workflow is well-defined, document volume is modest, and the faster iteration cycle matters more early on. Moving to a pipeline makes sense when specific failure modes emerge—not before.

A production deployment for a full Phase III trial calls for roughly 5–8 specialized agents sharing a common knowledge layer: the SOP-IR ontology, a site status database, and message history. They communicate via structured handoffs, not natural language.

Table 1: Agent roles for full trial coverage.

Tier	Agent	Function
Ingestion	Ingestion	Email, OCR, staging
Processing	Classification	TMF matching, extraction
Processing	Verification	ALCOA++, expiry, cross-refs
Orchestration	Resolution	Issue tracking, escalation
Orchestration	Follow-up	Email drafting, templates
Operations	Monitoring	Site visits, data review
Operations	Safety	AE/SAE detection, reporting
Operations	Protocol Deviation	Deviation detection, CAPA
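A structured handoff between the agents in Table 1 might look like the sketch below. The `Handoff` fields and the JSON encoding are assumptions for illustration; the source specifies only that handoffs are structured rather than natural language.

```python
import json
from dataclasses import dataclass, asdict

# Agent names taken from Table 1.
AGENTS = {
    "Ingestion", "Classification", "Verification", "Resolution",
    "Follow-up", "Monitoring", "Safety", "Protocol Deviation",
}


@dataclass(frozen=True)
class Handoff:
    """Hypothetical message passed from one agent to the next."""
    from_agent: str
    to_agent: str
    site_id: str
    document_id: str
    findings: tuple[str, ...] = ()

    def __post_init__(self) -> None:
        # Reject handoffs that name an agent outside the known roster.
        for name in (self.from_agent, self.to_agent):
            if name not in AGENTS:
                raise ValueError(f"unknown agent: {name}")

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)

    @staticmethod
    def from_json(payload: str) -> "Handoff":
        data = json.loads(payload)
        data["findings"] = tuple(data["findings"])  # JSON lists back to tuple
        return Handoff(**data)
```

Because each handoff is a typed record, a failed stage leaves an inspectable artifact behind, which is the debugging advantage of the multi-agent pipeline described earlier.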

On cost: classification and extraction, the high-frequency, lower-stakes work, run well on smaller, cheaper models (Haiku/Flash tier) and account for 70–80% of LLM calls. Verification and draft generation, where reasoning quality matters, warrant frontier models (Sonnet/Opus). The savings from this split are substantial, though the exact figures depend on volume and how complex the reasoning tasks turn out to be in practice.
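The routing split can be sketched as a simple lookup, with a helper that estimates the blended per-call cost relative to sending everything to a frontier model. The task labels are hypothetical, and the helper takes per-call costs as inputs rather than assuming any real price; it also ignores differences in token counts between task types.

```python
SMALL_TIER_TASKS = {"classification", "extraction"}
FRONTIER_TIER_TASKS = {"verification", "draft_generation"}


def model_tier(task: str) -> str:
    """Route a task to a model tier; unknown tasks fail loudly."""
    if task in SMALL_TIER_TASKS:
        return "small"
    if task in FRONTIER_TIER_TASKS:
        return "frontier"
    raise ValueError(f"no tier configured for task: {task}")


def relative_cost(small_share: float, small_cost: float,
                  frontier_cost: float) -> float:
    """Blended cost per call as a fraction of an all-frontier baseline.

    small_share is the fraction of calls routed to the small tier;
    the two costs are per-call figures in any consistent unit.
    """
    blended = small_share * small_cost + (1 - small_share) * frontier_cost
    return blended / frontier_cost
```

For example, with 75% of calls on the small tier and a purely illustrative 10:1 cost ratio, `relative_cost(0.75, 1.0, 10.0)` evaluates to 0.325, meaning about a third of the all-frontier cost. The ratio is an assumption for the arithmetic, not a quoted price.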

3 Integration Strategy

Nytel is designed to sit above CTMS, eTMF, and EDC as an orchestration layer—reading state from those systems and writing actions back, without replacing them. Integration unfolds in two phases. Phase 1 runs through email and a standalone UI: no API credentials needed, no IT procurement, no changes to partner systems. Phase 2 moves to direct API integration once the workflows have been validated and sponsors are ready to grant access—a process that typically takes weeks to months in enterprise environments and should not be assumed to happen quickly.

System	Mode	Data Objects
eTMF	Read-only	Doc metadata, artifact status, TOC, zones
CTMS	Read + write	Activation dates, visits, contacts, milestones
EDC	Read-only	CRF status, query counts, data lock
Email	Read + send	Doc submissions, coordinator comms

In practice, eTMF read-only means replicating what a CRA does in a periodic review: pull the artifact inventory for a site, compare it against the expected checklist for the current milestone, apply quality rules, and generate a gap report. The minimal API surface for Veeva Vault covers artifact listing, metadata retrieval, and status updates; Vault Query Language (VQL) is the preferred interface for more structured queries. Where APIs don’t exist, email-based submission with a parallel Nytel document registry is the fallback—UI automation is not, because it breaks with every platform update.
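The periodic review described above reduces to a comparison between an artifact inventory and a milestone checklist. The sketch below assumes a hypothetical `Artifact` record and hypothetical status values ("approved", "draft"); it does not call any eTMF API, and quality rules beyond approval status are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Artifact:
    """Hypothetical inventory entry pulled from the eTMF for one site."""
    doc_type: str
    status: str  # assumed values: "approved", "draft"


def gap_report(inventory: list[Artifact],
               expected: list[str]) -> dict[str, list[str]]:
    """Compare a site's inventory against the checklist for a milestone."""
    present = {a.doc_type for a in inventory}
    approved = {a.doc_type for a in inventory if a.status == "approved"}
    return {
        # Expected document types with no artifact at all.
        "missing": [t for t in expected if t not in present],
        # Present, but no approved copy exists yet.
        "needs_attention": [t for t in expected
                            if t in present and t not in approved],
    }
```

A usage example: given an approved CV and a draft FDA 1572, a checklist of `["cv", "fda_1572", "gcp_certificate"]` yields the GCP certificate as missing and the 1572 as needing attention.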

CTMS platforms track at site level (visit dates, enrollment counts, contacts); Nytel is designed to operate at milestone level (Pre-SIV, Site Activated, First Patient In). Bridging these views requires a per-sponsor mapping table that translates vendor-specific field names into Nytel’s milestone vocabulary. It’s a one-time configuration, not an ongoing burden, but it does need to be validated against the sponsor’s actual CTMS setup, since field naming varies even within the same vendor’s platform.
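A per-sponsor mapping table and its validation step could be sketched as follows. The CTMS field names (`site_activation_date`, `fpi_dt`) are invented for the example, and treating Pre-SIV as the default when no later milestone date is populated is an assumption.

```python
MILESTONE_ORDER = ["Pre-SIV", "Site Activated", "First Patient In"]

# Hypothetical mapping for one sponsor: CTMS field name -> milestone.
SPONSOR_A_MAPPING = {
    "site_activation_date": "Site Activated",
    "fpi_dt": "First Patient In",
}


def unmapped_fields(mapping: dict[str, str], ctms_record: dict) -> list[str]:
    """Validation: mapped fields that the sponsor's CTMS record lacks."""
    return sorted(f for f in mapping if f not in ctms_record)


def current_milestone(mapping: dict[str, str], ctms_record: dict) -> str:
    """Latest milestone whose mapped CTMS field is populated."""
    reached = {m for f, m in mapping.items() if ctms_record.get(f)}
    for milestone in reversed(MILESTONE_ORDER):
        if milestone in reached:
            return milestone
    return MILESTONE_ORDER[0]
```

Running `unmapped_fields` against a sample record from the sponsor's actual CTMS is one way to catch field-naming differences before the mapping goes live.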

Whether integration is working can be assessed on three questions: Are all the data objects the agent needs accessible without manual entry? Is the data fresh enough that decisions aren’t made on stale state? Do status updates actually propagate back to the system of record?
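The three questions can be turned into a simple health check. This is a sketch under assumptions: the parameter names, the use of a pending-write count as a proxy for propagation, and the caller-supplied freshness limit are all hypothetical.

```python
from datetime import datetime, timedelta


def integration_health(required_objects: set[str],
                       available_objects: set[str],
                       last_sync: datetime,
                       now: datetime,
                       max_age: timedelta,
                       pending_writes: int) -> dict[str, bool]:
    """Answer the three integration questions as booleans."""
    return {
        # Are all needed data objects accessible without manual entry?
        "complete": required_objects <= available_objects,
        # Is the data fresh enough to act on?
        "fresh": now - last_sync <= max_age,
        # Have status updates reached the system of record?
        "propagated": pending_writes == 0,
    }
```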

4 Operations: Follow-Up, Escalation, and Quality

The target for site activation—based on early pilots—is a 50% reduction in median time from CDA executed to site activated, with a parallel reduction in the staffing required. Real-time inspection readiness, meaning the TMF is always current rather than assembled in a rush before an audit, is the third goal. These are targets under favorable conditions: cooperative sites, reliable document submissions, and working integrations. They are not guarantees.

The follow-up cadence is designed to keep pressure on without overwhelming coordinators:

Day	Action	Recipient
0	Initial request	Site Coordinator
3	First follow-up	Site Coordinator
7	Second follow-up	Site Coordinator + Field Monitor CC
14	Third follow-up	Country Monitor escalation
21	Blocked flag	Study Lead alert

Three things prevent this from becoming noise: outstanding items for the same contact are batched into a single weekly message; reminders pause when a query is already awaiting site response; and the study team can flag a site as quiet (coordinator on leave, site shutdown). Critically, the clock resets on any response—a coordinator who sends a document that then fails QC gets a correction request, not another missing-document reminder.
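The cadence table and the pause rules can be combined into one small function. The sketch assumes the clock is measured in days since the last site response (or since the initial request if there has been none), which is one way to model the reset behavior; weekly batching is not modeled.

```python
from typing import Optional

# (day, action, recipient), copied from the cadence table.
LADDER = [
    (0, "Initial request", "Site Coordinator"),
    (3, "First follow-up", "Site Coordinator"),
    (7, "Second follow-up", "Site Coordinator + Field Monitor CC"),
    (14, "Third follow-up", "Country Monitor escalation"),
    (21, "Blocked flag", "Study Lead alert"),
]


def current_rung(days_since_last_response: int,
                 awaiting_site_query: bool = False,
                 site_quiet: bool = False) -> Optional[tuple[str, str]]:
    """Return the latest ladder rung reached, or None when reminders pause."""
    if awaiting_site_query or site_quiet:
        return None
    rung = None
    for day, action, recipient in LADDER:
        if days_since_last_response >= day:
            rung = (action, recipient)
    return rung
```

Because the input is days since the last response, any reply from the site naturally drops the item back to the bottom of the ladder.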

Time isn’t the only signal. Nytel is designed to build a site responsiveness profile from historical data (typical response times, known holiday calendars, patterns in coordinator turnover) and adjust escalation windows accordingly. A site that reliably responds in 48 hours doesn’t need the same pressure as one that takes ten days. The hard exception is safety: an SAE acknowledgment outstanding beyond 24 hours, or a protocol deviation unresolved after 10 business days, skips the ladder entirely and goes directly to the Study Lead and Medical Monitor.
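The safety exception is the one part of this logic with fixed thresholds, so it is straightforward to sketch. The `OpenItem` record and its `kind` labels are hypothetical; the adaptive escalation windows are not modeled here because the source does not specify how they are computed.

```python
from dataclasses import dataclass

SAFETY_RECIPIENTS = ["Study Lead", "Medical Monitor"]


@dataclass(frozen=True)
class OpenItem:
    """Hypothetical outstanding item tracked for a site."""
    kind: str                      # assumed labels: "sae_ack", "protocol_deviation"
    hours_outstanding: float = 0.0
    business_days_open: int = 0


def bypass_recipients(item: OpenItem) -> list[str]:
    """Recipients when an item skips the ladder; empty list otherwise."""
    if item.kind == "sae_ack" and item.hours_outstanding > 24:
        return list(SAFETY_RECIPIENTS)
    if item.kind == "protocol_deviation" and item.business_days_open > 10:
        return list(SAFETY_RECIPIENTS)
    return []
```

Checking this function before consulting the normal cadence keeps the safety path independent of any site-specific window adjustments.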

4.1 Quality

QC rules are layered. Universal rules cover every document—ALCOA++ standards, GCP training currency (3 years), required signatures. Document-type rules add specifics: CV currency (2 years), license validity through study end, certified copy stamp for scanned originals. Sponsor overrides sit on top—stricter thresholds, custom templates, additional required fields—configured per sponsor without touching the underlying rule engine. In practice, the majority of checks are universal; overrides handle the edge cases.
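The layering can be sketched as three dictionaries merged in order, with later layers taking precedence. The rule keys and document-type labels are hypothetical names for the thresholds mentioned above, and this simple merge does not itself enforce that a sponsor override is stricter than the default.

```python
from typing import Optional

# Layer 1: rules applied to every document.
UNIVERSAL_RULES = {
    "gcp_training_max_age_years": 3,
    "signature_required": True,
}

# Layer 2: document-type specifics.
DOC_TYPE_RULES = {
    "cv": {"max_age_years": 2},
    "medical_license": {"valid_through_study_end": True},
    "scanned_original": {"certified_copy_stamp_required": True},
}


def effective_rules(doc_type: str,
                    sponsor_overrides: Optional[dict] = None) -> dict:
    """Merge universal, document-type, and sponsor layers for one document."""
    rules = dict(UNIVERSAL_RULES)
    rules.update(DOC_TYPE_RULES.get(doc_type, {}))
    # Layer 3: per-sponsor configuration, keyed by document type.
    rules.update((sponsor_overrides or {}).get(doc_type, {}))
    return rules
```

A sponsor that requires CVs dated within one year would pass `{"cv": {"max_age_years": 1}}` as its override, leaving the two base layers untouched.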

For scans, an OCR confidence below 0.80 triggers human review. That’s where the risk of a missed signature or misread date exceeds the cost of a human look. This threshold will need calibration per document type and will drift as OCR models improve—treat it as a starting point. The review queue should show the original scan, the OCR output, and the specific fields that triggered the flag. A reviewer who can see exactly what the system wasn’t sure about can resolve most cases in under a minute.
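A review-queue entry carrying the three things a reviewer needs could be built as below. The sketch assumes the OCR step reports a confidence per extracted field, which the source does not state, and the `ReviewItem` shape is hypothetical. The threshold is a parameter so it can be calibrated per document type.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_THRESHOLD = 0.80  # starting point from the text; expected to be tuned


@dataclass(frozen=True)
class ReviewItem:
    """What the reviewer sees: the scan, the OCR output, and the flagged fields."""
    scan_path: str
    ocr_text: str
    flagged_fields: tuple[str, ...]


def build_review_item(scan_path: str, ocr_text: str,
                      field_confidence: dict[str, float],
                      threshold: float = DEFAULT_THRESHOLD
                      ) -> Optional[ReviewItem]:
    """Queue a scan for human review if any field falls below the threshold."""
    flagged = tuple(sorted(
        name for name, conf in field_confidence.items() if conf < threshold))
    if not flagged:
        return None
    return ReviewItem(scan_path, ocr_text, flagged)
```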

5 Product Direction

The entry point is site activation—the stretch from CDA execution to a site going live. It’s where timelines slip first, where the document-chasing burden is most acute, and where a measurable KPI (CDA-to-activated time) makes success easy to demonstrate. The worry about scope is reasonable: a full Phase III trial spans hundreds of SOP steps across 15–20 documents. The system doesn’t try to automate all of it upfront. A small set of TMF document types—investigator CVs, medical licenses, GCP certificates, FDA 1572 forms, and delegation logs—drives the majority of QC work during activation. Each workflow added after that reuses the same ontology objects, so the marginal effort drops as the system matures.

From there, the natural expansion is TMF readiness → site monitoring → issue triage and CAPA. Institutional knowledge—which sites are slow, which document types generate the most corrections, which sponsors have unusual requirements—accumulates as data rather than disappearing with every staff change.

Protocol drafting is sometimes posed as a choice: does the system ask structured questions and then write the protocol, or does it generate text from a decision-tree representation? These aren’t alternatives—they’re two stages of the same process. A structured intake wizard captures the trial parameters (type, indication, phase, endpoints, population, regulatory pathway) and produces a machine-readable specification. The LLM takes that specification and writes the protocol narrative, constrained to the input schema and house style. The wizard enforces completeness; the LLM handles readability; the machine-readable output means the protocol can drive downstream agent execution without re-parsing it. This won’t work for every case—experienced writers revising a prior version may find the wizard unnecessary—and template-based drafting from an existing protocol is the obvious extension.
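The wizard's output can be pictured as a typed specification with a completeness check, serialized for downstream use. The `TrialSpec` class and its field names are assumptions that mirror the parameters listed above; the LLM drafting step is deliberately left out of the sketch.

```python
import json
from dataclasses import dataclass, asdict, fields


@dataclass
class TrialSpec:
    """Hypothetical machine-readable specification captured by the wizard."""
    trial_type: str = ""
    indication: str = ""
    phase: str = ""
    endpoints: tuple[str, ...] = ()
    population: str = ""
    regulatory_pathway: str = ""


def missing_fields(spec: TrialSpec) -> list[str]:
    """Completeness check: names of parameters still left empty."""
    return [f.name for f in fields(spec) if not getattr(spec, f.name)]


def to_machine_readable(spec: TrialSpec) -> str:
    """Serialize a complete spec; refuse to emit a partial one."""
    gaps = missing_fields(spec)
    if gaps:
        raise ValueError(f"incomplete specification: {', '.join(gaps)}")
    return json.dumps(asdict(spec), sort_keys=True)
```

The same JSON could serve as the constrained input to the drafting model and as the configuration for downstream agents, which is what removes the need to re-parse the finished protocol.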