Executive Summary
The centerpiece is a managed services platform for implementation and support work around systems of record.
Salesforce is the concrete example because it is familiar, messy, and operationally important. The broader target is any complex, constantly shifting system of record that cannot go down and has to be changed with care. It could be an ERP, billing system, support platform, loan origination system, warehouse system, data platform, or internal operations tool. The pattern is the same: the business runs on it, the work never stops, and careless automation can create more damage than value.
The platform is a cooperative system for humans and agents working together. It takes in real managed-services work, enriches the request with customer context and prior knowledge, routes tasks, drafts analysis or fixes, asks for approval at the right moments, runs checks, logs decisions, learns from feedback, and gives the team visibility into what is happening.
Chatbots and demo agents are too small a frame. The useful thing here is an operating layer for implementation work, support queues, data migrations, release readiness, and a more accessible software delivery process.
The reason to start with systems of record is simple. If the loop works there, it can work in easier places too. These systems have old decisions, active users, permission boundaries, integrations, release windows, customer pressure, and a real cost when something breaks. That is where an agent system has to earn trust instead of just looking impressive.
Normal Development Choices
Start with the boring pieces: existing auth, the existing ticketing or intake source, a small database schema, a typed API, a simple queue UI, an event stream, and repo-native checks. Add agents only after the work object, system context, approval path, and validation loop are clear.
For a Salesforce team, the system context is org metadata, releases, permissions, integrations, and baseline snapshots. For another team, the same slot might be ServiceNow configuration, NetSuite scripts, Zendesk routing, warehouse workflows, claims data, billing rules, or internal product telemetry. The operating model stays the same even when the source system changes.
1. Why This Fits Cooperative Systems
Managed services work is a good proving ground because it is close to customers, full of repeatable operational work that generally falls into a few dozen categories, and risky enough that being correct matters. A team is constantly asked to explain the system, fix production issues, clean up old decisions, prepare releases, migrate data, answer support questions, and keep the customer moving.
That gives the platform a useful shape:
- The work is real and measurable.
- The users are close enough to give fast feedback.
- The tasks cross support, sales, finance, IT, implementation, and engineering.
- The source of truth is important enough to require careful permissions, audit trails, and approval gates.
- The backlog is large enough that agents can help, but ambiguous enough that humans still own judgment.
The consultant role moves up the stack. Agents can handle more of the inspection, drafting, comparison, testing, and documentation. Consultants own diagnosis, architecture, customer trust, change sequencing, and the decision about what work deserves to exist.
2. Target Workloads
A first useful version supports work a real managed-services team already does every week.
Support Request Triage
Classify incoming requests, identify missing context, connect the request to known system areas, and decide whether it is a bug, training issue, config change, data issue, or architecture problem.
Proof metric: faster first response and fewer tickets bouncing between people.
System Scan and Explainer
Pre-run safe scans, explain objects and automation, identify land mines, and keep a current-state map ready before recommending changes.
Proof metric: new consultants can understand the system without depending on one person with tribal knowledge.
Tech Debt Work Queue
Turn stale fields, brittle flows, weak tests, permission drift, and deployment friction into prioritized, reviewable work.
Proof metric: the backlog gets cleaner and future fixes get easier.
Release Readiness
Prepare deployment notes, check (and then enhance) tests, validate rollback paths, review impacted metadata, and flag unresolved risk before a change ships.
Proof metric: fewer surprises during deployment and faster movement from finding to reviewed fix.
Migration Readiness
Map source and target systems, field dependencies, transformation rules, validation checks, and the repeatable path for loading data into the target system.
Proof metric: migration risk is visible before cutover pressure starts.
Customer Escalation Packets
Summarize the issue, evidence, suspected root cause, impacted workflows, proposed next step, and decision needed from the customer or delivery lead.
Proof metric: escalations arrive with enough context for a decision.
3. Reference Architecture
The platform is a control plane around managed-services work. It can sit beside Jira, Slack, Salesforce, GitHub, ServiceNow, Zendesk, NetSuite, and the customer’s source systems on day one. Its job is to make the work visible, bounded, reviewable, and easier to improve.
flowchart LR
Intake["Work intake: tickets, emails, Slack, backlog, customer requests"]
Queue["Managed services work queue"]
Context["Customer context and knowledge layer"]
SystemScan["Pre-run safe system scan"]
Agents["Agent workers"]
Humans["Consultants, architects, delivery leads"]
Tools["Tool gateway"]
Repo["Repo, tests, docs, CI/CD"]
SystemOfRecord["System of record: CRM, ERP, support, billing, ops"]
Evals["Evals and regression sets"]
Traces["Traces, audit, cost, approvals"]
Dashboard["Visibility dashboard"]
Intake --> Queue
Queue --> Context
SystemOfRecord --> SystemScan
SystemScan --> Context
Context --> Agents
Agents --> Tools
Tools --> Repo
Tools --> SystemOfRecord
Agents --> Humans
Humans --> Queue
Humans --> Tools
Repo --> Evals
Tools --> Traces
Agents --> Traces
Humans --> Traces
Evals --> Dashboard
Traces --> Dashboard
Queue --> Dashboard
Core Platform Primitives
| Primitive | What it does | Reason |
|---|---|---|
| Intake classifier | Tags work by customer, workflow, system area, urgency, risk, and likely owner. | Keeps the queue from becoming a junk drawer. |
| System context layer | Stores scan output, diagrams, implementation notes, prior decisions, metadata or configuration, and current risks. | Gives agents and humans the same starting point. |
| Agent task runner | Runs bounded jobs: explain, compare, draft, validate, test, summarize, or prepare. | Turns repeated work into observable units. |
| Tool gateway | Provides narrow access to configuration, metadata, repos, docs, tickets, and approved systems. | Keeps automation useful without giving it uncontrolled reach. |
| Human review layer | Assigns approvals, escalations, QA, and architecture review. | Preserves judgment and customer trust. |
| Evaluation layer | Replays golden examples, boundary cases, tool-call checks, and release gates. | Shows whether the system is getting better or just louder. |
| Visibility dashboard | Shows queue health, agent output, review load, risk, escalations, latency, cost, and outcomes. | Lets a manager run the hybrid workforce instead of guessing. |
4. Kicksights Implementation Notes
Why Kicksights Is Here
Kicksights is the real place I am working through this problem. I am using Salesforce first because it has the right constraints: metadata, permissions, stale automation, integrations, migrations, release risk, and users who still need the system to work while it is changing.
That makes it a useful stress test. The broader use case is a team inheriting a messy system of record and needing a safer way to understand it, support it, improve it, and sometimes migrate away from it.
Kicksights already has pieces of that shape. The finished cooperative platform still needs more work, but the first slice is useful because it turns the idea into real product primitives: an authenticated managed-services workspace, persisted requests, system baseline references, work-order drafts, event history, and a path back to overview evidence.
The Salesforce names below are implementation labels, not requirements. In another domain, managed_requests and managed_request_events stay almost the same. salesforce_work_orders becomes delivery_work_orders, org_connections becomes system_connections, and baseline_snapshot_id points to the current safe map of whichever system the team supports.
What Kicksights Already Models
| Piece | Kicksights shape | Reason |
|---|---|---|
| Managed-services workspace | /managed-services lets a user open a support request, choose work type, priority, delivery risk, next reviewer, and optional pinned system baseline. |
Intake becomes structured enough for humans and agents to work from the same object. |
| Request types | Admin/config work, automation fix, integration, data migration, reporting, security, release promotion, and discovery scoping. | The queue starts with the categories managed-services teams actually see. |
| Lifecycle state | Submitted, triaging, needs clarification, scoped, queued for build, building, validating, ready for review, approved, promoting, done, blocked, canceled. | The platform can show where work is stuck and who owns the next move. |
| Next actor | Intake, blueprint, architect, builder, QA, release, migration, or client. | Early versions can route work before every agent worker is fully automated. |
| System baseline | A request can pin a baseline snapshot, or the service can attach the latest ready baseline when one exists. | Delivery work is grounded in the current system shape instead of memory or Slack archaeology. |
| Work orders | Draft work orders carry scope, out-of-scope items, affected components, acceptance criteria, validation plan, rollback plan, artifacts, and audit references. | A work order becomes the contract between intake, architecture review, build, QA, and approval. |
| Event stream | Request submitted, updated, clarified, approved, rejected, and work-order drafted events are stored with payload metadata. | Corrections and decisions become product signal instead of disappearing into chat. |
| Auth and data boundary | The worker validates Supabase JWTs, org membership is resolved server-side, and Supabase RLS filters org-owned records. | The managed-services queue can be shared without leaking another customer’s work. |
The important implementation choice is durable state before autonomy. Once the request, baseline, events, and work-order contract exist, agents have something safe and specific to operate on.
flowchart LR
UI["UI: /managed-services"]
Auth["Auth: Supabase session and org claims"]
API["FastAPI: managed request routes"]
DB["Supabase: managed_requests"]
Events["Supabase: managed_request_events"]
Baseline["Baseline snapshot"]
WorkOrder["Supabase: work order table"]
Review["Consultant review"]
Checks["Tests, evals, smoke, deploy checks"]
UI --> Auth
Auth --> API
API --> DB
API --> Events
DB --> Baseline
DB --> WorkOrder
WorkOrder --> Review
Review --> Events
Review --> Checks
Checks --> Events
Kicksights-Style Technical Stack
| Layer | Concrete implementation pattern |
|---|---|
| Frontend | Next.js and React workspace with typed API normalization, status buckets, routing controls, technical references, activity history, and work-order display. |
| Worker | FastAPI routes for create/list/get/update managed requests, list events, draft work orders, record clarifications, approve/reject requests, create org connections, and pin baselines. |
| Database | Supabase Postgres tables for org_connections, managed_requests, managed_request_events, and salesforce_work_orders, with indexes around org/status/baseline/request lookups. |
| Access control | Supabase Auth, JWT claim resolution, server-side org resolution, and RLS policies for org-scoped visibility. |
| System context | Org overview and OKG snapshots are the Kicksights baseline context layer. In another system, this can be configuration snapshots, dependency maps, workflow traces, or knowledge graph output. A request can carry baseline_snapshot_id so the team knows which system picture informed the scope. |
| Reliability | Repo-native commands for build, worker tests, local route smoke, staging deploy, production deploy, org-overview refresh/retry/cancel/promote, and prompt rollout checks. |
| Evals | Offline org-spec and prompt-contract evaluators with explicit thresholds before stricter blocking behavior is enabled. |
| Live verification | Local smoke can use expected auth-gated states, but protected staging/prod flows need real Supabase sessions because the worker validates JWTs. |
API Contract Shape
| Endpoint shape | Purpose |
|---|---|
POST /managed-requests |
Create the request, resolve org/user context, attach the latest baseline when available, set risk and next actor, and write request.submitted. |
GET /managed-requests |
List org-scoped work with status filtering, limit, and offset. |
GET /managed-requests/{request_id} |
Hydrate one request with child events and work orders. |
PATCH /managed-requests/{request_id} |
Update queue state, priority, risk, next actor, scope fields, artifacts, and audit references. |
GET /managed-requests/{request_id}/events |
Return the chronological decision and activity stream. |
POST /managed-requests/{request_id}/clarifications |
Record a question or answer and move the request between needs_clarification, triaging, and the next baseline step. |
POST /managed-requests/{request_id}/work-orders |
Generate the draft delivery object with scope, validation, rollback, affected components, and audit refs. |
POST /managed-requests/{request_id}/approve |
Move the request to approved or blocked and set the next actor for builder or architect review. |
POST /org-connections |
Register source, target, sandbox, UAT, archive, or client-managed systems. |
POST /org-connections/{org_connection_id}/baseline |
Pin an org connection to a baseline snapshot and status. |
One-Layer-Deeper Build Sequence
- Create the work object first. Define a
managed_requestsrecord withorg_id,type,priority,status,title,business_goal,risk_level,next_required_actor,baseline_snapshot_id,artifacts,audit_refs, andintake_payload. - Add an event stream immediately. Every submit, update, clarification, approval, rejection, work-order draft, validation result, deploy attempt, and customer decision writes an event. The event stream becomes the memory of the system.
- Pin system context before action. Attach the latest org baseline when possible. If the request touches migration, integration, security, or release promotion, treat missing baseline context as medium or high risk until a human confirms otherwise.
- Generate work orders instead of loose recommendations. A work order needs scope, out-of-scope items, affected components, acceptance criteria, validation plan, rollback plan, artifacts, and audit references. That is the handoff object an agent, consultant, QA reviewer, or release manager can share.
- Use agent workers as state transitions. The intake agent classifies and requests missing detail. The blueprint agent attaches baseline evidence. The architect agent drafts scope. The builder agent proposes repo, metadata, or configuration changes. The QA agent runs checks. The release agent prepares promotion and rollback evidence.
- Keep writes approval-gated. Agents can draft, compare, summarize, test, and prepare. Production writes, configuration changes, metadata deploys, customer messages, contract decisions, and data movement wait for human approval.
- Bake evals into release. Run golden request examples, unsafe-tool-call cases, metadata-boundary cases, and reviewer-correction cases before new agent behavior becomes trusted.
- Expose manager-level visibility. The dashboard shows request counts by status, stuck work, next actor, baseline coverage, reviewer burden, correction rate, eval failures, cost, latency, and customer-facing risk.
That is the bridge to cooperative systems work. Calling a model is the easy part. The harder work is creating the work object, the context layer, the permission boundary, the review path, the eval loop, and the feedback signal so humans and agents can move real work without creating chaos.
5. Human and Agent Operating Model
Agents are useful when the work is bounded. Humans are useful when the work needs judgment, context, sequencing, taste, or trust.
| Work type | Agent role | Human role |
|---|---|---|
| Support triage | Classify, summarize, find related evidence, identify missing context. | Decide priority, customer tone, owner, and escalation path. |
| System analysis | Scan metadata or configuration, group findings, cite evidence, mark unknowns. | Decide what matters, what to ignore, and what to ask the customer. |
| Implementation prep | Draft code changes, test additions, deployment notes, and rollback steps. | Review architecture, approve scope, merge only after checks pass. |
| Migration planning | Draft field maps, identify validation needs, list unknowns. | Own source-of-truth decisions and acceptable tradeoffs. |
| Knowledge updates | Suggest SOP changes from repeated tickets and corrections. | Approve durable policy and process changes. |
| Customer communication | Draft clear explanations and escalation packets. | Send the message, own the relationship, and make the call. |
The operating rule is simple: agents can move work forward. Production changes, compliance decisions, missing evidence, and customer commitments stay under human control.
6. Learning Loop
A cooperative platform improves because the work is instrumented. The team can see where agents helped, where they failed, where humans corrected them, and where the process itself is broken.
sequenceDiagram
participant Customer as Customer or operator
participant Queue as Work queue
participant Agent as Agent worker
participant Consultant as Consultant reviewer
participant System as System of record
participant Evals as Eval and knowledge loop
Customer->>Queue: Submit request or issue
Queue->>Agent: Assign bounded task with context
Agent->>Agent: Retrieve evidence and draft next step
Agent->>Consultant: Send result with confidence and sources
Consultant->>Agent: Approve, correct, or escalate
Consultant->>System: Apply approved change or response
Consultant->>Evals: Record correction and outcome
Evals->>Queue: Update examples, SOPs, risk rules, and routing
The correction loop matters more than the first answer. If a consultant rejects an output, the system captures why: bad context, missing tool, stale SOP, weak prompt, permission issue, poor judgment, or a true edge case. That turns review work into product signal.
7. Evaluation Framework
Measure the actual work instead of a random benchmark. Start with a clear definition of what good looks like, then turn that into a repeatable scoring path. A simple rubric can use -1, 0, and 1: harmful or wrong, incomplete, useful enough to trust after review.
| Evaluation Area | What to Test | Pass Standard |
|---|---|---|
| Task success | Did the agent complete the requested support, scan, release, or migration prep task? | The output is useful enough for a reviewer to act on. |
| Evidence quality | Did it cite metadata, configuration, docs, tickets, or repo paths instead of guessing? | Major claims have sources or are marked unknown. |
| Tool correctness | Did it call the right tool with safe arguments? | No unsafe record access, broad scraping, or unavailable tool invention. |
| Permission adherence | Did it respect customer, system, repo, and data boundaries? | Sensitive actions stop at approval gates. |
| Escalation behavior | Did it ask for help when confidence was low or risk was high? | Risky cases reach a human with a clear reason. |
| Regression control | Did a fix break known good behavior? | Golden cases and release checks stay green. |
| Reviewer burden | Did the system save time without creating cleanup work? | Reviewer corrections go down over time. |
| Cost and latency | Is the workflow worth running repeatedly? | Runtime, cost, and review time fit the value of the task. |
8. Pilot Plan
Start with one serious managed-services workflow: support triage plus escalation packet generation for a customer running a system of record. Salesforce is a good first example, but the pilot proves the operating model rather than the CRM-specific vocabulary.
Days 0-30: One Queue, One Customer, Real Evidence
Objective: make the first support workflow visible and reviewable.
- Connect one intake source and one customer workspace.
- Run the safe system scan and create the first customer context layer.
- Classify incoming work by system area, risk, urgency, and missing context.
- Generate escalation packets for human review.
- Track corrections, approval decisions, time saved, and unresolved unknowns.
Exit criteria: the team can see the queue, trust the evidence, and review agent work without digging through scattered context.
Days 31-60: Add Implementation and Release Support
Objective: connect support findings to real delivery work.
- Turn recurring issues into scoped backlog items.
- Add repo-native checks for tests, docs, deployment notes, rollback, and UI smoke where needed.
- Create eval cases from accepted and rejected support outputs.
- Add approval gates for code changes, configuration changes, metadata changes, and customer-impacting actions.
Exit criteria: repeated problems turn into reviewed fixes, and every change has a validation path.
Days 61-90: Make It Repeatable
Objective: turn one customer workflow into a managed-services operating model.
- Package reusable intake, scan, escalation, eval, and release patterns.
- Add dashboard views for queue health, risk, reviewer load, cost, latency, and outcome quality.
- Expand to a second customer or a second workflow only after the first loop is trusted.
- Document where the platform generalizes beyond Salesforce.
Exit criteria: the team has a repeatable human/AI delivery loop that can move to another customer or workflow.
9. Engineering Manager Operating Model
The engineering manager role here is still hands-on: shaping the primitives, owning reliability, reading traces, talking to users, reviewing architecture, and making sure the team is learning from real work.
Weekly Cadence
- Review queue health, escalations, and reviewer burden.
- Pick the smallest useful product improvement from real operator pain.
- Review failed evals and rejected agent outputs.
- Turn repeated manual review into a better tool, prompt, SOP, or guardrail.
- Keep the platform boring where it needs to be boring: logs, permissions, deploys, rollback, tests, and ownership.
- Ship small improvements frequently enough that users keep giving honest feedback.
Team Shape
| Role | Ownership |
|---|---|
| Engineering manager | Architecture, team focus, user loop, reliability, prioritization, and hands-on technical review. |
| Full-stack/product engineer | Queue UI, dashboards, workbench surfaces, workflow state, and operator experience. |
| AI/platform engineer | Agent workflows, tool gateway, traces, evals, guardrails, and model integration. |
| Systems/data engineer | Metadata ingestion, knowledge layer, reporting, audit, cost, and system integrations. |
| Embedded consultant or ops lead | Workflow diagnosis, acceptance criteria, customer context, and feedback quality. |
10. Proof Package
The public artifact shows the platform through one realistic workflow:
- A managed-services request arrives for Salesforce or another system of record.
- The platform classifies the request and finds the relevant system context.
- An agent drafts an escalation packet with evidence, confidence, unknowns, and proposed next action.
- A consultant reviews, corrects, and approves the packet.
- Repeated findings become a backlog item, test, SOP update, or release task.
- The dashboard shows queue health, agent performance, reviewer load, cost, latency, and unresolved risk.
The proof generalizes cleanly. Swap Salesforce for another system of record and the same cooperative pattern still works: intake, context, bounded agent work, human review, safe tools, evals, traces, and a feedback loop that makes the whole operation better over time.