Forward

Designing a 70-tool Claude tool-use catalog

Forward's agent loop calls Claude Sonnet 4.6 with a catalog of 70 tools spanning Procore reads, Autodesk drawing search, OneDrive file retrieval, and field-calculation primitives. Here's what we learned organizing it for production.


The shape of the problem

Forward is an SMS / iMessage bot for commercial construction project managers. A foreman texts “status on RFI 142” or “latest revision of sheet A-401” and gets back the answer in 8–15 seconds, sourced from the connected systems (Procore, Autodesk Construction Cloud, Microsoft OneDrive). Every response cites the data it came from inline.

To do this, our agent loop calls Claude Sonnet 4.6 with a catalog of 70 tools spanning Procore reads, Autodesk drawing search, OneDrive file retrieval, and field-calculation primitives.

Why 70 tools, not 1

We could have shipped a single “query Procore” tool that takes a natural-language query and returns whatever Procore data is relevant. That’s the retrieval-augmented-generation (RAG) shape. We tried it first. It failed in three places:

  1. Latency: vector-embedding queries against live Procore data added 2–4 seconds to every response. SMS users tolerate ~10 seconds; the RAG layer alone burned 30% of that budget.
  2. Freshness: a vector store goes stale. Construction data changes hourly — RFI status, daily logs, COR updates. A stale RAG answer is worse than a slow API call.
  3. Citation: the model could produce plausible-looking source citations to documents that didn’t support the answer. With direct tool calls, the citation is structurally the tool response payload, so the model can’t fabricate it.

The 70-tool catalog forces the model to make explicit reads against live Procore APIs. The trade-off: tool descriptions need to be airtight so the model knows when to use which.

Prompt caching is non-negotiable

With 70 tools, the system prompt + tool catalog is ~28K tokens. At Sonnet 4.6 input pricing, every message would cost ~$0.21 in input tokens alone. With Anthropic prompt caching, the system prompt + tool catalog is cached after the first call — subsequent calls in the 5-minute TTL pay the cached read rate (10% of the input rate). After warm-up:

First message (cache miss):     $0.21  input + $0.018 output
Subsequent (cache hit):         $0.021 input + $0.018 output
Cache hit rate in production:   93%
Effective per-message cost (input): $0.034
At 100 messages/PM/month:       $3.40 per PM/month

At $3.40/PM/month in inference, the unit economics work even on a $29/seat plan. Without caching, they don’t.
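The blended figure in the table is the hit-rate-weighted input cost (the ~$0.018 of output tokens comes on top). A quick sketch of the arithmetic, using only the numbers quoted above:

```python
# Blended input cost per message, weighted by the production cache hit rate.
# All figures are the ones from the table above.
MISS_INPUT = 0.21              # cache-miss input cost, $
HIT_INPUT = MISS_INPUT * 0.10  # cached reads bill at 10% of the input rate
HIT_RATE = 0.93                # production cache hit rate

def blended_input_cost(hit_rate: float = HIT_RATE) -> float:
    """Expected input cost per message at a given cache hit rate."""
    return hit_rate * HIT_INPUT + (1 - hit_rate) * MISS_INPUT

per_message = blended_input_cost()   # ~= $0.034
per_pm_month = 100 * per_message     # 100 messages/PM/month ~= $3.40
```

Even at a 93% hit rate, the 7% of misses contribute almost half the blend, which is why keeping the cache warm inside the 5-minute TTL matters so much.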

Tool grouping by access pattern

We group tools by which APIs they hit, not by which feature they implement. Within the Procore group, for instance, a single entity is covered by three separate read tools.

Three tools for one entity, because the user’s intent shapes the API call. A single tool with optional params would force the model to figure out which params to set; three tools with non-overlapping descriptions make the choice explicit.
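As a sketch of the pattern, using Anthropic's tool-definition shape (`name`, `description`, `input_schema`) with hypothetical tool names and fields, not Forward's actual catalog, three non-overlapping RFI tools might look like:

```python
# Hypothetical three-tools-per-entity catalog entry for Procore RFIs.
# Names, descriptions, and schemas are illustrative only.
RFI_TOOLS = [
    {
        "name": "procore_get_rfi",
        "description": "Fetch a single RFI by number. Use when the user "
                       "names a specific RFI.",
        "input_schema": {
            "type": "object",
            "properties": {"rfi_number": {"type": "integer"}},
            "required": ["rfi_number"],
        },
    },
    {
        "name": "procore_list_open_rfis",
        "description": "List open RFIs on the project. Use for aggregate "
                       "questions ('any open RFIs?'), not a specific number.",
        "input_schema": {"type": "object", "properties": {}},
    },
    {
        "name": "procore_search_rfis",
        "description": "Keyword search across RFI subjects and bodies. Use "
                       "when the user describes content, not a number.",
        "input_schema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
]
```

Each description stakes out a distinct intent shape, so the model picks by matching the user's phrasing rather than by guessing which optional parameters to set.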

Tool descriptions that actually work

Tool descriptions in our catalog follow a template:

<one-sentence purpose>

When to use this: <intent shape — phrased exactly how a user
                  might describe their need>

When NOT to use this: <the adjacent tool that's a better fit,
                        with a one-line distinguisher>

Returns: <field-by-field schema, no surprises>

The “When NOT to use this” line is the most important. The model’s default behavior is to grab the first tool whose description plausibly matches. Telling it explicitly when not to use a tool cut our tool-selection error rate by ~40%.
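Filled in for a hypothetical RFI-lookup tool (the name, example phrasings, and return fields are illustrative, not Forward's actual spec), the template renders like this:

```python
# Illustrative description for a single-RFI lookup tool, following the
# four-part template above. All specifics are hypothetical.
GET_RFI_DESCRIPTION = """\
Fetch a single RFI by its number from the active Procore project.

When to use this: the user names a specific RFI ("status on RFI 142",
"what did they answer on RFI 98").

When NOT to use this: the user asks about RFIs in aggregate ("any open
RFIs on my plate?") -- use the open-RFI list tool, which filters by
status instead of number.

Returns: rfi_number (int), subject (str), status (str),
ball_in_court (str), due_date (ISO date), answer (str or null).
"""
```

The "When NOT to use this" line names the adjacent tool and the one attribute (status vs. number) that distinguishes them, which is exactly the decision the model has to make.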

The project-disambiguation problem

A texter on a multi-project GC says “RFI 142”. Which project’s RFI 142? We solve this before the tool call, in a project-resolver step:

  1. Each phone number belongs either to a tenant + a project (a “dedicated project line”) or to a tenant + a multi-project pool (a shared line).
  2. Dedicated-line message → project is implicit. Tool calls auto-scope.
  3. Shared-line message → resolver runs a quick classifier: recent project (last 5 messages from this phone number), explicit project tag in the message, or single-project tenant.
  4. If ambiguous, the bot replies asking which project before firing any tool call. Costs one round-trip; cheaper than a wrong answer.
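The resolver steps above can be sketched roughly as follows. The data shapes, the substring tag matcher, and the precedence among the three shared-line signals are our assumptions for the sketch, not Forward's actual code:

```python
# Sketch of the project-resolver step that runs before any tool call.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Line:
    tenant_id: str
    project_id: Optional[str] = None  # set for dedicated project lines

def resolve_project(line: Line, message_text: str,
                    recent_project_id: Optional[str],
                    tenant_project_ids: list[str]) -> Optional[str]:
    """Return a project ID, or None to ask the user before any tool call."""
    # 1. Dedicated line: project is implicit, tool calls auto-scope.
    if line.project_id:
        return line.project_id
    # 2. Explicit project tag in the message (naive matcher; precedence
    #    over the recency signal is an assumption here).
    lowered = message_text.lower()
    for pid in tenant_project_ids:
        if pid.lower() in lowered:
            return pid
    # 3. Recent context: the project from this number's last few messages.
    if recent_project_id:
        return recent_project_id
    # 4. Single-project tenant: no ambiguity possible.
    if len(tenant_project_ids) == 1:
        return tenant_project_ids[0]
    # Ambiguous: reply asking which project instead of firing a tool call.
    return None
```

Returning `None` here is the cheap path: one clarifying round-trip instead of a confidently wrong answer scoped to the wrong project.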

Approval queue for mutations

Forward never writes to Procore from a Claude tool call directly. Mutations go through a draft state:

  1. Model issues e.g. procore_draft_daily_log with the proposed content.
  2. Server creates a draft row, returns a draft ID to the model, and SMSes the PM a one-tap approval link.
  3. PM taps “Approve” on the dashboard or texts back “ok”.
  4. A separate worker process picks up the approved draft and applies it to Procore via the real write API.

This is non-negotiable for production. The cost of an accidental write to Procore (wrong daily log on the wrong project) is much higher than the friction of a one-tap approval.
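A minimal sketch of the draft-then-approve flow, with an in-memory dict standing in for the draft table and stubs for the SMS and Procore calls (all names hypothetical):

```python
# Draft state machine: the tool handler only writes a draft; the real
# Procore write happens in a separate worker, never in the tool-call path.
import uuid

DRAFTS: dict[str, dict] = {}  # stand-in for a database table

def send_approval_sms(draft_id: str) -> None:
    """Stub: SMS the PM a one-tap approval link for this draft."""

def apply_to_procore(draft: dict) -> None:
    """Stub: the real Procore write API call, worker-side only."""

def procore_draft_daily_log(project_id: str, content: str) -> dict:
    """Tool handler: side effects stop at the draft row."""
    draft_id = str(uuid.uuid4())
    DRAFTS[draft_id] = {"project_id": project_id, "content": content,
                        "status": "pending"}
    send_approval_sms(draft_id)
    return {"draft_id": draft_id, "status": "pending_approval"}

def approve(draft_id: str) -> None:
    """Called when the PM taps Approve or texts back 'ok'."""
    DRAFTS[draft_id]["status"] = "approved"

def worker_tick() -> None:
    """Separate worker: apply approved drafts, then mark them applied."""
    for draft in DRAFTS.values():
        if draft["status"] == "approved":
            apply_to_procore(draft)
            draft["status"] = "applied"
```

The key property: nothing the model returns or retries can move a draft past `pending`; only the human approval step does.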

Failure modes we hit

Practical takeaways

Forward’s demo line is live and answers Procore questions in real time — text any of the queries above to +1 (682) 300-6750. No signup needed. If you’re building something similar over a different vertical API, happy to talk shop; founder email is josh@getforward.xyz.

Try Forward right now

Drop your email above for early access — or skip the form and text +1 (682) 300-6750 from your phone. The live demo answers anything you can ask a project manager in plain English — no signup needed.

No credit card. We never sell or share your email. Unsubscribe with one click.
