Engineering AI Agents for the Coffee-Stained World of Logistics

Nilkanth Patel, Head of Product Engineering

It's 6AM at a shipping warehouse in Cincinnati, Ohio. Maria, a receiving dock worker, watches as the first truck of the day backs up to the loading bay. The driver hands her a crumpled Bill of Lading (BOL) covered in coffee stains and handwritten notes. She squints at the faded text, cross-references the shipment details with her clipboard manifest, then walks over to her computer to log the delivery into the Transportation Management System (TMS). The whole process takes about 15 minutes per shipment, and she'll repeat this dance dozens of times before her shift ends.

This scenario plays out tens of thousands of times daily across logistics operations worldwide. What makes Maria effective isn't just her ability to read documents or click buttons – it's how she orchestrates multiple tasks together by following established workflows, remembering context from previous shipments, using various tools (from her eyes to computer systems), and applying reasoning when something doesn't quite look right.

At Pallet, we're building AI agents that can replicate this workflow with the same level of sophistication that Maria brings to her job. But these agents are only as good as their ability to interact with the real world. That’s why we spent months building the infrastructure for "actions" – the digital equivalent of Maria's physical toolkit.

Making Agents Intelligent

Before diving into the technical details, let's establish what we mean when we talk about AI agents. Drawing on principles from the Anthropic team's excellent essay on building AI agents, we think of agents as digital workers with four core capabilities that mirror human cognitive abilities:

Workflows represent the procedural knowledge that Maria has developed over years of experience. She knows that BOL verification comes before inventory updates, which come before notifying the warehouse team. Our agents encode these same sequential processes and execute them with increasing consistency over time.
Memory allows Maria to remember that Acme Shipping always uses non-standard BOL formats, or that Warehouse Zone C has been temporarily closed for maintenance. Our agents maintain both short-term context (details about the current shipment) and long-term knowledge (patterns learned from thousands of previous interactions).
Tools are Maria's interfaces to different systems – her computer terminal, barcode scanner, phone, and even her eyes for visual inspection. Our agents have digital equivalents: API integrations, browser automation, email systems, and computer vision capabilities.
Reasoning is perhaps the most crucial capability. When Maria encounters a BOL with mismatched quantities, she doesn't just blindly enter the data. She investigates, cross-references other documents, and makes informed decisions. Our agents apply similar logical processes to handle edge cases and exceptions.

The magic happens when these four capabilities work together seamlessly. That's where CoPallet comes in.

Actions: Building Blocks for Intelligence

In CoPallet (our AI agents product), we call tool interactions "actions" because they represent discrete, purposeful activities that agents can perform. Unlike simple API calls or database queries, actions are sophisticated operations that can span multiple steps and handle complex real-world scenarios.

Consider what happens when an agent needs to update a shipment status in a legacy TMS. This isn't just a single API call – it's a complex, multi-step process.

Our actions support several categories of operations:

Communication protocols cover email processing, SMS notifications, and other messaging workflows.
API integrations connect with modern logistics platforms through RESTful interfaces. These actions handle authentication, rate limiting, error recovery, and data transformation between different systems' schemas.
Browser navigation tackles the reality that much of logistics infrastructure runs on legacy web applications. Our agents can authenticate with complex session management, navigate multi-step forms, and extract data from visually-rendered content.
Domain-specific connectors provide out-of-the-box integration with common logistics software like TMSes, CargoWise, DAT load boards, and carrier portals. These actions encapsulate years of domain expertise about how logistics actually works in practice.
Digitization uses LLMs to understand documents semantically rather than just extracting text as traditional OCR does. That comprehension lets agents validate business rules, reconcile conflicting information, and extract structured data even from damaged or poorly formatted documents.
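To make the categories above concrete, here is a minimal sketch of how they could be modeled as a TypeScript discriminated union. The type names, fields, and the `describeAction` helper are assumptions for illustration, not CoPallet's actual data model.

```typescript
// Hypothetical action categories mirroring the list above; all names are illustrative.
type Action =
  | { kind: 'communication'; channel: 'email' | 'sms'; to: string; body: string }
  | { kind: 'api'; method: 'GET' | 'POST'; url: string; payload?: unknown }
  | { kind: 'browser'; url: string; steps: string[] }
  | { kind: 'connector'; system: string; operation: string }
  | { kind: 'digitization'; documentId: string; fields: string[] };

// Summarize what an action would do. A real handler would perform side effects.
function describeAction(action: Action): string {
  switch (action.kind) {
    case 'communication':
      return `send ${action.channel} to ${action.to}`;
    case 'api':
      return `${action.method} ${action.url}`;
    case 'browser':
      return `open ${action.url} and run ${action.steps.length} step(s)`;
    case 'connector':
      return `${action.system}: ${action.operation}`;
    case 'digitization':
      return `extract ${action.fields.join(', ')} from ${action.documentId}`;
    default: {
      // Exhaustiveness check: adding a new kind without a case fails to compile.
      const unreachable: never = action;
      return unreachable;
    }
  }
}
```

Modeling the category as a `kind` tag means the compiler can check that every category has a handler, which fits the emphasis on predictable boundaries later in this post.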

Fighting Chaos with Structure

The biggest challenge in building reliable AI agents isn't the AI part – it's managing the inherent unpredictability of agent behavior. Unlike traditional software where inputs and outputs are precisely defined, agents operate in a world of probabilistic reasoning and dynamic decision-making.

Our solution is rigorous type safety through Zod schemas. Every action defines strict input and output contracts that constrain agent behavior within predictable boundaries. Here's a simplified example of how we structure a single step within an action:

import { z } from 'zod';

// ELEMENT_TARGET_SCHEMA, STEP_OUTPUT_TARGET_SCHEMA, and TRANSFORMATION_SCHEMA
// are separate Zod schemas (not shown here).

// Each step has a well-defined configuration schema
const BROWSER_ACTION_CONFIGURATION_SCHEMA = z.object({
  // The kind of browser interaction this step performs
  type: z.enum(['click', 'fill', 'navigate', 'prompt']),
  target: z.union([
    ELEMENT_TARGET_SCHEMA,  // DOM element selector
    STEP_OUTPUT_TARGET_SCHEMA  // Reference to previous step output
  ]),
  value: z.union([
    z.string(),  // Static value
    STEP_OUTPUT_TARGET_SCHEMA  // Dynamic value from previous step
  ]).optional(),
  // Optional transformations configured for this step
  transformations: z.array(TRANSFORMATION_SCHEMA).optional(),
  prompt: z.string().describe('Prompt to supply the LLM for additional context').optional(),
});

This schema-driven approach means that even though agents can make dynamic decisions about what actions to take, each individual action operates within well-defined guardrails. It's like giving Maria a detailed checklist for each type of shipment while still allowing her to adapt to unique situations.
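The schema lets a step's value be either a static string or a reference to an earlier step's output. As a rough sketch of how such a reference might be resolved at run time, consider the following. The `StepOutputTarget` shape, the `StepOutputs` store, and `resolveValue` are hypothetical, written in plain TypeScript without Zod so the snippet stands alone.

```typescript
// Hypothetical shape of a reference to an earlier step's output (an assumption).
type StepOutputTarget = { stepId: string; field: string };
type StepValue = string | StepOutputTarget;

// Outputs recorded so far in the run, keyed by step id.
type StepOutputs = Record<string, Record<string, string>>;

function resolveValue(value: StepValue, outputs: StepOutputs): string {
  if (typeof value === 'string') return value; // static value, used as-is
  const stepOutput = outputs[value.stepId];
  if (stepOutput === undefined) {
    throw new Error(`Step "${value.stepId}" has not produced output yet`);
  }
  const resolved = stepOutput[value.field];
  if (resolved === undefined) {
    throw new Error(`Step "${value.stepId}" has no field "${value.field}"`);
  }
  return resolved;
}

// Example: a later "fill" step reuses a number read from the BOL in an earlier step.
const outputs: StepOutputs = { readBol: { proNumber: 'PRO-12345' } };
const staticValue = resolveValue('DELIVERED', outputs);
const dynamicValue = resolveValue({ stepId: 'readBol', field: 'proNumber' }, outputs);
```

Failing loudly on a missing step or field keeps a bad reference from silently writing an empty value into a downstream system.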

Engineering for the Long Haul

Behind the scenes, we've built CoPallet on an event-driven architecture that treats agent execution as a series of asynchronous operations. This design choice reflects the reality of logistics workflows – they're inherently distributed, often long-running, and need to coordinate across multiple systems with varying response times.
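One way to picture event-driven execution is a loop where each step emits an event and the next step is scheduled in response, rather than one function calling the next directly. The sketch below is a deliberately simplified, in-memory version. The event names, `runEvents`, and the three-step workflow are assumptions for illustration, and a production system would use a durable queue with asynchronous handlers.

```typescript
// Hypothetical event types for one agent run; names are illustrative.
type AgentEvent =
  | { type: 'step.requested'; step: string }
  | { type: 'step.completed'; step: string }
  | { type: 'run.completed' };

type Handler = (event: AgentEvent, emit: (next: AgentEvent) => void) => void;

// Minimal in-memory dispatcher: events are queued and handled one at a time.
function runEvents(initial: AgentEvent, handler: Handler): string[] {
  const queue: AgentEvent[] = [initial];
  const log: string[] = [];
  while (queue.length > 0) {
    const event = queue.shift() as AgentEvent;
    log.push(event.type === 'run.completed' ? event.type : `${event.type}:${event.step}`);
    handler(event, (next) => queue.push(next));
  }
  return log;
}

// Ordering taken from the workflow described earlier: verify, update, notify.
const workflow = ['verifyBol', 'updateInventory', 'notifyWarehouse'];

const eventLog = runEvents({ type: 'step.requested', step: workflow[0] }, (event, emit) => {
  if (event.type === 'step.requested') {
    emit({ type: 'step.completed', step: event.step }); // pretend the step ran
  } else if (event.type === 'step.completed') {
    const next = workflow[workflow.indexOf(event.step) + 1];
    emit(next ? { type: 'step.requested', step: next } : { type: 'run.completed' });
  }
});
```

Because every transition passes through the queue, a slow step only delays its own follow-up event, and the event log doubles as a record of what happened in which order.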

Our action orchestration system can dynamically allocate compute resources based on the complexity of each operation. Simple data lookups run on lightweight workers, while complex document processing tasks get routed to GPU-accelerated instances. This approach keeps our system responsive while managing costs effectively.
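The routing rule described above can be sketched as a small pure function. The `ActionTask` fields and pool names are assumptions for illustration. The real orchestrator presumably weighs more signals than the task kind.

```typescript
type WorkerPool = 'lightweight' | 'gpu';

// Hypothetical task descriptor; the fields are assumptions for illustration.
interface ActionTask {
  name: string;
  kind: 'lookup' | 'document-processing';
}

// Route document processing to GPU-backed workers and everything else to lightweight ones.
function selectWorkerPool(task: ActionTask): WorkerPool {
  return task.kind === 'document-processing' ? 'gpu' : 'lightweight';
}

const lookupPool = selectWorkerPool({ name: 'findShipment', kind: 'lookup' });
const parsePool = selectWorkerPool({ name: 'parseBol', kind: 'document-processing' });
```

Keeping the decision in a pure function makes the routing policy easy to test and to change without touching the workers themselves.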

Observability is built into every layer of the stack. We use OpenTelemetry to trace every network request, decision point, and state transition. When an agent execution goes wrong, our engineers can replay the entire sequence of events with full context about what the agent "was thinking" at each step.

This level of traceability addresses a risk that Anthropic identifies with agent frameworks: "They often create extra layers of abstraction that can obscure the underlying prompts and responses, making them harder to debug." Our approach prioritizes transparency at every level.

The Future of AI in Logistics

We’ve briefly discussed a few key foundational elements of our AI agents work so far: action orchestration that dynamically allocates compute resources based on task complexity, comprehensive observability through OpenTelemetry, and event-driven architectures that handle the inherently distributed and long-running nature of logistics workflows.

But this is just the beginning. We’re hard at work on several new frontiers, like human-in-the-loop systems that evolve beyond simple escalation points and drive true collaborative intelligence. We're exploring how to make document parsing even more robust by training models specifically on logistics documents and developing techniques to handle increasingly complex multi-modal inputs that combine text, images, and structured data.

And perhaps most exciting is our research into improving agent reasoning over time through reinforcement learning from human feedback. Just as Maria gets better at her job through experience, our agents should continuously improve their decision-making by learning from successful resolutions and human corrections. This creates a compounding intelligence effect where each intervention makes the entire system smarter.

The logistics industry has been waiting for technology that can handle the complexity and variability of real-world operations. By combining the reasoning capabilities of large language models with robust action systems and human oversight (more on this in a follow-up post!), we're creating AI agents that can scale human intelligence rather than replacing it.

We're looking for ambitious builders who want to join us. If you're passionate about designing large-scale AI systems that transform how logistics companies operate, we want to talk to you. Help us solve problems that matter for an industry that moves the world, using cutting-edge AI that actually works in production!
