
Python SDK

Add observability to your AI agents in 5 minutes. See every LLM call, tool execution, and decision your agent makes.


Overview

What is this SDK?

The Notary Labs Python SDK adds observability to your AI agents. Observability means you can see what's happening inside your agent as it runs—every LLM call, every tool execution, every decision.

Why do you need observability?

AI agents are black boxes. When something goes wrong, you have no idea what happened. Did the LLM return garbage? Did a tool fail? Did the agent loop forever? Without observability, debugging is guesswork.

The problem
# Without tracing, you have no idea what happened:
result = my_agent("Summarize this document")
# Did it work? How long did it take? What LLM calls were made?
# What tools were used? Why did it fail? 🤷

# With tracing, you see everything:
@notarylabs.observe("my-agent")
def my_agent(question: str):
    # Every step is recorded with timing, inputs, outputs
    ...

What does the SDK capture?

The SDK captures three types of events, which are automatically linked in a parent-child hierarchy:

Traces

A complete agent execution from start to finish. Contains timing, inputs, outputs, and all child events.

LLM Calls

Every call to an LLM (GPT-4, Claude, etc.). Includes model, messages, response, token usage, and latency.

Tool Calls

Every tool your agent uses—API calls, database queries, web searches. Includes inputs, outputs, and timing.

How does it work?

You add a decorator to your agent functions and wrap your tool calls. The SDK captures everything in the background and sends it to your dashboard. Your agent code stays clean—no logging boilerplate everywhere.

Quick Start

Get observability working in under 5 minutes. This is the minimal setup—just enough to see traces in your dashboard.

1. Install the SDK

Terminal
pip install notarylabs

2. Add to your agent

This is the minimal integration. It traces your agent function and captures when it runs, what arguments it receives, and what it returns.

agent.py
import notarylabs

# Initialize with your API key
notarylabs.init(api_key="nl_live_xxx")

@notarylabs.observe("my-agent")
def my_agent(question: str):
    # Your agent logic here
    answer = call_llm(question)
    return answer

# Run your agent - the trace is automatically sent to your dashboard
result = my_agent("What is machine learning?")

# Always call shutdown before your program exits
notarylabs.shutdown()

3. View in dashboard

Run your agent and check your dashboard. You'll see a trace with the function name, input arguments, output, and duration.

Enable debug mode during development
Add debug=True to your init() call to see events printed to your console. This helps verify the SDK is working before checking the dashboard.

Core Concepts

The Event Hierarchy

The SDK creates a tree structure of events. Understanding this hierarchy is key to using the SDK effectively.

trace: "customer-support-agent"
├── tool_call: "classify-intent"
├── tool_call: "search-knowledge-base"
├── llm_call: "gpt-4"
└── tool_call: "send-response"

In this example, the trace "customer-support-agent" is the parent. It contains three tool calls and one LLM call, all of which are children. This hierarchy lets you see exactly what your agent did, in what order, and how long each step took.

Context Propagation

The SDK automatically tracks which trace is "active" using Python's contextvars. This means:

  • When you're inside an @observe function, any tool calls or LLM calls become children of that trace
  • If you call another @observe function, it becomes a child trace
  • This works correctly with async/await—each concurrent task maintains its own trace context
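
The contextvars mechanism is what makes the async case work. Here is a minimal stdlib-only illustration (not SDK code) of why concurrent tasks don't clobber each other's trace:

```python
import asyncio
import contextvars

# The active trace lives in a contextvar, so each asyncio task sees
# its own value. This illustrates the stdlib mechanism the SDK builds
# on; it is not the SDK's actual code.
active_trace = contextvars.ContextVar("active_trace", default=None)

async def fake_agent(name: str, seen: list):
    active_trace.set(name)           # "enter" this agent's trace
    await asyncio.sleep(0)           # yield so the two tasks interleave
    seen.append(active_trace.get())  # still our own trace, not a sibling's

async def main():
    seen = []
    # Each task created by gather() gets its own copy of the context
    await asyncio.gather(fake_agent("trace-a", seen),
                         fake_agent("trace-b", seen))
    return seen

print(asyncio.run(main()))  # → ['trace-a', 'trace-b']
```

Even though the two tasks interleave at the `await`, each one reads back the trace it set, which is exactly the guarantee the SDK relies on.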

Background Batching

Events are not sent to the server immediately. Instead, they're queued and sent in batches every 5 seconds (or when 100 events accumulate). This keeps the SDK from slowing down your agent.
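
Conceptually, the background emitter works like this simplified sketch (an illustration, not the real implementation): events go into a queue and a worker thread flushes them when either the interval elapses or the batch fills up.

```python
import queue
import threading
import time

# Simplified model of interval/size-based batching. Not the real SDK,
# just an illustration of why the final batch needs an explicit flush.
FLUSH_INTERVAL_S = 5
BATCH_SIZE = 100

events = queue.Queue()

def emitter(stop: threading.Event, send):
    batch, last_flush = [], time.monotonic()
    while not stop.is_set():
        try:
            batch.append(events.get(timeout=0.1))
        except queue.Empty:
            pass
        full = len(batch) >= BATCH_SIZE
        interval_elapsed = time.monotonic() - last_flush >= FLUSH_INTERVAL_S
        if batch and (full or interval_elapsed):
            send(batch)  # one upload per batch, not per event
            batch, last_flush = [], time.monotonic()
    if batch:
        send(batch)  # final flush when the worker is told to stop
```

Here `send` stands in for the HTTP upload. Without that final flush, events queued in the last few seconds before exit would be lost, which is why one-shot scripts need to flush before the process terminates.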

When do I need to call shutdown()?

It depends on your use case:

  • One-shot scripts — Call shutdown() before exit, otherwise the final batch of events may be lost.
  • Long-running agents — Don't call shutdown() during operation. Events flush automatically every 5 seconds. Only call it on process termination (via signal handler).

shutdown() stops logging permanently
Once you call shutdown(), the background emitter stops and no more events will be sent. Don't call it in a loop or between agent runs.

Tracing Agents

The @observe Decorator

The @observe decorator is how you tell the SDK "I want to trace this function." It wraps your function and records everything about its execution.

Basic usage
import notarylabs

notarylabs.init(api_key="nl_live_xxx")

# The @observe decorator wraps your function and records:
# - When it started and ended
# - What arguments were passed in
# - What value was returned (or what error was raised)
# - How long it took
# - All LLM calls and tool calls made inside

@notarylabs.observe("document-summarizer")
def summarize_document(doc: str) -> str:
    # Everything inside here is traced
    summary = call_llm(f"Summarize: {doc}")
    return summary

result = summarize_document("Long document text...")
# A trace event is automatically sent to your dashboard

What gets captured automatically:

  • Input arguments — Both positional and keyword arguments are serialized and recorded
  • Return value — Whatever your function returns is serialized and recorded
  • Duration — How long the function took to execute (in milliseconds)
  • Status — "success" if the function returned normally, "error" if it raised an exception
  • Error message — If an exception was raised, the error message is captured
  • Child events — All tool calls, LLM calls, and nested traces inside this function
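
Conceptually, the decorator behaves like this simplified sketch (an illustration of the captured fields, not the real implementation):

```python
import functools
import time

# Collected events; the real SDK queues these for the background emitter.
EVENTS = []

def observe_sketch(name):
    """Simplified model of what an @observe-style decorator records.
    Not the real implementation, just an illustration."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            event = {
                "trace_name": name,
                "input_args": {"args": list(args), "kwargs": kwargs},
            }
            start = time.monotonic()
            try:
                result = fn(*args, **kwargs)
            except Exception as exc:
                event["status"] = "error"
                event["error"] = f"{type(exc).__name__}: {exc}"
                raise  # the exception still propagates to your code
            else:
                event["status"] = "success"
                event["output"] = result
                return result
            finally:
                event["duration_ms"] = round((time.monotonic() - start) * 1000)
                EVENTS.append(event)
        return inner
    return wrap

@observe_sketch("greeter")
def greet(name: str) -> str:
    return f"hello, {name}"

greet("ada")
# EVENTS[0] now holds the trace name, input args, output,
# status, and duration for this call.
```

Note that an exception is recorded and then re-raised, so a failed run still produces a trace with status "error" while your own error handling keeps working.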

Nested Traces

When one observed function calls another observed function, the SDK automatically creates a parent-child relationship. This lets you build complex, multi-step agents while maintaining visibility into each step.

Nested traces
# Traces can be nested. When you call an observed function
# from inside another observed function, a parent-child
# relationship is automatically created.

@notarylabs.observe("orchestrator")
def orchestrator(task: str):
    # This trace is the parent
    plan = planner(task)        # Child trace
    result = executor(plan)      # Child trace
    return result

@notarylabs.observe("planner")
def planner(task: str):
    # This trace is a child of "orchestrator"
    return create_plan(task)

@notarylabs.observe("executor")
def executor(plan: list):
    # This trace is also a child of "orchestrator"
    return execute_plan(plan)

# In your dashboard, you'll see:
# orchestrator
# ├── planner
# └── executor

Naming your traces
Use descriptive names like "customer-support-agent" or "document-summarizer" instead of generic names like "agent" or "main". Good names make your dashboard much easier to navigate.

Logging Tools

What is a "tool" in this context?

A tool is any external action your agent takes. This includes:

  • API calls (REST, GraphQL, etc.)
  • Database queries
  • File system operations
  • Web searches
  • Sending emails or notifications
  • Any function with side effects

The tool() Context Manager

Use the tool() context manager to wrap any tool execution. You explicitly set the input and output, giving you full control over what gets logged.

Logging tool calls
# Tools are external actions your agent takes:
# - API calls
# - Database queries
# - File operations
# - Web searches
# - Any side effect

# Use the tool() context manager to log these:

@notarylabs.observe("research-agent")
def research_agent(topic: str):

    # Log a web search tool call
    with notarylabs.tool("web-search") as t:
        t.set_input({"query": topic})      # What went into the tool
        results = search_api(topic)         # Actually call the tool
        t.set_output({"results": results})  # What came out

    # Log a database query
    with notarylabs.tool("database-lookup") as t:
        t.set_input({"table": "documents", "filter": topic})
        docs = db.query(topic)
        t.set_output({"count": len(docs)})

    return process_results(results, docs)

Why set_input() and set_output() are separate:

You call set_input() before executing the tool and set_output() after. This ensures that even if the tool throws an exception, the input is still logged—which is crucial for debugging.

Error Handling

If an exception is raised inside a tool() block, the SDK captures the error and then re-raises it. Your code doesn't need any special error handling.

Errors are captured automatically
# Errors are automatically captured. If an exception is raised
# inside a tool() block, the SDK records the error and re-raises it.

with notarylabs.tool("risky-api-call") as t:
    t.set_input({"endpoint": "/data"})
    response = requests.get("https://api.example.com/data")
    response.raise_for_status()  # Might raise an exception
    t.set_output(response.json())

# If raise_for_status() throws, your dashboard shows:
# - tool_name: "risky-api-call"
# - status: "error"
# - error: "HTTPError: 500 Server Error"
# - duration_ms: 1234

LLM Auto-Capture

This is optional
You can use the SDK without auto-capture by logging LLM calls with tool(). Auto-capture is a convenience feature that extracts richer data.

Why use auto-capture?

LLM calls are the most important thing to monitor in an AI agent. Auto-capture gives you structured data—model name, token usage, latency—without writing any logging code.

Manual vs auto-capture
# Problem: LLM calls are the most important thing to monitor,
# but manually logging every call is tedious and error-prone.

# Without auto-capture (tedious):
response = openai.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": query}]
)
# Now manually log: model, messages, response, tokens, latency...

# With auto-capture (automatic):
from notarylabs.clients import OpenAI
client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": query}]
)
# Automatically logged: model, messages, response, token usage, latency

Using the Wrapped OpenAI Client

The SDK provides a drop-in replacement for the official OpenAI client. The API is identical—just change your import.

OpenAI auto-capture
from notarylabs.clients import OpenAI

# This is a drop-in replacement for openai.OpenAI
# The API is identical - just change your import
client = OpenAI(api_key="sk-...")

@notarylabs.observe("qa-agent")
def qa_agent(question: str):
    # This LLM call is automatically captured
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": question}
        ]
    )
    return response.choices[0].message.content

# Your dashboard shows the LLM call with:
# - provider: "openai"
# - model: "gpt-4"
# - input_messages: [...]
# - output_content: "..."
# - token usage: {prompt: X, completion: Y, total: Z}
# - duration_ms: 1234

What gets captured:

Field            Description
---------------  ----------------------------------------------------
provider         "openai"
model            The model name (e.g., "gpt-4", "gpt-3.5-turbo")
input_messages   The full message array sent to the LLM
output_content   The LLM's response text
tool_calls       Any tool calls the LLM requested (function calling)
usage            Token counts: prompt, completion, total
duration_ms      How long the LLM call took

Anthropic support is experimental
We also provide a wrapped Anthropic client, but it hasn't been extensively tested. Use with caution and report any issues.
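
If you want to try it, usage is expected to mirror the OpenAI wrapper. This sketch assumes the wrapped client is exposed as `notarylabs.clients.Anthropic` and follows the official `anthropic` SDK's messages API:

```python
import notarylabs
from notarylabs.clients import Anthropic  # experimental; name assumed

notarylabs.init(api_key="nl_live_xxx")
client = Anthropic(api_key="sk-ant-...")

@notarylabs.observe("qa-agent")
def qa_agent(question: str) -> str:
    # Expected to be auto-captured like the OpenAI example
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        messages=[{"role": "user", "content": question}],
    )
    return response.content[0].text
```

Treat this as a starting point only, and report any issues you hit.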

API Reference

notarylabs.init(api_key, debug=False)

Initialize the SDK. Call this once when your application starts, before any traced functions run.

Parameters:

  • api_key (str, required) — Your Notary Labs API key. Must start with "nl_live_" or "nl_test_".
  • debug (bool, default False) — If True, prints every event to the console as JSON.

notarylabs.shutdown()

Flush any remaining events and permanently stop the background emitter. After calling this, no more events will be logged.

When to call:

  • One-shot scripts: Call before your script exits
  • Long-running agents: Only call on process termination (SIGTERM/SIGINT handler)
  • Never: In a loop, between agent runs, or mid-execution

Shutdown patterns
# The SDK sends events in the background every 5 seconds.
# You do NOT need to call shutdown() manually during operation.

# ============================================
# ONE-SHOT SCRIPTS (process exits after task)
# ============================================
# Use shutdown() to flush remaining events before exit

notarylabs.init(api_key="nl_live_xxx")
result = run_agent("Do this task")
notarylabs.shutdown()  # Required - flush before exit

# Or use atexit for cleaner code:
import atexit
notarylabs.init(api_key="nl_live_xxx")
atexit.register(notarylabs.shutdown)


# ============================================
# LONG-RUNNING AGENTS (process runs forever)
# ============================================
# Events auto-flush every 5 seconds - no manual action needed

notarylabs.init(api_key="nl_live_xxx")

while True:
    run_agent_cycle()
    # Events are sent automatically in the background
    # Do NOT call shutdown() here - it stops logging permanently

# Only handle shutdown on process termination:
import signal

def handle_exit(sig, frame):
    notarylabs.shutdown()  # Flush final batch
    exit(0)

signal.signal(signal.SIGTERM, handle_exit)
signal.signal(signal.SIGINT, handle_exit)

@notarylabs.observe(name: str)

Decorator that creates a trace for a function. All LLM calls and tool() usages inside become children of this trace.

Parameters:

  • name (str) — A descriptive name for this trace (e.g., "my-agent", "document-summarizer").

notarylabs.tool(name: str)

Context manager for logging tool calls. Yields a context object with set_input() and set_output() methods.

Parameters:

  • name (str) — A descriptive name for this tool (e.g., "web-search", "database-query").

Context methods:

  • t.set_input(data) — Record the input to this tool call. Call before executing the tool.
  • t.set_output(data) — Record the output from this tool call. Call after executing the tool.

Event Schemas

These are the JSON objects sent to your dashboard. Understanding them helps when debugging or building custom integrations.

TraceEvent

{
  "event_type": "trace",
  "event_id": "abc-123",
  "trace_id": "abc-123",
  "trace_name": "my-agent",
  "input_args": {"args": ["What is AI?"], "kwargs": {}},
  "output": "AI is artificial intelligence...",
  "duration_ms": 2500,
  "status": "success",
  "child_events": ["def-456", "ghi-789"]
}

LLMCallEvent

{
  "event_type": "llm_call",
  "event_id": "def-456",
  "trace_id": "abc-123",
  "provider": "openai",
  "model": "gpt-4",
  "input_messages": [{"role": "user", "content": "What is AI?"}],
  "output_content": "AI is artificial intelligence...",
  "usage": {"prompt_tokens": 10, "completion_tokens": 50, "total_tokens": 60},
  "duration_ms": 1800,
  "status": "success"
}

ToolCallEvent

{
  "event_type": "tool_call",
  "event_id": "ghi-789",
  "trace_id": "abc-123",
  "tool_name": "web-search",
  "input_args": {"query": "artificial intelligence"},
  "output": {"results": [...]},
  "duration_ms": 500,
  "status": "success"
}
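
If you're building a custom integration, you might model these payloads with dataclasses. This is a sketch of the TraceEvent shape shown above, not an official schema definition:

```python
import json
from dataclasses import dataclass, field

# Convenient shape for the TraceEvent payload shown above.
# Not an official schema; fields mirror the example JSON.
@dataclass
class TraceEvent:
    event_id: str
    trace_id: str
    trace_name: str
    input_args: dict
    output: object
    duration_ms: int
    status: str
    child_events: list = field(default_factory=list)

def parse_trace(payload: str) -> TraceEvent:
    data = json.loads(payload)
    if data.pop("event_type") != "trace":
        raise ValueError("not a trace event")
    return TraceEvent(**data)

event = parse_trace("""{
  "event_type": "trace",
  "event_id": "abc-123",
  "trace_id": "abc-123",
  "trace_name": "my-agent",
  "input_args": {"args": ["What is AI?"], "kwargs": {}},
  "output": "AI is artificial intelligence...",
  "duration_ms": 2500,
  "status": "success",
  "child_events": ["def-456", "ghi-789"]
}""")
```

The same pattern extends to LLMCallEvent and ToolCallEvent if you need them.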

Troubleshooting

Events aren't showing up in my dashboard

# Problem: Events aren't showing up in the dashboard

# 1. Check that you're calling shutdown()
notarylabs.shutdown()  # <-- Don't forget this!

# 2. Enable debug mode to see what's happening
notarylabs.init(api_key="nl_live_xxx", debug=True)

# 3. Check for error messages in the console
# [Notary] Flush 3 events: failed  <-- Something went wrong
# [Notary] Flush 3 events: sent    <-- Working correctly

"Invalid API key" error

# Problem: "Invalid API key" error

# Make sure your API key starts with the correct prefix:
# - nl_live_xxx  (production)
# - nl_test_xxx  (testing)

# Wrong:
notarylabs.init(api_key="sk-xxx")  # This is an OpenAI key!

# Correct:
notarylabs.init(api_key="nl_live_xxx")

Events say "failed" but my agent keeps running

The SDK is designed to fail gracefully. If it can't reach the server, it logs a warning but never crashes your agent. This is intentional—observability shouldn't break production.

Check that your API endpoint is reachable and your API key is valid. Enable debug=True to see detailed error messages.

Complete Example

Here's a full example putting everything together: initialization, shutdown, tracing, tool logging, and LLM auto-capture.

complete_agent.py
import notarylabs
from notarylabs.clients import OpenAI

# 1. Initialize once at startup
notarylabs.init(
    api_key="nl_live_xxx",
    debug=True  # Prints events to console during development
)

# 2. Create wrapped LLM client
llm = OpenAI()

# 3. Define your agent with @observe
@notarylabs.observe("customer-support-agent")
def support_agent(user_message: str) -> str:

    # Tool call: classify the user's intent
    with notarylabs.tool("classify-intent") as t:
        t.set_input({"message": user_message})
        intent = classify(user_message)
        t.set_output({"intent": intent})

    # Tool call: search knowledge base
    with notarylabs.tool("search-docs") as t:
        t.set_input({"query": user_message, "intent": intent})
        docs = knowledge_base.search(user_message)
        t.set_output({"doc_count": len(docs)})

    # LLM call: generate response (auto-captured)
    response = llm.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": f"Context: {docs}"},
            {"role": "user", "content": user_message}
        ]
    )

    return response.choices[0].message.content

# 4. Run your agent
answer = support_agent("How do I reset my password?")

# 5. Shutdown flushes any remaining events
notarylabs.shutdown()

Need help?

Found a bug or have a question? Open an issue on GitHub or contact support.