Overview
What is this SDK?
The Notary Labs Python SDK adds observability to your AI agents. Observability means you can see what's happening inside your agent as it runs—every LLM call, every tool execution, every decision.
Why do you need observability?
AI agents are black boxes. When something goes wrong, you have no idea what happened. Did the LLM return garbage? Did a tool fail? Did the agent loop forever? Without observability, debugging is guesswork.
```python
# Without tracing, you have no idea what happened:
result = my_agent("Summarize this document")
# Did it work? How long did it take? What LLM calls were made?
# What tools were used? Why did it fail? 🤷

# With tracing, you see everything:
@notarylabs.observe("my-agent")
def my_agent(question: str):
    # Every step is recorded with timing, inputs, outputs
    ...
```
What does the SDK capture?
The SDK captures three types of events, which are automatically linked in a parent-child hierarchy:
Traces
A complete agent execution from start to finish. Contains timing, inputs, outputs, and all child events.
LLM Calls
Every call to an LLM (GPT-4, Claude, etc.). Includes model, messages, response, token usage, and latency.
Tool Calls
Every tool your agent uses—API calls, database queries, web searches. Includes inputs, outputs, and timing.
How does it work?
You add a decorator to your agent functions and wrap your tool calls. The SDK captures everything in the background and sends it to your dashboard. Your agent code stays clean—no logging boilerplate everywhere.
Quick Start
Get observability working in under 5 minutes. This is the minimal setup—just enough to see traces in your dashboard.
Install the SDK
```shell
pip install notarylabs
```
Add to your agent
This is the minimal integration. It traces your agent function and captures when it runs, what arguments it receives, and what it returns.
```python
import notarylabs

# Initialize with your API key
notarylabs.init(api_key="nl_live_xxx")

@notarylabs.observe("my-agent")
def my_agent(question: str):
    # Your agent logic here
    answer = call_llm(question)
    return answer

# Run your agent - the trace is automatically sent to your dashboard
result = my_agent("What is machine learning?")

# Always call shutdown before your program exits
notarylabs.shutdown()
```
View in dashboard
Run your agent and check your dashboard. You'll see a trace with the function name, input arguments, output, and duration.
Tip: Pass `debug=True` to your `init()` call to see events printed to your console. This helps verify the SDK is working before checking the dashboard.
Core Concepts
The Event Hierarchy
The SDK creates a tree structure of events. Understanding this hierarchy is key to using the SDK effectively.
For example, a trace named "customer-support-agent" might be the parent, containing three tool calls and one LLM call as children. This hierarchy lets you see exactly what your agent did, in what order, and how long each step took.
Context Propagation
The SDK automatically tracks which trace is "active" using Python's contextvars. This means:
- When you're inside an `@observe` function, any tool calls or LLM calls become children of that trace
- If you call another `@observe` function, it becomes a child trace
- This works correctly with async/await: each concurrent task maintains its own trace context
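The mechanism behind this can be sketched without the SDK. The following is a hypothetical illustration of how `contextvars` maintains a per-task "active trace" (the names `observe`, `_active_trace`, and `recorded` are invented for this sketch, not the SDK's internals):

```python
import contextvars

# Holds the name of the currently active trace for this task (None at top level)
_active_trace = contextvars.ContextVar("active_trace", default=None)
recorded = []  # stand-in for the SDK's event queue

def observe(name):
    def decorator(fn):
        def wrapper(*args, **kwargs):
            # The parent is whatever trace was active when we were called
            trace = {"name": name, "parent": _active_trace.get()}
            token = _active_trace.set(name)  # we become the active trace
            try:
                return fn(*args, **kwargs)
            finally:
                _active_trace.reset(token)   # restore the outer trace
                recorded.append(trace)
        return wrapper
    return decorator

@observe("planner")
def planner():
    return "plan"

@observe("orchestrator")
def orchestrator():
    return planner()

orchestrator()
# "planner" records "orchestrator" as its parent;
# "orchestrator" has no parent because no trace was active above it
```

Because `ContextVar` values are per-task, two agents running concurrently under `asyncio` would each see their own active trace rather than clobbering each other's.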
Background Batching
Events are not sent to the server immediately. Instead, they're queued and sent in batches every 5 seconds (or when 100 events accumulate). This keeps the SDK from slowing down your agent.
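The batching behavior can be sketched as a buffer with two flush triggers, using the 5-second / 100-event thresholds quoted above. This is a toy model: the real emitter runs the time-based check on a background thread, and `flush()` would POST to the server rather than append to a list.

```python
import time

class BatchEmitter:
    """Toy model of batched sending: flush every 5 s or at 100 queued events."""
    def __init__(self, flush_interval=5.0, max_batch=100):
        self.flush_interval = flush_interval
        self.max_batch = max_batch
        self.queue = []
        self.last_flush = time.monotonic()
        self.sent_batches = []

    def emit(self, event):
        self.queue.append(event)
        if len(self.queue) >= self.max_batch:
            self.flush()  # size trigger

    def tick(self):
        # The real SDK's background thread performs this check periodically
        if self.queue and time.monotonic() - self.last_flush >= self.flush_interval:
            self.flush()  # time trigger

    def flush(self):
        if self.queue:
            self.sent_batches.append(self.queue)  # stand-in for an HTTP POST
            self.queue = []
        self.last_flush = time.monotonic()

emitter = BatchEmitter(max_batch=3)
for i in range(7):
    emitter.emit({"event_id": i})
# Two batches of 3 were "sent" by the size trigger; one event is still queued
# (it would go out on the next tick, or on shutdown)
```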
When do I need to call shutdown()?
It depends on your use case:
- One-shot scripts — Call `shutdown()` before exit, otherwise the final batch of events may be lost.
- Long-running agents — Don't call `shutdown()` during operation. Events flush automatically every 5 seconds. Only call it on process termination (via signal handler).

Once you call `shutdown()`, the background emitter stops and no more events will be sent. Don't call it in a loop or between agent runs.
Tracing Agents
The @observe Decorator
The @observe decorator is how you tell the SDK "I want to trace this function." It wraps your function and records everything about its execution.
```python
import notarylabs

notarylabs.init(api_key="nl_live_xxx")

# The @observe decorator wraps your function and records:
# - When it started and ended
# - What arguments were passed in
# - What value was returned (or what error was raised)
# - How long it took
# - All LLM calls and tool calls made inside
@notarylabs.observe("document-summarizer")
def summarize_document(doc: str) -> str:
    # Everything inside here is traced
    summary = call_llm(f"Summarize: {doc}")
    return summary

result = summarize_document("Long document text...")
# A trace event is automatically sent to your dashboard
```
What gets captured automatically:
- Input arguments — Both positional and keyword arguments are serialized and recorded
- Return value — Whatever your function returns is serialized and recorded
- Duration — How long the function took to execute (in milliseconds)
- Status — "success" if the function returned normally, "error" if it raised an exception
- Error message — If an exception was raised, the error message is captured
- Child events — All tool calls, LLM calls, and nested traces inside this function
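To make the list above concrete, here is a self-contained sketch of what a decorator like `@observe` has to record. This is hypothetical illustration code, not the SDK's implementation; `events` stands in for the SDK's background queue.

```python
import functools
import time

events = []  # stand-in for the SDK's event queue

def observe(name):
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            event = {
                "trace_name": name,
                "input_args": {"args": list(args), "kwargs": kwargs},
            }
            start = time.monotonic()
            try:
                result = fn(*args, **kwargs)
                event.update(status="success", output=result)
                return result
            except Exception as exc:
                # Error message is captured, then the exception propagates
                event.update(status="error", error=f"{type(exc).__name__}: {exc}")
                raise
            finally:
                event["duration_ms"] = (time.monotonic() - start) * 1000
                events.append(event)
        return wrapper
    return decorator

@observe("adder")
def add(a, b):
    return a + b

add(2, 3)
# events[0] now holds the inputs [2, 3], output 5, status "success",
# and a duration in milliseconds
```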
Nested Traces
When one observed function calls another observed function, the SDK automatically creates a parent-child relationship. This lets you build complex, multi-step agents while maintaining visibility into each step.
```python
# Traces can be nested. When you call an observed function
# from inside another observed function, a parent-child
# relationship is automatically created.

@notarylabs.observe("orchestrator")
def orchestrator(task: str):
    # This trace is the parent
    plan = planner(task)      # Child trace
    result = executor(plan)   # Child trace
    return result

@notarylabs.observe("planner")
def planner(task: str):
    # This trace is a child of "orchestrator"
    return create_plan(task)

@notarylabs.observe("executor")
def executor(plan: list):
    # This trace is also a child of "orchestrator"
    return execute_plan(plan)

# In your dashboard, you'll see:
# orchestrator
# ├── planner
# └── executor
```
Logging Tools
What is a "tool" in this context?
A tool is any external action your agent takes. This includes:
- API calls (REST, GraphQL, etc.)
- Database queries
- File system operations
- Web searches
- Sending emails or notifications
- Any function with side effects
The tool() Context Manager
Use the tool() context manager to wrap any tool execution. You explicitly set the input and output, giving you full control over what gets logged.
```python
# Tools are external actions your agent takes:
# - API calls
# - Database queries
# - File operations
# - Web searches
# - Any side effect

# Use the tool() context manager to log these:
@notarylabs.observe("research-agent")
def research_agent(topic: str):
    # Log a web search tool call
    with notarylabs.tool("web-search") as t:
        t.set_input({"query": topic})        # What went into the tool
        results = search_api(topic)          # Actually call the tool
        t.set_output({"results": results})   # What came out

    # Log a database query
    with notarylabs.tool("database-lookup") as t:
        t.set_input({"table": "documents", "filter": topic})
        docs = db.query(topic)
        t.set_output({"count": len(docs)})

    return process_results(results, docs)
```
Why set_input() and set_output() are separate:
You call set_input() before executing the tool and set_output() after. This ensures that even if the tool throws an exception, the input is still logged—which is crucial for debugging.
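A minimal sketch shows why this ordering matters. The context manager below is a hypothetical stand-in for the SDK's `tool()` (using `contextlib.contextmanager`); because `set_input()` runs before the tool executes, the input is already recorded when the tool raises.

```python
from contextlib import contextmanager

events = []  # stand-in for the SDK's event queue

@contextmanager
def tool(name):
    """Toy tool logger: the event is recorded even if the body raises."""
    event = {"tool_name": name, "input": None, "output": None, "status": "success"}

    class Ctx:
        def set_input(self, data):
            event["input"] = data

        def set_output(self, data):
            event["output"] = data

    try:
        yield Ctx()
    except Exception as exc:
        event["status"] = "error"
        event["error"] = f"{type(exc).__name__}: {exc}"
        raise  # re-raise, as the SDK does
    finally:
        events.append(event)

try:
    with tool("flaky-api") as t:
        t.set_input({"endpoint": "/data"})   # logged before the failure
        raise ConnectionError("timed out")   # the tool blows up here
except ConnectionError:
    pass

# events[0] still contains the input, plus status "error" and the message -
# exactly what you need to reproduce the failing call
```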
Error Handling
If an exception is raised inside a tool() block, the SDK captures the error and then re-raises it. Your code doesn't need any special error handling.
```python
import requests

# Errors are automatically captured. If an exception is raised
# inside a tool() block, the SDK records the error and re-raises it.
with notarylabs.tool("risky-api-call") as t:
    t.set_input({"endpoint": "/data"})
    response = requests.get("https://api.example.com/data")
    response.raise_for_status()  # Might raise an exception
    t.set_output(response.json())

# If raise_for_status() throws, your dashboard shows:
# - tool_name: "risky-api-call"
# - status: "error"
# - error: "HTTPError: 500 Server Error"
# - duration_ms: 1234
```
LLM Auto-Capture
You can always log LLM calls manually with `tool()`. Auto-capture is a convenience feature that extracts richer data.
Why use auto-capture?
LLM calls are the most important thing to monitor in an AI agent. Auto-capture gives you structured data—model name, token usage, latency—without writing any logging code.
```python
# Problem: LLM calls are the most important thing to monitor,
# but manually logging every call is tedious and error-prone.

# Without auto-capture (tedious):
response = openai.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": query}]
)
# Now manually log: model, messages, response, tokens, latency...

# With auto-capture (automatic):
from notarylabs.clients import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": query}]
)
# Automatically logged: model, messages, response, token usage, latency
```
Using the Wrapped OpenAI Client
The SDK provides a drop-in replacement for the official OpenAI client. The API is identical—just change your import.
```python
from notarylabs.clients import OpenAI

# This is a drop-in replacement for openai.OpenAI
# The API is identical - just change your import
client = OpenAI(api_key="sk-...")

@notarylabs.observe("qa-agent")
def qa_agent(question: str):
    # This LLM call is automatically captured
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": question}
        ]
    )
    return response.choices[0].message.content

# Your dashboard shows the LLM call with:
# - provider: "openai"
# - model: "gpt-4"
# - input_messages: [...]
# - output_content: "..."
# - token usage: {prompt: X, completion: Y, total: Z}
# - duration_ms: 1234
```
What gets captured:
| Field | Description |
|---|---|
| provider | "openai" |
| model | The model name (e.g., "gpt-4", "gpt-3.5-turbo") |
| input_messages | The full message array sent to the LLM |
| output_content | The LLM's response text |
| tool_calls | Any tool calls the LLM requested (function calling) |
| usage | Token counts: prompt, completion, total |
| duration_ms | How long the LLM call took |
API Reference
`notarylabs.init(api_key, debug=False)`

Initialize the SDK. Call this once when your application starts, before any traced functions run.
Parameters:
- `api_key` (str, required) — Your Notary Labs API key. Must start with "nl_live_" or "nl_test_".
- `debug` (bool, default False) — If True, prints every event to the console as JSON.
`notarylabs.shutdown()`

Flush any remaining events and permanently stop the background emitter. After calling this, no more events will be logged.
When to call:
- One-shot scripts: Call before your script exits
- Long-running agents: Only call on process termination (SIGTERM/SIGINT handler)
- Never: In a loop, between agent runs, or mid-execution
```python
import atexit
import signal
import sys

# The SDK sends events in the background every 5 seconds.
# You do NOT need to call shutdown() manually during operation.

# ============================================
# ONE-SHOT SCRIPTS (process exits after task)
# ============================================
# Use shutdown() to flush remaining events before exit
notarylabs.init(api_key="nl_live_xxx")
result = run_agent("Do this task")
notarylabs.shutdown()  # Required - flush before exit

# Or use atexit for cleaner code:
notarylabs.init(api_key="nl_live_xxx")
atexit.register(notarylabs.shutdown)

# ============================================
# LONG-RUNNING AGENTS (process runs forever)
# ============================================
# Events auto-flush every 5 seconds - no manual action needed
notarylabs.init(api_key="nl_live_xxx")
while True:
    run_agent_cycle()
    # Events are sent automatically in the background
    # Do NOT call shutdown() here - it stops logging permanently

# Only handle shutdown on process termination:
def handle_exit(sig, frame):
    notarylabs.shutdown()  # Flush final batch
    sys.exit(0)

signal.signal(signal.SIGTERM, handle_exit)
signal.signal(signal.SIGINT, handle_exit)
```
`@notarylabs.observe(name: str)`

Decorator that creates a trace for a function. All LLM calls and tool() usages inside become children of this trace.
Parameters:
- `name` (str) — A descriptive name for this trace (e.g., "my-agent", "document-summarizer").
`notarylabs.tool(name: str)`

Context manager for logging tool calls. Yields a context object with set_input() and set_output() methods.
Parameters:
- `name` (str) — A descriptive name for this tool (e.g., "web-search", "database-query").
Context methods:
- `t.set_input(data)` — Record the input to this tool call. Call before executing the tool.
- `t.set_output(data)` — Record the output from this tool call. Call after executing the tool.
Event Schemas
These are the JSON objects sent to your dashboard. Understanding them helps when debugging or building custom integrations.
TraceEvent
```json
{
  "event_type": "trace",
  "event_id": "abc-123",
  "trace_id": "abc-123",
  "trace_name": "my-agent",
  "input_args": {"args": ["What is AI?"], "kwargs": {}},
  "output": "AI is artificial intelligence...",
  "duration_ms": 2500,
  "status": "success",
  "child_events": ["def-456", "ghi-789"]
}
```
LLMCallEvent
```json
{
  "event_type": "llm_call",
  "event_id": "def-456",
  "trace_id": "abc-123",
  "provider": "openai",
  "model": "gpt-4",
  "input_messages": [{"role": "user", "content": "What is AI?"}],
  "output_content": "AI is artificial intelligence...",
  "usage": {"prompt_tokens": 10, "completion_tokens": 50, "total_tokens": 60},
  "duration_ms": 1800,
  "status": "success"
}
```
ToolCallEvent
```json
{
  "event_type": "tool_call",
  "event_id": "ghi-789",
  "trace_id": "abc-123",
  "tool_name": "web-search",
  "input_args": {"query": "artificial intelligence"},
  "output": {"results": [...]},
  "duration_ms": 500,
  "status": "success"
}
```
Troubleshooting
Events aren't showing up in my dashboard
```python
# Problem: Events aren't showing up in the dashboard

# 1. Check that you're calling shutdown()
notarylabs.shutdown()  # <-- Don't forget this!

# 2. Enable debug mode to see what's happening
notarylabs.init(api_key="nl_live_xxx", debug=True)

# 3. Check for error messages in the console
# [Notary] Flush 3 events: failed  <-- Something went wrong
# [Notary] Flush 3 events: sent    <-- Working correctly
```
"Invalid API key" error
```python
# Problem: "Invalid API key" error

# Make sure your API key starts with the correct prefix:
# - nl_live_xxx (production)
# - nl_test_xxx (testing)

# Wrong:
notarylabs.init(api_key="sk-xxx")  # This is an OpenAI key!

# Correct:
notarylabs.init(api_key="nl_live_xxx")
```
Events say "failed" but my agent keeps running
The SDK is designed to fail silently. If it can't reach the server, it logs a warning but doesn't crash your agent. This is intentional—observability shouldn't break production.
Check that your API endpoint is reachable and your API key is valid. Enable debug=True to see detailed error messages.
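The fail-silent design can be sketched in a few lines. This is an illustrative stand-in (the function names `safe_flush` and `broken_send` are invented here), not the SDK's code: a network failure produces a console warning and a `False` return instead of an exception that could crash the agent.

```python
def safe_flush(send, batch):
    """Illustrative fail-silent flush: a network error is logged as a
    warning and swallowed rather than propagating into the agent."""
    try:
        send(batch)  # stand-in for the HTTP POST to the dashboard
        print(f"[Notary] Flush {len(batch)} events: sent")
        return True
    except Exception as exc:
        print(f"[Notary] Flush {len(batch)} events: failed ({exc})")
        return False

def broken_send(batch):
    # Simulate an unreachable server
    raise ConnectionError("server unreachable")

ok = safe_flush(broken_send, [{"event_id": 1}])
# ok is False and a warning was printed, but no exception reached the caller
```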
Complete Example
Here's a full example putting everything together: initialization, shutdown, tracing, tool logging, and LLM auto-capture.
```python
import notarylabs
from notarylabs.clients import OpenAI

# 1. Initialize once at startup
notarylabs.init(
    api_key="nl_live_xxx",
    debug=True  # Prints events to console during development
)

# 2. Create wrapped LLM client
llm = OpenAI()

# 3. Define your agent with @observe
@notarylabs.observe("customer-support-agent")
def support_agent(user_message: str) -> str:
    # Tool call: classify the user's intent
    with notarylabs.tool("classify-intent") as t:
        t.set_input({"message": user_message})
        intent = classify(user_message)
        t.set_output({"intent": intent})

    # Tool call: search knowledge base
    with notarylabs.tool("search-docs") as t:
        t.set_input({"query": user_message, "intent": intent})
        docs = knowledge_base.search(user_message)
        t.set_output({"doc_count": len(docs)})

    # LLM call: generate response (auto-captured)
    response = llm.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": f"Context: {docs}"},
            {"role": "user", "content": user_message}
        ]
    )
    return response.choices[0].message.content

# 4. Run your agent
answer = support_agent("How do I reset my password?")

# 5. Shutdown flushes any remaining events
notarylabs.shutdown()
```
Need help?
Found a bug or have a question? Open an issue on GitHub or contact support.