avalan

Intelligence orchestration at your fingertips.

avalan lets you build, orchestrate, and deploy intelligent AI agents anywhere (locally, on premises, in the cloud) through a unified CLI and SDK that supports any model, multi-modal inputs, and native adapters to leading platforms. With built-in memory, advanced reasoning workflows, and real-time observability, avalan accelerates development from prototype to enterprise-scale AI deployments.

CLI agent with tool usage

🧠Models

Run any model from a single CLI/SDK: local checkpoints, on-prem clusters, or leading vendor APIs (OpenAI, Anthropic, OpenRouter, Ollama, and many more).

Swap in the latest text, vision, or audio models at will. Fine-tune the stack, from tokenizers to reasoning strategies, without changing your code.

echo 'Who are you, and who is Leo Messi?' \
    | avalan model run "meta-llama/Meta-Llama-3-8B-Instruct" \
      --system "You are Aurora, a helpful assistant" \
      --max-new-tokens 100 \
      --temperature .1 \
      --top-p .9 \
      --top-k 20

with TextGenerationModel("meta-llama/Meta-Llama-3-8B-Instruct") as lm:
    async for token in await lm(
        "Who are you, and who is Leo Messi?",
        system_prompt="You are Aurora, a helpful assistant",
        settings=GenerationSettings(
            temperature=0.1, max_new_tokens=100, top_p=0.9, top_k=20
        ),
    ):
        print(token, end="", flush=True)

echo 'Who are you, and who is Leo Messi?' \
    | avalan model run "ai://$OPENAI_API_KEY@openai/gpt-4o" \
      --system "You are Aurora, a helpful assistant" \
      --max-new-tokens 100 \
      --temperature .1 \
      --top-p .9 \
      --top-k 20

import os

engine_settings = TransformerEngineSettings(access_token=os.environ["OPENAI_API_KEY"])
with OpenAIModel("gpt-4o", engine_settings) as lm:
    async for token in await lm(
        "Who are you, and who is Leo Messi?",
        system_prompt="You are Aurora, a helpful assistant",
        settings=GenerationSettings(
            temperature=0.1, max_new_tokens=100, top_p=0.9, top_k=20
        ),
    ):
        print(token, end="", flush=True)

echo "[S1] Leo Messi is the greatest football player of all times." \
    | avalan model run "nari-labs/Dia-1.6B-0626" \
        --modality audio_text_to_speech \
        --audio-path example.wav \
        --audio-reference-path docs/examples/oprah.wav \
        --audio-reference-text "[S1] And then I grew up and had the esteemed honor of meeting her. And wasn't that a surprise. Here was this petite, almost delicate lady who was the personification of grace and goodness."

with TextToSpeechModel("nari-labs/Dia-1.6B-0626") as speech:
    generated_path = await speech(
        text="[S1] Leo Messi is the greatest football player of all times.",
        path="example.wav",
        reference_path="docs/examples/oprah.wav",
        reference_text=(
            "[S1] And then I grew up and had the esteemed honor of meeting her. "
            "And wasn't that a surprise. Here was this petite, almost delicate "
            "lady who was the personification of grace and goodness."
        ),
        max_new_tokens=5120,  # 128 tokens ~= 1 sec.
    )
    print(f"Speech generated in {generated_path}")

🤖Agents

Spin up adaptive agents that invoke tools, stream real-time data, and re-plan as context shifts.

Built-in observability tracks prompts, latency, and rewards, so you can improve behaviour with every run.

echo "What is (4 + 6) and then that result times 5, divided by 2?" \
  | avalan agent run \
      --engine-uri "NousResearch/Hermes-3-Llama-3.1-8B" \
      --tool "math.calculator" \
      --memory-recent \
      --run-max-new-tokens 1024 \
      --name "Tool" \
      --role "You are a helpful assistant named Tool, that can resolve user requests using tools." \
      --stats \
      --display-events \
      --display-tools \
      --conversation

with TextGenerationModel("NousResearch/Hermes-3-Llama-3.1-8B") as lm:
    async for token in await lm(
        "What is (4 + 6) and then that result times 5, divided by 2?",
        system_prompt="You are a helpful assistant named Tool, that can resolve user requests using tools.",
        settings=GenerationSettings(temperature=0.9, max_new_tokens=256),
    ):
        print(token, end="", flush=True)

echo "Hi Tool, based on our previous conversations, what's my name?" \
  | avalan agent run \
      --engine-uri "NousResearch/Hermes-3-Llama-3.1-8B" \
      --tool "memory.message.read" \
      --memory-recent \
      --memory-permanent-message "postgresql://root:password@localhost/avalan" \
      --id "f4fd12f4-25ea-4c81-9514-d31fb4c48128" \
      --participant "c67d6ec7-b6ea-40db-bf1a-6de6f9e0bb58" \
      --run-max-new-tokens 1024 \
      --name "Tool" \
      --role "You are a helpful assistant named Tool, that can resolve user requests using tools." \
      --stats

with TextGenerationModel("NousResearch/Hermes-3-Llama-3.1-8B") as lm:
    async for token in await lm(
        "Hi Tool, based on our previous conversations, what's my name?",
        system_prompt="You are a helpful assistant named Tool, that can resolve user requests using tools.",
        settings=GenerationSettings(temperature=0.9, max_new_tokens=256),
    ):
        print(token, end="", flush=True)

💾Memories

Attach rich semantic history and pluggable knowledge stores (documents, code, or live web pages) so agents respond with up-to-date context, not guesswork.
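
As a sketch, the invocation below reuses the memory flags from the agent examples above: --memory-recent keeps short-term conversational context, while --memory-permanent-message persists messages to PostgreSQL. The prompt, connection string, and participant ID are placeholders.

# connection string and participant ID below are placeholders
echo "What topics have we covered so far?" \
  | avalan agent run \
      --engine-uri "NousResearch/Hermes-3-Llama-3.1-8B" \
      --memory-recent \
      --memory-permanent-message "postgresql://root:password@localhost/avalan" \
      --participant "c67d6ec7-b6ea-40db-bf1a-6de6f9e0bb58" \
      --name "Tool" \
      --role "You are a helpful assistant named Tool." \
      --stats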

πŸ› οΈTools

Grant agents new powers by wiring in multiple tools, from internal microservices to public APIs, with fine-grained control over authorization, access, and usage scope.
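
As a sketch, the run below combines the two tools shown in the agent examples above, math.calculator and memory.message.read; passing --tool more than once per invocation is an assumption here, not a documented guarantee.

# repeating --tool is assumed to register multiple tools
echo "What is 2 + 2, and what did I ask you before?" \
  | avalan agent run \
      --engine-uri "NousResearch/Hermes-3-Llama-3.1-8B" \
      --tool "math.calculator" \
      --tool "memory.message.read" \
      --memory-recent \
      --name "Tool" \
      --role "You are a helpful assistant named Tool, that can resolve user requests using tools." \
      --display-tools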

🌎Connections

Let agents talk. Expose them through your preferred protocol (OpenAI-compatible API, MCP, A2A) or bridge them via MCP tooling to other local or remote agents.
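
Because the endpoint speaks the OpenAI-compatible API, any stock OpenAI client can talk to a served agent. A minimal curl sketch, assuming an agent is already being served at http://localhost:9001/v1; the host, port, and model name are placeholder assumptions, not avalan defaults.

# placeholder host/port and model name; point at wherever your agent is served
curl http://localhost:9001/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Tool",
        "messages": [
          {"role": "system", "content": "You are a helpful assistant named Tool."},
          {"role": "user", "content": "Who is Leo Messi?"}
        ]
      }'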

🔀Flows

Orchestrate complex multi-agent processes through intuitive agent collaboration with flows. Enjoy full end-to-end observability for debugging and performance tuning, and manage task lifecycles to support both human-in-the-loop and fully automated workflows.

🚀Deploy

Deploy your intelligent solution on-premises or to the cloud in minutes. Choose between stateful, long-lasting agents for ongoing services or lightweight, ephemeral intelligent tasks for burst-scale operations.