avalan lets you build, orchestrate, and deploy intelligent AI agents anywhere (locally, on premises, in the cloud) through a unified CLI and SDK that supports any model, multi-modal inputs, and native adapters to leading platforms. With built-in memory, advanced reasoning workflows, and real-time observability, avalan accelerates development from prototype to enterprise-scale AI deployments.
🧠 Models
Run any model from a single CLI/SDK: local checkpoints, on-prem clusters, or leading vendor APIs (OpenAI, Anthropic, OpenRouter, Ollama, and many more).
echo 'Who are you, and who is Leo Messi?' \
| avalan model run "meta-llama/Meta-Llama-3-8B-Instruct" \
--system "You are Aurora, a helpful assistant" \
--max-new-tokens 100 \
--temperature .1 \
--top-p .9 \
--top-k 20
with TextGenerationModel("meta-llama/Meta-Llama-3-8B-Instruct") as lm:
    async for token in await lm(
        "Who are you, and who is Leo Messi?",
        system_prompt="You are Aurora, a helpful assistant",
        settings=GenerationSettings(
            temperature=0.1, max_new_tokens=100, top_p=0.9, top_k=20
        ),
    ):
        print(token, end="", flush=True)
echo 'Who are you, and who is Leo Messi?' \
| avalan model run "ai://$OPENAI_API_KEY@openai/gpt-4o" \
--system "You are Aurora, a helpful assistant" \
--max-new-tokens 100 \
--temperature .1 \
--top-p .9 \
--top-k 20
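The `ai://` engine URI above packs credential, provider, and model into one string. As a rough illustration (the exact grammar avalan accepts may be richer), the standard library's `urllib.parse` can already split a URI of that shape; the token below is a placeholder, not a real key:

```python
from urllib.parse import urlparse

# Illustrative only: split an "ai://<token>@<provider>/<model>" engine URI.
uri = urlparse("ai://sk-example-token@openai/gpt-4o")

token = uri.username           # the API key portion
provider = uri.hostname        # "openai"
model = uri.path.lstrip("/")   # "gpt-4o"

print(provider, model)
```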
engine_settings = TransformerEngineSettings(access_token=os.environ["OPENAI_API_KEY"])
with OpenAIModel("gpt-4o", engine_settings) as lm:
    async for token in await lm(
        "Who are you, and who is Leo Messi?",
        system_prompt="You are Aurora, a helpful assistant",
        settings=GenerationSettings(
            temperature=0.1, max_new_tokens=100, top_p=0.9, top_k=20
        ),
    ):
        print(token, end="", flush=True)
echo "[S1] Leo Messi is the greatest football player of all times." \
| avalan model run "nari-labs/Dia-1.6B-0626" \
--modality audio_text_to_speech \
--audio-path example.wav \
--audio-reference-path docs/examples/oprah.wav \
--audio-reference-text "[S1] And then I grew up and had the esteemed honor of meeting her. And wasn't that a surprise. Here was this petite, almost delicate lady who was the personification of grace and goodness."
with TextToSpeechModel("nari-labs/Dia-1.6B-0626") as speech:
    generated_path = await speech(
        text="[S1] Leo Messi is the greatest football player of all times.",
        path="example.wav",
        reference_path="docs/examples/oprah.wav",
        reference_text=(
            "[S1] And then I grew up and had the esteemed honor of meeting "
            "her. And wasn't that a surprise. Here was this petite, almost "
            "delicate lady who was the personification of grace and goodness."
        ),
        max_new_tokens=5120,  # 128 tokens ~= 1 sec.
    )
print(f"Speech generated in {generated_path}")
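The `max_new_tokens=5120` value follows the rule of thumb in the comment above (roughly 128 tokens per second of audio). A small helper makes the conversion explicit; the 128 tokens/sec ratio is the only assumption:

```python
TOKENS_PER_SECOND = 128  # rule of thumb from the Dia example: 128 tokens ~= 1 sec

def audio_token_budget(seconds: float) -> int:
    """Return the max_new_tokens needed for roughly `seconds` of speech."""
    return int(seconds * TOKENS_PER_SECOND)

print(audio_token_budget(40))  # 5120, the budget used above
```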
Built-in observability tracks prompts, latency, and rewards, so you can improve behaviour with every run.
echo "What is (4 + 6) and then that result times 5, divided by 2?" \
| avalan agent run \
--engine-uri "NousResearch/Hermes-3-Llama-3.1-8B" \
--tool "math.calculator" \
--memory-recent \
--run-max-new-tokens 1024 \
--name "Tool" \
--role "You are a helpful assistant named Tool, that can resolve user requests using tools." \
--stats \
--display-events \
--display-tools \
--conversation
with TextGenerationModel("NousResearch/Hermes-3-Llama-3.1-8B") as lm:
    async for token in await lm(
        "What is (4 + 6) and then that result times 5, divided by 2?",
        system_prompt=(
            "You are a helpful assistant named Tool, that can resolve user "
            "requests using tools."
        ),
        settings=GenerationSettings(temperature=0.9, max_new_tokens=256),
    ):
        print(token, end="", flush=True)
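The `--stats` flag reports figures such as latency and token throughput. The same numbers can be derived by hand by timing any async token stream; the generator below is a stand-in for illustration, not the avalan API:

```python
import asyncio
import time

async def fake_token_stream():
    # Stand-in for a model's async token stream; not the avalan API.
    for token in ["The", " answer", " is", " 25", "."]:
        await asyncio.sleep(0)  # yield control, as a real stream would
        yield token

async def measure(stream):
    """Collect tokens while timing the stream: throughput stats by hand."""
    start = time.perf_counter()
    tokens = [token async for token in stream]
    elapsed = time.perf_counter() - start
    return tokens, elapsed

tokens, elapsed = asyncio.run(measure(fake_token_stream()))
print(f"{len(tokens)} tokens in {elapsed:.4f}s")
```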
echo "Hi Tool, based on our previous conversations, what's my name?" \
| avalan agent run \
--engine-uri "NousResearch/Hermes-3-Llama-3.1-8B" \
--tool "memory.message.read" \
--memory-recent \
--memory-permanent-message "postgresql://root:password@localhost/avalan" \
--id "f4fd12f4-25ea-4c81-9514-d31fb4c48128" \
--participant "c67d6ec7-b6ea-40db-bf1a-6de6f9e0bb58" \
--run-max-new-tokens 1024 \
--name "Tool" \
--role "You are a helpful assistant named Tool, that can resolve user requests using tools." \
--stats
with TextGenerationModel("NousResearch/Hermes-3-Llama-3.1-8B") as lm:
    async for token in await lm(
        "Hi Tool, based on our previous conversations, what's my name?",
        system_prompt=(
            "You are a helpful assistant named Tool, that can resolve user "
            "requests using tools."
        ),
        settings=GenerationSettings(temperature=0.9, max_new_tokens=256),
    ):
        print(token, end="", flush=True)
💾 Memories
Attach rich semantic history and pluggable knowledge stores with documents, code, or live web pages so agents respond with up-to-date context, not guesswork.
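Conceptually, "recent" memory (the `--memory-recent` flag used in the agent examples) is a bounded window of prior messages replayed into each new prompt. A minimal stand-in of that idea, not avalan's actual implementation:

```python
from collections import deque

class RecentMemory:
    """Keep only the last `maxlen` messages; oldest are evicted first."""

    def __init__(self, maxlen: int = 4) -> None:
        self.messages = deque(maxlen=maxlen)

    def add(self, role: str, content: str) -> None:
        self.messages.append((role, content))

    def context(self) -> str:
        # Replay the retained window as a prompt prefix.
        return "\n".join(f"{role}: {content}" for role, content in self.messages)

memory = RecentMemory(maxlen=2)
memory.add("user", "My name is Leo.")
memory.add("assistant", "Nice to meet you, Leo!")
memory.add("user", "What's my name?")
print(memory.context())  # only the two most recent messages survive
```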
🛠️ Tools
Grant agents new powers by wiring in multiple tools, from internal microservices to public APIs, with fine-grained control over authorization, access, and usage scope.
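A tool like the `math.calculator` used in the agent examples is, at heart, a named function the agent invokes with model-produced arguments. A safe arithmetic evaluator sketches the idea (an illustration, not avalan's implementation):

```python
import ast
import operator

# Whitelist of arithmetic operators the "calculator" tool will honor.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def calculator(expression: str) -> float:
    """Evaluate a basic arithmetic expression without calling eval()."""
    def walk(node: ast.AST) -> float:
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError(f"unsupported expression: {expression!r}")
    return walk(ast.parse(expression, mode="eval").body)

# The example question from the agent run above:
print(calculator("(4 + 6) * 5 / 2"))  # 25.0
```

Walking the parsed AST against an operator whitelist keeps arbitrary model output from executing as code, which is why real tool runtimes avoid `eval()`.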
Connections
Let agents talk. Expose them through your preferred protocol (OpenAI-compatible API, MCP, A2A) or bridge them via MCP tooling to other local or remote agents.
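Exposing an agent behind an OpenAI-compatible API means any existing OpenAI chat client can talk to it unchanged. The request body follows the standard chat-completions shape; the endpoint URL and model name below are placeholders:

```python
import json

# Placeholder endpoint: wherever your agent happens to be served.
ENDPOINT = "http://localhost:8000/v1/chat/completions"

# Standard OpenAI chat-completions request body.
payload = {
    "model": "agent",  # placeholder agent/model name
    "messages": [
        {"role": "system", "content": "You are Aurora, a helpful assistant"},
        {"role": "user", "content": "Who are you, and who is Leo Messi?"},
    ],
    "stream": True,  # stream tokens back as server-sent events
}

print(json.dumps(payload, indent=2))
```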
Flows
Orchestrate complex, multi-agent processes with intuitive agent collaboration through flows. Enjoy full end-to-end observability for debugging and performance, and manage task lifecycles to support both human-in-the-loop and fully automated workflows.
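At its simplest, a flow composes agents so one's output feeds the next, with each hop observable. A toy two-step pipeline using plain async functions sketches the shape (conceptual only, not avalan's flow API):

```python
import asyncio

async def researcher(question: str) -> str:
    # Stand-in agent: would normally call a model.
    return f"notes on: {question}"

async def writer(notes: str) -> str:
    # Second stand-in agent, consuming the first agent's output.
    return f"report based on {notes}"

async def flow(question: str) -> str:
    """Run the two agents in sequence, logging each hop for observability."""
    notes = await researcher(question)
    print(f"[flow] researcher -> {notes!r}")
    report = await writer(notes)
    print(f"[flow] writer -> {report!r}")
    return report

result = asyncio.run(flow("Who is Leo Messi?"))
```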
Deploy
Deploy your intelligent solution on-premises or to the cloud in minutes. Choose between stateful, long-lasting agents for ongoing services or lightweight, ephemeral intelligent tasks for burst-scale operations.