
Validation and runbooks

Use this page when Quickstart is no longer enough and you need a sharper operational playbook. If you still need first-line triage, start with Troubleshooting.

Bring the baseline support capture from Troubleshooting with you. This page is for narrowing the failure mode and choosing the next command family or repo runbook, not for replacing the first-pass checklist.

What each command is actually for

  • aether doctor validates local prerequisites: FUSE visibility, kernel support, and the configured mount directory when one is already present in config.
  • aether mount is the first real end-to-end test of endpoint reachability, auth, and session access.
  • aether status talks to a running local mount over the control socket.
  • aether logs tail reads the resolved log file, not stdout/stderr.
  • aether metrics show prints the resolved metrics endpoint and scrape URL.
  • aether fuse check and aether fuse cleanup are the inspection and cleanup tools for stale or stuck mounts.

Fast validation flow

Make endpoint and auth explicit, then separate local preflight from remote validation:

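# Make the endpoint and token explicit.
export AFS_AETHER_SERVER_ENDPOINT='https://grpc.aetherfs.io'
export AFS_AUTH_TOKEN='Bearer <tenant-uuid>:<principal>'

# Local preflight: FUSE visibility, kernel support, configured mount directory.
aether doctor

# Remote validation: first end-to-end test of endpoint, auth, and session access.
# SESSION_ID must already be set to the session you want to mount.
mkdir -p ./workdir
aether mount --session-id "$SESSION_ID" --mount-dir ./workdir

# Query the running mount over the control socket, then stop it.
aether status --session-id "$SESSION_ID"
aether stop --session-id "$SESSION_ID"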

If you use a non-default cache layout, keep AFS_AETHER_CACHE_DIR explicit, or pass --cache-dir / --socket to aether status and the other commands that talk to the control socket, so that control-socket discovery stays deterministic.
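
One way to keep that explicit is a small wrapper that always forwards the cache directory. The sketch below is illustrative only: the function name aether_status and the AETHER_BIN override are hypothetical helpers introduced here, while the status subcommand, --session-id, and --cache-dir are the ones named on this page.

```shell
#!/usr/bin/env bash
# AETHER_BIN is a hypothetical override used only in this sketch;
# it defaults to the real CLI name.
AETHER_BIN="${AETHER_BIN:-aether}"

# Run `aether status` for a session, passing --cache-dir explicitly
# whenever AFS_AETHER_CACHE_DIR is set in the environment.
aether_status() {
  local session_id="$1"
  local args=(status --session-id "$session_id")
  if [ -n "${AFS_AETHER_CACHE_DIR:-}" ]; then
    args+=(--cache-dir "$AFS_AETHER_CACHE_DIR")
  fi
  "$AETHER_BIN" "${args[@]}"
}
```

Calling aether_status "$SESSION_ID" then behaves the same whether or not the default cache layout is in use.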

Failure-mode map

doctor fails before mount

Typical causes:

  • /dev/fuse or the FUSE kernel module is unavailable
  • the configured mount path is invalid or blocked by permissions
  • the kernel is missing an expected capability

Important nuance:

  • a missing mount directory is only a warning; aether mount can create it during startup when permissions allow
  • advisory kernel warnings do not always prevent a mount, but they usually explain degraded behavior
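
If you want to cross-check these causes by hand, the following sketch approximates them with standard Linux tools. It is not what aether doctor runs internally; the function name check_fuse_prereqs is hypothetical, and the severity levels simply mirror the nuance above (a missing mount directory is a warning, not a failure).

```shell
#!/usr/bin/env bash
# Manual approximation of the local prerequisites (Linux only).
# Independent sketch; not the implementation of `aether doctor`.
check_fuse_prereqs() {
  local mount_dir="$1"
  local rc=0

  # The FUSE character device must exist for any FUSE mount.
  if [ -e /dev/fuse ]; then
    echo "ok: /dev/fuse present"
  else
    echo "fail: /dev/fuse missing"
    rc=1
  fi

  # /proc/filesystems lists "fuse" once the module is loaded or built in.
  if grep -qw fuse /proc/filesystems 2>/dev/null; then
    echo "ok: kernel reports fuse support"
  else
    echo "warn: fuse not listed in /proc/filesystems (module may not be loaded)"
  fi

  # A missing mount directory is only a warning; an existing path
  # must be a writable directory.
  if [ ! -e "$mount_dir" ]; then
    echo "warn: $mount_dir does not exist yet"
  elif [ -d "$mount_dir" ] && [ -w "$mount_dir" ]; then
    echo "ok: $mount_dir is a writable directory"
  else
    echo "fail: $mount_dir is not a writable directory"
    rc=1
  fi

  return "$rc"
}
```

For example, check_fuse_prereqs ./workdir prints one line per check and returns non-zero only when a hard prerequisite is missing.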

doctor passes but the mount still fails

This usually means the problem is remote, not local. Check these first:

  • AFS_AETHER_SERVER_ENDPOINT or [bridge].server_endpoint
  • auth token source and token format
  • session ID
  • whether your deployment expects additional local credential context
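
The checklist above can be partly automated before you retry the mount. This is a minimal sketch under stated assumptions: the function name check_remote_inputs is hypothetical, it only inspects environment variables (not config files or other token sources), and the token pattern is inferred from the 'Bearer <tenant-uuid>:<principal>' placeholder shown earlier, so adjust it if your deployment uses a different format.

```shell
#!/usr/bin/env bash
# Sanity-check the remote inputs visible in the environment.
# The token pattern is an assumption based on the documented placeholder.
check_remote_inputs() {
  local rc=0
  local uuid='[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}'
  local token_re="^Bearer ${uuid}:.+\$"

  # The endpoint may also come from [bridge].server_endpoint, so unset is only a note.
  if [ -n "${AFS_AETHER_SERVER_ENDPOINT:-}" ]; then
    echo "ok: endpoint is $AFS_AETHER_SERVER_ENDPOINT"
  else
    echo "note: AFS_AETHER_SERVER_ENDPOINT unset; check [bridge].server_endpoint"
  fi

  # Never print the token itself; only report whether its shape looks right.
  if [ -z "${AFS_AUTH_TOKEN:-}" ]; then
    echo "note: AFS_AUTH_TOKEN unset; confirm the token source you expect"
  elif [[ "$AFS_AUTH_TOKEN" =~ $token_re ]]; then
    echo "ok: token matches 'Bearer <tenant-uuid>:<principal>'"
  else
    echo "fail: token does not match 'Bearer <tenant-uuid>:<principal>'"
    rc=1
  fi

  # The mount commands on this page all take --session-id "$SESSION_ID".
  if [ -n "${SESSION_ID:-}" ]; then
    echo "ok: SESSION_ID is set"
  else
    echo "fail: SESSION_ID is empty"
    rc=1
  fi

  return "$rc"
}
```

Run check_remote_inputs in the same shell you use for aether mount so it sees the same exported values.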

For readable diagnostics:

export AFS_AETHER_LOG='debug'
export AFS_AETHER_LOG_FORMAT='text'
aether mount --session-id "$SESSION_ID" --mount-dir ./workdir

The runtime is up but behavior looks stale

Review the TTL and cache controls that shape freshness:

  • AFS_AETHER_LOOKUP_TTL_SECS
  • AFS_AETHER_NEGATIVE_LOOKUP_TTL_SECS
  • AFS_AETHER_DIR_CACHE_TTL_SECS
  • AFS_AETHER_CACHE_DIR
  • AFS_AETHER_CACHE_SHARED_ROOT

Start with Cache and performance, then use aether status --session-id "$SESSION_ID" and aether metrics show to confirm what the local runtime resolved.
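
Before comparing against the runtime, it can help to see what the current shell would hand to a new mount. The sketch below only prints the environment variables listed above; the function name show_freshness_env is hypothetical, and the output says nothing about values resolved from config, which is what aether status and aether metrics show are for.

```shell
#!/usr/bin/env bash
# Print the freshness-related variables as the current shell sees them.
show_freshness_env() {
  local name
  for name in \
    AFS_AETHER_LOOKUP_TTL_SECS \
    AFS_AETHER_NEGATIVE_LOOKUP_TTL_SECS \
    AFS_AETHER_DIR_CACHE_TTL_SECS \
    AFS_AETHER_CACHE_DIR \
    AFS_AETHER_CACHE_SHARED_ROOT
  do
    # ${!name} is bash indirect expansion: the value of the variable
    # whose name is stored in $name. Unset or empty prints <unset>.
    printf '%s=%s\n' "$name" "${!name:-<unset>}"
  done
}
```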

Unmounts are stuck or the mount looks unhealthy

Use the cleanup path in order:

aether stop --session-id "$SESSION_ID"
aether fuse check
aether fuse cleanup --dry-run

Move to aether fuse cleanup --yes only when you are confident the mount is stale and the runtime is no longer doing useful work.
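
The non-destructive part of that sequence can be wrapped so it always runs in the same order. This is a sketch, not a supported tool: the function name cleanup_preview and the AETHER_BIN override are hypothetical, and it assumes aether stop exits non-zero when it cannot stop the runtime. The --yes step is deliberately left out so it stays a manual decision.

```shell
#!/usr/bin/env bash
# AETHER_BIN is a hypothetical override used only in this sketch;
# it defaults to the real CLI name.
AETHER_BIN="${AETHER_BIN:-aether}"

# Run the non-destructive cleanup steps in the documented order.
cleanup_preview() {
  local session_id="$1"

  # Ask the runtime to stop first. Assumption: a failed stop returns
  # non-zero; inspection continues either way.
  "$AETHER_BIN" stop --session-id "$session_id" \
    || echo "stop failed; continuing with inspection" >&2

  # Inspect FUSE state, then preview what cleanup would do.
  "$AETHER_BIN" fuse check
  "$AETHER_BIN" fuse cleanup --dry-run
}
```

Review the dry-run output from cleanup_preview "$SESSION_ID" before deciding whether the destructive step is warranted.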

Deep repo runbooks worth knowing

The repository already contains deeper operational material behind this site. The most useful Aether CLI-focused docs are:

  • docs/operations/start-aether.md for the full local server-plus-mount smoke test
  • docs/operations/aether-cli.md for CLI examples and command-family context
  • docs/runbooks/aether/aether-fuse-hang-debug.md for hung mounts and FUSE deadlock investigation
  • docs/runbooks/aether/fuse-cache-troubleshooting.md for stale-view and invalidation issues
  • docs/runbooks/aether/journal-repair.md for durability and replay recovery
  • docs/runbooks/pjdfstest-runbook.md for filesystem-behavior validation under load

Use this page as the bridge between the user-facing docs and those deeper repo docs.