Seed Script

scripts/seed_git_history.py reads the commit history of one branch from a local repository with git log and posts those commits to POST /api/v1/seed/commits in batches of 200.

Use this script once when first deploying Leliel against a repository that already has commit history. It populates Repo, Branch, and Commit nodes so that historical context is available in the graph before any CI builds have been ingested.


Prerequisites

  • Python 3.12 with requests installed
  • Leliel API running and reachable
  • A local clone of the repository to seed

# Install the requests library into your active Python environment before running
pip install requests

Usage

# Run the seed script to post all commits from a local git repository to Leliel
python scripts/seed_git_history.py \
  --repo-path /path/to/local/repo \
  --repo-name my-repo \
  --branch main \
  --api-url http://localhost:8081 \
  --api-key your-pipeline-key

Arguments

| Argument | Required | Default | Description |
|---|---|---|---|
| --repo-path | Yes | | Path to the local git repository to read commit history from |
| --repo-name | Yes | | Repo name as stored in the graph. Must match the repo field in build ingest payloads. |
| --branch | No | main | Branch name to walk; uses --first-parent to follow the mainline only |
| --api-url | Yes | | Knowledge API base URL, no trailing slash |
| --api-key | Yes | | Pipeline key (KNOWLEDGE_API_KEY) |
| --dry-run | No | | Parse and count commits but do not POST to the API |
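
For readers who want to see how these arguments fit together, the following is a minimal argparse sketch that mirrors the table. It is not taken from the script itself: the function name build_parser and the help strings are illustrative assumptions, and the real script may declare its arguments differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented arguments (not the script's own code)."""
    parser = argparse.ArgumentParser(description="Seed git history into the graph")
    parser.add_argument("--repo-path", required=True,
                        help="Path to the local git repository")
    parser.add_argument("--repo-name", required=True,
                        help="Repo name as stored in the graph")
    parser.add_argument("--branch", default="main",
                        help="Branch whose mainline is walked")
    parser.add_argument("--api-url", required=True,
                        help="Knowledge API base URL, no trailing slash")
    parser.add_argument("--api-key", required=True,
                        help="Pipeline key")
    # A boolean flag: absent means False, present means True.
    parser.add_argument("--dry-run", action="store_true",
                        help="Parse and count commits without posting")
    return parser


if __name__ == "__main__":
    # Example invocation with an explicit argument list instead of sys.argv.
    args = build_parser().parse_args([
        "--repo-path", "/path/to/local/repo",
        "--repo-name", "my-repo",
        "--api-url", "http://localhost:8081",
        "--api-key", "your-pipeline-key",
    ])
    print(args.branch, args.dry_run)
```

Omitting --branch leaves it at main, and omitting --dry-run leaves the flag off, which matches the defaults in the table.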

Dry run

Run with --dry-run first to confirm the repo path and commit count before writing any data:

# Parse commits from git log without sending any data to the Knowledge API
python scripts/seed_git_history.py \
  --repo-path /path/to/local/repo \
  --repo-name my-repo \
  --api-url http://localhost:8081 \
  --api-key your-pipeline-key \
  --dry-run

Output:

Found 312 commits in my-repo:main
[dry-run] No data posted.

Batch processing

Commits are posted in batches of 200. Progress is printed after each batch completes:

Found 312 commits in my-repo:main
  Seeded 200/312
  Seeded 312/312
Done. 312 commits written to graph.
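
The batching behaviour described above can be sketched roughly as follows. This is an illustration, not the script's actual code: the names batches, seed, and post_batch are hypothetical, and post_batch stands in for whatever function sends one batch to POST /api/v1/seed/commits (the request payload and authentication header are not documented here, so they are left out).

```python
from typing import Callable, Iterator, Sequence

BATCH_SIZE = 200  # batch size stated in this document


def batches(items: Sequence[dict], size: int = BATCH_SIZE) -> Iterator[Sequence[dict]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def seed(commits: Sequence[dict],
         post_batch: Callable[[Sequence[dict]], None],
         label: str,
         dry_run: bool = False) -> int:
    """Post commits batch by batch and return how many were sent.

    `post_batch` is a hypothetical callable that sends one batch to the API.
    """
    total = len(commits)
    print(f"Found {total} commits in {label}")
    if dry_run:
        print("[dry-run] No data posted.")
        return 0
    done = 0
    for batch in batches(commits):
        post_batch(batch)
        done += len(batch)
        print(f"  Seeded {done}/{total}")
    print(f"Done. {total} commits written to graph.")
    return done


if __name__ == "__main__":
    fake_commits = [{"sha": f"{i:040x}"} for i in range(312)]
    seed(fake_commits, post_batch=lambda batch: None, label="my-repo:main")
```

With 312 commits the loop sends one batch of 200 and one of 112, which is why the progress output shows exactly two lines.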

Notes

The script uses git log --first-parent to walk only the mainline of the specified branch. At each merge commit it follows the first parent only, so the merge commit itself is seeded but the commits from the merged-in branch are not. This keeps the commit graph clean and avoids importing large numbers of commits that belong to feature branches.
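
A mainline walk of this kind might look like the sketch below. It is an assumption about the approach, not the script's source: the function names, the field separator, and the set of fields captured (hash, author name, author date, subject) are illustrative, and the real script may record different properties.

```python
import subprocess

FIELD_SEP = "\x1f"  # ASCII unit separator, unlikely to appear in commit subjects
# git pretty-format placeholders: full hash, author name, ISO 8601 author date, subject
LOG_FORMAT = "%H%x1f%an%x1f%aI%x1f%s"


def git_log_command(repo_path: str, branch: str) -> list[str]:
    """Build a git log command that follows only first parents of `branch`."""
    return ["git", "-C", repo_path, "log", "--first-parent",
            f"--format={LOG_FORMAT}", branch]


def parse_log(output: str) -> list[dict]:
    """Turn git log output in LOG_FORMAT into a list of commit dicts."""
    commits = []
    for line in output.split("\n"):
        if not line:
            continue  # skip the trailing empty line
        sha, author, date, subject = line.split(FIELD_SEP, 3)
        commits.append({"sha": sha, "author": author,
                        "date": date, "message": subject})
    return commits


def read_commits(repo_path: str, branch: str = "main") -> list[dict]:
    """Run git log in the given repository and parse the result."""
    result = subprocess.run(git_log_command(repo_path, branch),
                            capture_output=True, text=True, check=True)
    return parse_log(result.stdout)
```

Calling read_commits("/path/to/local/repo") returns the mainline commits newest first, since git log lists commits in reverse chronological order by default.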

The --repo-name value controls the repo property stored on every Repo, Branch, and Commit node created by the seed. It must match the repo field used in build ingest payloads. If they differ, builds and seed commits will be stored under separate Repo nodes in the graph.

Re-seeding is safe

All seed writes use MERGE. Running the script multiple times against the same repo will update existing nodes without creating duplicates.
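
To illustrate why repeated runs do not create duplicates, here is a small in-memory model of match-or-create writes. It assumes MERGE has the usual graph-database meaning (find a node by its key, create it only if missing); the dict-based graph, the function names, and the choice of keys are hypothetical and say nothing about how Leliel actually stores nodes.

```python
def merge_node(graph: dict, label: str, key: tuple, props: dict) -> dict:
    """Match-or-create: reuse the node stored under (label, key), else create it."""
    node = graph.setdefault((label, key), {})
    node.update(props)  # existing nodes are updated in place
    return node


def seed_commit(graph: dict, repo: str, branch: str, commit: dict) -> None:
    """Write the Repo, Branch, and Commit nodes for one commit (keys are assumed)."""
    merge_node(graph, "Repo", (repo,), {"repo": repo})
    merge_node(graph, "Branch", (repo, branch), {"repo": repo, "name": branch})
    merge_node(graph, "Commit", (repo, commit["sha"]), {"repo": repo, **commit})


if __name__ == "__main__":
    graph: dict = {}
    commit = {"sha": "abc123", "message": "Initial commit"}

    seed_commit(graph, "my-repo", "main", commit)
    seed_commit(graph, "my-repo", "main", commit)  # second run: no new nodes
    print(len(graph))

    # A different repo name produces a separate set of nodes for the same commit.
    seed_commit(graph, "My-Repo", "main", commit)
    print(len(graph))
```

The last call also shows the effect of a mismatched --repo-name: the same commit ends up under a second Repo node rather than being merged into the first.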