How Git-Archiver Works

A serverless architecture for preserving open source software forever, powered by Cloudflare Workers and GitHub infrastructure.

Overview

Git-Archiver is a free service that creates permanent archives of public GitHub repositories. When a repository gets deleted, goes private, or becomes unavailable, your archived copy remains accessible through GitHub Releases.


The entire system is serverless, free to operate, and open source. It uses existing GitHub infrastructure to store archives with no additional hosting costs.

System Architecture

flowchart TB
    subgraph User["🌐 User Interface"]
        A[Web Browser]
    end

    subgraph Edge["⚡ Edge Layer"]
        B[Cloudflare Worker<br/>API Gateway]
        C[(KV Store<br/>Rate Limits)]
    end

    subgraph GitHub["🐙 GitHub Infrastructure"]
        D[GitHub Issues<br/>Job Queue]
        E[GitHub Actions<br/>Workflow Runner]
        F[GitHub Releases<br/>Archive Storage]
        G[index.json<br/>Repository Index]
    end

    A -->|1. Submit URL| B
    B -->|2. Rate Check| C
    B -->|3. Create Issue| D
    D -->|4. Trigger| E
    E -->|5. Clone & Archive| F
    E -->|6. Update| G
    A -->|7. Browse/Download| B
    B -->|8. Fetch Index| G
    B -->|9. Proxy Downloads| F

    style A fill:#1a1e2a,stroke:#00d4ff,color:#f1f5f9
    style B fill:#12151e,stroke:#00d4ff,color:#00d4ff
    style C fill:#12151e,stroke:#7c3aed,color:#7c3aed
    style D fill:#12151e,stroke:#f59e0b,color:#f59e0b
    style E fill:#12151e,stroke:#10b981,color:#10b981
    style F fill:#12151e,stroke:#00d4ff,color:#00d4ff
    style G fill:#12151e,stroke:#00d4ff,color:#00d4ff

Archive Request Flow

sequenceDiagram
    participant U as User
    participant W as Cloudflare Worker
    participant KV as KV Store
    participant GH as GitHub API
    participant A as GitHub Actions
    participant R as GitHub Releases

    U->>W: POST /submit {url}
    W->>KV: Check rate limit
    KV-->>W: OK (remaining: 9)
    W->>GH: Validate repo exists
    GH-->>W: 200 OK (public, 50MB)
    W->>GH: Check pending issues
    GH-->>W: No duplicates
    W->>GH: Create issue with label
    GH-->>W: Issue #123 created
    W-->>U: 201 Created {issue_url}

    Note over GH,A: Webhook triggers workflow

    A->>GH: Clone repository
    A->>A: Create .tar.gz archive
    A->>R: Upload to release
    A->>R: Update index.json
    A->>GH: Close issue #123
                

Step-by-Step Process

  • Validation - The worker validates the GitHub URL format and checks if the repository exists and is public
  • Rate Limiting - Cloudflare KV tracks requests per IP to prevent abuse (10 requests/hour)
  • Deduplication - The worker checks for pending requests for the same repository and for archives already created today
  • Issue Creation - Creates a GitHub issue with the archive-request label
  • Workflow Trigger - GitHub Actions workflow automatically triggers on new issues
  • Archiving - The workflow clones the repo, creates a tarball, and uploads to GitHub Releases
  • Index Update - A central index.json file tracks all archived repositories
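
The validation step above can be illustrated with a small URL parser. This is a minimal sketch, not the worker's actual code: the function name parseGitHubUrl is hypothetical, and the accepted character sets and length caps (39 for owners, 100 for repository names) are assumptions about GitHub's naming rules rather than rules stated in this document.

```javascript
// Hypothetical helper: the real worker's validation rules are not shown in this document.
// Returns { owner, repo } for a well-formed repository URL, or null otherwise.
function parseGitHubUrl(input) {
  let url;
  try {
    url = new URL(String(input).trim());
  } catch {
    return null; // not parseable as an absolute URL
  }
  if (url.protocol !== "https:" || url.hostname !== "github.com") return null;

  // Expect exactly /owner/repo, optionally with a trailing slash or a ".git" suffix.
  const parts = url.pathname.split("/").filter(Boolean);
  if (parts.length !== 2) return null;

  const owner = parts[0];
  const repo = parts[1].replace(/\.git$/, "");

  // Assumed naming rules: alphanumerics and hyphens for owners,
  // alphanumerics, dots, underscores, and hyphens for repositories.
  const ownerOk = /^[A-Za-z0-9][A-Za-z0-9-]{0,38}$/.test(owner);
  const repoOk = /^[A-Za-z0-9._-]{1,100}$/.test(repo);
  return ownerOk && repoOk ? { owner, repo } : null;
}
```

A format check like this only filters malformed input. Whether the repository exists and is public still has to be confirmed against the GitHub API, as the sequence diagram shows.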

Data Storage Architecture

flowchart LR
    subgraph Releases["GitHub Releases"]
        direction TB
        I[("📋 index<br/>index.json")]
        R1["📦 owner__repo__2024-01-15<br/>archive.tar.gz + README.md"]
        R2["📦 facebook__react__2024-01-14<br/>archive.tar.gz + README.md"]
        R3["📦 torvalds__linux__2024-01-10<br/>archive.tar.gz + README.md"]
    end

    subgraph Index["index.json Structure"]
        direction TB
        J["{ repositories: {<br/>'owner/repo': {<br/>versions: [...],<br/>latest_tag: '...',<br/>total_size: 123456<br/>}<br/>}}"]
    end

    I --> J

    style I fill:#12151e,stroke:#00d4ff,color:#00d4ff
    style R1 fill:#12151e,stroke:#7c3aed,color:#f1f5f9
    style R2 fill:#12151e,stroke:#7c3aed,color:#f1f5f9
    style R3 fill:#12151e,stroke:#7c3aed,color:#f1f5f9
    style J fill:#1a1e2a,stroke:#00d4ff,color:#94a3b8
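
To make the index.json structure in the diagram concrete, here is a sketch of how a new archive might be merged into it. The diagram elides the contents of versions, so the entry shape used here ({ tag, date, size }) and the function name addVersion are assumptions for illustration, not the project's actual schema.

```javascript
// Assumed shape of one entry in `versions` (the diagram only shows [...]):
//   { tag: string, date: "YYYY-MM-DD", size: number }
// Returns a new index object and leaves the input untouched.
function addVersion(index, fullName, version) {
  const repositories = { ...(index.repositories || {}) };
  const existing = repositories[fullName] || { versions: [] };

  // Replace any entry with the same tag so re-running a job stays idempotent.
  const versions = existing.versions.filter((v) => v.tag !== version.tag);
  versions.push(version);
  versions.sort((a, b) => a.date.localeCompare(b.date));

  repositories[fullName] = {
    versions,
    latest_tag: versions[versions.length - 1].tag,
    total_size: versions.reduce((sum, v) => sum + v.size, 0),
  };
  return { ...index, repositories };
}
```

Returning a fresh object instead of mutating the input keeps the previous index available as a backup until the new one has been written successfully.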

Storage Details

  • Tag Format - Each archive uses the format owner__repo__YYYY-MM-DD
  • Archive Contents - Each release contains a .tar.gz of the repository and the original README.md
  • Version History - Multiple versions can exist for the same repository (archived on different dates)
  • Size Limits - Maximum repository size is 2GB (GitHub Release asset limit)
  • Index Backup - The index is backed up before updates to prevent data loss
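
The tag format and size limit above can be sketched as a pair of helpers. The function names are hypothetical, and two details are assumptions: dates are taken in UTC, and "2GB" is read as 2 GiB with a strict upper bound.

```javascript
// Assumption: "2GB" is interpreted as 2 GiB, and assets must be strictly smaller.
const MAX_ASSET_BYTES = 2 * 1024 * 1024 * 1024;

// Build a release tag in the owner__repo__YYYY-MM-DD format (date taken in UTC).
function buildTag(owner, repo, date) {
  const day = date.toISOString().slice(0, 10);
  return `${owner}__${repo}__${day}`;
}

// Split a tag back into its parts. Owner names are assumed to contain no
// underscores, so the first "__" always ends the owner; the repository
// name may itself contain "__".
function parseTag(tag) {
  const match = /^([A-Za-z0-9-]+)__(.+)__(\d{4}-\d{2}-\d{2})$/.exec(tag);
  if (!match) return null;
  return { owner: match[1], repo: match[2], date: match[3] };
}

// True when an archive of this size fits in a single release asset.
function fitsReleaseLimit(sizeBytes) {
  return sizeBytes < MAX_ASSET_BYTES;
}
```

Because the date is part of the tag, archiving the same repository on different days produces distinct releases, which is what makes the version history possible.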

Technology Stack

Cloudflare Workers

Serverless JavaScript runtime at the edge. Handles API requests, CORS, rate limiting, and response caching with global distribution.

🗄️

Cloudflare KV

Global key-value storage used for distributed rate limiting. KV is eventually consistent, so counters are shared across edge locations but can lag briefly after a write.

🔄

GitHub Actions

CI/CD platform that runs the archiving workflow. Triggered by new issues, with automatic retry and error handling.

📦

GitHub Releases

Permanent storage for archived repositories. Supports files up to 2GB with unlimited total storage.

🎫

GitHub Issues

Used as a job queue for archive requests. Provides transparency and allows users to track their requests.

🌐

Static Frontend

Vanilla HTML, CSS, and JavaScript. No build step required. Can be hosted on any static hosting platform.

API Reference

Endpoints

POST /submit

Submit a single repository for archiving. Body: {"url": "https://github.com/owner/repo"}

POST /bulk-submit

Submit up to 20 repositories at once. Body: {"urls": ["...", "..."]}

GET /index

Fetch the master index of all archived repositories. Cached for 5 minutes.

GET /readme?owner=X&repo=Y

Fetch the README for an archived repository. Cached for 1 hour.

GET /status?owner=X&repo=Y

Check if the original repository is still online. Cached for 1 minute.
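
As an illustration of calling these endpoints from a client, the sketch below builds request descriptors that can be passed to fetch. The base URL is a placeholder for your own deployment, the helper names are hypothetical, and the Content-Type header is an assumption since this reference does not state which headers the worker requires.

```javascript
// Hypothetical base URL; substitute the address of your own deployment.
const API_BASE = "https://git-archiver.example.workers.dev";
const BULK_LIMIT = 20; // maximum URLs per /bulk-submit call, per the reference above

// Assumed header; the reference only documents the JSON body.
const JSON_HEADERS = { "Content-Type": "application/json" };

// Descriptor for POST /submit.
function submitRequest(repoUrl) {
  return {
    url: `${API_BASE}/submit`,
    init: { method: "POST", headers: JSON_HEADERS, body: JSON.stringify({ url: repoUrl }) },
  };
}

// Split a long list into POST /bulk-submit descriptors of at most 20 URLs each.
function bulkSubmitRequests(repoUrls) {
  const requests = [];
  for (let i = 0; i < repoUrls.length; i += BULK_LIMIT) {
    const batch = repoUrls.slice(i, i + BULK_LIMIT);
    requests.push({
      url: `${API_BASE}/bulk-submit`,
      init: { method: "POST", headers: JSON_HEADERS, body: JSON.stringify({ urls: batch }) },
    });
  }
  return requests;
}

// URL for GET /status with properly encoded query parameters.
function statusUrl(owner, repo) {
  const query = new URLSearchParams({ owner, repo });
  return `${API_BASE}/status?${query}`;
}
```

Each descriptor can be sent with fetch(request.url, request.init). Keeping request construction separate from the network call makes the batching logic easy to check without contacting the service.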

Rate Limits

To prevent abuse and ensure fair usage, the following rate limits apply:

  • Submit: 10 requests per hour per IP
  • Bulk Submit: 3 requests per hour per IP
  • Index: 60 requests per minute per IP
  • Status/README: 30 requests per minute per IP

Rate limit headers (X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset) are included in all API responses.
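
One way to produce these limits and headers is a fixed-window counter keyed by route, IP, and window start. This is a sketch under several assumptions: the document does not say which algorithm the worker uses, the key layout and function name are hypothetical, X-RateLimit-Reset is assumed to be a Unix timestamp in seconds, and Status and README are counted separately here. A plain Map stands in for the KV namespace; a real KV binding is asynchronous, so its get and put calls would need to be awaited.

```javascript
// Limits copied from the list above: [max requests, window length in seconds].
const LIMITS = {
  submit: [10, 3600],
  "bulk-submit": [3, 3600],
  index: [60, 60],
  status: [30, 60],
  readme: [30, 60],
};

// `store` is any Map-like object (a stand-in for Cloudflare KV in this sketch).
function checkRateLimit(store, route, ip, nowMs) {
  const [limit, windowSec] = LIMITS[route];

  // Fixed window: every request in the same window shares one counter key.
  const windowStart = Math.floor(nowMs / 1000 / windowSec) * windowSec;
  const key = `rl:${route}:${ip}:${windowStart}`;

  const used = store.get(key) || 0;
  const allowed = used < limit;
  if (allowed) store.set(key, used + 1);

  const remaining = Math.max(0, limit - used - (allowed ? 1 : 0));
  return {
    allowed,
    headers: {
      "X-RateLimit-Limit": String(limit),
      "X-RateLimit-Remaining": String(remaining),
      "X-RateLimit-Reset": String(windowStart + windowSec), // assumed: Unix seconds
    },
  };
}
```

With a limit of 10, the first accepted request reports 9 remaining, matching the sequence diagram earlier in this document. Because KV is eventually consistent, counts kept this way are approximate under concurrent traffic from different locations.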

Why Archive Repositories?

🛡️

Protection Against Deletion

Developers sometimes delete repositories. Your archive ensures critical dependencies remain accessible.

🔒

Private Transitions

Public repos can go private. Keep a snapshot of code that was once freely available.

⚖️

DMCA Takedowns

Sometimes repositories face legal challenges. Archives preserve the historical record.

📜

Historical Research

Study how projects evolved. Multiple dated archives let you track changes over time.

Open Source

Git-Archiver is completely open source. You can:

  • View the complete source code on GitHub
  • Deploy your own instance using the provided documentation
  • Contribute improvements via pull requests
  • Report issues or request features

MIT License · Cloudflare Workers · GitHub Actions · Vanilla JS