A serverless architecture for preserving open source software forever, powered by Cloudflare Workers and GitHub infrastructure.
Git-Archiver is a free service that creates permanent archives of public GitHub repositories. When a repository gets deleted, goes private, or becomes unavailable, your archived copy remains accessible through GitHub Releases.
The entire system is serverless, free to operate, and open source. It uses existing GitHub infrastructure to store archives with no additional hosting costs.
flowchart TB
subgraph User["🌐 User Interface"]
A[Web Browser]
end
subgraph Edge["⚡ Edge Layer"]
B[Cloudflare Worker<br/>API Gateway]
C[(KV Store<br/>Rate Limits)]
end
subgraph GitHub["🐙 GitHub Infrastructure"]
D[GitHub Issues<br/>Job Queue]
E[GitHub Actions<br/>Workflow Runner]
F[GitHub Releases<br/>Archive Storage]
G[index.json<br/>Repository Index]
end
A -->|1. Submit URL| B
B -->|2. Rate Check| C
B -->|3. Create Issue| D
D -->|4. Trigger| E
E -->|5. Clone & Archive| F
E -->|6. Update| G
A -->|7. Browse/Download| B
B -->|8. Fetch Index| G
B -->|9. Proxy Downloads| F
style A fill:#1a1e2a,stroke:#00d4ff,color:#f1f5f9
style B fill:#12151e,stroke:#00d4ff,color:#00d4ff
style C fill:#12151e,stroke:#7c3aed,color:#7c3aed
style D fill:#12151e,stroke:#f59e0b,color:#f59e0b
style E fill:#12151e,stroke:#10b981,color:#10b981
style F fill:#12151e,stroke:#00d4ff,color:#00d4ff
style G fill:#12151e,stroke:#00d4ff,color:#00d4ff
sequenceDiagram
participant U as User
participant W as Cloudflare Worker
participant KV as KV Store
participant GH as GitHub API
participant A as GitHub Actions
participant R as GitHub Releases
U->>W: POST /submit {url}
W->>KV: Check rate limit
KV-->>W: OK (remaining: 9)
W->>GH: Validate repo exists
GH-->>W: 200 OK (public, 50MB)
W->>GH: Check pending issues
GH-->>W: No duplicates
W->>GH: Create issue with label
GH-->>W: Issue #123 created
W-->>U: 201 Created {issue_url}
Note over GH,A: Webhook triggers workflow
A->>GH: Clone repository
A->>A: Create .tar.gz archive
A->>R: Upload to release
A->>R: Update index.json
A->>GH: Close issue #123
archive-request label
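The submission sequence above can be sketched as a single handler. This is a minimal illustration, not the service's actual code: the `deps` object and its method names (`checkRateLimit`, `getRepo`, `hasPendingIssue`, `createIssue`) are hypothetical, and the 429, 404, and 409 status codes are assumptions. Only the 201 response with `issue_url` and the `archive-request` label come from the description.

```javascript
// Hypothetical sketch of the /submit flow. `deps` and its method names
// are assumptions made for illustration; they stand in for KV and the
// GitHub API so the flow can be exercised without network access.
async function handleSubmit(url, clientId, deps) {
  // 1. Rate limit check (backed by the KV store in the described design).
  const rate = await deps.checkRateLimit(clientId);
  if (!rate.allowed) {
    return { status: 429, body: { error: "rate limit exceeded" } };
  }

  // 2. Confirm the repository exists and is public.
  const repo = await deps.getRepo(url);
  if (!repo || repo.private) {
    return { status: 404, body: { error: "repository not found or not public" } };
  }

  // 3. Skip the request if an open issue already covers this repository.
  if (await deps.hasPendingIssue(repo.fullName)) {
    return { status: 409, body: { error: "archive request already pending" } };
  }

  // 4. Create the labelled issue that acts as the queue entry.
  const issue = await deps.createIssue(repo.fullName, ["archive-request"]);
  return { status: 201, body: { issue_url: issue.url } };
}
```

Passing the external calls in as `deps` keeps the ordering of the checks visible and lets each step be replaced with a stub in tests.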
flowchart LR
subgraph Releases["GitHub Releases"]
direction TB
I[("📋 index<br/>index.json")]
R1["📦 owner__repo__2024-01-15<br/>archive.tar.gz + README.md"]
R2["📦 facebook__react__2024-01-14<br/>archive.tar.gz + README.md"]
R3["📦 torvalds__linux__2024-01-10<br/>archive.tar.gz + README.md"]
end
subgraph Index["index.json Structure"]
direction TB
J["{ repositories: {<br/>'owner/repo': {<br/>versions: [...],<br/>latest_tag: '...',<br/>total_size: 123456<br/>}<br/>}}"]
end
I --> J
style I fill:#12151e,stroke:#00d4ff,color:#00d4ff
style R1 fill:#12151e,stroke:#7c3aed,color:#f1f5f9
style R2 fill:#12151e,stroke:#7c3aed,color:#f1f5f9
style R3 fill:#12151e,stroke:#7c3aed,color:#f1f5f9
style J fill:#1a1e2a,stroke:#00d4ff,color:#94a3b8
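The release naming and index layout shown in the diagram can be sketched as two small helpers. This is an illustration under stated assumptions: the function names are hypothetical, the fields inside each `versions` entry (`tag`, `size`) are invented for the example because the source only shows `versions: [...]`, and treating `total_size` as the sum of all archived versions is also an assumption.

```javascript
// Build a release tag such as "owner__repo__2024-01-15".
function releaseTag(owner, repo, date) {
  const day = date.toISOString().slice(0, 10); // YYYY-MM-DD in UTC
  return `${owner}__${repo}__${day}`;
}

// Return a new index object with one more archived version recorded.
// Assumption: each version entry holds { tag, size } and total_size is
// the running sum of all version sizes in bytes.
function addVersion(index, owner, repo, tag, sizeBytes) {
  const key = `${owner}/${repo}`;
  const prev = index.repositories[key] || {
    versions: [],
    latest_tag: null,
    total_size: 0,
  };
  const entry = {
    versions: [...prev.versions, { tag, size: sizeBytes }],
    latest_tag: tag,
    total_size: prev.total_size + sizeBytes,
  };
  // Copy rather than mutate, so the caller's index stays unchanged.
  return { repositories: { ...index.repositories, [key]: entry } };
}
```

Returning a fresh object instead of mutating the input makes it easier to compare the index before and after a workflow run.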
Each release is tagged owner__repo__YYYY-MM-DD and contains a .tar.gz archive of the repository and the original README.md.
Cloudflare Workers: Serverless JavaScript runtime at the edge. Handles API requests, CORS, rate limiting, and response caching with global distribution.
Cloudflare KV: Global key-value storage used for distributed rate limiting. Shares rate-limit state across edge locations; KV is eventually consistent, so counts can briefly lag between locations.
GitHub Actions: CI/CD platform that runs the archiving workflow. Triggered by archive-request issues, with automatic retry and error handling.
GitHub Releases: Permanent storage for archived repositories. Supports files up to 2 GB each with no limit on total storage.
GitHub Issues: Used as a job queue for archive requests. Provides transparency and lets users track their requests.
Frontend: Vanilla HTML, CSS, and JavaScript. No build step required. Can be hosted on any static hosting platform.
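The KV-backed rate limiting described above could look roughly like the fixed-window counter below. This is a sketch, not the project's implementation: the key format, the `limit` and `windowSeconds` parameters, and the choice of a fixed window are all assumptions. The `store` argument only needs async `get` and `put` methods in the style of a Workers KV namespace, and `memoryStore` is a hypothetical in-memory stand-in for local experiments.

```javascript
// Fixed-window limiter over a KV-like store exposing async get/put.
// `limit` and `windowSeconds` are illustrative, not the service's real limits.
async function checkRateLimit(store, clientId, limit, windowSeconds, nowMs = Date.now()) {
  const windowId = Math.floor(nowMs / 1000 / windowSeconds);
  const key = `rate:${clientId}:${windowId}`; // assumed key format
  const used = parseInt((await store.get(key)) || "0", 10);
  const reset = (windowId + 1) * windowSeconds; // Unix seconds when the window ends

  if (used >= limit) {
    return { allowed: false, limit, remaining: 0, reset };
  }

  // Read-then-write is not atomic: concurrent requests can read the same
  // count, so the limit is approximate rather than strict.
  await store.put(key, String(used + 1), { expirationTtl: windowSeconds });
  return { allowed: true, limit, remaining: limit - used - 1, reset };
}

// In-memory stand-in for a KV namespace; ignores expiration options.
function memoryStore() {
  const data = new Map();
  return {
    get: async (key) => (data.has(key) ? data.get(key) : null),
    put: async (key, value) => {
      data.set(key, value);
    },
  };
}
```

A fixed window is the simplest scheme to express with plain get/put calls; a sliding window or token bucket would smooth out bursts at window boundaries at the cost of more state per client.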
POST /submit
Submit a single repository for archiving. Body: {"url": "https://github.com/owner/repo"}
POST /bulk-submit
Submit up to 20 repositories at once. Body: {"urls": ["...", "..."]}
GET /index
Fetch the master index of all archived repositories. Cached for 5 minutes.
GET /readme?owner=X&repo=Y
Fetch the README for an archived repository. Cached for 1 hour.
GET /status?owner=X&repo=Y
Check if the original repository is still online. Cached for 1 minute.
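The request bodies and cache lifetimes listed for these endpoints can be sketched as follows. The helper names (`parseRepoUrl`, `validateBulk`, `cacheControlFor`) and the error messages are hypothetical, and expressing the cache lifetimes as a `Cache-Control` header is an assumption; the source only states the durations and the 20-URL limit.

```javascript
const MAX_BULK_URLS = 20; // limit stated for POST /bulk-submit

// Extract { owner, repo } from a GitHub repository URL, or return null.
function parseRepoUrl(input) {
  let url;
  try {
    url = new URL(input);
  } catch {
    return null; // not a parseable URL
  }
  if (url.protocol !== "https:" || url.hostname !== "github.com") return null;
  const parts = url.pathname.split("/").filter(Boolean);
  if (parts.length !== 2) return null;
  const [owner, rawRepo] = parts;
  return { owner, repo: rawRepo.replace(/\.git$/, "") };
}

// Validate a /bulk-submit body of the form {"urls": ["...", "..."]}.
function validateBulk(body) {
  if (!body || !Array.isArray(body.urls) || body.urls.length === 0) {
    return { ok: false, error: "urls must be a non-empty array" };
  }
  if (body.urls.length > MAX_BULK_URLS) {
    return { ok: false, error: `at most ${MAX_BULK_URLS} urls per request` };
  }
  const repos = body.urls.map(parseRepoUrl);
  if (repos.includes(null)) {
    return { ok: false, error: "invalid GitHub repository URL" };
  }
  return { ok: true, repos };
}

// Cache lifetimes from the endpoint descriptions, in seconds.
const CACHE_TTL_SECONDS = { "/index": 300, "/readme": 3600, "/status": 60 };

// Assumed mapping from endpoint path to a Cache-Control header value.
function cacheControlFor(pathname) {
  const ttl = CACHE_TTL_SECONDS[pathname];
  return ttl ? `public, max-age=${ttl}` : "no-store";
}
```

Rejecting a whole bulk request on the first invalid URL is one possible policy; accepting the valid entries and reporting the rest individually would be an equally reasonable design.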
To prevent abuse and ensure fair usage, the following rate limits apply:
Rate limit headers (X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset) are included in all API responses.
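As a rough illustration of how these three headers might be assembled, the helper below builds them from a limiter result. The function name and the shape of `state` (`limit`, `remaining`, `reset`) are assumptions, as is the use of Unix seconds for the reset value; the source names the headers but not their formats.

```javascript
// Build the rate-limit headers from a limiter result.
// Assumption: state = { limit, remaining, reset } with reset in Unix seconds.
function rateLimitHeaders(state) {
  return {
    "X-RateLimit-Limit": String(state.limit),
    // Clamp at zero so a blocked request never reports a negative count.
    "X-RateLimit-Remaining": String(Math.max(0, state.remaining)),
    "X-RateLimit-Reset": String(state.reset),
  };
}
```

A client can read `X-RateLimit-Remaining` before sending a bulk request and wait until the time in `X-RateLimit-Reset` once it reaches zero.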
Developers sometimes delete repositories. Your archive ensures critical dependencies remain accessible.
Public repos can go private. Keep a snapshot of code that was once freely available.
Repositories sometimes face legal challenges. Archives preserve the historical record.
Study how projects evolved. Multiple dated archives let you track changes over time.
Git-Archiver is completely open source. You can: