
Karaokepedia - Static Website Archive

Project Overview

This is a static HTML archive of Karaokepedia (karaoke.karaniwan.org), a semi-crowdsourced karaoke database focused on Japanese/anime songs available in Philippine karaoke machines. The site was deprecated and migrated to awitotaku.com, but this archive preserves the original content for reference.

Status: HTTrack mirror from December 29, 2025. Dockerized with nginx:alpine, automated builds via Gitea Actions.

Architecture

Static Content

  • Pure Static HTML: All files in karaoke.karaniwan.org/ are pre-rendered HTML pages (~17.6 MB, 2,309 files)
  • No backend: the original site was a Ruby on Rails app (evident from HTML comments); this HTTrack mirror contains only its rendered static HTML
  • Assets included: 2 MB of CSS/JS/fonts/images stay in-repo (self-contained archive)
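To compare a local checkout against the figures above, a small shell helper can count files and measure disk usage. This is a sketch: `archive_stats` is a hypothetical name, and `du -sh` reports on-disk usage, which may differ somewhat from the summed file sizes quoted here.

```shell
# archive_stats: hypothetical helper that summarizes a mirrored directory.
# Prints the total file count, the HTML file count, and du's on-disk size.
archive_stats() {
  local dir=${1:-karaoke.karaniwan.org}
  if [ ! -d "$dir" ]; then
    echo "archive_stats: no such directory: $dir" >&2
    return 1
  fi
  local files html size
  # tr strips the padding that BSD wc puts before the number
  files=$(find "$dir" -type f | wc -l | tr -d ' ')
  html=$(find "$dir" -type f -name '*.html' | wc -l | tr -d ' ')
  # du -s prints "<size><TAB><dir>"; keep only the size column
  size=$(du -sh "$dir" | cut -f1)
  printf 'files=%s html=%s size=%s\n' "$files" "$html" "$size"
}
```

Run it from the repository root as `archive_stats karaoke.karaniwan.org`.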

Container Stack

  • Base image: nginx:alpine (~25 MB)
  • Web server: Custom nginx.conf with font MIME types, gzip, caching
  • Security: Runs as non-root user, includes healthcheck
  • Final size: ~35-40 MB total
  • CI/CD: Gitea Actions builds on push to main
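One way to confirm that the MIME type, gzip, and caching settings take effect is to inspect response headers from a running container. The helper below is a hypothetical sketch; which headers actually appear depends on what nginx.conf sets.

```shell
# check_headers: hypothetical helper that fetches a URL while advertising gzip
# support and prints only the headers related to MIME type, compression, caching.
check_headers() {
  local url=$1
  # -s: silent; -o /dev/null: discard the body; -D -: dump response headers to stdout
  curl -s -o /dev/null -D - -H 'Accept-Encoding: gzip' "$url" |
    tr -d '\r' |
    grep -iE '^(content-type|content-encoding|cache-control|expires):'
}
```

For example, `check_headers http://localhost:8080/` against a container published on port 8080; pointing it at a CSS or font file under `/assets/` shows whether the custom MIME types and cache rules apply to static assets.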

Structure

.
├── karaoke.karaniwan.org/     # Static HTML content (served by nginx)
│   ├── songs/                 # Individual song pages
│   ├── songs*.html            # Paginated listings (hex-named)
│   ├── artists/               # Artist profiles
│   ├── karaoke_machines/      # Machine-specific listings
│   ├── tags/                  # Tag-based grouping
│   ├── assets/                # CSS/JS/images/fonts (~2 MB)
│   └── index.html             # Main entry point
├── Dockerfile                 # nginx:alpine with custom config
├── nginx.conf                 # Custom nginx configuration
├── .dockerignore              # Excludes HTTrack artifacts
├── .gitea/workflows/build.yml # CI/CD pipeline
├── hts-cache/                 # HTTrack metadata (not in image)
├── hts-log.txt                # HTTrack log (not in image)
└── index.html                 # HTTrack root nav (not in image)

Data Model (Static)

Each song page includes:

  • Song title and artist (linked)
  • Machine keys: KY (Kumyoung), TJ (TJ Media), P (Platinum) with numeric codes
  • Tags: Language (Japanese/English/Korean/OPM), genre (Pop/Rock/Metal), type (Anime OST/Drama OST)
  • Release dates and alternative names
  • Bootstrap 3 responsive layout

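File naming: Hex-suffixed (e.g., songs9285.html, songs02d1.html). The suffixes come from HTTrack, which saves each query-string variant of a URL (such as a paginated listing) under its own hashed filename.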
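The listing pages can be enumerated by that naming pattern. This sketch assumes the suffix is always four lowercase hex digits, as in the examples; `list_listing_pages` is a hypothetical name.

```shell
# list_listing_pages: hypothetical helper that lists songs.html plus its
# variants with a 4-hex-digit suffix (e.g. songs02d1.html), sorted bytewise.
list_listing_pages() {
  local dir=${1:-karaoke.karaniwan.org}
  find "$dir" -maxdepth 1 -type f -name 'songs*.html' |
    sed 's|.*/||' |
    grep -E '^songs([0-9a-f]{4})?\.html$' |
    LC_ALL=C sort
}
```

Piping the output into `wc -l` gives the number of listing pages.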

Development & Deployment

Local Testing

# Quick test with Python
python3 -m http.server 8000
# Visit http://localhost:8000/karaoke.karaniwan.org/

# Docker build and run
docker build -t karaokepedia:test .
docker run -d --name karaokepedia-test -p 8080:80 karaokepedia:test
# Visit http://localhost:8080/

# Check healthcheck
docker inspect --format='{{.State.Health.Status}}' karaokepedia-test
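Right after startup the health status reads "starting", so a one-off `docker inspect` can be misleading. A polling loop is one option; `wait_healthy` and its default of 30 one-second retries are hypothetical choices, not part of the repository.

```shell
# wait_healthy: hypothetical helper that polls a container's healthcheck status
# until it is "healthy" (exit 0), "unhealthy" (exit 1), or the retries run out.
wait_healthy() {
  local container=$1 tries=${2:-30} status
  while [ "$tries" -gt 0 ]; do
    status=$(docker inspect --format='{{.State.Health.Status}}' "$container" 2>/dev/null)
    case $status in
      healthy) echo healthy; return 0 ;;
      unhealthy) echo unhealthy >&2; return 1 ;;
    esac
    # "starting" (or an inspect error): wait one second and retry
    tries=$((tries - 1))
    sleep 1
  done
  echo "timed out waiting for $container" >&2
  return 1
}
```

Usage: `wait_healthy <container-id> && curl -s http://localhost:8080/ >/dev/null` (a container name works in place of the ID).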

Building for Production

# Build with tags
docker build -t karaokepedia:latest .
docker tag karaokepedia:latest your-registry/karaokepedia:latest

# Push to registry
docker push your-registry/karaokepedia:latest

CI/CD Pipeline (Gitea Actions)

  • Trigger: Push to main branch or manual dispatch
  • Workflow: .gitea/workflows/build.yml (GitHub Actions-compatible syntax)
  • Steps: Checkout → Setup Buildx → Login to registry → Build & push → Output digest
  • Tags: :latest and :main-<commit-sha>
  • Registry: Configure via secrets (DOCKER_USERNAME/DOCKER_PASSWORD for Docker Hub, or adapt for Gitea registry)
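The build-and-push step can be approximated locally with Buildx, producing the same two tags. This is a sketch, not the workflow itself: `ci_build_push` is a hypothetical name, it assumes a prior `docker login`, it uses the short commit SHA (the workflow may use the full one), and it omits the digest output step.

```shell
# ci_build_push: hypothetical helper approximating the CI build step locally:
# build with Buildx and push :latest and :main-<sha> in one invocation.
ci_build_push() {
  local image=${1:-} sha
  if [ -z "$image" ]; then
    echo "usage: ci_build_push <registry>/<name>" >&2
    return 1
  fi
  # Assumption: short SHA; swap in `git rev-parse HEAD` if the workflow uses the full one
  sha=$(git rev-parse --short HEAD) || return 1
  docker buildx build \
    --tag "$image:latest" \
    --tag "$image:main-$sha" \
    --push \
    .
}
```

Usage: `ci_build_push your-registry/karaokepedia` from the repository root.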

Registry Configuration

Edit .gitea/workflows/build.yml and uncomment the appropriate registry:

  • Docker Hub (default): Uses DOCKER_USERNAME and DOCKER_PASSWORD secrets
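  • GitHub Container Registry: Uncomment GHCR section, uses GITHUB_TOKEN (on a Gitea runner this built-in token is issued by Gitea, so pushing to ghcr.io likely needs a GitHub personal access token stored as a secret instead)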
  • Gitea Container Registry: Uncomment Gitea section, configure domain and credentials
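Before switching registries in the workflow, credentials can be checked by logging in from a shell. The sketch below reuses the workflow's secret names as environment variables, which is an assumption for local use; `registry_login` is a hypothetical name.

```shell
# registry_login: hypothetical helper. Logs in to Docker Hub (no argument) or the
# given registry host, reading credentials from DOCKER_USERNAME / DOCKER_PASSWORD.
registry_login() {
  local server=${1:-}
  if [ -z "${DOCKER_USERNAME:-}" ] || [ -z "${DOCKER_PASSWORD:-}" ]; then
    echo "registry_login: set DOCKER_USERNAME and DOCKER_PASSWORD" >&2
    return 1
  fi
  # --password-stdin keeps the secret out of the process list and shell history;
  # ${server:+...} passes the host only when one was given
  printf '%s' "$DOCKER_PASSWORD" |
    docker login --username "$DOCKER_USERNAME" --password-stdin ${server:+"$server"}
}
```

Usage: `registry_login` for Docker Hub, `registry_login ghcr.io` for GHCR, or `registry_login <your-gitea-domain>` for a Gitea registry.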

Modifying Content

Since this is static HTML:

  1. Edit HTML directly - no templating system
  2. Apply shared changes by hand to every paginated listing (songs*.html) that needs them
  3. No regeneration tool - changes must be applied per-file
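Without a regeneration tool, a loop over the listings with `sed` is one way to apply the same fix everywhere. In this sketch `replace_in_listings` is a hypothetical name, and the "Kumyong" typo in the usage line is a made-up example. The substitution is a sed expression, so regex metacharacters in it must be escaped.

```shell
# replace_in_listings: hypothetical helper. Applies one sed substitution to every
# songs*.html in DIR that contains the literal text MATCH, keeping .bak copies,
# and prints the name of each file it changed.
replace_in_listings() {
  local dir=$1 match=$2 expr=$3 f
  for f in "$dir"/songs*.html; do
    [ -f "$f" ] || continue
    if grep -qF -- "$match" "$f"; then
      # -i.bak (suffix attached) works with both GNU and BSD sed
      sed -i.bak -e "$expr" "$f" && echo "$f"
    fi
  done
}
```

Usage: `replace_in_listings karaoke.karaniwan.org 'Kumyong' 's|Kumyong|Kumyoung|g'`, then review with `diff` against the `.bak` files before deleting them.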

Navigation Conventions

  • Internal links are relative (e.g., ../songs/, ../../artists/)
  • Asset references use assets/application-*.css and assets/application-*.js (fingerprinted)
  • External services: Disqus comments (disabled), Piwik analytics (historic)
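Because internal links are relative, hand edits can break them silently. The sketch below checks one page; `check_links` is a hypothetical name, and it only sees double-quoted `href`/`src` attributes, skipping absolute, root-relative, and special-scheme URLs.

```shell
# check_links: hypothetical helper. Prints every relative href/src target in an
# HTML file that does not exist on disk. #fragments and ?query strings are
# stripped before the check, since the mirror stores plain files.
check_links() {
  local file=$1 dir
  dir=$(dirname "$file")
  grep -oE '(href|src)="[^"]*"' "$file" |
    sed -E 's/^(href|src)="//; s/"$//' |
    while IFS= read -r target; do
      case $target in
        ''|'#'*|/*|http:*|https:*|mailto:*|javascript:*|data:*) continue ;;
      esac
      path=${target%%#*}   # drop any #fragment
      path=${path%%\?*}    # drop any ?query string
      [ -n "$path" ] || continue
      [ -e "$dir/$path" ] || printf '%s: %s\n' "$file" "$target"
    done
}
```

To scan the whole archive: `find karaoke.karaniwan.org -name '*.html' | while IFS= read -r f; do check_links "$f"; done`.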

Key Patterns

  • Machine abbreviations: KY=Kumyoung (red label), TJ=TJ Media (orange label), P=Platinum (no color)
  • Alphabetical pagination: songs.html?initial=A, songs6c50.html?initial=A
  • Deprecated notices: All pages show "This site has been deprecated. Proceed to Awit Otaku"

HTTrack Artifacts

  • hts-cache/: Mirror metadata (new.lst lists all mirrored URLs)
  • hts-log.txt: Download log (2,312 links, 2,309 files, two 404 errors)
  • cookies.txt: Session cookies from mirroring
  • Root index.html: HTTrack navigation page

What NOT to Do

  • Don't look for package.json, Gemfile, or application build configs - they don't exist (the only build files are the Dockerfile, nginx.conf, and the Gitea workflow)
  • Don't try to "npm install" or "bundle install" - this isn't a dev project
  • Don't modify HTTrack files (hts-cache/, hts-log.txt) - they're archival metadata
  • Don't expect dynamic search/filtering - all functionality is static HTML links
  • Don't include HTTrack artifacts in Docker image - they're excluded via .dockerignore
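To confirm the .dockerignore exclusions hold, one option is to list the web root inside a built image. This is a sketch: `find_httrack_artifacts` is a hypothetical name, and the default root `/usr/share/nginx/html` is nginx's stock location, assumed here rather than taken from the Dockerfile. The HTTrack root index.html cannot be told apart from the site's own index.html by name, so it is not checked.

```shell
# find_httrack_artifacts: hypothetical helper. Lists the web root inside an image
# and prints any HTTrack artifacts found; exit status 0 means something leaked.
# Assumption: content lives in /usr/share/nginx/html unless a root is passed.
find_httrack_artifacts() {
  local image=$1 root=${2:-/usr/share/nginx/html}
  docker run --rm --entrypoint ls "$image" -1a "$root" |
    grep -E '^(hts-cache|hts-log\.txt|cookies\.txt)$'
}
```

Usage: `if find_httrack_artifacts karaokepedia:test; then echo "HTTrack artifacts found in image"; fi`.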