Git Repository Size Optimization: Techniques for Cleaning Large Files and History
A Git repository usually balloons for a few reasons: large files (logs, videos, build artifacts) committed directly, large files that were deleted from the working tree but still survive in historical commits, and submodules that have never been optimized. The symptoms are slow clones and fetches, time-consuming backup transfers, and sluggish local operations.

For a large file that was committed recently but not yet pushed, remove it from the index with `git rm --cached`, add it to `.gitignore`, and commit again before pushing; a sketch follows below.

For large files buried in history, rewrite the history with `git filter-repo`: install the tool, filter out the offending paths, then force-push the rewritten branches. After the cleanup, verify with `git rev-list` that no large blobs were missed; both steps are sketched below.

For batch cleanup, `git filter-repo --path-glob` matches whole groups of files in a single pass (example below). Large files inside a submodule must be cleaned in the submodule's own repository first, and only then should the pointer in the parent repository be updated.

For long-term prevention, manage large files with Git LFS: after installing it, track the relevant file types so that large binaries are stored as LFS pointers instead of being committed directly (example below).

Two cautions apply throughout. Always back up the repository before rewriting history. And use force pushes carefully in collaborative environments: get the team's confirmation before executing one, since everyone else must re-clone or rebase onto the rewritten history. Finally, build the habit of committing only small files and routing large ones through LFS to keep the repository lean over the long term.
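For the committed-but-not-pushed case, a minimal sketch. The filename `debug.log` is just an example:

```bash
# Remove the file from the index but keep it on disk
git rm --cached debug.log

# Ignore it going forward
echo "debug.log" >> .gitignore

# Re-commit and push; if the file was added in the most recent
# (un-pushed) commit, `git commit --amend` instead rewrites that
# commit so the blob never reaches the remote
git add .gitignore
git commit -m "Remove debug.log from version control"
git push
```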
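For rewriting history, a sketch of the `git filter-repo` flow. The path `videos/demo.mp4` and the remote URL are placeholders. One detail worth knowing: `git filter-repo` removes the `origin` remote after rewriting as a safety measure, so it has to be re-added before the force push.

```bash
# Back up first: a mirror clone preserves every ref
git clone --mirror https://example.com/repo.git repo-backup.git

# Install the tool (distributed via pip)
pip install git-filter-repo

# Drop the file from every commit in history
git filter-repo --path videos/demo.mp4 --invert-paths

# filter-repo deletes 'origin' after a rewrite; re-add it
git remote add origin https://example.com/repo.git

# The rewritten history diverges from the remote, so force-push
# all branches and tags (coordinate with the team first)
git push origin --force --all
git push origin --force --tags
```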
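For verification, the source names only `git rev-list`; a common way to use it for this check is to list all reachable objects and sort the blobs by size. The pipeline below is one such sketch, not a prescribed command:

```bash
# List every object reachable from any ref, resolve sizes,
# keep only blobs, and show the ten largest
git rev-list --objects --all |
  git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
  awk '$1 == "blob" {print $3, $4}' |
  sort -rn |
  head -n 10

# A quick overall size check of the object database
git count-objects -vH
```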
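For batch cleanup, `--path-glob` accepts shell-style patterns; `*.mp4` below is only an example pattern. As an alternative when the criterion is size rather than name, filter-repo also offers `--strip-blobs-bigger-than` (not mentioned in the source, but part of the same tool):

```bash
# Remove every .mp4 anywhere in history in one pass
git filter-repo --path-glob '*.mp4' --invert-paths

# Or cut by size instead of name, e.g. anything over 10 MB
git filter-repo --strip-blobs-bigger-than 10M
```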
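For long-term management with Git LFS, a sketch; the tracked patterns (`*.psd`, `*.mp4`) are examples:

```bash
# One-time setup per machine
git lfs install

# Track large file types; this writes the patterns into .gitattributes
git lfs track "*.psd" "*.mp4"

# The .gitattributes file itself must be committed
git add .gitattributes
git commit -m "Track large binaries with Git LFS"
```

From this point on, files matching the tracked patterns are stored as small pointer files in the repository, with the actual content kept in LFS storage.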