Notebook Output Filtering
Overview
This repository uses nbstripout to automatically strip outputs from Jupyter notebooks before they are committed to git. This keeps the repository clean and prevents:
Large binary blobs from plots/images
Merge conflicts in output cells
Unnecessary repository bloat
Potential data leaks in outputs
Setup
The filtering is configured via:
Git filter (configured by
nbstripout --install):filter.nbstripout.clean = python -m nbstripout filter.nbstripout.smudge = cat filter.nbstripout.required = true
.gitattributes file:
*.ipynb filter=nbstripout
This means:
When you commit: outputs are stripped
When you checkout: notebooks are left as-is
If the filter fails: commit is prevented
Verification
The setup was verified with these tests:
Test 1: nbstripout works
# Created test notebook with outputs
nbstripout /tmp/test_notebook.ipynb
# Verified: outputs stripped ✅
Test 2: Git filter works
# Tested git hash-object with filter
git hash-object -w --stdin --path=test.ipynb < notebook_with_outputs.ipynb
# Verified: filter applied ✅
Test 3: Repository notebooks
# Checked all notebooks in repo
python -c "import json; ..."
# Result: 0 cells with outputs ✅
Usage
For Contributors
No action needed! The filter runs automatically:
# Edit notebook, run cells, save with outputs
jupyter notebook examples/my-notebook.ipynb
# Commit normally - outputs are stripped automatically
git add examples/my-notebook.ipynb
git commit -m "Add new analysis notebook"
# Outputs are stripped in the commit, but your local file keeps them
Installing nbstripout (for new clones)
If you clone the repository on a new machine:
# The filter is already configured in .git/config after clone
# Just ensure nbstripout is installed:
pip install nbstripout
# Or install from requirements.txt which includes it
pip install -r requirements.txt
The git filter configuration is stored in .git/config and will be set up when you run nbstripout --install (already done for this repository).
Checking Notebook Status
To see if a notebook has outputs that will be stripped:
# Check a specific notebook
python -c "
import json
with open('examples/my-notebook.ipynb') as f:
nb = json.load(f)
outputs = sum(1 for c in nb['cells'] if c.get('outputs'))
print(f'Cells with outputs: {outputs}')
"
Manual Stripping (if needed)
# Strip outputs from a notebook manually
nbstripout examples/my-notebook.ipynb
# Strip outputs from all notebooks
find . -name '*.ipynb' -not -path '*/.ipynb_checkpoints/*' -exec nbstripout {} \;
Testing the Filter
To verify the filter is working:
# 1. Create a test notebook with outputs
jupyter notebook /tmp/test.ipynb
# (run some cells)
# 2. Try to add and see what would be committed
git add /tmp/test.ipynb
git diff --cached /tmp/test.ipynb
# The diff should show outputs are stripped
Configuration Files
.gitattributes
*.ipynb filter=nbstripout
.git/config (automatically created)
[filter "nbstripout"]
clean = python -m nbstripout
smudge = cat
required = true
.gitignore (already configured)
.ipynb_checkpoints/
Benefits
Smaller repository: No binary output data committed
Cleaner diffs: Only source code changes visible
No merge conflicts: Outputs can’t conflict
Faster clones: Less data to download
Privacy: No accidental data leaks in outputs
Troubleshooting
Filter not working?
Check the configuration:
git config --local filter.nbstripout.clean
# Should output: python -m nbstripout
nbstripout not found?
Install it:
pip install nbstripout
Want to commit outputs (unusual)?
Temporarily disable the filter:
git -c filter.nbstripout.clean= add notebook.ipynb