Building a 3D Scene from 30 Photos: Getting Gaussian Splatting Running on Colab
I wanted to get hands-on with Gaussian Splatting (3DGS), but every time I tried to start with the paper I got stuck at the math. After a few rounds of stalling, I tried running it on Colab first. Once "what goes in, what comes out" was visible on my own screen, the paper was much easier to come back to. This is the record of doing it in that order.
The deeper theory is already covered better by other writers. Here I am only writing about "what runs, in the first 30 minutes."
The setup
| Item | Value |
|---|---|
| Environment | Google Colab (GPU runtime: T4 worked) |
| Training data | NeRF Synthetic dataset (the one the official repo recommends) |
| Time | ~5 min data download + 15-30 min training + ~5 min rendering |
You do not need prior knowledge of 3D graphics or NeRF. If you can read `pip install`, you can follow along. I'll keep the gotchas at the end.
What is Gaussian Splatting actually doing
Roughly: "Given a set of photos, place millions of semi-transparent ellipsoids (Gaussians) in space such that, from any viewing angle, the rendered view matches the photo taken from that angle."
Unlike textbook 3D models (collections of triangle polygons), Gaussian Splatting represents the scene as a swarm of points, each with position, color, opacity, and shape. Things that are hard to express with polygons — fur, smoke — come out reasonably natural.
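To make "shape" concrete: in the 3DGS paper, each Gaussian's shape is a 3x3 covariance matrix, stored as a per-axis scale plus a rotation quaternion and recombined as `Sigma = R S S^T R^T`. The sketch below illustrates that formula in NumPy. It is not the repo's CUDA code, and the function names and sample values are made up for the example.

```python
import numpy as np

def quat_to_rotmat(q):
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    w, x, y, z = np.asarray(q, dtype=float) / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z),     2 * (x * z + w * y)],
        [2 * (x * y + w * z),     1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y),     2 * (y * z + w * x),     1 - 2 * (x * x + y * y)],
    ])

def covariance(scale, quat):
    """Sigma = R S S^T R^T, with S = diag(scale)."""
    R = quat_to_rotmat(quat)
    S = np.diag(scale)
    M = R @ S
    return M @ M.T

# Hypothetical splat: long along x, thin along z, then rotated 90 degrees about z.
sigma = covariance(scale=[0.5, 0.1, 0.02],
                   quat=[np.cos(np.pi / 4), 0.0, 0.0, np.sin(np.pi / 4)])
print(np.round(sigma, 4))
```

Building the matrix this way keeps it symmetric and positive semi-definite no matter what values the optimizer picks for scale and rotation, which is the reason the paper gives for not optimizing the nine matrix entries directly.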
Compared with NeRF (Neural Radiance Fields), the difference is sharper:
| Item | NeRF | Gaussian Splatting |
|---|---|---|
| Scene representation | Neural network weights (black box) | Explicit 3D Gaussian point cloud (exportable) |
| Render speed | Slow (seconds per frame) | Fast (real-time) |
| Train speed | Slow (hours) | Fast (tens of minutes) |
| Compatibility with downstream tools | Poor | Easy to drop into Blender etc. |
If NeRF is "an AI that memorized the scene," Gaussian Splatting is "a point cloud file of the scene." The fact that the output is a file you can hand to other tools later was, for me, a bigger differentiator than the speed.
The data flow, before touching code
Before any code, here is a map of what goes in and what comes out:
```
[Input]
├── Image files (PNG/JPG)                  30-300 per scene
├── Camera info (transforms_train.json)    Where each image was shot from
└── Initial point cloud (points3D.ply)     Present in COLMAP datasets; absent
                                           in NeRF Synthetic (random init works)
        ↓ Training (iterative optimization)
[Intermediate]
Gaussian point cloud: hundreds of thousands to millions
Per point: position (x,y,z), color (RGB), opacity, shape (covariance)
        ↓ Render
[Output]
2D images from arbitrary viewpoints (real-time)
```
The thing to notice early: **training does not start without per-image camera poses**. "Just a folder of photos" is not enough. For NeRF Synthetic, the poses live in `transforms_train.json`.
If you want to start from real-world photos, you'd run COLMAP (a Structure-from-Motion tool) first to estimate the poses. The official repo ships a `convert.py` wrapper for that and reads COLMAP's `sparse/0/` output directly, so no JSON is involved on that path. Today I'm using the NeRF Synthetic dataset, which ships with the JSON already attached. For a first run, that's the easier path.
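To see what "camera info" means in practice, it helps to open the JSON. In the NeRF Synthetic layout, `transforms_train.json` carries a shared horizontal field of view (`camera_angle_x`) plus a `frames` list, where each frame has a `file_path` and a 4x4 camera-to-world `transform_matrix`. Here is a minimal sketch, with a hypothetical helper name and a made-up one-frame sample standing in for the real file:

```python
import json

def summarize_transforms(meta):
    """Return (frame count, [(file_path, camera position), ...]).

    The camera position is the translation part of the 4x4
    camera-to-world matrix: the first three rows of the last column.
    """
    cameras = []
    for frame in meta["frames"]:
        m = frame["transform_matrix"]
        position = (m[0][3], m[1][3], m[2][3])
        cameras.append((frame["file_path"], position))
    return len(cameras), cameras

# Made-up sample in the same shape as transforms_train.json.
sample = json.loads("""
{
  "camera_angle_x": 0.69,
  "frames": [
    {"file_path": "./train/r_0",
     "transform_matrix": [[1, 0, 0, 0.5],
                          [0, 1, 0, -1.0],
                          [0, 0, 1, 4.0],
                          [0, 0, 0, 1]]}
  ]
}
""")
count, cameras = summarize_transforms(sample)
print(count, cameras[0])
```

On Colab, once the dataset is downloaded, swapping `sample` for `json.load(open('/content/data/lego/transforms_train.json'))` should list one camera per training image.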
Running it on Colab — the steps that actually worked
The official repo (graphdeco-inria/gaussian-splatting) README is thorough but written for serious users. Getting it to work in Colab in one pass needs a bit of decoding. When I tried it straight, the first data-download URL on the README was already broken, and the Google Drive mirror was rate-limited. Below are the cells that finally got me through, in the order I ran them. Open a fresh Colab notebook, set the runtime to **GPU (T4)**, and run from the top.
1. Confirm the GPU is actually attached
If no GPU name shows up, fix the runtime type and reconnect before going further. Every later step assumes a GPU.
```python
!nvidia-smi --query-gpu=name,memory.total --format=csv
import torch
print(torch.__version__, torch.cuda.is_available(), torch.version.cuda)
```
2. Clone the repo (recursively)
`diff-gaussian-rasterization` and `simple-knn` are submodules. Forget `--recursive` and you'll hit "directory is empty" errors later in the build.
```python
!git clone --recursive https://github.com/graphdeco-inria/gaussian-splatting
%cd /content/gaussian-splatting
```
3. Build the CUDA extensions
This is the most failure-prone step. Print both the Colab-side CUDA toolkit version and the CUDA version PyTorch was built against before installing. If they don't match, expect the build either to fail outright or to go through with a warning that is easy to miss.
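```python
# Toolkit version: the "release X.Y" line (it is not always the last line of the output)
!nvcc --version | grep release
# CUDA version PyTorch was built against
import torch; print('torch CUDA:', torch.version.cuda)
# Build and install the two CUDA submodules, then the pure-Python dependencies
!pip install -q ./submodules/diff-gaussian-rasterization
!pip install -q ./submodules/simple-knn
!pip install -q plyfile tqdm
```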
4. Get the dataset
The official tutorial's `cseweb.ucsd.edu` URL for `nerf_synthetic.zip` redirected to a 404 when I tried it, and the Google Drive mirror is often blocked by `gdown` access limits. What worked was pulling just the `lego` scene from a HuggingFace mirror that hosts the dataset already extracted.
```python
!pip install -q huggingface_hub
from huggingface_hub import snapshot_download
path = snapshot_download(
    repo_id='pablovela5620/nerf-synthetic-mirror',
    repo_type='dataset',
    allow_patterns=['lego/**'],
    local_dir='/content/data',
)
!ls /content/data/lego
```
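Since a partial download only surfaces later as a confusing training error, a quick structural check right after the download can save a retry. The sketch below assumes the NeRF Synthetic layout (a `transforms_<split>.json` per split, with each frame's `file_path` given without the `.png` extension). The helper name is hypothetical.

```python
import json
from pathlib import Path

def missing_files(scene_dir, splits=("train", "test")):
    """List files that a NeRF Synthetic style scene refers to but lacks."""
    scene = Path(scene_dir)
    missing = []
    for split in splits:
        meta_path = scene / f"transforms_{split}.json"
        if not meta_path.is_file():
            missing.append(str(meta_path))
            continue
        meta = json.loads(meta_path.read_text())
        for frame in meta["frames"]:
            # file_path carries no extension in this layout; the images are PNGs.
            image = scene / (frame["file_path"] + ".png")
            if not image.is_file():
                missing.append(str(image))
    return missing
```

Calling `missing_files('/content/data/lego')` should return an empty list when both JSON files and every referenced image are present.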
5. Train
On T4 (15 GB), 7,000 iterations is the safe zone. NeRF Synthetic has a white background so I pass `--white_background`.
```python
!python train.py -s /content/data/lego --iterations 7000 --white_background --eval
```
6. Render
Outputs land in `output/<random runid>/`. To avoid hunting for the runid manually, grab the latest run programmatically and render from it.
```python
import glob, os
RUN = sorted(glob.glob('/content/gaussian-splatting/output/*'), key=os.path.getmtime)[-1]
!python render.py -m {RUN}
```
Rendered images appear in `output/<runid>/test/ours_7000/renders/`. If the lego excavator's shape reconstructs across all the test viewpoints, training worked.
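Eyeballing the renders is enough for a first run, but a number is easier to compare between runs. `render.py` writes the ground-truth test images to a `gt/` folder next to `renders/`, so one option is PSNR per image pair. The sketch below works on NumPy arrays scaled to [0, 1]. Loading the PNGs (for example with Pillow) is left out, and the function name is my own. The repo also ships a `metrics.py` script for this purpose.

```python
import numpy as np

def psnr(rendered, ground_truth):
    """Peak signal-to-noise ratio in dB for images scaled to [0, 1]."""
    rendered = np.asarray(rendered, dtype=np.float64)
    ground_truth = np.asarray(ground_truth, dtype=np.float64)
    mse = np.mean((rendered - ground_truth) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(1.0 / mse))

# Synthetic stand-ins for one render / ground-truth pair.
gt = np.full((4, 4, 3), 0.5)
render = gt + 0.1  # uniform error of 0.1, so MSE is 0.01 (about 20 dB)
print(psnr(render, gt))
```

Higher is better. The useful part is the comparison between your own runs, for example 7,000 versus more iterations, more than the absolute value.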
Gotchas (things that genuinely stopped me)
CUDA build fails
If `diff-gaussian-rasterization` won't build, it's almost always that the Colab CUDA and the PyTorch CUDA are mismatched. `!nvcc --version` vs `torch.version.cuda` — print both and you'll see the gap. Reinstall PyTorch against the right CUDA and the build goes through.
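The comparison can also be done in code instead of by eye. This is a small sketch with hypothetical helper names. It only parses the two version strings and makes no attempt to fix anything.

```python
import re

def cuda_release(nvcc_output):
    """Extract (major, minor) from the text printed by `nvcc --version`."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        raise ValueError("no 'release X.Y' found in nvcc output")
    return int(match.group(1)), int(match.group(2))

def compare_cuda(nvcc_output, torch_cuda):
    """Classify the gap between the CUDA toolkit and torch.version.cuda."""
    if torch_cuda is None:
        return "torch has no CUDA"  # CPU-only PyTorch build
    toolkit = cuda_release(nvcc_output)
    major, minor = (int(part) for part in torch_cuda.split(".")[:2])
    if toolkit == (major, minor):
        return "match"
    if toolkit[0] == major:
        return "minor mismatch"
    return "major mismatch"

# Example strings; on Colab, feed in the real values instead.
sample = "Cuda compilation tools, release 12.2, V12.2.140"
print(compare_cuda(sample, "12.1"))
```

In a Colab cell, `out = !nvcc --version` captures the command output as a list of lines, so `compare_cuda("\n".join(out), torch.version.cuda)` gives the verdict for the live runtime.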
Confusion about the initial point cloud (`points3D.ply`)
I read the COLMAP docs and convinced myself that "every scene needs `sparse/0/points3D.ply`." Actually that's only for real-world photos run through COLMAP. **NeRF Synthetic only needs `transforms_*.json` and `train.py` starts from a random initial point cloud.** Once I separated "COLMAP-derived datasets need points3D" from "synthetic datasets don't," the confusion cleared.
CUDA out of memory
As Gaussian count grows, VRAM evaporates. Free-tier T4 (15 GB) is safe up to about 7,000 iterations; 30,000 sometimes OOMs. Start at 7,000, confirm it runs, then push iterations up.
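For a feel of how memory scales with Gaussian count, here is a back-of-the-envelope sketch. It assumes the default spherical-harmonics degree of 3, which gives 59 float32 values per Gaussian (3 position, 3 scale, 4 rotation, 1 opacity, 48 color coefficients). It also assumes that Adam-style training holds roughly four copies of those values (parameters, gradients, two moment buffers). Rasterizer buffers and the training images are not counted, so treat the result as a loose lower bound, not a prediction.

```python
FLOATS_PER_GAUSSIAN = 3 + 3 + 4 + 1 + 48  # position, scale, rotation, opacity, SH color
BYTES_PER_FLOAT = 4                       # float32
TRAINING_COPIES = 4                       # params + grads + 2 Adam moments (assumption)

def training_param_megabytes(num_gaussians):
    """Rough lower bound on parameter-related memory during training, in MB."""
    total_bytes = (num_gaussians * FLOATS_PER_GAUSSIAN
                   * BYTES_PER_FLOAT * TRAINING_COPIES)
    return total_bytes / 1e6

for n in (100_000, 1_000_000, 3_000_000):
    print(f"{n:>9,} Gaussians -> ~{training_param_megabytes(n):,.0f} MB")
```

Under these assumptions the parameters themselves stay well below the T4's capacity, which suggests that most of the pressure comes from what the estimate leaves out.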
Data URLs are unreliable
Saying this twice because it tripped me up most: the official `wget` URL redirected to 404, the Google Drive mirror was rate-limited. HuggingFace mirror worked for me, but distribution endpoints come and go. **Before anything else, just confirm the dataset actually downloads** — that's the most time-saving check.
Viewing the output
After training, the final Gaussian point cloud is at `output/<runid>/point_cloud/iteration_<N>/point_cloud.ply`. Open it with one of:
- [SuperSplat](https://playcanvas.com/supersplat/editor): browser-based `.ply` viewer. Drag and drop, lowest barrier to entry.
- Blender + Gaussian Splatting add-on: useful when composing with other 3D scenes downstream.
- The SIBR viewer bundled with the official repo: requires local Linux/Windows.
For a first viewing, SuperSplat is the easiest. Throw the `.ply` at the browser, rotate it.
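Before uploading anywhere, you can also peek at the file itself. A PLY file starts with a plain-text header that states how many vertices (here, Gaussians) it holds and which per-vertex properties are stored. The sketch below parses that header with the standard library only. The helper name and the miniature sample header are made up for the example.

```python
def read_ply_header(data):
    """Parse the text header of a PLY file given its leading bytes.

    Returns (vertex_count, [property names]).
    """
    header, sep, _ = data.partition(b"end_header")
    if not sep:
        raise ValueError("no PLY header found")
    vertex_count, properties = 0, []
    for line in header.decode("ascii").splitlines():
        parts = line.split()
        if parts[:2] == ["element", "vertex"]:
            vertex_count = int(parts[2])
        elif parts and parts[0] == "property":
            properties.append(parts[-1])
    return vertex_count, properties

# Made-up miniature header; a real point_cloud.ply lists many more properties.
sample = (b"ply\nformat binary_little_endian 1.0\n"
          b"element vertex 3\n"
          b"property float x\nproperty float y\nproperty float z\n"
          b"property float opacity\n"
          b"end_header\n")
print(read_ply_header(sample))
```

For the real file, reading the first few kilobytes is enough: `read_ply_header(open(path, 'rb').read(8192))`, where `path` points at the `point_cloud.ply` above.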
After running it once
Before running, I had Gaussian Splatting in my head as "some heavy 3D generation thing." After running it once on Colab, the actual flow — "feed in photos and camera info → iterative optimization runs → point cloud file comes out" — turned out to be not that complicated. Data prep is more work than the model is.
When the 7,000-iteration render came back with the same lego excavator reconstructing from every test angle, that was the moment "ah, this is actually a 3D scene" clicked.
Trying to read the paper first and stalling on the math, vs. running it first and then reading — for me, running first made the paper readable. The math equations stopped being opaque; they were now "ah, this parameter, this calculation." Some people find it uncomfortable to skip the theory first. I'd say: this order is fine.
Where to push next
After the first run goes through, three directions look interesting:
1. **Run on your own data**: shoot a video on your phone, let COLMAP estimate the camera poses (the official repo reads COLMAP's output directly, so no `transforms.json` is needed), and train on that. Adds the COLMAP step.
2. **Move off Colab**: free-tier Colab sessions top out at around 12 hours and can disconnect earlier. For serious training, [Modal](https://modal.com/) charges by the second and lets long runs survive.
3. **Read for the parameters**: read [scomup's Japanese explainer](https://qiita.com/scomup/items/d5790da25a846e645de1) while you adjust learning rates and Gaussian split thresholds. Reading while running settles things in.
References
- [graphdeco-inria/gaussian-splatting (official)](https://github.com/graphdeco-inria/gaussian-splatting)
- [Gaussian Splatting on Colab (classmethod, JP)](https://dev.classmethod.jp/articles/3d-gaussian-splatting-on-colab/)
- [Gaussian Splatting explainer (scomup, JP)](https://qiita.com/scomup/items/d5790da25a846e645de1)
- [SuperSplat (.ply viewer)](https://playcanvas.com/supersplat/editor)
- [NeRF Synthetic mirror (HuggingFace)](https://huggingface.co/datasets/pablovela5620/nerf-synthetic-mirror)
- [Modal Labs docs](https://modal.com/docs/examples/hello_world)
---
*Originally published in Japanese at [Qiita](https://qiita.com/nomurasan/items/60c23f318ee4f0d93ca5). Same author writing as "nomuraya / shimajima / nomurasan / 中翔" across media. English version adapted rather than literally translated.*