Journal for 2024-03-01

Smelter had been on hold for a while, but this week a need for parallel processing came up at work, so I finished it up and ran its first real job!

The job is a data processing pipeline that formerly took 6 hours to run as a single process on a 24-vCPU machine. It now runs in 23 minutes across 32 Fargate containers with 8 vCPUs each.
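For a rough sense of scale, the figures above work out to about a 15.7× wall-clock speedup, using roughly 10.7 times as many vCPUs in total as the original machine:

```latex
\text{speedup} = \frac{6 \times 60\ \text{min}}{23\ \text{min}} = \frac{360}{23} \approx 15.7
\qquad
\text{total vCPUs} = 32 \times 8 = 256 \approx 10.7 \times 24
```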

The original vision for Smelter was to run it on Lambda and similar FaaS runtimes, but because it supports new platforms through an adapter model, I was able to add Fargate without changing any of the existing code. I'm really happy with how this turned out.
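As a rough illustration of why an adapter model makes this possible, here is a minimal Python sketch. It is not Smelter's actual code: the `Spawner` protocol, `LocalSpawner`, `partition`, and `run_job` names are all hypothetical. The scheduler only talks to the `Spawner` interface, so supporting a new platform means writing one new adapter class rather than touching the scheduling logic.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Protocol, Sequence

# Hypothetical worker signature: takes one chunk of the dataset, returns a result.
Worker = Callable[[Sequence[int]], int]


class Spawner(Protocol):
    """Hypothetical adapter interface. Each platform (local, Lambda, Fargate, ...)
    implements this one method, and the scheduler never needs to change."""

    def run_all(self, worker: Worker, chunks: list[Sequence[int]]) -> list[int]:
        ...


class LocalSpawner:
    """Adapter that runs each chunk on a local thread pool (handy for testing)."""

    def __init__(self, max_workers: int = 4) -> None:
        self.max_workers = max_workers

    def run_all(self, worker: Worker, chunks: list[Sequence[int]]) -> list[int]:
        with ThreadPoolExecutor(max_workers=self.max_workers) as pool:
            # map() preserves input order, so results line up with chunks.
            return list(pool.map(worker, chunks))


def partition(items: Sequence[int], n: int) -> list[Sequence[int]]:
    """Split items into n contiguous chunks whose sizes differ by at most one."""
    size, extra = divmod(len(items), n)
    chunks = []
    start = 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def run_job(spawner: Spawner, worker: Worker, items: Sequence[int], n: int) -> list[int]:
    """Platform-agnostic scheduler: partition the input, then hand it to the adapter."""
    return spawner.run_all(worker, partition(items, n))


if __name__ == "__main__":
    results = run_job(LocalSpawner(), lambda chunk: sum(chunk), list(range(100)), 32)
    print(len(results), sum(results))  # 32 4950
```

In this sketch, a Fargate adapter would implement the same `run_all` method by launching one container per chunk and collecting the results from wherever the containers write them; `run_job` and the worker would stay exactly as they are.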

I still want to implement the originally envisioned massively parallel Lambda computation at some point, for more interactive use cases.

Building

  • Filigree
    • Started on many-to-many relationships using a through table and model
  • Smelter
    • Tested out the Fargate orchestration code with real containers and made some fixes
    • Ergonomic improvements to spawning Fargate tasks
    • First real run of Smelter! A data processing pipeline at work that formerly ran as a single process now runs across the dataset in parallel on 32 containers.
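Launching a batch of Fargate tasks where each one handles a different slice of the dataset can be done through the ECS `RunTask` API. The sketch below only builds the keyword arguments for boto3's `ecs.run_task`, so it runs without AWS access. It is not Smelter's implementation: the cluster name, task definition, container name, subnet, and the `CHUNK_INDEX`/`CHUNK_COUNT` environment variables are hypothetical placeholders. Because container overrides apply to every task launched by a single call, the sketch issues one request per task (with `count` set to 1) so each task can receive its own chunk index.

```python
from __future__ import annotations

from typing import Any


def build_run_task_requests(
    cluster: str,
    task_definition: str,
    container_name: str,
    subnets: list[str],
    num_tasks: int,
) -> list[dict[str, Any]]:
    """Build one boto3 `ecs.run_task` kwargs dict per Fargate task.

    All arguments are placeholders. Each request launches a single task and
    tells it which chunk of the dataset it owns via environment variable
    overrides (CHUNK_INDEX / CHUNK_COUNT are hypothetical names).
    """
    requests = []
    for index in range(num_tasks):
        requests.append(
            {
                "cluster": cluster,
                "taskDefinition": task_definition,
                "launchType": "FARGATE",
                "count": 1,
                "networkConfiguration": {
                    "awsvpcConfiguration": {
                        "subnets": subnets,
                        "assignPublicIp": "ENABLED",
                    }
                },
                "overrides": {
                    "containerOverrides": [
                        {
                            "name": container_name,
                            # ECS environment values must be strings.
                            "environment": [
                                {"name": "CHUNK_INDEX", "value": str(index)},
                                {"name": "CHUNK_COUNT", "value": str(num_tasks)},
                            ],
                        }
                    ]
                },
            }
        )
    return requests


if __name__ == "__main__":
    reqs = build_run_task_requests(
        "example-cluster", "example-task:1", "worker", ["subnet-example"], 32
    )
    print(len(reqs))  # 32
    # With boto3 installed and AWS credentials configured, the requests could be sent as:
    #   import boto3
    #   ecs = boto3.client("ecs")
    #   for kwargs in reqs:
    #       ecs.run_task(**kwargs)
```

Inside each container, the worker would then read `CHUNK_INDEX` and `CHUNK_COUNT` and process only its own slice of the data.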

Thanks for reading! If you have any questions or comments, please send me a note on Twitter.