SBBP
- "Should've been a Blog Post" is a project that downloads a video, generates a transcript using Whisper, and places that transcript alongside screenshots from the video so you can follow along with the slides.
- The name is a bit of snark, but I have two aims for this project:
- Some videos really should have just been blog posts, and watching them feels like wasted time.
- Others are actually valuable, but as a parent with young kids my time is limited. A transcript lets me get through them faster, or read them in settings where watching a video isn't feasible.
- GitHub link
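
As an illustration of the "transcript alongside screenshots" idea, here is a minimal sketch of one way to interleave the two by timestamp. It is not taken from the project's code: `Screenshot`, `Segment`, `Item`, and `interleave` are hypothetical names, and it assumes both inputs are already sorted by time.

```rust
/// A screenshot and the time (in seconds) it was captured. Hypothetical type.
struct Screenshot {
    at_secs: f64,
    path: String,
}

/// A transcript segment and the time (in seconds) it starts. Hypothetical type.
struct Segment {
    start_secs: f64,
    text: String,
}

/// One entry in the final reading order.
#[derive(Debug, PartialEq)]
enum Item {
    Image(String),
    Text(String),
}

/// Merge screenshots and transcript segments into a single reading order.
/// Both slices are assumed to be sorted by time. On a tie the image comes
/// first, so a slide appears before the text spoken over it.
fn interleave(shots: &[Screenshot], segs: &[Segment]) -> Vec<Item> {
    let mut out = Vec::with_capacity(shots.len() + segs.len());
    let mut next_shot = 0;
    for seg in segs {
        // Emit every screenshot taken at or before this segment starts.
        while next_shot < shots.len() && shots[next_shot].at_secs <= seg.start_secs {
            out.push(Item::Image(shots[next_shot].path.clone()));
            next_shot += 1;
        }
        out.push(Item::Text(seg.text.clone()));
    }
    // Any screenshots after the last segment go at the end.
    for shot in &shots[next_shot..] {
        out.push(Item::Image(shot.path.clone()));
    }
    out
}
```

A single pass like this keeps the layout step cheap. A real reader view would probably also group segments into paragraphs before merging.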
Task List
Up Next
- Host this somewhere so I can access it over the web and add to Readwise
- Maybe just host locally and expose a subdomain
Soon
- Get thumbnail image from video metadata
- Summarize comments (can yt-dlp grab them?)
- For videos with chapters, summarize each chapter
- Table of contents with video chapters
- Improve reader layout
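
For the chapter-related items above, a rough sketch of how a table of contents could be built once chapter titles and start times are available. This assumes the chapters have already been read from the video metadata and are sorted by start time. `Chapter`, `format_timestamp`, `table_of_contents`, and `chapter_at` are hypothetical names, not part of the project.

```rust
/// A video chapter: a title and the time (in seconds) it starts. Hypothetical type.
struct Chapter {
    title: String,
    start_secs: f64,
}

/// Format seconds as MM:SS, or H:MM:SS once the video passes an hour.
fn format_timestamp(secs: f64) -> String {
    let total = secs.max(0.0) as u64; // truncate fractional seconds
    let (h, m, s) = (total / 3600, (total % 3600) / 60, total % 60);
    if h > 0 {
        format!("{}:{:02}:{:02}", h, m, s)
    } else {
        format!("{:02}:{:02}", m, s)
    }
}

/// One table-of-contents line per chapter, e.g. "01:15 Demo".
fn table_of_contents(chapters: &[Chapter]) -> Vec<String> {
    chapters
        .iter()
        .map(|c| format!("{} {}", format_timestamp(c.start_secs), c.title))
        .collect()
}

/// Index of the chapter that contains time `t`, assuming `chapters` is
/// sorted by start time. Returns None if `t` is before the first chapter.
fn chapter_at(chapters: &[Chapter], t: f64) -> Option<usize> {
    chapters.iter().rposition(|c| c.start_secs <= t)
}
```

`chapter_at` is the piece a per-chapter summary would need: it assigns each transcript segment to a chapter by its start time.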
Later
- For longer videos without chapters, use LLM to try to generate chunks
- Click on/near a paragraph to play the video at that point.
- Better image similarity algorithm
- Some way for multiple users to link to the same video without downloading it each time (if opened to public)
- Probably put the video metadata itself in a separate table and have the user's video model link to that.
- Video Host whitelist? (if opened to public)
- Might be a hassle to maintain but also potentially prevents issues with shady content
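
The idea of splitting shared video metadata from per-user state could look something like the in-memory sketch below. It only illustrates the data shape. A real version would be two database tables, and `Video`, `UserVideo`, `Library`, and `add_for_user` are hypothetical names.

```rust
use std::collections::HashMap;

/// Shared, per-video data: downloaded and processed once.
#[derive(Debug)]
struct Video {
    id: u64,
    url: String,
}

/// Per-user state that links a user to a shared video.
#[derive(Debug)]
struct UserVideo {
    user_id: u64,
    video_id: u64,
    read: bool,
}

/// In-memory stand-in for the two tables.
#[derive(Default)]
struct Library {
    videos: Vec<Video>,
    by_url: HashMap<String, u64>,
    user_videos: Vec<UserVideo>,
}

impl Library {
    /// Link `url` to `user_id`. Returns the video id and whether the video
    /// is new, meaning it still has to be downloaded and processed.
    fn add_for_user(&mut self, user_id: u64, url: &str) -> (u64, bool) {
        let (video_id, is_new) = match self.by_url.get(url) {
            Some(&id) => (id, false),
            None => {
                let id = self.videos.len() as u64 + 1;
                self.videos.push(Video { id, url: url.to_string() });
                self.by_url.insert(url.to_string(), id);
                (id, true)
            }
        };
        // Avoid duplicate links when the same user adds the same video twice.
        let linked = self
            .user_videos
            .iter()
            .any(|uv| uv.user_id == user_id && uv.video_id == video_id);
        if !linked {
            self.user_videos.push(UserVideo { user_id, video_id, read: false });
        }
        (video_id, is_new)
    }
}
```

Only the first user to add a URL triggers a download. Everyone after that just gets a new `UserVideo` row with their own read state.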
Done
- Move processing into Rust
- Steps
- Download video
- Images
- Extract Images
- Similar Image Algorithm
- Transcript
- Extract Audio
- Transcription
- Summary of Transcription
- Simple auth
- Allow storing files in cloud storage
- Use Deepgram for transcription
- Hosting: Store non-temporary downloaded files in Backblaze or Cloudflare object storage
- Integrate with Filigree
- Allow triggering video download and processing from inside the app — Nov 20th, 2023
- Queue up multiple videos for processing — Nov 20th, 2023
- Add "read" state and progress tracking — Nov 20th, 2023
- Speed up SSIM process — Nov 18th, 2023
- Generate a summary of the video — Nov 18th, 2023
- Use structural similarity (SSIM) metric to remove duplicate screenshots — Nov 17th, 2023
- Screenshot extraction
- Whisper transcript generation
- Align transcript to images and
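
To make the SSIM-based duplicate removal concrete, here is a simplified sketch. It computes a single global SSIM value over two grayscale frames rather than the usual sliding-window version, so it is a stand-in for the technique and not the project's implementation. `ssim` and `dedupe` are hypothetical names, and frames are assumed to be equal-sized 8-bit grayscale buffers.

```rust
/// Global (single-window) SSIM between two 8-bit grayscale frames.
/// Returns a value of at most 1.0; 1.0 means the frames are identical.
fn ssim(a: &[u8], b: &[u8]) -> f64 {
    assert!(
        !a.is_empty() && a.len() == b.len(),
        "frames must be non-empty and the same size"
    );
    let n = a.len() as f64;
    let mean = |p: &[u8]| p.iter().map(|&v| v as f64).sum::<f64>() / n;
    let (ma, mb) = (mean(a), mean(b));

    // Variances and covariance around the means.
    let (mut va, mut vb, mut cov) = (0.0, 0.0, 0.0);
    for (&x, &y) in a.iter().zip(b) {
        let (dx, dy) = (x as f64 - ma, y as f64 - mb);
        va += dx * dx;
        vb += dy * dy;
        cov += dx * dy;
    }
    va /= n;
    vb /= n;
    cov /= n;

    // Standard stabilizing constants for an 8-bit dynamic range.
    let c1 = (0.01 * 255.0_f64).powi(2);
    let c2 = (0.03 * 255.0_f64).powi(2);

    ((2.0 * ma * mb + c1) * (2.0 * cov + c2))
        / ((ma * ma + mb * mb + c1) * (va + vb + c2))
}

/// Return the indices of frames to keep: a frame is dropped when it is at
/// least `threshold` similar to the most recently kept frame.
fn dedupe(frames: &[Vec<u8>], threshold: f64) -> Vec<usize> {
    let mut kept: Vec<usize> = Vec::new();
    for (i, frame) in frames.iter().enumerate() {
        let is_dup = kept
            .last()
            .map_or(false, |&k| ssim(&frames[k], frame) >= threshold);
        if !is_dup {
            kept.push(i);
        }
    }
    kept
}
```

Comparing against the last kept frame, not the previous frame, stops slow transitions from slipping through as a chain of "almost identical" neighbors. The threshold is a tuning knob, and a value like 0.9 is only a starting guess.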