Perceive

  • This project was an experiment from December 2022 in indexing a bunch of personal data and performing semantic search over it via embedding similarity. Everything on the backend was done in Rust, just to make it hard. :)
  • Tauri app
    • Loading bar for initial load
    • Search field
    • Real-time search as you type (without highlighting)
    • Ability to select different sources and source categories
  • Asymmetric Search (short queries matched against longer documents)
  • rust-bert on M1
    • 1. The latest published version of the rust-bert crate uses tch-rs 0.8, but you need at least 0.10. The git version already uses the newer release, so you can just set that in your Cargo.toml: rust-bert = { git = "https://github.com/guillaume-be/rust-bert.git" }
    • 2. You need to install PyTorch manually. The simplest method is to install it globally via Homebrew or pip3, though a more robust method would be to install a local copy in a Python venv and reference it from there.
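For the quick global install, either of these works (package names assume macOS; libtorch ships inside the PyTorch package):

```shell
# Option A: install PyTorch via Homebrew
brew install pytorch

# Option B: install PyTorch via pip3
pip3 install torch
```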
    • 3. Set the LIBTORCH environment variable to wherever you have libtorch installed. With the Homebrew method this is LIBTORCH=/opt/homebrew/opt/pytorch.
    • 4. Tell the linker where to find it in your Cargo config.
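Concretely, steps 3 and 4 might look like the following sketch. The Homebrew paths are assumptions; adjust them to wherever your libtorch actually lives:

```toml
# .cargo/config.toml — tell the linker where libtorch's dylibs are (step 4)
[target.aarch64-apple-darwin]
rustflags = ["-L", "/opt/homebrew/opt/pytorch/lib"]

[env]
# Optionally pin LIBTORCH here instead of exporting it in your shell (step 3)
LIBTORCH = "/opt/homebrew/opt/pytorch"
```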
  • Initial Sprint Reflections

    • Still much easier to do ML stuff in Python, though rust-bert was hugely useful in implementing a lot of this
    • Setting up libtorch was a hassle (see the M1 notes above)
    • Web scraping browser history is kind of a hassle
      • Lots of pages require authentication
      • GitHub really hates it, even for public pages
      • Content extraction for HTML
        • Readability works well; the Rust port needs some work
        • In the future would probably run a sidecar that hosts the up-to-date JS version
    • Rayon thread pool exhaustion
    • Model choice matters a lot
    • Running bulk inference without GPU support is still slow for the larger models.
    • ndarray is great
    • Future work
      • Better article scraping
      • ML model feedback
      • More integrations
      • OpenAI integration to allow running on less powerful systems

Thanks for reading! If you have any questions or comments, please send me a note on Twitter.