Journals for

2023-01-04

Small Perceive update today: given a particular item, you can now find the other items most similar to it. Since semantic search is already a similarity match against stored embeddings, this was just a matter of reading the item's embedding vector from the database instead of creating one from a typed-in string.
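A minimal sketch of the idea, assuming the embeddings are held in memory as a `HashMap` from a hypothetical `u64` item ID to a `Vec<f32>` and compared with cosine similarity (Perceive's actual storage and scoring may differ). The only change from a typed-in query is where the query vector comes from: the hypothetical `similar_to_item` looks it up by ID and leaves the item itself out of the results.

```rust
use std::collections::HashMap;

/// Cosine similarity between two equal-length vectors.
/// Returns 0.0 if either vector has zero magnitude.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Score every stored item against `query` and return the top `k`,
/// optionally skipping one ID (the item we started from).
fn most_similar(
    store: &HashMap<u64, Vec<f32>>,
    query: &[f32],
    exclude: Option<u64>,
    k: usize,
) -> Vec<(u64, f32)> {
    let mut scored: Vec<(u64, f32)> = store
        .iter()
        .filter(|(id, _)| Some(**id) != exclude)
        .map(|(id, v)| (*id, cosine_similarity(query, v)))
        .collect();
    // Highest similarity first.
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap_or(std::cmp::Ordering::Equal));
    scored.truncate(k);
    scored
}

/// "More like this": reuse the item's stored vector instead of embedding a string.
/// Returns None if the item has no stored embedding.
fn similar_to_item(store: &HashMap<u64, Vec<f32>>, item_id: u64, k: usize) -> Option<Vec<(u64, f32)>> {
    let query = store.get(&item_id)?;
    Some(most_similar(store, query, Some(item_id), k))
}

fn main() {
    let mut store = HashMap::new();
    store.insert(1, vec![1.0, 0.0, 0.0]);
    store.insert(2, vec![0.9, 0.1, 0.0]);
    store.insert(3, vec![0.0, 1.0, 0.0]);
    store.insert(4, vec![0.0, 0.0, 1.0]);

    if let Some(results) = similar_to_item(&store, 1, 2) {
        for (id, score) in &results {
            println!("{id}: {score:.3}");
        }
    }
}
```

A brute-force scan like this is fine for a personal collection; a larger index would typically swap in an approximate nearest-neighbor structure without changing the lookup-by-ID idea.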

2023-01-03

Added support for semantic search on browser bookmarks. It's very convenient that Chrome's metadata files are all just SQLite databases or JSON files. I'm thinking bookmark management will become a first-class feature of Perceive, so that semantic search covers not only bookmarks imported from the browser but also bookmarks added inside the tool itself. Next up, I'm going to try building a GUI with Tauri.
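Chrome keeps bookmarks in a JSON file named `Bookmarks` in the profile directory, where each node is either a folder with `children` or an entry of type `url`. The sketch below assumes that JSON has already been deserialized (for example with serde_json, omitted here) into a hypothetical `BookmarkNode` enum, and flattens the folder tree into records whose text could be embedded. The record layout and the `search_text` format are illustrative assumptions, not Perceive's.

```rust
/// Hypothetical in-memory mirror of a node in Chrome's `Bookmarks` JSON file.
#[derive(Debug)]
enum BookmarkNode {
    Url { name: String, url: String },
    Folder { name: String, children: Vec<BookmarkNode> },
}

/// One bookmark flattened into a record that could be embedded and searched.
#[derive(Debug)]
struct FlatBookmark {
    title: String,
    url: String,
    folder_path: String,
}

impl FlatBookmark {
    /// Text to feed to the embedding model (assumed format: title, folder, URL).
    fn search_text(&self) -> String {
        format!("{} ({}) {}", self.title, self.folder_path, self.url)
    }
}

/// Depth-first walk; `path` tracks the folder names above the current node.
fn collect(node: &BookmarkNode, path: &mut Vec<String>, out: &mut Vec<FlatBookmark>) {
    match node {
        BookmarkNode::Url { name, url } => out.push(FlatBookmark {
            title: name.clone(),
            url: url.clone(),
            folder_path: path.join("/"),
        }),
        BookmarkNode::Folder { name, children } => {
            path.push(name.clone());
            for child in children {
                collect(child, path, out);
            }
            path.pop();
        }
    }
}

/// Turn a folder tree into a flat list of bookmarks, in document order.
fn flatten_bookmarks(root: &BookmarkNode) -> Vec<FlatBookmark> {
    let mut out = Vec::new();
    collect(root, &mut Vec::new(), &mut out);
    out
}

fn main() {
    let bar = BookmarkNode::Folder {
        name: "Bookmarks bar".into(),
        children: vec![
            BookmarkNode::Url { name: "Rust".into(), url: "https://www.rust-lang.org/".into() },
            BookmarkNode::Folder {
                name: "Dev".into(),
                children: vec![BookmarkNode::Url {
                    name: "Tauri".into(),
                    url: "https://tauri.app/".into(),
                }],
            },
        ],
    };
    for b in flatten_bookmarks(&bar) {
        println!("{}", b.search_text());
    }
}
```

Keeping the folder path in the searchable text lets the folder a bookmark was filed under contribute to the match, which is one way bookmarks added inside the tool could share the same record shape as imported ones.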

2023-01-02

Did various cleanup on Perceive over the weekend, including fixing the HTML parsing, which had been dropping many of the spaces between words. Since a lot of the data comes from browsing history and similar online sources, I added a command to reprocess all the data without downloading it again.
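The entry doesn't say how the parsing was fixed, but a common cause of run-together words is stripping tags without leaving a separator, so `<p>Hello</p><p>world</p>` becomes `Helloworld`. The hypothetical `html_to_text` below treats an assumed list of block-level tags as word boundaries, ignores inline tags such as `<b>` so a word split by one stays intact, and then collapses whitespace. It is only a sketch: it doesn't decode entities, skip `<script>` contents, or handle a literal `<` in text, and real pages are better served by a proper HTML parser.

```rust
/// Block-level tags that imply a word boundary when removed.
/// This list is an assumption and intentionally incomplete.
const BLOCK_TAGS: &[&str] = &[
    "p", "div", "br", "li", "ul", "ol", "tr", "td", "th", "table",
    "h1", "h2", "h3", "h4", "h5", "h6", "section", "article", "header", "footer",
];

/// Extract the lowercase tag name from the text between '<' and '>',
/// e.g. "/p" -> "p", "br /" -> "br", "div class=\"x\"" -> "div".
fn tag_name(tag: &str) -> String {
    tag.trim_start_matches('/')
        .split(|c: char| c.is_whitespace() || c == '/')
        .next()
        .unwrap_or("")
        .to_ascii_lowercase()
}

fn html_to_text(html: &str) -> String {
    let mut out = String::with_capacity(html.len());
    let mut chars = html.chars();
    while let Some(c) = chars.next() {
        if c == '<' {
            // Consume the tag up to and including the closing '>'.
            let tag: String = chars.by_ref().take_while(|&ch| ch != '>').collect();
            // Block-level tags separate words; inline tags are dropped silently.
            if BLOCK_TAGS.contains(&tag_name(&tag).as_str()) {
                out.push(' ');
            }
        } else {
            out.push(c);
        }
    }
    // Collapse runs of whitespace (including newlines) into single spaces.
    out.split_whitespace().collect::<Vec<_>>().join(" ")
}

fn main() {
    let html = "<h1>Title</h1><p>First paragraph.</p><p>Second <em>emphasized</em> paragraph.</p>";
    // prints: Title First paragraph. Second emphasized paragraph.
    println!("{}", html_to_text(html));
}
```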

This exposed an unexpected issue: a deadlock in the data processing pipeline. All of the Rayon thread pool's threads were blocked on channel sends, and a different stage later in the pipeline, which also used Rayon, couldn't get any threads to do its work, so nothing could make progress.

Attaching a debugger was very useful here. I already had my suspicions, mostly from ruling out nearly every other potential cause, but the call stacks of all the threads made the problem obvious.

Fortunately, the solution was easy. Rayon lets you create separate thread pools, so I did exactly that to remove the contention. A couple of hours of debugging, and only a couple of minutes to make the fix.
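A stdlib-only sketch of the separate-pools idea, with plain `std::thread` workers standing in for Rayon pools (with Rayon itself, a dedicated pool is built with `rayon::ThreadPoolBuilder::new().num_threads(n).build()` and work runs on it via `ThreadPool::install`; how Perceive wires its pools is not shown in the entry). Each call to the hypothetical `spawn_stage` owns its own threads and talks to the next stage over a bounded `sync_channel`. Because the downstream stage never competes with the upstream stage for threads, a full channel can only stall the sender until the receiver drains it, which is the guarantee a single shared pool lacks.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

/// Spawn `n` dedicated worker threads for one pipeline stage. Each worker
/// pulls from the shared input, applies `f`, and sends the result downstream.
/// No other stage uses these threads, so blocking on a full output channel
/// cannot starve the stage that would drain it.
fn spawn_stage<I, O, F>(n: usize, input: Receiver<I>, output: SyncSender<O>, f: F) -> Vec<JoinHandle<()>>
where
    I: Send + 'static,
    O: Send + 'static,
    F: Fn(I) -> O + Send + Sync + 'static,
{
    let input = Arc::new(Mutex::new(input));
    let f = Arc::new(f);
    (0..n)
        .map(|_| {
            let input = Arc::clone(&input);
            let output = output.clone();
            let f = Arc::clone(&f);
            thread::spawn(move || loop {
                // The lock is released at the end of this statement, before `f` runs.
                let item = match input.lock().unwrap().recv() {
                    Ok(item) => item,
                    Err(_) => break, // upstream closed: this stage is done
                };
                if output.send(f(item)).is_err() {
                    break; // downstream hung up
                }
            })
        })
        .collect()
    // The original `output` is dropped here, so the channel closes once all workers exit.
}

fn main() {
    // Small bounded channels, so sends block whenever a stage falls behind.
    let (raw_tx, raw_rx) = sync_channel::<String>(4);
    let (parsed_tx, parsed_rx) = sync_channel::<usize>(4);
    let (done_tx, done_rx) = sync_channel::<usize>(4);

    // Each stage gets its own threads.
    let mut handles = spawn_stage(2, raw_rx, parsed_tx, |s: String| s.split_whitespace().count());
    handles.extend(spawn_stage(2, parsed_rx, done_tx, |n: usize| n * 10));

    // Producer: far more items than the channels can buffer at once.
    let producer = thread::spawn(move || {
        for i in 0..100 {
            raw_tx.send(format!("item {i} has five words")).unwrap();
        }
        // `raw_tx` is dropped here, closing the first channel.
    });

    let total: usize = done_rx.iter().sum();
    producer.join().unwrap();
    for h in handles {
        h.join().unwrap();
    }
    // prints: total = 5000
    println!("total = {total}");
}
```

The producer pushes 100 items through channels that hold only 4 at a time, so the run completes only because the second stage always has threads of its own to drain the first stage's output.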