Daniel Imfeld

2022-12-27


Got the import pipeline, model encoding, and embedding search working for Perceive. I ended up using the instant-distance crate for the nearest-neighbor search, but hnsw_rs looks good as well.
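instant-distance does approximate nearest-neighbor search using an HNSW index, and hnsw_rs implements HNSW as well. To show what that index approximates, here's the exact version: a brute-force scan that scores every stored embedding against the query. This is not the crate's API. The function names are hypothetical, and cosine similarity is an assumed choice of metric.

```rust
/// Cosine similarity between two equal-length vectors (assumed metric).
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    // Treat a zero vector as dissimilar to everything instead of dividing by zero.
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot / (norm_a * norm_b)
}

/// Exact k-nearest-neighbor search: score every document against the query
/// and return (index, similarity) pairs, best match first. O(n) per query.
fn nearest_neighbors(docs: &[Vec<f32>], query: &[f32], k: usize) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = docs
        .iter()
        .enumerate()
        .map(|(i, d)| (i, cosine_similarity(d, query)))
        .collect();
    // Sort descending by similarity; total_cmp gives a total order even with NaN.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}

fn main() {
    let docs: Vec<Vec<f32>> = vec![
        vec![1.0, 0.0, 0.0],
        vec![0.7, 0.7, 0.0],
        vec![0.0, 0.0, 1.0],
    ];
    let query: [f32; 3] = [1.0, 0.1, 0.0];
    for (idx, score) in nearest_neighbors(&docs, &query, 2) {
        println!("doc {idx}: {score:.3}");
    }
}
```

A linear scan like this touches every vector on every query, which is fine for a small collection. An HNSW index trades a little recall for much faster lookups once the collection grows.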

Tomorrow I'll look at generating better embeddings. The model I'm using truncates its input at 250 tokens, so I'm going to add a step that splits each document into pieces, embeds each piece, and takes a weighted average of the resulting vectors. Might play around with some other methods too.
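Here's a minimal sketch of that chunk-and-average idea. It assumes the document is already tokenized into `u32` token IDs, and it weights each chunk by its token count, which is only one possible choice since the weighting isn't decided yet. `chunk_tokens`, `weighted_average`, and `embed_document` are hypothetical names, and the `embed` closure stands in for the real model call.

```rust
/// Split a token sequence into consecutive chunks of at most `max_len` tokens.
fn chunk_tokens<T>(tokens: &[T], max_len: usize) -> Vec<&[T]> {
    assert!(max_len > 0, "max_len must be positive");
    tokens.chunks(max_len).collect()
}

/// Weighted average of equal-length vectors; `weights[i]` applies to `vectors[i]`.
fn weighted_average(vectors: &[Vec<f32>], weights: &[f32]) -> Vec<f32> {
    assert_eq!(vectors.len(), weights.len());
    let dim = vectors.first().map_or(0, |v| v.len());
    let total: f32 = weights.iter().sum();
    let mut out = vec![0.0f32; dim];
    for (v, &w) in vectors.iter().zip(weights) {
        for (o, x) in out.iter_mut().zip(v) {
            *o += w * x;
        }
    }
    if total > 0.0 {
        for o in &mut out {
            *o /= total;
        }
    }
    out
}

/// Embed a document longer than the model's context window by embedding each
/// chunk separately and averaging. `embed` is a hypothetical stand-in for the model.
fn embed_document<F>(tokens: &[u32], max_len: usize, embed: F) -> Vec<f32>
where
    F: Fn(&[u32]) -> Vec<f32>,
{
    let chunks = chunk_tokens(tokens, max_len);
    let vectors: Vec<Vec<f32>> = chunks.iter().map(|&c| embed(c)).collect();
    // Assumed weighting: by token count, so a short trailing chunk
    // counts for less than a full one.
    let weights: Vec<f32> = chunks.iter().map(|c| c.len() as f32).collect();
    weighted_average(&vectors, &weights)
}

fn main() {
    // A pretend 600-token document, split at the model's 250-token limit.
    let tokens: Vec<u32> = (0..600).collect();
    // Toy "model": a 2-d embedding of [1.0, chunk length].
    let toy_embed = |chunk: &[u32]| vec![1.0f32, chunk.len() as f32];
    let doc_vec = embed_document(&tokens, 250, toy_embed);
    println!("{doc_vec:?}"); // prints [1.0, 225.0]
}
```

Other weightings, such as emphasizing the first chunk or scoring chunks by some importance measure, would only change how the `weights` vector is built.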

[Image: Terminal output of search]
