2022-12-27
- Got the import pipeline, model encoding, and embedding search working for Perceive. I ended up using the `instant-distance` crate to do the nearest neighbor searching, but `hnsw_rs` looks good as well.
- Tomorrow I'll look at generating better embeddings. The model I'm using cuts off the input at 250 tokens, so I'm going to add something to cut documents up and do a weighted average of the resulting vectors for each piece. Might play around with some other methods too.
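
A rough sketch of the chunk-and-average idea, to make the plan concrete. This is not the code in Perceive: `chunk_tokens`, `weighted_average`, and `embed_document` are hypothetical names, the `embed` closure stands in for whatever the real model call looks like, and weighting each piece by its token count is just one plausible choice of weights.

```rust
/// Split a token sequence into consecutive chunks of at most `max_len` tokens.
/// (Hypothetical helper; a real version might split on sentence boundaries.)
fn chunk_tokens(tokens: &[u32], max_len: usize) -> Vec<&[u32]> {
    assert!(max_len > 0, "max_len must be positive");
    tokens.chunks(max_len).collect()
}

/// Weighted average of the per-chunk vectors.
/// Returns None when there is nothing to average or the inputs are inconsistent.
fn weighted_average(embeddings: &[Vec<f32>], weights: &[f32]) -> Option<Vec<f32>> {
    if embeddings.is_empty() || embeddings.len() != weights.len() {
        return None;
    }
    let dim = embeddings[0].len();
    let total: f32 = weights.iter().sum();
    if total <= 0.0 || embeddings.iter().any(|e| e.len() != dim) {
        return None;
    }
    let mut out = vec![0.0f32; dim];
    for (emb, &w) in embeddings.iter().zip(weights) {
        for (o, &x) in out.iter_mut().zip(emb) {
            // Each chunk contributes in proportion to its share of the total weight.
            *o += x * (w / total);
        }
    }
    Some(out)
}

/// Embed a document longer than the model's input limit by embedding each
/// chunk separately and averaging the results.
/// Assumption: weights are the chunk token counts, so a short trailing
/// chunk counts for less than a full one.
fn embed_document<F>(tokens: &[u32], max_len: usize, embed: F) -> Option<Vec<f32>>
where
    F: Fn(&[u32]) -> Vec<f32>,
{
    let chunks = chunk_tokens(tokens, max_len);
    let embeddings: Vec<Vec<f32>> = chunks.iter().map(|&c| embed(c)).collect();
    let weights: Vec<f32> = chunks.iter().map(|c| c.len() as f32).collect();
    weighted_average(&embeddings, &weights)
}
```

Calling `embed_document(&tokens, 250, |chunk| model_encode(chunk))` would give one vector per document, where `model_encode` is a placeholder for the actual encoder. If the search uses cosine distance, the averaged vector probably wants re-normalizing before it goes into the index.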