2022-12-27
Got the import pipeline, model encoding, and embedding search working for Perceive. I ended up using the instant-distance crate for the nearest neighbor search, but hnsw_rs looks good as well.
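The simplest form of embedding search is an exact, brute-force scan: score every stored vector against the query and keep the best k. The sketch below does that in plain std Rust as a point of reference. It assumes cosine similarity as the metric (the post doesn't say which one Perceive uses), and it is not instant-distance's API. That crate, like hnsw_rs, builds an HNSW graph to answer the same question approximately without scanning every vector.

```rust
use std::cmp::Ordering;

/// Cosine similarity between two equal-length vectors.
/// Returns 0.0 if either vector has zero length (norm), to avoid dividing by zero.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Exact (brute-force) k-nearest-neighbor search over stored embeddings.
/// Returns up to `k` (index, similarity) pairs, most similar first.
/// Cosine similarity is an assumption for illustration, not Perceive's confirmed metric.
fn nearest_neighbors(index: &[Vec<f32>], query: &[f32], k: usize) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = index
        .iter()
        .enumerate()
        .map(|(i, v)| (i, cosine_similarity(v, query)))
        .collect();
    // Sort descending by similarity; treat NaN comparisons as equal.
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap_or(Ordering::Equal));
    scored.truncate(k);
    scored
}
```

This costs one distance computation per stored vector per query, which is exactly the work an HNSW index is built to avoid once the collection grows.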
Tomorrow I'll look at generating better embeddings. The model I'm using cuts off the input at 250 tokens, so I'm going to add a step that splits each document into pieces that fit, encodes each piece, and takes a weighted average of the resulting vectors. Might play around with some other methods too.
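A minimal sketch of the chunk-and-average idea, under a few assumptions not stated in the post: the document is already tokenized into a slice of `&str`, the model is represented by a hypothetical `embed` closure standing in for the real encoder, and each chunk's weight is its token count (so a short trailing chunk counts for less).

```rust
/// Token limit of the embedding model, as described in the post.
const MAX_TOKENS: usize = 250;

/// Embed a document that may exceed the model's token limit.
/// Splits `tokens` into consecutive chunks of at most `max_tokens`, embeds each chunk
/// with `embed` (a hypothetical stand-in for the real model), and returns the average
/// of the chunk vectors weighted by each chunk's token count (an assumed weighting).
/// Returns `None` for an empty document, a zero chunk size, or mismatched vector lengths.
fn embed_document<F>(tokens: &[&str], max_tokens: usize, embed: F) -> Option<Vec<f32>>
where
    F: Fn(&[&str]) -> Vec<f32>,
{
    if tokens.is_empty() || max_tokens == 0 {
        return None; // slice::chunks panics on a chunk size of 0
    }
    let mut sum: Option<Vec<f32>> = None;
    let mut total_weight = 0.0f32;
    for chunk in tokens.chunks(max_tokens) {
        let vector = embed(chunk);
        let weight = chunk.len() as f32; // weight = number of tokens in this chunk
        let acc = sum.get_or_insert_with(|| vec![0.0; vector.len()]);
        if acc.len() != vector.len() {
            return None;
        }
        // Accumulate the weighted sum component by component.
        for (a, x) in acc.iter_mut().zip(&vector) {
            *a += weight * x;
        }
        total_weight += weight;
    }
    // Divide by the total weight to turn the weighted sum into a weighted mean.
    let mut avg = sum?;
    for a in avg.iter_mut() {
        *a /= total_weight;
    }
    Some(avg)
}
```

For a 600-token document this produces chunks of 250, 250, and 100 tokens. With equal-length chunks the weighting only changes how much the final partial chunk contributes; other choices (overlapping windows, normalizing each chunk vector first) would be natural variations to try.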