Got the import pipeline, model encoding, and embedding search working for Perceive. I ended up using the instant-distance crate for the nearest-neighbor search, but hnsw_rs looks good as well.
Tomorrow I'll look at generating better embeddings. The model I'm using cuts off the input at 250 tokens, so I'm going to add something to split documents into pieces and take a weighted average of the resulting vectors. Might play around with some other methods too.
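
A rough sketch of what that chunk-and-average step might look like. Nothing here is Perceive's actual code: `chunk_tokens`, `embed_document`, and the `embed` closure are hypothetical names, the closure stands in for whatever call runs the model on one piece, and weighting each piece by its token count is just one possible choice.

```rust
/// Split a token sequence into consecutive pieces of at most `max_len` tokens.
/// (Hypothetical helper; a real splitter might prefer sentence boundaries or overlap.)
fn chunk_tokens(tokens: &[u32], max_len: usize) -> Vec<&[u32]> {
    assert!(max_len > 0, "max_len must be positive");
    tokens.chunks(max_len).collect()
}

/// Embed a long document by embedding each piece and averaging the vectors,
/// weighting each piece by how many tokens it contains.
/// `embed` is an assumed stand-in for the model call on a single piece.
/// Returns `None` when there are no tokens to embed.
fn embed_document<F>(tokens: &[u32], max_len: usize, embed: F) -> Option<Vec<f32>>
where
    F: Fn(&[u32]) -> Vec<f32>,
{
    let mut sum: Vec<f32> = Vec::new();
    let mut total_weight = 0.0f32;

    for chunk in chunk_tokens(tokens, max_len) {
        let vector = embed(chunk);
        if sum.is_empty() {
            // Size the accumulator from the first embedding we see.
            sum = vec![0.0; vector.len()];
        }
        assert_eq!(sum.len(), vector.len(), "all piece embeddings must share a dimension");

        // Weight = number of tokens in this piece, so a short tail piece counts less.
        let weight = chunk.len() as f32;
        for (acc, value) in sum.iter_mut().zip(&vector) {
            *acc += weight * value;
        }
        total_weight += weight;
    }

    if total_weight == 0.0 {
        return None;
    }
    for value in sum.iter_mut() {
        *value /= total_weight;
    }
    Some(sum)
}
```

With a 600-token document and a 250-token limit this produces pieces of 250, 250, and 100 tokens, so the last piece contributes one sixth of the final vector. If the search index compares vectors by cosine distance, the averaged vector may need to be renormalized before it is inserted.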