- Got the import pipeline, model encoding, and embedding search working for Perceive. I ended up using the `instant-distance` crate to do the nearest neighbor searching, but `hnsw_rs` looks good as well.
- Tomorrow I'll look at generating better embeddings. The model I'm using truncates its input at 250 tokens, so I'm going to add something that splits documents into chunks and takes a weighted average of the resulting vectors. Might play around with some other methods too.
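
A rough sketch of the chunk-and-average idea, mostly to pin down the shape of it. Everything here is an assumption rather than what Perceive actually does: the function names are made up, tokens are plain `u32` IDs, the model is stood in for by an `encode` closure, and each chunk is weighted by its token count (other weightings are just as plausible).

```rust
/// Split a token sequence into consecutive chunks of at most `max_len` tokens.
/// (Hypothetical helper; a real version might overlap chunks or split on sentences.)
fn chunk_tokens(tokens: &[u32], max_len: usize) -> Vec<&[u32]> {
    assert!(max_len > 0, "max_len must be positive");
    tokens.chunks(max_len).collect()
}

/// Weighted average of equally sized embedding vectors.
/// Returns None for empty input, mismatched lengths, or a non-positive weight sum.
fn weighted_average(embeddings: &[Vec<f32>], weights: &[f32]) -> Option<Vec<f32>> {
    if embeddings.is_empty() || embeddings.len() != weights.len() {
        return None;
    }
    let dim = embeddings[0].len();
    let total: f32 = weights.iter().sum();
    if total <= 0.0 {
        return None;
    }
    let mut out = vec![0.0f32; dim];
    for (emb, &w) in embeddings.iter().zip(weights) {
        if emb.len() != dim {
            return None;
        }
        // Accumulate each chunk's contribution, scaled by its share of the total weight.
        for (o, &x) in out.iter_mut().zip(emb) {
            *o += x * w / total;
        }
    }
    Some(out)
}

/// Embed a whole document: chunk it, encode each chunk, and average the
/// chunk vectors weighted by chunk length in tokens (an assumed weighting).
/// `encode` is a placeholder for the real model call.
fn embed_document<F>(tokens: &[u32], max_len: usize, encode: F) -> Option<Vec<f32>>
where
    F: Fn(&[u32]) -> Vec<f32>,
{
    let chunks = chunk_tokens(tokens, max_len);
    let weights: Vec<f32> = chunks.iter().map(|c| c.len() as f32).collect();
    let embeddings: Vec<Vec<f32>> = chunks.iter().map(|&c| encode(c)).collect();
    weighted_average(&embeddings, &weights)
}
```

Weighting by length keeps a short trailing chunk from counting as much as a full one. If the index compares vectors by cosine distance, the averaged vector probably wants re-normalizing before it goes in.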