lsorber
📅 Joined in 2014
🔼 67 Karma
✍️ 55 posts
Show HN:
"RAGLite – A Python package for the unhobbling of RAG"
RAG applications can be magical when they work well, but anyone who has built one knows how much the output quality depends on the quality of retrieval and augmentation.
With RAGLite, we set out to unhobble RAG by mapping out all of its subproblems and implementing the best solutions to those subproblems. For example, RAGLite solves the chunking problem by partitioning documents into provably optimal level 4 semantic chunks. Another unique contribution is its optimal closed-form linear query adapter based on the solution to an orthogonal Procrustes problem. Check out the README for more features.
We'd love to hear your feedback and suggestions, and are happy to answer any questions!
(Replying to PARENT post)
Then, you want to partition the document into chunks. Late chunking pairs really well with semantic chunking because it can use late chunking's improved sentence embeddings to find semantically more cohesive chunks. In fact, you can cast this as a binary integer programming problem and find the ‘best’ chunks this way. See RAGLite [1] for an implementation of both techniques including the formulation of semantic chunking as an optimization problem.
Finally, you have a sequence of document chunks, each represented as a multi-vector sequence of sentence embeddings. You could choose to pool these sentence embeddings into a single embedding vector per chunk. Or, you could leave the multi-vector chunk embeddings as-is and apply a more advanced querying technique like ColBERT's MaxSim [2].
[1] https://github.com/superlinear-ai/raglite
[2] https://huggingface.co/blog/fsommers/document-similarity-col...
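To make the trade-off between pooling and MaxSim concrete, here is a minimal pure-Python sketch. The function names and the toy 2-D "sentence embeddings" are made up for illustration and are not RAGLite's or ColBERT's API. It scores a multi-vector query against two chunks, once with mean-pooled embeddings and once with a MaxSim-style sum of per-query-vector maxima.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def mean_pool(vectors):
    """Collapse a multi-vector embedding into one vector by averaging."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def pooled_score(query_vecs, chunk_vecs):
    """Single-vector scoring: compare the two pooled embeddings."""
    return cosine(mean_pool(query_vecs), mean_pool(chunk_vecs))

def maxsim_score(query_vecs, chunk_vecs):
    """MaxSim-style scoring: for each query vector, take its best match
    among the chunk's vectors, then sum those maxima."""
    return sum(max(cosine(q, c) for c in chunk_vecs) for q in query_vecs)

# Toy data (hypothetical): chunk_a matches each query vector exactly,
# chunk_b only matches the query "on average".
query = [[1.0, 0.0], [0.0, 1.0]]
chunk_a = [[1.0, 0.0], [0.0, 1.0]]
chunk_b = [[1.0, 1.0], [1.0, 1.0]]

print(pooled_score(query, chunk_a), pooled_score(query, chunk_b))
print(maxsim_score(query, chunk_a), maxsim_score(query, chunk_b))
```

On this toy data the pooled scores for both chunks are the same, while MaxSim ranks chunk_a higher. That is the kind of detail pooling can throw away.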
(Replying to PARENT post)
The benefit is that each sentence’s embedding is informed by all of the other sentences in the context. So when a sentence refers to “The company” for example, the sentence embedding will have captured which company that is based on the other sentences in the context.
This technique is called ‘late chunking’ [1], and is based on another technique called ‘late interaction’ [2].
And you can combine late chunking (to pool token embeddings) with semantic chunking (to partition the document) for even better retrieval results. For an example implementation that applies both techniques, check out RAGLite [3].
[1] https://weaviate.io/blog/late-chunking
[2] https://jina.ai/news/what-is-colbert-and-late-interaction-an...
[3] https://github.com/superlinear-ai/raglite
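As a rough sketch of the pooling step in late chunking: assume you already ran a long-context embedding model over the whole document and got one contextualized embedding per token, plus the token span of each sentence. The function name, the span format, and the hard-coded numbers below are illustrative assumptions, not RAGLite's implementation.

```python
def late_chunk(token_embeddings, sentence_spans):
    """Mean-pool contextualized token embeddings per sentence.

    token_embeddings: one vector per token, computed over the full context.
    sentence_spans: (start, end) token index pairs, end exclusive.
    """
    pooled = []
    for start, end in sentence_spans:
        span = token_embeddings[start:end]
        pooled.append([sum(col) / len(span) for col in zip(*span)])
    return pooled

# Hypothetical token embeddings for a 5-token document with 2 sentences.
token_embeddings = [
    [1.0, 0.0], [3.0, 0.0],              # sentence 1: tokens 0-1
    [0.0, 2.0], [0.0, 4.0], [0.0, 6.0],  # sentence 2: tokens 2-4
]
sentence_spans = [(0, 2), (2, 5)]

sentence_embeddings = late_chunk(token_embeddings, sentence_spans)
print(sentence_embeddings)
```

The key point is the order of operations: embed the full context first, split into sentences afterwards. That way each token embedding has already "seen" the rest of the document before it gets pooled.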
(Replying to PARENT post)
1. Declare numbers as numbers in the configuration language. E.g. "decimal(1e1000)".
2. Parse declared numbers with a lossless format like Python's decimal.Decimal.
3. Let users decide at their own risk if they want to convert to a lossy format like float.
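A minimal sketch of those three steps in Python, assuming a made-up "decimal(...)" literal syntax and a hypothetical parse_value helper (neither comes from an existing configuration language):

```python
import re
from decimal import Decimal

# Step 1: numbers are declared explicitly, e.g. "decimal(1e1000)".
_DECIMAL_LITERAL = re.compile(r"^decimal\((.+)\)$")

def parse_value(raw):
    """Return a Decimal for declared numbers, else the raw string."""
    match = _DECIMAL_LITERAL.match(raw.strip())
    if match is None:
        return raw  # not declared as a number, so leave it untouched
    # Step 2: parse losslessly; Decimal keeps the exact declared value.
    return Decimal(match.group(1))

value = parse_value("decimal(1e1000)")
tenth = parse_value("decimal(0.1)")
name = parse_value("hello")

# Step 3: converting to float is an explicit, opt-in choice by the user.
lossy = float(tenth)

print(value, tenth, name, lossy)
```

Nothing is silently coerced: a value that is not declared as a number stays a string, and a declared number only becomes a float when the user asks for it.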
(Replying to PARENT post)
The implementation learns to play Battleship in about 2000 steps, pretty neat!