lsorber

✨ Co-founder & CTO at Superlinear (https://superlinear.eu)

📅 Joined in 2014

🔼 67 Karma

✍️ 55 posts


15 latest posts

(Replying to PARENT post)

For those who want to dive deeper, here’s a 300 LOC implementation of GRPO in pure NumPy: https://github.com/superlinear-ai/microGRPO

The implementation learns to play Battleship in about 2000 steps, pretty neat!
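For readers who want the core idea before opening the repo, here is a minimal sketch of the group-relative advantage computation that gives GRPO its name: each sampled completion's reward is normalized against the other completions for the same prompt. This is not taken from microGRPO; the function name, the `eps` parameter, and the toy rewards are assumptions made for illustration.

```python
import numpy as np


def group_relative_advantages(rewards: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Normalize rewards within each group (row) to zero mean and unit std.

    rewards has shape (num_prompts, group_size): one row per prompt,
    one column per sampled completion for that prompt.
    """
    mean = rewards.mean(axis=1, keepdims=True)
    std = rewards.std(axis=1, keepdims=True)
    # eps guards against division by zero when all rewards in a group are equal.
    return (rewards - mean) / (std + eps)


# Toy rewards (hypothetical): 2 prompts, 4 sampled completions each.
rewards = np.array([[1.0, 0.0, 0.0, 1.0], [0.2, 0.4, 0.6, 0.8]])
advantages = group_relative_advantages(rewards)
print(advantages)
```

Completions that beat their group's average get a positive advantage and the rest get a negative one, so no separate value network is needed to provide a baseline.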

👤lsorber🕑4mo🔼0🗨️0

Show HN:

"RAGLite – A Python package for the unhobbling of RAG"

👤lsorber🕑1y🔼19🗨️0

(Replying to PARENT post)

The name ‘late chunking’ is indeed somewhat of a misnomer in the sense that the technique does not partition documents into document chunks. What it actually does is pool token embeddings (of a large context) into, say, sentence embeddings. The result is that your document is now represented as a sequence of sentence embeddings, each of which is informed by the other sentences in the document.

Then, you want to partition the document into chunks. Late chunking pairs really well with semantic chunking because semantic chunking can use late chunking's improved sentence embeddings to find semantically more cohesive chunks. In fact, you can cast this as a binary integer programming problem and find the ‘best’ chunks this way. See RAGLite [1] for an implementation of both techniques, including the formulation of semantic chunking as an optimization problem.

Finally, you have a sequence of document chunks, each represented as a multi-vector sequence of sentence embeddings. You could choose to pool these sentence embeddings into a single embedding vector per chunk. Or, you could leave the multi-vector chunk embeddings as-is and apply a more advanced querying technique like ColBERT's MaxSim [2].

[1] https://github.com/superlinear-ai/raglite

[2] https://huggingface.co/blog/fsommers/document-similarity-col...
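To make the last option concrete, here is a minimal NumPy sketch of MaxSim scoring between a multi-vector query and multi-vector chunks. It is not RAGLite's implementation; the function names, the use of cosine similarity, and the random stand-in embeddings are assumptions made for illustration.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def maxsim(query_emb: np.ndarray, chunk_emb: np.ndarray) -> float:
    """For each query vector, take its best match in the chunk, then sum those maxima."""
    sim = l2_normalize(query_emb) @ l2_normalize(chunk_emb).T  # (n_query, n_chunk)
    return float(sim.max(axis=1).sum())


# Stand-in embeddings (hypothetical): 3 query vectors and chunks of varying length.
rng = np.random.default_rng(0)
query = rng.normal(size=(3, 8))
chunks = [rng.normal(size=(n, 8)) for n in (2, 5, 4)]
chunks.append(query.copy())  # a chunk that contains the query vectors exactly

scores = [maxsim(query, chunk) for chunk in chunks]
best = int(np.argmax(scores))
print(best, scores)
```

Because every query vector is matched independently, a chunk only needs one well-aligned sentence embedding per query vector to score highly, which is the detail that a single pooled vector per chunk would average away.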

👤lsorber🕑1y🔼0🗨️0

(Replying to PARENT post)

You don’t have to reduce a long context to a single embedding vector. Instead, you can compute the token embeddings of a long context and then pool those into, say, sentence embeddings.

The benefit is that each sentence’s embedding is informed by all of the other sentences in the context. So when a sentence refers to “The company” for example, the sentence embedding will have captured which company that is based on the other sentences in the context.

This technique is called ‘late chunking’ [1], and is based on another technique called ‘late interaction’ [2].

And you can combine late chunking (to pool token embeddings) with semantic chunking (to partition the document) for even better retrieval results. For an example implementation that applies both techniques, check out RAGLite [3].

[1] https://weaviate.io/blog/late-chunking

[2] https://jina.ai/news/what-is-colbert-and-late-interaction-an...

[3] https://github.com/superlinear-ai/raglite
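The pooling step itself is small. Below is a minimal sketch, assuming the token embeddings come from a single forward pass of a long-context embedding model over the whole document, and that you already know which token range each sentence covers. The function name, the choice of mean pooling, and the stand-in array are assumptions, not RAGLite's code.

```python
import numpy as np


def pool_sentences(token_emb: np.ndarray, sentence_spans: list[tuple[int, int]]) -> np.ndarray:
    """Mean-pool token embeddings over each [start, end) token span."""
    return np.stack([token_emb[start:end].mean(axis=0) for start, end in sentence_spans])


# Stand-in for the model output (hypothetical): 6 tokens, 4 dimensions.
token_emb = np.arange(24, dtype=float).reshape(6, 4)
# Sentence 1 covers tokens 0-1, sentence 2 covers tokens 2-5.
spans = [(0, 2), (2, 6)]

sentence_emb = pool_sentences(token_emb, spans)
print(sentence_emb.shape)
```

The context awareness comes from the model, not from the pooling: each token embedding was computed with attention over the full document, so averaging a sentence's tokens keeps that information.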

👤lsorber🕑1y🔼0🗨️0

(Replying to PARENT post)

Did you even need the D, wouldn't a PI controller be sufficient?

👤lsorber🕑5y🔼0🗨️0

(Replying to PARENT post)

If TSMC buys its lithography machines, why should it even get any credit for 5nm at all?

👤lsorber🕑5y🔼0🗨️0

(Replying to PARENT post)

Could you give an example of an unsolved riddle from linguistics?

👤lsorber🕑5y🔼0🗨️0

(Replying to PARENT post)

In my opinion, the best solution to these issues is to:

1. Declare numbers as numbers in the configuration language. E.g. "decimal(1e1000)".

2. Parse declared numbers with a lossless format like Python's decimal.Decimal.

3. Let users decide at their own risk if they want to convert to a lossy format like float.
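A minimal sketch of these three steps in Python, assuming a hypothetical `decimal(...)` wrapper syntax in the configuration language; the regex and the `parse_value` helper are illustrative and not part of any existing parser:

```python
import re
from decimal import Decimal

# Step 1: numbers are declared explicitly, e.g. "decimal(1e1000)".
DECIMAL_PATTERN = re.compile(r"^decimal\((.+)\)$")


def parse_value(raw: str):
    """Return a Decimal for declared numbers and the raw string otherwise."""
    match = DECIMAL_PATTERN.match(raw.strip())
    if match:
        # Step 2: parse losslessly; Decimal keeps the exact declared value.
        return Decimal(match.group(1))
    return raw


value = parse_value("decimal(1e1000)")
text = parse_value("1e1000")  # not declared as a number, so it stays a string

# Step 3: converting to a lossy type is an explicit choice made by the user,
# for example float(value), which cannot represent a magnitude this large.
print(repr(value), repr(text))
```

The parser never guesses: undeclared values stay strings, and any loss of precision happens only where the user asks for it.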

👤lsorber🕑5y🔼0🗨️0

(Replying to PARENT post)

Where's the data that says Moore's law no longer holds? I see comments and articles asserting this, but every time without evidence. The data that I do find certainly still suggests Moore's law is doing fine.

👤lsorber🕑5y🔼0🗨️0

(Replying to PARENT post)

Sounds great until the client realises they can hire someone else who does charge by the hour, saving them a massive 100k - 10k = 90k compared to your proposition.

👤lsorber🕑5y🔼0🗨️0

(Replying to PARENT post)

Huh, makes a pretty big difference for us. We were using pandas' built-in to_parquet though, which seems to suffer from some overhead.

👤lsorber🕑5y🔼0🗨️0

(Replying to PARENT post)

Are you sure about that? It depends on how Cloudflare defines what a cold start is. It might well include the initial loading of your code, with imports and init.

👤lsorber🕑5y🔼0🗨️0

(Replying to PARENT post)

Have you benchmarked this against pickling those data files? In our experience, parquet's overhead isn't worth it for smaller data files.

👤lsorber🕑5y🔼0🗨️0

(Replying to PARENT post)

Looks neat. Are you considering a flake8 extension like bandit for easy adoption (in CI and in VS Code)?

👤lsorber🕑5y🔼0🗨️0