(Replying to PARENT post)

I still don't have a really good answer to this question:

If you want to be able to do Q&A against an existing corpus of documentation, can fine-tuning an LLM on that documentation get good results, or is that a waste of time compared to the trick where you search for relevant content and paste that into a prompt along with your question?

I see many people get excited about fine-tuning because they want to solve this problem.

The best answer I've seen so far is in https://github.com/openai/openai-cookbook/blob/main/examples...

> Although fine-tuning can feel like the more natural option—training on data is how GPT learned all of its other knowledge, after all—we generally do not recommend it as a way to teach the model knowledge. Fine-tuning is better suited to teaching specialized tasks or styles, and is less reliable for factual recall. [...] In contrast, message inputs are like short-term memory. When you insert knowledge into a message, it’s like taking an exam with open notes. With notes in hand, the model is more likely to arrive at correct answers.
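
The trick itself is simple enough to sketch end to end. A minimal version of the "open notes" approach, assuming the v1-style openai Python client (the model names, chunk list, and top-k value are placeholders, not recommendations):

```python
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-ada-002", input=texts)
    return np.array([d.embedding for d in resp.data])

# Pre-compute once for the whole docs corpus.
chunks = ["...docs passage 1...", "...docs passage 2..."]
chunk_vecs = embed(chunks)

def answer(question: str, k: int = 3) -> str:
    q_vec = embed([question])[0]
    # OpenAI embeddings are unit-length, so a dot product is cosine similarity.
    scores = chunk_vecs @ q_vec
    top = [chunks[i] for i in np.argsort(scores)[::-1][:k]]
    prompt = (
        "Answer the question using only the context below.\n\n"
        + "\n---\n".join(top)
        + f"\n\nQuestion: {question}"
    )
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```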

👤simonw🕑2y🔼0🗨️0

(Replying to PARENT post)

It's amazing how much misinformation and vague information there is on this topic. I tried getting to the bottom of this in the following post in the OpenAI forum:

https://community.openai.com/t/fine-tuning-myths-openai-docu...

Bottom line is that fine-tuning does not seem to be a feasible option for adding new knowledge to a model for question answering.

👤crosen99🕑2y🔼0🗨️0

(Replying to PARENT post)

The search+prompt approach has another benefit, which is that it allows a chat interface to "cite its source", something you really can't do with model parameters alone (fine-tuned or not).

Although a lot of people are working on hallucination reduction, I think we're a long way off from that in the general case. So having the ability to point to a real piece of data, outside the model, is important for applications where accuracy matters.
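
Concretely, since each retrieved chunk can carry a pointer back to where it came from, the prompt can number the sources and ask the model to cite them. A rough sketch (the chunk structure and fields are made up):

```python
# Retrieved chunks keep their source URL so answers can cite real documents.
retrieved = [
    {"url": "https://docs.example.com/auth", "text": "..."},
    {"url": "https://docs.example.com/row-level-security", "text": "..."},
]

context = "\n".join(
    f"[{i + 1}] ({c['url']}) {c['text']}" for i, c in enumerate(retrieved)
)
prompt = (
    "Answer using only the numbered sources below, and cite them like [1].\n\n"
    + context
    + "\n\nQuestion: How do I restrict access to a table?"
)
```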

👤lukev🕑2y🔼0🗨️0

(Replying to PARENT post)

I have similar questions for code assistance.

GitHub Copilot seems to be the most effective code assistant currently. It seems to use many heuristics for figuring out relevant snippets to include in prompts, like computing the Jaccard similarity of windows of text from the last 20 opened files. It also tries some tree-sitter cleverness for some languages, but when I snoop on the HTTP traffic it seems to almost always just give up and only include the rest of the file as context.
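
For the curious, the Jaccard heuristic is simple enough to sketch. This is just my guess at its shape from watching the traffic, not Copilot's actual implementation:

```python
# Slide a window over recently opened files and keep the window whose
# token set best matches the current editing context.

def tokens(text: str) -> set[str]:
    return set(text.split())

def jaccard(a: set[str], b: set[str]) -> float:
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def best_window(context: str, recent_files: list[str], window_lines: int = 60) -> str:
    ctx = tokens(context)
    best, best_score = "", 0.0
    for text in recent_files:
        lines = text.splitlines()
        # Overlapping windows, stepping by half a window at a time.
        for i in range(0, max(1, len(lines) - window_lines + 1), window_lines // 2):
            window = "\n".join(lines[i:i + window_lines])
            score = jaccard(ctx, tokens(window))
            if score > best_score:
                best, best_score = window, score
    return best
```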

I have wondered whether a model fine-tuned on my own code would do much better, and be simpler. But perhaps building embeddings and searching them (like in the article you linked) would be superior.

Code assistants need to be super low-latency, though, which maybe complicates things too.

👤spenczar5🕑2y🔼0🗨️0

(Replying to PARENT post)

An answer from OpenAI will be very biased. OpenAI will happily take your money to query their models with long prompts. They will also happily take your money to compute embeddings and thus help you search (and lock you in!).

But, as far as I know, OpenAI will not help you fine-tune, they will not run a fine-tuned model for you, and they would probably prefer that you use their models over open models that can be fine-tuned.

(None of this is to say that fine-tuning is better. I’m just saying that OpenAI has a strong commercial bias.)

👤amluto🕑2y🔼0🗨️0

(Replying to PARENT post)

> the trick where you search for relevant content and paste that into a prompt

Supabase Clippy was the first docs site to ship this experience to production as far as I can tell: https://supabase.com/blog/chatgpt-supabase-docs

I believe they called it "context injection" and I have been following suit in my own writing on the topic.

I am prototyping experiences like Supabase Clippy and am also very interested in fine-tuning for docs Q&A. But my main question is: what exactly would the fine-tuning inputs and outputs look like for docs Q&A?

Edit: For Q&A, the question is the input and the answer is the desired output? Is that right?
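
If that's right, then each training example might be a JSONL line pairing a question with its answer. A sketch using OpenAI's chat fine-tuning format (the Q&A content and file name here are invented; please correct me if this is the wrong shape):

```python
import json

# Hypothetical docs-Q&A training examples: the user message is the
# input, the assistant message is the desired output.
examples = [
    {
        "messages": [
            {"role": "user", "content": "How do I enable HTTPS on the dev server?"},
            {"role": "assistant", "content": "Pass --https when starting the dev server."},
        ]
    },
]

with open("finetune.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```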

A more general comment about fine-tuning for docs from my blog:

> AI is all about prediction. Given this temperature, this wind, this day of the year, what is the chance of rain? Temperature, wind, and date are your inputs. Chance of rain is your desired output. Now, try to apply this same type of thinking towards documentation. What are your inputs? What’s your output? The page title and code block could be your inputs. Whether or not the code builds could be your output. Or maybe the code block should be the output? This is why I keep saying that applying fine-tuning to docs is tricky. What are the inputs and outputs?

https://technicalwriting.tools/posts/ten-principles-response...

(I am an AI n00b and have not looked deeply into how fine-tuning works but it's high on my list to experiment with OpenAI's fine-tuning API. Please LMK if I am getting any fundamentals wrong.)

👤kaycebasques🕑2y🔼0🗨️0

(Replying to PARENT post)

LangChain wrote about LLMs and SQL, and although it's focused on SQL, it's still a great read, especially the references. https://blog.langchain.dev/llms-and-sql/

I'm also super keen on the 32k context window for the OpenAI API; that's going to be great.

👤thejosh🕑2y🔼0🗨️0

(Replying to PARENT post)

> If you want to be able to do Q&A against an existing corpus of documentation, can fine-tuning an LLM on that documentation get good results, or is that a waste of time compared to the trick where you search for relevant content and paste that into a prompt along with your question?

If you want exact, detailed recall, then using a framework that provides search and recall (by embeddings or otherwise) is probably always going to beat fine-tuning. But remember, it doesn’t have to be either-or.

I mean, if you want a person to handle Q&A on a corpus, is it better for them to have studied it, or to have direct access to the corpus with an appropriate index? The answer is clearly that it’s better if they’ve studied it and have access to the corpus, and while LLMs aren’t the same as people, I think the answer for them is the same here.

👤dragonwriter🕑2y🔼0🗨️0

(Replying to PARENT post)

As long as the search part of search + prompt is good, the prompt part will emit accurate results or will say it couldn't find the answer. You can also cite the sources this way.

It seems pretty expensive, because you may be pasting a lot of context into each query. If you allow the user to ask follow-up queries and you want to retain the context of the conversation, that seems expensive too. But it does seem like it should give the best results for Q&A, as long as the question is directly answered somewhere in your data.
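
For what it's worth, the "couldn't find the answer" behavior is itself just an instruction in the prompt, and the follow-up cost comes from replaying the conversation history on every call. A sketch (the wording is illustrative, not a tested recipe):

```python
# A grounding instruction for the search + prompt approach. The exact
# wording is an assumption; tune it for your model and corpus.
SYSTEM_PROMPT = (
    "Answer the question using only the context provided. "
    "Quote or cite the context where possible. "
    "If the context does not contain the answer, say: "
    "'I couldn't find that in the documentation.'"
)

def build_messages(context: str, question: str, history: list[dict]) -> list[dict]:
    # History is replayed each call, which is where the follow-up
    # cost comes from: you pay for those tokens on every request.
    return (
        [{"role": "system", "content": SYSTEM_PROMPT}]
        + history
        + [{"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"}]
    )
```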

👤furyofantares🕑2y🔼0🗨️0

(Replying to PARENT post)

With search, you don’t know if the model/search engine will retrieve the right context. With fine-tuning, you don’t know if it will forget important info or learn incorrect things.

👤lumost🕑2y🔼0🗨️0