Retrieval-Augmented Generation is one of the most practical LLM patterns to emerge in the last two years. The idea is simple: instead of relying on a model's parametric memory, you fetch relevant context at query time and stuff it into the prompt.
The pipeline
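At a high level, the pipeline has two phases. Ahead of time, you chunk the documents, embed each chunk, and index the embeddings. At query time, you embed the query, find its nearest neighbours in the index, inject the retrieved chunks into the prompt, and generate an answer. Each of these steps is a place where things can go wrong.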
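The steps above can be sketched in plain Python. This is a minimal, dependency-free illustration built on loud assumptions. Word-count vectors stand in for a real embedding model. A brute-force cosine scan over a Python list stands in for a vector index. The final LLM call is left out, so the sketch stops at the assembled prompt. The function names (`chunk`, `embed`, `build_index`, `retrieve`, `build_prompt`) and the chunk size and overlap defaults are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def chunk(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into windows of `size` words that overlap by `overlap` words.

    The defaults are illustrative, not recommendations. Assumes overlap < size.
    """
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):  # this window reached the end of the text
            break
    return chunks


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(documents: list[str]) -> list[tuple[Counter, str]]:
    """Offline phase: chunk every document and store (vector, chunk) pairs."""
    return [(embed(piece), piece) for doc in documents for piece in chunk(doc)]


def retrieve(index: list[tuple[Counter, str]], query: str, k: int = 2) -> list[str]:
    """Query phase: embed the query and return the k most similar chunks (brute force)."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[0]), reverse=True)
    return [text for _, text in ranked[:k]]


def build_prompt(query: str, context: list[str]) -> str:
    """Inject the retrieved chunks into the prompt. The LLM call itself is omitted."""
    joined = "\n\n".join(context)
    return f"Answer using only the context below.\n\nContext:\n{joined}\n\nQuestion: {query}\nAnswer:"


# Usage: index two tiny documents, then retrieve context for a question.
docs = [
    "The Eiffel Tower is in Paris. It was completed in 1889.",
    "Python is a programming language created by Guido van Rossum.",
]
index = build_index(docs)
question = "When was the Eiffel Tower completed?"
context = retrieve(index, question, k=1)
prompt = build_prompt(question, context)
print(prompt)
```

In a real system, `embed` would call an embedding model and `retrieve` would query a vector index, typically an approximate one. The shape of the pipeline stays the same, and each function marks one of the places where things can go wrong.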