diff --git a/content/patterns/rag-quickstart/_index.adoc b/content/patterns/rag-quickstart/_index.adoc new file mode 100644 index 0000000000..037bd55245 --- /dev/null +++ b/content/patterns/rag-quickstart/_index.adoc @@ -0,0 +1,31 @@ +--- +title: RAG AI Quickstart +date: 2026-05-13 +tier: sandbox +summary: This pattern deploys the RAG AI Quickstart with test pipelines on CPU or GPU. +rh_products: + - Red Hat OpenShift Container Platform + - Red Hat OpenShift GitOps + - Red Hat OpenShift AI +industries: + - General +aliases: /rag-quickstart/ +links: + github: https://github.com/validatedpatterns-sandbox/ai-quickstart-rag + install: getting-started + bugs: https://github.com/validatedpatterns-sandbox/ai-quickstart-rag/issues + feedback: https://docs.google.com/forms/d/e/1FAIpQLScI76b6tD1WyPu2-d_9CCVDr3Fu5jYERthqLKJDUGwqBg7Vcg/viewform +--- +:toc: +:imagesdir: /images +:_content-type: ASSEMBLY +include::modules/comm-attributes.adoc[] + +include::modules/rag-quickstart-about.adoc[leveloffset=+1] + +include::modules/rag-quickstart-architecture.adoc[leveloffset=+1] + +[id="next-steps-rag-quickstart"] +== Next steps + +* link:rag-quickstart-getting-started[Install this pattern.] 
diff --git a/content/patterns/rag-quickstart/cluster-sizing.adoc b/content/patterns/rag-quickstart/cluster-sizing.adoc new file mode 100644 index 0000000000..996a2347d7 --- /dev/null +++ b/content/patterns/rag-quickstart/cluster-sizing.adoc @@ -0,0 +1,13 @@ +--- +title: Cluster sizing +weight: 30 +aliases: /rag-quickstart/cluster-sizing/ +--- + +:toc: +:imagesdir: /images +:_content-type: ASSEMBLY +include::modules/comm-attributes.adoc[] +include::modules/ai-quickstart-rag/metadata-ai-quickstart-rag.adoc[] + +include::modules/cluster-sizing-template.adoc[] diff --git a/content/patterns/rag-quickstart/customizing-this-pattern.adoc b/content/patterns/rag-quickstart/customizing-this-pattern.adoc new file mode 100644 index 0000000000..7c3fe9c7fd --- /dev/null +++ b/content/patterns/rag-quickstart/customizing-this-pattern.adoc @@ -0,0 +1,126 @@ +--- +title: Customizing this pattern +weight: 20 +aliases: /rag-quickstart/customizing/ +--- + +:toc: +:imagesdir: /images +:_content-type: ASSEMBLY +include::modules/comm-attributes.adoc[] + +[id="customizing-rag-quickstart"] +== Customizing the RAG AI Quickstart pattern + +By default, this pattern runs a CPU-backed LLM and does not require a GPU. This limits both the models you can use and their inference speed, so you might want to use a GPU instead. + +[id="enabling-gpu"] +=== Enabling GPU support + +To enable GPU support, set `global.device` to `gpu` in `values-global.yaml` and push your changes to GitHub. This adds NFD and the NVIDIA GPU Operator to the pattern installation and enables the models to run on an NVIDIA accelerator. + +[NOTE] +==== +If you are running this pattern on an OpenShift cluster on AWS, setting `global.device` to `gpu` automatically creates a GPU machine (`g6.2xlarge`) and adds it as a worker node to your cluster. 
+==== + +[id="changing-models"] +=== Changing models + +To update the models, edit `overrides/values-cpu.yaml` (if `global.device` is set to `cpu`) or `overrides/values-gpu.yaml` (if set to `gpu`). + +The default CPU-based model is defined as follows: + +[source,yaml] +---- +global: + models: + llama-3-2-3b-instruct-cpu: + id: meta-llama/Llama-3.2-3B-Instruct + enabled: true + resources: + limits: + cpu: "6" + memory: 48Gi + requests: + cpu: "2" + memory: 24Gi + args: + - --enable-auto-tool-choice + - --chat-template + - /chat-templates/tool_chat_template_llama3.2_json.jinja + - --tool-call-parser + - llama3_json + - --dtype + - auto + - --max-model-len + - "16384" + - --max-num-seqs + - "1" +---- + +You can change this to any vLLM-compatible model whose terms and conditions you have accepted with the HuggingFace account that your API token belongs to. You can also adjust the resource parameters as needed for your environment. + +The runtime defaults to `vllm/vllm-openai:v0.11.1`. If you need a later version, you can override the image: + +[source,yaml] +---- +llm-service: + deviceConfigs: + gpu: + image: vllm/vllm-openai:nightly +---- + +[NOTE] +==== +The example above sets a GPU-specific container image. To override the CPU-based image instead, use the key `llm-service.deviceConfigs.cpu.image`. +==== + +[id="multiple-models"] +=== Defining multiple models + +You can define multiple LLM models to be served simultaneously. 
For example: + +[source,yaml] +---- +global: + models: + deepseek-r1: + id: Valdemardi/DeepSeek-R1-Distill-Llama-70B-AWQ + enabled: true + resources: + limits: + cpu: "32" + memory: 200Gi + requests: + cpu: "24" + memory: 150Gi + args: + - --reasoning-parser + - deepseek_r1 + - --tool-call-parser + - llama3_json + - --enable-auto-tool-choice + - --quantization + - awq_marlin + - --dtype + - float16 + - --max-model-len + - "65536" + gpt-oss-120b: + id: openai/gpt-oss-120b + enabled: true + resources: + limits: + cpu: "32" + memory: 200Gi + requests: + cpu: "24" + memory: 150Gi + args: + - --tool-call-parser + - openai + - --enable-auto-tool-choice +---- + +For a complete list of customizable values, see the link:https://github.com/rh-ai-quickstart/ai-architecture-charts[AI Architecture charts] repository. diff --git a/content/patterns/rag-quickstart/getting-started.adoc b/content/patterns/rag-quickstart/getting-started.adoc new file mode 100644 index 0000000000..44811c81ef --- /dev/null +++ b/content/patterns/rag-quickstart/getting-started.adoc @@ -0,0 +1,167 @@ +--- +title: Getting started +weight: 10 +aliases: /rag-quickstart/getting-started/ +--- + +:toc: +:imagesdir: /images +:_content-type: ASSEMBLY +include::modules/comm-attributes.adoc[] + +[id="deploying-rag-quickstart-pattern"] +== Deploying the RAG AI Quickstart pattern + +.Prerequisites + +* An OpenShift cluster (version 4.18 or later) + ** To create an OpenShift cluster, go to the https://console.redhat.com/[Red Hat Hybrid Cloud console]. + ** Select *OpenShift \-> Red Hat OpenShift Container Platform \-> Create cluster*. +* A https://huggingface.co/[HuggingFace] account with an API token that has read permissions. + ** You must accept the terms and conditions for the https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct[meta-llama/Llama-3.2-3B-Instruct] model with the account that the API token belongs to. +* The Helm binary. 
For instructions, see link:https://helm.sh/docs/intro/install/[Installing Helm]. +* Additional installation tool dependencies. For details, see link:https://validatedpatterns.io/learn/quickstart/[Patterns quick start]. + +[id="preparing-for-deployment"] +== Preparing for deployment +.Procedure + +. Fork the link:https://github.com/validatedpatterns-sandbox/ai-quickstart-rag[ai-quickstart-rag] repository on GitHub. You must fork the repository to customize this pattern. + +. Clone the forked copy of this repository. ++ +[source,terminal] +---- +$ git clone git@github.com:your-username/ai-quickstart-rag.git +---- + +. Go to the root directory of your Git repository: ++ +[source,terminal] +---- +$ cd ai-quickstart-rag +---- + +. Run the following command to set the upstream repository: ++ +[source,terminal] +---- +$ git remote add -f upstream git@github.com:validatedpatterns-sandbox/ai-quickstart-rag.git +---- + +. Verify the setup of your remote repositories by running the following command: ++ +[source,terminal] +---- +$ git remote -v +---- ++ +.Example output ++ +[source,terminal] +---- +origin git@github.com:your-username/ai-quickstart-rag.git (fetch) +origin git@github.com:your-username/ai-quickstart-rag.git (push) +upstream git@github.com:validatedpatterns-sandbox/ai-quickstart-rag.git (fetch) +upstream git@github.com:validatedpatterns-sandbox/ai-quickstart-rag.git (push) +---- + +. Make a local copy of the secrets template outside of your repository to hold credentials for the pattern. ++ +[WARNING] +==== +Do not add, commit, or push this file to your repository. Doing so may expose personal credentials to GitHub. +==== ++ +Run the following command: ++ +[source,terminal] +---- +$ cp values-secret.yaml.template ~/values-secret-ai-quickstart-rag.yaml +---- + +. Populate this file with secrets, or credentials, that are needed to deploy the pattern successfully: ++ +[source,terminal] +---- +$ vim ~/values-secret-ai-quickstart-rag.yaml +---- + +.. 
Edit the `llm-service` section to use your HuggingFace API token: ++ +[source,yaml] +---- + - name: llm-service + fields: + - name: hf_token + value: hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx +---- + +. Optional: To customize the deployment, create and switch to a new branch by running the following command: ++ +[source,terminal] +---- +$ git checkout -b my-branch +---- ++ +Make your changes, then stage and commit them: ++ +[source,terminal] +---- +$ git add <file_names> +$ git commit -m "Customize deployment" +---- ++ +Push the changes to your forked repository: ++ +[source,terminal] +---- +$ git push origin my-branch +---- + +[id="deploying-cluster-using-patternsh-file"] +== Deploying the pattern by using the pattern.sh file + +To deploy the pattern by using the `pattern.sh` file, complete the following steps: + +. Log in to your cluster by following this procedure: + +.. Obtain an API token by visiting `+https://oauth-openshift.apps.<cluster-domain>/oauth/token/request+`, where `<cluster-domain>` is the base domain of your cluster. + +.. Log in to the cluster by running the following command: ++ +[source,terminal] +---- +$ oc login --token=<token> --server=https://api.<cluster-domain>:6443 +---- ++ +Alternatively, log in by pointing the `KUBECONFIG` environment variable at your cluster's kubeconfig file: ++ +[source,terminal] +---- +$ export KUBECONFIG=~/<path_to_kubeconfig> +---- + +. Deploy the pattern to your cluster. Run the following command: ++ +[source,terminal] +---- +$ ./pattern.sh make install +---- + +.Verification + +To verify a successful installation, check the health of the ArgoCD applications: + +. Run the following command: ++ +[source,terminal] +---- +$ ./pattern.sh make argo-healthcheck +---- ++ +It might take several minutes for all applications to synchronize and reach a healthy state. This includes downloading the LLM models and populating the vector database. + +. Verify that the Operators are installed by navigating to *Operators -> Installed Operators* in the {ocp} web console. + +. 
After all applications are healthy, open the RAG chatbot UI by clicking the route link in the *Networking -> Routes* page of the `ai-quickstart-rag-prod` namespace. diff --git a/modules/rag-quickstart-about.adoc b/modules/rag-quickstart-about.adoc new file mode 100644 index 0000000000..bb01e6083e --- /dev/null +++ b/modules/rag-quickstart-about.adoc @@ -0,0 +1,75 @@ +:_content-type: CONCEPT +:imagesdir: ../../images +include::comm-attributes.adoc[] + +[id="about-rag-quickstart"] += About the RAG Quickstart pattern + +Use retrieval-augmented generation (RAG) to enhance large language models with specialized data sources for more accurate and context-aware responses. + +Use case:: + +* Deploy a RAG-powered chatbot that connects users to internal documentation through a single chat interface. +* Explore retrieval-augmented generation capabilities including document ingestion, custom system prompts, and agent-based RAG. +* Use a GitOps approach to provision AI infrastructure including LLM serving, vector storage, and safety guardrails. ++ +[NOTE] +==== +Based on the requirements of a specific implementation, certain details might differ. However, all Validated Patterns that are based on a portfolio architecture generalize one or more successful deployments of a use case. +==== + +Background:: + +This pattern provides scaffolding around the link:https://github.com/rh-ai-quickstart/rag[RAG AI Quickstart]. It provisions the OpenShift cluster with link:https://www.redhat.com/en/products/ai/openshift-ai[{rhoai}] in a configuration suitable for LlamaStack. It deploys NFD and the NVIDIA GPU Operator for LLM inference on GPU nodes and manages secrets through the {solution-name-upstream} framework. On AWS, GPU worker nodes can be provisioned automatically. By default, this pattern uses a CPU-based LLM. 
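The CPU/GPU choice described above is a single pattern-level value. As a minimal sketch, assuming the `global.device` key described in the customizing guide:

[source,yaml]
----
global:
  device: cpu  # change to "gpu" to add NFD and the NVIDIA GPU Operator
----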
+ +Retrieval-Augmented Generation (RAG) enhances Large Language Models (LLMs) by retrieving relevant external knowledge to improve accuracy, reduce hallucinations, and support domain-specific conversations. + +The included demo application features FantaCo, a fictional large enterprise that launched a secure RAG chatbot connecting employees to HR, procurement, sales, and IT documentation. Users can explore the capabilities of RAG by: + +- Exploring FantaCo's solution +- Uploading new documents to be embedded +- Tweaking sampling parameters to influence LLM responses +- Using custom system prompts +- Switching between simple and agent-based RAG + +[id="about-rag-quickstart-solution"] +== About the solution + +This pattern deploys a complete RAG pipeline on a single OpenShift cluster by using a GitOps approach. The {solution-name-upstream} framework handles infrastructure provisioning, including GPU operators, AI platform configuration, and secrets management. The RAG AI Quickstart delivers the application layer: document ingestion, embedding, retrieval, and LLM-powered chat. + +The solution uses LlamaStack to standardize the building blocks of the AI stack with a consistent interface for model serving, vector storage, and safety guardrails. Kubeflow Pipelines ingests documents, embeds them, and stores them in PostgreSQL with PGVector. At query time, the system retrieves relevant embeddings to ground LLM responses in real data. + +[id="about-rag-quickstart-technology"] +== About the technology + +The following technologies are used in this solution: + +https://www.redhat.com/en/technologies/cloud-computing/openshift/try-it[{rh-ocp}]:: +An enterprise-ready Kubernetes container platform built for an open hybrid cloud strategy. It provides a consistent application platform to manage hybrid cloud, public cloud, and edge deployments. 
+ +https://www.redhat.com/en/technologies/cloud-computing/openshift/try-it[{rh-gitops}]:: +A declarative application continuous delivery tool for Kubernetes based on the ArgoCD project. Application definitions, configurations, and environments are declarative and version controlled in Git. + +https://www.redhat.com/en/technologies/cloud-computing/openshift/openshift-ai[{rhoai}]:: +A flexible, scalable MLOps platform with tools to build, deploy, and manage AI-enabled applications. This pattern uses {rhoai} to serve the LLM inference endpoint. + +https://github.com/meta-llama/llama-stack[LlamaStack]:: +A standardized framework for building AI applications with Llama models. It provides consistent APIs for model inference, vector storage, safety guardrails, and agentic workflows. + +https://www.postgresql.org/[PostgreSQL] with https://github.com/pgvector/pgvector[PGVector]:: +An open source relational database extended with PGVector for storing and querying vector embeddings used in document retrieval. + +https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2[all-MiniLM-L6-v2]:: +A sentence transformer model used to generate vector embeddings from documents and queries for similarity search. + +https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct[Llama 3.2-3B-Instruct]:: +The default large language model used for generating responses. The pattern also supports Llama 3.1-8B and Llama 3.3-70B-Instruct on GPU-equipped clusters. + +https://huggingface.co/meta-llama/Llama-Guard-3-1B[Llama Guard 3]:: +A safety model that provides content filtering and guardrails to block harmful requests and responses. + +https://www.kubeflow.org/docs/components/pipelines/[Kubeflow Pipelines]:: +A platform for building and deploying ML workflows. This pattern uses Kubeflow Pipelines for document ingestion and embedding. + +https://streamlit.io/[Streamlit]:: +An open source Python framework used to build the RAG chatbot user interface. 
diff --git a/modules/rag-quickstart-architecture.adoc b/modules/rag-quickstart-architecture.adoc new file mode 100644 index 0000000000..4b865ca836 --- /dev/null +++ b/modules/rag-quickstart-architecture.adoc @@ -0,0 +1,131 @@ +:_content-type: CONCEPT +:imagesdir: ../../images +include::comm-attributes.adoc[] + +[id="rag-quickstart-architecture"] += RAG Quickstart architecture + +The following figure provides a high-level overview of the RAG Quickstart architecture. + +.RAG system architecture +image::rag-quickstart/rag-architecture.png[RAG System Architecture,link="/images/rag-quickstart/rag-architecture.png"] + +The architecture consists of two main pipelines: + +* *RAG Pipeline*: Handles user queries and generates responses through LlamaStack APIs, with safety guardrails, model serving, and vector retrieval. +* *Ingestion Pipeline*: Processes documents from multiple sources, generates embeddings, and stores them in the vector database. + +[id="rag-quickstart-rag-pipeline"] +== RAG pipeline + +The RAG pipeline processes user queries through the following components: + +Frontend UI:: +Provides the user interface for submitting queries and viewing responses. The Streamlit-based UI communicates with the LlamaStack APIs by using REST. + +LlamaStack APIs:: +The central orchestration layer that routes queries to the appropriate backend services. LlamaStack provides a standardized interface for model inference, vector retrieval, tool use, and safety guardrails. + +Guard Rails:: +Screens both incoming queries and outgoing responses for harmful content by using Llama Guard. It checks incoming queries for prompt injection, manipulative content, and inappropriate requests, and validates generated responses for harmful content and compliance before returning them to the user. + +Model Servers:: +Serve the LLM for response generation. The pattern supports multiple serving backends, including vLLM on Red Hat OpenShift AI and Ollama for CPU-based deployments. 
The default model is `meta-llama/Llama-3.2-3B-Instruct`. + +Vector DBs:: +Store document embeddings in PostgreSQL with PGVector. When a query arrives, the retriever converts it to a vector embedding and performs a similarity search to find relevant document chunks, which are passed as context to the LLM. + +Tools:: +Provide agent-based capabilities for more complex workflows. When agent-based RAG is enabled, LlamaStack can invoke tools to perform multi-step reasoning and retrieval. + +[id="rag-quickstart-ingestion-pipeline"] +== Ingestion pipeline + +The ingestion pipeline processes documents and updates the knowledge base. Documents can be ingested from three sources: + +S3 Bucket:: +Documents stored in S3-compatible object storage (MinIO) are processed through OpenShift AI Pipelines (Kubeflow) for batch ingestion. + +URL:: +The system downloads documents from web URLs and processes them through a Python script for embedding. + +Uploads:: +Users can upload documents directly through the frontend UI or retriever listener for on-demand ingestion. + +All ingestion paths feed into the Retriever and Embedding Service, which uses Docling libraries to chunk documents into appropriate segments and the all-MiniLM-L6-v2 model to generate vector embeddings. The resulting embeddings are stored in PGVector for retrieval. 
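The chunking step above can be illustrated with a minimal word-window chunker. This is a simplified stand-in for what Docling does (Docling is layout- and structure-aware, not a fixed window); the window and overlap sizes are arbitrary example values:

```python
def chunk_text(text, max_words=50, overlap=10):
    """Split text into overlapping fixed-size word windows.

    Overlap means content near a chunk boundary appears in both
    neighboring chunks, so a query can still match it during retrieval.
    """
    words = text.split()
    if not words:
        return []
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # last window already covers the end of the text
    return chunks

doc = " ".join(f"word{i}" for i in range(120))
chunks = chunk_text(doc, max_words=50, overlap=10)
print(len(chunks))  # 3 windows: words 0-49, 40-89, 80-119
```

Each resulting chunk would then be passed through the embedding model (all-MiniLM-L6-v2 in this pattern) and the vector stored in PGVector alongside the chunk text.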
+ +[id="rag-quickstart-deployment"] +== Deployment architecture + +The following table describes the pod structure when deployed on OpenShift: + +[cols="1,2,3",options="header"] +|=== +| Pod | Purpose | Key characteristics + +| Frontend +| User interface +| Streamlit-based UI, communicates with LlamaStack APIs by using REST + +| LlamaStack +| RAG orchestration +| Central application logic, routes queries to model servers, vector DBs, guard rails, and tools + +| LLM Service +| Language model inference +| Runs vLLM with Llama models, optimized for GPU utilization, deployed by using KServe InferenceService on {rhoai} + +| Guard Rails +| Content moderation +| Runs Llama Guard for input and output safety screening, can be independently scaled + +| Vector Database +| Embedding storage and search +| PostgreSQL with PGVector, requires persistent storage, deployed as StatefulSet + +| Embedding Service +| Vector embeddings +| Generates embeddings for documents and queries using all-MiniLM-L6-v2 + +| Ingestion Pipeline +| Document processing +| Kubeflow Pipelines workflows, uses Docling for document chunking, connected to S3-compatible storage (MinIO) +|=== + +[id="rag-quickstart-technologies"] +== Implementation technologies + +[cols="1,2",options="header"] +|=== +| Component | Technology + +| Application Framework +| LlamaStack + +| LLM Service +| vLLM with meta-llama/Llama-3.2-3B-Instruct + +| Vector Database +| PostgreSQL + PGVector + +| Container Orchestration +| {rh-ocp} + {rhoai} + +| Safety Models +| meta-llama/Llama-Guard-3-1B + +| Embedding Model +| all-MiniLM-L6-v2 + +| Document Processing +| Docling + +| Pipeline Orchestration +| Kubeflow Pipelines + +| Object Storage +| MinIO (S3-compatible) + +| Frontend +| Streamlit +|=== diff --git a/static/images/rag-quickstart/Llama-UI.png b/static/images/rag-quickstart/Llama-UI.png new file mode 100644 index 0000000000..2980222017 Binary files /dev/null and b/static/images/rag-quickstart/Llama-UI.png differ diff --git 
a/static/images/rag-quickstart/jupyter-nb.png b/static/images/rag-quickstart/jupyter-nb.png new file mode 100644 index 0000000000..b4efd2de14 Binary files /dev/null and b/static/images/rag-quickstart/jupyter-nb.png differ diff --git a/static/images/rag-quickstart/kfp-configure.png b/static/images/rag-quickstart/kfp-configure.png new file mode 100644 index 0000000000..e40f351cfe Binary files /dev/null and b/static/images/rag-quickstart/kfp-configure.png differ diff --git a/static/images/rag-quickstart/kfp-logs.png b/static/images/rag-quickstart/kfp-logs.png new file mode 100644 index 0000000000..45d75b4f8c Binary files /dev/null and b/static/images/rag-quickstart/kfp-logs.png differ diff --git a/static/images/rag-quickstart/kfp-pipeline.png b/static/images/rag-quickstart/kfp-pipeline.png new file mode 100644 index 0000000000..31de168adf Binary files /dev/null and b/static/images/rag-quickstart/kfp-pipeline.png differ diff --git a/static/images/rag-quickstart/kfp-run.png b/static/images/rag-quickstart/kfp-run.png new file mode 100644 index 0000000000..489749df2b Binary files /dev/null and b/static/images/rag-quickstart/kfp-run.png differ diff --git a/static/images/rag-quickstart/psql-pgvector-1.png b/static/images/rag-quickstart/psql-pgvector-1.png new file mode 100644 index 0000000000..5964127d0b Binary files /dev/null and b/static/images/rag-quickstart/psql-pgvector-1.png differ diff --git a/static/images/rag-quickstart/rag-architecture.drawio b/static/images/rag-quickstart/rag-architecture.drawio new file mode 100644 index 0000000000..3ae48f6ca0
diff --git a/static/images/rag-quickstart/rag-architecture.png b/static/images/rag-quickstart/rag-architecture.png new file mode 100644 index 0000000000..3154ed9052 Binary files /dev/null and b/static/images/rag-quickstart/rag-architecture.png differ diff --git a/static/images/rag-quickstart/rag-ui-1.png b/static/images/rag-quickstart/rag-ui-1.png new file mode 100644 index 0000000000..d686b6219f Binary files /dev/null and b/static/images/rag-quickstart/rag-ui-1.png differ diff --git a/static/images/rag-quickstart/rag-ui-2.png b/static/images/rag-quickstart/rag-ui-2.png new file mode 100644 index 0000000000..31736974dd Binary files /dev/null and b/static/images/rag-quickstart/rag-ui-2.png differ diff --git a/static/images/rag-quickstart/rag-ui-3.png b/static/images/rag-quickstart/rag-ui-3.png new file mode 100644 index 0000000000..34533822b7 Binary files /dev/null and b/static/images/rag-quickstart/rag-ui-3.png differ diff --git a/static/images/rag-quickstart/rag-ui-4.png b/static/images/rag-quickstart/rag-ui-4.png new file mode 100644 index 0000000000..40efe5b9b7 Binary files /dev/null and b/static/images/rag-quickstart/rag-ui-4.png differ diff --git a/static/images/rag-quickstart/rag-ui-5.png b/static/images/rag-quickstart/rag-ui-5.png new file mode 100644 index 0000000000..709432a629 Binary files /dev/null and b/static/images/rag-quickstart/rag-ui-5.png differ diff --git a/static/images/rag-quickstart/rag-ui-6.png b/static/images/rag-quickstart/rag-ui-6.png new file mode 100644 index 0000000000..627c415a99 Binary files /dev/null and b/static/images/rag-quickstart/rag-ui-6.png differ diff --git a/static/images/rag-quickstart/rhoai-project-1.png b/static/images/rag-quickstart/rhoai-project-1.png new file mode 100644 index 0000000000..22e7a9a284 Binary files /dev/null and b/static/images/rag-quickstart/rhoai-project-1.png differ diff --git 
a/static/images/rag-quickstart/rhoai-project-2.png b/static/images/rag-quickstart/rhoai-project-2.png new file mode 100644 index 0000000000..107e041094 Binary files /dev/null and b/static/images/rag-quickstart/rhoai-project-2.png differ diff --git a/static/images/rag-quickstart/workbench.png b/static/images/rag-quickstart/workbench.png new file mode 100644 index 0000000000..f54d780e21 Binary files /dev/null and b/static/images/rag-quickstart/workbench.png differ