This repository contains the implementation of a comprehensive Rare Beauty Product Data Pipeline and Virtual Product Recommender with Augmented Reality (AR) Try-On features. The solution integrates modern data engineering tools and techniques to provide an end-to-end workflow, from scraping product data to building a semantic product recommendation system and virtual try-on experience.
Live Demo: https://drive.google.com/file/d/1924QjMDTodv-voJRR15mtWnntPlutizF/view?usp=sharing
- Project Overview
- Tech Stack
- Folder Structure
- Pipeline Architecture
- Steps Completed
- Setup & Installation
- How to Use
- Future Work
This project builds an advanced data pipeline and recommendation system for Rare Beauty products. The goal is to provide personalized product recommendations based on reviews, product descriptions, shades, price, popularity, and personal preferences, while also allowing users to visualize how makeup products would look on them using an AR try-on feature.
- Web Scraping: Collects product data, reviews, and variants from Rare Beauty's website.
- Data Warehouse: Integrates with Snowflake for storage, analysis, and transformations.
- Semantic Search: Uses Pinecone and OpenAI embeddings to recommend products based on similarity to user queries.
- AR Try-On: Implements a virtual makeup try-on using MediaPipe FaceMesh for lipstick and eyeshadow shades.
- Streamlit Frontend: Provides an interactive, user-friendly web interface to search products and try on makeup virtually.
- Python: Used for web scraping, data transformations, and API integrations.
- SQL (Snowflake): Data storage and transformations.
- dbt: For managing data transformations and models in Snowflake.
- Pinecone: Vector search for semantic product recommendations.
- OpenAI Embeddings: Used to generate semantic vectors for product and review data.
- Streamlit: Web application framework for interactive product search and AR try-on.
- MediaPipe FaceMesh: Used for the AR makeup try-on feature.
- AWS S3: For storing product data files and ensuring easy access.
📦
├─ .gitignore
├─ README.md
├─ ar-tryon
│ ├─ app.js
│ └─ index.html
├─ frontend
│ └─ app.py
├─ pipelines
│ └─ merge_and_clean.py
├─ rare_beauty_dbt
│ ├─ dbt_project.yml
│ ├─ logs
│ │ └─ dbt.log
│ ├─ rare_beauty
│ │ ├─ .gitignore
│ │ ├─ README.md
│ │ ├─ analyses
│ │ │ └─ .gitkeep
│ │ ├─ macros
│ │ │ └─ .gitkeep
│ │ ├─ models
│ │ │ ├─ int_rare_beauty_products.sql
│ │ │ ├─ sources.yml
│ │ │ ├─ stg_rare_beauty_master_merged.sql
│ │ │ └─ stg_rare_beauty_products.sql
│ │ ├─ seeds
│ │ │ └─ .gitkeep
│ │ ├─ snapshots
│ │ │ └─ .gitkeep
│ │ └─ tests
│ │ └─ .gitkeep
│ └─ target
│ ├─ catalog.json
│ ├─ compiled
│ │ └─ rare_beauty
│ │ └─ rare_beauty
│ │ └─ models
│ │ ├─ int_rare_beauty_products.sql
│ │ ├─ stg_rare_beauty_master_merged.sql
│ │ └─ stg_rare_beauty_products.sql
│ ├─ graph.gpickle
│ ├─ graph_summary.json
│ ├─ index.html
│ ├─ manifest.json
│ ├─ partial_parse.msgpack
│ ├─ run
│ │ └─ rare_beauty
│ │ └─ rare_beauty
│ │ └─ models
│ │ ├─ int_rare_beauty_products.sql
│ │ ├─ stg_rare_beauty_master_merged.sql
│ │ └─ stg_rare_beauty_products.sql
│ ├─ run_results.json
│ └─ semantic_manifest.json
├─ requirements.txt
├─ scraper
│ ├─ merge_master_reviews.py
│ ├─ merge_rare_beauty.py
│ ├─ rare_beauty_bestsellers.py
│ ├─ rare_beauty_product_details.py
│ ├─ rare_beauty_scraper.py
│ ├─ rare_beauty_variant_scraper.py
│ └─ scrape_reviews.py
└─ semantic_search
└─ search.py
-
Web Scraping:
rare_beauty_scraper.py: Scrapes product details like name, category, price, image URL, and more.rare_beauty_product_details.py: Scrapes detailed product information such as descriptions, ingredients, and shades.rare_beauty_variants.py: Fetches variant details from Shopify’s product JSON endpoints.- Output: Data is stored in CSV files and merged into a
rare_beauty_master.csv.
-
Data Integration:
- Snowflake Setup: Data is ingested into Snowflake from an S3 bucket and transformed using dbt.
- Transformations: Cleaned and enriched data is transformed into views and tables in Snowflake for further analysis.
-
Recommendation System:
- Pinecone Setup: Semantic search capabilities are implemented using Pinecone to index and query product vectors.
- OpenAI Embeddings: Product names, descriptions, and reviews are converted into vectors for recommendation.
-
Augmented Reality Try-On:
- FaceMesh Integration: Using MediaPipe’s FaceMesh, users can visualize how lipstick and eyeshadow shades look on their faces in real-time.
-
Frontend:
- Streamlit Web App: The frontend allows users to input queries, filter by product category or price range, and try on makeup virtually.
-
Scraping:
- Built scrapers to collect product details, variants, and reviews from Rare Beauty’s website.
-
Data Storage:
- Integrated AWS S3 for data storage and Snowflake for data warehousing.
-
Data Transformation:
- Used dbt to create a streamlined Snowflake data model for product and review data.
-
Recommender System:
- Implemented a semantic search using Pinecone and OpenAI embeddings for product recommendations.
-
AR Try-On:
- Developed an AR feature for virtual makeup try-on using MediaPipe’s FaceMesh.
-
Frontend:
- Built a user-friendly interface with Streamlit to interact with the recommender and AR try-on system.
- Python 3.x
- Snowflake Account
- Pinecone Account
- OpenAI API Key
- AWS S3 Bucket
-
Clone this repository:
git clone https://github.com/yourusername/rare-beauty-recommender.git cd rare-beauty-recommender -
Install Python dependencies:
pip install -r requirements.txt
Set up your Snowflake, Pinecone, and OpenAI API credentials in the appropriate configuration files (e.g.,
secrets.py,.env). -
Run the scrapers to collect the product data:
python rare_beauty_scraper.py python rare_beauty_product_details.py python rare_beauty_variants.py
-
Load the data into Snowflake and run dbt transformations:
dbt run
-
Run the Streamlit web app:
streamlit run app.py
- Product Search: Use the Streamlit interface to search for Rare Beauty products based on name, category, or price range.
- Try-On Makeup: Use your webcam to try on lipstick or eyeshadow shades from Rare Beauty’s product line.
- Semantic Recommendations: Enter a query and get personalized product recommendations based on semantic search powered by Pinecone and OpenAI embeddings.
