
shrey-soni/ai-job-scraper


AI Job Search Agent 🤖

An automated job search agent for Senior Frontend Engineers. It scrapes job listings from 6 platforms, scores each one with Google's Gemma 3 model, and sends only strong matches (score > 75) to you on Telegram.

It runs on a 6-hour schedule. There's no UI and no auto-applying: just filtered, high-signal matches sent straight to your phone.


Supported Job Boards

  1. RemoteOK (remoteok.com)
  2. Indeed (in.indeed.com)
  3. LinkedIn (linkedin.com/jobs)
  4. Instahyre (instahyre.com)
  5. Wellfound / AngelList (wellfound.com)
  6. YCombinator (ycombinator.com/jobs)

Note: Headless scraping of LinkedIn, Indeed, and Wellfound can trigger bot-blocking, in which case that platform returns 0 jobs. The orchestrator catches these errors and continues scraping the other platforms.
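The per-platform isolation can be pictured as a sequential loop with one `try/catch` per source. This is a minimal sketch, not the code in `scraper.js`: it assumes each module in `src/sources/` exposes an async function that resolves to an array of jobs, and the `{ name, scrape }` shape and the `scrapeAll` name are hypothetical.

```javascript
// Hypothetical runner: `sources` is an array of { name, scrape } objects,
// where scrape() is an async function resolving to an array of job objects.
async function scrapeAll(sources) {
  const jobs = [];
  const failures = [];

  // Scrapers run one after another, never in parallel.
  for (const { name, scrape } of sources) {
    try {
      const found = await scrape();
      jobs.push(...found);
      console.log(`${name}: ${found.length} jobs`);
    } catch (err) {
      // A blocked or broken platform is logged and skipped; the loop continues.
      console.warn(`${name} failed: ${err.message}`);
      failures.push(name);
    }
  }

  return { jobs, failures };
}
```

Catching inside the loop, rather than around it, means one blocked board costs only its own results for that run.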


Prerequisites

  • Node.js and npm
  • A Gemini API key
  • A Telegram bot token (created via @BotFather)


Setup

1. Clone and install dependencies

git clone https://github.com/shrey-soni/ai-job-scraper.git
cd ai-job-scraper
npm install

2. Install the Playwright Chromium browser

npm run install:browsers

3. Configure environment variables

cp .env.example .env

Open .env and fill in your keys:

GEMINI_API_KEY=your_gemini_api_key_here
TELEGRAM_TOKEN=your_telegram_bot_token_here
CHAT_ID=your_telegram_chat_id_here
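A missing key is easier to diagnose at startup than mid-run. Below is a minimal sketch of such a check. The `loadConfig` name is hypothetical, and it is an assumption that the project loads `.env` through a package such as `dotenv` (`require("dotenv").config()`); the README does not say. The function takes an env object (defaulting to `process.env`) so it can be exercised without a real `.env` file.

```javascript
// The three variables the README asks you to fill in.
const REQUIRED_VARS = ["GEMINI_API_KEY", "TELEGRAM_TOKEN", "CHAT_ID"];

// Hypothetical helper: validate and return the config, or throw listing what is missing.
function loadConfig(env = process.env) {
  // Treat unset and whitespace-only values as missing.
  const missing = REQUIRED_VARS.filter((key) => !env[key] || !env[key].trim());
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return {
    geminiApiKey: env.GEMINI_API_KEY,
    telegramToken: env.TELEGRAM_TOKEN,
    chatId: env.CHAT_ID,
  };
}
```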

How to get your Telegram Chat ID

  1. Start your bot on Telegram (send it any message)
  2. Open this URL in your browser, replacing <YOUR_TOKEN> with your bot token:
     https://api.telegram.org/bot<YOUR_TOKEN>/getUpdates
  3. In the JSON response, look for "chat": { "id": <THIS_IS_YOUR_CHAT_ID> }
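If the response is long, the ID can also be pulled out programmatically. The Telegram Bot API returns `{ ok: true, result: [...] }` from `getUpdates`, and each plain-message update carries `message.chat.id`. The helper below is a hypothetical sketch and is not part of the repository. It ignores other update types such as edited messages or channel posts.

```javascript
// Hypothetical helper: collect the unique chat IDs from a getUpdates response body.
function extractChatIds(body) {
  if (!body || body.ok !== true || !Array.isArray(body.result)) {
    throw new Error("Unexpected getUpdates response");
  }
  const ids = new Set();
  for (const update of body.result) {
    // Only plain `message` updates are inspected here.
    const chat = update.message && update.message.chat;
    if (chat && chat.id !== undefined) ids.add(chat.id);
  }
  return [...ids];
}
```

On Node 18 or later, which ships a global `fetch`, you could call it as `extractChatIds(await (await fetch(url)).json())` with the `getUpdates` URL above.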

Run

npm start

The agent will:

  1. Sequentially scrape all 6 configured platforms.
  2. Evaluate new jobs using gemma-3-4b-it (via the Gemini API endpoint).
  3. Send Telegram notifications for all matches (score > 75).
  4. Run again automatically every 6 hours.
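One cycle of the steps above can be sketched as scrape, drop already-seen links, evaluate, store, and notify. This is an illustration under assumptions, not `index.js` itself. Every injected helper (`scrape`, `isSeen`, `evaluate`, `markSeen`, `notify`) is hypothetical, and throttling of LLM calls is left out to keep the flow visible.

```javascript
const MATCH_THRESHOLD = 75; // only scores strictly above this are sent

// Hypothetical orchestration of a single run; all dependencies are injected.
async function runCycle({ scrape, isSeen, evaluate, markSeen, notify }) {
  const jobs = await scrape();

  // Skip links already stored in the DB, and duplicates within this batch.
  const seenThisRun = new Set();
  const fresh = jobs.filter((job) => {
    if (isSeen(job.link) || seenThisRun.has(job.link)) return false;
    seenThisRun.add(job.link);
    return true;
  });

  let sent = 0;
  for (const job of fresh) {
    const result = await evaluate(job);
    markSeen(job, result); // store every evaluated job, not only matches
    if (result.score > MATCH_THRESHOLD) {
      await notify(job, result);
      sent += 1;
    }
  }
  return { scraped: jobs.length, evaluated: fresh.length, sent };
}
```

The README only says `index.js` contains a "cron scheduler". A standard cron expression for "every 6 hours" is `0 */6 * * *`. A plain `setInterval(..., 6 * 60 * 60 * 1000)` would be a simpler alternative.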

Example Telegram Message

🔥 Senior Frontend Engineer
🏢 Acme Corp
⭐ Score: 88

Strong React/Next.js role with full remote work. Senior-level signals throughout.

🚩 _None_

Apply → https://remoteok.com/jobs/12345
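A message in this layout could be built as shown below. The field names (`title`, `company`, `link`, `score`, `summary`, `redFlags`) are hypothetical and are not taken from `notifier.js`. The `_None_` in the example suggests the message is sent with a Markdown parse mode, but that is an inference. In that mode, job text containing `_` or `*` would need escaping before it is sent through the Bot API's `sendMessage` method.

```javascript
// Hypothetical formatter mirroring the example message above.
function formatMessage(job, evaluation) {
  const flags = evaluation.redFlags && evaluation.redFlags.length > 0
    ? evaluation.redFlags.join(", ")
    : "_None_"; // underscores render as italics in Telegram's Markdown modes
  return [
    `🔥 ${job.title}`,
    `🏢 ${job.company}`,
    `⭐ Score: ${evaluation.score}`,
    "",
    evaluation.summary,
    "",
    `🚩 ${flags}`,
    "",
    `Apply → ${job.link}`,
  ].join("\n");
}
```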

Project Structure

ai-job-scraper/
├── src/
│   ├── index.js             # Orchestrator + cron scheduler
│   ├── scraper.js           # Multi-source scraper runner
│   ├── filter.js            # Gemma 3 AI evaluation + retry logic
│   ├── notifier.js          # Telegram bot notifications
│   ├── db.js                # SQLite storage (duplicate prevention)
│   └── sources/             # Playwright scrapers for each platform
│       ├── indeed.js
│       ├── instahyre.js
│       ├── linkedin.js
│       ├── remoteok.js
│       ├── wellfound.js
│       └── ycombinator.js
├── jobs.db                  # Auto-created SQLite database
├── .env                     # Your secrets (never commit this)
├── .env.example             # Template
└── package.json

How the AI Scoring Works

Each job is evaluated by Google's gemma-3-4b-it against these criteria:

| Criterion | Effect |
|---|---|
| React / Next.js / TypeScript focus | ✅ Boost |
| Senior-level signals | ✅ Boost |
| Job posted within the last 7 days | ✅ Boost |
| Backend-heavy (Go, Python infra, DevOps only) | ❌ Penalty |

Only jobs with a score above 75 are sent to Telegram. Scores are integers from 0 to 100.
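The criteria above are applied by the model. The code then only has to turn the model's reply into a number and compare it with the threshold. The sketch below assumes the prompt asks `gemma-3-4b-it` to reply with JSON such as `{"score": 88, "reason": "..."}`. That reply format is a hypothetical, not confirmed by `filter.js`. Small models often wrap JSON in extra text, so the parser takes the span from the first `{` to the last `}`.

```javascript
const MATCH_THRESHOLD = 75;

// Hypothetical parser: extract the score and enforce the 0-100 integer contract.
function parseScore(replyText) {
  const match = replyText.match(/\{[\s\S]*\}/); // first "{" through last "}"
  if (!match) throw new Error("No JSON object in model reply");
  const parsed = JSON.parse(match[0]);
  const score = Number(parsed.score);
  if (!Number.isInteger(score) || score < 0 || score > 100) {
    throw new Error(`Invalid score: ${parsed.score}`);
  }
  return score;
}

// Strictly greater than 75: a score of exactly 75 is not sent.
const isMatch = (score) => score > MATCH_THRESHOLD;
```

Rejecting out-of-range or non-integer scores, rather than clamping them, surfaces prompt or model drift instead of silently sending borderline jobs.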


Notes

  • Processed job links are stored in jobs.db, so the agent won't notify you about the same job link twice.
  • Every evaluated job is saved to the DB (not just matches), so re-runs skip listings that have already been scored.
  • LLM API calls are throttled with a 5-second delay between jobs, plus exponential backoff on failures, to stay within free-tier rate limits.
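The throttling described in the last note combines a fixed gap between jobs with retries whose wait doubles after each failure. The sketch below illustrates that pattern and is not the code in `filter.js`. The 5-second gap comes from the note. The retry count (3) and base delay (2 s) are assumptions, and `withBackoff` and `evaluateAll` are hypothetical names.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Retry `fn` up to `retries` extra times, doubling the wait after each failure.
async function withBackoff(fn, { retries = 3, baseDelayMs = 2000 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= retries) throw err; // out of retries: surface the error
      const delay = baseDelayMs * 2 ** attempt; // 2 s, 4 s, 8 s with the defaults
      console.warn(`Attempt ${attempt + 1} failed (${err.message}); retrying in ${delay} ms`);
      await sleep(delay);
    }
  }
}

// Evaluate jobs one at a time with a fixed gap between them.
async function evaluateAll(jobs, evaluate, { gapMs = 5000, ...backoff } = {}) {
  const results = [];
  for (const [i, job] of jobs.entries()) {
    if (i > 0) await sleep(gapMs); // fixed pause between consecutive jobs
    results.push(await withBackoff(() => evaluate(job), backoff));
  }
  return results;
}
```

The fixed gap keeps the steady-state request rate low. The backoff handles bursts of rate-limit errors without hammering the API.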

About

AI-powered job scraper.
