Forage AI Careers!
Web Crawling Engineer
Technology
Full Time
Remote
We are seeking an experienced Web Crawling Engineer to design, build, and maintain robust data extraction systems at scale. You’ll work on developing sophisticated web scraping infrastructure that handles high-volume data collection while ensuring reliability, efficiency, and compliance.
Requirements:
Experience:
3+ years of professional experience in web scraping and data extraction
Technical Skills:
- Strong proficiency in Python with extensive experience in web scraping frameworks (Scrapy, BeautifulSoup, Selenium, or similar)
- Deep understanding of HTML, CSS, JavaScript, and DOM manipulation for effective data extraction
- Hands-on experience with PostgreSQL for data storage and management
- Proficiency with Redis for caching, queue management, and session handling
- Experience with RabbitMQ for distributed task management and message queuing
- Solid knowledge of AWS EC2 for deploying and managing crawling infrastructure
- Proven experience implementing and managing residential and rotating proxy solutions to handle rate limiting and geo-restrictions
- Understanding of anti-bot mechanisms and techniques to work within website terms of service
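To give a concrete sense of the proxy-rotation skill listed above, here is a minimal stdlib-only sketch of round-robin rotation. It is illustrative only, not Forage AI's implementation: the `PROXIES` addresses and the `next_opener` helper are hypothetical, and production systems would typically pull from a residential proxy provider and add health checks and per-proxy rate limits.

```python
import itertools
import urllib.request

# Hypothetical proxy pool; in practice this comes from a proxy provider's API.
PROXIES = [
    "http://proxy-a.example.com:8080",
    "http://proxy-b.example.com:8080",
    "http://proxy-c.example.com:8080",
]

# Round-robin rotation: each new request goes out through the next proxy.
_proxy_cycle = itertools.cycle(PROXIES)

def next_opener():
    """Return (proxy, opener) where the opener routes traffic via that proxy."""
    proxy = next(_proxy_cycle)
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    return proxy, urllib.request.build_opener(handler)

# Rotation check (no network traffic): three calls walk the whole pool once.
used = [next_opener()[0] for _ in range(3)]
```

The opener returned by `next_opener()` would then be used for the actual request; rotating per request spreads load across exit IPs and softens per-IP rate limits.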
Core Competencies:
- Ability to analyze website structures and develop efficient extraction strategies
- Experience handling dynamic content, AJAX requests, and JavaScript-rendered pages
- Strong debugging skills for troubleshooting scraping issues and proxy failures
- Knowledge of data quality validation and cleaning techniques
- Understanding of ethical scraping practices and robots.txt compliance
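The robots.txt compliance point above is straightforward to demonstrate with Python's standard library. In this sketch the robots.txt content, user-agent string, and URLs are made up for illustration:

```python
import urllib.robotparser

# Hypothetical site policy, as served from https://example.com/robots.txt
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# Check each path before scheduling it for crawling.
allowed = rp.can_fetch("my-crawler", "https://example.com/products")
blocked = rp.can_fetch("my-crawler", "https://example.com/private/data")
```

A compliant crawler checks `can_fetch` before every request and honors `crawl_delay` (here, 5 seconds between requests) when the site declares one.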
Responsibilities:
- Design and implement scalable web crawling systems using Python-based frameworks
- Develop and maintain distributed scraping pipelines using RabbitMQ for task distribution
- Manage proxy rotation strategies to ensure uninterrupted data collection
- Optimize crawler performance and resource utilization on AWS EC2 instances
- Implement data storage solutions using PostgreSQL and caching layers with Redis
- Monitor crawler health, handle errors, and implement retry mechanisms
- Ensure data quality through validation and normalization processes
- Collaborate with data engineering and analytics teams to meet data requirements
- Stay updated on changes to target websites and adapt scrapers accordingly
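The error-handling and retry responsibility above often takes the form of an exponential-backoff wrapper around fetch calls. A minimal sketch, with illustrative names (`fetch_with_retries` and `flaky_fetch` are hypothetical, not part of any actual codebase):

```python
import time

def fetch_with_retries(fetch, url, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url), retrying failures with exponential backoff.

    The sleep function is injectable so tests can record delays
    instead of actually waiting.
    """
    for attempt in range(max_attempts):
        try:
            return fetch(url)
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of retries: surface the error for monitoring
            sleep(base_delay * (2 ** attempt))  # waits 1s, 2s, 4s, ...

# Demo with a simulated flaky fetcher that fails twice, then succeeds.
calls = {"n": 0}
def flaky_fetch(url):
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("simulated timeout")
    return f"<html>payload from {url}</html>"

delays = []
result = fetch_with_retries(flaky_fetch, "https://example.com", sleep=delays.append)
```

Doubling the delay between attempts gives a struggling target site (or an exhausted proxy) time to recover, while the final re-raise lets monitoring catch persistent failures.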
Other Infrastructure Requirements:
Since this is a fully remote, work-from-home position, you will also need the following:
- High-speed internet connectivity for video calls and efficient work
- A capable business-grade computer (e.g., a modern processor and 8 GB+ of RAM)
- A dedicated workspace at home for uninterrupted, efficient work
- Headphones with clear audio quality
- A stable power connection, with backups in case of internet or power failure
- A Windows machine is preferred
Nice to Have:
- Experience with containerization (Docker) and orchestration tools
- Knowledge of additional AWS services (S3, Lambda, SQS)
- Familiarity with API development and reverse engineering
- Experience with cloud-based scraping services or platforms
- Understanding of legal and ethical considerations in web scraping
Apply Now