Forage AI Careers!
Web Crawling Engineer
Technology
Full Time
Remote
We are seeking an experienced Web Crawling Engineer to design, build, and maintain robust data extraction systems at scale. You’ll work on developing sophisticated web scraping infrastructure that handles high-volume data collection while ensuring reliability, efficiency, and compliance.
Requirements:
Experience:
3+ years of professional experience in web scraping and data extraction
Technical Skills:
- Strong proficiency in Python with extensive experience in web scraping frameworks (Scrapy, BeautifulSoup, Selenium, or similar)
- Deep understanding of HTML, CSS, JavaScript, and DOM manipulation for effective data extraction
- Hands-on experience with PostgreSQL for data storage and management
- Proficiency with Redis for caching, queue management, and session handling
- Experience with RabbitMQ for distributed task management and message queuing
- Solid knowledge of AWS EC2 for deploying and managing crawling infrastructure
- Proven experience implementing and managing residential and rotating proxy solutions to handle rate limiting and geo-restrictions
- Understanding of anti-bot mechanisms and techniques to work within website terms of service
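To give a concrete sense of the proxy-rotation skill listed above, here is a minimal stdlib-only sketch of round-robin rotation. It is illustrative only, not Forage AI's implementation: the `PROXIES` addresses and the `next_opener` helper are hypothetical, and production systems would typically pull from a residential proxy provider and add health checks and per-proxy rate limits.

```python
import itertools
import urllib.request

# Hypothetical proxy pool; in practice this comes from a proxy provider's API.
PROXIES = [
    "http://proxy-a.example.com:8080",
    "http://proxy-b.example.com:8080",
    "http://proxy-c.example.com:8080",
]

# Round-robin rotation: each new request goes out through the next proxy.
_proxy_cycle = itertools.cycle(PROXIES)

def next_opener():
    """Return (proxy, opener) where the opener routes traffic via that proxy."""
    proxy = next(_proxy_cycle)
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    return proxy, urllib.request.build_opener(handler)

# Rotation check (no network traffic): three calls walk the whole pool once.
used = [next_opener()[0] for _ in range(3)]
```

The opener returned by `next_opener()` would then be used for the actual request; rotating per request spreads load across exit IPs and softens per-IP rate limits.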
Core Competencies:
- Ability to analyze website structures and develop efficient extraction strategies
- Experience handling dynamic content, AJAX requests, and JavaScript-rendered pages
- Strong debugging skills for troubleshooting scraping issues and proxy failures
- Knowledge of data quality validation and cleaning techniques
- Understanding of ethical scraping practices and robots.txt compliance
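The robots.txt compliance point above is straightforward to demonstrate with Python's standard library. In this sketch the robots.txt content, user-agent string, and URLs are made up for illustration:

```python
import urllib.robotparser

# Hypothetical site policy, as served from https://example.com/robots.txt
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# Check each path before scheduling it for crawling.
allowed = rp.can_fetch("my-crawler", "https://example.com/products")
blocked = rp.can_fetch("my-crawler", "https://example.com/private/data")
```

A compliant crawler checks `can_fetch` before every request and honors `crawl_delay` (here, 5 seconds between requests) when the site declares one.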
Responsibilities:
- Design and implement scalable web crawling systems using Python-based frameworks
- Develop and maintain distributed scraping pipelines using RabbitMQ for task distribution
- Manage proxy rotation strategies to ensure uninterrupted data collection
- Optimize crawler performance and resource utilization on AWS EC2 instances
- Implement data storage solutions using PostgreSQL and caching layers with Redis
- Monitor crawler health, handle errors, and implement retry mechanisms
- Ensure data quality through validation and normalization processes
- Collaborate with data engineering and analytics teams to meet data requirements
- Stay updated on changes to target websites and adapt scrapers accordingly
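The error-handling and retry responsibility above often takes the form of an exponential-backoff wrapper around fetch calls. A minimal sketch, with illustrative names (`fetch_with_retries` and `flaky_fetch` are hypothetical, not part of any actual codebase):

```python
import time

def fetch_with_retries(fetch, url, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url), retrying failures with exponential backoff.

    The sleep function is injectable so tests can record delays
    instead of actually waiting.
    """
    for attempt in range(max_attempts):
        try:
            return fetch(url)
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of retries: surface the error for monitoring
            sleep(base_delay * (2 ** attempt))  # waits 1s, 2s, 4s, ...

# Demo with a simulated flaky fetcher that fails twice, then succeeds.
calls = {"n": 0}
def flaky_fetch(url):
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("simulated timeout")
    return f"<html>payload from {url}</html>"

delays = []
result = fetch_with_retries(flaky_fetch, "https://example.com", sleep=delays.append)
```

Doubling the delay between attempts gives a struggling target site (or an exhausted proxy) time to recover, while the final re-raise lets monitoring catch persistent failures.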
Other Infrastructure Requirements:
Since this is a fully remote, work-from-home position, you will also need the following:
- High-speed internet connectivity for video calls and efficient work
- A capable business-grade computer (e.g., a modern processor and 8 GB+ of RAM)
- A dedicated workspace at home for uninterrupted, efficient work
- Headphones with clear audio quality
- A stable power connection, with backups in case of internet or power failure
- A Windows machine is preferred
Nice to Have:
- Experience with containerization (Docker) and orchestration tools
- Knowledge of additional AWS services (S3, Lambda, SQS)
- Familiarity with API development and reverse engineering
- Experience with cloud-based scraping services or platforms
- Understanding of legal and ethical considerations in web scraping
Apply Now