Forage AI Careers!
Software Engineer
Technology
Full Time
Remote
About Forage AI
Forage AI builds next-generation systems for large-scale data collection and processing — including web crawling, document parsing, enrichment pipelines, and automation.
We primarily work in Python, design cloud-native systems (AWS-first, with exposure to GCP/Azure), and increasingly integrate GenAI and agent-based workflows into our stack.
Our engineering culture emphasizes ownership, clarity, and reliability. Every developer owns their module end-to-end and collaborates closely in a high-trust, high-impact environment.
Role Overview
This role combines advanced web scraping with GenAI/LLM-driven capabilities to build intelligent, scalable data collection systems and pipelines. You will design and operate generic, reusable scrapers, and use GenAI and AI agents to enhance extraction, enrichment, validation, and automation workflows.
Beyond building crawlers, you will use GenAI to help create stable, production-grade data products and services aligned with real-world business use cases, transforming raw web and document data into reliable, market-ready solutions. The work sits at the intersection of deep technical problem-solving and market orientation in a fast-moving engineering environment.
The role is highly hands-on and emphasizes clean design, reliability, scalability, and impact, with strong ownership from design through deployment and operation.
Key Responsibilities
-
Develop and maintain Python-based systems for large-scale crawling, parsing, enrichment, and processing of structured and unstructured data.
-
Build generic, reusable crawlers capable of extracting data from thousands of websites and documents using shared, configurable codebases.
-
Design and implement GenAI-assisted data extraction and enrichment workflows, including:
– Using LLMs to interpret semi-structured or unstructured content (HTML, PDFs, text-heavy pages).
– Applying prompt-driven logic for classification, normalization, entity extraction, and validation.
-
Implement RAG (Retrieval-Augmented Generation) patterns by combining crawled data with vector databases to improve accuracy, consistency, and explainability.
-
Integrate AI agents into data pipelines to autonomously navigate websites, understand page context, select the correct interaction paths, and extract high-value data from dynamic or evolving layouts.
-
Handle complex anti-crawling challenges, including IP rotation, retries, throttling, headers, fingerprinting, and bot-detection mechanisms.
-
Derive common patterns from semi-structured data, build resilient parsing logic, and gracefully manage edge cases and failures.
-
Build and operate end-to-end automated pipelines (crawl → process → enrich → validate → store → deliver), including AI-powered enrichment stages.
-
Design and maintain ETL/ELT workflows with strong validation, monitoring, error-handling, and auditability.
-
Work with SQL, NoSQL, and vector databases, contributing to data modeling, storage, and retrieval strategies.
-
Implement and consume APIs and microservices, including services that expose AI-powered enrichment or extraction capabilities.
-
Contribute to cloud-native system design on AWS (S3, Lambda, ECS/EKS, SQS/SNS, RDS/DynamoDB, CloudWatch).
-
Own live execution of crawlers and pipelines, managing turnaround times, exceptions, QA checks, and delivery SLAs.
-
Write unit and integration tests, debug production issues, profile performance, and participate in code reviews.
-
Implement observability (logging, metrics, tracing) and follow security best practices (secrets management, IAM, least privilege).
-
Collaborate closely with Dev, QA, and Ops teams; ship incrementally using small PRs, design docs, and measurable outcomes.
Required Qualifications
-
3–4 years of professional experience as a Software Engineer.
-
Strong proficiency in Python, with a solid understanding of data structures, algorithms, and clean software design.
-
Hands-on experience with web crawling and scraping, including:
– Requests, Scrapy, BeautifulSoup (BS4), pandas, urllib
– Selenium / Playwright (or similar browser automation tools)
-
Proven experience scraping large-scale or complex websites, including social media platforms.
-
Strong understanding of anti-bot measures and resilient crawling strategies.
-
Working knowledge of SQL and experience with at least one RDBMS (PostgreSQL, SQL Server, etc.).
-
Exposure to AWS services and cloud-native concepts.
-
Comfortable working on Linux and using Git for version control.
-
Practical understanding of system design and distributed systems basics.
Preferred / Good to Have (Prioritized)
-
Containers & CI/CD:
– Docker, GitHub Actions / Jenkins
– Basic exposure to Kubernetes
-
Data Infrastructure:
– Airflow, Spark, Kafka, or large-scale ETL systems
-
Infrastructure as Code:
– Terraform or CloudFormation
– Basic cloud cost and performance optimization
-
Frontend / JavaScript:
– Basic familiarity is a nice-to-have
-
Exposure to GCP or Azure
How We Work
-
Strong ownership: design → build → deploy → operate.
-
Pragmatic engineering with small PRs and incremental delivery.
-
Emphasis on clear communication, documentation, and reliability.
-
Engineering decisions guided by scale, cost, and long-term maintainability.
Work-from-Home Requirements
-
Reliable high-speed internet for calls and collaboration.
-
A capable computer (modern CPU, at least 8 GB of RAM).
-
Headphones with clear audio quality.
-
Stable power and backup arrangements.
Forage AI is an equal-opportunity employer.
We value curiosity, craftsmanship, and collaboration, and we look for engineers who enjoy solving hard problems at scale.