Forage AI Careers!
Junior Python Developer
Technology
Full Time
Remote
In this role, you’ll work with a passionate and talented team of engineers and data scientists operating at the bleeding edge of data science and data automation.
Who are we?
As the age of AI dawns, nothing is more crucial than the underlying data that provides the basis for intelligent and powerful software. With this in mind, Forage AI was born. We are a data automation and data science pioneer built to help democratize data. Our mission is to create powerful data assets on the fly – and we believe that by accomplishing this over and over again, across different industries and varied use cases, we can achieve incredible feats.

Our suite of services includes the extraction of unstructured data from websites and documents – with a particular emphasis on extremely broad, generic, wide-scale data collection – and the subsequent processing and structuring of that data using best-in-class ML/NLP approaches, alongside passionate and deeply committed research teams, to create top-tier datasets. Our core belief in the value of data perfection drives an extreme commitment to data precision and accuracy, setting us apart from the rest and leading to remarkable outcomes.
Here’s what you’ll do:
Our web crawling team is unique in the industry – while we maintain many “single-site” crawlers, our core proposition and technical efforts are geared towards building “generic” bots that can crawl and parse data from thousands of websites and documents, all using the same code. This requires a whole different level of thinking, planning, and coding. You will:
- Build, improve, and run our generic robots to extract data from both the web and documents – handling critical information across a wide variety of structures and formats without error.
- Craft highly scalable solutions to revolutionize our web crawling strategies.
- Derive common patterns from semi-structured data, build code to handle them, and deal gracefully with the exceptions.
- Take responsibility for the live execution of our robots – managing turnaround times, exceptions, QA, and delivery – and build bleeding-edge infrastructure to handle volume and scope.
- Own end-to-end project automation using Python.
Requirements:
- Bachelor’s degree in Computer Science/Information Technology Engineering is preferred.
- 2-3 years of experience in web crawling using Python.
- Must have expertise in scraping social media websites and a strong understanding of how to overcome complex anti-crawling measures.
- Must have hands-on experience with Python libraries such as Requests, Scrapy, Pandas, Urllib, or BeautifulSoup (BS4).
- Experience with API development is an added advantage.
- Must have experience working with at least one standard RDBMS (PostgreSQL, SQL Server, etc.).
- Must have knowledge of and exposure to AWS, Docker, and Lambda.
- Must have created and run fully automated end-to-end project pipelines using Python.
- Experience with web-based automation tools (Selenium, Puppeteer, Mechanize, Render) is an added advantage.
Other Infrastructure Requirements:
Since this is a fully work-from-home position, you will also need the following:
- High-speed internet connectivity for video calls and efficient work.
- A capable business-grade computer (e.g., modern processor, 8 GB+ of RAM, and no other obstacles to uninterrupted, efficient work).
- Headphones with clear audio quality.
- A stable power connection, with backups in case of internet/power failure.
Apply Now