Founding Data Engineer
Progressive Data Jobs
Position Summary
Airys is an AI-powered platform helping nonprofits and community organizations unlock climate resilience data and funding opportunities that are often buried in fragmented municipal systems. Our mission is to make resilience planning and investment transparent, accessible, and actionable, so nonprofits can better advocate for vulnerable communities, guide equitable infrastructure decisions, and secure critical resources.
At the same time, Airys provides the same structured data to the private sector—commercial real estate, insurers, and infrastructure investors—who need clear visibility into how cities are preparing for climate risks. By connecting nonprofit and community priorities with private-sector diligence needs, we create a shared data backbone that drives investment toward projects that strengthen resilience for all.
We are building the first comprehensive database and AI-powered platform that aggregates climate resilience insights from planning documents across all US states. The platform will enable investment professionals to quickly research climate adaptation efforts during due diligence.
- Trial: 1-3 weeks, 20 hrs/week starting immediately
- $40-$62.50/hr based on experience
- Remote, or hybrid in the SF Bay Area
- Potential founding team role based on performance/mutual fit
What You’ll Build
Transform our scrappy prototype into a scalable system that processes tens of thousands of municipal documents nationwide, extracts structured project data using AI, and provides fast location-based search for climate adaptation projects.
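To make the shape of that system concrete, here is a minimal, hypothetical sketch of its core loop: extract structured project records from documents and serve location-based lookups. All class and function names are illustrative, not from our codebase; production storage would use PostgreSQL/PostGIS rather than an in-memory list.

```python
from dataclasses import dataclass, field

@dataclass
class Project:
    """A structured record extracted from a municipal planning document."""
    name: str
    city: str
    lat: float
    lon: float

@dataclass
class ProjectIndex:
    """Toy in-memory index; production would use PostGIS spatial queries."""
    projects: list = field(default_factory=list)

    def add(self, project: Project) -> None:
        self.projects.append(project)

    def near(self, lat: float, lon: float, radius_deg: float) -> list:
        # Naive bounding-box filter; a real system would use spatial indexes.
        return [p for p in self.projects
                if abs(p.lat - lat) <= radius_deg and abs(p.lon - lon) <= radius_deg]

index = ProjectIndex()
index.add(Project("Seawall Upgrade", "San Francisco", 37.77, -122.42))
index.add(Project("Stormwater Retrofit", "Sacramento", 38.58, -121.49))
hits = index.near(37.8, -122.4, radius_deg=0.5)
print([p.name for p in hits])  # ['Seawall Upgrade']
```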
Key Responsibilities
Scale the Architecture
- Migrate prototype to production cloud infrastructure (AWS/GCP)
- Build distributed systems for parallel document processing and web scraping
- Design scalable databases (relational + vector) with cost/performance optimization
Production Data Pipeline
- Create robust ETL with error handling, monitoring, and automated retries
- Implement accurate geocoding across inconsistent municipal address formats
- Standardize data validation across diverse state/municipal document types
- Build RESTful APIs and efficient search functionality
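As one example of the "error handling and automated retries" above, a sketch of a retry wrapper with exponential backoff (names are hypothetical; a production pipeline would also log each failure and alert when retries are exhausted):

```python
import time

def with_retries(fn, attempts=3, base_delay=0.01):
    """Run fn, retrying transient failures with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # exhausted: surface the error to monitoring
            time.sleep(base_delay * (2 ** attempt))

# Simulate a flaky extraction step that succeeds on the third try.
calls = {"n": 0}
def flaky_extract():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient fetch failure")
    return {"project": "levee repair"}

print(with_retries(flaky_extract))  # {'project': 'levee repair'}
```

In practice an orchestrator such as Airflow provides the same behavior declaratively via per-task retry settings.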
AI-Powered Extraction
- Scale LLM-based PDF processing while managing API costs
- Implement semantic search across infrastructure project databases
- Enhance AI extraction accuracy from unstructured municipal documents
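To illustrate the semantic-search piece: ranking documents by cosine similarity between embedding vectors. The 3-d vectors below are toy stand-ins for real model embeddings (e.g. from an embedding API, stored in Pinecone or pgvector); the function names are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def semantic_search(query_vec, corpus, top_k=1):
    """Rank document ids by embedding similarity to the query vector."""
    ranked = sorted(corpus.items(), key=lambda kv: cosine(query_vec, kv[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:top_k]]

corpus = {
    "flood-mitigation-plan": [0.9, 0.1, 0.0],
    "parks-budget": [0.1, 0.9, 0.2],
}
print(semantic_search([0.8, 0.2, 0.1], corpus))  # ['flood-mitigation-plan']
```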
Technical Requirements
- Languages: Python, TypeScript, SQL
- Cloud Platforms: AWS, GCP, or Azure with distributed systems experience
- Databases: PostgreSQL, vector databases (Pinecone, Supabase)
- AI/ML: LangChain, vector embeddings, RAG, conversational agents
- Data Pipeline Tools: Apache Airflow (or similar tools)
- Web Scraping: Scrapy, Selenium, Google Custom Search API
- Geocoding: Google Maps API, OpenStreetMap, PostGIS
- PDF Processing: Text extraction and document parsing libraries
Nice to Have
Technical Mindset
- Thinks in systems and can architect for scale from day one
- Comfortable making technical decisions with limited guidance
- Experience debugging production issues and optimizing performance
- Wants to shape engineering culture and hiring as the team grows
Enjoys working in a start-up environment
- Takes ownership and drives projects to completion
- Adapts quickly to changing requirements and priorities
- Direct communication style with both technical and non-technical stakeholders
Passionate about urban climate adaptation and resilience
- Background in municipal/government document analysis
- Familiarity with infrastructure planning or environmental data
- Interest in climate risk assessment or sustainability tech
- Previous work with public sector or policy-related datasets