
Production data sync — US proptech

6 pipelines · 100+ integrations · 3× daily · 1 year uptime

380+ days uptime
100+ integrations/run
9 failure classes
15+ lead gen sites
1 engineer

Sole engineer on a mission-critical data synchronisation system for a US mortgage company. Built the entire platform from scratch — no inherited code, no team, no handholding. The system syncs loan application data between 6 authenticated mortgage lender portals and the company’s internal platform, running 100+ integrations per run, 3 times daily.

These aren’t open APIs. Each platform requires browser-based authentication — login forms, MFA codes, session tokens that expire mid-flow, and anti-bot detection. A script that works today can break tomorrow because a lender changed a button class or added a captcha.

Checkpoint-based workflow engine

An async, queue-based execution system. Steps are grouped into checkpoints. If authentication expires mid-execution, DOM and URL watchers detect the redirect, pause the queue, re-authenticate, and resume from the last safe checkpoint — not from the top of the script. Errors raised inside the auth window are deliberately suppressed: they're symptoms of session death, not real failures.
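The resume-from-checkpoint idea can be sketched in a few lines. This is an illustrative reduction, not the production engine: `AuthExpired` stands in for whatever the DOM/URL watchers raise, and `reauth` for the real re-authentication flow.

```python
import asyncio

class AuthExpired(Exception):
    """Raised by a step when a watcher detects a login redirect."""

class CheckpointRunner:
    """Steps are grouped into named checkpoints. On auth expiry we
    re-authenticate and replay only the current checkpoint, never
    the whole script. Names here are hypothetical."""

    def __init__(self, reauth):
        self.reauth = reauth

    async def run(self, checkpoints):
        out = []
        for name, steps in checkpoints:
            while True:
                attempt = []
                try:
                    for step in steps:
                        attempt.append(await step())
                except AuthExpired:
                    # Session died mid-checkpoint: errors in this window
                    # are symptoms, so discard the partial attempt,
                    # re-auth, and replay from the checkpoint boundary.
                    await self.reauth()
                    continue
                out.extend(attempt)
                break  # checkpoint completed; advance
        return out
```

A checkpoint that trips `AuthExpired` is simply replayed after re-auth, so steps before the failing checkpoint never run twice.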

Browser persistence pool

Instead of creating a new browser instance per request (which triggers redundant MFA), I built a pool of warm browser sessions that stay authenticated across requests. This reduced MFA rate-limit failures from a daily occurrence to near-zero and cut integration time significantly.
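The pool logic amounts to "reuse if still authenticated, pay the MFA cost only on a cold start". A minimal sketch, where `login` and the session object are stand-ins for the real browser/session types:

```python
class SessionPool:
    """Keep warm, authenticated sessions keyed by portal so MFA only
    fires when no live session exists. Illustrative only — the real
    pool manages actual browser instances."""

    def __init__(self, login):
        self._login = login  # expensive: triggers MFA on the portal
        self._pool = {}

    def acquire(self, portal):
        session = self._pool.get(portal)
        if session is None or not session.alive:
            # Cold start (or dead session): authenticate once, cache it.
            session = self._login(portal)
            self._pool[portal] = session
        # Warm hit: no login form, no MFA, no rate-limit risk.
        return session
```

Because repeated `acquire` calls for the same portal return the cached session, MFA frequency drops from once-per-request to once-per-session-death.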

Error classification & directive system

9 distinct failure scenarios, each with its own automated recovery policy. SELECTOR_NOT_FOUND aborts and marks the script stale. AUTH_CREDENTIALS_INVALID notifies via Slack immediately. NETWORK_ERROR requeues with backoff. MFA_TIMEOUT pauses and waits. The system self-manages — if a script fails too many times, it blocks itself and alerts the team.
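A directive system like this is essentially a policy table plus a self-blocking rule. The sketch below shows four of the nine classes with hypothetical field names; the real table and thresholds are the project's own:

```python
from dataclasses import dataclass

@dataclass
class Directive:
    action: str           # "abort" | "requeue" | "pause"
    notify: bool = False  # alert the team (e.g. via Slack) immediately
    backoff: bool = False # requeue with exponential backoff

# Illustrative subset of the 9 failure classes described above.
DIRECTIVES = {
    "SELECTOR_NOT_FOUND":       Directive("abort"),               # script stale
    "AUTH_CREDENTIALS_INVALID": Directive("abort", notify=True),
    "NETWORK_ERROR":            Directive("requeue", backoff=True),
    "MFA_TIMEOUT":              Directive("pause"),
}

def handle(error_class, failure_count, max_failures=3):
    """Look up the recovery policy; a script that has failed too many
    times blocks itself and alerts, regardless of the error class."""
    if failure_count >= max_failures:
        return Directive("abort", notify=True)
    return DIRECTIVES[error_class]
```

Keeping recovery policy in data rather than scattered `try/except` blocks is what makes the system feel self-managing: adding a tenth failure class is one table row, not new control flow.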

AI document scraper

Certain dates had to be extracted from PDF documents hosted on the lender portals. I built a 3-tier extraction system: first, regex-based matching on the PDF's extracted text. If that fails, the page most likely to contain the date is converted to an image and sent to OpenAI for OCR. If the PDF is image-based (a scanned document), the entire PDF is converted to images and sent to OpenAI. Each tier uses confidence scoring with retry logic — if a score falls below the threshold, the extraction is retried and the scores averaged before the value is accepted or rejected.
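The tier/confidence control flow can be expressed independently of what each tier does. In this sketch the tiers are injected callables returning `(value, confidence)` — stand-ins for the regex, page-image OCR, and full-document OCR stages; the threshold and retry count are illustrative:

```python
def extract(tiers, threshold=0.8, retries=2):
    """Try each tier in order. A result above the threshold is accepted
    outright; a low-confidence result is retried and the scores averaged
    before accepting or rejecting. Returns None if every tier fails."""
    for tier in tiers:
        value, conf = tier()
        if value is None:
            continue  # tier found nothing: escalate to the next tier
        if conf >= threshold:
            return value
        # Low confidence: retry and average before deciding.
        scores = [conf]
        for _ in range(retries):
            again, score = tier()
            if again == value:
                scores.append(score)
        if sum(scores) / len(scores) >= threshold:
            return value
    return None
```

The cheap regex tier handles the common case; the expensive OCR tiers only run when the tier before them returns nothing usable.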

AI captcha bypass

Built a vision-model-based captcha solver that screenshots text-based captchas and extracts the code using OpenAI. Used for scraping the NMLS website during the lead generation project.
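The solver's shape is simple: screenshot the captcha element, hand the image to a vision model, sanitise the reply. Here `vision_ocr` is an injected callable standing in for the OpenAI vision request (the actual API call and prompt are not shown in this sketch):

```python
def solve_captcha(screenshot_png, vision_ocr):
    """Extract a captcha code from a screenshot via a vision model.
    vision_ocr(png_bytes) -> str is a stand-in for the real OpenAI
    call; the model can pad its reply, so keep only the code itself."""
    raw = vision_ocr(screenshot_png)
    # Vision replies may include whitespace or stray punctuation around
    # the code, so strip everything that isn't alphanumeric.
    code = "".join(ch for ch in raw if ch.isalnum())
    return code or None
```

Injecting the model call also makes the sanitising logic trivially testable without network access.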

Lead generation pipelines

Scraped 15+ mortgage industry websites — Arbor Financial Group, Arive, C2 Financial, Coast 2 Coast, Edge Home Finance, Hoot Home Loans, Loan Factory, Motto Mortgage, NMLS, Nexa Mortgage, The Mortgage Calculator, Xpert Home Lending, and others. Each pipeline handled platform-specific authentication, pagination, rate limiting, and data normalisation into a strict internal schema.
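Normalisation into a strict schema is what lets 15+ differently-shaped sources feed one platform. A minimal sketch — the schema fields and alias table below are hypothetical, not the company's actual schema:

```python
# Hypothetical internal schema: every pipeline must emit exactly this shape.
SCHEMA = ("name", "nmls_id", "email", "state")

# Each source site names its fields differently; map them to the schema.
ALIASES = {
    "full_name": "name", "loan_officer": "name",
    "nmls": "nmls_id", "nmls_number": "nmls_id",
    "email_address": "email",
}

def normalise(raw):
    """Map a site-specific record onto the strict schema. Unknown keys
    are dropped, missing fields default to None, strings are trimmed."""
    out = {field: None for field in SCHEMA}
    for key, value in raw.items():
        field = ALIASES.get(key, key)
        if field in out and value not in ("", None):
            out[field] = value.strip() if isinstance(value, str) else value
    return out
```

Because every pipeline funnels through the same normaliser, downstream consumers never see per-site quirks.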

Production Chrome extension

Live on the Chrome Web Store. Redirects lender portal navigations to the correct dedicated login URLs — generic login pages didn’t work for the company’s accounts. Also monitors which team members are actively using the extension and tracks login activity — built to solve an ops problem where someone was repeatedly resetting shared credentials, breaking workflows for the entire team.
