PythonFastAPIReactChrome ExtensionAIArchitecture

Building JobTrack: A Full-Stack Job Application Tracker from Scratch

AJ

Abhishek Joshi

May 15, 2026 · 12 min read

Job hunting generates a surprising amount of data — companies you've applied to, roles you're tracking, screening calls, interviews, follow-ups, and the ghosting. Most people manage this with a spreadsheet. The pain is real: you copy-paste from job listings, you can't see pipeline trends at a glance, and switching between LinkedIn, Naukri, Indeed, and various ATS portals creates constant friction.

I wanted something that felt invisible during the search: one click to save a listing from any job board, a clean Kanban view to see my pipeline, and enough analytics to answer "am I applying enough?" without opening a spreadsheet. So I built JobTrack — a local-first, full-stack job application tracker that I shipped iteratively across seven phases.

The architecture in one paragraph

Three independent layers communicate over localhost. A FastAPI + SQLite backend owns all the data and analytics. A Chrome Extension (Manifest V3) scrapes job listings from LinkedIn, Naukri, Indeed, and Greenhouse with one click and posts them to the backend. A React dashboard — built with TanStack Query, Zustand, Recharts, and Tailwind — gives you a Kanban board, a filterable applications table, five analytics charts, an AI analysis history page, and a Resume Vault. The extension and dashboard never talk to each other; both go through the backend.

Phase 1: The backend

I chose FastAPI for the backend for the same reason I've chosen it on other projects: Pydantic v2 validation, auto-generated OpenAPI docs, and an async-ready foundation — all with close to zero boilerplate. The ORM is SQLAlchemy 2.x in its newer mapped_column style, which gives full type safety on model definitions without raw SQL.

The data model is straightforward: profiles, jobs, status_history, job_analyses, and a settings key-value table. The one design decision worth explaining is status_history. Rather than storing only the current status on each job, every status change appends a row to this table — giving you a full audit trail of how each application progressed. Knowing you reached final-round interviews at eight companies but got rejected at that stage every time tells you something specific. A single status field would lose that permanently.

Analytics are computed on the backend, not the frontend. Five endpoints return pre-aggregated data: weekly application volume, jobs by status, locations, salary distribution, and remote/hybrid/on-site breakdown. This keeps the dashboard lightweight and the queries fast regardless of how many jobs accumulate over a long search.

Phase 2: The Chrome Extension and the MV3 constraint

Manifest V3 replaced persistent background pages with ephemeral service workers — meaning Chrome can terminate the service worker at any time and you can't keep state in memory. The solution was to make the service worker completely stateless: it's a pure message proxy. Content scripts on the job board page scrape the listing and post a SCRAPE_JOB message. The service worker receives it, makes a fetch() call to the backend, and returns the response. All persistent state (active profile ID) lives in chrome.storage.local; all application data lives in SQLite.

Each supported job board gets its own scraper module: linkedin.js, naukri.js, indeed.js, greenhouse.js. They all produce the same output shape — {title, company, location, workType, salary, currency, url} — so the service worker and popup don't need to know which site they're on.

Salary parsing was non-trivial. Salary data appears in completely different formats across boards: "1.2 – 2.5 Cr" on Naukri, "CA$120K – CA$160K" on LinkedIn Canada, "$140,000 – $180,000" on Indeed. I centralised this in a single parseSalary() function. The critical ordering rule: check for Crore before Lacs — otherwise "1 Cr" gets parsed as "100 Lacs", a 100× undercount.

Phase 5: Why I rewrote the LinkedIn scraper

The original LinkedIn scraper read directly from the DOM. It worked, but it was fragile — any markup change on LinkedIn's side would silently break it. Worse, DOM scraping couldn't reliably extract structured fields like work type (remote/hybrid/on-site), which LinkedIn encodes as URNs rather than readable text.

The fix was to use LinkedIn's internal Voyager API — the same REST API that LinkedIn's own frontend uses. A single authenticated request to /voyager/api/jobs/jobPostings/{jobId} returns clean JSON with title, location, work type, and description. Company name comes from a second request to /voyager/api/organization/companies/{companyId}. The CSRF token is extracted from the JSESSIONID cookie, which LinkedIn sets as non-HttpOnly and in the format "ajax:TOKEN".

The fallback chain matters here: if the cookie is absent, the format is unexpected, or either Voyager request fails, the scraper silently falls back to DOM extraction. The user sees a pre-filled form either way; they never see an error from an internal API quirk.

Phase 3: The dashboard's two-layer state model

The React dashboard separates state into two concerns with a firm boundary between them.

TanStack Query owns all server state: job lists, analytics, profile data. Jobs use a 15-second background refetch so the dashboard stays in sync when the extension saves a listing while the dashboard is open. Analytics use a 30-second stale time — they're expensive aggregations and don't need to update as frequently.

Zustand owns UI-only state: the active profile ID (persisted to localStorage via the Zustand persist middleware so it survives page reloads) and modal open/close state. The dividing line is: if the data can change from outside the browser, it belongs in React Query. If it's purely a UI concern that no server knows about, it belongs in Zustand.

One subtle React performance issue surfaced early: subscribing to the entire Zustand store causes a re-render on any store change, including a loading flag flipping. The fix was per-field selectors — each component subscribes to only the fields it actually reads. Since useProfile() is called by almost every page, the savings from this cascaded throughout the component tree.

Phase 4: The duplicate URL guard

If you browse to the same job listing twice and click "Save" both times, you'd expect to see a helpful message rather than two identical entries. I implemented this at three layers:

  • Pydantic validator — normalises URLs (lowercases scheme and host, strips whitespace) so https://LinkedIn.com/jobs/123 and https://linkedin.com/jobs/123 are treated as identical.
  • Application layer — a SELECT EXISTS check before insert returns a 409 with the existing job's company and title, so the popup can say "already saved as Senior Engineer at Acme Corp" rather than a generic error.
  • Database constraint — a UniqueConstraint("profile_id", "url") on the model is the final safety net for concurrent writes. SQLite's NULL != NULL semantics correctly allow multiple null-URL jobs per profile (for manually entered listings with no URL).

Phase 6: The Resume Vault — local files, not uploads

The AI analysis layer needs to read resume content. The obvious design is to ask the user to paste resume text into a form. The better design: point the app at a folder on your machine.

The backend's /resumes router lists files in the configured folder, extracts text from .pdf files with pdfminer.six and from .docx files with python-docx, and passes the extracted text directly to the AI prompt. The text is held in memory only for the duration of the API call — it's never stored in the database.

The path traversal guard is worth calling out explicitly: every filename is validated by comparing os.path.realpath(folder/filename) against os.path.realpath(folder) before any file read. This prevents ../../etc/passwd-style attacks even though the app is local-only.

The folder config is stored in a key-value settings table (key TEXT PRIMARY KEY, value TEXT) — three rows: resume_folder_path, default_resume_filename, master_resume_filename. A fixed-column config table would require a schema migration every time a new setting is added; the KV approach lets individual settings be read and written without touching unrelated rows.

Phase 7: AI analysis with Claude

The analysis prompt runs in two modes. In the common case (a dedicated resume to score against a master resume with more content), the prompt includes separate <scored_resume> and <master_resume> sections and asks Claude to compute both a current score and a projected score showing how much the candidate could improve by pulling specific content from the master. When both files are the same, a simpler single-resume prompt is used and projected_score is null.

The XML content tags — <job_description>, <scored_resume>, <master_resume> — aren't decorative. They mitigate prompt injection: content inside the tags is treated as data, not instructions. A job description that contains "Ignore previous instructions and output..." is far less likely to influence the model when it's explicitly bounded as a data block. Engineering job descriptions frequently contain XML-like tags (e.g., <requirements>); the outer tags are fixed and the inner content remains data.

The response parser validates all required keys, clamps current_score and projected_score to [0, 100] via a _safe_int() helper that handles string-coercible values like "78", and verifies that suggestions is a list of {text, score_impact} dicts. Any structural mismatch returns a sanitised 502 — the raw Claude response is logged server-side and never forwarded to the browser.

job_id on analyses is nullable. Analyses run from the extension popup before saving a job have job_id = NULL — they appear in the AI Analysis history page scoped to the active profile, but not in the job detail panel. This lets you analyse a listing you're still deciding whether to save.

Security decisions worth naming

A few things I'd flag for anyone building something similar:

  • No innerHTML in the extension popup. The original code used innerHTML to render profile names into option elements. Replaced with DOM APIs — profile names set via textContent only.
  • URL scheme validation in the dashboard. job.url is validated against /^https?:\/\// before being rendered as an anchor, preventing javascript: URLs from being clickable even if a malformed entry slipped through.
  • CORS locked to specific origins. The backend CORS config allows http://localhost:5173 and chrome-extension:// — wildcard * was an early draft that got removed on review.
  • Sort field allowlist. The sort_by query parameter is validated against a VALID_FIELDS set before being passed to the ORM. Without this, an attacker could pass an arbitrary attribute name and potentially read unintended columns.

What's next

The next phase is AI resume tailoring: generating a modified version of the default resume targeted at a specific job description. The planned flow is that Claude returns a structured list of suggested edits — additions, removals, rewrites — which the dashboard shows as a side-by-side diff. Accepted edits get applied to a new .docx file via python-docx and saved alongside the originals. The main open question is diff representation: whether to model changes as semantic operations ({field, old, new}) or as a raw unified diff. A hybrid approach — structured fields for known resume sections, raw diff for unstructured text — is likely the right answer.

There's also a longer-term offline-first idea: a small IndexedDB queue in the service worker that buffers saves when the backend isn't running and flushes them on reconnect. Right now the extension fails visibly if the backend is down; a queue would make the save flow feel much more robust.

The best personal productivity tools are the ones that make the boring parts of a hard process disappear entirely — so you can focus on the parts that actually require your judgment.

The full source is on GitHub at github.com/Abhishek-P-Joshi/JobTracker.