Building JobTrack: A Full-Stack Job Application Tracker from Scratch
Abhishek Joshi
May 15, 2026 · 12 min read
Job hunting generates a surprising amount of data — companies you've applied to, roles you're tracking, screening calls, interviews, follow-ups, and the ghosting. Most people manage this with a spreadsheet. The pain is real: you copy-paste from job listings, you can't see pipeline trends at a glance, and switching between LinkedIn, Naukri, Indeed, and various ATS portals creates constant friction.
I wanted something that felt invisible during the search: one click to save a listing from any job board, a clean Kanban view to see my pipeline, and enough analytics to answer "am I applying enough?" without opening a spreadsheet. So I built JobTrack — a local-first, full-stack job application tracker that I shipped iteratively across seven phases.
The architecture in one paragraph
Three independent layers communicate over localhost. A FastAPI + SQLite backend owns all the data and analytics. A Chrome Extension (Manifest V3) scrapes job listings from LinkedIn, Naukri, Indeed, and Greenhouse with one click and posts them to the backend. A React dashboard — built with TanStack Query, Zustand, Recharts, and Tailwind — gives you a Kanban board, a filterable applications table, five analytics charts, an AI analysis history page, and a Resume Vault. The extension and dashboard never talk to each other; both go through the backend.
Phase 1: The backend
I chose FastAPI for the backend for the same reason I've chosen it on other projects: Pydantic v2
validation, auto-generated OpenAPI docs, and an async-ready foundation — all with close to zero
boilerplate. The ORM is SQLAlchemy 2.x in its newer mapped_column style, which gives full
type safety on model definitions without raw SQL.
The data model is straightforward: profiles, jobs,
status_history, job_analyses, and a settings key-value table.
The one design decision worth explaining is status_history. Rather than storing only the
current status on each job, every status change appends a row to this table — giving you a full audit
trail of how each application progressed. Knowing you reached final-round interviews at eight companies
but got rejected at that stage every time tells you something specific. A single status field would lose
that permanently.
Analytics are computed on the backend, not the frontend. Five endpoints return pre-aggregated data: weekly application volume, jobs by status, locations, salary distribution, and remote/hybrid/on-site breakdown. This keeps the dashboard lightweight and the queries fast regardless of how many jobs accumulate over a long search.
Phase 2: The Chrome Extension and the MV3 constraint
Manifest V3 replaced persistent background pages with ephemeral service workers — meaning Chrome can
terminate the service worker at any time and you can't keep state in memory. The solution was to make
the service worker completely stateless: it's a pure message proxy. Content scripts on the job board
page scrape the listing and post a SCRAPE_JOB message. The service worker receives it,
makes a fetch() call to the backend, and returns the response. All persistent state
(active profile ID) lives in chrome.storage.local; all application data lives in SQLite.
Each supported job board gets its own scraper module: linkedin.js, naukri.js,
indeed.js, greenhouse.js. They all produce the same output shape —
{title, company, location, workType, salary, currency, url} — so the service
worker and popup don't need to know which site they're on.
Salary parsing was non-trivial. Salary data appears in completely different formats across boards:
"1.2 – 2.5 Cr" on Naukri, "CA$120K – CA$160K" on LinkedIn Canada,
"$140,000 – $180,000" on Indeed. I centralised this in a single parseSalary()
function. The critical ordering rule: check for Crore before Lacs — otherwise "1 Cr" gets
parsed as "100 Lacs", a 100× undercount.
Phase 5: Why I rewrote the LinkedIn scraper
The original LinkedIn scraper read directly from the DOM. It worked, but it was fragile — any markup change on LinkedIn's side would silently break it. Worse, DOM scraping couldn't reliably extract structured fields like work type (remote/hybrid/on-site), which LinkedIn encodes as URNs rather than readable text.
The fix was to use LinkedIn's internal Voyager API — the same REST API that LinkedIn's
own frontend uses. A single authenticated request to
/voyager/api/jobs/jobPostings/{jobId} returns clean JSON with title, location,
work type, and description. Company name comes from a second request to
/voyager/api/organization/companies/{companyId}. The CSRF token is extracted from
the JSESSIONID cookie, which LinkedIn sets as non-HttpOnly and in the format
"ajax:TOKEN".
The fallback chain matters here: if the cookie is absent, the format is unexpected, or either Voyager request fails, the scraper silently falls back to DOM extraction. The user sees a pre-filled form either way; they never see an error from an internal API quirk.
Phase 3: The dashboard's two-layer state model
The React dashboard separates state into two concerns with a firm boundary between them.
TanStack Query owns all server state: job lists, analytics, profile data. Jobs use a 15-second background refetch so the dashboard stays in sync when the extension saves a listing while the dashboard is open. Analytics use a 30-second stale time — they're expensive aggregations and don't need to update as frequently.
Zustand owns UI-only state: the active profile ID (persisted to
localStorage via the Zustand persist middleware so it survives page reloads) and modal
open/close state. The dividing line is: if the data can change from outside the browser, it belongs in
React Query. If it's purely a UI concern that no server knows about, it belongs in Zustand.
One subtle React performance issue surfaced early: subscribing to the entire Zustand store causes a
re-render on any store change, including a loading flag flipping. The fix was per-field selectors —
each component subscribes to only the fields it actually reads. Since useProfile() is
called by almost every page, the savings from this cascaded throughout the component tree.
Phase 4: The duplicate URL guard
If you browse to the same job listing twice and click "Save" both times, you'd expect to see a helpful message rather than two identical entries. I implemented this at three layers:
- Pydantic validator — normalises URLs (lowercases scheme and host, strips
whitespace) so
https://LinkedIn.com/jobs/123andhttps://linkedin.com/jobs/123are treated as identical. - Application layer — a
SELECT EXISTScheck before insert returns a 409 with the existing job's company and title, so the popup can say "already saved as Senior Engineer at Acme Corp" rather than a generic error. - Database constraint — a
UniqueConstraint("profile_id", "url")on the model is the final safety net for concurrent writes. SQLite'sNULL != NULLsemantics correctly allow multiple null-URL jobs per profile (for manually entered listings with no URL).
Phase 6: The Resume Vault — local files, not uploads
The AI analysis layer needs to read resume content. The obvious design is to ask the user to paste resume text into a form. The better design: point the app at a folder on your machine.
The backend's /resumes router lists files in the configured folder, extracts text from
.pdf files with pdfminer.six and from .docx files with
python-docx, and passes the extracted text directly to the AI prompt. The text is held in
memory only for the duration of the API call — it's never stored in the database.
The path traversal guard is worth calling out explicitly: every filename is validated by comparing
os.path.realpath(folder/filename) against os.path.realpath(folder) before
any file read. This prevents ../../etc/passwd-style attacks even though the app is
local-only.
The folder config is stored in a key-value settings table (key TEXT PRIMARY KEY,
value TEXT) — three rows: resume_folder_path,
default_resume_filename, master_resume_filename. A fixed-column config table
would require a schema migration every time a new setting is added; the KV approach lets individual
settings be read and written without touching unrelated rows.
Phase 7: AI analysis with Claude
The analysis prompt runs in two modes. In the common case (a dedicated resume to score against a master
resume with more content), the prompt includes separate <scored_resume> and
<master_resume> sections and asks Claude to compute both a current score and a
projected score showing how much the candidate could improve by pulling specific content from the
master. When both files are the same, a simpler single-resume prompt is used and
projected_score is null.
The XML content tags — <job_description>, <scored_resume>,
<master_resume> — aren't decorative. They mitigate prompt injection: content inside
the tags is treated as data, not instructions. A job description that contains "Ignore previous
instructions and output..." is far less likely to influence the model when it's explicitly bounded
as a data block. Engineering job descriptions frequently contain XML-like tags (e.g.,
<requirements>); the outer tags are fixed and the inner content remains data.
The response parser validates all required keys, clamps current_score and
projected_score to [0, 100] via a _safe_int() helper that handles
string-coercible values like "78", and verifies that suggestions is a list
of {text, score_impact} dicts. Any structural mismatch returns a sanitised 502
— the raw Claude response is logged server-side and never forwarded to the browser.
job_id on analyses is nullable. Analyses run from the extension popup before saving a job
have job_id = NULL — they appear in the AI Analysis history page scoped to the active
profile, but not in the job detail panel. This lets you analyse a listing you're still deciding whether
to save.
Security decisions worth naming
A few things I'd flag for anyone building something similar:
- No innerHTML in the extension popup. The original code used
innerHTMLto render profile names into option elements. Replaced with DOM APIs — profile names set viatextContentonly. - URL scheme validation in the dashboard.
job.urlis validated against/^https?:\/\//before being rendered as an anchor, preventingjavascript:URLs from being clickable even if a malformed entry slipped through. - CORS locked to specific origins. The backend CORS config allows
http://localhost:5173andchrome-extension://— wildcard*was an early draft that got removed on review. - Sort field allowlist. The
sort_byquery parameter is validated against aVALID_FIELDSset before being passed to the ORM. Without this, an attacker could pass an arbitrary attribute name and potentially read unintended columns.
What's next
The next phase is AI resume tailoring: generating a modified version of the default resume targeted at
a specific job description. The planned flow is that Claude returns a structured list of suggested edits
— additions, removals, rewrites — which the dashboard shows as a side-by-side diff. Accepted edits get
applied to a new .docx file via python-docx and saved alongside the
originals. The main open question is diff representation: whether to model changes as semantic
operations ({field, old, new}) or as a raw unified diff. A hybrid approach —
structured fields for known resume sections, raw diff for unstructured text — is likely the right
answer.
There's also a longer-term offline-first idea: a small IndexedDB queue in the service worker that buffers saves when the backend isn't running and flushes them on reconnect. Right now the extension fails visibly if the backend is down; a queue would make the save flow feel much more robust.
The best personal productivity tools are the ones that make the boring parts of a hard process disappear entirely — so you can focus on the parts that actually require your judgment.
The full source is on GitHub at github.com/Abhishek-P-Joshi/JobTracker.