
How Job Scoring and Ranking Works

The core of your job search assistant is a flexible and programmable scoring engine — one that analyzes job listings using both user-driven logic and smart default heuristics.

This page breaks down how that engine works, and how you can steer it.


🧠 What Gets Scored?

Each job result (from LinkedIn, Glassdoor, etc.) is enriched and scored along two axes:

  • Labeling: What kind of source does this listing come from?
  • Scoring: How trustworthy or relevant is it likely to be?

Example Heuristics

Domain                   Label           Score
greenhouse.io            ATS             3
lever.co                 ATS             3
linkedin.com             Aggregator_T1   2
wellfound.com            Aggregator_T2   1
remoteok.com             Aggregator_T3   0
Employer's own domain    Employer        2.5
Unknown sources          Unknown         0

These are defined in the source code and used via the label_and_score() function.
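
A minimal sketch of what that lookup might look like (the actual table, names, and function signature in scoring.py may differ; this only mirrors the example values above):

# Hypothetical heuristics table, mirroring the example values above.
DOMAIN_SCORES = {
    "greenhouse.io": ("ATS", 3),
    "lever.co": ("ATS", 3),
    "linkedin.com": ("Aggregator_T1", 2),
    "wellfound.com": ("Aggregator_T2", 1),
    "remoteok.com": ("Aggregator_T3", 0),
}

def label_and_score(domain: str) -> tuple[str, float]:
    """Map a listing's domain to a (label, score) pair, defaulting to Unknown."""
    for known, (label, score) in DOMAIN_SCORES.items():
        if domain.endswith(known):
            return label, score
    return "Unknown", 0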


⚙️ How Is the Score Calculated?

The score is not from a model — it's a lightweight rule-based heuristic using:

  • The URL domain (to determine platform type)
  • Whether the company name appears in the domain (→ Employer boost)
  • Predefined scores for known ATS or aggregator sites

This logic lives in:


flow_jobposting/
├── scoring.py            # domain scoring
├── promptflow_runner.py  # LLM query + result extraction
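
The employer check itself can be as simple as testing whether the company name appears in the listing's domain. A sketch of that idea, with a hypothetical helper name (the real scoring.py may do this differently):

from urllib.parse import urlparse

def is_employer_domain(url: str, company: str) -> bool:
    """Return True when the company name appears in the URL's domain."""
    domain = urlparse(url).netloc.lower().replace("-", "").replace(".", "")
    # Normalize the company name to a domain-friendly token, e.g. "Acme Corp" -> "acmecorp"
    token = "".join(ch for ch in company.lower() if ch.isalnum())
    return bool(token) and token in domain

# Example: is_employer_domain("https://careers.acmecorp.com/jobs/1", "Acme Corp")
# returns True, so the listing would get the Employer label and its 2.5 score.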


📦 Promptflow: Semantic Scoring via LLM

Each job listing is also passed through an LLM with your customized prompt and schema.

  • Prompt: Encodes your goals (e.g., “I want remote jobs in policy with good work–life balance”)
  • Schema: Tells the assistant what to extract (e.g., salary, remote, stack)
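
As a rough illustration, a schema might amount to something like this (field names and types are examples only, not the tool's required format):

# Illustrative schema: each field tells the LLM what to pull out of a listing.
JOB_SCHEMA = {
    "is_remote": "boolean",
    "location_match": "boolean",
    "salary": "string",
    "stack": "list of strings",
    "summary": "string",
    "concerns": "string",
    "score": "integer from 1 to 10",  # optional user-defined fit score
}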

The LLM processes the job description and outputs a structured result with:

  • Boolean or categorical fields (is_remote, location_match, etc.)
  • Textual notes (summary, concerns)
  • An optional user-defined score (e.g., 1–10) inside the schema

If your prompt includes logic like:

“Give me a score out of 10 based on fit for a data policy role in a non-profit”

...then the LLM result includes this value, and the system can use it to sort results.
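
For example, a single extracted result might look like the dictionary below (values invented for illustration), and ranking by the user-defined score is then straightforward:

# Hypothetical structured output for one listing.
llm_result = {
    "is_remote": True,
    "location_match": True,
    "summary": "Data policy analyst at a mid-sized non-profit, fully remote.",
    "concerns": "Salary range not listed.",
    "score": 8,  # the 1-10 fit score requested in the prompt
}

# With one such result per listing, ranking the batch is a one-liner:
results = [llm_result]  # in practice, one entry per job listing
ranked = sorted(results, key=lambda r: r.get("score", 0), reverse=True)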


🧹 Filtering & Ranking

After scoring, a filtering step selects top candidates per query:

  • Top n for trusted types (Employer, ATS)
  • Limited entries from unknown sources
  • Full results saved in *_scored_full.csv
  • Filtered results saved in *_results.csv

This is managed by:

filter_top_candidates(df, n_per_label=2, n_unknown=1)

This prevents spammy or low-quality job boards from flooding your shortlist.
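
One way such a filter could be implemented (a sketch only, assuming the DataFrame carries the label and score columns added by the scoring step; the real function may differ):

import pandas as pd

def filter_top_candidates(df: pd.DataFrame, n_per_label: int = 2, n_unknown: int = 1) -> pd.DataFrame:
    """Keep the top-scoring rows per label, allowing fewer from unknown sources."""
    ranked = df.sort_values("score", ascending=False)
    known = ranked[ranked["label"] != "Unknown"].groupby("label", group_keys=False).head(n_per_label)
    unknown = ranked[ranked["label"] == "Unknown"].head(n_unknown)
    return pd.concat([known, unknown]).sort_values("score", ascending=False)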


🔧 Want to Influence Rankings?

You can guide the assistant’s scoring logic in two ways:

  1. Customize the prompt (Prompt Editor Tab). Add instructions such as:

    “I care most about impact, not salary”
    “Avoid jobs in large corporations”

  2. Edit the schema (Schema Editor Tab). Add fields like has_equity, hiring_manager, or visa_sponsorship.

Then, instruct the assistant to weigh or rate jobs using those fields.
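
For instance (field names taken from the examples above; in practice you define them through the Schema Editor Tab rather than in code):

# Illustrative extra fields to add via the Schema Editor Tab.
EXTRA_FIELDS = {
    "has_equity": "boolean",
    "hiring_manager": "string",
    "visa_sponsorship": "boolean",
}
# Matching prompt addition: "Deduct points when visa_sponsorship is false."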


📝 Result Files

After a run, you’ll find:

  • *_scored_full.csv: all jobs, with label and score
  • *_results.csv: filtered, top-ranked entries
  • *_meta.json: metadata about the run (counts, label stats)
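
A quick way to inspect them (filenames here are illustrative; substitute your run's actual prefix):

import json
import pandas as pd

full = pd.read_csv("my_query_scored_full.csv")    # every job, with label and score
shortlist = pd.read_csv("my_query_results.csv")   # filtered, top-ranked entries
with open("my_query_meta.json") as f:
    meta = json.load(f)                           # run metadata (counts, label stats)

print(full["label"].value_counts())               # listings per source type
print(shortlist["score"].describe())              # score distribution of the shortlist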

🧭 Summary

The job scoring pipeline blends rule-based heuristics with LLM-powered judgment, filtered for quality and enriched with your preferences.

  • ✏️ You define what to extract
  • 🧠 The LLM rates jobs accordingly
  • 🧹 The pipeline keeps only the best per query

Ready to customize? Continue to Designing Your Job Search Agent.