How Job Scoring and Ranking Works
The core of your job search assistant is a flexible and programmable scoring engine — one that analyzes job listings using both user-driven logic and smart default heuristics.
This page breaks down how that engine works, and how you can steer it.
🧠 What Gets Scored?
Each job result (from LinkedIn, Glassdoor, etc.) is enriched and scored along two axes:
- Labeling: What kind of source does this listing come from?
- Scoring: How trustworthy or relevant is it likely to be?
Example Heuristics
| Domain | Label | Score |
|---|---|---|
| greenhouse.io | ATS | 3 |
| lever.co | ATS | 3 |
| linkedin.com | Aggregator_T1 | 2 |
| wellfound.com | Aggregator_T2 | 1 |
| remoteok.com | Aggregator_T3 | 0 |
| Employer's own domain | Employer | 2.5 |
| Unknown sources | Unknown | 0 |
These are defined in the source code and used via the `label_and_score()` function.
⚙️ How Is the Score Calculated?
The score is not from a model — it's a lightweight rule-based heuristic using:
- The URL domain (to determine platform type)
- Whether the company name appears in the domain (→ Employer boost)
- Predefined scores for known ATS or aggregator sites
This logic lives in:
```
flow_jobposting/
├── scoring.py            # domain scoring
├── promptflow_runner.py  # LLM query + result extraction
```
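As a concrete illustration, here is a minimal sketch of that heuristic. The domain table and function body below are illustrative only; the actual values and logic live in `scoring.py`.

```python
from urllib.parse import urlparse

# Illustrative domain table mirroring the heuristics above (not the real scoring.py values).
DOMAIN_SCORES = {
    "greenhouse.io": ("ATS", 3),
    "lever.co": ("ATS", 3),
    "linkedin.com": ("Aggregator_T1", 2),
    "wellfound.com": ("Aggregator_T2", 1),
    "remoteok.com": ("Aggregator_T3", 0),
}

def label_and_score(url: str, company: str) -> tuple[str, float]:
    """Label a job URL and assign a trust score (sketch)."""
    domain = urlparse(url).netloc.lower().removeprefix("www.")
    # 1. Known ATS / aggregator domains get their predefined label and score.
    for known, (label, score) in DOMAIN_SCORES.items():
        if domain.endswith(known):
            return label, score
    # 2. Employer boost: the company name appears in the domain itself.
    if company and company.lower().replace(" ", "") in domain:
        return "Employer", 2.5
    # 3. Everything else is treated as an unknown source.
    return "Unknown", 0
```

For example, `label_and_score("https://boards.greenhouse.io/acme/jobs/123", "Acme")` would return `("ATS", 3)`.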
📦 Promptflow: Semantic Scoring via LLM
Each job listing is also passed through an LLM with your customized prompt and schema.
- Prompt: Encodes your goals (e.g., “I want remote jobs in policy with good work–life balance”)
- Schema: Tells the assistant what to extract (e.g., `salary`, `remote`, `stack`)
The LLM processes the job description and outputs a structured result with:
- Boolean or categorical fields (`is_remote`, `location_match`, etc.)
- Textual notes (`summary`, `concerns`)
- An optional user-defined score (e.g., 1–10) inside the schema
If your prompt includes an instruction like:
“Give me a score out of 10 based on fit for a data policy role in a non-profit”
...then the LLM result includes this value, and the system can use it to sort results.
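For illustration, the schema can be thought of as a mapping from field names to the kind of answer you want back, and the LLM returns one structured record per listing. The field names below (including `fit_score`) are hypothetical; your own schema defines the actual fields.

```python
# Hypothetical extraction schema passed to the LLM alongside your prompt.
JOB_SCHEMA = {
    "is_remote": "boolean",
    "location_match": "boolean",
    "salary": "string",
    "stack": "list of strings",
    "summary": "string",
    "concerns": "string",
    "fit_score": "integer 1-10",  # only returned if your prompt asks for a score
}

# One structured result the LLM might return for a single listing.
example_result = {
    "is_remote": True,
    "location_match": True,
    "salary": "not listed",
    "stack": ["Python", "SQL"],
    "summary": "Data policy analyst role at a non-profit, remote-friendly.",
    "concerns": "Contract position; no benefits mentioned.",
    "fit_score": 8,
}
```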
🧹 Filtering & Ranking
After scoring, a filtering step selects top candidates per query:
- Top `n` for trusted types (`Employer`, `ATS`)
- Limited entries from unknown sources
- Full results saved in `*_scored_full.csv`
- Filtered results saved in `*_results.csv`
This is managed by `filter_top_candidates(df, n_per_label=2, n_unknown=1)`.
This prevents spammy or low-quality job boards from flooding your shortlist.
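As a rough sketch, a filter like this can be written in a few lines of pandas. The column names (`query`, `label`, `score`) and the tie-breaking are assumptions; the real `filter_top_candidates()` in the repo may differ in detail.

```python
import pandas as pd

def filter_top_candidates(df: pd.DataFrame, n_per_label: int = 2, n_unknown: int = 1) -> pd.DataFrame:
    """Keep the highest-scored rows per (query, label), with a tighter cap on Unknown sources (sketch)."""
    kept = []
    for (query, label), group in df.groupby(["query", "label"]):
        # Unknown sources get a smaller quota than trusted labels.
        limit = n_unknown if label == "Unknown" else n_per_label
        kept.append(group.sort_values("score", ascending=False).head(limit))
    return pd.concat(kept).reset_index(drop=True)
```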
🔧 Want to Influence Rankings?
You can guide the assistant’s scoring logic in two ways:
- Customize the prompt (Prompt Editor Tab). Add things like:
  - "I care most about impact, not salary"
  - "Avoid jobs in large corporations"
- Edit the schema (Schema Editor Tab). Add fields like `has_equity`, `hiring_manager`, or `visa_sponsorship`.
Then, instruct the assistant to weigh or rate jobs using those fields.
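For example, a hypothetical schema edit and its matching prompt instruction could look like this (field names and wording are illustrative):

```python
# Hypothetical additions made in the Schema Editor Tab: new fields to extract.
extra_fields = {
    "has_equity": "boolean",
    "hiring_manager": "string",
    "visa_sponsorship": "boolean",
}

# Matching instruction added in the Prompt Editor Tab so the new fields
# actually influence the score the LLM assigns.
prompt_addition = (
    "When rating fit out of 10, give extra weight to roles that offer equity "
    "and visa sponsorship, and note the hiring manager's name if listed."
)
```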
📝 Result Files
After a run, you’ll find:
- `*_scored_full.csv`: all jobs, with label and score
- `*_results.csv`: filtered, top-ranked entries
- `*_meta.json`: metadata about the run (counts, label stats)
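Because these are plain CSV/JSON files, you can also inspect a run programmatically. The file prefix below is hypothetical and just for illustration:

```python
import json
import pandas as pd

# Example for a run whose files share the (hypothetical) prefix "policy_remote_".
full = pd.read_csv("policy_remote_scored_full.csv")  # every job, with label and score
top = pd.read_csv("policy_remote_results.csv")       # filtered, top-ranked shortlist
with open("policy_remote_meta.json") as f:
    meta = json.load(f)                              # run metadata: counts, label stats

print(meta)
print(top.head())  # inspect whichever columns your schema produced
```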
🧭 Summary
The job scoring pipeline blends rule-based heuristics with LLM-powered judgment, filtered for quality and enriched with your preferences.
- ✏️ You define what to extract
- 🧠 The LLM rates jobs accordingly
- 🧹 The pipeline keeps only the best per query
Ready to customize? Continue to Designing Your Job Search Agent.