How Job Scoring and Ranking Works
The core of your job search assistant is a flexible and programmable scoring engine — one that analyzes job listings using both user-driven logic and smart default heuristics.
This page breaks down how that engine works, and how you can steer it.
🧠 What Gets Scored?
Each job result (from LinkedIn, Glassdoor, etc.) is enriched and scored along two axes:
- Labeling: What kind of source does this listing come from?
- Scoring: How trustworthy or relevant is it likely to be?
Example Heuristics
| Domain | Label | Score |
|---|---|---|
| greenhouse.io | ATS | 3 |
| lever.co | ATS | 3 |
| linkedin.com | Aggregator_T1 | 2 |
| wellfound.com | Aggregator_T2 | 1 |
| remoteok.com | Aggregator_T3 | 0 |
| Employer's own domain | Employer | 2.5 |
| Unknown sources | Unknown | 0 |
These are defined in the source code and used via the `label_and_score()` function.
⚙️ How Is the Score Calculated?
The score is not from a model — it's a lightweight rule-based heuristic using:
- The URL domain (to determine platform type)
- Whether the company name appears in the domain (→ Employer boost)
- Predefined scores for known ATS or aggregator sites
This logic lives in:
```
flow_jobposting/
├── scoring.py            # domain scoring
├── promptflow_runner.py  # LLM query + result extraction
```
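As a concrete illustration, here is a minimal sketch of that heuristic. The domain table and function body below are illustrative only; the actual values and logic live in `scoring.py`.

```python
from urllib.parse import urlparse

# Illustrative domain table mirroring the heuristics above (not the real scoring.py values).
DOMAIN_SCORES = {
    "greenhouse.io": ("ATS", 3),
    "lever.co": ("ATS", 3),
    "linkedin.com": ("Aggregator_T1", 2),
    "wellfound.com": ("Aggregator_T2", 1),
    "remoteok.com": ("Aggregator_T3", 0),
}

def label_and_score(url: str, company: str) -> tuple[str, float]:
    """Label a job URL and assign a trust score (sketch)."""
    domain = urlparse(url).netloc.lower().removeprefix("www.")
    # 1. Known ATS / aggregator domains get their predefined label and score.
    for known, (label, score) in DOMAIN_SCORES.items():
        if domain.endswith(known):
            return label, score
    # 2. Employer boost: the company name appears in the domain itself.
    if company and company.lower().replace(" ", "") in domain:
        return "Employer", 2.5
    # 3. Everything else is treated as an unknown source.
    return "Unknown", 0
```

For example, `label_and_score("https://boards.greenhouse.io/acme/jobs/123", "Acme")` would return `("ATS", 3)`.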
📦 Promptflow: Semantic Scoring via LLM
Each job listing is also passed through an LLM with your customized prompt and schema.
- Prompt: Encodes your goals (e.g., “I want remote jobs in policy with good work–life balance”)
- Schema: Tells the assistant what to extract (e.g., `salary`, `remote`, `stack`)
The LLM processes the job description and outputs a structured result with:
- Boolean or categorical fields (`is_remote`, `location_match`, etc.)
- Textual notes (`summary`, `concerns`)
- An optional user-defined score (e.g., 1–10) inside the schema
If your prompt includes an instruction like:
“Give me a score out of 10 based on fit for a data policy role in a non-profit”
...then the LLM result includes this value, and the system can use it to sort results.
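For illustration, the schema can be thought of as a mapping from field names to the kind of answer you want back, and the LLM returns one structured record per listing. The field names below (including `fit_score`) are hypothetical; your own schema defines the actual fields.

```python
# Hypothetical extraction schema passed to the LLM alongside your prompt.
JOB_SCHEMA = {
    "is_remote": "boolean",
    "location_match": "boolean",
    "salary": "string",
    "stack": "list of strings",
    "summary": "string",
    "concerns": "string",
    "fit_score": "integer 1-10",  # only returned if your prompt asks for a score
}

# One structured result the LLM might return for a single listing.
example_result = {
    "is_remote": True,
    "location_match": True,
    "salary": "not listed",
    "stack": ["Python", "SQL"],
    "summary": "Data policy analyst role at a non-profit, remote-friendly.",
    "concerns": "Contract position; no benefits mentioned.",
    "fit_score": 8,
}
```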
🧹 Filtering & Ranking
After scoring, a filtering step selects top candidates per query:
- Top `n` for trusted types (`Employer`, `ATS`)
- Limited entries from unknown sources
- Full results saved in `*_scored_full.csv`
- Filtered results saved in `*_results.csv`
This is managed by `filter_top_candidates(df, n_per_label=2, n_unknown=1)`.
This prevents spammy or low-quality job boards from flooding your shortlist.
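As a rough sketch, a filter like this can be written in a few lines of pandas. The column names (`query`, `label`, `score`) and the tie-breaking are assumptions; the real `filter_top_candidates()` in the repo may differ in detail.

```python
import pandas as pd

def filter_top_candidates(df: pd.DataFrame, n_per_label: int = 2, n_unknown: int = 1) -> pd.DataFrame:
    """Keep the highest-scored rows per (query, label), with a tighter cap on Unknown sources (sketch)."""
    kept = []
    for (query, label), group in df.groupby(["query", "label"]):
        # Unknown sources get a smaller quota than trusted labels.
        limit = n_unknown if label == "Unknown" else n_per_label
        kept.append(group.sort_values("score", ascending=False).head(limit))
    return pd.concat(kept).reset_index(drop=True)
```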
🔧 Want to Influence Rankings?
You can guide the assistant’s scoring logic in two ways:
- Customize the prompt (Prompt Editor Tab). Add things like:
  - "I care most about impact, not salary"
  - "Avoid jobs in large corporations"
- Edit the schema (Schema Editor Tab). Add fields like `has_equity`, `hiring_manager`, or `visa_sponsorship`.
Then, instruct the assistant to weigh or rate jobs using those fields.
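For example, a hypothetical schema edit and its matching prompt instruction could look like this (field names and wording are illustrative):

```python
# Hypothetical additions made in the Schema Editor Tab: new fields to extract.
extra_fields = {
    "has_equity": "boolean",
    "hiring_manager": "string",
    "visa_sponsorship": "boolean",
}

# Matching instruction added in the Prompt Editor Tab so the new fields
# actually influence the score the LLM assigns.
prompt_addition = (
    "When rating fit out of 10, give extra weight to roles that offer equity "
    "and visa sponsorship, and note the hiring manager's name if listed."
)
```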
📝 Result Files
After a run, you’ll find:
- `*_scored_full.csv`: all jobs, with label and score
- `*_results.csv`: filtered, top-ranked entries
- `*_meta.json`: metadata about the run (counts, label stats)
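Because these are plain CSV/JSON files, you can also inspect a run programmatically. The file prefix below is hypothetical and just for illustration:

```python
import json
import pandas as pd

# Example for a run whose files share the (hypothetical) prefix "policy_remote_".
full = pd.read_csv("policy_remote_scored_full.csv")  # every job, with label and score
top = pd.read_csv("policy_remote_results.csv")       # filtered, top-ranked shortlist
with open("policy_remote_meta.json") as f:
    meta = json.load(f)                              # run metadata: counts, label stats

print(meta)
print(top.head())  # inspect whichever columns your schema produced
```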
🧭 Summary
The job scoring pipeline blends rule-based heuristics with LLM-powered judgment, filtered for quality and enriched with your preferences.
- ✏️ You define what to extract
- 🧠 The LLM rates jobs accordingly
- 🧹 The pipeline keeps only the best per query
Ready to customize? Continue to Designing Your Job Search Agent.