# Architecture Overview

A mental map of the `jobserp_explorer` codebase.

This repo is designed for both CLI and UI users, offering a unified logic core underneath multiple interfaces. Here's how the pieces fit together.

## Top-Level Structure

```
jobserp_explorer/
├── app.py              ← Streamlit frontend entry
├── cli.py              ← CLI wrapper, routes to core scripts
├── views/              ← Each tab of the Streamlit UI
├── core/               ← One-shot scripts and orchestrators
├── flow_jobposting/    ← LLM templates, schema, scoring logic
├── flow_pagecateg/     ← LLM-based classification pipeline
├── config/             ← Paths and schema loading
├── utils/              ← General helpers (paths, state, etc.)
└── run_manager.py      ← Manages run UID, directories, etc.
```

## Dual Entry Points

- **Streamlit UI (`app.py`)**
  Interactive interface with tabs for query, config, results, and prompt editing.
- **CLI Tool (`cli.py`)**
  Run `jobserp-explorer` commands in a terminal: scrape, score, export, etc.

Both interfaces call the same underlying orchestration logic.
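
As a rough sketch of that idea (the function names, arguments, and CLI flags below are illustrative, not the repo's actual API), both frontends reduce to collecting parameters and calling one shared function:

```python
# Illustrative sketch -- names are assumptions, not the repo's real API.
import argparse


def run_pipeline(query: str, max_results: int = 20) -> None:
    """Stand-in for the shared core logic (scrape -> score -> classify -> export)."""
    print(f"Running pipeline for {query!r} with up to {max_results} results")


def cli_main() -> None:
    """Roughly what a CLI wrapper like cli.py could do: parse args, call the shared function."""
    parser = argparse.ArgumentParser(prog="jobserp-explorer")
    parser.add_argument("query", help="Job search query")
    parser.add_argument("--max-results", type=int, default=20)
    args = parser.parse_args()
    run_pipeline(args.query, args.max_results)


# app.py does the same thing conceptually, but from Streamlit widgets:
#   query = st.text_input("Job query")
#   if st.button("Run"):
#       run_pipeline(query)
```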

## Pipeline Flow

```mermaid
graph TD
    A[User Query] --> B[Scraper]
    B --> C["Job Scoring (LLM)"]
    C --> D["Page Classification (LLM)"]
    D --> E["Results Export / Streamlit Display"]
```
Each stage is modular and corresponds to:

- `core/*.py`: execution logic
- `flow_*/`: LLM schema + prompt flow
- `data/01_fetch_serps/run_*/`: saved runs and metadata
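
The same flow, sketched as plain Python with placeholder stage functions (the names, signatures, and dummy return values are assumptions; the real logic lives in the numbered `core/` scripts and the `flow_*/` folders):

```python
# Placeholder implementations only -- the real stages live in core/*.py and
# flow_*/, with different names and signatures.
from typing import Any

Row = dict[str, Any]


def scrape_serps(query: str) -> list[Row]:           # maps to 01_serp_scraper.py
    return [{"query": query, "url": "https://example.com/job/123"}]


def score_postings(rows: list[Row]) -> list[Row]:    # maps to 02_label_and_score.py (LLM)
    return [{**row, "score": 0.5} for row in rows]


def classify_pages(rows: list[Row]) -> list[Row]:    # maps to flow_pagecateg/ (LLM)
    return [{**row, "page_type": "job_posting"} for row in rows]


def export_results(rows: list[Row]) -> None:         # maps to 03_export_results_to_jsonl.py
    for row in rows:
        print(row)                                    # JSONL write in the real code


def run_pipeline(query: str) -> None:
    export_results(classify_pages(score_postings(scrape_serps(query))))
```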

## Core Modules

### `core/`

- `00_fetch_remotive_jobs.py`: optional Remotive jobs
- `01_serp_scraper.py`: Google/Bing scraping
- `02_label_and_score.py`: LLM-based scoring
- `03_export_results_to_jsonl.py`: data persistence
- `09_run_promptflow.py`: run a specific flow
- `10_run_full_pipeline.py`: end-to-end launcher
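
A hypothetical view of how an end-to-end launcher such as `10_run_full_pipeline.py` could chain the numbered one-shot scripts (the `--run-id` flag, script path, and stage list are assumptions for illustration; the real launcher may import the stages directly instead of shelling out):

```python
# Hypothetical launcher sketch -- not the actual 10_run_full_pipeline.py.
import subprocess
import sys
from pathlib import Path

CORE_DIR = Path("jobserp_explorer/core")  # assumed location of the numbered scripts

STAGES = [
    "01_serp_scraper.py",
    "02_label_and_score.py",
    "03_export_results_to_jsonl.py",
]


def run_stage(script: str, run_id: str) -> None:
    """Run one numbered core script as a subprocess; stop the chain on failure."""
    subprocess.run(
        [sys.executable, str(CORE_DIR / script), "--run-id", run_id],  # flag name is assumed
        check=True,
    )


def run_all(run_id: str) -> None:
    for script in STAGES:
        run_stage(script, run_id)
```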

### `flow_jobposting/` & `flow_pagecateg/`

These follow a PromptFlow design pattern:

- `schema.json`: defines expected output
- `prompt.jinja2`: LLM prompt
- `llm_wrapper.py`: prompt runner
- `flow.dag.yaml`: node sequencing

Each flow is a self-contained micro-pipeline for a specific AI task.
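
A minimal sketch of that pattern, assuming the wrapper renders `prompt.jinja2` and checks the LLM's JSON reply against `schema.json` (the function names and the `required`-keys check are illustrative, not the actual `llm_wrapper.py` API):

```python
# Sketch only -- the real llm_wrapper.py will differ in naming and in how the
# LLM client and flow.dag.yaml are wired in.
import json
from pathlib import Path

from jinja2 import Template

FLOW_DIR = Path("flow_jobposting")  # the same layout applies to flow_pagecateg/


def build_prompt(**variables) -> str:
    """Render the flow's prompt.jinja2 with its input variables."""
    template = Template((FLOW_DIR / "prompt.jinja2").read_text())
    return template.render(**variables)


def parse_and_check(raw_llm_output: str) -> dict:
    """Parse the LLM's JSON reply and check it has the keys schema.json requires."""
    schema = json.loads((FLOW_DIR / "schema.json").read_text())
    result = json.loads(raw_llm_output)
    missing = set(schema.get("required", [])) - set(result)
    if missing:
        raise ValueError(f"LLM output is missing required fields: {missing}")
    return result
```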

### `views/` (Streamlit)

Each tab in the UI has a corresponding file:

- `query_tab.py`: write and submit job queries
- `config_tab.py`: tweak schema, prompt, and paths
- `results_tab.py`: browse results
- `jinja_editor_tab.py`: edit LLM prompts
- `json_editor_tab.py`: edit output schema
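
One plausible way `app.py` could wire these views together, assuming each module exposes a `render()` function (the import path, tab labels, and `render()` convention are all assumptions):

```python
# Simplified wiring sketch -- the import path, tab labels, and render()
# convention are assumptions about how app.py uses views/.
import streamlit as st

from jobserp_explorer.views import (
    config_tab,
    jinja_editor_tab,
    json_editor_tab,
    query_tab,
    results_tab,
)

tabs = st.tabs(["Query", "Config", "Results", "Prompt Editor", "Schema Editor"])

with tabs[0]:
    query_tab.render()        # each views/*.py is assumed to expose render()
with tabs[1]:
    config_tab.render()
with tabs[2]:
    results_tab.render()
with tabs[3]:
    jinja_editor_tab.render()
with tabs[4]:
    json_editor_tab.render()
```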

### `run_manager.py`

- Generates timestamped run IDs
- Creates folders in `data/01_fetch_serps/run_...`
- Keeps track of what's complete (`done_tracker.csv`)

This makes the app "session-safe": every pipeline run is tracked separately.
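
A minimal sketch of that idea, assuming a `run_YYYYMMDD_HHMMSS` naming scheme and a simple append-only CSV (the real `run_manager.py` API will differ):

```python
# Minimal sketch of the run-tracking idea; the real run_manager.py API differs.
import csv
from datetime import datetime
from pathlib import Path

DATA_ROOT = Path("data/01_fetch_serps")


def new_run_dir() -> Path:
    """Create a timestamped run folder, e.g. data/01_fetch_serps/run_20250101_120000."""
    run_id = datetime.now().strftime("run_%Y%m%d_%H%M%S")
    run_dir = DATA_ROOT / run_id
    run_dir.mkdir(parents=True, exist_ok=False)
    return run_dir


def mark_done(run_dir: Path, stage: str) -> None:
    """Append a completed stage name to the run's done_tracker.csv."""
    with open(run_dir / "done_tracker.csv", "a", newline="") as handle:
        csv.writer(handle).writerow([stage, datetime.now().isoformat()])
```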

## Mental Model: Frontends → Shared Logic
CLI and UI are interchangeable frontends to the same backend logic.
This enables:
- quick debugging via CLI
- intuitive experimentation via UI
- future API exposure or scheduling

## Future Growth

This architecture scales well to:

- scheduling flows (via cron or `main.py`)
- plugin flows (e.g., new scoring logic)
- local multi-user runs

If you're looking to contribute, understanding this layout is a great start.