# Architecture Overview

*A mental map of the `jobserp_explorer` codebase.*

This repo is designed for both CLI and UI users, offering a unified logic core underneath multiple interfaces. Here's how the pieces fit together.
## 🧠 Top-Level Structure

```text
jobserp_explorer/
├── app.py               # Streamlit frontend entry
├── cli.py               # CLI wrapper, routes to core scripts
├── views/               # Each tab of the Streamlit UI
├── core/                # One-shot scripts and orchestrators
├── flow_jobposting/     # LLM templates, schema, scoring logic
├── flow_pagecateg/      # LLM-based classification pipeline
├── config/              # Paths and schema loading
├── utils/               # General helpers (paths, state, etc.)
└── run_manager.py       # Manages run UID, directories, etc.
```
## Dual Entry Points

- 🖥 **Streamlit UI** (`app.py`)
  Interactive interface with tabs for query, config, results, and prompt editing.
- **CLI Tool** (`cli.py`)
  Run `jobserp-explorer` commands in a terminal: scrape, score, export, etc.

> Both interfaces call the same underlying orchestration logic.
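To make the shared-core idea concrete, here is a minimal sketch of the dual-frontend pattern; `run_pipeline`, `cli_main`, and `ui_main` are illustrative stand-ins, not the repo's actual function names:

```python
# Hypothetical sketch of the dual-frontend pattern; names are illustrative,
# not the actual jobserp_explorer API.

def run_pipeline(query: str) -> dict:
    """Shared orchestration used by both frontends: scrape -> score -> classify -> export."""
    return {"query": query, "results": []}  # placeholder; the real stages are omitted here

# cli.py-style entry point
def cli_main() -> None:
    import argparse
    parser = argparse.ArgumentParser(prog="jobserp-explorer")
    parser.add_argument("query")
    args = parser.parse_args()
    print(run_pipeline(args.query))

# app.py-style entry point
def ui_main() -> None:
    import streamlit as st
    query = st.text_input("Job query")
    if st.button("Run pipeline"):
        st.json(run_pipeline(query))
```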
## Pipeline Flow

```mermaid
graph TD
    A[User Query] --> B[Scraper]
    B --> C["Job Scoring (LLM)"]
    C --> D["Page Classification (LLM)"]
    D --> E["Results Export / Streamlit Display"]
```
Each stage is modular and corresponds to:

- `core/*.py`: execution logic
- `flow_*/`: LLM schema + prompt flow
- `data/01_fetch_serps/run_*/`: saved runs and metadata
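Read as code, the diagram amounts to four functions chained in order. The following is only a sketch of that shape; the real signatures live in `core/*.py`:

```python
# Sketch of the pipeline shape; function names and signatures are assumptions,
# not the actual core/*.py interfaces.
from pathlib import Path

def fetch_serps(query: str, run_dir: Path) -> Path: ...       # B: Scraper
def score_jobs(serps: Path, run_dir: Path) -> Path: ...       # C: Job Scoring (LLM)
def classify_pages(scored: Path, run_dir: Path) -> Path: ...  # D: Page Classification (LLM)
def export_results(pages: Path, run_dir: Path) -> Path: ...   # E: Results Export

def run_all(query: str, run_dir: Path) -> Path:
    """Chain the stages; each one reads and writes files under run_dir."""
    serps = fetch_serps(query, run_dir)
    scored = score_jobs(serps, run_dir)
    pages = classify_pages(scored, run_dir)
    return export_results(pages, run_dir)
```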
## 🧠 Core Modules

### `core/`

- `00_fetch_remotive_jobs.py`: optional Remotive jobs
- `01_serp_scraper.py`: Google/Bing scraping
- `02_label_and_score.py`: LLM-based scoring
- `03_export_results_to_jsonl.py`: data persistence
- `09_run_promptflow.py`: run a specific flow
- `10_run_full_pipeline.py`: end-to-end launcher
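One plausible reading of how an end-to-end launcher like `10_run_full_pipeline.py` could chain the numbered scripts (purely illustrative; check the script itself for the real arguments and ordering):

```python
# Illustrative only: one way an end-to-end launcher could invoke the numbered
# core scripts in order. Actual arguments and behavior live in the scripts themselves.
import subprocess
import sys
from pathlib import Path

CORE = Path("jobserp_explorer/core")
STEPS = [
    "01_serp_scraper.py",
    "02_label_and_score.py",
    "03_export_results_to_jsonl.py",
]

for step in STEPS:
    # Each stage is a one-shot script, so a failed stage stops the run early.
    subprocess.run([sys.executable, str(CORE / step)], check=True)
```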
### `flow_jobposting/` & `flow_pagecateg/`

These follow a PromptFlow design pattern:

- `schema.json`: defines expected output
- `prompt.jinja2`: LLM prompt
- `llm_wrapper.py`: prompt runner
- `flow.dag.yaml`: node sequencing

Each flow is a self-contained micro-pipeline for a specific AI task.
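A minimal sketch of how such a flow's pieces could fit together, assuming the wrapper renders `prompt.jinja2`, calls an LLM, and checks the result against `schema.json` (the actual `llm_wrapper.py` may work differently):

```python
# Sketch of the flow pattern described above, not the actual llm_wrapper.py.
import json
from pathlib import Path
from jinja2 import Template

def run_flow(flow_dir: Path, inputs: dict, call_llm) -> dict:
    """Render the flow's prompt, call the LLM, and sanity-check the output fields."""
    prompt = Template((flow_dir / "prompt.jinja2").read_text()).render(**inputs)
    schema = json.loads((flow_dir / "schema.json").read_text())
    output = call_llm(prompt)  # call_llm: any callable taking a prompt and returning a dict
    missing = set(schema.get("required", [])) - set(output)
    if missing:
        raise ValueError(f"LLM output is missing required fields: {missing}")
    return output
```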
### `views/` (Streamlit)

Each tab in the UI has a corresponding file:

- `query_tab.py`: write and submit job queries
- `config_tab.py`: tweak schema, prompt, and paths
- `results_tab.py`: browse results
- `jinja_editor_tab.py`: edit LLM prompts
- `json_editor_tab.py`: edit output schema
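One plausible way `app.py` could wire these modules into Streamlit tabs (illustrative; the `render()` entry point is an assumption):

```python
# Illustrative wiring of view modules into Streamlit tabs; the real app.py may differ.
import streamlit as st

from jobserp_explorer.views import (
    query_tab, config_tab, results_tab, jinja_editor_tab, json_editor_tab,
)

tabs = st.tabs(["Query", "Config", "Results", "Prompt Editor", "Schema Editor"])
views = [query_tab, config_tab, results_tab, jinja_editor_tab, json_editor_tab]

for tab, view in zip(tabs, views):
    with tab:
        view.render()  # assumes each view module exposes a render() function
```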
### `run_manager.py`

- Generates timestamped run IDs
- Creates folders in `data/01_fetch_serps/run_...`
- Keeps track of what's complete (`done_tracker.csv`)
> 💡 This makes the app "session-safe": every pipeline run is tracked separately.
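A hedged sketch of these responsibilities (the real `run_manager.py` API, ID format, and file layout may differ):

```python
# Sketch of the responsibilities listed above, not the actual run_manager.py API.
import csv
from datetime import datetime
from pathlib import Path

def new_run_dir(base: Path = Path("data/01_fetch_serps")) -> Path:
    """Create a timestamped run directory under the SERP data folder."""
    run_dir = base / f"run_{datetime.now():%Y%m%d_%H%M%S}"
    run_dir.mkdir(parents=True, exist_ok=True)
    return run_dir

def mark_done(run_dir: Path, stage: str) -> None:
    """Record a completed stage in done_tracker.csv so finished work can be skipped."""
    with (run_dir / "done_tracker.csv").open("a", newline="") as f:
        csv.writer(f).writerow([stage, datetime.now().isoformat()])
```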
## 🧠 Mental Model: Frontends → Shared Logic

CLI and UI are interchangeable frontends to the same backend logic. This enables:

- quick debugging via CLI
- intuitive experimentation via UI
- future API exposure or scheduling
## Future Growth

This architecture scales well to:

- ✨ scheduling flows (via cron or `main.py`)
- 🔌 plugin flows (e.g., new scoring logic)
- 📦 local multi-user runs

If you're looking to contribute, understanding this layout is a great start.