πŸ“ Architecture Overview

A mental map of the jobserp_explorer codebase

This repo is designed for both CLI and UI users, offering a unified logic core underneath multiple interfaces. Here's how the pieces fit together.


🧭 Top-Level Structure

```
jobserp_explorer/
├── app.py            → Streamlit frontend entry
├── cli.py            → CLI wrapper, routes to core scripts
├── views/            → Each tab of the Streamlit UI
├── core/             → One-shot scripts and orchestrators
├── flow_jobposting/  → LLM templates, schema, scoring logic
├── flow_pagecateg/   → LLM-based classification pipeline
├── config/           → Paths and schema loading
├── utils/            → General helpers (paths, state, etc.)
└── run_manager.py    → Manages run UID, directories, etc.
```

🔄 Dual Entry Points

  • 🖥 Streamlit UI (app.py)
    Interactive interface with tabs for query, config, results, and prompt editing.

  • 🛠 CLI Tool (cli.py)
    Run jobserp-explorer commands in a terminal: scrape, score, export, etc.

👉 Both interfaces call the same underlying orchestration logic.
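
As a rough illustration of that pattern, here is a minimal sketch; every name in it (run_pipeline, its arguments, cli_main, streamlit_main) is hypothetical, not the repo's actual API:

```python
# Minimal sketch of the shared-core pattern: both frontends delegate to
# one orchestration function. All names here are hypothetical.

def run_pipeline(query: str, max_results: int = 10) -> list[dict]:
    """Stand-in for the shared core logic both frontends call."""
    return [{"query": query, "rank": i} for i in range(max_results)]

# cli.py-style usage: parse terminal arguments, then delegate.
import argparse

def cli_main() -> None:
    parser = argparse.ArgumentParser(prog="jobserp-explorer")
    parser.add_argument("query")
    parser.add_argument("--max-results", type=int, default=10)
    args = parser.parse_args()
    print(run_pipeline(args.query, args.max_results))

# app.py-style usage: collect the same inputs from widgets, then delegate.
def streamlit_main() -> None:
    import streamlit as st
    query = st.text_input("Job query")
    if st.button("Run"):
        st.dataframe(run_pipeline(query))
```

Because the frontends own only input collection and display, new interfaces (an HTTP API, a scheduler) can reuse the same core function.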


🔗 Pipeline Flow

```mermaid
graph TD
    A[User Query] --> B[Scraper]
    B --> C["Job Scoring (LLM)"]
    C --> D["Page Classification (LLM)"]
    D --> E["Results Export / Streamlit Display"]
```

Each stage is modular and maps onto the repository layout:

  • core/*.py: execution logic
  • flow_*/: LLM schema + prompt flow
  • data/01_fetch_serps/run_*/: saved runs and metadata
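
A hedged sketch of how these stages might compose in code; the function names and signatures below are illustrative stand-ins, not the repo's real interfaces:

```python
# Illustrative composition of the four pipeline stages. Each stage reads
# the previous stage's output; names and signatures are assumptions.

def scrape_serps(query: str) -> list[dict]:
    """Stage 1: fetch search-engine result pages for the query."""
    ...

def score_jobs(pages: list[dict]) -> list[dict]:
    """Stage 2: LLM scoring of each result against the query."""
    ...

def classify_pages(scored: list[dict]) -> list[dict]:
    """Stage 3: LLM classification of each page's type."""
    ...

def export_results(rows: list[dict], run_dir: str) -> None:
    """Stage 4: persist results for the CLI or the Streamlit UI."""
    ...

def run_full_pipeline(query: str, run_dir: str) -> None:
    export_results(classify_pages(score_jobs(scrape_serps(query))), run_dir)
```

This shape lets any stage be run or re-run in isolation, which matches the one-shot core/ scripts described next.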

🧠 Core Modules

core/

  • 00_fetch_remotive_jobs.py: optional fetch of Remotive job listings
  • 01_serp_scraper.py: Google/Bing SERP scraping
  • 02_label_and_score.py: LLM-based labeling and scoring
  • 03_export_results_to_jsonl.py: export results to JSONL
  • 09_run_promptflow.py: run a single flow on its own
  • 10_run_full_pipeline.py: end-to-end launcher
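
One detail worth noting: module names that start with a digit (01_serp_scraper.py) can't be imported with a plain import statement, so a dispatcher has to load them by path. A sketch of how cli.py might route subcommands follows; this is an assumption about its design, not a description of it:

```python
# Hypothetical CLI dispatcher: map a subcommand to a numbered core script
# and execute it by path, since `import 01_serp_scraper` is not valid Python.
import runpy
from pathlib import Path

CORE_DIR = Path(__file__).parent / "core"  # assumed layout

COMMANDS = {
    "scrape": "01_serp_scraper.py",
    "score": "02_label_and_score.py",
    "export": "03_export_results_to_jsonl.py",
    "run-all": "10_run_full_pipeline.py",
}

def dispatch(command: str) -> None:
    """Run the core script mapped to a CLI subcommand as a main module."""
    script = CORE_DIR / COMMANDS[command]
    runpy.run_path(str(script), run_name="__main__")
```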

flow_jobposting/ & flow_pagecateg/

These follow a PromptFlow design pattern:

  • schema.json: defines expected output
  • prompt.jinja2: LLM prompt
  • llm_wrapper.py: prompt runner
  • flow.dag.yaml: node sequencing

Each flow is a self-contained micro-pipeline for a specific AI task.
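
Putting those four files together, a single flow invocation might look like the sketch below; run_flow and call_llm are hypothetical names, and jsonschema is an assumed dependency for validating against schema.json:

```python
# Hypothetical driver for one flow directory: render the Jinja2 prompt,
# call the model, and validate the reply against schema.json.
import json
from pathlib import Path

from jinja2 import Template
from jsonschema import validate  # assumed dependency

def run_flow(flow_dir: Path, inputs: dict, call_llm) -> dict:
    prompt = Template((flow_dir / "prompt.jinja2").read_text()).render(**inputs)
    raw = call_llm(prompt)          # the role llm_wrapper.py plays
    result = json.loads(raw)
    schema = json.loads((flow_dir / "schema.json").read_text())
    validate(instance=result, schema=schema)  # enforce the expected output
    return result
```

Validating every LLM reply against schema.json is what keeps downstream stages safe from free-form model output.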


views/ (Streamlit)

Each tab in the UI has a corresponding file:

  • query_tab.py: write and submit job queries
  • config_tab.py: tweak schema, prompt, and paths
  • results_tab.py: browse results
  • jinja_editor_tab.py: edit LLM prompts
  • json_editor_tab.py: edit output schema
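
A minimal sketch of how app.py could assemble these modules with st.tabs; the per-module render() entry point is an assumption, not the repo's documented interface:

```python
# Hypothetical tab wiring for app.py. Assumes each views/ module
# exposes a render() function; only the module names come from the docs.
import streamlit as st

from jobserp_explorer.views import (
    query_tab, config_tab, results_tab, jinja_editor_tab, json_editor_tab,
)

TABS = {
    "Query": query_tab,
    "Config": config_tab,
    "Results": results_tab,
    "Prompt Editor": jinja_editor_tab,
    "Schema Editor": json_editor_tab,
}

for tab, module in zip(st.tabs(list(TABS)), TABS.values()):
    with tab:
        module.render()  # assumed per-tab entry point
```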

run_manager.py

  • Generates timestamped run IDs
  • Creates folders in data/01_fetch_serps/run_...
  • Keeps track of what's complete (done_tracker.csv)

💡 This makes the app "session-safe": every pipeline run is tracked separately.
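
A hedged sketch of those responsibilities; the class and column names are assumptions, while the data/01_fetch_serps layout and done_tracker.csv come from the description above:

```python
# Illustrative RunManager: timestamped run ID, per-run folder, and a
# CSV tracker of completed stages. Exact names are assumptions.
import csv
from datetime import datetime
from pathlib import Path

DATA_ROOT = Path("data/01_fetch_serps")

class RunManager:
    def __init__(self) -> None:
        self.run_id = datetime.now().strftime("run_%Y%m%d_%H%M%S")
        self.run_dir = DATA_ROOT / self.run_id
        self.run_dir.mkdir(parents=True, exist_ok=True)
        self.tracker = self.run_dir / "done_tracker.csv"

    def mark_done(self, stage: str) -> None:
        """Record a completed pipeline stage for this run."""
        with self.tracker.open("a", newline="") as f:
            csv.writer(f).writerow([stage, datetime.now().isoformat()])
```

Because each run gets its own directory and tracker, concurrent or repeated runs never clobber each other, which is the "session-safe" property noted above.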


🧠 Mental Model: Frontends ↔ Shared Logic

CLI and UI are interchangeable frontends to the same backend logic.

This enables:

  • quick debugging via CLI
  • intuitive experimentation via UI
  • future API exposure or scheduling

🛠 Future Growth

This architecture scales well to:

  • ✨ scheduling flows (via cron or main.py)
  • 🔌 plugin flows (e.g., new scoring logic)
  • 📦 local multi-user runs

If you're looking to contribute, understanding this layout is a great start.