πŸ“ Architecture Overview

A mental map of the jobserp_explorer codebase

This repo is designed for both CLI and UI users, offering a unified logic core underneath multiple interfaces. Here's how the pieces fit together.


🧭 Top-Level Structure

```
jobserp_explorer/
├── app.py            → Streamlit frontend entry
├── cli.py            → CLI wrapper, routes to core scripts
├── views/            → Each tab of the Streamlit UI
├── core/             → One-shot scripts and orchestrators
├── flow_jobposting/  → LLM templates, schema, scoring logic
├── flow_pagecateg/   → LLM-based classification pipeline
├── config/           → Paths and schema loading
├── utils/            → General helpers (paths, state, etc.)
└── run_manager.py    → Manages run UID, directories, etc.
```

🔄 Dual Entry Points

  • 🖥 Streamlit UI (app.py)
    Interactive interface with tabs for query, config, results, and prompt editing.

  • 🛠 CLI Tool (cli.py)
    Run jobserp-explorer commands in a terminal: scrape, score, export, etc.

👉 Both interfaces call the same underlying orchestration logic.
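
As a rough illustration of that pattern, here is a minimal sketch; every name in it (run_pipeline, its arguments, cli_main, streamlit_main) is hypothetical, not the repo's actual API:

```python
# Minimal sketch of the shared-core pattern: both frontends delegate to
# one orchestration function. All names here are hypothetical.

def run_pipeline(query: str, max_results: int = 10) -> list[dict]:
    """Stand-in for the shared core logic both frontends call."""
    return [{"query": query, "rank": i} for i in range(max_results)]

# cli.py-style usage: parse terminal arguments, then delegate.
import argparse

def cli_main() -> None:
    parser = argparse.ArgumentParser(prog="jobserp-explorer")
    parser.add_argument("query")
    parser.add_argument("--max-results", type=int, default=10)
    args = parser.parse_args()
    print(run_pipeline(args.query, args.max_results))

# app.py-style usage: collect the same inputs from widgets, then delegate.
def streamlit_main() -> None:
    import streamlit as st
    query = st.text_input("Job query")
    if st.button("Run"):
        st.dataframe(run_pipeline(query))
```

Because the frontends own only input collection and display, new interfaces (an HTTP API, a scheduler) can reuse the same core function.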


🔗 Pipeline Flow

```mermaid
graph TD
    A[User Query] --> B[Scraper]
    B --> C["Job Scoring (LLM)"]
    C --> D["Page Classification (LLM)"]
    D --> E["Results Export / Streamlit Display"]
```

Each stage is modular and maps onto the repository layout:

  • core/*.py: execution logic
  • flow_*/: LLM schema + prompt flow
  • data/01_fetch_serps/run_*/: saved runs and metadata
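
A hedged sketch of how these stages might compose in code; the function names and signatures below are illustrative stand-ins, not the repo's real interfaces:

```python
# Illustrative composition of the four pipeline stages. Each stage reads
# the previous stage's output; names and signatures are assumptions.

def scrape_serps(query: str) -> list[dict]:
    """Stage 1: fetch search-engine result pages for the query."""
    ...

def score_jobs(pages: list[dict]) -> list[dict]:
    """Stage 2: LLM scoring of each result against the query."""
    ...

def classify_pages(scored: list[dict]) -> list[dict]:
    """Stage 3: LLM classification of each page's type."""
    ...

def export_results(rows: list[dict], run_dir: str) -> None:
    """Stage 4: persist results for the CLI or the Streamlit UI."""
    ...

def run_full_pipeline(query: str, run_dir: str) -> None:
    export_results(classify_pages(score_jobs(scrape_serps(query))), run_dir)
```

This shape lets any stage be run or re-run in isolation, which matches the one-shot core/ scripts described next.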

🧠 Core Modules

core/

  • 00_fetch_remotive_jobs.py: optional fetch of Remotive job listings
  • 01_serp_scraper.py: Google/Bing SERP scraping
  • 02_label_and_score.py: LLM-based labeling and scoring
  • 03_export_results_to_jsonl.py: export results to JSONL
  • 09_run_promptflow.py: run a single flow on its own
  • 10_run_full_pipeline.py: end-to-end launcher
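
One detail worth noting: module names that start with a digit (01_serp_scraper.py) can't be imported with a plain import statement, so a dispatcher has to load them by path. A sketch of how cli.py might route subcommands follows; this is an assumption about its design, not a description of it:

```python
# Hypothetical CLI dispatcher: map a subcommand to a numbered core script
# and execute it by path, since `import 01_serp_scraper` is not valid Python.
import runpy
from pathlib import Path

CORE_DIR = Path(__file__).parent / "core"  # assumed layout

COMMANDS = {
    "scrape": "01_serp_scraper.py",
    "score": "02_label_and_score.py",
    "export": "03_export_results_to_jsonl.py",
    "run-all": "10_run_full_pipeline.py",
}

def dispatch(command: str) -> None:
    """Run the core script mapped to a CLI subcommand as a main module."""
    script = CORE_DIR / COMMANDS[command]
    runpy.run_path(str(script), run_name="__main__")
```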

flow_jobposting/ & flow_pagecateg/

These follow a PromptFlow design pattern:

  • schema.json: defines expected output
  • prompt.jinja2: LLM prompt
  • llm_wrapper.py: prompt runner
  • flow.dag.yaml: node sequencing

Each flow is a self-contained micro-pipeline for a specific AI task.
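
Putting those four files together, a single flow invocation might look like the sketch below; run_flow and call_llm are hypothetical names, and jsonschema is an assumed dependency for validating against schema.json:

```python
# Hypothetical driver for one flow directory: render the Jinja2 prompt,
# call the model, and validate the reply against schema.json.
import json
from pathlib import Path

from jinja2 import Template
from jsonschema import validate  # assumed dependency

def run_flow(flow_dir: Path, inputs: dict, call_llm) -> dict:
    prompt = Template((flow_dir / "prompt.jinja2").read_text()).render(**inputs)
    raw = call_llm(prompt)          # the role llm_wrapper.py plays
    result = json.loads(raw)
    schema = json.loads((flow_dir / "schema.json").read_text())
    validate(instance=result, schema=schema)  # enforce the expected output
    return result
```

Validating every LLM reply against schema.json is what keeps downstream stages safe from free-form model output.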


views/ (Streamlit)

Each tab in the UI has a corresponding file:

  • query_tab.py: write and submit job queries
  • config_tab.py: tweak schema, prompt, and paths
  • results_tab.py: browse results
  • jinja_editor_tab.py: edit LLM prompts
  • json_editor_tab.py: edit output schema
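
A minimal sketch of how app.py could assemble these modules with st.tabs; the per-module render() entry point is an assumption, not the repo's documented interface:

```python
# Hypothetical tab wiring for app.py. Assumes each views/ module
# exposes a render() function; only the module names come from the docs.
import streamlit as st

from jobserp_explorer.views import (
    query_tab, config_tab, results_tab, jinja_editor_tab, json_editor_tab,
)

TABS = {
    "Query": query_tab,
    "Config": config_tab,
    "Results": results_tab,
    "Prompt Editor": jinja_editor_tab,
    "Schema Editor": json_editor_tab,
}

for tab, module in zip(st.tabs(list(TABS)), TABS.values()):
    with tab:
        module.render()  # assumed per-tab entry point
```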

run_manager.py

  • Generates timestamped run IDs
  • Creates folders in data/01_fetch_serps/run_...
  • Keeps track of what's complete (done_tracker.csv)

💡 This makes the app "session-safe": every pipeline run is tracked separately.
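
A hedged sketch of those responsibilities; the class and column names are assumptions, while the data/01_fetch_serps layout and done_tracker.csv come from the description above:

```python
# Illustrative RunManager: timestamped run ID, per-run folder, and a
# CSV tracker of completed stages. Exact names are assumptions.
import csv
from datetime import datetime
from pathlib import Path

DATA_ROOT = Path("data/01_fetch_serps")

class RunManager:
    def __init__(self) -> None:
        self.run_id = datetime.now().strftime("run_%Y%m%d_%H%M%S")
        self.run_dir = DATA_ROOT / self.run_id
        self.run_dir.mkdir(parents=True, exist_ok=True)
        self.tracker = self.run_dir / "done_tracker.csv"

    def mark_done(self, stage: str) -> None:
        """Record a completed pipeline stage for this run."""
        with self.tracker.open("a", newline="") as f:
            csv.writer(f).writerow([stage, datetime.now().isoformat()])
```

Because each run gets its own directory and tracker, concurrent or repeated runs never clobber each other, which is the "session-safe" property noted above.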


🧠 Mental Model: Frontends ↔ Shared Logic

CLI and UI are interchangeable frontends to the same backend logic.

This enables:

  • quick debugging via CLI
  • intuitive experimentation via UI
  • future API exposure or scheduling

🛠 Future Growth

This architecture scales well to:

  • ✨ scheduling flows (via cron or main.py)
  • 🔌 plugin flows (e.g., new scoring logic)
  • 📦 local multi-user runs

If you're looking to contribute, understanding this layout is a great start.