JONESTECH_SYS
DEPLOY
DOC:003 // DEPLOYMENT SERVICES & DELIVERY

What We
Build For You

From a single model on your workstation to a multi-GPU inference cluster — on-site or remote, we handle the full stack. You get a running system, not a tutorial.

ON-SITE
We Come To You

For organisations that need physical access, hands-on setup, or simply prefer someone in the room. We travel nationwide to deploy directly on your hardware.

NATIONWIDE TRAVEL
01
REMOTE
Fully Remote Deployment

Everything delivered over SSH and screenshare. Faster to schedule, identical outcome. Most deployments complete in a single session — you're running by end of day.

SAME-DAY AVAILABLE
02
WHAT WE DEPLOY

Four Core
Service Areas

Every engagement starts with understanding your use case and hardware. We deploy the right tool for the job — not whatever's trending. Each service is available on-site or remotely.

01
CORE SERVICE
ON-SITE + REMOTE
Local LLM Deployment

Full-stack installation of open-weight language models on your hardware. Model selection, quantization tuning, inference server setup, and an OpenAI-compatible API layer — done and tested before handover.

Supports Llama 3, Mistral, Qwen 2.5, Phi-4, Gemma 3
Backends: Ollama · llama.cpp · vLLM · LM Studio
Single machine to multi-GPU cluster
Windows, Linux, macOS — all supported
OpenAI-compatible endpoint — your apps work unchanged
DEPLOY NOW →
02
ADVANCED
ON-SITE + REMOTE
RAG Pipeline Build

Connect your local LLM to your documents, databases, or knowledge bases. A full Retrieval-Augmented Generation pipeline — vector DB, embeddings, ingestion, and query interface — entirely on-premise.

Vector DBs: ChromaDB · Qdrant · pgvector
Local embedding models (nomic-embed, bge)
Document ingestion: PDF, Markdown, SQL, web
LangChain / LlamaIndex integration
Chat UI or headless API — your choice
BUILD A RAG →
03
PERFORMANCE
REMOTE
Fine-tune & Adapt

Custom LoRA / QLoRA fine-tuning on your own data. Adapt a general model to your domain — legal, medical, customer support, code — with targeted training that never leaves your infrastructure.

LoRA, QLoRA, full fine-tune options
Unsloth · Axolotl · TRL training stacks
Your training data never leaves your machine
Eval benchmarking before and after
Merged model ready for production deployment
FINE-TUNE →
04
INTEGRATION
ON-SITE + REMOTE
Agent Frameworks

Turn your local LLM into an autonomous agent. Tool calling, multi-step reasoning, memory systems — all orchestrated on-premise with open-source frameworks and your existing APIs.

OpenAI-compatible function / tool calling
LangGraph · CrewAI · AutoGen
Local web search, code execution, APIs
MCP (Model Context Protocol) support
Memory: short-term context + vector long-term
BUILD AGENTS →
HOW IT GOES

Every Engagement,
Same Playbook

Whether you're on-site or remote, the process is the same. We've run enough deployments to know exactly where things go wrong — and how to skip those parts entirely.

01
DISCOVERY CALL

30 minutes. Use case, hardware specs, any existing stack. We scope exactly what's needed — no upsell.

REMOTE ONLY
02
HARDWARE AUDIT

GPU, VRAM, RAM, storage profile. We match your hardware to the right model and quantization level.

REMOTE OR ON-SITE
03
DEPLOYMENT

Model pull, server config, API layer, monitoring. We do it live — you watch or we walk you through.

REMOTE OR ON-SITE
04
TESTING

Throughput benchmarks, TTFT measurements, endpoint verification. We don't leave until it's confirmed working.

LIVE SESSION
05
HANDOVER

Full documentation, runbooks, and a recorded session. Your team can manage it — or we handle ongoing support.

REMOTE OR ON-SITE
PACKAGES

Pick Your
Engagement

STARTER
Single Deploy

One model, one machine, fully working. Good for individuals and small teams trying local LLMs for the first time.

  • Hardware assessment
  • Model selection + quantization
  • Inference server setup (Ollama / llama.cpp)
  • OpenAI-compatible API endpoint
  • Basic documentation
  • RAG pipeline
  • Ongoing support
GET STARTED →
01
ONGOING
Managed Support

Retained support for teams who want someone on call. Model updates, troubleshooting, and new feature deployments as needed.

  • Priority response to issues
  • Model version updates
  • Performance tuning
  • New service deployments
  • Monthly check-in calls
  • Access to new tooling as it lands
  • On-site visits on request
ENQUIRE →
03
TOOLS WE USE

The Stack Behind Every Job

INFERENCE SERVERS
Ollama
llama.cpp
vLLM
LM Studio (local GUI)
TabbyAPI
MODELS
Llama 3.3 70B / 8B
Qwen 2.5 Coder 32B
Mistral Small 22B
Phi-4 14B
Gemma 3 · DeepSeek R2
RAG & ORCHESTRATION
LangChain · LlamaIndex
ChromaDB · Qdrant
pgvector (Postgres)
LangGraph · CrewAI
MCP (Model Context Protocol)
FINE-TUNING
Unsloth
Axolotl
TRL (HuggingFace)
LoRA · QLoRA · Full FT
GGUF / AWQ / GPTQ export
NEXT STEP

Tell Us What
You're Building

On-site or remote, one machine or a cluster — start with a quick message. We'll come back with a clear scope and timeline.