Services — Jonestech

ON-SITE

We Come To You

For organisations that need physical access, hands-on setup, or simply prefer someone in the room. We travel nationwide to deploy directly on your hardware.

Physical installation on your servers or workstations
On-premises network configuration and firewall setup
Live handover and walkthrough with your team
Ideal for air-gapped or high-security environments
Same-day testing and sign-off before we leave

NATIONWIDE TRAVEL

REMOTE

Fully Remote Deployment

Everything delivered over SSH and screenshare. Faster to schedule, identical outcome. Most deployments complete in a single session — you're running by end of day.

Secure SSH access or screenshare session
Full stack deployed and tested live
Recorded session available for your team
Ideal for teams already comfortable with remote ops
Fastest path from zero to running inference

SAME-DAY AVAILABLE

WHAT WE DEPLOY

Four Core
Service Areas

Every engagement starts with understanding your use case and hardware. We deploy the right tool for the job — not whatever's trending. Each service is available on-site or remotely.

CORE SERVICE

ON-SITE + REMOTE

Local LLM Deployment

Full-stack installation of open-weight language models on your hardware. Model selection, quantization tuning, inference server setup, and an OpenAI-compatible API layer — done and tested before handover.

Supports Llama 3, Mistral, Qwen 2.5, Phi-4, Gemma 3

Backends: Ollama · llama.cpp · vLLM · LM Studio

Single machine to multi-GPU cluster

Windows, Linux, macOS — all supported

OpenAI-compatible endpoint — your apps work unchanged

DEPLOY NOW →

ADVANCED

ON-SITE + REMOTE

RAG Pipeline Build

Connect your local LLM to your documents, databases, or knowledge bases. A full Retrieval-Augmented Generation pipeline — vector DB, embeddings, ingestion, and query interface — entirely on-premise.

Vector DBs: ChromaDB · Qdrant · pgvector

Local embedding models (nomic-embed, bge)

Document ingestion: PDF, Markdown, SQL, web

LangChain / LlamaIndex integration

Chat UI or headless API — your choice

BUILD A RAG →

PERFORMANCE

REMOTE

Fine-tune & Adapt

Custom LoRA / QLoRA fine-tuning on your own data. Adapt a general model to your domain — legal, medical, customer support, code — with targeted training that never leaves your infrastructure.

LoRA, QLoRA, full fine-tune options

Unsloth · Axolotl · TRL training stacks

Your training data never leaves your machine

Eval benchmarking before and after

Merged model ready for production deployment

FINE-TUNE →

INTEGRATION

ON-SITE + REMOTE

Agent Frameworks

Turn your local LLM into an autonomous agent. Tool calling, multi-step reasoning, memory systems — all orchestrated on-premise with open-source frameworks and your existing APIs.

OpenAI-compatible function / tool calling

LangGraph · CrewAI · AutoGen

Local web search, code execution, APIs

MCP (Model Context Protocol) support

Memory: short-term context + vector long-term

BUILD AGENTS →

HOW IT GOES

Every Engagement,
Same Playbook

Whether you're on-site or remote, the process is the same. We've run enough deployments to know exactly where things go wrong — and how to skip those parts entirely.

◈

DISCOVERY CALL

30 minutes. Use case, hardware specs, any existing stack. We scope exactly what's needed — no upsell.

REMOTE ONLY

▲

HARDWARE AUDIT

GPU, VRAM, RAM, storage profile. We match your hardware to the right model and quantization level.

REMOTE OR ON-SITE

◎

DEPLOYMENT

Model pull, server config, API layer, monitoring. We do it live — you watch or we walk you through.

REMOTE OR ON-SITE

✦

TESTING

Throughput benchmarks, TTFT measurements, endpoint verification. We don't leave until it's confirmed working.

LIVE SESSION

→

HANDOVER

Full documentation, runbooks, and a recorded session. Your team can manage it — or we handle ongoing support.

REMOTE OR ON-SITE

PACKAGES

Pick Your
Engagement

STARTER

Single Deploy

One model, one machine, fully working. Good for individuals and small teams trying local LLMs for the first time.

Hardware assessment
Model selection + quantization
Inference server setup (Ollama / llama.cpp)
OpenAI-compatible API endpoint
Basic documentation
RAG pipeline
Ongoing support

GET STARTED →

FULL BUILD

Production Stack

Complete deployment with RAG, monitoring, and full handover. For teams ready to run AI in production.

Everything in Single Deploy
RAG pipeline (vector DB + embeddings)
Document ingestion pipeline
Load balancing + auto-restart
Monitoring and alerting setup
Full documentation + recorded session
1 month post-deployment support

SCOPE A PROJECT →

ONGOING

Managed Support

Retained support for teams who want someone on call. Model updates, troubleshooting, and new feature deployments as needed.

Priority response to issues
Model version updates
Performance tuning
New service deployments
Monthly check-in calls
Access to new tooling as it lands
On-site visits on request

ENQUIRE →

What We
Build For You

Four Core
Service Areas

Every Engagement,
Same Playbook

Pick Your
Engagement

The Stack Behind Every Job

Tell Us What
You're Building

What We Build For You

Four CoreService Areas

Every Engagement,Same Playbook

Pick YourEngagement

The Stack Behind Every Job

Tell Us WhatYou're Building

What We
Build For You

Four Core
Service Areas

Every Engagement,
Same Playbook

Pick Your
Engagement

Tell Us What
You're Building