I went all-in on AI on March 1, 2023 — my first conversation with ChatGPT. Everything since then has been building on that commitment.

By May 2023, I had my own API client. By March 2024, I was calling Claude Opus directly. But I was still dependent on other people’s servers. This fall, I decided to change that. Enterprises need to keep their IP secure, and the answer is running your LLM locally, on hardware you control.

In November I installed Ubuntu on a spare workstation with an 8GB GPU. Within a week I was running Llama 3.2, Gemma 2, and vision models locally. I set up embeddings for RAG. By late November I had DeepSeek V2 running, weeks before anyone was talking about it.
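At its core, the RAG setup means embedding documents and queries into the same vector space and ranking by similarity. Here is a minimal sketch of that retrieval step; the toy bag-of-words `embed` function below stands in for a real local embedding model, and all the function names are illustrative, not from any particular library:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy stand-in for a real local embedding model:
    # a sparse bag-of-words vector keyed by lowercase tokens.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Rank documents by similarity to the query; return the top k.
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

docs = [
    "GPU memory limits which model layers stay on the card",
    "embeddings map text chunks into a shared vector space",
    "the server room needs better cooling",
]
print(retrieve("vector embeddings for text", docs, k=1))
```

Swap in a real embedding model and a vector store and the shape of the code stays the same: embed, rank, take the top hits, feed them to the model as context.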

By December I was running Llama 3.1 405B — with CPU offloading. Slow, but it worked. Then I upgraded to an RTX 4090. Then vLLM for faster inference. Then LangChain, ChromaDB, Faster-Whisper for speech-to-text, and TTS for voice output.
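Why CPU offloading was unavoidable is simple arithmetic: 405 billion parameters at 4-bit quantization is roughly 190GiB of weights alone, far beyond any single consumer GPU, so most layers have to live in system RAM. A back-of-the-envelope sketch (ignoring KV cache and activation overhead):

```python
def weight_gib(params_billions: float, bits_per_param: float) -> float:
    # Approximate weight footprint in GiB:
    # (parameter count) * (bits per parameter) / 8 bits per byte.
    return params_billions * 1e9 * bits_per_param / 8 / 2**30

# Llama 3.1 405B at 4-bit quantization: ~189 GiB of weights
print(round(weight_gib(405, 4)))
# The same model in fp16: ~754 GiB
print(round(weight_gib(405, 16)))
```

Against a 24GB RTX 4090, even the 4-bit footprint leaves the bulk of the model on the CPU side, which is why it ran slowly but ran at all.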

In sixty days I went from nothing to a complete local AI system: language models, embeddings, retrieval, voice in, voice out. All running on my own hardware. No data leaving my network.
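The whole system reduces to a short loop: speech to text, retrieval for context, generation, text to speech. A stubbed orchestration sketch, where each function stands in for a real local component (Faster-Whisper, a vector store, a local LLM server, a TTS engine); the names and return values here are illustrative only:

```python
# Each stage below is a placeholder for a real local service.

def transcribe(audio: bytes) -> str:
    # Stand-in for speech-to-text (e.g. a local Whisper model).
    return "what models run locally"

def retrieve(query: str) -> list[str]:
    # Stand-in for vector-store retrieval over local documents.
    return ["Llama and Gemma run on the local GPU"]

def generate(query: str, context: list[str]) -> str:
    # Stand-in for a local LLM call with retrieved context.
    return f"Based on {len(context)} note(s): models run on local hardware."

def speak(text: str) -> bytes:
    # Stand-in for text-to-speech synthesis.
    return text.encode()

def pipeline(audio: bytes) -> bytes:
    # Voice in, voice out: the full local loop.
    query = transcribe(audio)
    ctx = retrieve(query)
    answer = generate(query, ctx)
    return speak(answer)

print(pipeline(b"<microphone audio>").decode())
```

Nothing in this loop touches the network, which is the entire point.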

Now I’m running an EPYC server with 120GB of VRAM and over 384GB of system memory, serving local models at GPT-4 levels of intelligence.

This isn’t a hobby. It’s infrastructure for the work I’m doing with clients on complex planning and strategy problems. Enterprise AI requires privacy. It requires control. You can’t upload sensitive data to someone else’s servers and hope for the best.

The commitment was made in March 2023. The infrastructure caught up in January 2025.
