During my weekly class, someone asked a very pertinent question: "Why can't we build our own LLM?"
It sounds exciting. Strategic. Even necessary. But if you zoom out and look at it practically, especially from a cost vs value lens, it usually doesn’t make sense.
So I did some research and a little maths. Let's break it down in plain English.
What Training an LLM Actually Does
When people hear “train a model,” they imagine something like a database being filled up. Upload documents --> train model --> model remembers everything.
That’s not what happens. When an LLM is trained, it doesn’t store documents, files, or business data. It doesn’t remember your SAP tables or your Excel sheets.
Instead, it does something much more abstract. It learns patterns, and compresses those patterns into numbers called weights.
Think about how humans learn. If you read a thousand finance reports, you don’t memorize each one line by line. You start recognizing patterns, how revenue is discussed, how risks are described, what a good vs bad quarter looks like.
That intuition is what an LLM builds. Except instead of intuition, it stores everything as billions of numerical weights.
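To make "patterns compressed into weights" concrete, here is a toy sketch (not a real LLM, just the same principle in miniature): we "train" a one-parameter model on a few data points, and after training, the only thing that survives is a single number. The data itself is not stored anywhere.

```python
# Toy illustration: "training" a one-parameter model with gradient descent.
# After training, the only thing the model keeps is the number `w` --
# not the data points themselves. An LLM is this same idea scaled up
# to billions of such numbers (weights).

data = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8)]  # (x, y) pairs, roughly y = 2x

w = 0.0      # the single "weight" the model learns
lr = 0.01    # learning rate

for _ in range(1000):  # gradient descent on mean squared error
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= lr * grad

print(round(w, 2))  # ~2.0 -- a compressed pattern, not a copy of the data
```

Delete the training data afterwards and the model still works, because all it ever kept was the pattern. That is exactly why an LLM cannot "look up" your SAP tables: they were never stored.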
The Reality of Building Even a “Small” LLM
Let's say you want to build a small LLM with 7 billion parameters.
Training a model like this requires an enormous amount of data, typically in the range of hundreds of billions of tokens (pieces of text). Not just raw data, but cleaned, deduplicated, high-quality data.
Then comes the compute. FLOPS (Floating-Point Operations per Second) is a measure of how fast a computer can perform arithmetic on real numbers, and training budgets are usually quoted in the total number of floating-point operations a run requires.
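A common back-of-envelope rule from the LLM scaling literature is that training compute is roughly 6 × parameters × tokens FLOPs. The token count and per-GPU throughput below are illustrative assumptions, not vendor specs, but they show why the numbers get so large:

```python
# Back-of-envelope training compute for a 7B model, using the common
# "FLOPs ~= 6 * parameters * tokens" rule of thumb.
# Token count and GPU throughput are assumed, illustrative values.

params = 7e9    # 7B parameters
tokens = 1e12   # ~1 trillion training tokens (assumed)

total_flops = 6 * params * tokens  # ~4.2e22 floating-point operations

gpu_flops_per_sec = 400e12  # assumed ~400 TFLOPS sustained per GPU
gpu_seconds = total_flops / gpu_flops_per_sec
gpu_hours = gpu_seconds / 3600

print(f"{total_flops:.1e} FLOPs, ~{gpu_hours:,.0f} GPU-hours")
```

Under these assumptions the run needs on the order of 10^22 FLOPs, which works out to roughly 30,000 GPU-hours of sustained compute.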
This isn’t something you run on a few cloud instances over a weekend.
This is the kind of workload that requires specialized GPU clusters, high-speed networking, and serious engineering.
To train a 7B-parameter model, a single GPU would have to work close to ~30K hours. Top cloud providers rent such GPUs at around $3/hour.
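The raw GPU rental bill from those two figures is straightforward to compute, and notably it is only a fraction of the total cost discussed next:

```python
# Raw GPU rental cost for one training run:
# ~30K GPU-hours at an assumed on-demand rate of $3/hour.

gpu_hours = 30_000
rate_per_hour = 3.0

raw_compute_cost = gpu_hours * rate_per_hour
print(f"${raw_compute_cost:,.0f}")  # $90,000 -- compute rental only
```

That $90K is just the meter running on the GPUs. Data pipelines, engineering salaries, failed runs, and infrastructure are what push the real bill an order of magnitude higher.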
What Does It Actually Cost?
By the time you factor everything in, hardware, data pipelines, engineering talent, and the inevitable trial-and-error, you’re realistically looking at:
~$1 million to $3 million to build a 7B model properly.
And that’s just to train it once. Not to maintain it. Not to improve it. Not even to run it at scale. A model doesn't work after a single training run; it has to be retrained at least 4-5 times, which means:
~$4 million to $5 million for a 7B model.
For perspective, GPT-4 is widely believed to have roughly 1.7 - 1.8 trillion parameters.
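The lifecycle arithmetic behind that range, taking roughly the low end of the per-run estimate and multiplying by the 4-5 training runs a model typically needs:

```python
# Rough lifecycle cost: ~$1M per full training run (the low end of the
# $1M-$3M estimate), repeated across 4-5 runs over the model's life.

per_run = 1_000_000
low, high = 4 * per_run, 5 * per_run

print(f"${low:,} - ${high:,}")  # $4,000,000 - $5,000,000
```

At the high end of the per-run estimate ($3M), the same 4-5 runs would push the lifecycle bill past $12M, and that is still before serving the model in production.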
The ROI Lens Changes Everything
If you look at this purely from a financial perspective, the difference is hard to ignore. Building your own model is a high upfront investment with uncertain returns and a long payback period.
Using existing models is a much smaller investment with faster returns and far less risk. And in most cases, the second option delivers 80–90% of the value at a fraction of the cost.
A Quick Note on India’s Push: Sarvam AI
It’s worth calling out efforts like Sarvam AI, which is building India-focused language models (often referred to informally as “India’s GPT”).
This is a great example of when building models does make sense.
Why? Because the goal isn't just to replicate existing LLMs; it's to solve uniquely Indian challenges, starting with models built for Indian languages.
Synopsis
Training an LLM doesn’t store your organisational knowledge.
It converts patterns into weights. And most companies don’t need new weights. They need better ways to turn their data into decisions.
One Simple Takeaway
Don’t build the brain. Use the brain. Focus on what it can do for your business.