The Value of Data and AI: Competitive Advantages and Economics of Artificial Intelligence
By Siddharth Khetarpal
AI companies differ from traditional software businesses in that they have a human component, which can increase costs and decrease margins. AI companies often require humans to train the model. SaaS companies operate like other software businesses: the product is built once and sold many times. This is not the case for AI companies, where every new customer requires new engagement, with its own costs beyond typical support. The major costs of AI firms are:
1. Cloud Infrastructure: With the dominance of SaaS, the cost of running software, whether on servers or on desktops, has moved back from the customer to the vendor. As a result, most software companies pay big AWS or Azure bills every month. Training a single model can cost hundreds of thousands of dollars. This cost is often variable, because the data that goes into a model changes over time (data drift).
Executing a long series of matrix multiplications simply requires more math than reading from a database.
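A rough back-of-the-envelope calculation makes this concrete. The sketch below uses purely illustrative assumptions (layer sizes, table size, and the standard FLOP count for a dense matrix multiply), not benchmarks of any particular system:

```python
import math

# Illustrative arithmetic: floating-point operations for one dense model
# layer versus comparisons for one indexed database lookup. All sizes are
# assumptions chosen for the example.

def matmul_flops(m: int, n: int, k: int) -> int:
    """FLOPs for multiplying an (m x n) matrix by an (n x k) matrix:
    each of the m*k outputs needs n multiplies and n-1 adds (~2*n ops)."""
    return 2 * m * n * k

# A single 1024x1024 layer applied to one input vector (batch size 1):
flops_one_layer = matmul_flops(1, 1024, 1024)
# A hypothetical 50-layer model compounds that on every request:
flops_per_request = 50 * flops_one_layer

# An indexed B-tree lookup touches on the order of log2(rows) entries:
ops_per_lookup = math.ceil(math.log2(10_000_000))

print(f"model inference: ~{flops_per_request:,} FLOPs per request")
print(f"indexed lookup:  ~{ops_per_lookup} comparisons per query")
```

Even at these modest, assumed sizes, a single inference request is millions of times more arithmetic than an indexed read, which is why inference bills scale with usage in a way database reads rarely do.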
AI applications also utilize more data in the form of media such as images, audio, or video. These data incur high storage costs and often suffer from region-of-interest issues: an application may need to process a large file to find a small, relevant snippet. For example, an AI tool that diagnoses a medical condition from patient scans such as CT or MRI images may detect a condition only if it has been trained on a scanner of a particular make, and will fail on images from other scanners. More images from other scanner models may have to be fed in to improve the 'sensitivity' of the model (see Viz.ai).
AI complexity is growing at an incredible rate, and processors as they exist today will not be able to keep up. The compute used in the largest AI training runs has been doubling every 3–4 months, compared with Moore's Law, under which computing power took two years to double. This has been addressed by distributed computing, in which an application is run across many processor cores or machines.
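The gap between those two doubling rates compounds quickly. This small calculation, using the doubling periods stated above (3–4 months for large training runs, two years for Moore's Law), shows the divergence over a two-year window:

```python
# Back-of-the-envelope comparison of exponential growth rates.
# Doubling periods (in months) are taken from the text above.
AI_COMPUTE_DOUBLING = 3.5    # largest AI training runs: every 3-4 months
MOORES_LAW_DOUBLING = 24.0   # Moore's Law: roughly every two years

def growth_factor(months: float, doubling_period: float) -> float:
    """How many times a quantity multiplies over `months`,
    given its doubling period."""
    return 2 ** (months / doubling_period)

years = 2
ai = growth_factor(12 * years, AI_COMPUTE_DOUBLING)
moore = growth_factor(12 * years, MOORES_LAW_DOUBLING)
print(f"over {years} years: AI compute ~{ai:.0f}x, Moore's Law ~{moore:.0f}x")
```

Over two years, compute demand grows by roughly two orders of magnitude while single-processor performance merely doubles, which is what pushes workloads toward many processors in parallel.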
2. Humans in the mix: These systems take two forms. First, training most of today's state-of-the-art AI models involves the manual cleaning and labelling of large datasets. This process is laborious, expensive, and among the biggest barriers to more widespread adoption of AI. And, as discussed above, training doesn't end once a model is deployed: to maintain accuracy, new training data needs to be continually captured, labelled, and fed back into the system. Although techniques like drift detection and active learning can reduce the burden, anecdotal evidence suggests that many companies spend up to 10–15% of revenue on this process, usually not counting core engineering resources, and that the ongoing development work exceeds typical bug fixes and feature additions.
Second, for many tasks, especially those requiring greater cognitive reasoning, humans are often plugged into AI systems in real time. Social media companies, for example, employ thousands of human reviewers to augment AI-based moderation systems. Many autonomous vehicle systems include remote human operators, and most AI-based medical devices interface with physicians as joint decision makers. More and more startups are adopting this approach as the capabilities of modern AI systems are becoming better understood. A number of AI companies that planned to sell pure software products are increasingly bringing a services capability in-house and booking the associated costs.
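The real-time pattern described above is usually implemented as confidence-based routing: the model handles what it is sure about, and everything else falls back to a person. A minimal sketch, with hypothetical names and an assumed threshold:

```python
from dataclasses import dataclass

# A minimal human-in-the-loop routing sketch. The threshold, names, and
# reviewer stub are illustrative assumptions, not a specific product's API.

CONFIDENCE_THRESHOLD = 0.90  # assumed cutoff; tuned per task and risk level

@dataclass
class Decision:
    label: str
    confidence: float
    reviewed_by_human: bool

def ask_human_reviewer(label: str, confidence: float) -> str:
    # Stub: a real system would enqueue a review task and wait for
    # (or asynchronously collect) the reviewer's answer.
    return label

def route(label: str, confidence: float) -> Decision:
    if confidence >= CONFIDENCE_THRESHOLD:
        return Decision(label, confidence, reviewed_by_human=False)
    # Low-confidence predictions fall back to a human. This path is the
    # variable cost the text describes: it scales with request volume
    # and with how often the model is uncertain.
    human_label = ask_human_reviewer(label, confidence)
    return Decision(human_label, confidence, reviewed_by_human=True)
```

The economics follow directly from the threshold: lowering it shifts cost from human reviewers to model errors, and raising it does the reverse, which is one concrete way the cloud and human cost levers trade off against each other.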
The two costs are linked: reducing one, say cloud cost, tends to increase the other, human cost. Both can be reduced over time, though, as AI models and the underlying hardware improve.
A major issue with AI systems is edge cases: the application must handle inputs it has never come across before. This is usually because AI apps have open-ended user interfaces, so users tend to submit any kind of data for processing.
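One cheap defence is to check whether an incoming input even resembles the training data before scoring it, and to route novel inputs to a human or a log rather than silently mis-predicting. The sketch below uses hypothetical features and ranges purely for illustration:

```python
# Illustrative guard for an open-ended input pipeline: flag inputs that
# fall outside the per-feature ranges observed during training. Feature
# names and ranges here are invented for the example.

TRAINING_RANGES = {
    "image_width_px": (256, 4096),   # (min, max) seen in training data
    "file_size_mb": (0.1, 50.0),
}

def is_in_distribution(features: dict) -> bool:
    """Return False if any feature lies outside its training range,
    i.e. the input is an edge case the model has not seen before."""
    for name, value in features.items():
        lo, hi = TRAINING_RANGES.get(name, (float("-inf"), float("inf")))
        if not (lo <= value <= hi):
            return False
    return True

print(is_in_distribution({"image_width_px": 512, "file_size_mb": 2.0}))
print(is_in_distribution({"image_width_px": 9000, "file_size_mb": 2.0}))
```

Real systems use richer statistical tests than min/max ranges, but the principle is the same: detect the edge case first, then decide who (or what) handles it.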
Moats for AI Firms
The defences that software companies build rest on aspects such as network effects, high switching costs, and economies of scale.
Technical differentiation is harder to achieve and usually comes from building a complex piece of software. Data, which sits at the core of an AI system, is often owned by customers and over time becomes a commodity. The moats for AI companies therefore appear shallower.
The following are good ways of scaling and defending AI companies:
1. Eliminate Model Complexity: Firms that can use a single model for all customers, versus training a new model for every customer, have huge advantages in saved cost of goods sold (COGS).
2. Plan for High Variable Costs: Deeply understanding the distribution of data feeding the models is valuable. Treat model maintenance and human failover as first-order problems. Track down and measure your real variable costs — don’t let them hide in R&D. Make conservative unit economic assumptions in your financial models, especially during a fundraise.
3. Plan for a Change in the Tech Stack: Tech stacks are evolving, and tying an AI app to a single stack is not advisable.
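Points 1 and 2 can be made concrete with simple unit economics. Every number below is an assumption invented for the illustration, not a benchmark; the point is only the shape of the comparison between a shared model and per-customer models:

```python
# Illustrative unit economics: one shared model versus a model trained
# per customer. All dollar figures are assumptions for the example.

def gross_margin(revenue: float, cogs: float) -> float:
    """Gross margin as a fraction of revenue."""
    return (revenue - cogs) / revenue

customers = 50
revenue_per_customer = 100_000.0
revenue = customers * revenue_per_customer

# Shared model: one training cost amortized across all customers,
# plus a per-customer serving/support cost.
shared_cogs = 300_000.0 + customers * 10_000.0

# Per-customer models: training, labelling, and maintenance costs
# repeat for every new customer, so COGS scales linearly.
per_customer_cogs = customers * (30_000.0 + 10_000.0)

print(f"shared model margin:       {gross_margin(revenue, shared_cogs):.0%}")
print(f"per-customer model margin: {gross_margin(revenue, per_customer_cogs):.0%}")
```

Under these assumed numbers the shared-model business clears a software-like margin while the per-customer business looks closer to a services firm, which is exactly the distinction the financial-model advice in point 2 is meant to surface.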
To summarize, most AI systems today aren't quite software in the traditional sense. And AI businesses, as a result, don't look exactly like software businesses. They involve ongoing human support and material variable costs. They often don't scale quite as easily as we'd like. And strong defensibility, critical to the "build once / sell many times" software model, doesn't seem to come for free.
This may be good news: things like variable costs, scaling dynamics, and defensive moats are ultimately determined by markets, not individual companies. The fact that we're seeing unfamiliar patterns in the data suggests AI companies are truly something new, pushing into new markets and building massive opportunities. There are already a number of great AI companies that have successfully navigated the idea maze and built products with consistently strong performance.