The Value of Data and AI: Competitive Advantages and Economics of Machine Learning
By Siddharth Khetarpal
Data exhibits decreasing marginal returns to scale, like any other factor of production. The same principle applies to machine learning. One example is Stanford dog breed classification and how its accuracy behaves as the training data grows: accuracy improves as the number of training images increases, but at a decreasing rate.
The figure below shows how the error rate in the ImageNet competition has declined over the last several years. An important fact about this competition is that the number of training and test observations has been fixed during this period. This means that the improved performance of the winning systems cannot be due to sample size, since it has been constant. Other factors, such as improved algorithms, improved hardware, and improved expertise, have been much more important than the number of observations in the training data.
Firm Size and Boundaries: Will ML increase or decrease minimum efficient scale? The answer depends on the relationship between fixed costs and variable costs. If firms have to spend significant amounts to develop customized solutions to their problems, we might expect that fixed costs are significant and firm size must be large to amortize those costs. On the other hand, if firms can buy off-the-shelf services from cloud vendors, we would expect fixed costs and minimum efficient scale to be small.
Suppose, for example, that an oil change service would like to greet returning customers by name. They can accomplish this using a database that joins license plate numbers with customer names and service history. It would be prohibitively expensive for a small provider to write the software to enable this, so only the large chains could provide such services. On the other hand, a third party might develop a smartphone app that could provide this service for a nominal cost. This service might allow minimum efficient scale to decrease. The same considerations apply for other small service providers such as restaurants, dry cleaners, or convenience stores.
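The fixed-cost argument in this section can be made concrete with a small calculation. The sketch below is only an illustration: it assumes a simple cost structure in which average cost equals the fixed cost spread over output plus a constant variable cost per unit, and it uses a hypothetical threshold (average cost within a chosen fraction of the variable cost) as a stand-in for minimum efficient scale. The function names and all numbers are assumptions, not figures from the article.

```python
import math


def average_cost(quantity, fixed_cost, variable_cost):
    """Average cost per unit: fixed cost spread over output, plus unit variable cost."""
    if quantity <= 0:
        raise ValueError("quantity must be positive")
    return fixed_cost / quantity + variable_cost


def scale_needed(tolerance, fixed_cost, variable_cost):
    """Smallest output at which average cost is within `tolerance` (a fraction)
    of the variable cost. A hypothetical proxy for minimum efficient scale."""
    # fixed_cost / q <= tolerance * variable_cost  =>  q >= fixed_cost / (tolerance * variable_cost)
    return math.ceil(fixed_cost / (tolerance * variable_cost))


if __name__ == "__main__":
    # Hypothetical numbers: custom in-house software vs. an off-the-shelf app.
    custom = scale_needed(tolerance=0.5, fixed_cost=100_000, variable_cost=2.0)
    off_the_shelf = scale_needed(tolerance=0.5, fixed_cost=500, variable_cost=2.0)
    print("customers needed, custom build:", custom)
    print("customers needed, off-the-shelf:", off_the_shelf)
```

Under these assumed numbers, cutting the fixed cost by a factor of 200 cuts the required scale by the same factor, which is the mechanism by which off-the-shelf services could let small providers compete.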
Pricing: The availability of cloud computing and machine learning offers many opportunities to adjust prices based on customer characteristics. Auctions and other novel pricing mechanisms can be implemented easily. The fact that prices can be adjusted so easily implies that various forms of differential pricing can be implemented. However, it must be remembered that customers are not helpless; they can also avail themselves of enhanced search capabilities. For example, airlines can adopt strategies that tie purchase price to departure date, but services can be created that reverse-engineer the airline algorithms and advise consumers about when to purchase.
Price Differentiation: Traditionally, price differentiation has been classified into three categories:
1. First degree (personalized),
2. Second degree (versioning: same price menu for all consumers, but prices vary with respect to quantity or quality),
3. Third degree (group pricing based on membership)
Fully personalized pricing is unrealistic, but prices based on fine-grained features of consumers may well be feasible, so the line between third degree and first degree is becoming somewhat blurred. Shiller and Dube have investigated how much consumer surplus can be extracted using ML models.
Second-degree price discrimination can also be viewed as pricing by group membership, but recognizing the endogeneity of group membership and behavior. Machine learning using observational data will be of limited help in designing such pricing schemes. However, reinforcement learning techniques such as multi-armed bandits may be useful.
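To illustrate how a multi-armed bandit could be applied to pricing, here is a minimal epsilon-greedy sketch. It is an assumption-laden toy, not a method from the cited work: each arm is a single candidate price (in practice an arm could be a whole price menu), the reward is the revenue from one customer, and the `buy_probability` demand function is hypothetical and supplied by the caller.

```python
import random


class EpsilonGreedyPricer:
    """Epsilon-greedy bandit: each candidate price is an arm, revenue is the reward."""

    def __init__(self, prices, epsilon=0.1, seed=0):
        self.prices = list(prices)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = [0] * len(self.prices)
        self.mean_revenue = [0.0] * len(self.prices)

    def choose(self):
        """Return the index of the price to offer next."""
        # Try every price once before exploiting.
        for arm, n in enumerate(self.counts):
            if n == 0:
                return arm
        # Explore a random price with probability epsilon.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.prices))
        # Otherwise exploit the price with the best observed mean revenue.
        return max(range(len(self.prices)), key=lambda a: self.mean_revenue[a])

    def update(self, arm, revenue):
        """Incrementally update the running mean revenue for one arm."""
        self.counts[arm] += 1
        self.mean_revenue[arm] += (revenue - self.mean_revenue[arm]) / self.counts[arm]

    def best_price(self):
        best = max(range(len(self.prices)), key=lambda a: self.mean_revenue[a])
        return self.prices[best]


def simulate(pricer, buy_probability, rounds, seed=1):
    """Offer prices to simulated customers; buy_probability(price) is hypothetical demand."""
    rng = random.Random(seed)
    for _ in range(rounds):
        arm = pricer.choose()
        price = pricer.prices[arm]
        sold = rng.random() < buy_probability(price)
        pricer.update(arm, price if sold else 0.0)
    return pricer


if __name__ == "__main__":
    # Hypothetical linear demand: the chance of a sale falls as the price rises.
    pricer = EpsilonGreedyPricer(prices=[4, 8, 12, 16], epsilon=0.1)
    simulate(pricer, lambda p: max(0.0, 1.0 - p / 20.0), rounds=5000)
    print("estimated best price:", pricer.best_price())
```

The point of the design is that the seller learns from its own pricing experiments rather than from passively observed data, which is why bandits can help where observational ML cannot.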
According to most non-economists, the only thing worse than price differentiation is price discrimination! However, most economists recognize that price differentiation is often beneficial from both an efficiency and an equity point of view. Price differentiation allows markets to be served that would otherwise not be served, and often those unserved markets involve low-income consumers.
Returns to scale: There are at least 3 types of returns to scale that could be relevant for machine learning.
1. Classical supply side returns to scale (decreasing average cost)
2. Demand side returns to scale (network effects)
3. Learning by doing (improvement in quality or decrease in cost due to experience).
1. Supply side returns to scale: It might seem like software is the paradigm case of supply side returns to scale: there is a large fixed cost of developing the software and a small variable cost of distributing it. But if we compare this admittedly simple model to the real world, there is an immediate problem. Software development is not a one-time operation; almost all software is updated and improved over time. Mobile phone operating systems are a case in point: there are often monthly releases of bug fixes and security improvements, coupled with yearly releases of major upgrades. Note how different this is from physical goods. True, there are bug fixes for mechanical problems in a car, but the capabilities of the car remain more or less constant over time. A notable exception is Tesla, whose cars periodically receive updated operating systems.
As more and more products become network enabled, we can expect to see this happen more often. Your TV, which used to be a static device, will be able to learn new tricks. Many TVs now have voice interaction, and we can expect that machine learning will continue to advance in this area. This means that your TV will become more and more adept at communication and will likely become better at discerning your preferences for various sorts of content. The same goes for other appliances: their capabilities will no longer be fixed at the time of sale, but will evolve over time. This raises interesting economic questions about the distinction between goods and services. When someone buys a mobile phone, a TV, or a car, they are not just buying a static good, but rather a device that allows them to access a whole panoply of services. This, in turn, raises a whole range of questions about pricing and product design.
2. Demand side returns to scale: Demand side economies of scale, or network effects, come in different varieties. There are direct network effects, where the value of a product or service to an incremental adopter depends on the total number of other adopters; and there are indirect network effects where there are two or more types of complementary adopters. Users prefer an operating system with lots of applications and developers prefer operating systems with lots of users. Direct network effects could be relevant to choices of programming languages used in machine learning systems, but the major languages are open source. Similarly, it is possible that prospective users might prefer cloud providers that have a lot of other users.
However, it seems that this is no different from many other industries. Automobile purchasers may well have a preference for popular brands, since dealers, repair shops, parts, and mechanics are readily available. There is a concept circulating among lawyers and regulators called “data network effects”. The model is that a firm with more customers can collect more data and use this data to improve its product. This is often true (the prospect of improving operations is what makes ML attractive), but it is hardly novel. And it is certainly not a network effect! It is essentially a supply-side effect known as “learning by doing” (also known as the “experience curve” or “learning curve”). For a complete description of network effects, refer to NFX’s Network Effects Bible.
3. Learning by doing: Learning by doing is generally modelled as a process where unit costs decline (or quality increases) as cumulative production or investment increases. The rough rule of thumb is that a doubling of output leads to a unit cost decline of 10 to 25 percent. Though the reasons for this efficiency increase are not firmly established, the important point is that learning by doing requires intention and investment by the firm, as described in Stiglitz and Greenwald. This distinguishes learning-by-doing from demand-side or supply-side network effects, which are typically thought to be more or less automatic. That is not really true either; entire books have been written about strategic behaviour in the presence of network effects. But there is an important difference between learning-by-doing and so-called “data network effects”: a company can have huge amounts of data, but if it does nothing with the data, it produces no value.
The problem is not a lack of resources but a lack of skills. A company that has data but no one to analyse it is in a poor position to take advantage of that data. If there is no existing expertise internally, it is hard to make intelligent choices about what skills are needed and how to find and hire people with those skills. Hiring good people has always been a critical issue for competitive advantage. But since the widespread availability of data is comparatively recent, this problem is particularly acute. Automobile companies can hire people who know how to build automobiles, since that is part of their core competency. They may or may not have sufficient internal expertise to hire good data scientists, which is why we can expect to see heterogeneity in productivity as this new skill percolates through the labor markets.
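The rule of thumb quoted for learning by doing (a 10 to 25 percent unit cost decline per doubling of output) corresponds to a standard power-law experience curve. The sketch below assumes that functional form, with cost measured against cumulative output; the function name and the example numbers are illustrative assumptions rather than anything estimated in the article.

```python
import math


def unit_cost(cumulative_output, first_unit_cost, decline_per_doubling):
    """Unit cost on a constant-percentage experience curve.

    Cost falls by `decline_per_doubling` (e.g. 0.2 for 20 percent) each time
    cumulative output doubles: c(n) = c(1) * n ** log2(1 - decline).
    """
    if cumulative_output < 1:
        raise ValueError("cumulative_output must be at least 1")
    if not 0.0 <= decline_per_doubling < 1.0:
        raise ValueError("decline_per_doubling must be in [0, 1)")
    exponent = math.log2(1.0 - decline_per_doubling)  # zero or negative
    return first_unit_cost * cumulative_output ** exponent


if __name__ == "__main__":
    # Compare the two ends of the 10-25 percent range over successive doublings.
    for n in (1, 2, 4, 8, 16):
        low = unit_cost(n, 100.0, 0.10)
        high = unit_cost(n, 100.0, 0.25)
        print(f"output {n:>2}: cost {low:6.2f} (10% curve), {high:6.2f} (25% curve)")
```

With a 20 percent decline, for instance, a first-unit cost of 100 falls to 80 after one doubling and 64 after two. The formula itself is mechanical; the article's point is that the decline only materializes if the firm deliberately invests in learning.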
It has been known for decades that there are many equilibria in repeated games. The central result in this area is the so-called “folk theorem”, which says that virtually any outcome can be achieved as an equilibrium in a repeated game.
Interaction of oligopolists can be viewed as a repeated game and in this case, particular attention is focused on collusive outcomes. There are very simple strategies that can be used to facilitate collusion.
Rapid Response Equilibrium: For example, consider the classic case of two gas stations across the street from each other that can change prices quickly and serve a fixed population of consumers. Initially they are both pricing above marginal cost. If one drops its price by a penny, the other quickly matches the price. In this case, both gas stations are worse off because they are selling at a lower price. Hence, there is no reward to price cutting, and high prices prevail.
Repeated Prisoner’s Dilemma: In the early 1980s, Robert Axelrod conducted a prisoner’s dilemma tournament. Researchers submitted algorithmic strategies that were played against each other repeatedly. The winner by a large margin was a simple strategy submitted by Anatol Rapoport called “tit for tat”. In this strategy, each side starts out cooperating (charging high prices). If either player defects (cuts its price), the other player matches. Axelrod then constructed a tournament where strategies reproduced according to their payoffs in the competition. He found that the best performing strategies were very similar to tit-for-tat. This suggests that artificial agents might learn to play cooperative strategies in a classic duopoly game.
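The tit-for-tat strategy described above is simple enough to sketch in a few lines. The code below is an illustrative toy rather than a reconstruction of Axelrod's tournament: the payoff numbers are the conventional textbook values (assumed here), and the `always_defect` opponent is added only for comparison. "Cooperate" stands for keeping prices high and "defect" for cutting them.

```python
COOPERATE, DEFECT = "C", "D"

# Assumed conventional payoffs: (payoff to A, payoff to B) for each pair of moves.
PAYOFFS = {
    (COOPERATE, COOPERATE): (3, 3),  # both keep prices high
    (COOPERATE, DEFECT): (0, 5),     # A is undercut by B
    (DEFECT, COOPERATE): (5, 0),     # A undercuts B
    (DEFECT, DEFECT): (1, 1),        # price war
}


def tit_for_tat(own_history, rival_history):
    """Cooperate first, then copy whatever the rival did last round."""
    return rival_history[-1] if rival_history else COOPERATE


def always_defect(own_history, rival_history):
    """Hypothetical comparison strategy: cut price every round."""
    return DEFECT


def play(strategy_a, strategy_b, rounds):
    """Play a repeated game and return the total payoffs (score_a, score_b)."""
    history_a, history_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(history_a, history_b)
        move_b = strategy_b(history_b, history_a)
        payoff_a, payoff_b = PAYOFFS[(move_a, move_b)]
        score_a += payoff_a
        score_b += payoff_b
        history_a.append(move_a)
        history_b.append(move_b)
    return score_a, score_b


if __name__ == "__main__":
    print(play(tit_for_tat, tit_for_tat, 10))      # (30, 30): sustained cooperation
    print(play(tit_for_tat, always_defect, 10))    # (9, 14): exploited once, then matched
    print(play(always_defect, always_defect, 10))  # (10, 10): permanent price war
```

Two tit-for-tat players earn more than two constant defectors, and a defector gains only a one-round advantage against tit-for-tat, which gives some intuition for why matching strategies can sustain high prices.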
Pricing of ML services: As with any other information-based industry, software is costly to produce and cheap to reproduce. As noted above, computer hardware also exhibits at least constant returns to scale due to the ease of replicating hardware installations at the level of the chip, motherboard, racks, or data centers themselves. If services become highly standardized, then it is easy to fall into Bertrand-like price cutting. Even in these early days, the pricing of machine learning services appears intensely competitive. For example, image recognition services cost about a tenth of a cent per image at all major cloud providers. Presumably we will see vendors try to differentiate themselves along dimensions of speed and accuracy. Those firms that can provide better services may be able to charge premium prices, to the extent that users are willing to pay for premium service. However, current speeds and accuracy are very high, and it is unclear how users value further improvement in these dimensions.
For interested readers, the references below include relevant papers that discuss the future of AI.
- NBER Working Paper Series: Exploring the Impact of Artificial Intelligence: Prediction versus Judgment, by Ajay K. Agrawal, Joshua S. Gans, and Avi Goldfarb
- NBER Working Paper Series: Artificial Intelligence, Economics, and Industrial Organization, by Hal Varian