I’ve been struck recently by the use of the term “scale” when it comes to discussions about artificial intelligence (AI). We use this term all the time. There is the oft-quoted “scale of AI” or the “potential” scale of AI, as well as how to “scale” AI within organizations, across industries, or in markets and sectors. Then, of course, we have the implied scale arguments about “existential risk” from AI, which is to say that the effects of AI would become so massive and widespread as to damage the functioning (biological or otherwise) of the human race. Or we have, rightfully, discussions about the scale of AI’s disparate impacts on protected classes of people, or even just the scale of systemic risks. Yet what I have seen little of is the question of “scale” itself when it comes to AI (despite there being a company named “Scale AI”).
Why would we just assume away important discussions about scale? Well, the garden-variety answer could just be that we all generally understand what we mean when someone talks or writes about it. The context of the conversation, or whatever, is the clue to what they mean. But this answer is not satisfying. It is not satisfying because, like all words that mean a multitude of things to a myriad of people, it hides important assumptions. These assumptions matter — a lot — when we want to do something, like take an action, pass a law, create a program, teach a class, train a group of people, make a bet, buy a stock, etc. etc. etc.
So, despite this being a blog about navel-gazing on philosophical questions of scale, it is really more about framing discussions about AI and how they matter when you want to do something, particularly related to AI ethics, AI safety, and the perennial and sometimes pernicious problems of measurement. Measuring things like effects and impact is notoriously challenging. (Just ask any academic who has to write an “impact statement” for their job evaluation.)
Wait a second, you say, we measure stuff all the time! We have economists, political scientists, biologists, chemists, physicists — quantum physicists! — who measure everything from large-scale complex systems (like markets) to atoms in the universe. We have got measurement down. Sure, my skeptical side replies. We do measure stuff all the time. And scale is nothing but a system of measurement. We have to understand the units to make the scale, right?
Here is where the tricky bit comes into play. When we make decisions about what to measure, then decisions about how to measure it, and then decisions about systems of measurement (i.e. scales), there are also decisions about the importance of scale. An example might be helpful. Some ecologists argue that some systems are scale-dependent, meaning that, unlike classic Newtonian physics, what you get depends on the size of the thing. But even if we agreed with the ecologists that there may be scale-dependent phenomena, we still have to figure out the properties of measurement, and figuring out those properties involves not merely explicit scientific inquiry but implicit value judgments. What we decide to include or to exclude actually matters.
For example, the psychologist S.S. Stevens notes that “there is a certain isomorphism between what we can do with objects and the properties of the numeral series. In dealing with the aspects of objects we invoke empirical operations for determining equality (classifying), for rank-ordering, and for determining when differences and ratios between the aspects of objects are equal.” In other words, we are looking for equivalence in something (properties, structures, features, etc. that make them isomorphic), and where there are differences, we then empirically note those in some way (usually with numbers).
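Those operations correspond to Stevens’ well-known typology of measurement scales: nominal, ordinal, interval, and ratio. A toy sketch, with invented data, of what each level of scale actually licenses you to do:

```python
# Toy illustration of Stevens' empirical operations, using invented data.

fruit = ["apple", "banana", "apple"]        # nominal: only equality/classification is meaningful
print(fruit[0] == fruit[2])                 # True -- we can classify and count, nothing more

satisfaction_ranks = [3, 1, 2]              # ordinal: rank-ordering is meaningful
print(sorted(satisfaction_ranks))           # [1, 2, 3] -- but the "distance" between ranks is not

temps_celsius = [10.0, 20.0]                # interval: equal differences are meaningful
print(temps_celsius[1] - temps_celsius[0])  # 10.0 -- but 20 C is not "twice as hot" as 10 C

hours_saved = [25_000.0, 50_000.0]          # ratio: a true zero exists, so ratios are meaningful
print(hours_saved[1] / hours_saved[0])      # 2.0
```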
OK, you say, I follow where you are going. To measure something is to find some equivalence, and then differences exist to the left, right, above, below, etc. in relation to that equivalence. When we put numbers (or even directions, as in my examples above) to those differences, then we have empirical measurements. But of course there are quantitative vs. qualitative differences. For instance, if I say I have an apple, and apples are a fruit, but I also have a banana, and bananas are also fruit, we would not say that apples = bananas. Just because they belong to the category of fruit does not mean that they are equivalent all the way down. We can go back to Aristotle and his Categories here if we really wanted to get pedantic.
OK Roff, you say, you’ve made your point. Scales are representations of measurement. We have all sorts of measurements. Spatial measurements, electromagnetic measurements, temperature is a measurement, even time is a measurement. And these measurements exist on scales.
Then what about the abuse of the word “scale” when it comes to AI? Well, I’d ask, as a good social scientist and political theorist: what is it that you are measuring? For one thing, are we measuring adoption of “AI”? By whom? For what? If we have “enterprise” AI tools, then those are often procured by large organizations to deploy across, you guessed it, their “enterprise.” So the buyers are companies or organizations. You might have a small pool of buyers, but a large number of users. (The buyer, for example, might be an N=1, like the Department of Defense, but the user base is in the millions.) What if you have a large number of potential buyers, but a small pool of potential users? Think of the very helpful or handy AI app that you could use on your phone, say, but that app is for something that not a lot of people use on a daily basis. The scale in one is different from the scale in the other.
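To make the contrast concrete, here is a toy sketch (all numbers invented) of how “adoption at scale” looks entirely different depending on whether you count buyers or users:

```python
# Invented numbers for two hypothetical deployments of "AI at scale."
deployments = {
    "enterprise tool, DoD-style rollout": {"buyers": 1,       "daily_users": 3_000_000},
    "handy-but-niche consumer app":       {"buyers": 250_000, "daily_users": 15_000},
}

for name, counts in deployments.items():
    print(f"{name}: buyers = {counts['buyers']:,}, daily users = {counts['daily_users']:,}")

# Which of these is "AI at scale"? It depends entirely on which unit you choose to count.
```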
We might say there are a lot of AI tools “at scale,” but they aren’t being used. Or, there are a lot of AI tools used within one population of people, but not across all people. Seems to me that we need to have an idea of who is using what, when, and how, not merely say “there are X number of companies pursuing AI” as the measurement of “the scale of AI,” or even that there are “X dollars spent on AI.” That’s pretty bogus if we want to get down to finer-grained details.
Then we have the really challenging problem of measuring the effects and/or the impact (depending upon your definition) of AI. I wouldn’t be me if I didn’t ask “well, what is the exact tool or product or use of AI?” because “AI” writ large is meaningless unless you posit some isomorphic equivalence across all AI (which I highly doubt exists) that is in some way meaningful and different from saying digital, computer, network, etc. etc.
Here is a recent one that caught my attention: Salesforce deployed 50 AI tools across its workforce, and it claims that in a 3-month period those AI tools saved 50,000 hours of work. 50,000 hours amounts to roughly 24 years’ worth of work. I have so many questions here. But perhaps now you too have questions about measurement. We have an AI tool, and it is used to help a worker complete a task. That assumes we have good data on how long those tasks usually take. The use of the AI tool then systematically and observably (and reproducibly) reduces that task time by a factor or percentage of X. If I add up all the time saved, and I can really attribute it to the AI tool and nothing else, then I get to 50,000 hours. Seems pretty simple, right? The scale here is time. But the impact… the value… is money. Ostensibly, if Salesforce pays the workers who use that tool $100/hr (just for the sake of ease), and you have 50,000 hours, then you would have saved $5 million. That’s good for their bottom line (assuming the cost of developing and deploying the 50 AI tools was less than $5 million). If we like easy math, that means each AI tool would need to cost roughly $100,000 or less to acquire (through whatever means) in order to pay for itself. The more they cost, the longer that amortization takes.
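For the record, here is that back-of-the-envelope arithmetic spelled out; the hourly rate and per-tool break-even are the illustrative numbers from above, not Salesforce’s actual figures:

```python
# Back-of-the-envelope math for the Salesforce example above.
# The hourly rate and tool count are illustrative assumptions, not reported figures.

hours_saved = 50_000          # claimed hours saved over the 3-month period
hours_per_work_year = 2_080   # 40 hrs/week * 52 weeks
loaded_hourly_rate = 100      # assumed $/hr, "just for the sake of ease"
num_tools = 50                # AI tools deployed

years_of_work = hours_saved / hours_per_work_year     # ~24 work-years
value_saved = hours_saved * loaded_hourly_rate        # $5,000,000
break_even_per_tool = value_saved / num_tools         # $100,000 per tool

print(f"{years_of_work:.1f} work-years, ${value_saved:,.0f} saved, "
      f"break-even ≈ ${break_even_per_tool:,.0f} per tool")
```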
But time and money are not the only values (in the double entendre sense), right? We could use a different scale. We could use a Likert scale to measure whether and to what extent the Salesforce employees actually like their new workflows. We could send out surveys to those individuals and ask questions about everything from how they feel, to their perceptions of time management, to whether they are actually working less or whether they are just working more on other projects. They would self-evaluate on a scale of, say, 1-5 to tell us how they are feeling. That might be one way to measure impact, yes? (I’m not even going to get into how to operationalize and measure concepts like justice, fairness, etc. That’s a totally different blog.)
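As a toy sketch of what that survey-based measurement might look like (the items and the 1-5 responses below are entirely invented):

```python
# Toy sketch: summarizing invented Likert (1-5) responses to a workflow survey.
from statistics import mean, median

responses = {
    "I like my new AI-assisted workflow":      [4, 5, 3, 2, 4, 5, 4],
    "I feel I have more control over my time": [3, 3, 2, 4, 3, 2, 3],
    "I am actually working less overall":      [2, 1, 2, 3, 2, 2, 1],
}

for item, scores in responses.items():
    print(f"{item}: mean = {mean(scores):.2f}, median = {median(scores)}, n = {len(scores)}")
```

Even here, whether it is legitimate to average an ordinal scale at all is itself a contested measurement choice, which is rather the point.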
My beef with the abuse of the word “scale” as it relates to AI is that it covers a myriad of choices and assumes some sort of false equivalence across all AI and across all AI impacts or effects. On the one hand, there is a veneer of quantitative evaluation that may not in fact be present. On the other hand, there is an assumption that just because I can measure something, it is now beyond the limits of contestation. This is sort of a hat tip to Feynman, who held that we need to be able to measure things to include them in our theories of phenomena. But, not to diss Feynman, not everyone agrees with a theory - or indeed with your measurement. In other words, one man’s ceiling is another man’s floor.
Thus we have serious questions and avenues for research when it comes to the effects, impacts, and scale of AI, because we need some way to explicitly identify what it is we are measuring, why, and how. These choices are by their nature value-loaded. They have normative importance, and there are structures of power at play in all of them. In short, they are normative value judgments about empirical values.
AI Safety is about providing, amongst other things, testing, evaluation, verification, and validation of AI systems to make sure that the systems are not deemed unsafe… or, more correctly… dangerous. If we can’t fully test such systems under all possible conditions, then we have a scale of risk. How we identify those risks, and then how we decide whether and to what extent to mitigate or to accept them, are not really questions of measurement.
Indeed, even AI Ethics is about identifying not just harmful, but also beneficial, or even “wrongful” practices, behaviors, situations, systems, rules, or states of affairs. I make a division between harm and wrong here. Wrongs are rights violations. Harms may not necessarily violate rights. For example, if you set up a shop next to mine and put me out of business, I am harmed by this. But it in no way violates my rights. (If you’re into this line of thought, please see Hohfeld’s theory of rights and, one of my favorites, Joel Feinberg’s legal theories based on J.S. Mill’s Harm Principle.)
But even AI Ethics to date has been overly focused on issues around privacy and bias, or overly generalized to “existential risk.” To do the hard work of an ethicist, that is, applied moral philosophy applied to AI, means that, on the one hand, much finer-grained detail is needed and, on the other, more abstract and analytic thinking is required. Everyone and their cousin seems to have something to say about AI Ethics, without knowing much about AI or ethics.
But I digress. The simple fact remains that when someone says “scale,” they are assuming a whole host of empirical and normative states of affairs. Moreover, you may not buy into those assumptions. It would do us all some good to pause for a second and question this phrase, because familiarity breeds complacency. We need to figure out ways of creating not only “beneficial AI,” whatever that really means, but “good” science around the study of AI deployment. Five million dollars in savings may seem like a lot, unless the tools just so happen to cost a lot more in the long run.