Aligning AI with human values. Aligning AI with human flourishing. These phrases are everywhere in the field of AI alignment. But what do these people mean by "human values"? What do they mean by "human flourishing"? In this short video series, I'll show that they don't really know—and that taking this question seriously changes the whole problem of AI alignment.

Hi, I'm Joe Edelman. This is the first of three video essays about what I'm calling full stack alignment. I'll be assisted by Oliver Klingefjord and Ellie Hain. Together, these essays cover how AI, on its current path, will make some of the biggest problems of the 20th century even worse—and what it would take to deploy AI the right way instead.

In this first chapter, I ask: what is human flourishing? Why is this question so important to AI alignment, and why is everyone getting it wrong? Does flourishing mean people's preferences are satisfied? That they achieve their goals? Something else? I'll try to get us clear, together, on how we have to define flourishing to avoid doom. This chapter covers everything from very personal topics—what it means to live well—to global ones, like how markets function.

In the second chapter (linked in the description), I talk about wise AI. For AI to be good for society instead of adverse to it, I think it needs to be wise. In that chapter I'll define wisdom, define wise AI, and give some demos, assisted by Oliver. Say hi, Oliver—you'll see his face again in chapter two.

And in the third chapter, I talk about full stack alignment: everything else, besides the actual tech, that would need to go well for AI to have a good impact on society. Having the right tech is not enough. Ellie will help with that chapter, focusing especially on the relationship between public expectations and AI going well for society—one of three different non-tech things that need to go right. Say hi, Ellie—you'll see her again in chapter three.

All right. Let's go ahead with Chapter One.
So. In this first part of the talk, I want to show something that might sound super-abstract. But I'll try to show it in a concrete way.
The thing I want to show is, it matters how we talk about flourishing.
How we conceptualize what we want out of life... Like, what vocabulary we use for talking about it, makes a huge difference.
This is where we have to start.
Many of you probably know that “AI Alignment” is a field, and that, in this field, they use vague terms like "aligning AI with human values" or "aligning AI with human flourishing."
But the people in this field, they don’t know what human values are. So, they just try to skip that part. They say—oh, it’s some kind of mathematical function or other. We’ll find that out later. For now, let’s just keep that question open.
I want to try to show that we need to talk about what human flourishing is. What human values are. And that the vocabulary we pick for discussing this topic is hugely important—not just for agreeing about values, but for the whole project of AI alignment.
It simply can’t work if we don’t pick a vocabulary, and we need to pick the right one.
Let me start by pointing out four vocabularies we could use.
- Does human flourishing mean a world where we all accomplish our goals? In that case, AI would need to understand our goals and make them easier to accomplish. Or even accomplish them without us. Information about goals would need to be flowing around, and the mathematical function that AI is aligned with, would be shaped like goals are.
- Or does human flourishing mean a world where things are as each person prefers? In that case, AI would need to know all of our preferences, or guess them, and make the world as much like them as it can. The mathematical function that AI gets aligned with would be shaped kind of like the functions of welfare economics, which is based on preference satisfaction.
- Does human flourishing mean a world where all the people are super-happy? In that case AI would need to guess what makes people happy, and again, arrange the world to be happy-maxxing.
- Or, does it mean—and this is my favorite vocabulary—a world where people get to do the things they find most meaningful? This would require information about what’s meaningful to each person, or what could be. In that case, I’ll show that the mathematical function to align AI with has a special form.
So these are just four vocabularies. I could list more. But let’s start here.
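To make the contrast concrete, here's a toy sketch in Python. This is entirely my own illustration (none of these class names or scoring rules come from any real alignment system); it just shows that the four vocabularies imply four differently-shaped objective functions over the same world.

```python
from dataclasses import dataclass

@dataclass
class Person:
    goals: set          # goals the person holds
    preferences: dict   # world-feature -> how strongly they prefer it
    happiness: float    # current happiness level
    meaningful: set     # sources of meaning

@dataclass
class World:
    achieved: set       # goals the world has satisfied
    features: set       # features present in the world
    engaged: set        # sources of meaning people get to engage with

def goal_score(w, people):
    # Goal vocabulary: fraction of everyone's goals that get accomplished.
    total = sum(len(p.goals) for p in people)
    return sum(len(p.goals & w.achieved) for p in people) / max(total, 1)

def preference_score(w, people):
    # Preference vocabulary: welfare-economics style sum of satisfied preferences.
    return sum(p.preferences.get(f, 0.0) for p in people for f in w.features)

def happiness_score(w, people):
    # Feelings vocabulary: average happiness, regardless of how it's produced.
    return sum(p.happiness for p in people) / len(people)

def meaning_score(w, people):
    # Meaning vocabulary: how much of each person's sources of meaning
    # they actually get to engage with.
    return sum(len(p.meaningful & w.engaged) / max(len(p.meaningful), 1)
               for p in people) / len(people)
```

An AI maximizing any one of these would build a very different world than an AI maximizing the others, which is the point of this chapter.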
I want to show that these definitions of human flourishing are completely different. That the problem of “alignment” is a totally different one, depending on which you pick. It’s easier for some than for others.
And also, that these four vocabularies lead to totally different outcomes. They aren’t the same at all! We really need to pick one!
Goals vs Sources of Meaning
Let me start by picking out goals and sources of meaning. I’ll show that these vocabularies lead to very different things.
Let’s begin by generating some goals, and some sources of meaning. By asking ourselves some questions about each.
Often we ask questions like this:
- What's something that later today or tomorrow we want to get done?
- What's something we hope to get over quickly?
- What's something we hope to accelerate or delegate, to get someone else to do?
For me, I want to get this video done and published. I’d love to delegate the editing.
What about for you? What do you want to accelerate or delegate?
These are goals!
Okay, but what if we ask different questions? We can ask questions like:
- What's something in the next day or two, we want to linger on?
- Something we want to make sure to notice just because it's meaningful?
- Something we want to celebrate and cherish?
For me, I have a team meeting with a team I’m really grateful to have. I want to cherish that. It’s autumn in Berlin and the light is gorgeous, the leaves are bright yellow and orange. I want to linger on that light.
What about for you?
Call the things on the top our goals. We are intricately aware of them. They form a kind of superstructure—smaller goals fit into larger ones, and we are dimly aware of the whole tree. We make lists of goals, at different scales.
Call what’s on the bottom our sources of meaning. Or I often call them our values. We are much less articulate about them. We don’t see the same kinds of patterns in them, that we see in our goals.
What I want to show next, is that if you think flourishing is goals you’ll make certain kinds of things. If you think flourishing is meaning, you’ll make something else.
Imagine you’re building an educational platform. Are students there to pass the test or get a certification? Or are they there to follow their curiosity, to face their open questions, or to celebrate together the beauty and complexity of the world?
On top are things your customers want to get done, get over with; on the bottom, the things that make life worth living.
I guess you’re getting the sense that, if flourishing is goals, people are getting different things than if flourishing is meaning. In the latter case, some kind of caring AI would make sure they can linger on what’s meaningful to them, notice the things they find meaningful to notice, etc.
In the former case, they would be passing lots of tests!
Actually, it goes much further than that. You can think of human society as made up of various projects. Every company is a project. Every organization. Every relationship, in a way.
Some of these projects are really about goals.
These projects, I call funnels and tubes. Funnels and tubes are goal-driven.
I call something a funnel if it gets everybody to do the same thing, or work on the same goal. By this definition, the checkout area in a supermarket is a funnel. So are many organizations. There's one goal for everyone.
Similarly, I'll call something a tube if it gets people from where they are to their own goal. So Amazon, and all marketplaces, are tubes. So are Google searches. Tubes accelerate everyone to their own goal.
Both are goal-driven things, that everyone involved would accelerate if they could—to get the goal accomplished more quickly.
Many entrepreneurs see all design tasks as funnels and tubes. But that’s a big mistake.
Some things aren't designed around goals at all. Instead, they are about values, or exploration according to values. I'll call those exploratory spaces.
- spaces for exploratory thinking like your whiteboard, your journal, or a research lab.
- spaces for creativity, like jam sessions and brainstorms and creative tools
- spaces for chilling, like your living room
- spaces for vulnerability, like talks around a campfire, or a confession booth
- spaces for celebration like dance clubs, street riots, and festivals
These are not goal-driven. You know something is a space if you don't want it to be over quickly. You would accelerate an Amazon purchase if you could, or an Uber ride, or an organizational goal. The things you do in a space are things that you would not accelerate.
Note: when I say you wouldn’t accelerate a space, I don’t mean it has to be pleasurable, or fun — I just mean there’s something meaningful about what you do there, something you’d miss out on if you accelerated it or delegated it away. That means it’s not just about getting to a goal.
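The funnel/tube/space distinction boils down to two questions, which I'll sketch here as a toy classifier (my own illustration; the attribute names are hypothetical, and the logic just encodes the two tests from the text: would you accelerate it, and whose goal is it?):

```python
from dataclasses import dataclass

@dataclass
class Project:
    goal_driven: bool  # would participants accelerate or delegate it if they could?
    shared_goal: bool  # one goal for everyone, vs. each person bringing their own

def classify(p: Project) -> str:
    """Classify a project as a funnel, a tube, or a space."""
    if not p.goal_driven:
        return "space"  # meaningful in itself; you wouldn't want it over quickly
    return "funnel" if p.shared_goal else "tube"

# Examples from the text:
checkout = Project(goal_driven=True, shared_goal=True)    # supermarket checkout
amazon   = Project(goal_driven=True, shared_goal=False)   # a marketplace
jam      = Project(goal_driven=False, shared_goal=False)  # a jam session
```

So a caring AI asking "should I accelerate this?" would first have to ask which of the three kinds of project it is looking at.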
So, I guess at this point you agree with me that the vocabulary matters, at least with these two vocabularies. If flourishing is discussed as a matter of goals, we have a world full of funnels and tubes. If flourishing is discussed as about meaning, we’d have many more spaces.
Which would you like the AI to make more of?
Short Time Horizons
Okay, okay. I’ve showed this with two of our vocabularies, but what about the other two?
Before I get into that, I want to call out a kind of meta-difference between goals and sources of meaning.
Actually, two of them.
First, goals are mostly things you have for a short time. A goal is achieved and then it’s over. They have this checkbox-aspect. In other words, goal-achievement is transactional. Funnels and tubes are perfect for goals, because they are transaction-scaling machines. Got a goal to find something out? Google it! Want to get somewhere? Call an Uber! Want to get a million people to vote? Build a funnel!
Sources of meaning are different. If someone finds it meaningful to pursue open questions in science, or to be wildly creative, or to provide for their family — well, that source of meaning might not last for their whole life, but it will probably last way longer than their goals.
Because goals are something you have for a short time, they are more likely to be mistaken. Or the result of your current environment. And systems like funnels and tubes that work to help you achieve goals are more likely to cause overconsumption and addiction. If they can, they’ll generate a goal in you, then help you achieve it. All to get more transactions!
Sources of meaning aren’t like this. They’re non-transactional, so you can’t make your space arbitrarily meaningful just by running people through it faster. And—because they are observed over a longer time—sources of meaning are less likely to be mistaken, and harder to manipulate.
Okay, and here’s another meta-difference between goals and sources of meaning. Goals are things you have for reasons, and you can sometimes satisfy a goal in a way that doesn’t really address the underlying reason you have it.
For instance, say I want to get skinnier, because I think women will like me better, and I’ll finally experience a love I deeply crave. Well, what happens if I do get skinnier, but women still don’t like me? Or if they like me better, but I find I still can’t experience that love I craved?
Each goal has a reason. But if you imagine flourishing just as helping people achieve goals, then you risk helping them achieve all the wrong goals and not what they were really after.
Sources of meaning, again, are different here. We don’t need a reason to pursue open questions in science, or to be wildly creative, or to care for our family. I mean sure, there might be a historical reason we are into these things—maybe we met a scientist when we were young and they inspired us. But what I mean is, we aren’t doing the thing instrumentally—for a reason. Now, because of who we are, we just find it meaningful, in itself.
This, also, makes sources of meaning harder to manipulate. Advertising incepts goals into people by attaching new goals to reasons they already have. But that doesn’t work for sources of meaning. People can be inspired—like by meeting a scientist—but they can’t be given arbitrary sources of meaning that way.
Preferences and Feelings
So. Now that we’ve talked about those meta-reasons, let’s return to the other two vocabularies that I listed at the start.
- Does human flourishing mean a world where things are as each person prefers?
- Does it mean a world where people are super-happy?
Quick thinkers might have already figured this out.
Uh-oh: preferences and good feelings are both things you have for a short time, and for a reason!
That means they have the same problems that goals have. Preferences are manipulable, can be mistaken, can be a function of the environment, can cause overconsumption, etc.
In fact, I think preferences are even worse than goals!
See, preferences are usually picked from a short menu, provided by someone else. Think about it:
- When you vote, you choose an answer from a small list of options.
- When you purchase something, it’s usually from a menu or a catalogue.
- You express your preferences on social media by clicking, hearting, and liking, but you’re always clicking or liking something the algorithm already decided to show you. Usually, you’re shown no more than 20 things at a time.
This means preferences are even more manipulable than goals. And even more likely to be mistaken, or at least to not represent the true preference a perfectly informed person with every possible option would make.
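Here's a tiny simulation (my own, with made-up random numbers) of that menu effect: when the platform picks which 20 options you see, your revealed preference tracks the platform's objective rather than your true favorite.

```python
import random

random.seed(0)
N = 1000
# How much each of N options is actually worth to you, vs. to the platform.
true_value = [random.random() for _ in range(N)]
platform_value = [random.random() for _ in range(N)]

# The platform shows you the 20 options that are best *for itself*.
menu = sorted(range(N), key=lambda i: platform_value[i], reverse=True)[:20]

# You "express a preference" by picking the best item *on the menu*.
revealed = max(menu, key=lambda i: true_value[i])
true_favorite = max(range(N), key=lambda i: true_value[i])

# Your revealed pick can never beat your true favorite, and it usually
# falls well short, since only ~2% of the options made the menu at all.
shortfall = true_value[true_favorite] - true_value[revealed]
```

The observed "preference" here is real behavior, but it's mostly a function of who built the menu.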
Oh, and don’t even get me started on the manipulability of feelings!
I don’t even have to go into it. Each of you could probably list 20 ways people have manipulated your feelings, even without drugs.
Have I made my point?
So, my opening point was that it really matters what vocabulary you use to discuss what flourishing means. What kind of evidence counts as flourishing.
I haven’t even fully made this point. Later, I’ll talk about how these different vocabularies lead to different shapes in the mathematical function we are trying to align ML models with. With some of those shapes, alignment becomes much easier.
But I’m not going to talk about that yet.
What I hope I’ve already established, is that different vocabularies lead to very different worlds. The world of goal-flourishing is one full of funnels and tubes. The world of meaning-flourishing is full of spaces. The world of preference- and feeling-maxxing is full of manipulation—exactly the kind of manipulation that generates artificial preferences and extra feelings.
Which world would you rather live in?
Which World Do We Live In?
Actually—skip that—which world do we live in right now?
- We don’t live in a feeling-maxxed world, or people would be a lot happier. I guess even more people would be on Prozac. Or much stronger drugs. Or at sex parties.
- We don’t live in a meaning-maxxed world either. You can tell, because most big projects and companies aren’t spaces. They’re funnels and tubes.
I think we live in a goals and preferences society.
Actually, it’s worse than that: goals and preferences are easy to manipulate, and I think ours mostly have been.
We live in a fake goals and fake preferences society.
Here’s what happened:
Capitalism—and most especially Internet Capitalism—is amazing at funnels and tubes. So, over the course of the last century, as Capitalism really grew up, we saw an explosion of funnels and tubes.
At first, this provided a lot of value. People had real goals that were hard to achieve, and real preferences that were hard to satisfy. Capitalism helped a lot.
But, at some point, we ran out of legit goals, and legit preferences!
To keep the numbers up, we had to incept more and more preferences, more and more goals.
- Preferences can be manipulated by “structuring the menu”, as I mentioned earlier. You got a choice between Biden and Trump. Between Pepsi and Coke. Between 50 kinds of porn. Or 6 current memes. You can pay high rent in any of 3 major cities. You can drive or take transit. Actually scratch that—you have to drive. That menu has one option.
- Preferences and goals can also be incepted by making them seem “high-status”. You can make it high status to like skinny jeans, and people will express that preference, lest they look uncool. You can make it high status to launch a startup and sell it a year later, and people will do that, wasting their youth, hoping to stay in the status game. It’s easy! Just pay a celebrity to express the preference. Or even easier—pay to amplify the stories of people who already express the preference. Make it seem more common, and cooler, than it really is.
So: we're not seeing people's true preferences, or true goals. We're seeing fake preferences and goals. Skill at these kinds of manipulation is actually what our society measures—and considers as “flourishing”—right now.
All to drive transactions.
Which is not what anyone actually wants!
We had to create a kind of endless hustle. A consumption society. A treadmill society.
This is what people mean, when they say “late capitalism”.
We had to keep those numbers going up. But there's only so much organic goal achievement, so much organic preference satisfaction, we can do. So we started to fake it.
Eventually, there were so many fake preferences and fake goals to keep up with, that we couldn’t attend to our sources of meaning at all. And there were so many funnels and tubes being made by entrepreneurs, and no one making spaces.
Even the spaces we used to have—which were about real meaning—got replaced.
- For instance, democracies used to be spaces, and now they are funnels. They used to be town halls, citizen assemblies, debate and discourse. They turned into funnels for riling up voters. A whirlwind of ideological battles and preference manipulation.
- Something similar happened in other areas—in education, science, and art. All of these were space-dominated 60 or 80 years ago. We’ve seen declines in pure science research, and decay in the halls of academia. Schools got converted into career funnels. Research labs became citation funnels. These were supposed to be spaces for debate and exploration! Now it’s citations, tenure, etc. Fake goals!
This Explains a Lot of Things
I think this story explains a lot. For one—it explains why people in modern societies are so lonely. Why the elderly are locked alone in rooms, watching TV. Why teenagers and 20-somethings aren’t dating like they used to.
It’s the decline of spaces. Spaces getting replaced by funnels and tubes. See, spaces are often collective. Funnels and tubes are more efficient when they’re individualized.
So, modern society gives people individual experiences, when what they want are collective ones. People clearly want things like belonging, connection, community, love, they want adventures together.
In most developed countries, people used to hang out on the porch with their neighbors, or in pubs; then they started watching TV as a family; then they switched to multiple TVs, one in each room in the house; finally the TVs got upgraded to smartphones. Each person staring at their own rectangle of glass.
Over the same period, church communities got replaced by individualized yoga classes; dating and friend groups got replaced by swipe-based apps and porn.
Another thing this story explains is why modern societies became so dysfunctional, even as they got rich. Even as the numbers went up.
See, sources of meaning, and spaces, are part of what makes democracies, and science, and love—what makes them work. There’s no good science without spaces for pursuing open questions! For studying the beauty and complexity of the world! For following curiosity!
No amount of citations will make up for that.
It’s the same with democracy, or education. Spaces make them work.
So: when a proliferation of fake goals and preferences crowds out our spaces, we get a dysfunctional society. It seems to be working: people seem to be getting their preferences met, and their goals achieved.
But everything personally and institutionally meaningful has been eliminated. Everything has become a funnel, to keep people engaged. But the meaningful bit—the thing that made the whole thing work—is gone.
A third thing this story explains, I think, is the modern rise of ideology. Ideology is a way of holding society together. Of enforcing compliance. A shared belief system.
Now: we need to hold society together. If we stop coordinating and cooperating, things get ugly. But do we need to do it with ideology? No. Ideology isn’t the main way of holding society together. Spaces are!
It’s shared spaces and shared meaning that are best at generating coordination and cooperation. Ideology is a fallback: it’s a kind of social glue that doesn’t require spaces. But these days, it’s all we have.
Finally, I think this story explains why modern life just, kinda sucks.
We find ourselves increasingly in a world of endless funnels and tubes. Another way of saying that, is that life feels more and more like bureaucracy. Every encounter becomes about getting something done and moving on. Sometimes it's your own goals; sometimes, other people's. One checkbox brings you to the next.
We use bureaucracies for our sex lives, filling out forms and swiping through piles of profiles. Transacting. We use bureaucracies for self-help and personal growth, checking boxes after we drink water, or go for a run.
People race around, looking for spaces, but not finding them. Meanwhile, funnels dangle the vague promise of a space in front of them: "buy this beer and be loved by friends", "take this online course, get rich, then you'll be able to relax and explore".
But even the drunk, rich people lack spaces.
So. Let’s review.
In this chapter, I claimed it makes a difference which vocabulary you use, to talk about flourishing. I compared four such vocabularies: goals, preferences, feelings, and sources of meaning.
I tried to show that our current society has, as its main vocabularies, goals and preferences. And that there are some problems with that.
Let me now turn to the relevance of all of this, for AI.
Relevance for AI
One thing to remember, is that we already live in a society with a lot of ML models behind the scenes. Markets, advertising, and social media recommenders, these days, involve a lot of gradient descent.
And these AIs that we already have are optimized… for preferences and goals. For transactions.
I think this would normally be very hard to change. Preferences- and goals-optimized models are everywhere! And the people themselves predominantly use the vocabulary of preferences and goals. How the hell could we change it?
Well, there’s good news and bad news.
- The good news is that a new, much more powerful type of ML model has just been developed. And it’s going to replace many of the previous systems that were optimized for goals and preferences. It will replace social media recommenders, of course. But I think it will also replace democratic processes to a large degree, and market structures, and advertising.
- Another piece of good news, is that these new, more powerful models — the LLMs — are good at helping people understand what’s meaningful to them. They can ask users questions and get below their preferences and goals, to the sources of meaning underneath.
- Okay, and the bad news, is that these new models are currently being optimized… for preferences and goals.
And all of this replacement is going to happen quickly. Quickly enough that it could change the game.
So: we have a once-in-a-lifetime chance to change the system. To replace social media, markets, and democracies with new structures that are more resistant to manipulation, and that understand what’s meaningful to us.
The rest of this talk is about how to actually pull that off.
In the next chapter, chapter 2, I focus on what it’s like to align ML systems with meaning-flourishing. We call that “Wise AI”. I’ll say why the alignment problem is mathematically easier with Wise AI than with other notions of flourishing. I’ll say why Wise AI is safer. And I’ll show early demos of what a Wise AI can look like.
And then in Chapter 3, I’ll talk about everything else that would need to align, to actually pull off this large-scale social change. How people could become articulate about sources of meaning; how the AI labs could race to make wise models, not smart ones; and how large financial and political actors could be displaced by new institutions that put Wise AI at their center.
But if you look at long-term trends, we get fewer of these things year by year.