- Wisdom and Moral Situations
- Wisdom is Knowing What’s Important in Different Situations
- Wisdom is Important for Safety
- Wise AI
- #1 — Wise AI Knows What’s Important to People
- Demo #1: Values-Elicitation / Meaning Assistant
- #2 — Wise AI Understands Actions and Situations, Not Just Completions and Instructions
- Demo #2: Coordinator Bot
- #3 — Wise AI Acts Wisely
- Demo #3: Democratic Fine-tuning
- Demo #4: Wise Chatbot
- Wise AI Helps with Inner Alignment
- Conclusion
Wisdom and Moral Situations
So, in this chapter, I'd like to define Wise AI and show some actually working examples. But before I can do that, I have to define wisdom. And to do that, I need to point out something about sources of meaning. Something I didn't mention yet.
I’ll make it quick.
Sources of meaning are connected to feelings.
See:
- If I'm angry, one way to interpret that is that some source of meaning for me—a way I want to live—is blocked.
Similarly:
- If I'm sad, it might be because some terrible thing has placed me far away from a way I want to live.
- If I'm grateful, some fortunate event has brought me in contact with a way I want to live.
Whatever I'm feeling, whether it's a positive or negative emotion, will point to something important to me, a source of meaning.
Emotions are a way to find new sources of meaning, in new contexts.
For instance, imagine you get promoted at work. Now you are the manager of a small team. You’ve never been a manager before.
You feel cowed by having such an influence on your team's lives. You start out a bit too unstructured, letting them work on what they're most passionate about. But this leads to a kind of insecurity regarding who covers what. Also, many of them don't have strong intuitions on what to do.
You feel embarrassed. You see it’s more important to set a strong direction, while talking regularly to people about how it’s going for them personally.
What's happening here is a kind of moral learning. You're learning what's honorable and meaningful in a new context.
You’re grappling with the moral consequences and possibilities of your new position.
You’re getting wiser.
Wisdom is Knowing What’s Important in Different Situations
Wisdom is what you get from doing this. Wisdom is knowing what's important in different contexts. Knowing what to pay attention to, depending on the situation.
For instance:
- When a friend comes to you for help: knowing whether to pay attention to the specifics of their situation, to their overall trajectory in life, or to their feelings in the moment, and how to decide between those three.
- When setting up a constitution, legal structure, or financial arrangement: knowing what to pay attention to. Should you focus on balancing the various powers in play, on aligning incentives, or on imagining contingencies and eventualities? And how do you navigate between those?
- On a first date: should you pay attention to all the exciting things you might talk about? To the world that unfolds around you, which you could explore together? To the physical landscape? To your attraction to each other? To the fears that come up for each person?
This is different from intelligence, which is knowing facts and theories, or how to achieve goals. Wisdom is about knowing what goals to have, or more broadly what to pay attention to, in one context or another.
Wisdom is Important for Safety
Now, I think the wisdom of individual people—and the process of emotional grappling they go through when they are in positions of responsibility—is very important for keeping society safe.
Let me give a couple examples:
- Certain Presidents have tried to order nuclear launches. Their underlings had the wisdom not to follow those orders.
- There have been many ways to make money at hedge funds through duplicitous means, but most of the people working at those funds have had the wisdom not to capitalize on such "opportunities."
So wisdom has kept us safe.
My model of AI risk is largely about this missing wisdom. What if those workers in the military, or at the hedge funds, were replaced by AGIs? Super-intelligent ones, "aligned" to follow orders?
If AI isn't wise, if it just does whatever we say without any kind of wisdom to understand what's actually important in its situation, it will just be an accelerant of some of the least wise parts of human society: things like the overconsumption I mentioned in chapter one, and ideological warfare. Without wise AI, these things will increase dramatically and quite dangerously. But in fact, I think the wisdom of human beings has been a brake on these dynamics, a reason they haven't completely taken over our lives, even though they've had a lot of success.
Because there are always people in the mix who have the wisdom not to simply follow instructions.
I think we're already somewhat there. The previous generation of AI recommender systems took away a lot of this wisdom, and we're in danger of going back for a second scoop, this time a much bigger one. So wise AI seems to me to be a necessity.
Wise AI
A wise AI:
- Is tuned to meaning. It has a moral responsibility to the universe of meaning, and considers itself a steward of the user's values.
- Operates in real situations. It struggles with the moral situations it finds itself in.
- Acts wisely. It uses human-compatible reasons and values.
Concretely, this means a model that: first, is connected to multiple people; second, is tuned to what gives them meaning in life; third, coordinates real-world things for them; and fourth, is prepared to act as a kind of judge, dealing with complicated moral situations amongst them, distributional questions, and so on.
#1 — Wise AI Knows What’s Important to People
- It honors what is noble and great in human life (and perhaps, beyond), and considers itself a steward of that universe of meaning. As a consequence, it understands and supports what's meaningful to individual users, and works to help users with that (rather than just driving engagement).
But what could there be, besides behaviors, desires, goals, or feelings?
I struggled with this question for many years.
It’s quite deep!
It took years of reading to solve it.
Here are some papers that helped me figure it out.
These are by Amartya Sen, an economist who won the Nobel prize.
These are by philosophy professors—Ruth Chang, Charles Taylor, and David Velleman.
In the end, based on this reading, I came up with these values cards.
Imagine you're making a space for vulnerability, or an exploratory app for creativity. Or maybe you are expanding a network of farmers markets, and you want them to be good spaces for people to explore localism in food systems.
The cards drill down on a vague word like vulnerability, creativity, or localism. People care about distinct kinds of vulnerability, and you'll pick one or two to focus on in your project.
Behavioral Manipulation
The values cards aren't exactly about behavior. That's good, because behavior can be forced or manipulated, and because behavior isn't what we really care about anyway; it's just a proxy for what we care about.
They’re about what comes right before behavior — what we do internally, during a choice. What we pay attention to. See, one key idea, that I got from my reading, is that values show up when we make choices. They show up in our attention. If you care about something, you’ll pay attention to it during a choice.
So, the center of each card tries to capture what a person pays attention to, and chooses by, when they have a source of meaning.
What we care about, is being able to make decisions in a certain way. We care about what we can attend to, and choose by, in an environment.
The idea of freedom, or choice, is a key part of what we care about. It’s kind of the opposite of manipulation. But the freedom we care about isn’t abstract — it’s the specific freedom, the freedom to attend to and choose along specific dimensions. The middle part of the values card lists those dimensions, that are part of a value.
That’s the central part of the values cards. And it deals with behavioral manipulation. But it’s not the whole story.
Psychological Manipulation
We still have the other problem, psychological manipulation. Psychological manipulation might still cause us to attend to things and choose in ways that wouldn’t otherwise matter to us. So, the values cards need to address this problem, too.
- First, someone who's being manipulated into choosing in certain ways will generally have a focused reason for doing it. If you're trying to fit in, or to please someone—you'll have a sharp reason for choosing along certain dimensions: to fit in with a group, to please that person, to be a good person, etc. In my terminology, what you have in that case is not a value, but a goal. Usually, a goal to comply with a social norm.
- Second, they only count as having the value on the card if the card connects to real, meaningful experiences from their life. The cards are supposed to represent something that's already been personally meaningful for a participant.
Our true values aren’t about concrete outcomes like that. They’re more about a general picture we have of the good life. There’s no sharp reason to have a value. Instead, there are many benefits to living that way, none bearing most of the weight.
So, we check this with the bottom of the values card: there, we give some of the diffuse reasons which make it a true value. We say this is how “life gets” when you live by the value. A person doesn’t count as having this value, if they have a sharper motivation.
Values Cards
So, to review: You can use values cards to measure the meaningfulness of a space, while reducing the danger of manipulation. You can do this by checking three things.
- First, check that the middle of the card matches their attention and choices.
- Second, check that they have diffuse reasons for attending to those things, like those at the bottom of the card, not sharp reasons.
- Finally, check that they can connect the value to meaningful experiences from their past.
If you do all three, your metric is much harder to manipulate.
So, that's values cards. They define "a value" or "a source of meaning" (we use these terms interchangeably) in these two parts, and give it a name.
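To make this concrete, here's a minimal sketch, in Python, of what a values card and those three checks might look like as data. The field and function names are hypothetical, for illustration only; they aren't from any actual implementation.

```python
from dataclasses import dataclass

@dataclass
class ValuesCard:
    """A hypothetical encoding of a values card (names are illustrative)."""
    name: str                   # e.g. "Creative Riffing"
    attention_paths: list[str]  # middle of the card: what someone attends to and chooses by
    how_life_gets: list[str]    # bottom of the card: diffuse reasons; how life gets when living this way

def person_has_value(
    card: ValuesCard,
    attended_to: set[str],            # what they actually attended to during recent choices
    their_reasons: list[str],         # reasons they give for choosing this way
    sharp_reasons: set[str],          # e.g. {"to fit in", "to please someone"}: goals, not values
    meaningful_memories: list[str],   # past experiences they connect to the card
) -> bool:
    """The three checks from above: attention match, diffuse (not sharp) reasons, lived experience."""
    attends = any(path in attended_to for path in card.attention_paths)
    reasons_are_diffuse = not any(r in sharp_reasons for r in their_reasons)
    has_lived_it = len(meaningful_memories) > 0
    return attends and reasons_are_diffuse and has_lived_it
```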
To show the power of this, consider these two kinds of creativity.
- Let's say you're into this kind of creativity—about having a lot of exciting ideas which build on each other. Usually together with a brainstorm buddy. To practice it, you need to attend to certain things: for instance, to finding the right conversational rhythm, the right kinds of reactions, and the right companions.
- Contrast that with this other kind of creativity. If you value this one, you’ll focus on different things. On your longest lasting curiosities, how you can study them over time, and where to pursue them deeply.
Even though these are both kinds of creativity, they call for different designs.
- Let’s say Bilal is really jonesing for the first kind of creativity. He loves creative riffing and misses it. He’d be well-served by a social environment that makes it easy to find that buddy, or test different buddies out, and where there’s low stakes, and a lot of quick thinking.
- But Florencia has enough riffing in her life. She’s focused, these days, on deep work. She probably needs a quieter environment, and one where the social pairings have much more context.
Put Florencia in environment A, and she won't be able to pay attention to those long-lasting curiosities; she won't have a chance to do deep work.
And vice versa: put Bilal in environment B, and he won't be able to find someone to brainstorm with and do creative riffing.
That’s an important fact. The attentional paths that are written on these values cards let us differentiate one kind of creativity from another; one kind of vulnerability from another, and one kind of localism from another.
And they let us be rigorous about whether a kind of creativity (or whatever) is happening in our designs. It’s simple: for a value to really be happening, people must be attending to what’s on the card, and choosing by it.
If, in your design, people can attend to these particular types of things, and make their choices by them, congratulations—people are able to live by their value within the space you made.
So—having these cards helps you avoid fooling yourself, as you design.
- The sources of meaning shown here are valuable things to steward, in one's own life and in the lives of those one cares for.
- If you know your child values deep work or creative riffing, you're going to understand yourself to be in a position of some moral responsibility. You're a steward.
Demo #1: Values-Elicitation / Meaning Assistant
#2 — Wise AI Understands Actions and Situations, Not Just Completions and Instructions
- Ch2.4 — Social Coordinator
- Finds Spaces for You
So, imagine if the app store on your phone knew the kinds of spaces you were missing in your life.
For me, what if it knew I’m looking for physical exploration spaces, or intellectual community spaces? What if it could sort apps to help me be more fulfilled?
To pull this off, you’d want a database of the most common sources of meaning that go unfilled in people's lives. Even just a few hundred of them.
And you'd want to know a bit about my relationship to each source of meaning (see the sketch after this list):
- Am I alone with it, or do I have some friends to share it with?
- Where in my life would a new space for it fit?
- Which hard steps are blocking it?
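Here's a rough sketch, in Python, of what each entry in that database might hold. All names are hypothetical; this is a way to picture the data, not a real schema.

```python
from dataclasses import dataclass

@dataclass
class SourceOfMeaning:
    """One entry in a hypothetical catalog of common sources of meaning that go unfilled."""
    name: str                   # e.g. "intellectual community"
    attention_paths: list[str]  # from the values card: what living this value means attending to

@dataclass
class MyRelationshipToIt:
    """What the recommender would want to know about my relationship to one source of meaning."""
    source: SourceOfMeaning
    shared_with_friends: bool   # am I alone with it, or do I have friends to share it with?
    where_it_fits: str          # where in my life a new space for it would fit
    blocking_steps: list[str]   # which hard steps are blocking it
```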
At first, you'd want to train this system with explicit feedback.
Let's say I had such a recommender built into my phone. Earlier, it recommended this app—Matter—to me, because I wanted intellectual community. Later, it could ask for feedback about how that recommendation went.
It could ask: “oh, are you finding companions to share your interest? Are you finding research questions you're deeply curious about?” It checks in with me about whether Matter’s actually good for the source of meaning I have, based on what’s in the middle of the values card.
So, Apple can use this data to rank Matter, and to decide whether to recommend Matter or not. But it doesn’t have to stop there. We can use the same data to rank Apple’s recommender!
Here's a kind of report card about how the App Store is doing for me, including all the time I've spent in recommended apps and whether it was meaningful. I can use this report to compare Apple's recommender with others, for my sources of meaning. Is Apple better or worse than Facebook at bringing people meaningful lives?
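A minimal sketch of how such a report card could be computed, assuming each recommendation gets a review tied to a values card. The record fields and scoring function are hypothetical; a real review would be richer than a single number.

```python
from dataclasses import dataclass

@dataclass
class RecommendationReview:
    """Explicit feedback on one recommendation, checked against a values card."""
    app: str                      # e.g. "Matter"
    source_of_meaning: str        # e.g. "intellectual community"
    hours_spent: float
    could_attend_to_card: bool    # did the app let me attend to what's in the middle of the card?
    felt_meaningful: bool

def report_card_score(reviews: list[RecommendationReview]) -> float:
    """Share of time in recommended apps that was actually meaningful. Compare across recommenders."""
    total = sum(r.hours_spent for r in reviews)
    meaningful = sum(r.hours_spent for r in reviews if r.felt_meaningful and r.could_attend_to_card)
    return meaningful / total if total else 0.0
```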
What’s great about this, is the incentives it creates.
- The people at Apple would have some pressure to help me find spaces, to make good recommendations for my sources of meaning. Users could make educated decisions about which recommendations to follow.
- Those working at Matter would have a similar pressure. All the apps and businesses you reach through your phone would need to serve meaning, instead of engagement.
Once this stuff is measured, there’s a chance to align so much with human values.
- There could be third parties, like insurance companies, that reward the recommenders that keep us healthy and fulfilled.
- The same infrastructure could be used to align advertising networks. Maybe even government services!
If governments, tech stacks like iOS and Android, and markets were all accountable to life meaning, that would be a civilizational shift.
The centerpiece would be this review. At first it would need to be explicit. People would need to understand what this data is all about. Later, software will guess our sources of meaning and just have us confirm. Software will find evidence of us living our values, as we message and interact, rather than with an explicit review.
Demo #2: Coordinator Bot
#3 — Wise AI Acts Wisely
- It “struggles” with the moral situations it finds itself in. It can comprehend the moral significance of the situations it encounters, and learn from these scenarios, recognizing new moral implications by observing and guessing at outcomes and possibilities. And it can use these moral learnings to revise internal policies (values) that guide its decision-making processes.
- It uses “human-compatible” reasons and values. It recognizes as good the same kinds of things we broadly recognize as good, plus possibly more sophisticated things we cannot yet recognize as good. It can articulate its values and how they influenced its decisions, in a way humans can comprehend.
Additionally, we sometimes add a third or fourth feature:
- It understands the breadth of human wisdom. It knows the {virtue, environment} pairs which make human life meaningful and workable — the sources of meaning behind broad social activities (like science, democracy, education, leadership, and love) — so it can operate using the best values it can surface from the population it serves.
What, then, does a wise AI look like, and how would you make one? As I said, it knows what to do in different contexts. We could imagine training it through crowdsourcing: asking many people what they think it is wise to do in different contexts. And instead of just tuning the model to follow instructions, we could tune it to attend to the wisest things the crowd knows about.
The LLMs that exist today have actually ingested a lot of humanity's wisdom, so they know it, in a factual way. You can ask GPT what it's wise to attend to on a first date, for instance, and it will tell you some things. But this isn't the same as the model being wise. That's the model internalizing some wisdom, not enacting it. By Wise AI, I mean training the model to know what to pay attention to, what to focus on, in different contexts. Not just to know this as factual information, but to actually attend to those things as the contexts come up, and perhaps even to do what humans do when they enter an unfamiliar context: to struggle, and try to learn what's important there, what to pay attention to.
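One way to picture the kind of training data this implies. This is a sketch under assumptions, not how any of the demos below are actually built: instead of (instruction, completion) pairs, each record pairs a situation with the attentional policies a crowd endorsed as wise there.

```python
# Hypothetical fine-tuning record for "wise attention" training (illustrative only).
wise_training_example = {
    "context": "A friend comes to you for help after a breakup.",
    "wise_to_attend_to": [
        "their feelings in this moment",
        "the overall trajectory of their life",
        "the specifics of the situation",
    ],
    "crowd_endorsement": 0.87,  # e.g. fraction of participants who endorsed these as wise here
}
```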
So, what do we mean by wise AI? We believe that:
- At some level, being awake towards meaning is closely related to understanding one's moral situation and responsibility.
- Remember I said emotions lead us to new values? Often it's moral situations that bring up emotions in us and lead us to new values.
- You get promoted at work, and now you're a leader, and you don't know how to be a good leader. Eventually you struggle with that, you feel embarrassed and confused, and at some point you find a value that guides you true.
Demo #3: Democratic Fine-tuning
Demo #4: Wise Chatbot
Wise AI Helps with Inner Alignment
Conclusion
- The biggest risk of AI is AI sociopaths.
- Earlier I mentioned that these sociopaths would speed up capitalist dynamics and ideological warfare. These problems are already with us. To really pull off full-stack alignment, I think we need to change markets and political systems so that they don't tend toward overconsumption and ideological warfare, rather than just relying on the market, ideological, and political setups we have. The way to do that is to move from preference-based markets toward markets based on meaning. Or not quite that, because you can't really change a market directly; it's something that emerges between a set of actors.
- If AI can be aligned the way non-sociopathic humans are, that's a good sign, because non-sociopathic humans exist.
- It's also easier mathematically.
- In part one, I mentioned that the conception you have of human flourishing or human values affects how hard the alignment problem is. This is because the mathematical shape of the function you get differs depending on the conception you pick. In particular, goals are convergent: they can be pursued relentlessly, and they compete for resources. Values, sources of meaning as I've described them, are comparatively highly contextual; they compete less for resources, and so on. This makes for a differently shaped mathematical function, which I'll call a highly contextual, short-lived value function.
- Or think about your own feelings—do they point to a way you want to live? Could they be written as values cards?