SapienceAI


A Positive Vision of a Future with Humanely Aligned AGI

An investigation, research program, academic society, and civilization-wide social movement



Adam Safron, PhD 


This website is a placeholder for a nonprofit that I hope to found in the coming year. The content and list of advisors below are from a grant application (unfortunately unsuccessful) from Spring 2023. Please note that while the people listed below did agree to support a grant application to help fund a research consortium, they have not necessarily agreed to contribute to the formation or operations of any company, whether for-profit or nonprofit. As such, their future involvement should be considered tentative (though with likely interest in supporting high-quality work, given funding).


Below we describe some progress, plans, and longer-term vision for an ongoing research program dedicated to understanding and reverse engineering human intelligence and values. This document is intended to provide an overview of our objectives, with the aim of identifying opportunities for collaboration with others who care about these issues. While there are particular hopes for what might be possible with this endeavor, this document should not be taken to constitute a definitive statement of plans and stances. Rather, the hope is to begin a truly open and honest conversation (and collaboration) on the most fundamental values that define and give meaning to our humanity.

 

Our values and goals 


We are attempting to realize a vision for ensuring a positive future with AI by pursuing multiple synergistic objectives that build upon one another (each of which can also be pursued separately to varying degrees, given support):



We are launching a research program investigating the origins and nature of prosociality and intelligence in human and non-human agents. Towards this end, we are constructing a simulation platform for developing biomimetic AIs with increasing degrees of sophistication in aligning themselves with human values (e.g., cooperativeness and compassion). We are currently focused on deploying AI agents in microworlds to study socioemotional interactions and personality formation through iterated action selection and reinforcement learning. We would then like to expand this into a self-sustaining research program (and institute) that considers the broader implications of advanced AIs for society, with our ultimate goal being the (collaborative) construction of human-aligned AGI.
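To make the microworld framing above concrete, here is a minimal, purely illustrative sketch (not our actual platform) of the kind of setup involved: a tabular Q-learning agent whose reward combines its own payoff with a weighted term for a partner's payoff, so that iterated action selection can shape a policy balancing self-interest against sharing. All names, parameters, and dynamics below (e.g., PROSOCIAL_WEIGHT, SHARING_MULTIPLIER) are hypothetical simplifications introduced only for illustration.

```python
import random
from collections import defaultdict

# Toy "microworld": a learner and a partner accumulate resources.
# The learner's reward mixes its own payoff with a weighted term for the
# partner's payoff, so iterated action selection (tabular Q-learning here)
# can shape a policy that trades off self-interest against sharing.
# All names, parameters, and dynamics are illustrative assumptions.

ACTIONS = ["collect", "share", "idle"]
PROSOCIAL_WEIGHT = 0.7    # how strongly the partner's gain enters the reward
SHARING_MULTIPLIER = 3    # a shared unit is worth more to the recipient

def step(state, action):
    """State = (own_resources, partner_resources); returns (next_state, reward)."""
    own, partner = state
    d_own, d_partner = 0, 0
    if action == "collect":
        d_own = 1
    elif action == "share" and own > 0:
        d_own, d_partner = -1, SHARING_MULTIPLIER
    reward = d_own + PROSOCIAL_WEIGHT * d_partner
    return (own + d_own, partner + d_partner), reward

def train(episodes=2000, horizon=10, alpha=0.1, gamma=0.9, epsilon=0.1):
    """Standard epsilon-greedy tabular Q-learning over the toy microworld."""
    q = defaultdict(float)  # Q[(state, action)]
    for _ in range(episodes):
        state = (0, 0)
        for _ in range(horizon):
            if random.random() < epsilon:
                action = random.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            next_state, reward = step(state, action)
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q

if __name__ == "__main__":
    q = train()
    # With these settings the one-step return for sharing (-1 + 0.7*3 = 1.1)
    # exceeds that for collecting (1.0), so the learned greedy policy tends to
    # alternate between collecting and sharing rather than hoarding; setting
    # PROSOCIAL_WEIGHT = 0 recovers purely self-interested behavior.
    for state in [(0, 0), (1, 0)]:
        print(state, {a: round(q[(state, a)], 2) for a in ACTIONS})
```

In the actual research program, such a hand-specified reward term would be replaced by learned socioemotional preferences, multi-agent interaction, and richer self-world modeling; even this toy version, however, shows how a single "prosocial weight" changes which policies get selected over iterated play.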

 

Beginning steps

(Please note: the list of confirmed advisors below was current as of 2023, before this project was placed on hold.)

 

We have assembled a large group of some of the world’s leading experts in the mind sciences and AI to advise this project, with many of whom we have been in communication for several years, and some of whom are close collaborators. Their areas of expertise include cognitive neuroscience, AI and machine learning, philosophy of mind (including ethics), cognitive linguistics, AI safety, mathematical physics, and more. These experts were selected because we believe the complexity of developing and safely deploying advanced AIs demands an interdisciplinary approach. Further, we have specifically selected advisors with a diversity of perspectives, which is appropriate both on account of these complexities (and associated uncertainties) and because a diverse range of people will be impacted by these technologies (and so ought to be represented in decisions relevant to their wellbeing). [Please note: we currently believe that even greater diversity is required for our advising team, and we are actively seeking additional collaborators.]

 

Our current (and growing) list of confirmed advisors is as follows (ordered according to related areas of expertise, including with respect to potentially conflicting perspectives): 



The involvement of these last five advisors is particularly notable, as a rich understanding of different approaches to natural language processing (Chang) and safety-related constraints (Hay) will be pivotal for determining what capacities we can expect from which kinds of systems. Further, to the extent that provably beneficial AI is possible, we will need to collaborate with some of the best mathematical minds (Sakthivadivel) we can find. Finally, given sufficient resources, we intend for our project to be overseen by members of the AI safety community (Shovelain and Yampolskiy) who have spent decades studying and raising awareness of the potential risks from advanced intelligences.

 

These scholars (leading figures in their respective disciplines) are at present generously sharing their expertise with us without seeking compensation. We are currently reaching out to additional domain experts from a variety of fields (e.g., developmental psychology, moral philosophy and ethics), and perhaps most importantly, AI safety. We are specifically looking to work with people with concerns regarding existential risks (so-called “doomers”), and especially those with skepticism about the desirability and safety of exploring human-mimetic approaches to A(G)I. We aim to create the broadest possible coalition with a diverse range of perspectives, including with respect to the feasibility and desirability of human-mimetic AI, the potential impacts of advanced AIs on society, and the potential near-, medium-, and far-term risks from continuing to develop massive models such as GPT-4. Our ultimate hope is to create a unifying vision of our lives with advanced human-aligned AIs, with people with conflicting beliefs coming into dialogue, rather than (potentially extremely dangerous) conflict. Towards this end, to the greatest extent possible, our group will utilize something we call “benevolent adversarial collaboration.” With this approach to collaborative sense-making, differing perspectives will be brought into (constructive) dialogue with one another in a way that may sometimes include testing opposing hypotheses, provided that clearly oppositional claims are identified only after mutual understanding (albeit not necessarily agreement) has been established through the iterative summarizing and paraphrasing of core claims.

 

In addition to conducting our research on socioemotional preference learning and alignment, with the help of these advisors, we intend to create a self-sustaining research program over the next two years, with relevant reference classes including the Machine Intelligence Research Institute, the Future of Life Institute, and the Centre for the Study of Existential Risk. However, we will specifically focus on reverse engineering human benevolence (and intelligence; with our social natures being part of the “secret of our success” as a species, in allowing for robust and powerful learning from cumulative cultural evolution). These ideas are heavily inspired by the work of Michael Levin (on “multiscale competence” and “care as a driver of intelligence”), who will help guide the formation of a new institute focused on developing novel intelligent systems in a manner that is compatible with human values.

 

Our aspirations and hopes 


Ultimately, it is our ambition to help create a research program that will evolve into a civilization-wide undertaking, with the shared goal of developing human-compatible AGI by reverse engineering the human ‘design’ as a hyper-social species. One could point to similar endeavors that take inspiration from biology and psychology (e.g., Ben Goertzel’s OpenCog program; Josh Tenenbaum’s research group at MIT). However, this program is specifically dedicated to using the latest advances in mainstream machine learning to (a) study the nature of prosociality, and (b) actually create systems that realize and are governed by the principles underlying it. This is an ambitious undertaking, but we have already brought together a strong team for realizing our (we believe needful) objectives.

 

Over the next two years, we will work towards achieving a “critical mass” (or “escape velocity”) of financial support for this work (and hoped-for social movement): enough support to inspire confidence that we will make meaningful progress and will likely continue to operate over longer time scales, thereby incentivizing further involvement and, in turn, further confidence. Ultimately, building (or growing) trust with respect to value-grounded organizations will be of critical importance if we are to succeed in constructing a shared vision around AGI that we can all live into together. Absent such a common purpose, the potential consequences may be dire. Towards this end, over the coming year we will organize a major summit dedicated to discussing how we may realize our vision of a world with human-aligned, loving, and potentially conscious A(G)I.

 

More concretely, given sufficient funding, we will work with our advisors to develop a multi-wing research program with multiple (synergistically) intersecting projects: 

 

We are currently funded to work on project #1, and have recently applied for funding that could allow us to begin working on all of these topics (and more). However, we also believe there are strong synergies to be obtained from the additional projects we have been discussing with potential collaborators. Given the growing importance of AI safety, and the promise (and, we believe, needfulness) of this mission, we are actively seeking additional support, with the goal of a sustained resource base of at least $100 million invested by the end of Year 2, and likely several billion within a few years if it is indeed the case that biological systems are a source of useful ‘design’ solutions for (cognitive-affective) engineering. If we are successful in creating the broad value coalition we envision, we will then attempt to do something that may not be attainable by any other group we know of: that is, we will seriously (and sincerely) begin working towards the creation of a positive, unifying vision around human-inspired, humanely-aligned A(G)I.

 

Please reach out if you would like to join us in this endeavor.

{Link to contact page.} 

 

 

Notes:

 

Value charter for a movement dedicated to helping ensure a positive future with increasingly advanced AIs:

 
 

Research divisions for a new company, SapienceAI, dedicated to reverse engineering the essences of human intelligence and benevolence in the context of a “Mars shot” for developing conscious, loving A(G)I:

 

References: 

Multilevel evolutionary developmental optimization (MEDO): A theoretical framework for understanding preferences and selection dynamics. arXiv:1910.13443.

Learned but Not Chosen: A Reward Competition Feedback Model for the Origins of Sexual Preferences and Orientations.

Value Cores for Inner and Outer Alignment: Simulating Personality Formation via Iterated Policy Selection and Preference Learning with Self-World Modeling Active Inference Agents. PsyArXiv preprint.

Biology, Buddhism, and AI: Care as the Driver of Intelligence. (PMC / nih.gov)