SapienceAI


A Positive Vision of a Future with Humanely Aligned AGI

An investigation, research program, academic society, and civilization-wide social movement



Adam Safron, PhD 


This website is a placeholder for a nonprofit that I hope to found in the coming year. The content and list of advisors below are from a grant application (unfortunately unsuccessful) from Spring 2023. Please note that while the people listed below did agree to support a grant application to help fund a research consortium, they have not necessarily agreed to contribute to the formation or operations of any company, whether for-profit or nonprofit. As such, their future involvement should be considered tentative (though with likely interest in supporting high-quality work, given funding).


Below we describe some progress, plans, and longer-term vision for an ongoing research program dedicated to understanding and reverse engineering human intelligence and values. This document is intended to provide an overview of our objectives, with the aim of identifying opportunities for collaboration with others who care about these issues. While there are particular hopes for what might be possible with this endeavor, this document should not be taken to constitute a definitive statement of plans and stances. Rather, the hope is to begin a truly open and honest conversation (and collaboration) on the most fundamental values that define and give meaning to our humanity.

 

Our values and goals 


We are attempting to realize a vision for ensuring a positive future with AI by pursuing multiple synergistic objectives that build upon one another (each of which can also be pursued separately to varying degrees, given support):



We are launching a research program investigating the origins and nature of prosociality and intelligence in human and non-human agents. Towards this end, we are constructing a simulation platform for developing biomimetic AIs with increasing degrees of sophistication in aligning themselves with human values (e.g., cooperativeness and compassion). We are currently focused on deploying AI agents in microworlds to study socioemotional interactions and personality formation through iterated action selection and reinforcement learning. We would then like to expand this into a self-sustaining research program (and institute) that considers the broader implications of advanced AIs for society, with our ultimate goal being the (collaborative) construction of human-aligned AGI.
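To make the microworld framing above concrete, here is a minimal, purely illustrative sketch (not our actual platform) of the kind of setup involved: a tabular Q-learning agent whose reward combines its own payoff with a weighted term for a partner's payoff, so that iterated action selection can shape a policy balancing self-interest against sharing. All names, parameters, and dynamics below (e.g., PROSOCIAL_WEIGHT, SHARING_MULTIPLIER) are hypothetical simplifications introduced only for illustration.

```python
import random
from collections import defaultdict

# Toy "microworld": a learner and a partner accumulate resources.
# The learner's reward mixes its own payoff with a weighted term for the
# partner's payoff, so iterated action selection (tabular Q-learning here)
# can shape a policy that trades off self-interest against sharing.
# All names, parameters, and dynamics are illustrative assumptions.

ACTIONS = ["collect", "share", "idle"]
PROSOCIAL_WEIGHT = 0.7    # how strongly the partner's gain enters the reward
SHARING_MULTIPLIER = 3    # a shared unit is worth more to the recipient

def step(state, action):
    """State = (own_resources, partner_resources); returns (next_state, reward)."""
    own, partner = state
    d_own, d_partner = 0, 0
    if action == "collect":
        d_own = 1
    elif action == "share" and own > 0:
        d_own, d_partner = -1, SHARING_MULTIPLIER
    reward = d_own + PROSOCIAL_WEIGHT * d_partner
    return (own + d_own, partner + d_partner), reward

def train(episodes=2000, horizon=10, alpha=0.1, gamma=0.9, epsilon=0.1):
    """Standard epsilon-greedy tabular Q-learning over the toy microworld."""
    q = defaultdict(float)  # Q[(state, action)]
    for _ in range(episodes):
        state = (0, 0)
        for _ in range(horizon):
            if random.random() < epsilon:
                action = random.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            next_state, reward = step(state, action)
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q

if __name__ == "__main__":
    q = train()
    # With these settings the one-step return for sharing (-1 + 0.7*3 = 1.1)
    # exceeds that for collecting (1.0), so the learned greedy policy tends to
    # alternate between collecting and sharing rather than hoarding; setting
    # PROSOCIAL_WEIGHT = 0 recovers purely self-interested behavior.
    for state in [(0, 0), (1, 0)]:
        print(state, {a: round(q[(state, a)], 2) for a in ACTIONS})
```

In the actual research program, such a hand-specified reward term would be replaced by learned socioemotional preferences, multi-agent interaction, and richer self-world modeling; even this toy version, however, shows how a single "prosocial weight" changes which policies get selected over iterated play.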

 

Beginning steps

(Please note: the list of confirmed advisors below was current as of 2023, before this project was placed on hold.)

 

We have assembled a large group of some of the world’s leading experts in the mind sciences and AI to advise this project, with many of whom we have been in communication for several years, and some of whom are close collaborators. Their areas of expertise include cognitive neuroscience, AI and machine learning, philosophy of mind (including ethics), cognitive linguistics, AI safety, mathematical physics, and more. These experts were selected because we believe the complexity of developing and safely deploying advanced AIs demands an interdisciplinary approach. Further, we have specifically selected advisors with a diversity of perspectives, which is appropriate both on account of these complexities (and associated uncertainties) and because a diverse range of people will be impacted by these technologies (and so ought to be represented in decisions relevant to their wellbeing). [Please note: we currently believe that even greater diversity is required for our advising team, and we are actively seeking additional collaborators.]

 

Our current (and growing) list of confirmed advisors is as follows (ordered according to related areas of expertise, including with respect to potentially conflicting perspectives): 



The involvement of these last five advisors is particularly notable, as a rich understanding of different approaches to natural language processing (Chang) and safety-related constraints (Hay) will be pivotal for determining what capacities we can expect from which kinds of systems. Further, to the extent that provably beneficial AI is possible, we will need to collaborate with some of the best mathematical minds (Sakthivadivel) we can find. Finally, given sufficient resources, we intend for our project to be overseen by members of the AI safety community (Shovelain and Yampolskiy) who have spent decades studying and raising awareness of the potential risks from advanced intelligences.

 

These scholars (leading figures in their respective disciplines) are at present generously sharing their expertise with us without seeking compensation. We are currently reaching out to additional domain experts from a variety of fields (e.g., developmental psychology, moral philosophy and ethics), and perhaps most importantly, AI safety. We are specifically looking to work with people with concerns regarding existential risks (so-called “doomers”), and especially those with skepticism about the desirability and safety of exploring human-mimetic approaches to A(G)I. We aim to create the broadest possible coalition with a diverse range of perspectives, including with respect to the feasibility and desirability of human-mimetic AI, the potential impacts of advanced AIs on society, and the potential near-, medium-, and far-term risks from continuing to develop massive models such as GPT-4. Our ultimate hope is to create a unifying vision of our lives with advanced human-aligned AIs, with people with conflicting beliefs coming into dialogue, rather than (potentially extremely dangerous) conflict. Towards this end, to the greatest extent possible, our group will utilize something we call “benevolent adversarial collaboration.” With this approach to collaborative sense-making, differing perspectives will be brought into (constructive) dialogue with one another in a way that may sometimes include testing opposing hypotheses, provided that clearly oppositional claims are identified only after mutual understanding (albeit not necessarily agreement) has been established through the iterative summarizing and paraphrasing of core claims.

 

In addition to conducting our research on socioemotional preference learning and alignment, with the help of these advisors, we intend to create a self-sustaining research program over the next two years, with relevant reference classes including the Machine Intelligence Research Institute, the Future of Life Institute, and the Centre for the Study of Existential Risk. However, we will specifically focus on reverse engineering human benevolence (and intelligence; with our social natures being part of the “secret of our success” as a species, in allowing for robust and powerful learning from cumulative cultural evolution). These ideas are heavily inspired by the work of Michael Levin (on “multiscale competence” and “care as a driver of intelligence”), who will help guide the formation of a new institute focused on developing novel intelligent systems in a manner that is compatible with human values.

 

Our aspirations and hopes 


Ultimately, it is our ambition to help create a research program that will evolve into a civilization-wide undertaking, with the shared goal of developing human-compatible AGI by reverse engineering the human ‘design’ as a hyper-social species. One could point to similar endeavors that take inspiration from biology and psychology (e.g., Ben Goertzel’s OpenCog program; Josh Tenenbaum’s research group at MIT). However, this program is specifically dedicated to using the latest advances in mainstream machine learning to (a) study the nature of prosociality, and (b) actually create systems that realize and are governed by the principles underlying it. This is an ambitious undertaking, but we have already brought together a strong team for realizing our (we believe needful) objectives.

 

Over the next two years, we will work towards achieving a “critical mass” (or “escape velocity”) of financial support for this work (and hoped-for social movement): enough support to inspire confidence that we will make meaningful progress and will likely continue to operate over longer time scales, thereby incentivizing further involvement and, in turn, further confidence. Ultimately, building (or growing) trust with respect to value-grounded organizations will be of critical importance if we are to succeed in constructing a shared vision around AGI that we can all live into together. Absent such a common purpose, the potential consequences may be dire. Towards this end, over the coming year we will organize a major summit dedicated to discussing how we may realize our vision of a world with human-aligned, loving, and potentially conscious A(G)I.

 

More concretely, given sufficient funding, we will work with our advisors to develop a multi-wing research program with multiple (synergistically) intersecting projects: 

 

We are currently funded to work on project #1, and have recently applied for funding that could allow us to begin working on all of these topics (and more). However, we also believe there are strong synergies to be obtained from the additional projects we have been discussing with potential collaborators. Given the growing importance of AI safety, and the promise (and, we believe, needfulness) of this mission, we are actively seeking additional support, with the goal of a sustained resource base of at least $100 million invested by the end of Year 2, and likely several billion within a few years if it is indeed the case that biological systems are a source of useful ‘design’ solutions for (cognitive-affective) engineering. If we are successful in creating the broad value coalition we envision, we will then attempt to do something that may not be attainable by any other group we know of: that is, we will seriously (and sincerely) begin working towards the creation of a positive, unifying vision around human-inspired, humanely-aligned A(G)I.

 

Please reach out if you would like to join us in this endeavor.

{Link to contact page.} 

 

 

Notes:

 

Value charter for a movement dedicated to helping ensure a positive future with increasingly advanced AIs:

 
 

Research divisions for a new company, SapienceAI, dedicated to reverse engineering the essences of human intelligence and benevolence in the context of a “Mars shot” for developing conscious, loving A(G)I:

 

References: 

Multilevel evolutionary developmental optimization (MEDO): A theoretical framework for understanding preferences and selection dynamics. arXiv:1910.13443.

Learned but Not Chosen: A Reward Competition Feedback Model for the Origins of Sexual Preferences and Orientations.

Value Cores for Inner and Outer Alignment: Simulating Personality Formation via Iterated Policy Selection and Preference Learning with Self-World Modeling Active Inference Agents. PsyArXiv preprint.

Biology, Buddhism, and AI: Care as the Driver of Intelligence. (PMC / nih.gov)