OpenAI is embarking on a groundbreaking endeavour led by Ilya Sutskever, the company’s chief scientist and co-founder. This new initiative aims to explore methods to steer and control “superintelligent” AI systems. In a recent blog post authored by Sutskever and Jan Leike, a lead on OpenAI’s alignment team. They project that AI with intelligence surpassing that of humans could become a reality within the next decade. However, they also caution that this advanced AI might not necessarily have benevolent intentions, thereby necessitating research on how to control and restrict it.
The present dilemma resides in the lack of a resolution for manoeuvring or managing a potentially superintelligent artificial intelligence and inhibiting it from engaging in unfavourable behaviour. Prevalent methodologies for aligning AI, like reinforcement learning through human input, heavily depend on human oversight. Yet, as Sutskever and Leike assert, humans will struggle to effectively supervise AI systems that are significantly smarter than them.
To tackle this predicament and make progress in the field of “superintelligence alignment,” OpenAI is establishing a new team called Superalignment. Led by Sutskever and Leike. This team will be granted access to 20% of the company’s computing resources secured to date. It will encompass scientists and engineers who were formerly part of OpenAI’s alignment division. Along with researchers from other departments within the organization. Their foremost objective for the upcoming four years will revolve around tackling the fundamental technical obstacles associated with governing superintelligent AI.
The approach they plan to take involves building a “human-level automated alignment researcher.” The aim is to train AI systems using human feedback, enabling them to assist in evaluating other AI systems and ultimately conducting alignment research. The term “alignment research” refers to ensuring that AI systems achieve desired outcomes and do not veer off course.
OpenAI hypothesizes
OpenAI hypothesizes that AI can outpace humans in making progress in alignment research. Leike, Schulman, and Wu, in a previous blog post, proposed that as AI systems advance, they can assume more alignment work, conceiving, implementing, studying, and developing superior alignment techniques compared to what is currently available. They anticipate collaboration between AI systems and human researchers to ensure the alignment of AI with human values. Consequently, human researchers would shift their focus from generating alignment research to reviewing the work done by AI systems.
Naturally, no method is foolproof, and the limitations of OpenAI are acknowledged by Leike, Schulman, and Wu. The utilization of AI for evaluation purposes may amplify inconsistencies, biases, or vulnerabilities present in the AI itself. Furthermore, the most challenging aspects of the alignment problem might not be solely related to engineering.
Despite these challenges, Sutskever and Leike believe that pursuing this endeavour is worthwhile. They emphasize that superintelligence alignment is fundamentally a machine-learning problem. Enlisting the expertise of exceptional machine learning specialists. Even if they are not currently working on alignment, is crucial to finding a resolution. OpenAI intends to share the outcomes of their efforts extensively and considers contributing to the alignment and safety of non-OpenAI models as an integral part of their mission.”