Forget the AI Race. Let’s Invest in a Data Grid for AI

2023. 8. 13. 09:11 | Artificial Intelligence

 


Akash “Aki” Jain is President of Palantir USG, Inc., where he focuses on Artificial Intelligence, USG Technical Engagement, Enterprise Data Management, and Cloud Architecture.


Private Sector Perspective — Neil Armstrong’s small step for man in 1969 was a symbolic resolution to the Cold War’s most visible global security power struggle: The Space Race. As victor, the U.S. proved its technological superiority, leading the Soviets to largely concede the space domain to the U.S. More broadly, the U.S. ability to come from behind demonstrated the underlying strength of its economic, technological, and scientific systems.

Today, we are in another pursuit of technological superiority, one that has been dubbed the artificial intelligence (AI) race. However, unlike the lunar landing, the so-called “AI Race” has no clearly defined finish line. We know we have an immediate competitor (China), but how will we know if – and when – we have won? The ambiguity around this question is why I believe we need to forget the notion of a singular AI Race and instead focus our efforts on building a data infrastructure to tackle any AI challenge.

For the U.S. to position itself in a place of technological strength for decades to come, we must create systems that allow for steady, continuous, and trustworthy progress on AI. Andrew Ng’s analogy of AI as electricity helps us see that we’re in the infancy of AI’s potential. While AI is substantially more complex than electricity, it is similar in that it is a powerful enabling technology, not an end in itself. What we are missing is a way to make enduring, scalable, and reliable use of that technology.

 

When introduced in the 19th century, incandescent bulbs were revolutionary. At first, only those consumers who had their own electricity generator could make use of them, limiting their seemingly endless potential. Thomas Edison realized that he could sell many more lightbulbs if there was an easy way for anyone to receive electricity. Without the electrical grid, we would never have seen the widespread adoption of the incandescent bulb and the subsequent surge of innovation to create many other electric devices. We’ve been caught up designing individual AI/ML models (“lightbulbs”), but we don’t have a unified infrastructure to serve as our modern-day electrical grid equivalent.

One particularly compelling reason to invest in such an infrastructure is that AI models aren’t something that can be sprinkled around to make any project better. Most AI models deployed today are quite brittle: they have been developed and trained for a specific use case under specific conditions. Once deployed, model performance and quality may degrade quickly as the data environment evolves. Further, throwing a model at an adjacent problem without re-training it usually does not work.

 

To give an illustrative example, an AI model for identifying pathologies in X-ray films could not be repurposed at another hospital because of differences in the radiology films produced by different machines — and that was for a nearly identical use case. To harness the value AI already offers, it is critical to ensure that models are continually provided with appropriate training data and feedback to improve. We must create a holistic AI and data environment – a “grid” – that works around AI model brittleness by making it easy to re-train and evaluate models and to share training data (within appropriate security, data protection, and usage boundaries).
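
The point about brittleness is ultimately operational: someone has to keep checking whether a deployed model still fits its current data environment and re-fit it when it does not. Below is a minimal sketch of that re-train-and-evaluate loop, assuming a scikit-learn-style classifier; the accuracy threshold and function names are illustrative, not part of any particular platform.

```python
# Minimal sketch of a re-train-and-evaluate loop (illustrative names and threshold).
from sklearn.base import clone
from sklearn.metrics import accuracy_score

ACCURACY_FLOOR = 0.90  # illustrative: below this, treat the deployed model as degraded


def evaluate(model, X_eval, y_eval):
    """Score the deployed model on a fresh labeled batch from the current data environment."""
    return accuracy_score(y_eval, model.predict(X_eval))


def retrain_if_degraded(model, X_eval, y_eval, X_train, y_train):
    """Keep the model only while live performance holds; otherwise re-fit on current data."""
    score = evaluate(model, X_eval, y_eval)
    if score >= ACCURACY_FLOOR:
        return model, score                         # still healthy: keep the deployed model
    refreshed = clone(model).fit(X_train, y_train)  # re-train on up-to-date data
    return refreshed, evaluate(refreshed, X_eval, y_eval)
```

In a grid like the one the article describes, the fresh evaluation batch and the current training data would come from shared infrastructure, so the same loop could run at the second hospital without rebuilding the pipeline from scratch.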

The U.S. Government will struggle to retain the lead in AI because this infrastructure does not exist. Academics, government researchers, and private companies are off in silos building incredible AI/ML capabilities — but like a solitary lightbulb, they are illuminating but a single room in a single house at a time. For the U.S. to maintain technological superiority, we must build the data infrastructure that will allow entire skyscrapers to be illuminated. We must also enable new innovations: not just “lightbulbs,” but “toasters,” “televisions,” and beyond. We must have a means for scaling existing capabilities in the real world and encouraging the development of new ones. And we must do so in a way that stays true to our democratic values. Let’s invest in a data grid for AI.

 

Just as governments play a role in enforcing standards related to electrical current flow, the U.S. Government has a role to play in establishing our own “grid.” Investing in a data grid for AI is a strategic move that will not only produce an immediate surge of innovation, but will also allow for sustained, step-by-step advancement over the long term. There are several key components we’ll need to get right:

 

1. Design for iteration, not stagnation: AI systems are _learning_ systems that require constant iteration and feedback. We must build our infrastructure so that it can evolve. Just as electrical grids today are flexible enough to support a variety of energy sources, from solar panels to coal-fired generators, so too must our AI infrastructure empower AI firms and Government programs to adapt to a variety of systems. And, just as grids can surge resources in response to demand, we should build in connectivity, which in turn will allow us to discover and build toward emerging demand and to refine and further develop new capabilities.


2. Create an AI deployment infrastructure: Government consumers should have an easy access point to discover, evaluate, and deploy potential AI/ML solutions and training data, monitor algorithm performance, and capture and save any feedback.


3. Adopt open data standards: Just as we have standards for voltage, we need standards for the format, quality, and curation of data, systems, and APIs (a sketch of what such a standard might record follows this list).


4. Fund an AI training data library: AI/ML models depend upon quality data to train and test. Large, diverse datasets help mitigate algorithmic biases, and our Government is best positioned to conduct quality assurance on this data and enable appropriate access to it. By building out this training data library thoughtfully, instead of via ad hoc, disconnected efforts, our Government can both spur AI development and ensure that training data sets are curated ethically and transparently.


5. Keep our grid secure: We must protect our electricity grid from hackers – similarly, we must ensure that our AI training data, algorithms, and deployment infrastructure are secure.
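
To make points 2 through 4 slightly more concrete, here is an illustrative sketch of the kind of metadata record a shared catalog might require before a dataset can be discovered, evaluated, and reused. Every field name here is hypothetical; this is not an existing government or Palantir schema, just one way the standards above could be written down.

```python
# Illustrative only: a hypothetical catalog record, not an existing standard or schema.
from dataclasses import dataclass, field
from datetime import date


@dataclass
class TrainingDatasetRecord:
    """Metadata a grid would need before a dataset can be discovered, evaluated, and reused."""

    name: str                              # e.g. "chest-xray-pathology-v3" (hypothetical)
    data_format: str                       # agreed open format, e.g. "DICOM" or "Parquet"
    schema_version: str                    # version of the shared standard the data follows
    collected_from: date                   # start of the data collection period
    collected_to: date                     # end of the data collection period
    quality_checks_passed: bool            # outcome of standardized quality assurance
    known_biases: list[str] = field(default_factory=list)  # documented gaps or skews
    access_level: str = "restricted"       # security and usage boundary for consumers
    feedback_endpoint: str = ""            # where deployed models report performance (point 2)
```

The specific fields matter less than the fact that they are standardized: a consuming agency can only “plug in” the way an appliance plugs into a wall socket if every dataset describes its format, quality, provenance, and usage boundaries in the same way.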

 

By following these guidelines, in conjunction with existing calls to adhere to strong AI ethics principles and standards, we can invest in an AI infrastructure that enables not just the occasional “incandescent bulb,” but empowers an entire generation with access to enabling technologies that will buoy innovation as more potential use cases are discovered. By investing in a grid, we can unlock the enormous potential of AI development and ensure our technological, economic, democratic, and military superiority for decades to come.

 

Thoughts


As I watched the AI boom driven by generative AI such as OpenAI's GPT-3.5 and GPT-4 spring up everywhere, I felt a vague sense of anxiety. The phrase "an AI race with no finish line" somewhat relieved that ambiguity. Many companies are trying to do something "remarkable" with GPT, but in reality, rather than producing anything significantly impactful, they often end up merely building a chatbot that performs well. Moreover, these customized chatbots are difficult to repurpose for other uses.

I believe the author, Palantir USG's President, was addressing these issues. He suggests that for AI to be sustainable and a genuine problem-solving tool, certain elements need to be in place. Among these, the most important seem to be: "Can the model be easily retrained and evaluated?" and "Are there standards in place for scalable AI?"

At a time when AI for AI's sake is being discussed everywhere, it is clear that making AI sustainable and a truly problem-solving tool requires building systems rather than jumping into a model competition. The author expresses this system as a data grid for AI.

"Most AI models deployed today are quite brittle: they have been developed and trained for a specific use case under specific conditions. Once deployed, model performance and quality may degrade quickly as the data environment evolves. Further, throwing a model at an adjacent problem without re-training it usually does not work."


Just as the potential of electricity exploded once standards for current and voltage were established and a system was built to draw electricity easily anywhere, giving rise to "problem-solving products" like toasters, hair dryers, and washing machines, I think the same could be true for AI.

Considering the ecosystem Elon Musk has built and his relationship with Peter Thiel as fellow members of the PayPal Mafia, a data grid like Palantir's could make it easy to migrate AI learned from Tesla's cars to Optimus's vision, use it to create a product for SpaceX, and create an environment in which X.AI could easily customize TruthGPT, a large language model (LLM) trained on Twitter data, to solve problems for Tesla, SpaceX, or other companies. In fact, Tesla uses Palantir's Foundry (it is not explicitly disclosed, but a Tesla domain exists on Foundry), which means the scenario above is not entirely far-fetched.

The analogy between GPT-4 (and other generative AI models) and a "light bulb" is striking. If the potential of a light bulb lies not in "emitting light" but in "enabling use cases for that light," then I believe we will only fully realize AI's potential once a data infrastructure is established that illuminates not a solitary bulb but an entire high-rise building. What we need are not bulbs, but toasters and televisions that can solve problems.
