The widespread adoption of Machine Learning (ML) is at an exciting inflection point, with the ready availability of scalable computing capacity, the rapid advancement of ML technologies, and a massive proliferation of data-transforming businesses across industries. Generative AI applications like ChatGPT have recently captured widespread attention and imagination, and Amazon Web Services (AWS) believes that most customer experiences and applications will be reinvented with generative AI.
AWS has been focused on AI and ML for over 20 years, and many of the capabilities customers use today are driven by ML. AWS offers the broadest and deepest portfolio of AI and ML services, making them accessible to anyone who wants to use them. Enterprise customers such as Intuit, Thomson Reuters, AstraZeneca, Ferrari, Bundesliga, 3M, and BMW, as well as thousands of startups and government agencies around the world, are transforming themselves, their industries, and their missions with ML. AWS is taking the same democratizing approach to generative AI and has announced several new innovations to make it easy and practical for customers to use in their businesses.
Generative AI refers to a type of Artificial Intelligence that can generate new content and ideas, such as conversations, stories, images, videos, and music. Generative AI applications are powered by ML models pre-trained on vast amounts of data, commonly referred to as Foundation Models (FMs). Recent advancements in ML, particularly transformer-based neural network architectures, have led to models containing billions of parameters: the largest pre-trained model in 2019 contained 330M parameters, while the largest models today contain more than 500B parameters, a roughly 1,600x increase in size in just a few years. FMs such as GPT-3.5, BLOOM, and Stable Diffusion can perform tasks spanning multiple domains, including writing blog posts, generating images, solving math problems, engaging in dialog, and answering questions based on a document. Their size and general-purpose nature set FMs apart from traditional ML models, which typically perform specific tasks. FMs can learn complex concepts, apply their knowledge in varied contexts, and be customized to perform domain-specific functions using only a small fraction of the data and compute required to train a model from scratch. The potential of FMs is exciting, and their diversity will set off a wave of innovation and application experiences never seen before. AWS customers want to quickly take advantage of FMs and generative AI within their businesses and organizations to drive new levels of productivity and transform their offerings.
“At Amazon Web Services (AWS), we have played a key role in democratizing ML and making it accessible to anyone who wants to use it, including more than 100,000 customers of all sizes and industries. We’ve invested and innovated to offer the most performant, scalable infrastructure for cost-effective ML training and inference; developed Amazon SageMaker, which is the easiest way for all developers to build, train, and deploy models; and launched a wide range of services that allow customers to add AI capabilities like image recognition, forecasting, and intelligent search to applications with a simple API call. We take the same democratizing approach to generative AI: we work to take these technologies out of the realm of research and experiments and extend their availability far beyond a handful of startups and large, well-funded tech companies. That’s why today I’m excited to announce several new innovations that will make it easy and practical for our customers to use generative AI in their businesses,” said Swami Sivasubramanian, VP of Databases, Analytics, and Machine Learning at Amazon Web Services (AWS).
Amazon Bedrock & Amazon Titan Models Announced
Amazon Bedrock is a new service that makes it easy for customers to build and scale generative AI applications using high-performing FMs from AI21 Labs, Anthropic, Stability AI, and Amazon. The service provides access to some of the most cutting-edge FMs available today, including AI21 Labs’ multilingual Jurassic-2 family of LLMs, Anthropic’s Claude, and Stability AI’s suite of text-to-image foundation models. Bedrock also lets customers customize models easily: as few as 20 labeled examples can be enough to fine-tune a model for a particular task. The service is in limited preview, and customers are excited about its potential for their development teams. The new service aims to democratize FMs, and Amazon’s partners are building practices to help enterprises move faster with generative AI.
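As a rough illustration of what calling such a service could look like, the sketch below builds a JSON request body for a text-generation FM. Note that the field names and the `bedrock-runtime` boto3 client mentioned in the comment are assumptions based on how AWS SDKs typically work, not details confirmed by the limited-preview announcement; each model provider defines its own payload schema.

```python
import json

def build_invoke_request(prompt: str, max_tokens: int = 200) -> str:
    """Build a hypothetical JSON request body for a text-generation FM.

    The field names here are illustrative only; AI21 Labs, Anthropic,
    Stability AI, and Amazon each define their own request schema.
    """
    body = {
        "inputText": prompt,
        "textGenerationConfig": {"maxTokenCount": max_tokens},
    }
    return json.dumps(body)

# With preview access and AWS credentials, the call might look roughly like:
#   import boto3
#   client = boto3.client("bedrock-runtime")
#   response = client.invoke_model(modelId="<model-id>",
#                                  body=build_invoke_request("Summarize ..."))
```

The point of the pattern is that the application only swaps the model ID and payload to move between FMs, rather than standing up and operating its own model-serving infrastructure.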
Amazon also announced the Titan FMs, two new LLMs: a generative LLM and an embeddings LLM. The Titan FMs are built to detect and remove harmful content in the data customers provide for customization, reject inappropriate content in user input, and filter model outputs that contain inappropriate content.
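An embeddings LLM maps text to numerical vectors so that semantically similar texts land close together; applications then compare those vectors, commonly via cosine similarity, for tasks like search and personalization. A minimal, self-contained sketch with made-up toy vectors (real embeddings have hundreds or thousands of dimensions, and the values below stand in for actual model output):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 4-dimensional "embeddings" for illustration only.
query = [0.1, 0.9, 0.2, 0.0]
doc_a = [0.1, 0.8, 0.3, 0.1]  # text on a similar topic to the query
doc_b = [0.9, 0.0, 0.1, 0.7]  # text on an unrelated topic

# A search application would rank doc_a above doc_b, because
# cosine_similarity(query, doc_a) > cosine_similarity(query, doc_b).
```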
New Instances for Amazon EC2 Trn1n & Amazon EC2 Inf2: the most cost-effective cloud infrastructure for Generative AI
To effectively run, build, and customize FMs, customers require performant and cost-effective infrastructure that is purpose-built for ML. AWS has invested in its own silicon over the last five years, prioritizing performance and price performance for demanding workloads such as ML training and inference. AWS Trainium and AWS Inferentia chips offer the lowest cost for training models and running inference in the cloud. Leading AI startups like AI21 Labs, Anthropic, Cohere, Grammarly, Hugging Face, Runway, and Stability AI run on AWS due to the ability to maximize performance and control costs by choosing the optimal ML infrastructure.
Trn1 instances, powered by Trainium, offer up to 50% savings on training costs over any other EC2 instance and are optimized to distribute training across multiple servers connected with 800 Gbps of second-generation Elastic Fabric Adapter (EFA) networking. Customers can deploy Trn1 instances in UltraClusters that scale up to 30,000 Trainium chips with petabit-scale networking. AWS also announced the general availability of new, network-optimized Trn1n instances, which offer 1,600 Gbps of network bandwidth and 20% higher performance for large, network-intensive models.
While training a model is typically periodic, a production application can generate predictions constantly, requiring low-latency, high-throughput networking. Amazon’s Alexa, for example, receives millions of requests every minute, and inference accounts for 40% of its computing costs. Recognizing that most future ML costs would come from running inference, AWS prioritized inference-optimized silicon when it began investing in new chips. In 2018, AWS announced Inferentia, its first purpose-built chip for inference; Inferentia now helps Amazon run trillions of inferences annually, saving hundreds of millions of dollars.
AWS also announced the general availability of Inf2 instances powered by AWS Inferentia2, optimized for large-scale generative AI applications with models containing hundreds of billions of parameters. Inf2 instances deliver up to 4x higher throughput and up to 10x lower latency than the prior generation of Inferentia-based instances, up to 40% better inference price performance than other comparable Amazon EC2 instances, and the lowest cost for inference in the cloud. Customers such as Runway have seen up to 2x higher throughput with Inf2 than with comparable Amazon EC2 instances for some of their models, enabling them to introduce more features, deploy more complex models, and ultimately deliver a better experience for their users.
CodeWhisperer: Free for Individual Developers
Generative AI for coding has the potential to revolutionize developer productivity by taking over the heavy lifting of writing undifferentiated code. Amazon CodeWhisperer is an AI coding companion that generates code suggestions in real time based on developers’ comments and prior code in their Integrated Development Environment (IDE). Amazon has announced the general availability of CodeWhisperer for Python, Java, JavaScript, TypeScript, and C#, as well as 10 new languages, and the tool is accessible from IDEs such as VS Code, IntelliJ IDEA, and AWS Cloud9. CodeWhisperer is also the only AI coding companion with built-in security scanning that finds hard-to-detect vulnerabilities and suggests remediations. It is free for all individual users, with no qualifications or time limits; for business users, the CodeWhisperer Professional Tier adds administration features such as single sign-on with AWS Identity and Access Management integration. Amazon’s mission is to make it possible for developers of all skill levels and organizations of all sizes to innovate using generative AI, and they are excited about the possibilities it will bring.
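To illustrate the comment-to-code workflow such a companion supports, here is a hypothetical example: the developer types the natural-language comment, and the tool proposes an implementation like the function below. The suggestion shown is hand-written for illustration, not actual CodeWhisperer output.

```python
# Developer types a comment in the IDE:
# "function to check whether a string is a palindrome, ignoring case and spaces"
#
# The coding companion then suggests an implementation such as:
def is_palindrome(text: str) -> bool:
    normalized = "".join(ch.lower() for ch in text if not ch.isspace())
    return normalized == normalized[::-1]
```

The developer reviews and accepts or edits the suggestion in place, which is where the productivity gain on undifferentiated code comes from.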