Introducing Web Stable Diffusion: A 2023 Breakthrough in Web-Based AI Model Deployment

Web Stable Diffusion is a groundbreaking project that brings stable diffusion models to web browsers. The system runs entirely inside the browser and requires no server support, making it the world's first stable diffusion model to run fully client-side. A demo webpage is now available for users to try out the system.

Key Points of Web ML Model Deployment for Web Stable Diffusion

With AI models making incredible progress in recent times, developers can easily compose open-source models to accomplish impressive tasks. Stable diffusion automatically creates photorealistic images, and images in a range of other styles, from text input.

However, these models are usually large and compute-heavy, so web applications built on them typically route every request through GPU servers. This project aims to bring diversity to the ecosystem by moving that computation to the client side, which reduces the compute burden on servers while enabling personalization and privacy protection.

Importance of Web Stable Diffusion

Web Stable Diffusion is designed to bring AI natively to your browser tab. The latest advancements in hardware and the browser ecosystem have made this possible. The client side is becoming increasingly powerful, and we can now use web-based AI models to reduce service provider costs, enhance personalization and privacy protection, and improve performance.

WebGPU, which enables native GPU execution in the browser, is maturing rapidly and addresses the compute problem, making it possible to bring AI models directly to your browser tab.

MLC Technology

The key technology used in Web Stable Diffusion is machine learning compilation (MLC). The solution is built on the open-source ecosystem, including PyTorch, Hugging Face diffusers and tokenizers, Rust, WebAssembly (wasm), and WebGPU. The main flow is built on Apache TVM Unity, which enables a Python-first, interactive MLC development experience: new optimizations can be composed entirely in Python, and the app can be developed incrementally.
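The flow described above can be pictured with a toy example. The sketch below is not TVM Unity's actual API; it is a hypothetical miniature of the MLC idea: capture a computation as an intermediate representation (IR), apply an optimization pass written in plain Python, and then execute the transformed program.

```python
# A hypothetical miniature of machine learning compilation (MLC).
# Not TVM Unity's real API -- just the shape of the idea: capture,
# transform, and run an intermediate representation (IR).

def capture(ops):
    """Capture a computation as a simple IR: a list of (op, arg) pairs."""
    return list(ops)

def fuse_scale(ir):
    """An optimization pass in plain Python: fold consecutive
    multiplies into one, mimicking operator fusion."""
    out = []
    for op, arg in ir:
        if out and op == "mul" and out[-1][0] == "mul":
            out[-1] = ("mul", out[-1][1] * arg)
        else:
            out.append((op, arg))
    return out

def run(ir, x):
    """A stand-in for the generated runnable code."""
    for op, arg in ir:
        if op == "mul":
            x = x * arg
        elif op == "add":
            x = x + arg
    return x

ir = capture([("mul", 2.0), ("mul", 3.0), ("add", 1.0)])
optimized = fuse_scale(ir)   # [("mul", 6.0), ("add", 1.0)] -- fewer ops
print(run(optimized, 5.0))   # 31.0, same result as the unoptimized IR
```

The point of the Python-first workflow is that passes like `fuse_scale` stay inspectable and easy to compose, while the final program still runs as generated code on the target device.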


Building Web Stable Diffusion

Web Stable Diffusion is built in Python, with heavy reliance on optimized compute libraries. TVM Unity captures the key model components into an IRModule in TVM. The IRModule's functions are then transformed into runnable code that can be deployed universally on any environment supported by the minimum TVM runtime. TensorIR and MetaSchedule provide automated solutions for generating optimized programs: transformations are tuned on a specific device through native GPU runtimes and then used to generate optimized GPU shaders. Finally, Emscripten and TypeScript are used to build a TVM web runtime that can deploy the generated modules.
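The "key model components" captured into the IRModule correspond to the three stages of stable diffusion inference: a text encoder, an iterative UNet denoiser, and a VAE image decoder. The sketch below is a numerically toy, hypothetical version of that loop; the stage functions are stand-ins, not the project's real compiled modules.

```python
# Toy sketch of the stable-diffusion inference loop that the compiled
# modules implement. All three stage functions are hypothetical
# stand-ins; the real project runs compiled CLIP / UNet / VAE kernels.

def encode_text(prompt):
    # Stand-in text encoder: reduce the prompt to a fake conditioning value.
    return float(sum(ord(c) for c in prompt) % 10)

def unet_step(latent, cond):
    # Stand-in denoiser: predict and remove a bit of "noise" each step,
    # pulling the latent toward the conditioning signal.
    predicted_noise = (latent - cond) * 0.1
    return latent - predicted_noise

def vae_decode(latent):
    # Stand-in decoder: map the final latent to a fake "image" value.
    return round(latent, 3)

def generate(prompt, steps=20):
    cond = encode_text(prompt)
    latent = 100.0  # start from "pure noise"
    for _ in range(steps):
        latent = unet_step(latent, cond)
    return vae_decode(latent)
```

The repeated `unet_step` calls are why the UNet dominates runtime in practice, and why it is the main target of the shader tuning described above.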

Comparison with Native GPU Runtime, Limitations, and Opportunities

Web Stable Diffusion also supports native deployment with a local GPU runtime, which serves both as a tool for deploying in native environments and as a reference point for comparing the performance of native GPU drivers and WebGPU.

WebGPU works by translating WGSL (WebGPU Shading Language) shaders to native shaders. With Chrome's current WebGPU implementation, there is a performance degradation of about 3x, largely because Chrome inserts bounds clips on array index accesses for robustness. This gap should narrow as WebGPU implementations mature and learn to drop such clipping when the index access range can be proven safe.
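Conceptually, the robustness clipping described above looks like the following sketch: every buffer access is rewritten to clamp the index into range, which keeps out-of-bounds shader accesses safe but adds work to every load. (This is an illustration of the idea in Python, not Chrome's actual WGSL translation, whose exact out-of-bounds behavior is implementation-defined.)

```python
# Illustration of WebGPU-style index clamping (not Chrome's actual
# shader translation). A raw load trusts the index; a "robust" load
# clamps it into range first, trading extra work for memory safety.

def raw_load(buf, i):
    # Fast path: no check. An out-of-bounds i would be unsafe in a shader.
    return buf[i]

def robust_load(buf, i):
    # The inserted bounds clip: force i into [0, len(buf) - 1].
    clamped = min(max(i, 0), len(buf) - 1)
    return buf[clamped]

buf = [10, 20, 30]
print(robust_load(buf, 5))   # 30: the out-of-range access is clamped to the last element
```

When a compiler can prove an index is always in range, the clamp is pure overhead, which is exactly the optimization opportunity the paragraph above refers to.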

There are also opportunities to bring severalfold performance improvements to the current solution, including support for advanced optimizations such as FlashAttention and quantization.

Acknowledgements from the Web Stable Diffusion Team

Web Stable Diffusion is only possible thanks to the collaboration among the CMU School of Computer Science (Catalyst group), the MLC community, and OctoML. The project is made possible by the open-source ecosystems that we leverage, and we thank the Apache TVM, PyTorch, and Hugging Face communities, the tokenizer wasm port by Mithril Security, and the WebAssembly, Emscripten, Rust, and WebGPU communities.


FAQ about Web Stable Diffusion

1. What is Web Stable Diffusion?

Web Stable Diffusion is a project by the CMU School of Computer Science (Catalyst group), the MLC community, and OctoML that brings stable diffusion models to web browsers, enabling the deployment of machine learning models on the client side.

2. How does Web Stable Diffusion work?

Web Stable Diffusion is built on the open-source ecosystem, including PyTorch, Hugging Face diffusers and tokenizers, Rust, WebAssembly (wasm), and WebGPU. The models are optimized and deployed through machine learning compilation (MLC) using Apache TVM Unity.

3. What are the benefits of Web Stable Diffusion?

Web Stable Diffusion enables the deployment of machine learning models on the client side, reducing costs for service providers and enhancing personalization and privacy protection.

4. How can I use Web Stable Diffusion?

Web Stable Diffusion can be used in a native environment as well as on the web, with the latter powered by WebGPU. The project provides a Jupyter notebook that walks developers through the stages of importing, optimizing, building, and deploying the models.

5. What is machine learning compilation (MLC)?

Machine learning compilation (MLC) is a technique used to optimize machine learning models for deployment. It involves compiling and optimizing the model so that it can be run efficiently on a specific device or environment.

6. What is Apache TVM Unity?

Apache TVM Unity is an ongoing development in the Apache TVM project that enables interactive machine learning compilation (MLC) development experiences, allowing developers to easily compose new optimizations and incrementally bring their apps to the web.

7. What is WebGPU?

WebGPU is a new web standard that enables native GPU execution in the browser, allowing machine learning models to run on the client side.
