Blog
MemVerge Brings Intelligent Scheduling and Resource Management to AMD Instinct GPUs
After rigorous testing, AMD's cloud infrastructure team gave MemVerge's solution high marks for its mature, intuitive user interface and its enterprise-grade scheduling and resource-sharing capabilities. "Our partner ecosystem is critical to the success of the AMD Instinct business,"...
In-Weight Learning vs. In-Context Learning: Lessons from Human Psychology for AI
The past few years have taught us new terms to describe how large language models (LLMs) learn and adapt. Two of the most important are in-weight learning and in-context learning. Understanding the difference between them not only clarifies where we are with AI today...
Why AI Needs Memory
In the 1980s, supercomputers captured the imagination. They could model nuclear reactions or simulate the weather—but only if paired with storage. Without disks to hold results, every run would vanish, forcing scientists to start over. Compute without storage was raw...
AI Memory, The Next Frontier
Over the last two years, the AI conversation has been dominated by models and compute. Bigger models, faster GPUs, cheaper inference. Necessary—but not sufficient. If we want AI that is genuinely useful at work and trustworthy in the enterprise, we need to confront...
How a Machine Learning Expert thinks about RAG vs Fine-tuning
Since OpenAI introduced ChatGPT in December 2022, the world has been swept up in the wave of Generative AI. Enterprises are now actively exploring how to leverage AI to increase productivity, streamline operations, and gain a competitive edge in an increasingly...
What Does DeepSeek Mean for Enterprise AI?
Since OpenAI introduced ChatGPT in December 2022, the world has been swept up in the wave of Generative AI. Enterprises are now actively exploring how to leverage AI to increase productivity, streamline operations, and gain a competitive edge in an increasingly digital-first world. Software development, IT management, and customer support were among the first to feel the impact.
Accelerating Data Retrieval in Retrieval-Augmented Generation (RAG) Pipelines using CXL
RAG (retrieval-augmented generation) has emerged as a powerful technique for customizing LLMs for users and use cases beyond the model's training set. However, there are multiple potential bottlenecks within a RAG pipeline.
Introducing Weighted Interleaving in Linux for Enhanced Memory Bandwidth Management
With the release of Linux Kernel 6.9, system administrators have gained a powerful new tool for managing memory distribution across NUMA nodes: Weighted Interleaving. This feature is especially beneficial in systems utilizing various types of memory, including traditional DRAM and Compute Express Link (CXL) attached memory. In this article, we’ll explore Weighted Interleaving, how it works, and how to use it.
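As a quick taste of the feature described above, here is a minimal sketch of configuring weighted interleaving on a two-node system. It assumes Linux 6.9+ with the weighted-interleave sysfs interface and a numactl release that supports the `--weighted-interleave` policy flag; the node numbers and 3:1 weight ratio are illustrative, not a recommendation for any particular hardware.

```shell
# Illustrative setup: node 0 = local DRAM, node 1 = CXL-attached memory.
# Set per-node weights via sysfs: allocate roughly 3 pages from node 0
# for every 1 page from node 1, approximating their bandwidth ratio.
echo 3 | sudo tee /sys/kernel/mm/mempolicy/weighted_interleave/node0
echo 1 | sudo tee /sys/kernel/mm/mempolicy/weighted_interleave/node1

# Launch a workload under the weighted-interleave memory policy
# (./my_app is a placeholder for a bandwidth-bound application).
numactl --weighted-interleave=0,1 ./my_app
```

In practice the weights are usually derived from measured per-node bandwidth rather than guessed; the linked article walks through how the kernel applies them during page allocation.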
Unleashing the Future of Memory Management: Exploring CXL Dynamic Capacity Devices with Docker and QEMU
In the ever-advancing realm of technology, developers and application owners are always looking for innovative tools and methodologies to boost performance and scalability. A significant stride in this direction is the integration of Compute Express Link (CXL) technology, particularly through the use of Dynamic Capacity Devices (DCD). CXL, an open standard for high-speed CPU-to-device and CPU-to-memory interconnects, substantially enhances data center and cloud environments with benefits such as memory pooling and dynamic capacity scaling.
Memory Wall, Big Memory, and the Era of AI
In the fast-evolving landscape of artificial intelligence (AI), where models are growing larger and more complex by the day, the demand for efficient processing of vast amounts of data has ushered in a new era of computing infrastructure.
