
Predibase
Founded Year: 2021
Stage: Series A - II | Alive
Total Raised: $28.45M
Last Raised: $12.2M | 2 yrs ago
Mosaic Score: +11 points in the past 30 days (the Mosaic Score is an algorithm that measures the overall financial health and market potential of private companies)
About Predibase
Predibase is a developer platform that provides services related to fine-tuning and serving large language models (LLMs) within the artificial intelligence (AI) and machine learning industry. Its services include AI model fine-tuning, language model serving, and infrastructure solutions for machine learning models. It was founded in 2021 and is based in San Francisco, California.
ESPs containing Predibase
The ESP matrix leverages data and analyst insight to identify and rank leading companies in a given technology landscape.
The small language models (SLMs) tools & development market focuses on creating, optimizing, and deploying compact language models that prioritize efficiency and specialized functionality. These models require fewer computational resources than large language models while offering advantages in speed, cost, and on-device deployment capabilities. Companies in this market provide tools for developin…
Predibase is named a Leader among 15 other companies, including Microsoft, Hugging Face, and Mistral AI.
Predibase's Products & Differentiators
Predibase Inference Engine
The Predibase Inference Engine is managed serverless infrastructure designed specifically for serving fine-tuned small language models (SLMs). The Inference Engine dramatically improves SLM deployments by making them 3-5x faster, easily scalable, and more cost-effective for enterprises grappling with the complexities of productionizing AI. Built on Predibase's innovations, Turbo LoRA and LoRA eXchange (LoRAX), the Predibase Inference Engine is designed from the ground up to offer a best-in-class experience for serving fine-tuned SLMs.
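As a rough illustration of why adapter-based serving scales, the sketch below compares the memory needed for many independent fine-tunes against one shared base model plus per-task LoRA adapters, the pattern LoRAX implements. All sizes here are illustrative assumptions, not Predibase benchmarks.

```python
# Back-of-the-envelope memory comparison: N independent fine-tuned models
# vs. one shared base model plus N LoRA adapters (the LoRAX approach).
# All sizes are illustrative assumptions, not measured Predibase figures.

GB = 1024**3

def full_model_bytes(params: float, bytes_per_param: int = 2) -> float:
    """Memory for a dense model checkpoint (FP16 = 2 bytes/param)."""
    return params * bytes_per_param

def lora_adapter_bytes(params: float, trainable_fraction: float = 0.001,
                       bytes_per_param: int = 2) -> float:
    """LoRA trains a tiny low-rank fraction of the weights; ~0.1% is a
    common ballpark at small ranks."""
    return params * trainable_fraction * bytes_per_param

base_params = 8e9   # assume an 8B-parameter base SLM
n_variants = 1000   # fine-tuned variants to serve

naive = n_variants * full_model_bytes(base_params)
shared = full_model_bytes(base_params) + n_variants * lora_adapter_bytes(base_params)

print(f"1000 full fine-tunes:  {naive / GB:,.0f} GB")   # ~14,901 GB
print(f"base + 1000 adapters:  {shared / GB:,.0f} GB")  # ~30 GB
```

Under these assumptions, a thousand full copies of an 8B model would need roughly 15 TB of memory, while the shared-base-plus-adapters layout fits in about 30 GB, which is why per-request adapter swapping on a single GPU is plausible at all.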
Research containing Predibase
Get data-driven expert analysis from the CB Insights Intelligence Unit.
CB Insights Intelligence Analysts have mentioned Predibase in 3 CB Insights research briefs, most recently on Jul 15, 2024.

Jul 2, 2024
How to buy AI: Assessing AI startups’ potential
May 24, 2024
The generative AI market map
Expert Collections containing Predibase
Expert Collections are analyst-curated lists that highlight the companies you need to know in the most important technology spaces.
Predibase is included in 4 Expert Collections, including Hydrogen Energy Tech.
Hydrogen Energy Tech
3,196 items
Companies that are engaged in the production, utilization, or storage and distribution of hydrogen energy. This includes, but is not limited to, companies that manufacture hydrogen, those that convert hydrogen into usable energy, and those that store and distribute hydrogen.
Generative AI
1,299 items
Companies working on generative AI applications and infrastructure.
AI 100 (2024)
100 items
Artificial Intelligence
7,221 items
Latest Predibase News
Mar 19, 2025
“Blessed are the GPU poor, for they shall inherit the AGI.”

by Siddharth Jindal

Not long ago, running a large language model (LLM) meant relying on massive graphics processing units (GPUs) and expensive hardware. Now, however, things are starting to change. A new wave of smaller, more efficient LLMs is emerging, capable of running on a single GPU without compromising on performance. These models are making high-end AI more accessible, reducing dependency on large-scale infrastructure, and reshaping how AI is deployed. As Bojan Tunguz, former NVIDIA senior software system engineer, quipped, “Blessed are the GPU poor, for they shall inherit the AGI.”

The past week brought a series of AI announcements. Mistral’s latest model, Small 3.1, Google’s Gemma 3, and Cohere’s Command A all claim to match the performance of proprietary models while requiring fewer compute resources. These models enable developers, small businesses, and even hobbyists with consumer-grade hardware (e.g., a single NVIDIA RTX card) to run advanced AI models locally. Moreover, running LLMs locally on a single GPU reduces reliance on cloud providers like AWS or Google Cloud, giving businesses more control over their data and privacy. This is critical for industries handling sensitive information and for regions with limited internet access.

What Makes Them Special?

Mistral Small 3.1 features improved text performance, multimodal understanding, and an expanded context window of up to 128k tokens. The company said the model outperforms comparable models like Google’s latest release, Gemma 3, and GPT-4o mini while delivering inference speeds of 150 tokens per second. One of its most notable features, however, is that it can run on a single RTX 4090 or a Mac with 32 GB RAM, making it a great fit for on-device use cases. The company said the model can be fine-tuned to specialise in specific domains, creating accurate subject matter experts. This is particularly useful in fields like legal advice, medical diagnostics, and technical support.

Google, meanwhile, claims that Gemma 3 outperforms Llama 3-405B, DeepSeek-V3, and o3-mini in preliminary human preference evaluations on the LMArena leaderboard. Like Mistral Small 3.1, it can also run on a single GPU or a tensor processing unit (TPU). “Compare that to Mistral Large or Llama 3 405B, needing up to 32 GPUs—Gemma 3 slashes costs and opens doors for creators,” said a user on X. Notably, a single NVIDIA RTX or H100 GPU is far more affordable than a multi-GPU cluster, making AI viable for startups and individual developers.

Gemma 3 27B achieves its efficiency by running on a single NVIDIA H100 GPU at reduced precision, specifically using 16-bit floating-point (FP16) operations, which are common for optimising performance in modern AI models. LLMs typically use 32-bit floating-point (FP32) representations for weights and activations, requiring huge memory and compute power. Quantisation reduces this precision to 16-bit (FP16), 8-bit (INT8), or even 4-bit (INT4), significantly reducing model size and accelerating inference on GPUs and edge devices. As for architecture, Gemma 3 employs a shared, or tied, language model (LM) head for its word embeddings, as indicated by its linear layer configuration, where the LM head weights are tied to the input embeddings.
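To put rough numbers on the quantisation passage above, the sketch below estimates weight memory for a 27B-parameter model (Gemma 3 27B's size) at each precision. The figures are back-of-the-envelope only and ignore activations, KV cache, and framework overhead.

```python
# Approximate weight memory for a 27B-parameter model at different
# precisions. Back-of-the-envelope only: ignores activations, KV cache,
# and framework overhead.

PARAMS = 27e9  # Gemma 3 27B
GB = 1024**3

precisions = {
    "FP32": 4.0,   # bytes per parameter
    "FP16": 2.0,
    "INT8": 1.0,
    "INT4": 0.5,
}

for name, bytes_per_param in precisions.items():
    print(f"{name}: {PARAMS * bytes_per_param / GB:6.1f} GB")

# FP32: 100.6 GB  -> needs multiple GPUs
# FP16:  50.3 GB  -> fits an 80 GB H100, as the article notes
# INT8:  25.1 GB
# INT4:  12.6 GB  -> within reach of a 24 GB RTX 4090
```

The arithmetic makes the single-GPU claims concrete: halving precision halves the weight footprint, which is what moves a 27B model from a multi-GPU cluster onto one H100, and a 4-bit quant onto a consumer card.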
Similarly, Cohere recently launched Command A, a model that delivers top performance with lower hardware costs than leading proprietary and open-weight models like GPT-4o and DeepSeek-V3. According to the company, it is well suited for private deployments, excelling in business-critical agentic and multilingual tasks while running on just two GPUs, whereas other models often require up to 32. “With a serving footprint of just two A100s or H100s, it requires far less compute than other comparable models on the market. This is especially important for private deployments,” the company said in its blog post. It offers a 256k context length, twice that of most leading models, allowing it to process much longer enterprise documents. Other key features include Cohere’s advanced retrieval-augmented generation (RAG) with verifiable citations, agentic tool use, enterprise-grade security, and strong multilingual performance.

Microsoft recently launched Phi-4-multimodal and Phi-4-mini, the latest additions to its Phi family of small language models (SLMs). These models are integrated into Microsoft’s ecosystem, including Windows applications and Copilot+ PCs. Earlier this year, NVIDIA launched a compact supercomputer called DIGITS for AI researchers, data scientists, and students worldwide. It can run LLMs with up to 200 billion parameters locally, and with two units linked together, models twice that size can be supported, according to NVIDIA.

Moreover, open-source frameworks facilitate running LLMs on a single GPU. Predibase’s open-source project, LoRAX, allows users to serve thousands of fine-tuned models on a single GPU, cutting costs without compromising speed or performance. LoRAX supports a number of LLMs as the base model, including Llama (including Code Llama), Mistral (including Zephyr), and Qwen. It features dynamic adapter loading, instantly merging multiple adapters per request to create powerful ensembles without blocking concurrent requests (a request-level sketch of this appears below). Heterogeneous continuous batching packs requests using different adapters into the same batch, ensuring low latency and stable throughput. Adapter exchange scheduling optimises memory management by asynchronously preloading and offloading adapters between GPU and CPU memory. High-performance inference optimisations, including tensor parallelism, pre-compiled CUDA kernels, quantisation, and token streaming, further improve speed and efficiency.

Running LLMs Without a GPU?

A few days ago, AIM spoke to John Leimgruber, a software engineer from the United States with two years of engineering experience, who managed to run the 671-billion-parameter DeepSeek-R1 model without GPUs. He achieved this by running a quantised version of the model from a fast NVM Express (NVMe) SSD. Leimgruber used a quantised, non-distilled version of the model developed by Unsloth AI, a 2.51 bits-per-parameter build, which he said retained good quality despite being compressed to just 212 GB. The model natively uses 8-bit precision, which makes it efficient by default.

Leimgruber ran the model after disabling the NVIDIA RTX 3090 Ti GPU on his gaming rig, which has 96 GB RAM and 24 GB VRAM. He explained that the “secret trick” is to load only the KV cache into RAM while letting llama.cpp handle the model files with its default behaviour: memory-mapping (mmap) them directly from a fast NVMe SSD. “The rest of your system RAM acts as disk cache for the active weights,” he said.
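The mechanism behind that trick can be illustrated in a few lines: memory-map the weight file so the operating system's page cache, not the process, decides which weights stay resident in RAM. This is a conceptual NumPy stand-in for llama.cpp's built-in mmap behaviour; the file and array shapes below are small hypothetical stand-ins, not DeepSeek-R1's real layout.

```python
import numpy as np

# Create a small stand-in "weights" file so the sketch runs anywhere;
# in the real setup this would be a ~212 GB quantised model file on NVMe.
demo_path = "demo-weights.bin"
np.random.default_rng(0).integers(0, 256, 64 * 1024 * 1024,
                                  dtype=np.uint8).tofile(demo_path)

# Memory-map instead of loading: returns instantly and consumes almost
# no RAM up front. Pages are faulted in from disk only when touched, and
# the OS page cache keeps the frequently used ("active") weights in RAM.
weights = np.memmap(demo_path, dtype=np.uint8, mode="r")

# Touching a slice faults in just those pages; everything else stays on disk.
chunk = weights[: 8 * 1024 * 1024]
print("sampled checksum:", int(chunk[::4096].sum()))

# Only the KV cache is allocated in RAM outright, per Leimgruber's
# description. Shape here is illustrative, not DeepSeek-R1's real layout.
kv_cache = np.zeros((61, 8192, 576), dtype=np.float16)
print(f"KV cache: {kv_cache.nbytes / 1024**3:.2f} GB resident in RAM")
```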
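And for a feel of LoRAX's dynamic adapter loading described earlier, here is a minimal request-level sketch against a locally running LoRAX server. The /generate route and adapter_id parameter follow LoRAX's documented API, but the port, adapter names, and response handling are assumptions to verify against the project's docs.

```python
# Minimal sketch: per-request adapter selection against a locally running
# LoRAX server. Adapter names are hypothetical placeholders.
import requests

LORAX_URL = "http://127.0.0.1:8080/generate"  # assumed local deployment

def generate(prompt: str, adapter_id: str | None = None) -> str:
    params = {"max_new_tokens": 64}
    if adapter_id:
        # Each request can name a different fine-tuned LoRA adapter; the
        # server applies it on the fly over the shared base model.
        params["adapter_id"] = adapter_id
    resp = requests.post(LORAX_URL,
                         json={"inputs": prompt, "parameters": params})
    resp.raise_for_status()
    return resp.json()["generated_text"]

# Two requests, two different fine-tunes, one GPU and one base model.
print(generate("Classify this ticket:", adapter_id="acme/support-classifier"))
print(generate("Summarise this contract:", adapter_id="acme/legal-summariser"))
```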
With LLMs now running on a single GPU—or even without one—AI is becoming more practical for everyone. As hardware improves and new techniques emerge, AI will become even more accessible, affordable, and powerful in the years ahead.
Predibase Frequently Asked Questions (FAQ)
When was Predibase founded?
Predibase was founded in 2021.
Where is Predibase's headquarters?
Predibase's headquarters is located at 1190 Mission Street, San Francisco.
What is Predibase's latest funding round?
Predibase's latest funding round is Series A - II.
How much did Predibase raise?
Predibase raised a total of $28.45M.
Who are the investors of Predibase?
Investors of Predibase include Felicis, Zoubin Ghahramani, Anthony Goldbloom, Ben Hamner, Varun Badhwar and 7 more.
Who are Predibase's competitors?
Competitors of Predibase include Assisterr, Unsloth AI, Adaptive ML, deepset, Replicate and 7 more.
What products does Predibase offer?
Predibase's products include Predibase Inference Engine and 2 more.
Who are Predibase's customers?
Customers of Predibase include Checkr, Marsh McLennan and Convirza.
Compare Predibase to Competitors

Clarifai is an artificial intelligence (AI) company. It develops technology for operational scale, offering a platform for natural language processing, automatic speech recognition, and computer vision. It helps enterprises and public sector organizations transform video, images, text, and audio data into structured data. It serves the electronic commerce, manufacturing, media and entertainment, retail, and transportation industries. The company was founded in 2013 and is based in Wilmington, Delaware.

Chalk is a data platform that focuses on machine learning in the technology industry. The company offers services such as real-time data computation, feature storage, monitoring, and predictive maintenance, all aimed at enhancing machine learning processes. Chalk primarily serves sectors such as the credit industry, fraud and risk management, and predictive maintenance. It was founded in 2022 and is based in San Francisco, California.

Tri Sense focuses on leveraging artificial intelligence to address challenges in business process optimization and event prediction. The company offers services in data analytics, machine learning, process optimization, event forecasting, automation, and computer vision. These services are designed to improve traffic safety and the efficiency of business operations. It is based in Skofja Loka, Slovenia.
Fireworks AI specializes in generative artificial intelligence (AI) platform services, focusing on inference and model fine-tuning within the artificial intelligence sector. The company offers an inference engine for building production-ready AI systems and provides a serverless deployment model for generative AI applications. It was founded in 2022 and is based in Redwood City, California.

Accern operates in the natural language processing (NLP) sector, providing a no-code platform for content classification to support various industry applications. The company offers tools and models that automate enterprise research and classify content. Accern serves sectors such as financial services, government, and other industries with NLP solutions. It was founded in 2014 and is based in New York, New York.

2021.AI focuses on accelerating artificial intelligence (AI) implementations in various industries. The company offers a full-service solution for AI opportunities, from data capture to model deployment, with a strong emphasis on governance, risk, & compliance (GRC) for AI and related data. It primarily serves sectors such as finance and banking, software and tech, manufacturing, transportation, the public sector, and life science. It was founded in 2016 and is based in Copenhagen, Denmark.