If you’re an AI engineer trying to understand and build with GenAI, RAG (Retrieval-Augmented Generation) is one of the most essential components to master. It’s the backbone of any LLM system that needs fresh, accurate, and context-aware outputs. Let’s break down how RAG works, step by step, from an engineering lens, not a hype one:

🧠 How RAG Works (Under the Hood)

1. Embed your knowledge base
→ Start with unstructured sources - docs, PDFs, internal wikis, etc.
→ Convert them into semantic vector representations using embedding models (e.g., OpenAI, Cohere, or Hugging Face models)
→ Output: N-dimensional vectors that preserve meaning across contexts

2. Store in a vector database
→ Use a vector store like Pinecone, Weaviate, or FAISS
→ Index embeddings to enable fast similarity search (cosine, dot-product, etc.)

3. Query comes in - embed that too
→ The user prompt is embedded using the same embedding model
→ Perform a top-k nearest-neighbor search to fetch the most relevant document chunks

4. Context injection
→ Combine retrieved chunks with the user query
→ Format this into a structured prompt for the generation model (e.g., Mistral, Claude, Llama)

5. Generate the final output
→ The LLM uses both the query and the retrieved context to generate a grounded, context-rich response
→ Minimizes hallucinations and improves factuality at inference time

📚 What changes with RAG?
Without RAG: 🧠 “I don’t have data on that.”
With RAG: 🤖 “Based on [retrieved source], here’s what’s currently known…”
Same model, drastically improved quality.

🔍 Why this matters
You need RAG when:
→ Your data changes daily (support tickets, news, policies)
→ You can’t afford hallucinations (legal, finance, compliance)
→ You want your LLMs to access your private knowledge base without retraining

It’s the most flexible, production-grade approach to bridge static models with dynamic information.
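The five steps above can be sketched end to end in a few lines. This is a toy illustration only: the bag-of-words `embed` and the in-memory `index` are stand-ins for a real embedding model and a real vector database (Pinecone, Weaviate, FAISS), and the sample docs are invented.

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding"; a real pipeline would call an
    # embedding model (OpenAI, Cohere, etc.) here.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Steps 1-2: embed the knowledge base and "index" it in memory.
docs = [
    "Refunds are processed within 5 business days.",
    "Our API rate limit is 100 requests per minute.",
]
index = [(doc, embed(doc)) for doc in docs]

# Step 3: embed the query with the same model, run top-k search.
query = "How long do refunds take?"
q_vec = embed(query)
top_k = sorted(index, key=lambda d: cosine(q_vec, d[1]), reverse=True)[:1]

# Step 4: context injection - build the structured prompt.
context = "\n".join(doc for doc, _ in top_k)
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"

# Step 5: `prompt` would now be sent to the generation model.
print(prompt)
```

Swapping the toy `embed` for a real embedding model and the list for a vector store changes nothing structurally; the retrieve → inject → generate shape stays the same.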
🛠️ Arvind and I are kicking off a hands-on workshop on RAG

This first session is designed for beginner-to-intermediate practitioners who want to move beyond theory and actually build.

Here’s what you’ll learn:
→ How RAG enhances LLMs with real-time, contextual data
→ Core concepts: vector DBs, indexing, reranking, fusion
→ Build a working RAG pipeline using LangChain + Pinecone
→ Explore no-code/low-code setups and real-world use cases

If you're serious about building with LLMs, this is where you start.

📅 Save your seat and join us live: https://lnkd.in/gS_B7_7d
Improving Predictive Accuracy
-
Training LLMs for spam classification: I added 14 experiments comparing different approaches: https://lnkd.in/gTNVvGcj
- which token to train
- which layers to train
- different model sizes
- LoRA
- unmasking
- and more!

Any additional experiments you'd like to see?

Here are the takeaways for the table shown in the picture:

1. Training the Last vs. First Output Token (Row 1 vs. 2): Training the last output token results in substantially better performance than training the first. This improvement is expected due to the causal self-attention mask.

2. Training the Last Transformer Block vs. Last Layer (Row 1 vs. 3): Training the entire last transformer block also results in substantially better performance than training only the last layer.

3. Training All Layers vs. Last Transformer Block (Row 1 vs. 4): Training all layers shows a modest improvement of ~2% over just training the last transformer block, but it requires almost three times as long in terms of training duration.

4. Using Larger Pretrained Models (Row 1 vs. 5, and Row 1 vs. 6 and 7): Employing a 3x larger pretrained model leads to worse results. However, using a 5x larger model improves performance compared to the initial model, as anticipated. Similarly, the 12x larger model improves the predictive performance even further. (The medium model was perhaps not well pretrained, or this particular finetuning configuration does not work as well for it.)

5. Using a Model with Random Weights vs. Pretrained Weights (Row 1 vs. 8): Utilizing a model with random weights yields results that are only slightly worse (by 1.3%) than using pretrained weights.

6. Using LoRA (Low-Rank Adaptation) vs. Training All Layers (Row 9 vs. 4): Keeping the model frozen and adding trainable LoRA layers (see Appendix E for details) is a viable alternative to training all model parameters and even improves performance by 1 percentage point.
As can be seen from the 1% smaller gap between training and validation accuracy when using LoRA, this is likely due to less overfitting.

7. Padding Input to Full Context Length vs. Longest Training Example (Row 1 vs. 10): Padding the input to the full supported context length results in significantly worse performance.

8. Padding vs. No Padding (Row 1 vs. 11 and 12): The `--no_padding` option disables padding in the dataset, which requires training the model with a batch size of 1 since the inputs have variable lengths. This results in better test accuracy but takes longer to train. In row 12, we additionally enable gradient accumulation with 8 steps to achieve the same effective batch size as in the other experiments.

9. Disabling the Causal Attention Mask (Row 1 vs. 13): This disables the causal attention mask used in the multi-head attention module, so every token can attend to every other token. The model's accuracy is slightly improved compared to the GPT model with the causal mask.
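Takeaway 6 in miniature: LoRA keeps the pretrained weight matrix W frozen and trains only two small factors, so the effective weight is W + (alpha/r)·(A@B), with B zero-initialized so training starts exactly from the pretrained behavior. A minimal numeric sketch in plain Python (a real setup would use torch.nn modules; the shapes and numbers here are illustrative):

```python
# Frozen pretrained weight W (in_features x out_features); only the
# low-rank factors A (in_features x r) and B (r x out_features) train.

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def lora_forward(x, W, A, B, alpha, r):
    scale = alpha / r
    AB = matmul(A, B)  # rank <= r update, same shape as W
    W_eff = [[W[i][j] + scale * AB[i][j] for j in range(len(W[0]))]
             for i in range(len(W))]
    return matmul([x], W_eff)[0]  # y = x @ (W + scale * A @ B)

W = [[1.0, 0.0], [0.0, 1.0]]  # frozen 2x2 pretrained weight
A = [[1.0], [1.0]]            # in_features x r, r = 1
B = [[0.0, 0.0]]              # r x out_features; zero-init => no change at start

print(lora_forward([2.0, 3.0], W, A, B, alpha=8, r=1))  # → [2.0, 3.0]
```

With B at zero the output equals the frozen model's; once a gradient step makes B nonzero, the tiny A/B pair starts steering the full-size frozen weights, which is why it competes with training all layers at a fraction of the parameter count.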
-
You are in a Senior Machine Learning Interview at Google DeepMind.

The interviewer sets a trap: "We have a 1:1000 class imbalance for fraud detection. We applied 𝘤𝘭𝘢𝘴𝘴_𝘸𝘦𝘪𝘨𝘩𝘵𝘴 to the 𝐂𝐫𝐨𝐬𝐬-𝐄𝐧𝐭𝐫𝐨𝐩𝐲 loss, but the model is still missing the hard edge cases. What do we do?"

90% of candidates walk right into the wall. Most candidates immediately suggest aggressive oversampling (𝘚𝘔𝘖𝘛𝘌) or tuning the class weights even higher (e.g., 1:5000). They think: "If the minority class is ignored, I just need to scream louder (higher weights) during backprop."

------

𝐓𝐡𝐞 𝐑𝐞𝐚𝐥𝐢𝐭𝐲: You aren't losing because the weights are wrong. You are losing because of 𝐆𝐫𝐚𝐝𝐢𝐞𝐧𝐭 𝐃𝐫𝐨𝐰𝐧𝐢𝐧𝐠.

Even with perfect class weights, your dataset likely contains 990,000 "easy" negatives (legitimate transactions that are obviously legit) and 1,000 "hard" positives. In standard 𝐖𝐞𝐢𝐠𝐡𝐭𝐞𝐝 𝐂𝐫𝐨𝐬𝐬-𝐄𝐧𝐭𝐫𝐨𝐩𝐲 (𝐖𝐂𝐄), the gradients from those 990,000 easy examples, even if individually small, sum up to dominate the update step. The model spends all its capacity optimizing examples it has already learned, drowning out the signal from the difficult, subtle fraud cases.

------

The Solution: 𝐓𝐡𝐞 𝐄𝐚𝐬𝐲-𝐄𝐱𝐚𝐦𝐩𝐥𝐞 𝐒𝐮𝐩𝐩𝐫𝐞𝐬𝐬𝐢𝐨𝐧

You don't need to re-balance the counts. You need to re-balance the difficulty. The solution is switching from 𝐖𝐞𝐢𝐠𝐡𝐭𝐞𝐝 𝐂𝐫𝐨𝐬𝐬-𝐄𝐧𝐭𝐫𝐨𝐩𝐲 to 𝐅𝐨𝐜𝐚𝐥 𝐋𝐨𝐬𝐬. Focal Loss adds a modulating factor (1 − pₜ)ᵞ to the standard loss equation.

Here is what happens in production:
- 𝘐𝘧 𝘵𝘩𝘦 𝘮𝘰𝘥𝘦𝘭 𝘪𝘴 𝘶𝘯𝘴𝘶𝘳𝘦 (𝘏𝘢𝘳𝘥 𝘌𝘹𝘢𝘮𝘱𝘭𝘦): The modulating factor stays near 1. The loss is unchanged. The model learns.
- 𝘐𝘧 𝘵𝘩𝘦 𝘮𝘰𝘥𝘦𝘭 𝘪𝘴 𝘤𝘰𝘯𝘧𝘪𝘥𝘦𝘯𝘵 (𝘌𝘢𝘴𝘺 𝘌𝘹𝘢𝘮𝘱𝘭𝘦): The factor drops to near 0. The loss contribution is effectively "shut off."

This forces the model to stop patting itself on the back for identifying the obvious negatives and focus 100% of its gradient descent budget on the edge cases.

𝐓𝐡𝐞 𝐀𝐧𝐬𝐰𝐞𝐫 𝐓𝐡𝐚𝐭 𝐆𝐞𝐭𝐬 𝐘𝐨𝐮 𝐇𝐢𝐫𝐞𝐝: "𝐖𝐞𝐢𝐠𝐡𝐭𝐞𝐝 𝐂𝐫𝐨𝐬𝐬-𝐄𝐧𝐭𝐫𝐨𝐩𝐲 solves for moderate imbalance (1:10) by balancing counts.
𝐅𝐨𝐜𝐚𝐥 𝐋𝐨𝐬𝐬 solves for extreme imbalance (1:1000+) by balancing hardness. In a fraud scenario, I would implement 𝐅𝐨𝐜𝐚𝐥 𝐋𝐨𝐬𝐬 with γ = 2 to down-weight the easy negatives that are currently dominating the gradient."

#MachineLearning #DeepLearning #MLEngineering #AIEngineering #NeuralNetworks #ModelOptimization
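The modulating factor is easy to see numerically. A minimal binary focal-loss sketch in plain Python (in production you would use a library implementation such as torchvision's `sigmoid_focal_loss`; the α class-balancing term is omitted here for clarity):

```python
import math

def focal_loss(p, y, gamma=2.0):
    """Binary focal loss for one example.
    p: predicted probability of the positive class; y: true label (0 or 1).
    With gamma=0 this reduces to standard cross-entropy."""
    pt = p if y == 1 else 1.0 - p          # probability of the true class
    return -((1.0 - pt) ** gamma) * math.log(pt)

# Easy negative (model confident and correct): loss nearly shut off.
easy = focal_loss(p=0.02, y=0)   # pt = 0.98, factor = 0.02^2 = 0.0004
# Hard positive (model unsure): loss barely down-weighted.
hard = focal_loss(p=0.3, y=1)    # pt = 0.3, factor = 0.7^2 = 0.49

print(f"easy negative: {easy:.6f}  hard positive: {hard:.4f}")
```

With γ = 2, the easy negative's cross-entropy loss of about 0.02 is suppressed by a factor of 2500, while the hard positive keeps roughly half of its loss: exactly the "gradient budget" reallocation described above.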
-
My colleagues at Google DeepMind and Google Research and I are sharing our latest work on tropical cyclone prediction, now available through a research tool, Weather Lab: https://lnkd.in/dNtjmiYq

Over the past 50 years, tropical cyclones, also known as hurricanes or typhoons, have claimed more than 779,000 lives and caused $1.4 trillion in economic losses [WMO]. For the millions of people living in their path, the accuracy of weather forecasting is the most critical line of defense.

In an effort to protect lives and property from this threat, we’ve built a powerful new machine learning (ML)-based ensemble weather model, deployed it operationally on Weather Lab, and partnered with experts from the U.S. National Hurricane Center (NHC) who will assess its live predictions alongside their established forecasting tools.

The ensemble mean cyclone track of our new model gains about 1.5 days of position-error advantage over ECMWF ENS in tests based on NHC protocols. And surprisingly, our model has a lower average intensity error than NOAA’s high-resolution hurricane model, HAFS-A, in more than 60 of the 74 cyclones evaluated in 2023 and 2024 in the East Pacific and North Atlantic basins.

We achieved this by building a new kind of ML weather model, FGN [Ferran Alet Puig et al., 2025], which substantially outperforms GenCast on probabilistic metrics, and specialising it for cyclone tracking by training it on a record of nearly 5,000 tropical cyclones from the past 45 years.

Most human forecasters do not trust a weather model until its performance is demonstrated in a real-time setting. That’s why we built Weather Lab, available globally, providing access to live and historical visualisations of tropical cyclone predictions from our new ML weather model, with WeatherNext and ECMWF models shown for comparison. We recently enabled live data downloads in CSV and ATCF format for experts to evaluate.

This is a powerful new tool in the toolbox, but no single model is perfect.
It will remain key that human forecasters evaluate a wide range of both ML and physics-based predictions when issuing public warnings for cyclone threats. And of course, ML weather models continue to depend on the historical and real-time availability of atmospheric analysis datasets produced by physical modelling centres, and the continued quality and coverage of the Earth’s observing system. Tropical cyclones will likely become more destructive over time [IPCC, 2023]. It is crucial we continue improving our monitoring, prediction, and understanding of these complex beasts of physics. Try Weather Lab: https://lnkd.in/dNtjmiYq Blog post: https://lnkd.in/dkj8cYan FGN (Alet et al., 2025): https://lnkd.in/dJhP9Kj2 WMO: https://lnkd.in/dPt94VX5 IPCC, 2023: https://lnkd.in/dj5n-Rqg
-
In the last three months alone, over ten papers outlining novel prompting techniques were published, boosting LLMs’ performance by a substantial margin. Two weeks ago, a groundbreaking paper from Microsoft demonstrated how a well-prompted GPT-4 outperforms Google’s Med-PaLM 2, a specialized medical model, solely through sophisticated prompting techniques.

Yet, while our X and LinkedIn feeds buzz with ‘secret prompting tips’, a definitive, research-backed guide aggregating these advanced prompting strategies is hard to come by. This gap prevents LLM developers and everyday users from harnessing these novel frameworks to enhance performance and achieve more accurate results. https://lnkd.in/g7_6eP6y

In this AI Tidbits Deep Dive, I outline six of the best recent prompting methods:

(1) EmotionPrompt - inspired by human psychology, this method utilizes emotional stimuli in prompts to gain performance enhancements
(2) Optimization by PROmpting (OPRO) - a DeepMind innovation that refines prompts automatically, surpassing human-crafted ones. This paper discovered the “Take a deep breath” instruction that improved LLMs’ performance by 9%.
(3) Chain-of-Verification (CoVe) - Meta's novel four-step prompting process that drastically reduces hallucinations and improves factual accuracy
(4) System 2 Attention (S2A) - also from Meta, a prompting method that filters out irrelevant details prior to querying the LLM
(5) Step-Back Prompting - encouraging LLMs to abstract queries for enhanced reasoning
(6) Rephrase and Respond (RaR) - UCLA's method that lets LLMs rephrase queries for better comprehension and response accuracy

Understanding the spectrum of available prompting strategies and how to apply them in your app can mean the difference between a production-ready app and a nascent project with untapped potential.

Full blog post: https://lnkd.in/g7_6eP6y
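Method (3), Chain-of-Verification, is the easiest to sketch as code. The four calls below mirror its draft → plan checks → verify → revise flow; `call_llm` is passed in as a hypothetical stand-in for whatever model client you use, and the prompt wordings are illustrative, not Meta's exact templates:

```python
# Sketch of the Chain-of-Verification (CoVe) four-step prompting flow.

def chain_of_verification(question, call_llm):
    # Step 1: draft a baseline answer (may contain hallucinations).
    draft = call_llm(f"Answer concisely: {question}")
    # Step 2: plan verification questions that probe the draft's claims.
    checks = call_llm(f"List short fact-check questions for this answer:\n{draft}")
    # Step 3: answer each verification question independently of the draft.
    evidence = call_llm(f"Answer each question on its own line:\n{checks}")
    # Step 4: revise the draft using the verification results.
    return call_llm(
        f"Question: {question}\nDraft: {draft}\n"
        f"Verification Q&A:\n{evidence}\nWrite a corrected final answer."
    )
```

The point of the separate calls is that the verification answers are produced without the draft dominating the context, so the model can contradict its own initial claims before the final revision.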
-
You might have seen news from our Google DeepMind colleagues lately on GenCast, which is changing the game of weather forecasting by building state-of-the-art weather models using AI. Some of our teams started to wonder – can we apply similar techniques to the notoriously compute-intensive challenge of climate modeling?

General circulation models (GCMs) are a critical part of climate modeling, focused on the physical aspects of the climate system, such as temperature, pressure, wind, and ocean currents. Traditional GCMs, while powerful, can struggle with precipitation – and our teams wanted to see if AI could help.

Our team released a paper and data on our AI-based GCM, building on our Nature paper from last year - specifically, now predicting precipitation with greater accuracy than the prior state of the art. The new paper on NeuralGCM introduces 𝗺𝗼𝗱𝗲𝗹𝘀 𝘁𝗵𝗮𝘁 𝗹𝗲𝗮𝗿𝗻 𝗳𝗿𝗼𝗺 𝘀𝗮𝘁𝗲𝗹𝗹𝗶𝘁𝗲 𝗱𝗮𝘁𝗮 𝘁𝗼 𝗽𝗿𝗼𝗱𝘂𝗰𝗲 𝗺𝗼𝗿𝗲 𝗿𝗲𝗮𝗹𝗶𝘀𝘁𝗶𝗰 𝗿𝗮𝗶𝗻 𝗽𝗿𝗲𝗱𝗶𝗰𝘁𝗶𝗼𝗻𝘀. Kudos to Janni Yuval, Ian Langmore, Dmitrii Kochkov, and Stephan Hoyer!

Here's why this is a big deal:

𝗟𝗲𝘀𝘀 𝗕𝗶𝗮𝘀, 𝗠𝗼𝗿𝗲 𝗔𝗰𝗰𝘂𝗿𝗮𝗰𝘆: These new models have less bias, meaning they align more closely with actual observations – and we see this both for forecasts up to 15 days, and also for 20-year projections (in which sea surface temperatures and sea ice were fixed at historical values, since we don’t yet have an ocean model). NeuralGCM forecasts are especially performant around extremes, which are especially important in understanding climate anomalies, and can predict rain patterns throughout the day with better precision.

𝗖𝗼𝗺𝗯𝗶𝗻𝗶𝗻𝗴 𝗔𝗜, 𝗦𝗮𝘁𝗲𝗹𝗹𝗶𝘁𝗲 𝗜𝗺𝗮𝗴𝗲𝗿𝘆, 𝗮𝗻𝗱 𝗣𝗵𝘆𝘀𝗶𝗰𝘀: The model combines a learned physics model with a dynamic differentiable core to leverage both physics and AI methods, with the model trained directly on satellite-based precipitation observations.

𝗢𝗽𝗲𝗻 𝗔𝗰𝗰𝗲𝘀𝘀 𝗳𝗼𝗿 𝗘𝘃𝗲𝗿𝘆𝗼𝗻𝗲! This is perhaps the most exciting news!
The team has made their pre-trained NeuralGCM model checkpoints (including their awesome new precipitation models) available under a CC BY-SA 4.0 license. Anyone can use and build upon this cutting-edge technology! https://lnkd.in/gfmAx_Ju

𝗪𝗵𝘆 𝗧𝗵𝗶𝘀 𝗠𝗮𝘁𝘁𝗲𝗿𝘀: Accurate predictions of precipitation are crucial for everything from water resource management and flood mitigation to understanding the impacts of climate change on agriculture and ecosystems.

Check out the paper to learn more: https://lnkd.in/geqaNTRP
-
When L'Oréal uses AI to create new hair colors based on social media trends, they're in salons within weeks. Kraft Heinz—dead last in our study—still takes months to tweak a formula.

After analyzing 26 major CPG companies at IMD's Center for Future Readiness, I discovered what separates winners from losers: the most future-ready companies treat consumer data like insider trading information.

BACKGROUND: CPG in 2025 is brutal. Inflation persists. Gen-Z demands sustainability without premiums. Tariffs reshape supply chains daily. McKinsey & Company identified 150+ AI use cases for CPG transformation. Only 5 of 26 companies actually execute them.

THE REVELATION: Coca-Cola didn't randomly launch Topo Chico Hard Seltzer. Their AI spotted the trend through social listening while competitors debated in boardrooms. By launch, they'd secured distribution nationwide. That's not innovation. That's prediction.

What separates the top 5:

L'Oréal (#1): 3.5% of sales to R&D. AI analyzes preferences in real time. Virtual try-on apps. Creates products from social trends. A 110-year-old company with startup velocity.

The Coca-Cola Company (#2): Democratized AI internally. Every manager accesses demand forecasting. They analyze weather + social sentiment + sales simultaneously.

These aren't tech companies selling beauty and beverages. They're prediction machines that happen to make products.

THE WINNER'S FRAMEWORK:
1. AI at scale, not in pilots. Winners integrate into workflows. Losers run demos.
2. Supply chains that anticipate. Real-time visibility + AI forecasting = competitive firepower.
3. D2C as intelligence goldmine. 73% use multiple channels. Mine every interaction.
4. Disrupt yourself first. Coca-Cola launched Costa Coffee, hard seltzers. Grew. Kraft Heinz protected legacy brands. Shrank.
5. Sustainable without premium. Gen-Z spending hits $12T by 2030. They demand action at everyday prices.

——

The inconvenient truth: most CPG companies treat data like reporting instead of radar.
Winners don't predict trends—they're already shipping products while competitors debate. Technological patience (knowing when to scale) + organizational agility (pivoting fast) = market domination. Three years from now, every CPG company operates like L'Oréal. Or they don't operate at all. P.S. Full Future Readiness Indicator here: https://bit.ly/3YTBzbX
-
Yesterday’s sales can’t see tomorrow’s storm. But AI can 😎

Most manufacturers still build demand forecasts based on one thing: 𝐡𝐢𝐬𝐭𝐨𝐫𝐢𝐜𝐚𝐥 𝐬𝐚𝐥𝐞𝐬.

Which is fine… until the market shifts. Or weather changes. Or a social post goes viral. (Which is basically always.)

That’s why AI is changing the forecasting game. Not by making predictions perfect—just a lot less wrong. And a little less wrong can mean a lot more profitable.

According to the Institute of Business Forecasting, the average tech company saves $𝟗𝟕𝟎𝐊 per year by reducing under-forecasting by just 1%, and another $𝟏.𝟓𝐌 by trimming over-forecasting. For consumer product companies, those same 1% improvements are worth $𝟑.𝟓𝐌 (under-forecasting) and $𝟏.𝟒𝟑𝐌 (over-forecasting). (Source: https://lnkd.in/e_NJNevk)

And we’re only talking about a 1% improvement!!! Let that sink in...

All that money just from getting a little better at predicting what customers will actually buy.

And yes, AI can help you get there:
• By ingesting external signals (weather, social, events, IoT, etc.)
• By recognizing nonlinear patterns that Excel never will
• And by constantly learning—unlike your spreadsheet

But it’s not just about tech. It’s about process:
• Use Forecast Value-Added (FVA) to track which steps help (or hurt)
• Get sales, marketing, and ops aligned in S&OP—not working in silos
• Focus on data quality—AI is only as smart as your ERP is clean
• Plan continuously—forecasting is not a set-it-and-forget-it task

Bottom line: If you’re still relying on history to predict the future, you’re underestimating the cost of being wrong. Your competitors aren’t.

*******************************************
• Visit www.jeffwinterinsights.com for access to all my content and to stay current on Industry 4.0 and other cool tech trends
• Ring the 🔔 for notifications!
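The Forecast Value-Added (FVA) idea deserves a concrete example. A minimal sketch with invented numbers: each step in the forecasting process is scored against a naive "last period's actual" baseline, so you can see which steps add value and which subtract it:

```python
# FVA sketch: positive FVA means a forecasting step beat the naive baseline.
# All numbers below are illustrative, not real data.

def mape(actuals, forecasts):
    # Mean absolute percentage error, in percent.
    return sum(abs(a - f) / a for a, f in zip(actuals, forecasts)) / len(actuals) * 100

actuals        = [100, 120, 110, 130]
naive          = [ 95, 100, 120, 110]   # "use last period's actual" baseline
stat_forecast  = [102, 115, 112, 125]   # statistical model output
final_forecast = [105, 110, 118, 120]   # after manual sales/ops overrides

baseline = mape(actuals, naive)
for name, fcst in [("stat model", stat_forecast), ("with overrides", final_forecast)]:
    fva = baseline - mape(actuals, fcst)
    print(f"{name}: FVA = {fva:+.1f} MAPE points vs naive")
```

In this toy run the statistical model adds value over the naive baseline, while the manual overrides give some of it back, which is exactly the kind of step-by-step diagnosis FVA is meant to surface.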
-
Stop oversampling for class imbalance! 🚫

So reads the title of one article that warns us against the use of oversampling to tackle class imbalance. While I am no defender of oversampling - it does come with its set of drawbacks - I oppose these all-or-none statements: oversampling is the key, or stop oversampling… forever. I feel they invite us to follow, instead of inviting us to think. 🤔

The article has one good main point, which I thought went without saying ⇒ oversampling by creating synthetic data points, as SMOTE does, can create samples that may not exist in the real population. Impossible observations, so to speak. An obvious example occurs when interpolating between 2 discrete values ⇒ it will generate a non-discrete value, which is, well, not possible given the variable’s nature.

The article also argues that oversampling methods that create synthetic data almost always create synthetic data points that belong to the majority class. Having tested that assumption on only 3 datasets, it feels to me a bit of a stretch to conclude that this happens almost always.

Having said this, I want to highlight another important fact hidden in this article: the authors summarize a huge number of oversampling methods, highlighting the machine learning model used and the metric evaluated. 🤓 It’s almost always decision trees, random forests, SVMs, or AdaBoost. So while the conclusions about SMOTE could be valid for those algorithms, we don’t know how it fares with more powerful methods like XGBoost or LightGBM.

In addition, in most cases the performance metrics used were threshold-dependent, and it is unclear whether they optimized the threshold selection; most likely they used 0.5, which is not ideal for imbalanced datasets. A more recent article suggests that if they had optimized the threshold, the value of SMOTE would begin to disappear.

What does this mean for us working with imbalanced datasets for binary classification?
1️⃣ First, we should use powerful models like XGBoost, LightGBM, or CatBoost when possible
2️⃣ Second, we should optimize the probability threshold that we’ll use to predict the class (0.5 is not the one)
3️⃣ If you decide to do oversampling ⇒ check the quality of the synthetic data. If the new data don’t make sense, your model won’t make sense either.
4️⃣ Finally, test the model on recent data to ensure that what you developed on synthetic data still performs as expected on real-life data.
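Point 2️⃣ in code: a minimal threshold sweep in plain Python with toy scores and labels (in practice, use your model's validation-set probabilities and whichever metric matters for your problem):

```python
# Sweep the decision threshold on validation data and keep the one that
# maximizes F1, instead of defaulting to 0.5. Toy data for illustration.

def f1(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

y_true = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]  # imbalanced: only 3 positives
scores = [0.05, 0.1, 0.2, 0.3, 0.35, 0.4, 0.45, 0.3, 0.42, 0.6]

best_t, best_f1 = max(
    ((t / 100, f1(y_true, [1 if s >= t / 100 else 0 for s in scores]))
     for t in range(1, 100)),
    key=lambda pair: pair[1],
)
# On imbalanced data like this, the best threshold lands below 0.5.
print(f"best threshold = {best_t:.2f}, F1 = {best_f1:.3f}")
```

On these toy numbers the default 0.5 cutoff misses two of the three positives; the swept threshold recovers one of them and lifts F1, with no resampling involved.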
-
RAG stands for Retrieval-Augmented Generation. It’s a technique that combines the power of LLMs with real-time access to external information sources.

Instead of relying solely on what an AI model learned during training (which can quickly become outdated), RAG enables the model to retrieve relevant data from external databases, documents, or APIs—and then use that information to generate more accurate, context-aware responses.

How does RAG work?

𝗥𝗲𝘁𝗿𝗶𝗲𝘃𝗲: The system searches for the most relevant documents or data based on your query, using advanced search methods like semantic or vector search.

𝗔𝘂𝗴𝗺𝗲𝗻𝘁: Instead of just using the original question, RAG 𝗮𝘂𝗴𝗺𝗲𝗻𝘁𝘀 (enriches) the prompt by adding the retrieved information directly into the input for the AI model. This means the model doesn’t just rely on what it “remembers” from training—it now sees your question 𝘱𝘭𝘶𝘴 the latest, domain-specific context.

𝗚𝗲𝗻𝗲𝗿𝗮𝘁𝗲: The LLM takes the retrieved information and crafts a well-informed, natural language response.

𝗪𝗵𝘆 𝗱𝗼𝗲𝘀 𝗥𝗔𝗚 𝗺𝗮𝘁𝘁𝗲𝗿?

Improves accuracy: By referencing up-to-date or proprietary data, RAG reduces outdated or incorrect answers.
Context-aware: Responses are tailored using the latest information, not just what the model “remembers.”
Reduces hallucinations: RAG helps prevent AI from making up facts by grounding answers in real sources.

Example: Imagine asking an AI assistant, “What are the latest trends in renewable energy?” A traditional LLM might give you a general answer based on old data. With RAG, the model first searches for the most recent articles and reports, then synthesizes a response grounded in that up-to-date information.

Illustration by Deepak Bhardwaj
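The "augment" step is, concretely, just prompt assembly. A minimal sketch where the retrieved passages are spliced into the input so the model sees question plus fresh context; the snippets here are invented placeholders standing in for real retrieval results:

```python
# Augmentation sketch: build the enriched prompt from retrieved passages.
# The `retrieved` snippets are fabricated examples, not real sources.

retrieved = [
    "Snippet A: utility-scale solar additions grew strongly last year.",
    "Snippet B: grid-scale battery storage costs continued to fall.",
]

question = "What are the latest trends in renewable energy?"

prompt = (
    "Use the sources below to answer. Cite which source you used.\n\n"
    + "\n".join(f"[{i + 1}] {s}" for i, s in enumerate(retrieved))
    + f"\n\nQuestion: {question}"
)
print(prompt)
```

Everything the generate step "knows beyond training" arrives through this string, which is why retrieval quality and prompt formatting end up mattering as much as the model itself.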