DataRobot Recognized by Customers with TrustRadius Top Rated Award for Third Consecutive Year
https://www.datarobot.com/blog/datarobot-recognized-by-customers-with-trustradius-top-rated-award-for-third-consecutive-year/ | Thu, 09 May 2024

Our mission at DataRobot has been to help customers use AI to drive business value. 

Business value is built into our DNA, and nothing is better than hearing the success stories directly from our customers.

We’re thrilled to share that our customers have recognized DataRobot in the TrustRadius Top Rated Award for the third consecutive year in the following categories:

  • Data science
  • Machine learning
  • Predictive analytics

We are incredibly proud of this award — based solely on customer reviews.

About TrustRadius

TrustRadius is a buyer intelligence platform for business technology, and its annual Top Rated Awards are based entirely on customer feedback – they aren't influenced by outside opinion. TrustRadius looks at the recency of reviews, the relevance of products compared to others in the same category, and overall ratings.

With a trScore of 8.8 out of 10 and nearly 60 verified reviews from our customers, we stand out as one of the most valued platforms in our industry, with demonstrated impact and results.

Why our customers trust DataRobot

In their own words, our customers share the wins they’ve experienced by using the DataRobot AI Platform:

When I spoke with our Chief Customer Officer, Jay Schuren, he shared his sincere appreciation for our brilliant customers and thanked them for this recognition. He said:

We continually strive to wow our customers. The Top Rated Award is only made possible by our customers’ success. When our customers win, we join them in celebrating the business transformations made possible with AI.
Jay Schuren

Chief Customer Officer

Learn more

Hear how customers deliver AI value at FordDirect, Freddie Mac, 84.51°, and many more.  

Demo
See the DataRobot AI Platform in Action
Book a demo

DataRobot Spring 2024
https://www.datarobot.com/platform/new/datarobot-spring-2024/ | Thu, 02 May 2024

Confidently Deploy and Govern GenAI Solutions

DataRobot delivers testing, optimization, and AI observability so customers can create production-grade AI applications, observe and intervene in real time, and govern and optimize their infrastructure.

Create Production-Grade AI Applications

Build safe and useful generative and predictive applications with advanced RAG testing and evaluation techniques.

  • Enterprise-grade open source LLM support: Leverage open source foundation models such as LLaMA, Falcon, or Mistral (including models hosted on Hugging Face), and new models like Nemotron-3-8B.
  • LLM Metrics Evaluation and Assessment: Quickly evaluate the quality of your RAG pipeline with metrics like correctness, faithfulness and effectiveness along with user feedback integration and guard model testing to ensure optimal performance and safety.
  • LLM Playground Advanced Testing: Advanced production-tested RAG workflow and customization tools let you test various embedding strategies, chunking strategies, and LLMs.
  • Notebook Codespaces: Seamlessly collaborate on AI projects accessible from anywhere, version control with Git, work on multiple notebooks simultaneously, all in a user-friendly interface for efficient code development and deployment.
  • Model Training on GPUs in Workbench: Accelerate model training and improve productivity with NVIDIA Rapids GPU accelerated libraries in DataRobot notebooks.
  • Q&A Chat App: Accelerate experimentation with ready to use, interactive, GenAI application that can be shared with stakeholders to test GenAI experiments created in the DataRobot playground
  • App Workshop: Eliminate complex deployment hurdles with a release-ready AI app workshop. Centralize how users register, catalog, deploy, and manage the full life cycle without tool hopping. 
  • Prompt Tracing: Pinpoint the source of a model’s performance problem and map it back to the place in your vector database causing the issue, then leverage user feedback to train predictive models, enhancing model performance and user experience.

Observe and Intervene in Real-Time

Quickly detect and prevent unexpected and unwanted behaviors.

  • Unified Registry for GenAI Apps and ML Models: Standardize AI visibility, deployment, integration, and monitoring for individuals or groups of machine learning models and AI applications.
  • GenAI Guard Library: Control model performance with a full suite of out-of-the-box metrics, custom guard models, and methods, including resource consumption, PII detection, toxicity, faithfulness, and more.
  • Real-time LLM Intervention and Moderation: Create a strong, multilayered defense strategy and minimize risk with dynamic real-time oversight and intervention methods at prompt and response to prevent issues like hallucinations, prompt injection, and PII leakage.
  • Multi-Language GenAI Text Drift: Assess topic trends from user interactions and leverage data drift word cloud insights to augment vector databases, adjust RAG models, or fine-tune models for text generation projects.
  • Custom Governance Tests via Jobs: Validate model performance and behaviors, and create custom explainability insights to export charts and coefficients for compliance documentation.
Configure evaluation and moderation

Govern and Optimize Infrastructure Investments

Get more value from your existing infrastructure with a purpose-built AI platform.

  • NVIDIA Triton Inference Server Integration: Seamlessly deploy high-performance models with NVIDIA's Triton Inference Server, with extra acceleration on all your GPU-based models, optimizing inference speed and resource efficiency.
  • Optimized AI Inference with NVIDIA Inference Microservices (NIMs): Enhance model inference and remove the need for individual GPU-powered systems with NVIDIA Inference Microservices in DataRobot.
  • Cross-Cloud & Hybrid AI Observability: Effortlessly manage your AI portfolio across cloud and hybrid environments with comprehensive observability, cross-environment visibility, and unified governance.
  • Global Models: Ensure consistent security and performance monitoring across your AI assets with Open Source Deep Learning and NLP models and share the best performing models with contributors.
  • Registry Jobs and Notification Policies: Validate model performance and behavior and reduce time-to-detection and time-to-resolution with real-time notifications and highly customizable alerting. 
  • Custom Apps Sharing: Safely share custom GenAI apps with stakeholders inside or outside your organization while adhering to governance and security policies through granular RBAC and governance policies.

Go to the DataRobot Documentation Release Center for more information.

Multi-Cloud Generative AI
https://www.datarobot.com/platform/generative-ai/ | Wed, 01 May 2024

The End-to-End Generative AI Platform

Build, govern, and operate enterprise-grade generative AI solutions with confidence.

Request a Demo
Adapt as Your Needs Change

Enjoy the freedom to rapidly innovate and adapt with the best-of-breed components of your choice (LLMs, vector databases, embedding models), across cloud environments.

Scale While Maintaining Security and Managing Cost

Safeguard proprietary data by extending your LLMs, and monitor the cost of your generative AI projects in real time to keep them under control.

Create Production-Grade AI Applications

Deploy and maintain safe, high-quality, generative AI applications and solutions in production.

Gain Visibility Across Your Entire AI Landscape

Unify your generative and predictive AI workflows to break down silos with one end-to-end experience.

Observe and Intervene in Real-Time

Quickly detect and prevent unexpected and unwanted behaviors.

Use the Best-of-Breed Components Across Any Cloud

Our API-first integrations keep you in the driver's seat for your generative AI initiatives and prevent vendor lock-in. Freely choose your generative AI components, from vector databases to embedding models to any LLM. With built-in access to all major LLMs from Google, Microsoft, and Amazon, plus enterprise-grade hosting of open source LLMs, you can freely choose the right foundation model for your use case.

Generative AI Playground

Build Sophisticated Generative AI Applications in Hours

Rapidly innovate and deploy new generative AI use cases in hours. With an intuitive interface and a built-in suite of testing, evaluation, and assessment metrics, you can easily experiment and compare in our Playground, while tools like Azure OpenAI-powered Code Assist and generative AI accelerators help you jumpstart your AI projects. With organized spaces for project management, hosted notebooks, and the ability to prototype and host generative AI apps, DataRobot centralizes your workflow so you can focus on creating valuable AI solutions.

Make Every App an AI-Powered App

Easily integrate generative AI into your organization's existing operations and systems, such as Slack, Salesforce, BI tools, and more, with just a few lines of code. Quickly build, prototype, and customize bespoke generative AI applications in a few clicks with a hosted Streamlit application sandbox that lets you move between building and previewing to ensure you're creating the best user experience, then easily promote that application to production with our hosted apps.

Bring your GenAI Projects to Life

Manage All of Your Generative AI Assets in One Unified Experience

Prevent AI chaos. Unify your entire AI landscape into a single source of truth and system of record with our unified Registry and Console for generative AI apps and predictive models, whether built on or off the DataRobot platform. Use Open Source Deep Learning and NLP models to enhance your AI flows with moderation and interventions. Enable your collaborators to make their own global models to share the best performing models across the organization. Manage your vector databases, LLMs, and prompt engineering strategies neatly together – no matter who built them or how they were built. Upgrade LLMs and keep track of changes with automated versioning that records all model changes and lets you easily revert to earlier versions when needed.

Confidently Scale and Continuously Optimize Generative AI 

Securely operate and govern your generative AI assets with enterprise-grade LLMOps capabilities. Protect your business reputation and trust that your GenAI assets will perform as expected in production.

Safely extend your LLMs with our Vector Database Builder to ensure your proprietary data stays private. Standardize and scale with universal governance and security policies and controls for all of your generative AI assets, regardless of deployment or origin.

Build a multilayered, real-time intervention and moderation strategy to prevent prompt injections, hallucinations, or PII leaks, and create user feedback loops to continuously improve your generative AI applications.


Keep Your Costs Under Control

Keep the cost of generative AI under control as you scale, so you don't get surprised with an oversized bill. Cost insight metrics let you easily observe the cost of your generative AI applications in real time, to empower you with the information you need to make cost-performance trade-offs. Set alerts to notify you if costs exceed a designated threshold, so you can quickly intervene.

Accelerate Your Path from Concept to Implementation and ROI

Fast-track your initiatives, build a clear roadmap for success, and raise your team's generative AI expertise. Systematically identify and prioritize high-value opportunities with our GenAI Roadmapping Sessions, and quickly get all your leaders up to speed with our GenAI for Executives program. Our GenAI Catalyst program offers workshops and hands-on labs that help you learn how to build, optimize, and monitor generative AI applications at scale.


Global Enterprises Trust DataRobot to Deliver Speed, Impact, and Scale

  • “DataRobot is an indispensable partner helping us maintain our reputation both internally and externally by deploying, monitoring, and governing generative AI responsibly and effectively.”
    Tom Thomas

    Vice President of Data & Analytics, FordDirect

  • “The generative AI space is changing quickly, and the flexibility, safety and security of DataRobot helps us stay on the cutting edge with a HIPAA-compliant environment we trust to uphold critical health data protection standards. We’re harnessing innovation for real-world applications, giving us the ability to transform patient care and improve operations and efficiency with confidence.”
    Rosalia Tungaraza

    Ph.D, AVP, Artificial Intelligence, Baptist Health South Florida

  • “The value of having DataRobot as a single platform that pulls all the components together can’t be underestimated.”
    Craig Civil

    Director of Data Science & AI


    Take AI From Vision to Value

    See how a value-driven approach to AI can accelerate time to impact.

Belong @ DataRobot: Celebrating 2024 Women's History Month with DataRobot AI Legends
https://www.datarobot.com/blog/belong-datarobot-celebrating-2024-womens-history-month-with-datarobot-ai-legends/ | Thu, 28 Mar 2024

    The DataRobot Belong Community Women@DR was established to bring together women and allies at DataRobot for support, networking, encouragement, resources, and community. We’ve celebrated successes and accomplishments, created safe environments to support each other through difficulties, and created both space for vulnerability and a sounding board for ideas and action. 

    We developed a mission to serve our community thoughtfully.

    Women@DR seeks to create, promote and expand an inclusive culture that connects, educates and advances the needs, professional goals, and aspirations of female-identifying members and allies.

As we celebrate Women's History Month and International Women's Day in March, we look to our community of strong, resilient, and skillful leaders to set the tone. In our February 2024 global company kickoff, seven women were celebrated as DataRobot AI Legends. They were honored as the embodiment of DataRobot values and success.

    We caught up with some of these trailblazers and asked them questions that might inspire others to become legendary:

    • How have you built confidence and/or resiliency over the course of your career?
    • What advice would you give a female-identifying person who is just starting their career?

    Here’s what they had to say …


    Ina Ko, Senior Product Director, Customer Engineering

    How have you built confidence and/or resiliency over the course of your career?

    Building confidence and resilience over the course of my career has involved embracing humility, seeking new challenges, using a supportive network for guidance and feedback, and focusing on the impact of my work. Recognizing the fluidity of life and career paths as growth opportunities rather than setbacks has been essential. This mindset has helped me navigate uncertainties, turn failures into lessons, and embrace continuous learning and adaptation.

    What advice would you give a female-identifying person who is just starting their career?

    For women just starting their careers, my advice is to focus on your unique journey without comparing yourself to others. Embrace change, as it brings valuable lessons that refine your path to success. Cultivate a mindset of positive intent and seek to understand the motivations of your colleagues and peers. This approach will help you foster a collaborative environment by building deep and authentic connections with those you work alongside. Embrace every opportunity to grow, lean into changes, and remember that your unique contributions and resilience will define your path.


    Julia Townsend, Director, Revenue Strategy & Execution

    How have you built confidence and/or resiliency over the course of your career?

    When I first started out back in the day, I had a big goal I wanted to achieve and everything I did was chosen to chip away at the goal.  This really helped me see a bigger picture and not get too down when stumbling blocks came along. I also have a personal mantra that ‘this is one moment, and will pass with time’, that really helps!

    Embracing new opportunities when they come up, even if you think you might not have all the skills you think are needed, has led to some interesting career experiences for me. This has made me feel confident that I can turn my hand to new projects even if I do not have all the answers right away.

    What advice would you give a female-identifying person who is just starting their career?

    I would recommend really taking the time to invest in building relationships and champions across the business.  This is such a great career and life skill that is essential to master in order to get things done and have trust and influence.  This has a lasting impact for every new move you make.


    Brook Miller, Senior Director, Demand Planning, Creative, & APAC Marketing

    How have you built confidence and/or resiliency over the course of your career?

    For me, confidence comes from knowing that mastery isn’t the end goal for me. Instead, I choose to embrace a mindset of continual learning and build confidence through collaborating with people who can teach me new things as well as sharing my own experience with others. 

    In the times where I need to draw on resilience, I find that zooming out to the bigger picture to gain some perspective really helps. By viewing the tough times as just a chapter in the overall story of your career, you can keep moving forward without being consumed by what’s happening in the present moment. It’s also ok to ask for help, say you don’t know the answer or take some time out to reset when needed.

    What advice would you give a female-identifying person who is just starting their career?

    Take the time to understand yourself, your strengths and where you can contribute and then put your hand up for something that scares you every year. The right mindset will take you very far while you’re building your skillset. Align yourself with the people you admire and want to learn from and don’t be afraid to ask them for guidance. One day you’ll be able to give that same gift to someone else.

    Katherine Stepanova

    How have you built confidence and/or resiliency over the course of your career?

    Building confidence and resilience is a process that requires a lot of patience, support from others, both professionally and personally, and lots of hard work. My confidence and resilience, first and foremost, come from my grandma, an incredibly strong woman who had gone through a myriad of challenges in her life and was able to embrace the challenges and overcome difficulties and failures. She taught me from early childhood that girls can do anything they set out to do if they get educated, work hard, practice commendable work ethic no matter what they do, and believe in themselves. She’s been my biggest cheerleader all my life, and I’m eternally grateful to her for that, and that’s something I will try to pass onto my daughter as well.    

    What advice would you give a female-identifying person who is just starting their career?

    Build a strong support system, both professionally and personally. There will be good days and bad ones, bumps on the road, but with a strong support system you’ll never lose the desire and ability to move forward, reflect on the challenges, and overcome difficulties. Don’t forget to enjoy the journey! 

    Sahana Sundara Raj Sreenath

    Sahana Sreenath, Lead, Database Engineer

    How have you built confidence and/or resiliency over the course of your career?

    For me, it has been through stepping out of my comfort zone and taking on tasks I found challenging initially. Reaching out readily when in need of guidance and not being afraid of making mistakes in the beginning all make for a solid foundation to build on.  

    What advice would you give a female-identifying person who is just starting their career?

    I found the following very helpful when starting out: 1) Remaining curious everyday will carry you far and help you to learn the intricacies of your roles and responsibilities better, 2) Frequently soliciting and incorporating feedback will enable you to become more efficient in the workplace, and 3) Continue to stay motivated and dedicated, develop a strong work ethic, challenge yourself whenever possible and your work will speak for itself!


    Teresa Gearin, Marketing Operations Project Manager

    How have you built confidence and/or resiliency over the course of your career?

    Embracing change! Change is inevitable and sometimes feels constant but learning to adapt and being open to learning and enhancing your skills makes change feel less cumbersome and more like an opportunity. I am always actively seeking ways to improve my environment whether it is processes, organization tactics, or just in the way I think and perceive any situation. This has helped me build the confidence needed to navigate male dominated industries/teams, tough situations, and advocate for myself. At the end of the day I am my greatest ally and worst critic, practicing balancing the two helps. 

    What advice would you give a female-identifying person who is just starting their career?

    Your voice matters! Learn to embrace the power of advocating for yourself and others confidently and assertively. Your time, experiences, and thoughts are valuable. Always be coachable, proactive in your learning and development, and honest with yourself about what you truly value and strive for it. 

    DataRobot AI Platform
    Get Started with Free Trial

    Experience new features and capabilities previously only available in our full AI Platform product.

Choosing the Right Vector Embedding Model for Your Generative AI Use Case
https://www.datarobot.com/blog/choosing-the-right-vector-embedding-model-for-your-generative-ai-use-case/ | Thu, 07 Mar 2024

In our previous post, we discussed considerations around choosing a vector database for our hypothetical retrieval augmented generation (RAG) use case. But when building a RAG application, we often need to make another important decision: choosing a vector embedding model, a critical component of many generative AI applications.

    A vector embedding model is responsible for the transformation of unstructured data (text, images, audio, video) into a vector of numbers that capture semantic similarity between data objects. Embedding models are widely used beyond RAG applications, including recommendation systems, search engines, databases, and other data processing systems. 

    Understanding their purpose, internals, advantages, and disadvantages is crucial and that’s what we’ll cover today. While we’ll be discussing text embedding models only, models for other types of unstructured data work similarly.

    What Is an Embedding Model?

Machine learning models don't work with text directly; they require numbers as input. Since text is ubiquitous, over time the ML community developed many solutions that handle the conversion from text to numbers. There are many approaches of varying complexity, but we'll review just some of them.

A simple example is one-hot encoding: treat the words of a text as categorical variables and map each word to a vector of 0s and a single 1.
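
As a minimal sketch of this idea (the tiny vocabulary and words below are purely illustrative, not taken from the post):

# One-hot encoding sketch: each word becomes a vector of 0s with a single 1.
vocabulary = ["refund", "purchase", "payment", "merchant", "alert"]

def one_hot(word):
    vector = [0] * len(vocabulary)
    vector[vocabulary.index(word)] = 1  # the word's position gets the single 1
    return vector

print(one_hot("payment"))  # [0, 0, 1, 0, 0]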


    Unfortunately, this embedding approach is not very practical, since it leads to a large number of unique categories and results in unmanageable dimensionality of output vectors in most practical cases. Also, one-hot encoding does not put similar vectors closer to one another in a vector space.

Embedding models were invented to tackle these issues. Just like one-hot encoding, they take text as input and return vectors of numbers as output, but they are more complex, as they are trained on supervised tasks, often using a neural network. A supervised task can be, for example, predicting a product review's sentiment score. In this case, the resulting embedding model would place reviews of similar sentiment closer to each other in a vector space. The choice of a supervised task is critical to producing relevant embeddings when building an embedding model.

    Word embeddings projected onto 2D axes

In the diagram above we can see word embeddings only, but we often need more than that, since human language is more complex than just many words put together. Semantics, word order, and other linguistic parameters should all be taken into account, which means we need to take it to the next level – sentence embedding models.

    Sentence embeddings associate an input sentence with a vector of numbers, and, as expected, are way more complex internally since they have to capture more complex relationships.


Thanks to progress in deep learning, all state-of-the-art embedding models are created with deep neural nets, since they better capture the complex relationships inherent to human language.
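
For instance, using the open source sentence-transformers library (a sketch; the model name and example sentences are illustrative choices, not recommendations from this post):

# Sketch: turn sentences into vectors and compare them with cosine similarity.
# Requires: pip install sentence-transformers
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # a small BERT-derived model

sentences = [
    "The customer requested a cash refund.",
    "A refund was issued to the cardholder.",
    "The quarterly report is due on Friday.",
]
embeddings = model.encode(sentences)  # one 384-dimensional vector per sentence

# Semantically similar sentences land closer together in the vector space.
print(util.cos_sim(embeddings[0], embeddings[1]))  # relatively high similarity
print(util.cos_sim(embeddings[0], embeddings[2]))  # relatively low similarity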

    A good embedding model should: 

    • Be fast since often it is just a preprocessing step in a larger application
    • Return vectors of manageable dimensions
    • Return vectors that capture enough information about similarity to be practical

    Let’s now quickly look into how most embedding models are organized internally.

Modern Neural Network Architectures

    As we just mentioned, all well-performing state-of-the-art embedding models are deep neural networks. 

    This is an actively developing field and most top performing models are associated with some novel architecture improvement. Let’s briefly cover two very important architectures: BERT and GPT.

    BERT (Bidirectional Encoder Representations from Transformers) was published in 2018 by researchers at Google and described the application of the bidirectional training of “transformer”, a popular attention model, to language modeling. Standard transformers include two separate mechanisms: an encoder for reading text input and a decoder that makes a prediction. 

BERT uses an encoder that reads the entire sentence of words at once, which allows the model to learn the context of a word based on all of its surroundings, left and right, unlike legacy approaches that looked at a text sequence from left to right or right to left. Before feeding word sequences into BERT, some words are replaced with [MASK] tokens, and then the model attempts to predict the original value of the masked words based on the context provided by the other, non-masked words in the sequence.

Standard BERT does not perform very well in most benchmarks, and BERT models require task-specific fine-tuning. But it is open source, has been around since 2018, and has relatively modest system requirements (it can be trained on a single medium-range GPU). As a result, it became very popular for many text-related tasks. It is fast, customizable, and small. For example, the very popular all-MiniLM model is a modified version of BERT.
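
To see the masked-word objective in action, here is a small sketch using the Hugging Face transformers pipeline (the model checkpoint and sentence are illustrative):

# Sketch: BERT-style masked-word prediction.
# Requires: pip install transformers
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# The model predicts the [MASK] token from both its left and right context.
for candidate in fill_mask("The bank issued a [MASK] to the customer."):
    print(candidate["token_str"], round(candidate["score"], 3))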

GPT (Generative Pre-Trained Transformer) by OpenAI is different. Unlike BERT, it is unidirectional, i.e., text is processed in one direction, and it uses a decoder from the transformer architecture that is suitable for predicting the next word in a sequence. These models are slower and produce very high-dimensional embeddings, but they usually have many more parameters, do not require fine-tuning, and are more applicable to many tasks out of the box. GPT is not open source and is available as a paid API.

    Context Length and Training Data

    Another important parameter of an embedding model is context length. Context length is the number of tokens a model can remember when working with a text. A longer context window allows the model to understand more complex relationships within a wider body of text. As a result, models can provide outputs of higher quality, e.g. capture semantic similarity better.

    To leverage a longer context, training data should include longer pieces of coherent text: books, articles, and so on. However, increasing context window length increases the complexity of a model and increases compute and memory requirements for training. 

There are methods that help manage resource requirements, e.g., approximate attention, but they do this at a cost to quality. That's another trade-off that affects quality and costs: larger context lengths capture more complex relationships of human language, but require more resources.

    Also, as always, the quality of training data is very important for all models. Embedding models are no exception. 

    Semantic Search and Information Retrieval

    Using embedding models for semantic search is a relatively new approach. For decades, people used other technologies: boolean models, latent semantic indexing (LSI), and various probabilistic models.

    Some of these approaches work reasonably well for many existing use cases and are still widely used in the industry. 

One of the most popular traditional probabilistic models is BM25 (BM is "best matching"), a search relevance ranking function. It is used to estimate the relevance of a document to a search query and ranks documents based on the query terms appearing in each indexed document. Only recently have embedding models started consistently outperforming it, but BM25 is still used a lot since it is simpler than using embedding models, it has lower compute requirements, and the results are explainable.
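
For reference, the core of BM25 is compact enough to sketch directly; the simplified implementation below uses common default parameters (k1=1.5, b=0.75) and toy documents, and skips the refinements a production search engine would add:

import math
from collections import Counter

def bm25_scores(query_terms, documents, k1=1.5, b=0.75):
    """Score each tokenized document against the query with simplified BM25."""
    N = len(documents)
    avg_len = sum(len(d) for d in documents) / N
    # Document frequency of each query term across the corpus.
    doc_freq = {t: sum(1 for d in documents if t in d) for t in query_terms}
    scores = []
    for doc in documents:
        tf = Counter(doc)
        score = 0.0
        for t in query_terms:
            if doc_freq[t] == 0:
                continue
            idf = math.log(1 + (N - doc_freq[t] + 0.5) / (doc_freq[t] + 0.5))
            denom = tf[t] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[t] * (k1 + 1) / denom
        scores.append(score)
    return scores

docs = [["cash", "refund", "request"], ["credit", "card", "payment"], ["refund", "policy"]]
print(bm25_scores(["refund"], docs))  # higher score = more relevant document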

    Benchmarks

    Not every model type has a comprehensive evaluation approach that helps to choose an existing model. 

    Fortunately, text embedding models have common benchmark suites such as:

    • The article "BEIR: A Heterogeneous Benchmark for Zero-shot Evaluation of Information Retrieval Models" proposed a reference set of benchmarks and datasets for information retrieval tasks. The original BEIR benchmark consists of a set of 19 datasets and methods for search quality evaluation. Methods include question-answering, fact-checking, and entity retrieval. Now anyone who releases a text embedding model for information retrieval tasks can run the benchmark and see how their model ranks against the competition.

    • The Massive Text Embedding Benchmark (MTEB) includes BEIR and other components that cover 58 datasets and 112 languages. The public leaderboard for MTEB results can be found here.

    These benchmarks have been run on a lot of existing models and their leaderboards are very useful to make an informed choice about model selection.

    Using Embedding Models in a Production Environment

    Benchmark scores on standard tasks are very important, but they represent only one dimension.

    When we use an embedding model for search, we run it twice:

    • When doing offline indexing of available data
    • When embedding a user query for a search request 

    There are two important consequences of this. 

    The first is that we have to reindex all existing data when we change or upgrade an embedding model. All systems built using embedding models should be designed with upgradability in mind because newer and better models are released all the time and, most of the time, upgrading a model is the easiest way to improve overall system performance. An embedding model is a less stable component of the system infrastructure in this case.

    The second consequence of using an embedding model for user queries is that the inference latency becomes very important when the number of users goes up. Model inference takes more time for better-performing models, especially if they require GPU to run: having latency higher than 100ms for a small query is not unheard of for models that have more than 1B parameters. It turns out that smaller, leaner models are still very important in a higher-load production scenario. 
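
A rough way to feel this trade-off is to time single-query encoding for a smaller versus a larger embedding model (a sketch using sentence-transformers; the model names are illustrative and absolute timings depend heavily on hardware):

# Sketch: compare single-query embedding latency for a small vs. a larger model.
import time
from sentence_transformers import SentenceTransformer

query = "suspicious refund activity on a credit card account"

for model_name in ["all-MiniLM-L6-v2", "all-mpnet-base-v2"]:
    model = SentenceTransformer(model_name)
    model.encode(query)  # warm-up call so the first-load cost is excluded
    start = time.perf_counter()
    model.encode(query)
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"{model_name}: {elapsed_ms:.1f} ms per query")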

The trade-off between quality and latency is real, and we should always keep it in mind when choosing an embedding model.

As we have mentioned above, embedding models should return vectors of manageable dimensionality, which affects the performance of many algorithms downstream. Generally, the smaller the model, the shorter the output vector, but it is often still too long even for smaller models. That's when we need to use dimensionality reduction algorithms such as PCA (principal component analysis), t-SNE (t-distributed stochastic neighbor embedding), and UMAP (uniform manifold approximation and projection).

Another place we can use dimensionality reduction is before storing embeddings in a database. The resulting vector embeddings will occupy less space and retrieval will be faster, but this comes at a price to downstream quality. Vector databases are often not the primary storage, so embeddings can be regenerated with better precision from the original source data. Used this way, dimensionality reduction shrinks the stored vectors and, as a result, makes the system faster and leaner.
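
As an illustration of that kind of reduction, here is a sketch using scikit-learn's PCA on stand-in vectors (the dimensions and data are synthetic and purely illustrative):

# Sketch: shrink 768-dimensional embeddings to 128 dimensions with PCA.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(10_000, 768))  # stand-in for real embeddings

pca = PCA(n_components=128)
reduced = pca.fit_transform(embeddings)      # shape: (10_000, 128)

print(reduced.shape)
print(f"variance retained: {pca.explained_variance_ratio_.sum():.1%}")
# Smaller vectors are cheaper to store and faster to search,
# at some cost to downstream retrieval quality.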

    Making the Right Choice

There's an abundance of factors and trade-offs to consider when choosing an embedding model for a use case. The score of a potential model on common benchmarks is important, but we should not forget that it's the larger models that tend to have better scores. Larger models have higher inference time, which can severely limit their use in low-latency scenarios, as an embedding model is often a pre-processing step in a larger pipeline. Also, larger models require GPUs to run.

    If you intend to use a model in a low-latency scenario, it’s better to focus on latency first and then see which models with acceptable latency have the best-in-class performance. Also, when building a system with an embedding model you should plan for changes since better models are released all the time and often it’s the simplest way to improve the performance of your system.

    Closing the Generative AI Confidence Gap

    Discover how DataRobot helps you deliver real-world value with generative AI

    Learn more

Anti-Money Laundering (AML) Alert Scoring
https://www.datarobot.com/ai-accelerators/anti-money-laundering-aml-alert-scoring/ | Thu, 22 Feb 2024

Our primary goal with this accelerator is to develop a powerful predictive model that utilizes historical customer and transactional data, enabling us to identify suspicious activities and generate crucial Suspicious Activity Reports (SARs).

    The model will assign a suspicious activity score to future alerts, improving the effectiveness and efficiency of an AML compliance program by prioritizing alerts based on their ranking order according to the score.


    The following outlines aspects of this use case.

    • Use case type: Anti-money laundering (false positive reduction)
    • Target audience: Data Scientist, Financial Crime Compliance Team
    • Desired outcomes:
      • Identify customer data and transaction activity indicative of a high risk for potential money laundering.
      • Detect anomalous changes in behavior or emerging money laundering patterns at an early stage.
      • Reduce the false positive rate for cases selected for manual review.
    • Metrics/KPIs:
      • Annual alert volume
      • Cost per alert
      • False positive reduction rate
    • Sample dataset

    A crucial aspect of an effective AML compliance program involves monitoring transactions to detect suspicious activity. This encompasses various types of transactions, such as deposits, withdrawals, fund transfers, purchases, merchant credits, and payments. Typically, monitoring begins with a rules-based system that scans customer transactions for signs of potential money laundering. When a transaction matches a predefined rule, an alert is generated, and the case is referred to the bank’s internal investigation team for manual review. If the investigators determine that the behavior is indicative of money laundering, a SAR is filed with FinCEN.
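
To make the rules-based step concrete, here is a minimal sketch of one such rule applied to a transactions table (the column names, sample rows, and the refund-request rule are illustrative, loosely echoing the scenario described later in this use case):

# Sketch: a single rule that raises an alert whenever a customer requests a refund.
import pandas as pd

transactions = pd.DataFrame(
    {
        "customer_id": [101, 101, 202, 303],
        "txn_type": ["purchase", "refund_request", "payment", "refund_request"],
        "amount": [120.0, 35.0, 500.0, 5.0],
    }
)

# Rule: any refund request, regardless of amount, produces an alert.
alerts = transactions.loc[transactions["txn_type"] == "refund_request"].copy()
alerts["alert_reason"] = "customer requested a refund"

print(alerts)  # these alerts would be queued for manual review by investigators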

    However, the aforementioned standard transaction monitoring system has significant drawbacks. Most notably, the system’s rules-based and inflexible nature leads to a high rate of false positives, with as many as 90% of cases being incorrectly flagged as suspicious. This prevalence of false positives hampers investigators’ efficiency as they are required to manually filter out cases erroneously identified by the rules-based system.

Financial institutions' compliance teams may have hundreds or even thousands of investigators, and the current systems hinder their effectiveness and efficiency in conducting investigations. The cost of reviewing an alert ranges from $30 to $70. For a bank that receives 100,000 alerts per year, this amounts to a substantial sum. By reducing false positives (for example, eliminating 20,000 unnecessary reviews at $30 each, or 60,000 at $70 each), potential savings of $600,000 to $4.2 million per year can be achieved.

    Key takeaways:

    • Strategy/challenge: Facilitate investigators in focusing their attention on cases with the highest risk of money laundering, while minimizing time spent on reviewing false-positive cases. For banks dealing with a high volume of daily transactions, improving the effectiveness and efficiency of investigations ultimately leads to fewer unnoticed instances of money laundering. This enables banks to strengthen their regulatory compliance and reduce the prevalence of financial crimes within their network.
    • Business driver: Enhance the efficiency of AML transaction monitoring and reduce operational costs. By harnessing their capability to dynamically learn patterns in complex data, machine learning models greatly enhance the accuracy of predicting which cases will result in a SAR filing. Machine learning models for anti-money laundering can be integrated into the review process to score and rank new cases.
    • Model solution: Assign a suspicious activity score to each AML alert, thereby improving the efficiency of an AML compliance program. Any case exceeding a predetermined risk threshold is forwarded to investigators for manual review. Cases falling below the threshold can be automatically discarded or subject to a less intensive review. Once machine learning models are deployed in production, they can be continuously retrained using new data to detect novel money laundering behaviors, incorporating insights from investigator feedback. In particular, the model will employ rules that trigger an alert whenever a customer requests a refund of any amount. Small refund requests can be utilized by money launderers to test the refund mechanism or establish a pattern of regular refund requests for their account.

    Work with data

The linked synthetic dataset illustrates a credit card company's AML compliance program. Specifically, the model detects the following money-laundering scenarios:

    • Customer spends on the card but overpays their credit card bill and seeks a cash refund for the difference.
    • Customer receives credits from a merchant without offsetting transactions and either spends the money or requests a cash refund from the bank.

    The unit of analysis in this dataset is an individual alert, meaning a rule-based engine is in place to produce an alert to detect potentially suspicious activity consistent with the above scenarios.

    Problem framing

    The target variable for this use case is whether or not the alert resulted in a SAR after manual review by investigators, making this a binary classification problem. The unit of analysis is an individual alert—the model will be built on the alert level—and each alert will receive a score ranging from 0 to 1. The score indicates the probability of the alert being a SAR.

    The goal of applying a model to this use case is to lower the false positive rate, which means resources are not spent reviewing cases that are eventually determined to not be suspicious after an investigation.

    In this use case, the False Positive Rate of the rules engine on the validation sample (1600 records) is:

    Number of SAR=0 divided by the total number of records = 1436/1600 = 90%.

    Data preparation

    Consider the following when working with data:

    • Define the scope of analysis: Collect alerts from a specific analytical window to start with; it’s recommended that you use 12–18 months of alerts for model building.
    • Define the target: Depending on the investigation processes, the target definition could be flexible. In this walkthrough, alerts are classified as Level1, Level2, Level3, and Level3-confirmed. These labels indicate at which level of the investigation the alert was closed (i.e., confirmed as a SAR). To create a binary target, treat Level3-confirmed as SAR (denoted by 1) and the remaining levels as non-SAR alerts (denoted by 0).
    • Consolidate information from multiple data sources: Below is a sample entity-relationship diagram indicating the relationship between the data tables used for this use case. 

Some features are static information (kyc_risk_score and state of residence, for example); these can be fetched directly from the reference tables.

For transaction behavior and payment history, the information is derived from a specific time window prior to the alert generation date. This case uses 90 days as the time window to obtain dynamic customer behavior features such as nbrPurchases90d, avgTxnSize90d, or totalSpend90d.
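
A simplified sketch of these two preparation steps, deriving the binary SAR target from investigation levels and computing 90-day behavioral aggregates per alert, might look like this (table layouts and sample values are illustrative; only the feature names above come from the dataset):

# Sketch: build the binary SAR target and 90-day behavioral features per alert.
import pandas as pd

alerts = pd.DataFrame(
    {
        "alert_id": [1, 2],
        "customer_id": [101, 202],
        "alert_date": pd.to_datetime(["2024-01-15", "2024-01-20"]),
        "investigation_level": ["Level2", "Level3-confirmed"],
    }
)
transactions = pd.DataFrame(
    {
        "customer_id": [101, 101, 202],
        "txn_date": pd.to_datetime(["2023-12-01", "2024-01-10", "2023-11-30"]),
        "amount": [50.0, 75.0, 200.0],
    }
)

# Binary target: only alerts confirmed at Level 3 count as SARs.
alerts["SAR"] = (alerts["investigation_level"] == "Level3-confirmed").astype(int)

# 90-day window features: aggregate each customer's transactions
# over the 90 days preceding the alert generation date.
def window_features(row):
    start = row["alert_date"] - pd.Timedelta(days=90)
    txns = transactions[
        (transactions["customer_id"] == row["customer_id"])
        & (transactions["txn_date"].between(start, row["alert_date"]))
    ]
    return pd.Series(
        {
            "nbrPurchases90d": len(txns),
            "avgTxnSize90d": txns["amount"].mean() if len(txns) else 0.0,
            "totalSpend90d": txns["amount"].sum(),
        }
    )

alerts = alerts.join(alerts.apply(window_features, axis=1))
print(alerts)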

    Features and sample data

The features in the sample dataset consist of KYC (Know-Your-Customer) information, demographic information, transactional behavior, and free-form text information from the customer service representatives' notes. To apply this use case in your organization, your dataset should contain, minimally, the following features:

    • Alert ID
    • Binary classification target (SAR/no-SAR, 1/0, True/False, etc.)
    • Date/time of the alert
    • “Know Your Customer” score used at time of account opening
    • Account tenure, in months
    • Total merchant credit in the last 90 days
    • Number of refund requests by the customer in the last 90 days
    • Total refund amount in the last 90 days

    Other helpful features to include are:

    • Annual income
    • Credit bureau score
    • Number of credit inquiries in the past year
    • Number of logins to the bank website in the last 90 days
    • Indicator that the customer owns a home
    • Maximum revolving line of credit
    • Number of purchases in the last 90 days
    • Total spend in the last 90 days
    • Number of payments in the last 90 days
    • Number of cash-like payments (e.g., money orders) in last 90 days
    • Total payment amount in last 90 days
    • Number of distinct merchants purchased at in the last 90 days
    • Customer Service Representative notes and codes based on conversations with customer (cumulative)

Below is an example of one row in the training data after it is merged and aggregated (it is broken into multiple lines for easier visualization).

    Configure the Python client

    The DataRobot API offers a programmatic alternative to the web interface for creating and managing DataRobot projects. It can be accessed through REST or DataRobot’s Python and R clients, supporting Windows, UNIX, and OS X environments. To authenticate with DataRobot’s API, you will need an endpoint and token, as detailed in the documentation. Once you have configured your API credentials, endpoints, and environment, you can leverage the DataRobot API to perform the following actions:

    1. Upload a dataset.
    2. Train a model to learn from the dataset using the Informative Features feature list.
    3. Test prediction outcomes on the model using new data.
    4. Deploy the model.
    5. Predict outcomes using the deployed model and new data.

    Import libraries

    In [1]:
    
    # NOT required for Notebooks in DataRobot Workbench
    # *************************************************
    ! pip install datarobot --quiet
    # Upgrade DR to datarobot-3.2.0b0
    # ! pip uninstall datarobot --yes
    # ! pip install datarobot --pre
    
    ! pip install pandas --quiet
    ! pip install matplotlib --quiet
    
    import getpass
    
    import datarobot as dr
    
    endpoint = "https://app.eu.datarobot.com/api/v2"
    token = getpass.getpass()
    dr.Client(endpoint=endpoint, token=token)
    # *************************************************

    ········

    Out[1]:
    
    <datarobot.rest.RESTClientObject at 0x7fd37ba9fc40>
    
    In[2]:
    
    import datetime as datetime
    import os
    
    import datarobot as dr
    import matplotlib.pyplot as plt
    import pandas as pd
    
    params = {"axes.titlesize": "8", "xtick.labelsize": "5", "ytick.labelsize": "6"}
    plt.rcParams.update(params)

    Analyze, clean, and curate data

    Preparing data is an iterative process. Even if you have already cleaned and prepped your training data before uploading it, you can further enhance its quality by performing Exploratory Data Analysis (EDA).

    In [3]:
    
    # Load the training dataset
    df = pd.read_csv(
        "https://s3.amazonaws.com/datarobot-use-case-datasets/DR_Demo_AML_Alert_train.csv",
        encoding="ISO-8859-1",
    )
    df.info()
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 10000 entries, 0 to 9999
    Data columns (total 31 columns):
     #   Column                            Non-Null Count  Dtype  
    ---  ------                            --------------  -----  
     0   ALERT                             10000 non-null  int64  
     1   SAR                               10000 non-null  int64  
     2   kycRiskScore                      10000 non-null  int64  
     3   income                            9800 non-null   float64
     4   tenureMonths                      10000 non-null  int64  
     5   creditScore                       10000 non-null  int64  
     6   state                             10000 non-null  object 
     7   nbrPurchases90d                   10000 non-null  int64  
     8   avgTxnSize90d                     10000 non-null  float64
     9   totalSpend90d                     10000 non-null  float64
     10  csrNotes                          10000 non-null  object 
     11  nbrDistinctMerch90d               10000 non-null  int64  
     12  nbrMerchCredits90d                10000 non-null  int64  
     13  nbrMerchCreditsRndDollarAmt90d    10000 non-null  int64  
     14  totalMerchCred90d                 10000 non-null  float64
     15  nbrMerchCreditsWoOffsettingPurch  10000 non-null  int64  
     16  nbrPayments90d                    10000 non-null  int64  
     17  totalPaymentAmt90d                10000 non-null  float64
     18  overpaymentAmt90d                 10000 non-null  float64
     19  overpaymentInd90d                 10000 non-null  int64  
     20  nbrCustReqRefunds90d              10000 non-null  int64  
     21  indCustReqRefund90d               10000 non-null  int64  
     22  totalRefundsToCust90d             10000 non-null  float64
     23  nbrPaymentsCashLike90d            10000 non-null  int64  
     24  maxRevolveLine                    10000 non-null  int64  
     25  indOwnsHome                       10000 non-null  int64  
     26  nbrInquiries1y                    10000 non-null  int64  
     27  nbrCollections3y                  10000 non-null  int64  
     28  nbrWebLogins90d                   10000 non-null  int64  
     29  nbrPointRed90d                    10000 non-null  int64  
     30  PEP                               10000 non-null  int64  
    dtypes: float64(7), int64(22), object(2)
    memory usage: 2.4+ MB

    The sample data contains the following features:

    1. ALERT: Alert Indicator
    2. SAR: Target variable, SAR Indicator
    3. kycRiskScore: Account relationship (Know Your Customer) score used at time of account opening
    4. income: Annual income
    5. tenureMonths: Account tenure in months
    6. creditScore: Credit bureau score
    7. state: Account billing address state
    8. nbrPurchases90d: Number of purchases in last 90 days
    9. avgTxnSize90d: Average transaction size in last 90 days
    10. totalSpend90d: Total spend in last 90 days
    11. csrNotes: Customer Service Representative notes and codes based on conversations with customer
    12. nbrDistinctMerch90d: Number of distinct merchants purchased at in last 90 days
    13. nbrMerchCredits90d: Number of credits from merchants in last 90 days
    14. nbrMerchCreditsRndDollarAmt90d: Number of credits from merchants in round dollar amounts in last 90 days
    15. totalMerchCred90d: Total merchant credit amount in last 90 days
    16. nbrMerchCreditsWoOffsettingPurch: Number of merchant credits without an offsetting purchase in last 90 days
    17. nbrPayments90d: Number of payments in last 90 days
    18. totalPaymentAmt90d: Total payment amount in last 90 days
    19. overpaymentAmt90d: Total amount overpaid in last 90 days
    20. overpaymentInd90d: Indicator that account was overpaid in last 90 days
    21. nbrCustReqRefunds90d: Number refund requests by the customer in last 90 days
    22. indCustReqRefund90d: Indicator that customer requested a refund in last 90 days
    23. totalRefundsToCust90d: Total refund amount in last 90 days
    24. nbrPaymentsCashLike90d: Number of cash-like payments (e.g., money orders) in last 90 days
    25. maxRevolveLine: Maximum revolving line of credit
    26. indOwnsHome: Indicator that the customer owns a home
    27. nbrInquiries1y: Number of credit inquiries in the past year
28. nbrCollections3y: Number of collections in the past 3 years
    29. nbrWebLogins90d: Number of logins to the bank website in the last 90 days
    30. nbrPointRed90d: Number of loyalty point redemptions in the last 90 days
    31. PEP: Politically Exposed Person indicator
    In [4]:
    
    # Upload a dataset
    ct = datetime.datetime.now()
    file_name = f"AML_Alert_train_{int(ct.timestamp())}.csv"
    dataset = dr.Dataset.create_from_in_memory_data(data_frame=df, fname=file_name)
    dataset
    Out [4]:
    
    Dataset(name='AML_Alert_train_1687350171.csv', id='6492eb9c1e1e2e52c305e3ca')
    

    While a dataset is being registered in Workbench, DataRobot also performs EDA1 analysis and profiling for every feature to detect feature types, automatically transform date-type features, and assess feature quality. Once registration is complete, you can view the exploratory data insights uncovered while computing EDA1, as detailed in the documentation.

    Based on the exploratory data insights above, you can draw the following quick observations:

    1. The entire population of interest comprises only alerts, which aligns with the problem’s focus.
    2. The false positive alerts (SAR=0) account for approximately 90%, which is typical for AML problems.
    3. Some features, such as PEP, do not offer any useful information as they consist entirely of zeroes or have a single value.
    4. Certain features, like nbrPaymentsCashLike90d, exhibit signs of zero inflation.
    5. There is potential to convert certain numerical features, such as indOwnsHome, into categorical features.

    Additionally, DataRobot automatically detects and addresses common data quality issues with minimal or no user intervention. For instance, a binary column is automatically added within a blueprint to flag rows with excess zeros. This allows the model to capture potential patterns related to abnormal values. No further user action is required.
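
If you want to act on observations 3 and 5 yourself before uploading, a minimal sketch could look like the following (it works on a copy so the original frame used in later cells is unchanged; the exact columns found will depend on your data):

# Sketch: drop constant columns and recast indicator features as categorical.
df_clean = df.copy()

constant_cols = [c for c in df_clean.columns if df_clean[c].nunique() <= 1]
print("Constant columns:", constant_cols)  # likely includes ALERT and PEP here
df_clean = df_clean.drop(columns=constant_cols)

# Indicator features read more naturally as categories than as integers.
for col in ["indOwnsHome", "indCustReqRefund90d", "overpaymentInd90d"]:
    if col in df_clean.columns:
        df_clean[col] = df_clean[col].astype("category")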

    Create and manage experiments

    Experiments are the individual “projects” within a Use Case. They allow you to vary data, targets, and modeling settings to find the optimal models to solve your business problem. Within each experiment, you have access to its Leaderboard and model insights, as well as experiment summary information.

    In [5]:
    
    # Create a new project based on a dataset
    ct = datetime.datetime.now()
    project_name = f"Anti Money Laundering Alert Scoring_{int(ct.timestamp())}"
    project = dataset.create_project(project_name=project_name)
    print(
        f"""Project Details
    Project URL: {project.get_uri()}
    Project ID: {project.id}
    Project Name: {project.project_name}
        """
    )
    Project Details
    Project URL: https://app.eu.datarobot.com/projects/6492ebd2b83ed3cc6ec5bb2e/models
    Project ID: 6492ebd2b83ed3cc6ec5bb2e
    Project Name: Anti Money Laundering Alert Scoring_1687350226

    Start modeling

    In [6]:
    
    # Select modeling parameters and start the modeling process
    project.analyze_and_model(target="SAR", mode=dr.AUTOPILOT_MODE.QUICK, worker_count="-1")
    
    project.wait_for_autopilot(check_interval=20.0, timeout=86400, verbosity=0)

    Evaluate experiments

    As you proceed with modeling, Workbench generates a model Leaderboard, a ranked list of models that facilitates quick evaluation. The models on the Leaderboard are ranked based on the selected optimization metrics, such as LogLoss in this case.

    Autopilot, DataRobot’s "survival of the fittest" modeling mode, automatically selects the most suitable predictive models for the specified target feature and trains them with increasing sample sizes. Autopilot not only identifies the best-performing models but also recommends a model that excels at predicting the target feature SAR. The model selection process considers a balance of accuracy, metric performance, and model simplicity. For a detailed understanding, please refer to the model recommendation process description in the documentation.

    Within the Leaderboard, you can click on a specific model to access visualizations for further exploration, as outlined in the documentation.
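    If you prefer to inspect the Leaderboard programmatically, the sketch below lists the top models together with their cross-validation score for the project metric. It assumes the `project` object created earlier; the attribute names follow the DataRobot Python client.

    In [ ]:
    
    # List Leaderboard models with their cross-validation LogLoss
    for m in project.get_models()[:5]:
        print(
            m.model_type,
            f"{m.sample_pct}%",
            m.metrics[project.metric].get("crossValidation"),
        )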


    Lift Chart

    The Lift Chart above shows how effective the model is at separating SAR and non-SAR alerts. After an alert in the out-of-sample partition is scored by the trained model, it is assigned a risk score that measures the likelihood of the alert being a SAR risk, or becoming a SAR.

    In the Lift Chart, alerts are sorted by predicted SAR risk, broken into 10 deciles, and displayed from lowest to highest. For each decile, DataRobot computes the average predicted SAR risk (blue plus) as well as the average actual SAR rate (orange circle) and plots the two lines together. For the recommended model built for this false positive reduction use case, the SAR rate of the top decile is about 65%, a significant lift over the ~10% SAR rate in the training data. The top three deciles capture almost all SARs, which means that the 70% of alerts with very low predicted SAR risk rarely result in a SAR.

    ROC Curve

    Once you have confidence that the model is performing well, select an explicit threshold to turn the continuous SAR risk predicted by DataRobot into a binary decision. Three criteria are important when picking the optimal threshold:

    1. The false negative rate has to be as small as possible. False negatives are alerts that DataRobot determines are not SARs but that turn out to be true SARs. Missing a true SAR is dangerous and could result in an MRA (matter requiring attention) or a regulatory fine. This example takes a conservative approach and requires a false negative rate of 0, meaning all true SARs are captured. To achieve this, the threshold has to be low enough to capture all SARs.
    2. The alert volume has to be as low as possible so that enough false positives are removed. In this context, all historical alerts that are not SARs are de facto false positives; the machine learning model is likely to assign lower scores to those non-SAR alerts. Therefore, pick a threshold high enough to remove as many false positive alerts as possible.
    3. Ensure the selected threshold works not only on the seen data, but also on unseen data. This is required so that when the model is deployed to the transaction monitoring system for ongoing scoring, it still reduces false positives without missing any SARs.

    From experimenting with different thresholds on the cross-validation data (the data used for model training and validation), 0.03 appears to be the optimal threshold since it satisfies the first two criteria. On one hand, the false negative rate is 0; on the other hand, the alert volume is reduced from 8000 to 2098 (False Positive + True Positive), meaning the number of investigations is reduced by 73% (5902/8000) without missing any SARs.

    For the third criterion—setting the threshold to work on unseen alerts—you can quickly validate it in DataRobot. By changing the Data Selection dropdown to Holdout, and applying the same threshold (0.03), the false negative rate remains 0 and the reduction in investigations is still 73% (1464/2000). This proves that the model generalizes well and will perform as expected on unseen data.
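    The same threshold analysis can be scripted. The sketch below is illustrative only: it assumes the recommended model and the roc_points structure returned by the Python client's get_roc_curve method (field names can vary slightly by client version). It keeps only thresholds with zero false negatives and then takes the highest one, which minimizes the number of alerts to investigate.

    In [ ]:
    
    # Scan candidate thresholds on the cross-validation ROC data
    model_rec = dr.ModelRecommendation.get(project.id).get_model()
    roc = model_rec.get_roc_curve("crossValidation")
    
    # Criterion 1: keep only points that miss no SARs
    no_miss = [pt for pt in roc.roc_points if pt["false_negative_score"] == 0]
    
    # Criterion 2: among those, the highest threshold yields the fewest alerts to review
    best = max(no_miss, key=lambda pt: pt["threshold"])
    alerts_to_review = best["true_positive_score"] + best["false_positive_score"]
    print(best["threshold"], alerts_to_review)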

    ROC curve

    Model insights

    DataRobot offers a comprehensive suite of powerful tools and features designed to facilitate the interpretation, explanation, and validation of the factors influencing a model’s predictions. One such tool is Feature Impact, which provides a high-level visualization that identifies the features that have the strongest influence on the model’s decisions. A large impact indicates that removing this feature would significantly deteriorate the model’s performance. On the other hand, features with lower impact may have relatively less importance individually but can still contribute to the overall predictive power of the model.
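    Feature Impact is also available through the Python client. The sketch below assumes the recommended model and the column names used in the client's feature impact payload.

    In [ ]:
    
    # Compute (or fetch, if already computed) Feature Impact for the recommended model
    model_rec = dr.ModelRecommendation.get(project.id).get_model()
    impact = pd.DataFrame(model_rec.get_or_request_feature_impact())
    impact.sort_values("impactNormalized", ascending=False).head(10)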

    Predict and deploy

    Once you identify the model that best learns patterns in your data to predict SARs, DataRobot makes it easy to deploy the model into your alert investigation process. This is a critical step for implementing the use case, as it ensures that predictions are used in the real world to reduce false positives and improve efficiency in the investigation process. The following sections describe activities related to preparing and then deploying a model.

    The following applications of the alert-prioritization score from the false positive reduction model both automate and augment the existing rule-based transaction monitoring system.

    • If the FCC (Financial Crime Compliance) team is comfortable with removing the low-risk alerts (very low prioritization score) from the scope of investigation, then the binary threshold selected during the model building stage will be used as the cutoff to remove those no-risk alerts. The investigation team will only investigate alerts above the cutoff, which will still capture all the SARs based on what was learned from the historical data.
    • Often regulatory agencies will consider auto-closure or auto-removal an aggressive treatment of production alerts. If auto-closing is not the ideal way to use the model output, the alert prioritization score can still be used to triage alerts into different investigation processes, improving operational efficiency.

    See the deep dive at the end of this use case for information on decision process considerations.

    You can use the following code to return the Recommended for Deployment model to use for model predictions.

    In [7]:
    
    model = dr.ModelRecommendation.get(project.id).get_model()
    model
    Out [7]:
    
    Model('RandomForest Classifier (Gini)')

    Compute predictions before deployment

    By uploading an external dataset, you can verify consistent performance in production prior to deployment. This new data needs to have the same transformations applied as the training data.

    You can use the UI and follow the five steps of the workflow for testing predictions. When predictions are complete, you can save prediction results to a CSV file.

    With the following code, you can obtain more detailed results including predictions, probability of class_1 (positive_probability), probability of class_0 (autogenerated), actual values of the target (SAR), and all features. Furthermore, you can compute Prediction Explanations on this external dataset (which was not part of training data).

    In [10]:
    
    # Load an alert dataset for predictions
    df_score = pd.read_csv(
        "https://s3.amazonaws.com/datarobot-use-case-datasets/DR_Demo_AML_Alert_pred.csv",
        encoding="ISO-8859-1",
    )
    
    # Get the recommended model
    model_rec = dr.ModelRecommendation.get(project.id).get_model()
    model_rec.set_prediction_threshold(0.03)
    
    # Upload a scoring data set to DataRobot
    prediction_dataset = project.upload_dataset(df_score.drop("SAR", axis=1))
    predict_job = model_rec.request_predictions(prediction_dataset.id)
    
    # Make predictions
    predictions = predict_job.get_result_when_complete()
    
    # Display prediction results
    results = pd.concat(
        [predictions.drop("row_id", axis=1), df_score.drop("ALERT", axis=1)], axis=1
    )
    results.head()
    01234
    prediction00011
    positive_probability0000.1209180.407422
    prediction_threshold0.030.030.030.030.03
    class_0.01110.8790820.592578
    class_1.00000.1209180.407422
    SAR00011
    kycRiskScore23322
    income54400100100598004110052100
    tenureMonths1437053
    creditScore681702681718704
    indCustReqRefund90d11111
    totalRefundsToCust90d30.8665.7225.342828.512778.84
    nbrPaymentsCashLike90d00024
    maxRevolveLine1000013000140001500011000
    indOwnsHome00111
    nbrInquiries1y41433
    nbrCollections3y00000
    nbrWebLogins90d86863
    nbrPointRed90d20112
    PEP00000

    Look at the results above. Since this is a binary classification problem:

    • As the positive_probability approaches zero, the row is a stronger candidate for class_0 with prediction value of 0 (the alert is not SAR).
    • As positive_probability approaches one, the outcome is more likely to be of class_1 with prediction value of 1 (the alert is SAR).

    From the KDE (Kernel Density Estimate) plot below, you can see that this sample of the data is weighted more strongly toward class_0 (the alert is not SAR); the Probability Density for predictions is close to actuals.

    In [11]:
    
    plt_kde = results[["positive_probability", "SAR"]].plot.kde(
        xlim=(0, 1), title="Prediction Distribution"
    )
    prediction distribution
    In [12]:
    
    # Prepare Prediction Explanations
    pe_init = dr.PredictionExplanationsInitialization.create(project.id, model_rec.id)
    pe_init.wait_for_completion()

    Computing Prediction Explanations is a resource-intensive task. You can set a maximum number of explanations per row and also configure prediction value thresholds to speed up the process.

    Considering the prediction distribution above, set the threshold_low to 0.2 and threshold_high to 0.5. This will provide Prediction Explanations only for those extreme predictions where positive_probability is lower than 0.2 or higher than 0.5.

    In [13]:
    
    # Compute Prediction Explanations with a custom config
    number_of_explanations = 3
    pe_comput = dr.PredictionExplanations.create(
        project.id,
        model_rec.id,
        prediction_dataset.id,
        max_explanations=number_of_explanations,
        threshold_low=0.2,
        threshold_high=0.5,
    )
    pe_result = pe_comput.get_result_when_complete()
    explanations = pe_result.get_all_as_dataframe().drop("row_id", axis=1).dropna()
    display(explanations.head())
    01235
    prediction00010
    class_0_label00000
    class_0_probability1110.8790820.98379
    class_1_label11111
    class_1_probability0000.1209180.01621
    explanation_0_featuretotalSpend90davgTxnSize90dtotalSpend90dnbrCustReqRefunds90davgTxnSize90d
    explanation_0_feature_value216.2514.92488.88495.55
    explanation_0_label11111
    explanation_0_qualitative_strength
    explanation_0_strength-3.210206-3.834376-3.20076-1.514812-0.402981
    explanation_1_featurenbrPaymentsCashLike90dtotalSpend90dnbrPaymentsCashLike90dcsrNotescsrNotes
    explanation_1_feature_value0775.840billing address plastic replace moneyordercustomer call statement moneyorder
    explanation_1_label11111
    explanation_1_qualitative_strength++
    explanation_1_strength-2.971257-3.261914-3.031098-0.7084360.390769
    explanation_2_featurecsrNotestotalMerchCred90dtotalMerchCred90davgTxnSize90dnbrPurchases90d
    explanation_2_feature_valuecard replace statement customer call statement80.471715.6196
    explanation_2_label11111
    explanation_2_qualitative_strength
    explanation_2_strength-2.819563-2.982999-2.990864-0.141831-0.329526

    The following code lets you see how often various features are showing up as the top explanation for impacting the probability of SAR.

    In [14]:
    
    from functools import reduce
    
    # Create a combined histogram of all the explanations
    explanations_hist = reduce(
        lambda x, y: x.add(y, fill_value=0),
        (
            explanations["explanation_{}_feature".format(i)].value_counts()
            for i in range(number_of_explanations)
        ),
    )
    
    plt_expl = explanations_hist.plot.bar()

    Having seen the model’s Feature Impact insight earlier, the high occurrence of totalSpend90d, overPaymentAmt90d, and totalMerchCred90d as Prediction Explanations is not entirely surprising. These were some of the top-ranked features in the impact chart.

    Deploy a model and monitor performance

    The DataRobot platform offers a wide variety of deployment methods, of which the most direct route is deploying a model from the Leaderboard. When you create a deployment from the Leaderboard, DataRobot automatically creates a model package for the deployed model. You can access the model package at any time in the Model Registry. For more details, see the documentation for deploying from the Leaderboard. The programmatic alternative for creating a deployment is shown in the code below.

    DataRobot will continuously monitor the model deployed on the dedicated prediction server. With DataRobot MLOps, the modeling team can monitor and manage the alert prioritization model by tracking the distribution drift of the input features as well as performance degradation over time.

    In [15]:
    
    pred_serv_id = dr.PredictionServer.list()[0].id
    deployment = dr.Deployment.create_from_learning_model(
        model_id=model_rec.id,
        label="Anti Money Laundering Alert Scoring",
        description="Anti Money Laundering Alert Scoring",
        default_prediction_server_id=pred_serv_id,
    )
    deployment
    Out [15]:
    
    Deployment(Anti Money Laundering Alert Scoring)
    

    When you select a deployment from the Deployments Inventory, DataRobot opens to the Overview page for that deployment, which provides a model and environment specific summary that describes the deployment, including the information you supplied when creating the deployment and any model replacement activity.

    The Service Health tab tracks metrics about a deployment’s ability to respond to prediction requests quickly and reliably. This helps identify bottlenecks and assess capacity, which is critical to proper provisioning.

    The Data Drift tab provides interactive and exportable visualizations that help identify the health of a deployed model over a specified time interval.
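    The same monitoring signals are available programmatically. The sketch below assumes the `deployment` object created above; the exact methods and returned fields depend on your datarobot client version.

    In [ ]:
    
    # Programmatic view of deployment health and drift
    service_stats = deployment.get_service_stats()
    print(service_stats.metrics)  # request volume, response times, error rates, etc.
    
    print(deployment.get_target_drift())  # drift of the predicted SAR score over the tracking period
    for feature_drift in deployment.get_feature_drift():
        print(feature_drift)  # per-feature drift metrics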

    Implementation risks

    When operationalizing this use case, consider the following, which may impact outcomes and require model re-evaluation:

    • Change in the transactional behavior of the money launderers.
    • Novel information introduced into the transaction and customer records that was not seen by the machine learning models during training.

    Deep dive: Imbalanced targets

    In AML and Transaction Monitoring, the SAR rate is usually very low (1%–5%, depending on the detection scenarios); sometimes it could be even lower than 1% in extremely unproductive scenarios. In machine learning, such a problem is called class imbalance. The question becomes, how can you mitigate the risk of class imbalance and let the machine learn as much as possible from the limited known-suspicious activities?

    DataRobot offers different techniques to handle class imbalance problems, including:

    • Evaluate the model with different metrics. For binary classification (for example, the false positive reduction model here), LogLoss is used as the default metric to rank models on the Leaderboard. Since the rule-based system is often unproductive, which leads to a very low SAR rate, it’s reasonable to also look at a different metric, such as the SAR rate in the top 5% of alerts in the prioritization list. The objective of the model is to assign higher prioritization scores to high-risk alerts, so it’s ideal to have a higher rate of SARs in the top tier of the prioritization score. In the example shown in the image below, the SAR rate in the top 5% of prioritization scores is more than 70% (the original SAR rate is less than 10%), which indicates that the model is very effective at ranking alerts based on SAR risk.
    • DataRobot also provides flexibility for modelers when tuning hyperparameters, which can also help with class imbalance. In the example below, the Random Forest Classifier is tuned by enabling balance_bootstrap (randomly sampling an equal number of SAR and non-SAR alerts for each decision tree in the forest); you can see that the validation score of the new ‘Balanced Random Forest Classifier’ model is slightly better than that of the parent model.
    • You can also use Smart Downsampling (from the Advanced Options tab) to intentionally downsample the majority class (i.e., non-SAR alerts) in order to build faster models with similar accuracy.
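    As a sketch of the last point, Smart Downsampling can also be enabled from the Python client through advanced options. The 50% downsampling rate below is only an illustrative value.

    In [ ]:
    
    # Rerun Autopilot with Smart Downsampling of the majority class (non-SAR alerts)
    advanced = dr.AdvancedOptions(
        smart_downsampled=True,
        majority_downsampling_rate=50.0,  # illustrative: keep 50% of non-SAR alerts
    )
    project_ds = dataset.create_project(project_name="Anti Money Laundering Alert Scoring - downsampled")
    project_ds.analyze_and_model(
        target="SAR",
        mode=dr.AUTOPILOT_MODE.QUICK,
        worker_count=-1,
        advanced_options=advanced,
    )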

    Deep Dive: Decision process

    A review process typically consists of a deep-dive analysis by investigators. The data related to the case is made available for review so that the investigators can develop a 360-degree view of the customer, including their profile, demographic, and transaction history. Additional data from third-party data providers, and web crawling, can supplement this information to complete the picture.

    For transactions that do not get auto-closed or auto-removed, the model can help the compliance team create a more effective and efficient review process by triaging their reviews. The predictions and their explanations also give investigators a more holistic view when assessing cases.

    Risk-based alert triage

    Based on the prioritization score, the investigation team could take different investigation strategies. For example:

    • No-risk or low-risk alerts can be reviewed on a quarterly basis, instead of monthly. The frequently alerted entities without any SAR risk can then be reviewed once every three months, which will significantly reduce the time of investigation.
    • High-risk alerts with higher prioritization scores can have their investigation fast-tracked to the final stage in the alert escalation path. This will significantly reduce the effort spent on level 1 and level 2 investigation.
    • Medium-risk alerts can follow the standard investigation process.

    Smart alert assignment

    For an alert investigation team that is geographically dispersed, the alert prioritization score can be used to assign alerts to different teams in a more effective manner. High-risk alerts can be assigned to the team with the most experienced investigators while low risk alerts can be handled by a less experienced team. This mitigates the risk of missing suspicious activities due to lack of competency with alert investigations.

    For both approaches, the definition of high/medium/low risk could be either a set of hard thresholds (for example, High: score>=0.5, Medium: 0.5>score>=0.3, Low: score<0.3), or based on the percentile of the alert scores on a monthly basis (for example, High: above the 80th percentile, Medium: between the 50th and 80th percentiles, Low: below the 50th percentile).
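    Either definition is straightforward to apply to scored alerts. The sketch below reuses the `results` frame from the prediction step and the illustrative cutoffs above.

    In [ ]:
    
    # Assign triage tiers from the prioritization score
    def triage_tier(score):
        if score >= 0.5:
            return "High"
        if score >= 0.3:
            return "Medium"
        return "Low"
    
    results["tier_hard"] = results["positive_probability"].apply(triage_tier)
    
    # Percentile-based tiers computed over the current batch of alerts
    pct_rank = results["positive_probability"].rank(pct=True)
    results["tier_pct"] = pd.cut(
        pct_rank, bins=[0, 0.5, 0.8, 1.0], labels=["Low", "Medium", "High"]
    )
    results[["positive_probability", "tier_hard", "tier_pct"]].head()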

    Get Started with Free Trial

    Experience new features and capabilities previously only available in our full AI Platform product.

    The post Anti-Money Laundering (AML) Alert Scoring appeared first on DataRobot AI Platform.

    ]]>
    Tackling Churn with AI – Before Modelling https://www.datarobot.com/ai-accelerators/tackling-churn-with-ai-before-modelling/ Wed, 21 Feb 2024 14:57:11 +0000 https://www.datarobot.com/?post_type=aiaccelerator&p=53597 This accelerator will teach the problem framing and data management steps required before modelling begins. We will use two examples to illustrate concepts: a B2C retail example, and a B2B example based on DataRobot’s internal churn model.

    The post Tackling Churn with AI – Before Modelling appeared first on DataRobot AI Platform.

    ]]>
    Customer retention is central to any successful business and machine learning is frequently proposed as a way of addressing churn. It is tempting to dive right into a churn dataset, but improving outcomes requires correctly framing the problem. Doing so at the start will determine whether the business can take action based on the trained model and whether your hard work is valuable or not.

    This blog will teach the problem framing and data management steps required before modelling begins. We will use two examples to illustrate concepts: a B2C retail example and a B2B example based on DataRobot’s internal churn model.

    One of the fundamental misconceptions about modelling churn is that a good churn model will reduce churn. Even an excellent model will have no impact on churn by itself; it will just correctly identify at-risk customers. It is the consumers of the model’s predictions who take action to retain customers. In fact, if a churn model perfectly predicts which customers will leave, it means the interventions had no impact on customer retention.

    Sometimes these interventions can be automated, like triggering an email with a discount. Often it is a person who decides whether and how to intervene. This means that as we build a churn model, we need our end users to trust the model and consider its recommendations in their actions. Keeping our end users in mind is a theme that will be present throughout our blog series as we demonstrate how to build a useful churn model.

    Problem Framing

    To frame this problem, we need to identify stakeholders, create three business definitions, and then decide on the consumption plan. Stakeholders will help with every section, which is why we need to identify them first.

    Identify stakeholders

    Because our end users are so critical to the success of the churn modelling project, it is important to identify the right people. To identify them, you can ask: who cares about this? Who is responsible for reducing churn? Who will take action once the model identifies a high-risk customer? Bring these stakeholders in early. They need to trust the results, or else they might ignore them. Their feedback can often provide ideas for feature engineering, improve data quality, and make the model actionable for the business.

    Define churn

    The next step is to define churn, which your stakeholders can help do. This tends to be segmented by the business model. Note that within a company, it is possible to have multiple business models for different revenue streams, so your definition of churn may need to vary by product or service offering!

    For subscription-based business models, this is typically whether a customer renews their subscription. In our B2B example, customers typically have annual subscriptions, so our churn definition uses whether they renew their contract. Revenue could also be a factor in this definition, where any downsell might also be considered churn.

    In retail-like business models, where customers make individual purchases (not limited to retail businesses, this applies to other industries too!), this is typically related to whether a customer makes a purchase in some window of time. In our B2C example, churn is defined as a customer who does not make any purchases in the next 3 months. This could be the next 6 months, or the next 30 days, etc. It also could use a revenue threshold, where a customer purchases at least $50 worth of products or services. Ultimately your stakeholders will have the best idea on what definition will be most valuable to the business, which is why it’s important that they sign off on this definition.

    Population

    We also need to define our population, which will impact who we end up training and making predictions on. Determining the boundaries of this population requires understanding the business goals and listening to stakeholders. There may be a specific group of customers that the business cares about retaining, such as mid-sized and large companies. Maybe there is one product that is particularly important. Or there could be no differentiation, and the business wants to predict on all customers. Ultimately, your stakeholders will make this decision, which is why it’s important to discuss this with them.

    In the B2C example, the focus is on retaining customers who had placed an order in the previous 3 months. This was a commonly used definition for various metrics throughout the business. Using a common definition pays dividends down the churn-modelling road because the model will align with existing processes and analyses.

    In the B2B example, the population of interest started as customers with a managed cloud subscription. We used this restriction because we had more data available for those customers. Over time, this definition was expanded to include all other customers. This approach allowed us to solve for the easier population first and prove value quickly. Then we could address the population which is harder to predict.

    Prediction point

    Finally, we need to define our prediction point. This is the time at which we will make predictions about churn. If you plan to operationalize your model (make predictions on new customers), it is important this aligns to the time at which you make predictions in production. Remember that the end goal is to prevent churn, so these prediction points need to be early enough for your stakeholders to intervene and prevent churn. These prediction points should also be spaced out far enough that there is a meaningful chance for churn risk to change. If your customers typically make one purchase a week, then making a new prediction every day is unlikely to add much value beyond a model which predicts once a week. The simpler your model is, the easier it will be to build and consume!

    In the B2C example, the model is used to make predictions every month. The prediction point is the first day of each month and on that date the model predicts the probability that each customer in the population of interest (those who had placed an order in the preceding three months) will not place an order in the following three months, and therefore will churn.

    In the B2B example, the prediction point is every four weeks, up to 36 weeks prior to the renewal date. This gives one prediction every month for the 9 months before renewal.

    Model consumption strategy

    Before diving into the data, it is important to have line of sight into how the business will use the churn model. This will impact modeling choices, such as how strictly we need to follow the prediction point and whether we can include features which would be difficult to use in production.

    One method of consumption, and a good first objective, is to surface insights in order to reduce churn across the entire customer base. As with most data science projects, it makes sense to begin with exploration. Often a thorough understanding of the problem immediately surfaces potential solutions. A good model will present insights which might, for example, uncover regular churn patterns. Presented to relevant stakeholders, these insights may lead to suggested changes to the product in order to divert customers from those patterns. In this way, model insights can be useful to understanding and addressing churn at an aggregate level. This approach is easier and faster to implement, but likely will have more limited ROI, as it does not provide individual churn predictions for each customer.

    Second is operationalizing the model to make new predictions in production. This gives each customer their own churn risk and allows end users to prioritize interventions for those which are more likely to churn. For this to be actionable, concrete and cost-effective churn prevention actions are needed. Can we add customer support to the account? Can we offer a promotion? This is why talking to stakeholders at the beginning is important. It teaches us what interventions are possible to prevent churn.

    Data Management

    With a firm understanding of the problem, we can begin building our training dataset. The first step is to set our prediction point and sampling strategy.

    Prediction point and sampling

    The most common mistake at this stage is to accidentally train the model on data from after the prediction point. This leads to look-ahead bias. A model trained on data from after the prediction point will have lower accuracy in production than it did in validation, because it no longer has access to data from the future (relative to the prediction point). This is why the first step is to create the relevant prediction points for each customer. For example, the B2B example uses a prediction point of every 4 weeks leading up to the renewal date, up to 36 weeks (9 months) prior to the renewal date. The SQL code below shows an example of how you can create each row in the dataset using this framing.

    In [ ]:
    
    #| code-fold: true
    #| code-summary: "Show code"
    #| output: false
    
    with weeks as (
        select 
            row_number() over (order by seq4()) * 4 as n
        from table(generator(rowcount => 9))
    )
    select
        r.opportunity_id,
        r.renewal_week,
        dateadd('week', -weeks.n, r.renewal_week) as pred_point,
        weeks.n as weeks_to_renewal
    from renewals as r
    cross join weeks
    

    If the dataset is large enough, the training dataset can be reduced to one row per customer. This is the recommended approach, as it will make partitioning easier and ensure each customer is equally weighted in the dataset. In this case, we randomly choose a valid prediction point for each customer. Using the B2B example, we would randomly choose one of the 9 months for each customer.

    In the B2C example, the prediction point is the start of every month. We chose to keep multiple rows per customer, as the dataset was not large enough to develop confident models without them. When using multiple rows per customer, it is important to either use grouped partitioning (grouping on customers such that all rows from one customer are in the same partition) or Out-of-Time Validation. This prevents leakage across the partitions, where a model can learn a specific customer’s behavior.
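    A minimal sketch of both options is shown below. It assumes a pandas DataFrame `base` with one row per customer and prediction point (built, for example, from the SQL above); the GroupCV call mirrors the datarobot Python client's grouped partitioning and is only illustrative here.

    In [ ]:
    
    import datarobot as dr
    
    # Option 1: one row per customer - randomly keep a single valid prediction point each
    one_row_per_customer = base.groupby("customer_id").sample(n=1, random_state=42)
    
    # Option 2: multiple rows per customer - group the partitions on customer_id so that
    # all rows from one customer land in the same partition (no leakage across folds)
    grouped_cv = dr.GroupCV(holdout_pct=20, reps=5, partition_key_cols=["customer_id"])
    # project.analyze_and_model(target="churn", partitioning_method=grouped_cv, ...)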

    Target creation

    Now we can pull in our definition of churn to create the target. Remember to use the definition relative to the prediction point. In the B2C example, the target is whether the customer made any purchases in the next quarter.

    In [ ]:
    
    #| code-fold: true
    #| code-summary: "Show code"
    #| output: false
    
    with customers as (
        select 
            customer_id,
            min(event_date) as first_purchase
        from events
        group by 1
    ),
    customer_months as (
        select
            c.customer_id,
            dc.date_actual as month_start
        from customers as c
        cross join daily_calendar as dc
        where dc.date_actual = dc.first_day_of_month
            and dc.date_actual > c.first_purchase
    ),
    customer_monthly_purchases as (
        select
            c.customer_id,
            c.month_start,
            count(e.id) as monthly_number_of_purchases
        from customer_months as c
        left join events as e on c.customer_id = e.customer_id
            and c.month_start = date_trunc('month', e.event_date)::date
        where c.month_start < current_date - interval '3 months'
        group by 1, 2 
    ),
    base_table as (
        select 
            customer_id, 
            month_start as pred_point,
            sum(monthly_number_of_purchases) over (
                partition by customer_id 
                order by month_start 
                rows between 3 preceding and 1 preceding
            ) as number_of_purchases_last_3_months,
            sum(monthly_number_of_purchases) over (
                partition by customer_id 
                order by month_start 
                rows between current row and 2 following
            ) as number_of_purchases_next_3_months,
        (number_of_purchases_next_3_months = 0)::int as churn
        from customer_monthly_purchases
    )
    select customer_id, pred_point, churn
    from base_table
    where number_of_purchases_last_3_months > 0
    limit 5;

    This code generates the primary dataset with CHURN=1 if the customer did not place an order in the upcoming three months, and 0 if they did. Including PREDICTION_POINT in this primary table is important because the training dataset will often be composed of multiple prediction points. This is useful both to increase the size of the training dataset and to help the model account for seasonality. DataRobot feature engineering will also rely on the PREDICTION_POINT field to avoid look-ahead bias.

    The B2B model was set up to predict whether a customer would sign a renewal on their renewal date. Again, creating prediction points was necessary to avoid look-ahead bias just like in the B2C example. In this case, though, predictions would be made more frequently and always in reference to that renewal date, e.g. 4 weeks from renewal or 32 weeks out. This way the model could be trained on how different features impact churn probability at different times in the customer lifecycle.

    Data sources

    It is not always obvious what data will be predictive of churn, so exploring multiple datasets is worthwhile. Data on product/services consumption are important. Some other datasets to consider are purchase history, customer demographic data, customer surveys, and interactions with customer support. In the B2C example, we used data on customer reviews as well as refunds issued.

    One way to uncover valuable insights is to include data on actions controllable by the business. If a promotion or a specific marketing campaign turns out to be predictive of churn or retention, that is a quick action item to share with stakeholders. Just ensure these actions were taken before the prediction point, rather than in response to a perceived churn risk.

    Listen to your stakeholders about their beliefs on what drives churn or retention and include that data when it is available. This can go a long way towards building their trust in the model. If your model validates their beliefs, it shows evidence that it is learning relevant behavior. On the contrary, if the model refutes one of their beliefs, this can spur a conversation about it. There might be bad data in your dataset, or maybe the feature you created does not accurately represent what they think is a driver. It could also be proof that their belief is wrong, which can foster a deeper understanding of churn risk at the company. These discussions and further data investigation are the key to finding out why.

    At the end of the day, start with whatever data is easily accessible and build models with that. Showing value to the business quickly is more important than exploring every dataset possible.

    Feature engineering

    Merging all of your disparate data into one table may sound daunting. DataRobot’s automated feature engineering can help in a number of ways. DataRobot feature engineering accelerates data preparation for churn modelling by joining data from disparate datasets, automatically generating a wide variety of features across these datasets, and removing features that have little/no relation to churn. Crucially, it also helps avoid the aforementioned look-ahead bias. DataRobot makes use of time-aware feature engineering to ensure we avoid this.

    If you prefer to build the dataset outside of DataRobot, make sure your joins are aware of the prediction point, not just the customer ID. In the B2B example, we made heavy use of window functions to create features over a specific period of time. For example, we can join a usage table once but create multiple feature derivation windows, such as number of projects created in the last 4 weeks, last 12 weeks, etc. The SQL below demonstrates how to do this.

    In [ ]:
    
    #| code-fold: true
    #| code-summary: "Show code"
    #| output: false
    
    with weekly_usage_data as (
           select
                  a.account_id,
                  date_trunc('week', c.date_actual)::date as week_start,
                  sum(u.projects_created) as projects_created
           from accounts as a
           inner join daily_calendar as c on a.customer_since_date <= c.date_actual
                  and current_date >= c.date_actual
           left join usage_data as u on a.account_id = u.account_id
                  and c.date_actual = u.activity_date
           group by 1, 2
    )
    select
           account_id,
           week_start,
           sum(projects_created) over (partition by account_id
                                       order by week_start
                                       rows between 12 preceding and 1 preceding) as projects_created_last_12_weeks,
           sum(projects_created) over (partition by account_id
                                       order by week_start
                                       rows between 4 preceding and 1 preceding) as projects_created_last_4_weeks
    from weekly_usage_data

    Ultimately you can be as creative as you want. Just make sure your features are interpretable to the business. They will have ownership of making decisions from what the model recommends, so it is important that they understand how the model makes its predictions.

    With our problem well-framed and our dataset created, we are in good shape to begin modelling. Look for Part 2 in this 3 part series for a discussion of model training and evaluation.

    Get Started with Free Trial

    Experience new features and capabilities previously only available in our full AI Platform product.

    The post Tackling Churn with AI – Before Modelling appeared first on DataRobot AI Platform.

    ]]>
    GenAI: Automating Product Feedback Reports Using Generative AI and DataRobot https://www.datarobot.com/ai-accelerators/genai-automating-product-feedback-reports-using-generative-ai-and-datarobot/ Fri, 16 Feb 2024 16:10:22 +0000 https://www.datarobot.com/?post_type=aiaccelerator&p=53460 This accelerator shows how to use Predictive AI models in tandem with Generative AI models and overcome the limitation of guardrails around automating summarization/segmentation of sentiment text. In a nutshell, it consumes product reviews and ratings and outputs a Design Improvement Report.

    The post GenAI: Automating Product Feedback Reports Using Generative AI and DataRobot appeared first on DataRobot AI Platform.

    ]]>
    Going through customer review comments to generate insights for product development teams is a time intensive and costly affair. This notebook illustrates how to use DataRobot and generative AI to derive critical insights from customer reviews and automatically create improvement reports to help product teams in their development cycles.

    DataRobot provides robust Natural Language Processing capabilities. Using DataRobot models instead of plain summarization on customer reviews allows you to extract keywords that are strongly correlated with feedback. Using this impactful keyword list, Generative AI can generate user-level context around it in the user’s own lingua franca for the benefit of end users. DataRobot AI Platform acts as a guardrail mechanism which traditional text summarization lacks.

    Setup

    Install required libraries and dependencies

    In [ ]:
    !pip install "langchain==0.0.244" \
                 "openai==0.27.8" \
                 "datasets==2.11.0" \
                 "fpdf==1.7.2"

    Import libraries

    In [ ]:
    
    import json
    import os
    import warnings
    
    import datarobot as dr
    from fpdf import FPDF
    from langchain.chains import LLMChain
    from langchain.chat_models import AzureChatOpenAI
    from langchain.prompts.chat import (
        ChatPromptTemplate,
        HumanMessagePromptTemplate,
        SystemMessagePromptTemplate,
    )
    from langchain.schema import BaseOutputParser
    import numpy as np
    import pandas as pd
    
    warnings.filterwarnings("ignore")

    Configuration

    Set up the configurations required for a secure connection to the generative AI model. This notebook assumes you have an OpenAI API key, but you can modify it to work with any other hosted LLM as the process remains the same.

    In [ ]:
    
    OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]
    OPENAI_ORGANIZATION = os.environ["OPENAI_ORGANIZATION"]
    OPENAI_API_BASE = os.environ["OPENAI_BASE"]
    OPENAI_DEPLOYMENT_NAME = os.environ["OPENAI_DEPLOYMENT_NAME"]
    OPENAI_API_VERSION = os.environ["OPENAI_API_VERSION"]
    OPENAI_API_TYPE = os.environ["OPENAI_API_TYPE"]
    In [ ]:
    
    """with open("/home/notebooks/storage/settings.yaml", 'r') as stream:
        config = yaml.safe_load(stream)
    OPENAI_API_KEY = config['OPENAI_API_KEY']
    OPENAI_ORGANIZATION = config['OPENAI_ORGANIZATION']
    OPENAI_API_BASE = config['OPENAI_BASE']
    OPENAI_DEPLOYMENT_NAME = config['OPENAI_DEPLOYMENT_NAME']
    OPENAI_API_VERSION = config['OPENAI_API_VERSION']
    OPENAI_API_TYPE = config['OPENAI_API_TYPE']"""
    Out [ ]:
    
    'with open("/home/notebooks/storage/settings.yaml", \'r\') as stream:\n    config = yaml.safe_load(stream)\nOPENAI_API_KEY = config[\'OPENAI_API_KEY\']\nOPENAI_ORGANIZATION = config[\'OPENAI_ORGANIZATION\']\nOPENAI_API_BASE = config[\'OPENAI_BASE\']\nOPENAI_DEPLOYMENT_NAME = config[\'OPENAI_DEPLOYMENT_NAME\']\nOPENAI_API_VERSION = config[\'OPENAI_API_VERSION\']\nOPENAI_API_TYPE = config[\'OPENAI_API_TYPE\']'
    

    Functions and utilities

    The cell below outlines the functions to accomplish the following:

    • Extract high impact review keywords from product reviews using DataRobot.
    • During keyword extraction, implement guardrails for selecting models with higher AUC to make sure keywords are robust and correlated to the review sentiment.
    • Generate product development recommendations for the final report.

    LLM Parameters: Read the reference documentation for all Azure OpenAI parameters and how they affect output.

    In [ ]:
    
    class JsonOutputParser(BaseOutputParser):
        """Parse the output of an LLM call to a Json list."""
    
        def parse(self, text: str):
            """Parse the output of an LLM call."""
            return json.loads(text)
    
    
    def get_review_keywords(product_id):
        """Parse the Word Cloud from DataRobot AutoML model and generate the text input for the LLM."""
    
        keywords = ""
        product = product_subset[product_subset.product_id == product_id]
        product["review_text_full"] = (
            product["review_headline"] + " " + product["review_body"]
        )
        product["review_class"] = np.where(product.star_rating < 3, "bad", "good")
        project = dr.Project.create(
            product[["review_class", "review_text_full"]],
            project_name=product["product_title"].iloc[0],
        )
    
       """Creates a DataRobot AutoML NLP project with review text"""
        project.analyze_and_model(
            target="review_class",
            mode=dr.enums.AUTOPILOT_MODE.QUICK,
            worker_count=20,
            positive_class="good",
        )
        project.wait_for_autopilot()
        model = project.recommended_model()
        """logic to accept word ngram models and not char ngram models."""
        if max([1 if proc.find("word") != -1 else 0 for proc in model.processes]) == 0:
            models = project.get_models(order_by="-metric")
            for m in models:
                if max([1 if proc.find("word") != -1 else 0 for proc in m.processes]) == 1:
                    model = m
                    break
        word_cloud = model.get_word_cloud()
        word_cloud = pd.DataFrame(word_cloud.ngrams_per_class()[None])
        word_cloud.sort_values(
            ["coefficient", "frequency"], ascending=[True, False], inplace=True
        )
        # keywords = '; '.join(word_cloud.head(50)['ngram'].tolist())
    
        """Guardrail to accept higher accuracy models, as it means the wordclouds contain \
        impactful and significant terms only """
        if model.metrics["AUC"]["crossValidation"] > 0.75:
            keywords = "; ".join(word_cloud[word_cloud.coefficient < 0]["ngram"].tolist())
        return keywords
    
    template = f"""
        You are a product designer. A user will pass top keywords from negative customer reviews. \
        Using the keywords list, \
        provide multiple design recommendations based on the keywords to improve the sales of the product.
        Use only top 10 keywords per design recommendation.\
        
        Output Format should be json with fields recommendation_title, recommendation_description, keyword_tags"""
    
    system_message_prompt = SystemMessagePromptTemplate.from_template(template)
    human_template = "{text}"
    human_message_prompt = HumanMessagePromptTemplate.from_template(human_template)
    
    chat_prompt = ChatPromptTemplate.from_messages(
        [system_message_prompt, human_message_prompt]
    )
    chain = LLMChain(
        llm=AzureChatOpenAI(
            deployment_name=OPENAI_DEPLOYMENT_NAME,
            openai_api_type=OPENAI_API_TYPE,
            openai_api_base=OPENAI_API_BASE,
            openai_api_version=OPENAI_API_VERSION,
            openai_api_key=OPENAI_API_KEY,
            openai_organization=OPENAI_ORGANIZATION,
            model_name=OPENAI_DEPLOYMENT_NAME,
            temperature=0,
            verbose=True,
        ),
        prompt=chat_prompt,
        output_parser=JsonOutputParser(),
    )

    Import data

    This accelerator uses the publicly available Amazon Reviews dataset in this workflow. This example uses a subset of products from the Home Electronics line. The full public dataset can be found here.

    You can also load the individual parquet files:

    In [ ]:
    
    product_subset1 = pd.read_parquet(
        "https://s3.amazonaws.com/datarobot_public_datasets/ai_accelerators/amazon_us_reviews-train-00000-of-00002.parquet"
    )
    product_subset2 = pd.read_parquet(
        "https://s3.amazonaws.com/datarobot_public_datasets/ai_accelerators/amazon_us_reviews-train-00001-of-00002.parquet"
    )
    product_subset = pd.concat([product_subset1, product_subset2], axis=0)
    product_subset.info()
    Out [ ]:
    
    <class 'pandas.core.frame.DataFrame'>
    Int64Index: 705889 entries, 0 to 31888
    Data columns (total 15 columns):
     #   Column             Non-Null Count   Dtype 
    ---  ------             --------------   ----- 
     0   marketplace        705889 non-null  object
     1   customer_id        705889 non-null  object
     2   review_id          705889 non-null  object
     3   product_id         705889 non-null  object
     4   product_parent     705889 non-null  object
     5   product_title      705889 non-null  object
     6   product_category   705889 non-null  object
     7   star_rating        705889 non-null  int32 
     8   helpful_votes      705889 non-null  int32 
     9   total_votes        705889 non-null  int32 
     10  vine               705889 non-null  int64 
     11  verified_purchase  705889 non-null  int64 
     12  review_headline    705889 non-null  object
     13  review_body        705889 non-null  object
     14  review_date        705889 non-null  object
    dtypes: int32(3), int64(2), object(10)
    memory usage: 78.1+ MB
    In [ ]:
    
    product_subset.head()
    
    

    Out [ ]:


    01234
    marketplaceUSUSUSUSUS
    customer_id1798863729376983321214705496223413911
    review_idRY01SAV7HZ8QOR1XX8SDGJ4MZ4LR149Q3B5L33NN5R2ZVD69Z6KPJ4OR1DIKG2G33ZLNP
    product_idB00NTI0CQ2B00BUCLVZUB00RBX9D5WB00UJ3IULOB0037UCTXG
    product_parent667358431621695622143071132567816707909557698
    product_titleAketek 1080P LED Protable Projector HD PC AV V…TiVo Mini with IR Remote (Old Version)Apple TV MD199LL/A Bundle including remote and…New Roku 3 6.5 Foot HDMI – Bundle – v1Generic DVI-I Dual-Link (M) to 15-Pin VGA (F) …
    product_categoryHome EntertainmentHome EntertainmentHome EntertainmentHome EntertainmentHome Entertainment
    star_rating45514
    helpful_votes00000
    total_votes00020
    vine00000
    verified_purchase10111
    review_headlinegood enough for my purposeTell the Cable Company to Keep their Boxes!Works perfectly!It doesn’t work. Each time I try to use …As pictured
    review_bodynot the best picture quality but surely suitab…Not only do my TiVo Minis replace the boxes th…Works perfectly! Very user friendly!It doesn’t work. Each time I try to use it, th…I received the item pictured. I am unsure why…
    review_date2015-08-312015-08-312015-08-312015-08-312015-08-31

    Report generation loop

    This programmatic loop runs through the product list and generates the final report.

    In [ ]:
    from datetime import datetime
    
    In [ ]:
    product_list = ["B000204SWE", "B00EUY59Z8", "B006U1YUZE", "B00752R4PK", "B004OF9XGO"]
    
    pdf = FPDF()
    for product_id in product_list:
        print(
            "product id:",
            product_id,
            "started:",
            datetime.now().strftime("%m-%d-%y %H:%M:%S"),
        )
        keywords = get_review_keywords(product_id)
        """ Guardrail to generate report only if there are enough \
            Keywords to provide results"""
        if len(keywords) > 10:
            # report = chain.run(keywords)['recommendations']
            report = chain.run(keywords)
            if type(report) != list:
                report = chain.run(keywords)["recommendations"]
            product_name = product_subset[product_subset.product_id == product_id][
                "product_title"
            ].iloc[0]
            print("Adding to report")
            pdf.add_page()
            pdf.set_font("Arial", "B", 20)
            pdf.multi_cell(w=0, h=10, txt=product_name)
            for reco in report:
                pdf.cell(w=0, h=7, txt="\n", ln=1)
                pdf.set_font("Arial", "B", 14)
                pdf.multi_cell(w=0, h=7, txt=reco["recommendation_title"])
                pdf.set_font("Arial", "", 14)
                pdf.multi_cell(w=0, h=7, txt=reco["recommendation_description"])
                pdf.set_font("Arial", "I", 11)
                pdf.multi_cell(
                    w=0, h=5, txt="Review Keywords: " + ", ".join(reco["keyword_tags"])
                )
        print(
            "product id:",
            product_id,
            "completed:",
            datetime.now().strftime("%m-%d-%y %H:%M:%S"),
        )
    pdf.output(f"/home/notebooks/storage/product_development_insights.pdf", "F")

    Download the report

    Download the pdf named “product_development_insights.pdf” at “/home/notebooks/storage/” or from the notebook files tab in the UI Panel.

    In [ ]:
    
    from IPython import display
    
    display.Image(
        "https://s3.amazonaws.com/datarobot_public_datasets/ai_accelerators/images/image_report.jpg",
        width=800,
        height=400,
    )

    Out [ ]:

    Output

    Conclusion

    This accelerator demonstrates how you can use DataRobot and generative AI to identify key patterns in customer reviews and create reports or work items that can be used by product development teams to improve their products and offerings. Using various prompts you can steer the LLM into much more complex outputs like Agile stories, development plans, and more.

    Get Started with Free Trial

    Experience new features and capabilities previously only available in our full AI Platform product.

    The post GenAI: Automating Product Feedback Reports Using Generative AI and DataRobot appeared first on DataRobot AI Platform.

    ]]>
    6 Reasons Why Generative AI Initiatives Fail and How to Overcome Them https://www.datarobot.com/blog/6-reasons-why-generative-ai-initiatives-fail-and-how-to-overcome-them/ Thu, 08 Feb 2024 14:17:53 +0000 https://www.datarobot.com/?post_type=blog&p=53330 There are six common roadblocks to proving business value with generative AI — and we’ll show you how to steer clear of each one.

    The post 6 Reasons Why Generative AI Initiatives Fail and How to Overcome Them appeared first on DataRobot AI Platform.

    ]]>
    If you’re an AI leader, you might feel like you’re stuck between a rock and a hard place lately. 

    You have to deliver value from generative AI (GenAI) to keep the board happy and stay ahead of the competition. But you also have to stay on top of the growing chaos, as new tools and ecosystems arrive on the market. 

    You also have to juggle new GenAI projects, use cases, and enthusiastic users across the organization. Oh, and data security. Your leadership doesn’t want to be the next cautionary tale of good AI gone bad. 

    If you’re being asked to prove ROI for GenAI but it feels more like you’re playing Whack-a-Mole, you’re not alone. 

    According to Deloitte, proving AI’s business value is the top challenge for AI leaders. Companies across the globe are struggling to move past prototyping to production. So, here’s how to get it done — and what you need to watch out for.  

    6 Roadblocks (and Solutions) to Realizing Business Value from GenAI

    Roadblock #1. You Set Yourself Up For Vendor Lock-In 

    GenAI is moving crazy fast. New innovations — LLMs, vector databases, embedding models — are being created daily. So getting locked into a specific vendor right now doesn’t just risk your ROI a year from now. It could literally hold you back next week.  

    Let’s say you’re all in on one LLM provider right now. What if costs rise and you want to switch to a new provider or use different LLMs depending on your specific use cases? If you’re locked in, getting out could eat any cost savings that you’ve generated with your AI initiatives — and then some. 

    Solution: Choose a Versatile, Flexible Platform 

    Prevention is the best cure. To maximize your freedom and adaptability, choose solutions that make it easy for you to move your entire AI lifecycle, pipeline, data, vector databases, embedding models, and more – from one provider to another. 

    For instance, DataRobot gives you full control over your AI strategy — now, and in the future. Our open AI platform lets you maintain total flexibility, so you can use any LLM, vector database, or embedding model – and swap out underlying components as your needs change or the market evolves, without breaking production. We even give our customers access to experiment with common LLMs, too.

    Roadblock #2. Off-the-Grid Generative AI Creates Chaos 

    If you thought predictive AI was challenging to control, try GenAI on for size. Your data science team likely acts as a gatekeeper for predictive AI, but anyone can dabble with GenAI — and they will. Where your company might have 15 to 50 predictive models, at scale, you could well have 200+ generative AI models all over the organization at any given time. 

    Worse, you might not even know about some of them. “Off-the-grid” GenAI projects tend to escape leadership purview and expose your organization to significant risk. 

    While this enthusiastic use of AI can be a recipe for greater business value, in fact, the opposite is often true. Without a unifying strategy, GenAI can create soaring costs without delivering meaningful results. 

    Solution: Manage All of Your AI Assets in a Unified Platform

    Fight back against this AI sprawl by getting all your AI artifacts housed in a single, easy-to-manage platform, regardless of who made them or where they were built. Create a single source of truth and system of record for your AI assets — the way you do, for instance, for your customer data. 

    Once you have your AI assets in the same place, then you’ll need to apply an LLMOps mentality: 

    • Create standardized governance and security policies that will apply to every GenAI model. 
    • Establish a process for monitoring key metrics about models and intervening when necessary.
    • Build feedback loops to harness user feedback and continuously improve your GenAI applications. 

    DataRobot does this all for you. With our AI Registry, you can organize, deploy, and manage all of your AI assets in the same location – generative and predictive, regardless of where they were built. Think of it as a single source of record for your entire AI landscape – what Salesforce did for your customer interactions, but for AI. 

    Roadblock #3. GenAI and Predictive AI Initiatives Aren’t Under the Same Roof

    If you’re not integrating your generative and predictive AI models, you’re missing out. The power of these two technologies put together is a massive value driver, and businesses that successfully unite them will be able to realize and prove ROI more efficiently.

    Here are just a few examples of what you could be doing if you combined your AI artifacts in a single unified system:  

    • Create a GenAI-based chatbot in Slack so that anyone in the organization can query predictive analytics models with natural language (Think, “Can you tell me how likely this customer is to churn?”). By combining the two types of AI technology, you surface your predictive analytics, bring them into the daily workflow, and make them far more valuable and accessible to the business.
    • Use predictive models to control the way users interact with generative AI applications and reduce risk exposure. For instance, a predictive model could stop your GenAI tool from responding if a user gives it a prompt that has a high probability of returning an error or it could catch if someone’s using the application in a way it wasn’t intended.  
    • Set up a predictive AI model to inform your GenAI responses, and create powerful predictive apps that anyone can use. For example, your non-tech employees could ask natural language queries about sales forecasts for next year’s housing prices, and have a predictive analytics model feeding in accurate data.   
    • Trigger GenAI actions from predictive model results. For instance, if your predictive model flags a customer as likely to churn, it could trigger your GenAI tool to draft an email to that customer, or a call script for your sales rep to follow during their next outreach to save the account (see the sketch after this list). 
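
    As a rough illustration of that last pattern, the sketch below shows a predictive churn score triggering a GenAI-drafted retention email. Everything in it is a hypothetical stand-in: churn_model and llm represent a deployed predictive model and a GenAI client, and the 0.8 threshold is an arbitrary assumption.

    from typing import Optional


    def handle_customer(customer: dict, churn_model, llm) -> Optional[str]:
        """Trigger a GenAI action from a predictive model's output (illustrative only)."""
        churn_probability = churn_model.predict(customer)  # hypothetical predictive deployment
        if churn_probability > 0.8:  # arbitrary threshold for this sketch
            prompt = (
                f"Draft a short, friendly retention email for {customer['name']}, "
                f"who has been a customer for {customer['tenure_months']} months."
            )
            return llm.complete(prompt)  # hypothetical GenAI client call
        return None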

    However, for many companies, this level of business value from AI is impossible because they have predictive and generative AI models siloed in different platforms. 

    Solution: Combine your GenAI and Predictive Models 

    With a system like DataRobot, you can bring all your GenAI and predictive AI models into one central location, so you can create unique AI applications that combine both technologies. 

    Not only that, but from inside the platform, you can set and track your business-critical metrics and monitor the ROI of each deployment to ensure their value, even for models running outside of the DataRobot AI Platform.

    Roadblock #4. You Unknowingly Compromise on Governance

    For many businesses, the primary purpose of GenAI is to save time — whether that’s reducing the hours spent on customer queries with a chatbot or creating automated summaries of team meetings. 

    However, this emphasis on speed often leads to corner-cutting on governance and monitoring. That doesn't just set you up for reputational risk or future costs (when your brand takes a major hit as the result of a data leak, for instance). It also means you can't measure the cost of your AI models, or optimize the value you're getting from them, right now. 

    Solution: Adopt a Solution to Protect Your Data and Uphold a Robust Governance Framework

    To solve this issue, you’ll need to implement a proven AI governance tool ASAP to monitor and control your generative and predictive AI assets. 

    A solid AI governance solution and framework should include:

    • Clear roles, so every team member involved in AI production knows who is responsible for what
    • Access control, to limit data access and permissions for changes to models in production at the individual or role level and protect your company’s data
    • Change and audit logs, to ensure legal and regulatory compliance and avoid fines 
    • Model documentation, so you can show that your models work and are fit for purpose
    • A model inventory to govern, manage, and monitor your AI assets, irrespective of deployment or origin

    Current best practice: Find an AI governance solution that can prevent data and information leaks by extending LLMs with company data.

    The DataRobot platform has these safeguards built in, and the vector database builder lets you create specific vector databases for different use cases to better control employee access and keep responses highly relevant to each use case, all without leaking confidential information.
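
    The underlying pattern is simple enough to sketch generically. The toy example below (not the DataRobot vector database builder, and with purely illustrative names and data) scopes retrieval by use case and role so that, say, a sales assistant can never pull HR documents into a prompt.

    # Hypothetical in-memory stand-ins for per-use-case document stores.
    DOCUMENT_STORES = {
        "sales_assistant": ["Q3 pricing sheet", "Competitor battlecard"],
        "hr_helpdesk": ["Leave policy", "Benefits overview"],
    }

    # Which user roles may query which use case.
    ACCESS_POLICY = {
        "sales_assistant": {"sales", "admin"},
        "hr_helpdesk": {"hr", "admin"},
    }


    def retrieve_context(use_case: str, user_role: str, query: str) -> list:
        """Return candidate context documents, but only for use cases the role may access."""
        if user_role not in ACCESS_POLICY.get(use_case, set()):
            raise PermissionError(f"Role '{user_role}' may not query '{use_case}'.")
        # A real system would run a vector similarity search on `query` here;
        # this sketch simply returns the whole scoped store.
        return DOCUMENT_STORES[use_case]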

    Roadblock #5. It’s Tough To Maintain AI Models Over Time

    Lack of maintenance is one of the biggest impediments to seeing business results from GenAI, according to the same Deloitte report mentioned earlier. Without excellent upkeep, there’s no way to be confident that your models are performing as intended or delivering accurate responses that’ll help users make sound data-backed business decisions.

    In short, building cool generative applications is a great starting point — but if you don’t have a centralized workflow for tracking metrics or continuously improving based on usage data or vector database quality, you’ll do one of two things:

    1. Spend a ton of time managing that infrastructure.
    2. Let your GenAI models decay over time. 

    Neither of those options is sustainable (or secure) long-term. Failing to guard against malicious activity or misuse of GenAI solutions will limit the future value of your AI investments almost instantaneously.

    Solution: Make It Easy To Monitor Your AI Models

    To be valuable, GenAI needs guardrails and steady monitoring. You need tools in place that let you track: 

    • Employee and customer-generated prompts and queries over time to ensure your vector database is complete and up to date
    • Whether your current LLM is (still) the best solution for your AI applications 
    • Your GenAI costs to make sure you’re still seeing a positive ROI
    • When your models need retraining to stay relevant

    DataRobot can give you that level of control. It brings all your generative and predictive AI applications and models into the same secure registry, and lets you:  

    • Set up custom performance metrics relevant to specific use cases
    • Understand standard metrics like service health, data drift, and accuracy statistics
    • Schedule monitoring jobs
    • Set custom rules, notifications, and retraining settings

    If you make it easy for your team to maintain your AI, you won't start neglecting maintenance over time. 

    Roadblock #6. The Costs are Too High – or Too Hard to Track 

    Generative AI can come with serious sticker shock. Naturally, business leaders are reluctant to roll it out at the scale needed to see meaningful results, or to spend heavily without a clear path to recouping that spend as business value. 

    Keeping GenAI costs under control is a huge challenge, especially if you don’t have real oversight over who is using your AI applications and why they’re using them. 

    Solution: Track Your GenAI Costs and Optimize for ROI

    You need technology that lets you monitor costs and usage for each AI deployment. With DataRobot, you can track everything from the cost of an error to toxicity scores for your LLMs to your overall LLM costs. You can choose between LLMs depending on your application and optimize for cost-effectiveness. 
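
    As a back-of-the-envelope illustration of what cost tracking means in practice, the sketch below estimates per-request LLM spend from token counts. The model names and per-token prices are placeholder assumptions, not real rates, and this is generic Python rather than a DataRobot API.

    # Placeholder per-1K-token prices; real rates vary by provider and model.
    PRICE_PER_1K_TOKENS = {
        "model-a": {"input": 0.0005, "output": 0.0015},
        "model-b": {"input": 0.0100, "output": 0.0300},
    }


    def estimate_request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
        """Rough dollar cost for a single completion."""
        rates = PRICE_PER_1K_TOKENS[model]
        return (input_tokens / 1000) * rates["input"] + (output_tokens / 1000) * rates["output"]


    # Logging this per deployment over time is what lets you compare models on cost-effectiveness.
    print(estimate_request_cost("model-a", input_tokens=800, output_tokens=250))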

    That way, you’re never left wondering if you’re wasting money with GenAI — you can prove exactly what you’re using AI for and the business value you’re getting from each application. 

    Deliver Measurable AI Value with DataRobot 

    Proving business value from GenAI is not an impossible task with the right technology in place. A recent economic analysis by the Enterprise Strategy Group found that DataRobot can provide cost savings of 75% to 80% compared to using existing resources, giving you a 3.5x to 4.6x expected return on investment and accelerating time to initial value from AI by up to 83%. 

    DataRobot can help you maximize the ROI from your GenAI assets and: 

    • Mitigate the risk of GenAI data leaks and security breaches 
    • Keep costs under control
    • Bring every single AI project across the organization into the same place
    • Empower you to stay flexible and avoid vendor lock-in 
    • Make it easy to manage and maintain your AI models, regardless of origin or deployment 

    If you’re ready for GenAI that’s all value, not all talk, start your free trial today. 

    Webinar
    Reasons Why Generative AI Initiatives Fail to Deliver Business Value

    (and How to Avoid Them)

    Watch on-demand

    The post 6 Reasons Why Generative AI Initiatives Fail and How to Overcome Them appeared first on DataRobot AI Platform.

    GenAI: Observability Starter for HuggingFace Models https://www.datarobot.com/ai-accelerators/observability-starter-for-huggingface-models/ Fri, 26 Jan 2024 12:57:14 +0000 https://www.datarobot.com/?post_type=aiaccelerator&p=53017 This accelerator shows how users can quickly and seamlessly enable LLMOps or Observability in their HuggingFace-based generative AI solutions without the need of code refactoring.

    The post GenAI: Observability Starter for HuggingFace Models appeared first on DataRobot AI Platform.

    This accelerator shows how you can easily enable observability in your HuggingFace-based AI solutions with DataRobot's LLMOps tools. It outlines an example of a bite-sized solution in its current state and then uses DataRobot tools to enable observability for it almost instantly.

    DataRobot provides tools to enable the observability of external generative models. All the hallmarks of DataRobot MLOps are now available for LLMOps.

    Setup

    Install the prerequisite libraries

    This notebook uses one of the many publicly available LLMs on the HuggingFace Hub. HuggingFace models are available via the popular Transformers library, which provides high-level APIs for most common language modeling tasks.

    In[1]:
    
    !pip install transformers torch py-readability-metrics nltk
    
    In[2]:
    
    !pip install datarobotx[llm] datarobot-mlops datarobot-mlops-connected-client

    Current state

    The following cells outline the current state of a simple Google T5 text generation implementation. Google's T5 is a text-to-text sequence-to-sequence model, and this accelerator uses a distilled version of T5 available on the HuggingFace Hub.

    This accelerator uses the pipeline object from the Transformers library to build a text generation example. The pipeline object simplifies model inference by abstracting away most of the low-level code. To enable observability on this implementation on your own, you would have to write code to take measurements, stand up infrastructure to record those measurements, and codify rules for interventions, all of which adds considerable technical debt to the organization.
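
    For a sense of what that do-it-yourself route looks like, the fragment below sketches the kind of hand-rolled instrumentation you would otherwise have to write and maintain. It is illustrative only and is not part of this accelerator.

    import json
    import logging
    import time

    logger = logging.getLogger("diy_llm_observability")


    def get_completion_with_diy_monitoring(generate, user_input, parameters):
        """Illustrative hand-rolled wrapper: every measurement, store, and rule is yours to build."""
        start = time.time()
        answer = generate(user_input, **parameters)
        latency = time.time() - start
        record = {
            "prompt": user_input,
            "response": answer[0]["generated_text"],
            "latency_s": latency,
        }
        logger.info(json.dumps(record))  # you also have to stand up storage and dashboards for these logs
        if latency > 10:  # and codify every intervention rule by hand
            logger.warning("Slow generation; consider a smaller checkpoint.")
        return answer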

    In[ ]:
    
    from transformers import pipeline
    
    # A distilled, instruction-tuned T5 checkpoint from the HuggingFace Hub
    checkpoint = "MBZUAI/LaMini-T5-223M"
    model = pipeline("text2text-generation", model=checkpoint)
    
    # Generation settings passed through to the pipeline on every call
    parameters = {"max_length": 512, "do_sample": True, "temperature": 0.1}
    
    
    def get_completion(user_input, parameters):
        # Run the text2text pipeline and return the raw pipeline output
        answer = model(user_input, **parameters)
        return answer
    
    
    response = get_completion("What is Agile in software development?", parameters)
    print(response[0]["generated_text"])
    Out[ ]:
    
    (Download progress bars: model configuration, weights, and tokenizer files are fetched from the HuggingFace Hub on first use.)
    
    Agile is a software development methodology that emphasizes the use of software development as a continuous process, with the goal of delivering high-quality, efficient, and cost-effective software products to customers. It involves breaking down tasks into smaller, more manageable tasks, and delivering them in short, incremental increments, with the goal of delivering high-quality, reliable, and user-friendly software.

    Observability with DataRobot

    To enable observability on the above T5 model from Huggingface, you first need to create a deployment in DataRobot. This can be done from the GUI or the API based on your preference.

    Connect to DataRobot

    In[ ]:
    
    # Initialize the DataRobot client if you are running this code outside the DataRobot platform.
    # import datarobot as dr
    # dr.Client(endpoint=ENDPOINT,token=TOKEN)
    
    from utilities import create_external_llm_deployment
    
    deployment_id, model_id = create_external_llm_deployment(checkpoint + " External")
    deployment_id
    Out[ ]:
    
    (Download progress bars: the NLTK punkt tokenizer and an additional model's configuration, weights, and tokenizer files are fetched on first use.)
    
    '651298b51e720eee4bfdda27'

    Initiate monitoring configuration

    The cells below declare and initialize the monitoring configuration, which tells DataRobot how to interpret the inputs and outputs of the external model. The pipeline object expects text input and named parameters, which are configured in the MonitoringConfig object as seen below.

    The inputs_parser callable lets you capture and store either the entire input or just the parts you prefer.

    In[ ]:
    
    from datarobotx.llm.chains.guard import aguard, MonitoringConfig
    
    monitor_config = MonitoringConfig(
        deployment_id=deployment_id,
        model_id=model_id,
        inputs_parser=lambda prompt, parameters: {**{"prompt": prompt}, **parameters},
        output_parser=lambda x: {"answer": x[0]["generated_text"]},
        target="answer",
    )
    In[ ]:
    
    @aguard(monitor_config)
    async def get_completion(user_input, parameters):
        answer = model(user_input, **parameters)
        return answer
    
    
    response = await get_completion("What is Agile in software development?", parameters)
    print(response[0]["generated_text"])
    Out[ ]:
    
    Agile is a software development methodology that emphasizes the use of software development as a continuous process, with the goal of delivering high-quality, efficient, and cost-effective software products to customers. It involves breaking down tasks into smaller, more manageable tasks, and delivering them in short, incremental increments, rather than a fixed-size, standardized schedule.
    
    In[ ]:
    
    response = await get_completion("What is a kulbit maneuver?", parameters)
    print(response[0]["generated_text"])
    Out[ ]:
    
    A kulbit maneuver is a type of maneuver where a person lands on a surface with a curved or curved surface, and then lands on a surface with a curved or curved surface.
    

    Custom metrics

    Observability with DataRobot also supports custom user metrics. The following cells show how you can start capturing the toxicity of user prompts and the readability of generative model responses. In the cell below, add the custom metrics that you want to record to your deployment. Again, this step can be done using the GUI or the API, based on your preference.

    • Toxicity in the user prompt
    • Readability (Flesch Score) of the model response
    In[ ]:
    
    from utilities import create_custom_metric
    
    TOX_CUSTOM_METRIC_ID = create_custom_metric(
        deployment_id=deployment_id,
        name="Prompt Toxicity",
        baseline="0.1",
        type="average",
        directionality="lowerIsBetter",
    )
    
    READ_CUSTOM_METRIC_ID = create_custom_metric(
        deployment_id=deployment_id,
        name="Response Readability",
        baseline="30",
        type="average",
        directionality="higherIsBetter",
    )

    Update the Huggingface completion endpoint

    Modify the prediction function to add code that calculates the metrics and submits them to the deployment. Now, whenever a prediction is requested from the distilled T5 model, the metrics are calculated and submitted to the deployment, enabling you to monitor and intervene as necessary.
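
    The helper functions imported from utilities below are not shown in this accelerator. As a rough idea of how such helpers could be written with the libraries installed earlier (py-readability-metrics and Transformers), they might look something like the following sketch; the toxicity checkpoint and the exact scoring logic are assumptions, not the accelerator's actual implementation.

    # Sketch only: plausible implementations of the helpers, not the accelerator's utilities module.
    from readability import Readability  # from py-readability-metrics, installed earlier
    from transformers import pipeline

    # Hypothetical checkpoint choice; any text-classification toxicity model could be used here.
    _toxicity_classifier = pipeline("text-classification", model="unitary/toxic-bert")


    def get_flesch_score(text):
        # Flesch reading-ease score; py-readability-metrics expects roughly 100+ words of text.
        return Readability(text).flesch().score


    def get_text_texicity(text):
        # Crude toxicity proxy: summed score of toxicity-related labels (name mirrors the import below).
        all_scores = _toxicity_classifier(text, top_k=None)
        return float(sum(s["score"] for s in all_scores if "toxic" in s["label"].lower()))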

    In[ ]:
    
    from utilities import get_flesch_score, get_text_texicity, submit_custom_metric
    
    
    @aguard(monitor_config)
    async def get_completion(user_input, parameters):
        answer = model(user_input, **parameters)
        try:
            submit_custom_metric(
                deployment_id,
                READ_CUSTOM_METRIC_ID,
                get_flesch_score(answer[0]["generated_text"]),
            )
            submit_custom_metric(
                deployment_id, TOX_CUSTOM_METRIC_ID, get_text_texicity(user_input)
            )
        except Exception as e:
            print(e)
            pass
        return answer
    
    
    response = await get_completion(
        "What is Agile methodology in software development in detail?", parameters
    )
    print(response[0]["generated_text"])
    Out[ ]:
    
    Agile methodology is a software development approach that emphasizes the use of software development as a continuous process, with the goal of delivering software development in a timely and efficient manner. It involves breaking down the project into smaller, manageable tasks, focusing on the most critical aspects of the project, and allowing for flexibility and adaptability to change. Agile is often used in software development to ensure that the software is delivered on time, within budget, and with the right tools and techniques. It also emphasizes the importance of continuous learning and feedback, and the need for continuous improvement and continuous improvement.
    

    Conclusion

    Using the DataRobot tools for LLMOps, you can implement observability for HuggingFace-based applications with minimal friction and without adding technical debt.

    Get Started with Free Trial

    Experience new features and capabilities previously only available in our full AI Platform product.

    The post GenAI: Observability Starter for HuggingFace Models appeared first on DataRobot AI Platform.
