Machine Learning Use Cases | DataRobot AI Platform
https://www.datarobot.com/use-cases/

Classify Content Topics into Appropriate Categories
https://www.datarobot.com/use-cases/classify-content-topics-into-appropriate-categories/

Improve your user experience by classifying your content under its appropriate topics.

Overview

Business Problem

For media companies, poor content management has a detrimental impact on user experience. Users expect a website or app to lead them intuitively to the content they are searching for, and misclassified content prevents exactly that. In addition, it has become increasingly difficult for the human eye to spot key trends across the tremendous amount of content being generated every day. Both of these challenges currently require large teams of human annotators or analysts, but such solutions are unsustainable in the long run as content volumes keep growing.

Intelligent Solution

AI helps you improve your user experience by classifying your content under its appropriate topics. AI can bucket content into as many as 99 distinct categories, which can be based on predefined categories or on topics your publishers have identified as key segments. Unlike existing solutions, AI is both fast and accurate: it classifies content in a fraction of the time, and one US firm reported 95% accuracy. Text mining also allows you to discover the top trends and topics that exist not only in your content, but also in the conversations you have with users both in-app and on social media. This gives you a granular understanding of the content that customers like and dislike, and offers insight into ideas you should add to your content pipeline based on what's trending.

Score Incoming Job Applicants
https://www.datarobot.com/use-cases/score-incoming-job-applicants/

Identify the most-qualified candidates from a broader pool of job applicants.

Overview

Business Problem

An organization's personnel are key to its success, but the right people can be hard to find. SHRM (the Society for Human Resource Management) estimated in 2017 that, between applicants, referrals, and agencies, there are 100 applicants for every hire.

Recruiters dealing with high volumes of applicants are forced to process them extremely quickly (the notorious "six seconds per resume" rule) rather than spending time going deeper with the best candidates and crafting a compelling value proposition for each of them. Automated screening can also give applicants results more quickly and dramatically speed up the hiring process.

Intelligent Solution

With AI, organizations can identify candidates who have the right background and credentials to be successful in the role. The explainable insights from AI models (e.g., the relative importance of education vs job experience for new entry-level hires) can provide valuable guidance for recruiters and hiring managers. Prediction Explanations (e.g., individual callouts highlighting what makes someone a particularly strong candidate) could also be used to inform a new hire’s onboarding process to proactively address any relative weaknesses.

Finally and most importantly, models are explainable, consistent, and can be documented to ensure compliance with regulatory guidelines and ensure fairness to applicants.

IMPORTANT: Many countries have laws in place to protect employees from discrimination in hiring and employment decisions. Beyond that, fairness is simply the right thing to do. It is incredibly important that you work closely and proactively with your organization's HR, Legal, and Compliance teams to ensure that the models you build will pass legal and ethical scrutiny before they are put into production. (View the business implementation tab to learn more about this use case and Trusted AI.)

Value Estimation

How would I measure ROI for my use case? 

Most HR departments track average cost per hire. This metric combines both internal and external factors (recruiter time, agency fees, recruiting bonuses, employee travel, etc.) and represents the total amount of money and time spent to bring a new hire into the organization. 

SHRM’s 2016 Human Capital Report benchmarked average cost per hire at $4,179. (This will vary by industry and job role, e.g., entry level roles will be lower.) If a machine learning algorithm can reduce the total candidate pool by 30% at the beginning of the hiring process, that can save recruiters time and dramatically reduce cost per hire. A 10% reduction in total cost per hire by reducing the demands on recruiters’ time would equate to over $400 saved per hire brought into the organization.
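To make the arithmetic above concrete, here is a minimal back-of-the-envelope sketch in Python. The cost-per-hire benchmark is the SHRM figure cited above; the annual hiring volume is a hypothetical input you would replace with your own numbers.

```python
# Back-of-the-envelope ROI estimate for applicant scoring (illustrative only).
AVG_COST_PER_HIRE = 4179      # SHRM 2016 benchmark cited above
COST_REDUCTION = 0.10         # assumed 10% reduction in cost per hire
ANNUAL_HIRES = 500            # hypothetical: replace with your organization's hiring volume

savings_per_hire = AVG_COST_PER_HIRE * COST_REDUCTION
annual_savings = savings_per_hire * ANNUAL_HIRES

print(f"Savings per hire: ${savings_per_hire:,.2f}")        # ~$417.90
print(f"Estimated annual savings: ${annual_savings:,.2f}")
```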

Technical Implementation

Problem Framing

A typical target variable for this use case is whether an applicant will pass a recruiter screen, which makes this a binary classification problem. The recruiter screen is usually a preliminary review done by the recruiter before passing an applicant to the hiring manager for consideration.

However, defining a target can become complex and will need to be adapted to your process as it depends on the data your organization may or may not have. While many organizations ultimately want to predict a hire decision or even on-the-job performance, there may be data limitations based on how many people were actually hired into that role.

The target, i.e., the “end result” you are trying to predict, will define what features are included in the model; if the goal is to predict which new applicants will get passed by a recruiter to a hiring manager, then we cannot use hiring manager feedback in a model because, in practice, that feedback won’t be available yet. If the target is instead a hiring decision, then the model will do best when hiring manager feedback is included. The model should be trained on the available data at the time the decision is made.
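To make the leakage point concrete, here is a minimal pandas sketch of assembling a training set for the recruiter-screen target. The file and column names (pass_screen, hiring_manager_feedback, and so on) are hypothetical placeholders for fields in your own ATS extract.

```python
import pandas as pd

# Hypothetical ATS extract: one row per application.
applications = pd.read_csv("ats_applications.csv")

# Target: did the applicant pass the recruiter screen?
target = "pass_screen"

# Columns only known AFTER the recruiter screen takes place. Including them
# would leak future information into a model meant to run at application time.
post_screen_columns = ["hiring_manager_feedback", "interview_score", "offer_made"]

features = applications.drop(columns=[target] + post_screen_columns, errors="ignore")
labels = applications[target]
```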

These are some of the recommended features for building the model; you can add or remove features based on the nature of the data available and the requirements of the model you are trying to build.

  • Numeric and categorical features from a structured application (e.g., previous job history, employers, education credentials)
  • Resume data, if available
  • Source of application
  • If required, external tools or resume parsers can be used to do pre-processing and provide additional structure to raw applicant data.

These datasets usually come from Greenhouse or a similar ATS (Applicant Tracking System). For jobs in which candidates don’t generally provide a resume, any kind of completed job application can be used provided it’s in machine-readable format (e.g., avoid scanned PDFs).

Sample Feature Set
Feature Name | Data Type | Description | Data Source | Example
Pass_Screen | Binary (Target) | Whether the applicant passes the recruiter screen for a given role | ATS | True
Application Source | Categorical | Source of the application | ATS | Employee referral
Highest degree attained | Categorical | Highest educational credential | ATS | 2-year college degree
Previous employers | Text | List of previous employers | ATS | Billy Jo's Pizza
Educational studies | Text or Categorical | Dropdown or user-entered text describing educational study | ATS | Business Management
Resume | Text | Raw resume text (if available) | ATS (may need to be converted from PDF) | –
Questions asked on a job page | Numeric or Categorical | e.g., "How many years experience do you have working directly with customers?" | ATS | –
Job description | Text | Description of the position being hired for | Job postings | –
Data Preparation 

To prepare the data, applicant data from an ATS is converted to a machine-readable format as needed (e.g., text fields are extracted from a PDF document). Each row of the training data represents an application rather than an applicant, as applicants may apply to different positions or to the same position multiple times. Any external data sources are considered and added in as new features.

For an applicant scoring model to be accurate, it should be specific. Similar roles can be grouped together, but fundamentally different roles should be trained with different models. This is where automation and iteration are helpful. For instance, a model trained on hires within a specific geography might reveal more concrete insights (e.g., a certain university is a good feeder program for new Analysts) than a national model.

We should also be careful to exclude people from our training data who "failed" the recruiter screen but were actually qualified. Recruiters may decide not to interview applicants for a variety of reasons unrelated to their qualifications, including because the candidates themselves expressed that they weren't interested. This data can usually be found in the Applicant Tracking System (ATS).

Model Training

DataRobot Automated Machine Learning automates many parts of the modeling pipeline. Instead of hand-coding and manually testing dozens of models to find the one that best fits your needs, DataRobot automatically runs dozens of models and finds the most accurate one for you, all in a matter of minutes. In addition to training the models, DataRobot automates other steps in the modeling process such as processing and partitioning the dataset.

While we will jump straight to deploying the model, you can take a look here to see how DataRobot works from start to finish and to understand the data science methodologies embedded in its automation. 

A few key modeling decisions for this use case:

  • Partitioning: Hiring practices change over time in response to both the macroeconomic environment and organizational initiatives. An OTV (out-of-time validation) partitioning scheme will evaluate model performance on the most recent data and give a more accurate benchmark of how well the model will perform when deployed.
  • Setting a threshold: If the model is to be used as a pass/fail screen, explore the false positive and false negative rates at different thresholds (see the sketch after this list). Ultimately, organizational demand will also help determine the threshold. For example, if the hiring pipeline is sparse, the needs of the organization might necessitate a lower threshold (e.g., more candidates passed) than the optimal one determined in training.
  • Accuracy metrics: If the model is being used to stack-rank applicants, consider using AUC in addition to LogLoss as a measure of performance for binary classification.
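Here is a minimal sketch of the threshold exploration mentioned in the list above, assuming you have exported validation-set scores and actual outcomes (for example, from a holdout partition). The file and column names are hypothetical.

```python
import pandas as pd

# Hypothetical export: one row per validation applicant, with the model's
# predicted probability of passing the screen and the actual outcome.
val = pd.read_csv("validation_scores.csv")  # columns: predicted_proba, passed_screen

for threshold in (0.3, 0.4, 0.5, 0.6):
    predicted_pass = val["predicted_proba"] >= threshold
    actual_pass = val["passed_screen"] == 1

    false_positives = (predicted_pass & ~actual_pass).sum()  # predicted pass, actually failed
    false_negatives = (~predicted_pass & actual_pass).sum()  # predicted fail, actually passed
    pass_rate = predicted_pass.mean()

    print(f"threshold={threshold:.1f}  pass_rate={pass_rate:.1%}  "
          f"FP={false_positives}  FN={false_negatives}")
```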

Business Implementation

Decision Environment 

After you finalize a model, DataRobot makes it easy to deploy the model into your desired decision environment. Decision environments are the methods by which predictions will ultimately be used for decision making.

Decision Maturity 

Automation | Augmentation | Blend 

There are many ways to implement a hiring model in practice. Some organizations use a hiring model as a pass/fail screening tool to cut down on the number of applications that recruiters are required to read and review. This has the advantage of giving candidates an answer more quickly.

Other organizations use the score as a way to stack-rank applicants, allowing recruiters to focus on the most promising candidates first. The most sophisticated approach is a blend: set a relatively low pass/fail bar so that the "automatic no" applications are removed from the pipeline up front. From there, provide recruiters the scores and the Prediction Explanations to help them make better decisions faster.

Model Deployment

All new applicants to a role should be scored on a batch basis (e.g., one batch request per hour). Predictions and Prediction Explanations should be returned and saved in the database underlying the Applicant Tracking System. 
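As a rough illustration of that batch flow, the sketch below pulls unscored applications from a hypothetical ATS database, scores them through a placeholder wrapper around your deployed model, and writes the results back. None of the table, column, or function names here are DataRobot APIs; they are assumptions to be replaced with your own integration.

```python
import sqlite3
import pandas as pd

def score_applicants(new_rows: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical wrapper around your deployed model's prediction endpoint.
    It should return the input rows with prediction and explanation columns added."""
    raise NotImplementedError("call your deployment's prediction API here")

# Pull applications that arrived since the last run (hypothetical schema).
conn = sqlite3.connect("ats.db")
new_rows = pd.read_sql("SELECT * FROM applications WHERE scored_at IS NULL", conn)

if not new_rows.empty:
    scored = score_applicants(new_rows)
    # Persist predictions and Prediction Explanations back into the ATS database
    # so recruiters see them alongside each application.
    scored.to_sql("application_scores", conn, if_exists="append", index=False)

conn.close()
```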

Decision Stakeholders
  • Decision Executors: Recruiters are the most direct consumers and will use the predictions on a daily or weekly basis.
  • Decision Managers: Hiring managers and ultimately Chief Human Resources Officers and their teams are responsible for making sure that decisions are being made correctly.
  • Decision Authors: Data scientists, HR analysts, and industrial/organizational psychologists are all well-positioned to build the models in DataRobot. IT Support or vendors can be brought in if there are particular data processing challenges (e.g., PDFs).
Decision Process

Here are some examples of decisions you can take using the predictions generated from the model.

  • Remove candidates who don’t meet minimum qualifications from the hiring pipeline.
  • Review the most promising candidates first.
  • Accelerate the review process for recruiters and hiring managers by leveraging explainable insights to highlight each candidate’s comparative strengths and weaknesses.
Model Monitoring 

Models should be retrained when data drift tracking shows significant deviations between the scoring and training data. In addition, if there are significant changes to the role (e.g., a change in the requirements of a position), the model will need to be refreshed. In that case, teams may have to manually rescore historical applicants against the updated requirements before model retraining can occur. 

Finally, think carefully about how to evaluate model accuracy. If a model imposes a pass/fail requirement but failing applicants are never evaluated by the recruiters, then we will track False Positives (applicants predicted to pass who did not) but not False Negatives (applicants rejected who would have passed). In a blended scenario (stack-ranking + scores), the model is directly influencing the recruiters’ decision making, which would tend to make the model seem more accurate than it is. 

The best way to evaluate accuracy is to have recruiters score a certain number of applicants independently and evaluate the model accuracy based on those cases.

Trusted AI 

In addition to traditional risk analysis, AI Trust is essential for this use case. 

Bias & Fairness: HR decision makers need to be aware of the risks that come with automating decision making within HR. Specifically, models trained on historically biased hiring practices can learn and reflect those same biases. It is incredibly important to make sure that your organization involves the right decision makers and content experts when building models to ensure that they remain fair.

However, there is also opportunity here. Upturn (a think tank focused on justice in technology) published guidelines for ethical AI in hiring. They note both the risks and opportunities of using AI in this space, suggesting that “with more deliberation, transparency, and oversight, some new hiring technologies might be poised to help improve on [our current ethical] baseline.”

The key, they argue, is explainability. Machine learning in this space must be transparent, documented, and explainable. Using the suite of explainability tools in DataRobot, HR teams without a data science background can understand:

  • what features are important in a model
  • model performance for key segments or demographic groups
  • how the models are actually working in practice
  • individual-specific explanations of *why* a model is returning the score it does 

This is particularly important for free text fields, where it is essential to understand and actively control what words and phrases the model is allowed to learn from. Importantly, these models are not “black boxes;” rather, they are fully transparent and controlled by the organization.

In addition to explainability, bias testing should be part of model evaluation. One bias test that may be appropriate is statistical parity. With statistical parity, your goal is to measure if different demographic groups have an equal probability of achieving a favorable outcome. In this case, that would mean testing whether protected groups (e.g., race, gender, ethnicity) pass the recruiting screen at equivalent rates. In US law, the four-fifths rule is generally used to determine if a personnel process demonstrates adverse impact. A selection rate for a group less than four-fifths (80%) of the rate for another comparable group is evidence of adverse impact.
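As a concrete illustration of the four-fifths rule, here is a minimal sketch that compares screen pass rates across groups in historical data. The file and column names are hypothetical, and a real analysis should be designed together with your HR, Legal, and Compliance teams.

```python
import pandas as pd

# Hypothetical scored history: one row per application, with a protected-class
# attribute and whether the applicant passed the recruiter screen.
history = pd.read_csv("scored_applications.csv")  # columns: gender, passed_screen

selection_rates = history.groupby("gender")["passed_screen"].mean()
reference_rate = selection_rates.max()  # group with the highest selection rate

for group, rate in selection_rates.items():
    ratio = rate / reference_rate
    flag = "potential adverse impact" if ratio < 0.8 else "ok"
    print(f"{group}: selection rate {rate:.1%}, ratio {ratio:.2f} -> {flag}")
```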

Note: Leaders interested in ethics and bias should also consider attending DataRobot’s course on Ethical AI, which teaches executives how to identify ethical issues within machine learning and develop an ethics policy for AI. 

Predict Policy Churn For New Customers
https://www.datarobot.com/use-cases/predict-policy-churn-for-new-customers/

Ensure the long term profitability of incoming members by predicting whether they will churn within the first 12 months of a policy application.

Overview

Business Problem

Whatever else sets them apart, insurers across all policy types and geographies have one thing in common: they operate in highly competitive markets. Regardless of whether insurers rely on agents, customer acquisition remains costly when it comes to sourcing prospects and turning them into paying members. A competitive market and a complex sales environment translate into high customer acquisition costs: P&C insurers pay up to 15 percent in commissions on the first year of premiums sold, and for life insurers this can exceed 100 percent. While these costs are justified when insurers develop long-term relationships with customers, they lose out if a member churns within the first year, because the cost of acquisition then exceeds the member's lifetime value. Alongside churn at renewal, churn among first-year members is a large expense for any insurer.

Intelligent Solution

AI helps you ensure the long-term profitability of incoming members by predicting in advance whether a prospect will churn within the first 12 months of a policy application. This allows your underwriters to thoroughly review the quality of prospects being generated by agents. Insurance companies are deploying AI into the field, where their underwriters and agents can now prioritize the prospects they invest their time in, focusing first on the prospects that have the highest value and the lowest risk of churning prematurely. Based on the data available on each prospect, AI also informs them of the top reasons why a prospect is likely to churn or be retained, equipping them to personalize their approach with every prospect.

Improve Patient Satisfaction Scores
https://www.datarobot.com/use-cases/improve-patient-satisfaction-scores/

Increase patient satisfaction scores by predicting which patients are likely to submit poor scores and the primary reasons. Design interventions to improve their satisfaction.

Overview

Business Problem

To operate, Federally Qualified Health Centers (FQHCs) rely on funding through programs such as Medicare and Medicaid. They are required to provide medical services on a sliding scale and are often governed by a board that includes patients. The success of an FQHC is measured in part by the satisfaction of the patients who have received care; this feedback plays a direct role in a center's ability to receive funding and continue serving the local community.

In addition to understanding how hospital operations have led to poor satisfaction historically, it’s also important to have the ability to flag potential risks in real time. This can provide insight into which patients are potentially not having a great experience and can be used by hospital administration to intervene, talk to the patient, and rectify any issues. Increased patient satisfaction is a positive outcome for the community that the FQHC serves, and ensures that the provider can continue to operate to its fullest.

Intelligent Solution

With AI, hospital administrators can, in real time, understand which patients are likely to leave the hospital with a bad experience. Knowing which patients may be unsatisfied with their care, coupled with the primary reasons why, provides useful information to hospital administrators who can seek out patients, discuss the quality of their stay, and take steps to improve satisfaction. AI informs hospital administrators with the information they need to increase patient satisfaction scores: an outcome that serves the community and ensures that the provider can continue to operate.

There are two primary parts involved in this particular solution:

  • Predict which patients are likely to respond to a survey request.
  • Predict if that patient will respond with a positive or negative review of the care.
Value Estimation

How would I measure ROI for my use case? 

The value of increased patient satisfaction comes in two primary ways. First, patient satisfaction is a key indicator of how well an FQHC is meeting the healthcare needs of the community that it serves, so taking steps to increase satisfaction is a positive outcome for the surrounding community. Additionally, a one percent increase in satisfaction scores can substantially impact the funding that an FQHC receives, allowing it to expand services, hire more clinicians, and continue providing quality care for those who need it most.

Technical Implementation

Problem Framing 

The target variable for this use case should be aligned to the survey output by which the provider is measured. In this case, we'll consider the Press Ganey patient satisfaction survey, which buckets responses into three categories: top, middle, and bottom box, with top box being the best and bottom box being the worst. Therefore, the target is whether the patient's survey response falls in the top, middle, or bottom box. This means we can frame the problem in two ways:

  • Multiclass with three classes: top, middle, and bottom box.  This will be the approach we’ll consider during this tutorial.
  • It may also make sense to have three different binary models, one for each box score.

Additionally, it might also be useful to develop an initial model that predicts the likelihood of a response to a survey request. In this case, the target variable can be whether or not patients responded to surveys historically.
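A minimal sketch of how the two targets described above might be derived from historical survey data. The file name, column names, and box-score cutoffs are assumptions; align them with your survey vendor's actual definitions.

```python
import pandas as pd

# Hypothetical encounter-level extract joined to survey vendor data.
encounters = pd.read_csv("encounters_with_surveys.csv")

# Target 1: did the patient respond to the survey request at all?
encounters["responded"] = encounters["survey_score"].notna().astype(int)

# Target 2 (responders only): top / middle / bottom box encoded as 3 / 2 / 1.
def to_box(score):
    if score >= 9:      # assumed cutoff; use your survey vendor's definition
        return 3        # top box
    elif score >= 7:    # assumed cutoff
        return 2        # middle box
    return 1            # bottom box

responders = encounters[encounters["responded"] == 1].copy()
responders["response_box"] = responders["survey_score"].apply(to_box)
```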

Key features that are important in predicting patient satisfaction are listed below. They encompass information about the patient, their stay, diagnosis, and the interaction with clinicians.

  • Demographic information
  • Clinical data including diagnoses codes
  • Clinician notes
  • Clinician performance data and demographics
  • Patient history information such as number of previous visits, recency, and frequency of visits and reasons
  • Patient location information such as distance to provider

Beyond the feature categories listed above, we suggest incorporating additional data your organization may collect that could be relevant to patient satisfaction. As you will see later, DataRobot is able to quickly differentiate important vs. unimportant features.

Many of these features are generally stored across proprietary data sources available in an EMR system: Patient Data, Diagnosis Data, Clinician notes, Admissions Data. Examples of EMR systems are Epic and Cerner. 

Sample Feature List
Feature Name | Data Type | Description | Data Source | Example
Response | Multiclass (target) | Top, middle, or bottom box as 3, 2, and 1, respectively | Provided by survey vendor | 3
Age | Numeric | Patient age group | Patient Data | 40
Clinical Notes | Text | Notes from nurse or doctor | Clinical Data | –
Number of past visits | Numeric | Count of prior visits within a specific period; could also include all prior visits | Patient Data | 10
Distance | Numeric | Distance in miles between home and provider location | Patient Data | 20
Diagnosis Code | Text (alphanumeric) | Code indicating the diagnosis upon arrival | Clinical Data | E10.9
Model Training

DataRobot Automated Machine Learning automates many parts of the modeling pipeline. Instead of hand-coding and manually testing dozens of models to find the one that best fits your needs, DataRobot automatically runs dozens of models and finds the most accurate one for you, all in a matter of minutes. In addition to training the models, DataRobot automates other steps in the modeling process such as processing and partitioning the dataset.

Take a look here to see how to use DataRobot from start to finish and how to understand the data science methodologies embedded in its automation.

There are a couple of key modeling decisions for this use case:

  • Partitioning: It is possible that care procedures may change over time due to organizational or personnel changes.  Therefore, an OTV (out of time validation) partitioning scheme would be a good choice.  This approach will evaluate model performance on the most recent encounters and give a more accurate benchmark of how well the model will perform when deployed.
  • Accuracy metrics: LogLoss

Business Implementation

Decision Flow

After you are able to find the right model that best learns patterns in your data to predict patient satisfaction, DataRobot makes it easy to deploy the model into your desired decision environment. Decision environments are the ways in which the predictions generated by the model will be consumed by the appropriate stakeholders in your organization, and how these stakeholders will make decisions using the predictions to impact the overall process. 

Decision Maturity 

Automation | Augmentation | Blend

In practice, the output of these models can be consumed by a team focused on ensuring care satisfaction during a patient’s stay. A daily report or dashboard of patients and their predicted satisfaction scores can provide a guide for senior hospital administration and an understanding, in real time, of which patients are potentially dissatisfied with their care.  Furthermore, leveraging Prediction Explanations from DataRobot can be a useful way to understand the primary drivers of the dissatisfaction and allow the decision maker to address those issues directly.   

Model Deployment

All new patients should be scored on a daily batch basis. Predictions and Prediction Explanations should be returned and saved in the database, which can be integrated into a BI dashboard or tool available to the hospital administrators. This allows them to continuously monitor which patients are likely to be dissatisfied with their stay.

Decision Stakeholders

Decision Executors

Hospital administrators focused on patient experience. They are often under the organization of the Chief Patient Experience Officer of a provider or hospital.

Decision Managers

The Chief Patient Experience Officer is ultimately responsible for executing the strategy to improve patient experience and responsible for the use of the model output.

Decision Authors

Analytics professionals and data scientists with a strong understanding of the patient data and processes are best positioned to develop these models, as well as to build a meaningful representation of the output in the form of dashboards or reports. Data engineers and IT support are needed to ensure that the stakeholders receive timely predictions reliably, since these will require daily action from hospital administration.

Decision Process
  • Coaching staff and clinicians are provided with information about potential issues.
  • Prediction Explanations can guide staff on the top drivers for dissatisfaction.
  • Staff can also speak directly to patients who are predicted to score low to understand what issues they may have regarding their experience and address those issues directly.
Model Monitoring

Models should be retrained when data drift tracking shows significant deviations between the scoring and training data. 

Predict Optimal Marketing Attribution
https://www.datarobot.com/use-cases/predict-optimal-marketing-attribution/

Optimize your marketing attribution by discovering which combination of touch points will lead to the highest amount of conversions.

Overview

Business Problem

Consumer touch points are now spread across social media, search engines, print advertisements, email, podcasts, and more. Since consumers are exposed to thousands of messages every week, it has become essential for companies to stand out amid the noise by investing in the touch points that resonate most with their customer base. But greater exposure has also come with rising costs per click. The Harvard Business Review estimates that global spending on media ballooned to $2.1 trillion in 2019, reflecting the difficulty companies face in understanding which touch points matter and which don't. Because customer acquisition is multidimensional, with consumers led to a brand not by a single touch point but by many, the aggregate analyses companies run today to estimate future uplift fall short when it comes to serving individualized consumer preferences. The array of marketing touch points is simply too complex for marketers to optimize through manual analysis.

Intelligent Solution

AI enables your marketers to optimize their marketing attribution by discovering which combination of touch points will lead to the highest number of conversions. Unlike A/B testing, where companies experiment with different combinations and collect lagging data, AI saves marketers time by predicting in advance which combination of investments across their 50+ touch points will generate the highest lift in responses. Using advanced algorithms, AI learns from the data of your past campaigns to discover underlying patterns that suggest what outcomes you'll see in the future, based on similarities and differences in the combinations of your marketing attribution. AI allows you to develop a unique marketing mix tailored to the purchasing decisions of your customers. You will not only increase your conversions but also do so more efficiently, with as few resources as possible. Greater exposure does not always require greater marketing costs, but it does require greater personalization and better utilization of the marketing you already deploy.

Predict Airline Customer Complaints
https://www.datarobot.com/use-cases/predict-airline-customer-complaints/

Predict customer complaints in order to discover drivers and take preventative action.

Overview

Business Problem

The airline industry, like many others, faces tremendous threats from customer dissatisfaction. In the first 9 months of 2014 alone, the Department of Transportation's Aviation Consumer Protection Division received over 12,000 complaints, including flight delays, overbooking, mishandled baggage, and poor customer service. Furthermore, given regulations in certain geographies, such as EU261, which mandate compensation for predefined service failures, dissatisfaction can be very costly. Whether it is a polite email notifying the customer about the return of lost luggage, a proactive call to apologize for a recently delayed flight, or financial compensation for a cancellation, there are actions an airline can take to delight the customer.

Intelligent Solution

AI can help empower your airline by predicting customer complaints and their severity. Your organization can learn from past complaint data to determine when a complaint is likely. Understanding when and why a complaint may occur allows your organization to build a more effective service recovery strategy. The most common use cases include forecasting complaint volumes to inform call center staffing, predicting each customer's propensity to complain, recommending the best service recovery solution, and, where financial compensation is involved, recommending the depth of compensation. The benefit can be measurably demonstrated through A/B tests by operationalizing this insight and switching from a reactive posture to a proactive service recovery approach. The service recovery paradox shows that, with the right targeting methodology, proactive service recovery can leave customers even more delighted than if the service failure had never occurred.

Predict Suicide Warning Signs
https://www.datarobot.com/use-cases/predict-suicide-warning-signs/

Provide a supplementary assessment that helps prevent suicides and save lives by predicting ahead of time who is likely to commit suicide.

Overview

Business Problem

The CDC reports that suicide is the 10th leading cause of death in the United States. For those who served in the military, the impact is even more tragic: suicide is the second leading cause of death, and the US Department of Veterans Affairs reports that 16.8 veterans die by suicide every day. While a suicide may sometimes appear to come out of nowhere, 90% of people who die by suicide had a diagnosable mental health condition that could have been treated. In particular, veterans, who are at higher risk of suicide than the general population, may show signs of post-traumatic stress disorder from their time in deployment. Unfortunately, while efforts by government and healthcare institutions to create safe spaces where suicidal individuals can be treated have had some impact on reducing suicides, they are limited by their reactive nature: by the time help arrives, mental health has often deteriorated over long periods without the care and support that was needed.

Intelligent Solution

AI can provide a supplementary assessment that helps prevent suicides and save lives by predicting ahead of time who is at risk. In exploring how AI can help the Federal Government reduce the number of warfighters who die by suicide, early results have shown that AI can predict the likelihood that a warfighter is at risk of suicide with an overall accuracy of 74%. AI also revealed the reasons why warfighters were at risk, showing that thirty-five percent of warfighters who had taken a prescribed anxiolytic in the previous six months attempted or died by suicide. Because AI produces explainable predictions, it allows government and healthcare workers to understand how they can potentially reduce the likelihood of suicide for each individual at risk, providing personalized mitigation approaches depending on the individual's background and drivers.

Predict Which Patients Will Admit
https://www.datarobot.com/use-cases/predict-which-patients-will-admit/

Predict which patients are likely to be admitted to proactively improve their health.

Overview

Business Problem

The transition to value-based reimbursements compels providers to find ways to improve patient health outcomes while at the same time reducing the cost of delivery. Among these strategies, providers are educating their patients on the value of ambulatory clinics and home care in an effort to reduce the volume of avoidable hospital and emergency department admissions, which are tied to higher costs and disruption in care. Fortunately, care coordination programs that focus on supporting patients with prior admissions and comorbidities through home care have been shown to reduce reliance on inpatient admittance. That said, not all patients are enrolled in these programs, and it remains challenging to identify which patients are likely to be admitted for acute care.

Intelligent Solution

AI empowers your care managers by predicting in advance which patients are likely to be admitted. Unlike existing evaluations of admission risk that are based on limited factors, AI can evaluate admission risk with much higher accuracy by finding hidden patterns across outpatient, inpatient, emergency department, and care management data. Based on each patient's medical history and interactions, AI will reveal which factors contribute to their risk of admission, giving care managers an understanding of which intervention strategies to apply depending on each patient's conditions. Furthermore, AI maximizes the care manager's impact by letting them triage patients by their probability of admission; focusing first on those at the highest risk makes the best use of the care manager's resources. Once high-risk patients are identified, intervention strategies include enrolling them into programs that improve care coordination, home care, transportation, and medication adherence.

Predict Whether a Parts Shortage Will Occur
https://www.datarobot.com/use-cases/predict-whether-a-parts-shortage-will-occur/

Predict part shortages or late shipments in a supply chain network so that businesses can prepare for foreseeable delays and take data-driven corrective action.

Overview

Business Problem

A critical component of any supply chain network is to prevent parts shortages, especially when they occur at the last minute. Parts shortages not only lead to underutilized machines and transportation, but also cause a domino effect of late deliveries through the entire network. In addition, the discrepancies between the forecasted and actual number of parts that arrive on time prevent supply chain managers from optimizing their materials plans.

Parts shortages are often caused by delays in their shipment. To mitigate the impact delays will have on their supply chain, manufacturers adopt approaches such as holding excess inventory, optimizing product designs for more standardization, and moving away from single-sourcing strategies. However, most of these approaches add up to unnecessary costs for parts, storage, and logistics.

In many cases, late shipments persist until supply chain managers can evaluate root cause and then implement short term and long term adjustments that prevent them from occurring in the future. Unfortunately, supply chain managers have been unable to efficiently analyze historical data available in MRP systems because of the time and resources required.

Intelligent Solution

AI helps supply chain managers reduce parts shortages by predicting the occurrence of late shipments, giving them time to intervene. By learning from past cases of late shipments and their associated features, AI applies these patterns to future shipments to predict the likelihood that those shipments will also be delayed. Unlike complex MRP systems, AI provides supply chain managers with the statistical reasons behind each late shipment in an intuitive but scientific way. For example, when AI notifies supply chain managers of a late shipment, it will also explain why, offering reasons such as the shipment’s vendor, mode of transportation, or country.

Then, using this information, supply chain managers can apply both short term and long term solutions to preventing late shipments. In the short term, based on their unique characteristics, shipment delays can be prevented by adjusting their transportation or delivery routes. In the long term, supply chain managers can conduct aggregated root-cause analyses to discover and solve the systematic causes of delays. They can use this information to make strategic decisions, such as choosing vendors located in more accessible geographies or reorganizing their shipment schedules and quantities.

Value Estimation

How would I measure ROI for my use case? 

The ROI for implementing this solution can be estimated by considering the following factors: 

  1. Starting with the manufacturing company and production line stoppages, the cycle time of the production process can be used to understand how much of the production loss relates to part shortages (see the sketch after this list). For example, if the cycle time (the time taken to complete one part) is 60 seconds and each day 15 minutes of production are lost to part shortages, then the total production loss is equivalent to 15 products, which translates into the lost profit of 15 products per day. A similar calculation can be used to estimate the annual loss due to part shortages.
  2. For a logistics provider, predicting part shortages early can increase savings in the form of reduced inventory. This can be roughly measured by capturing the difference in parts stock held before and after implementation of the AI solution; that difference, multiplied by the holding and inventory cost per unit, gives the overall ROI. Furthermore, when demand for parts goes unfulfilled because of shortages, the opportunity cost of that unsatisfied demand translates directly into lost business.
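Here is a minimal sketch of the production-loss arithmetic from item 1 above. The cycle time and downtime figures match the example in the text; the per-unit profit and working days are hypothetical inputs you would replace with your own.

```python
# Production loss from part shortages (illustrative, using the figures above).
cycle_time_seconds = 60          # time to complete one part
downtime_minutes_per_day = 15    # daily production time lost to part shortages
profit_per_unit = 25.0           # hypothetical: replace with your per-unit profit
working_days_per_year = 250      # hypothetical

units_lost_per_day = downtime_minutes_per_day * 60 / cycle_time_seconds   # = 15 units
daily_loss = units_lost_per_day * profit_per_unit
annual_loss = daily_loss * working_days_per_year

print(f"Units lost per day: {units_lost_per_day:.0f}")
print(f"Estimated annual loss: ${annual_loss:,.0f}")
```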

Technical Implementation

About the Data

For illustrative purposes, we use a sample dataset provided by the President's Emergency Plan for AIDS Relief (PEPFAR), which is publicly available on Kaggle. This dataset provides supply chain health commodity shipment and pricing data. Specifically, the dataset identifies Antiretroviral (ARV) and HIV lab shipments to supported countries. In addition, the dataset provides the commodity pricing and associated supply chain expenses necessary to move the commodities to other countries for use. We use this dataset to represent how a manufacturing or logistics company can leverage AI models to improve their decision making.

Problem Framing

The target variable for this use case is whether or not the shipment will be delayed (binary: True or False, 1 or 0, etc.). This choice of target (Late_delivery) makes this a binary classification problem. The distribution of the target variable is imbalanced, with 11.4% being 1 (late delivery) and 88.6% being 0 (on-time delivery). (See here for more information about imbalanced data in machine learning.)

The features below represent some of the factors that are important in predicting delays. The feature list encompasses all of the information in each purchase order sent to the vendor, which would eventually be used to make predictions of delays when new purchase orders are raised.

Beyond the features listed below, we suggest incorporating any additional data your organization may collect that could be relevant to delays. As you will see later, DataRobot is able to quickly differentiate important/unimportant features. 

These features are generally stored across proprietary data sources available in the ERP systems of the organization. 

Sample Feature List
Feature NameData TypeDescriptionData SourceExample
Supplier nameCategoricalName of the vendor who would be shipping the deliveryPurchase orderRanbaxy, Sun Pharma etc.
Part descriptionTextThe details of the part/item that is being shippedPurchase order30mg HIV test kit, 600mg Lamivudine capsules
Order quantityNumericThe amount of item that was ordered Purchase order1000, 300 etc.
Line item valueNumericThe unit price of the line item orderedPurchase order0.39, 1.33
Scheduled delivery dateDateThe date at which the order is scheduled to be deliveredPurchase order2-Jun-06
Delivery recorded dateDateThe date at which the order was eventually deliveredERP system2-Dec-06
Manufacturing siteCategoricalThe site of the vendor where the manufacturing was done since the same vendor can ship parts from different sitesInvoiceSun Pharma, India
Product GroupCategoricalThe category of the product that is orderedPurchase orderHRDT, ARV
Mode of delivery CategoricalThe mode of transport for part deliveryInvoiceAir, Truck
Late DeliveryTarget (Binary)Whether the delivery was late or on-timeERP System, Purchase Order0 or 1
Data Preparation 

The dataset contains historical information on procurement transactions. Each row in the dataset is an individual order that was placed and whose delivery needs to be predicted. Every order has a scheduled delivery date and an actual delivery date, and the difference between these was used to define the target variable (Late_delivery). If the actual delivery date surpassed the scheduled date, the target variable took the value 1; otherwise 0. Overall, the dataset contains about 10,320 rows and 26 features, including the target variable.
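A minimal sketch of the target definition described above, assuming the two date columns from the feature list. The exact file and column names in the PEPFAR dataset may differ.

```python
import pandas as pd

orders = pd.read_csv("shipment_orders.csv")

# Parse the scheduled and actual delivery dates (hypothetical column names).
orders["scheduled_delivery_date"] = pd.to_datetime(orders["scheduled_delivery_date"])
orders["delivered_date"] = pd.to_datetime(orders["delivered_date"])

# Late_delivery = 1 if the order arrived after the scheduled date, else 0.
orders["Late_delivery"] = (
    orders["delivered_date"] > orders["scheduled_delivery_date"]
).astype(int)

print(orders["Late_delivery"].value_counts(normalize=True))  # check class imbalance
```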

Model Training

DataRobot Machine Learning automates many parts of the modeling pipeline. Instead of hand-coding and manually testing dozens of models to find the one that best fits your needs, DataRobot automatically runs dozens of models and finds the most accurate one for you, all in a matter of minutes. In addition to training the models, DataRobot automates other steps in the modeling process such as processing and partitioning the dataset.

While we will jump straight to interpreting the model results, you can take a look here to see how DataRobot works from start to finish, and to understand the data science methodologies embedded in its automation. 

One thing to highlight: because we are dealing with an imbalanced dataset, DataRobot automatically recommends LogLoss as the optimization metric for identifying the most accurate model, since it is an error metric that penalizes wrong predictions.

For this dataset, DataRobot found the most accurate model to be an Extreme Gradient Boosted Trees Classifier with unsupervised learning features, built with the open-source XGBoost library.

Interpret Results

To give transparency into how the model works, DataRobot provides both global and local levels of model explanation. In broad terms, the model can be understood by looking at the Feature Impact graph, which shows the relative importance of the features in the dataset in relation to the selected target variable. The technique DataRobot uses to build this plot is called Permutation Importance.

As you can see, the model identified Pack Price, Country, Vendor, Vendor INCO Term, and Line item Insurance as some of the most critical factors affecting delays in the parts shipments. 

[Figure: Feature Impact - DataRobot AI Platform]
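Outside of DataRobot, the same idea can be reproduced with scikit-learn's permutation importance utility. The sketch below is a generic illustration of the technique on the prepared orders table, not the platform's implementation, and for simplicity it only uses the numeric columns.

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Reuse the prepared orders table from the data preparation step; for simplicity,
# use only the numeric columns (a real project would also encode the categorical
# and text features).
X = orders.select_dtypes("number").drop(columns=["Late_delivery"])
y = orders["Late_delivery"]

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)
model = GradientBoostingClassifier().fit(X_train, y_train)

# Permutation importance: shuffle each feature in turn and measure how much the
# validation score drops. Bigger drops mean more important features.
result = permutation_importance(model, X_val, y_val, n_repeats=10, random_state=0)

for idx in result.importances_mean.argsort()[::-1][:5]:
    print(f"{X_val.columns[idx]}: {result.importances_mean[idx]:.4f}")
```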

Moving to the local view of explainability, DataRobot also provides Prediction Explanations that enable you to understand the top 10 key drivers for each prediction generated. This offers you the granularity you need to tailor your actions to the unique characteristics behind each part shortage. 

For example, if a particular country is a top reason for a shipment delay, such as Nigeria or South Africa, you can take actions by reaching out to vendors in these countries and closely monitoring the shipment delivery across these routes.

Similarly, if there are certain vendors that are amongst the top reasons for delays, you can reach out to these vendors upfront and take corrective actions to avoid any delayed shipments which would affect the supply chain network. These insights help businesses make data-driven decisions to improve the supply chain process by incorporating new rules or alternative procurement sources.

[Figure: Prediction Explanations - DataRobot AI Platform]

For text variables, such as Part description (included in the dataset), we can look at Word Clouds to discover the words or phrases that are highly associated with delayed shipments. Text features are generally the most challenging and time-consuming to build models for, but with DataRobot each individual text column is automatically fitted as an individual classifier and directly preprocessed with NLP techniques (tf-idf, n-grams, etc.). In this case, we can see that items described as nevirapine 10 mg are more likely to be delayed than other items.

[Figure: Word Cloud - DataRobot AI Platform]

Evaluate Accuracy

To evaluate the performance of the model, DataRobot by default ran five-fold cross-validation, and the resulting AUC score (for the ROC curve) was around 0.82. Since the AUC score on the holdout set (unseen data) was also around 0.82, we can be reassured that the model generalizes well and is not overfitting. We use AUC to evaluate the model because it measures how well the model ranks the outputs (i.e., the predicted probability of a delayed shipment) rather than comparing raw values. The Lift Chart below shows how the predicted values (blue line) compare to the actual values (red line) when the data is sorted by predicted values. We see that the model slightly under-predicts for the orders that are most likely to be delayed, but overall the model performs well. Furthermore, depending on the problem being solved, you can review the confusion matrix for the selected model and, if required, adjust the prediction threshold to optimize for precision and recall.

[Figure: Lift Chart - DataRobot]
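For readers reproducing this evaluation outside the platform, here is a minimal scikit-learn sketch that computes AUC and a confusion matrix at a chosen threshold, continuing from the model and validation split in the permutation importance sketch above.

```python
from sklearn.metrics import roc_auc_score, confusion_matrix

# Predicted probability of late delivery on the validation set.
proba = model.predict_proba(X_val)[:, 1]

print("Validation AUC:", round(roc_auc_score(y_val, proba), 3))

# Pick a threshold and inspect the precision/recall trade-off via the confusion matrix.
threshold = 0.3   # assumed value; tune it for your own cost of false alarms vs. misses
predicted_late = (proba >= threshold).astype(int)

tn, fp, fn, tp = confusion_matrix(y_val, predicted_late).ravel()
print(f"TP={tp}  FP={fp}  FN={fn}  TN={tn}")
print(f"Precision={tp / (tp + fp):.2f}  Recall={tp / (tp + fn):.2f}")
```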

Business Implementation

Decision Environment 

After the right model has been chosen, DataRobot makes it easy to deploy the model into your desired decision environment. Decision environments are the ways in which the predictions generated by the model will be consumed by the appropriate stakeholders in your organization, and how these stakeholders will make decisions using the predictions to impact the overall process. 

Decision Maturity 

Automation | Augmentation | Blend

The predictions from this use case augment the decisions of supply chain managers by helping them foresee upcoming delays in logistics. The model acts as an intelligent assistant that, combined with the managers' own judgment, helps improve your entire supply chain network.

Model Deployment 

The model can be deployed using the DataRobot Prediction API, a REST endpoint that returns predictions in near real time as new scoring data from incoming orders is received.

Once the model has been deployed (in whatever way the organization decides), the predictions can be consumed in several ways. For example, a front-end application that acts as the supply chain's reporting tool can send new scoring data as input to the model, which returns predictions and Prediction Explanations in real time.
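A minimal sketch of how such a front-end or integration layer might request real-time predictions over REST. The endpoint URL, headers, and payload fields are hypothetical placeholders; consult your deployment's prediction API documentation for the exact request format.

```python
import requests

# Hypothetical placeholders: substitute your own prediction endpoint and credentials.
PREDICTION_URL = "https://example.datarobot.com/predApi/v1.0/deployments/<DEPLOYMENT_ID>/predictions"
HEADERS = {"Authorization": "Bearer <API_TOKEN>", "Content-Type": "application/json"}

# One new purchase order to score (field names follow the sample feature list above).
new_order = [{
    "Supplier name": "Sun Pharma",
    "Order quantity": 1000,
    "Mode of delivery": "Air",
    "Scheduled delivery date": "2024-06-02",
}]

response = requests.post(PREDICTION_URL, json=new_order, headers=HEADERS, timeout=30)
response.raise_for_status()
print(response.json())   # predicted probability of late delivery plus explanations
```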

Decision Stakeholders

The predictions and Prediction Explanations would be used by supply chain managers or logistic analysts to help them understand the critical factors or bottlenecks in the supply chain.

Decision Executors

Decision executors are the supply chain managers and procurement teams who are empowered with the information they need to ensure that the supply chain network is free from bottlenecks. These personnel have strong relationships with vendors and the ability to take corrective action using the model’s predictions.

Decision Managers

Decision managers are the executive stakeholders such as the Head of Vendor Development who manage large scale partnerships with key vendors. Based on the overall results, these stakeholders can perform quarterly reviews of the health of their vendor relationships to make strategic decisions on long-term investments and business partnerships.

Decision Authors

Decision authors are the business analysts or data scientists who would build this decision environment. These analysts could be the engineers/analysts from the supply chain, engineering, or vendor development teams in the organization who usually work in collaboration with the supply chain managers and their teams.

Decision Process

Based on the predictions and Prediction Explanations that identify potential bottlenecks, managers and executive stakeholders reach out to and collaborate with the appropriate vendor teams in the supply chain network, guided by data-driven insights. These decisions can be short- or long-term, depending on how severely the shortages impact the business.

Model Monitoring 

One of the most critical components of implementing an AI solution is the ability to track the model's performance for data drift and accuracy. With DataRobot MLOps, you can deploy, monitor, and manage all models across the organization through a centralized platform. Tracking model health is very important for proper model lifecycle management, similar to product lifecycle management.

Implementation Risks

One of the major risks in implementing this solution in the real world is adoption at the ground level. Having strong and transparent relationships with vendors is also critical in taking corrective action. The risk is that vendors may not be ready to adopt a data-driven strategy and trust the model results. 

Reduce Avoidable Returns
https://www.datarobot.com/use-cases/reduce-avoidable-returns/

Predict which products will be returned and conduct a root cause analysis to prevent avoidable returns.

Overview

Business Problem

Product returns are goods that retailers or consumers send back to manufacturers for reasons that are either preventable or non-preventable; both are collectively bucketed under a manufacturer's total warranty returns (TWR). While returns are often an afterthought, they impact the average manufacturer's profitability by 3.8%, a significant cost item that incrementally chips away at profit margins. Manufacturers often lack the information needed to deeply understand the volume of expected returns and why returns occur. While there are numerous solutions manufacturers can use to learn about the past, a wide gap remains in their ability to generate forward-looking insights that reduce the financial impact of returns.

Intelligent Solution

AI analyzes the historical data you collect on product returns to learn patterns that help it predict which products are likely to be returned in the future. With advancements in interpretability, AI will also offer your supply chain managers the top reasons why each individual product is likely to be returned. Using these insights, your supply chain managers can conduct a root cause analysis to prevent avoidable returns and iterate on their product or manufacturing processes. For products with a high risk of being returned, supply chain managers can run a cost-benefit analysis of whether the shipping required to deliver the product to the customer and back will result in a net loss. Product returns may also reveal insights into other challenges, such as quality defects. Predicting returns not only helps at the product level but also enables financial analysts to embed forecasted returns into their cash flow projections, ensuring that your organization is prepared for worst-case scenarios.
