Module 2: Machine learning in business

Unit 1: Key features of machine learning

Machine learning as a discipline seeks to design, understand, and use computer programs that learn from experience (i.e., data), without being explicitly programmed for specific modeling, prediction, or control tasks. In his interview with Professor Malone, Professor Jaakkola suggested three requirements for using machine learning:

  1. The problem can be formulated as a machine learning problem.
  2. There is much relevant data available that could be used by machine learning algorithms.
  3. The system has enough regularity in it that there are patterns to be learned.

Understanding machine learning

Professor Jaakkola provides a brief definition of machine learning, and discusses how it can be used to solve prediction problems. He also gives examples of the types of prediction problems that can be addressed, and touches on supervised learning and convolutional neural networks. He then highlights the recent progress made in the machine learning field, discussing deep learning architectures, with DeepMind AlphaGo used as an example.

  • Programmers use tools to abstract away the complexity of writing machine code.
  • What about creating a general learning program that learns from experience (data)?
  • Types of prediction problems
    • Future events
    • Properties that are not yet known
  • Formulate the ML problem as learning from examples.
  • Labeled data is a group of samples that have been tagged with one or more labels. Labeling typically takes a set of unlabeled data and augments each piece of that unlabeled data with meaningful tags that are informative. For example, labels might indicate whether a photo contains a horse or a cow, which words were uttered in an audio recording, what type of action is being performed in a video, what the topic of a news article is, what the overall sentiment of a tweet is, whether the dot in an x-ray is a tumor, etc. Labels can be obtained by asking humans to make judgments about a given piece of unlabeled data (e.g., “Does this photo contain a horse or a cow?”), and are significantly more expensive to obtain than the raw unlabeled data. After obtaining a labeled dataset, machine learning models can be applied to the data so that new unlabeled data can be presented to the model and a likely label can be guessed or predicted for that piece of unlabeled data.

Supervised learning – Given explicit input examples and the target labels that I wish to predict, supervised learning is the machine learning task of inferring a function from labeled training data.[1] The training data consist of a set of training examples. In supervised learning, each example is a pair consisting of an input object (typically a vector) and a desired output value (also called the supervisory signal). A supervised learning algorithm analyzes the training data and produces an inferred function, which can be used for mapping new examples. An optimal scenario will allow for the algorithm to correctly determine the class labels for unseen instances. This requires the learning algorithm to generalize from the training data to unseen situations in a “reasonable” way (see inductive bias).

  • Previously, you had to program in detail what you wanted a machine to do, and that program was then translated into lower-level code. Now you can instead write a general program that includes instructions to learn from experience (i.e., data), and then teach it what to do by giving it more data and direction.

Formulate by learning from examples.

  • Training set: label examples of movies I do and don’t like (+1 / -1).
    • Translate each movie into a feature vector, based on questions we would ask.
      • Genre? Famous lead? Then encode the answers as a binary feature vector.
  • Test set: go through the same process for movies that I have not seen.
  • The training set gives us clues for predicting the labels in the test set. The algorithm then classifies new movies into like and dislike.
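
The movie example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the four yes/no questions, the made-up movies, and the choice of a perceptron as the learning algorithm are all hypothetical and do not come from the lecture. It shows the three steps from the notes: turn answers into a binary feature vector, learn from labeled (+1 / -1) training examples, then predict labels for unseen movies.

```python
# Hypothetical yes/no questions asked about every movie (illustrative only).
QUESTIONS = ["is_comedy", "is_action", "has_famous_lead", "is_sequel"]


def featurize(answers):
    """Turn a dict of yes/no answers into a binary feature vector."""
    return [1 if answers.get(q, False) else 0 for q in QUESTIONS]


def train_perceptron(examples, epochs=20):
    """Learn weights and a bias from (feature_vector, label) pairs with labels +1/-1."""
    weights = [0.0] * len(QUESTIONS)
    bias = 0.0
    for _ in range(epochs):
        for x, y in examples:
            score = sum(w * xi for w, xi in zip(weights, x)) + bias
            if y * score <= 0:  # misclassified: move the boundary toward this example
                weights = [w + y * xi for w, xi in zip(weights, x)]
                bias += y
    return weights, bias


def predict(weights, bias, x):
    """Return +1 (like) or -1 (dislike) depending on the side of the boundary."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score > 0 else -1


# Training set: made-up movies that have already been labeled.
training_set = [
    (featurize({"is_comedy": True, "has_famous_lead": True}), +1),
    (featurize({"is_comedy": True, "is_sequel": True}), +1),
    (featurize({"is_action": True, "has_famous_lead": True}), -1),
    (featurize({"is_action": True, "is_sequel": True}), -1),
]
weights, bias = train_perceptron(training_set)

# Test set: movies not seen yet, featurized the same way but without labels.
test_set = [
    featurize({"is_comedy": True}),
    featurize({"is_action": True, "has_famous_lead": True, "is_sequel": True}),
]
for movie in test_set:
    print(movie, "->", predict(weights, bias, movie))
```

A perceptron is used here only because it is one of the simplest ways to split labeled points into a + side and a − side; any other classifier could take its place.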

In computer vision

  • Learn from identified features in multiple passes.
  • Layers of transformation give you a deep model … convolutional neural networks
  • Movie recommendation
    • Take descriptions and translate them into a feature vector.
    • Ask questions about the movies (What is the genre? Is there a famous lead?), then encode the answers as a binary vector.
    • Then calculate the same vector for movies you have not seen.
    • In the training set you have labeled vectors; in the test set you have only the vectors, for which you need to provide the labels.
    • Move to a geometric view: place each vector as a point in space, marked with its label.
    • Problem: how to classify the test set. The training set gives clues about how the data is classified generally.
    • You can divide the space into − and + halves, then bring back the test set and classify each point by the half it falls in, based on what was learned from the training set.
  • Convolutional neural network
    • Classify images into content categories
    • Progressive learning of what an item is, by looking at a small fraction of the pixels (small features) and then gradually taking in more and more of the image.
  • Recent progress
    • 1 – Error has gone down by 50% to 100% per year… 2012 -> 2014…
    • 2 – The red represents deep learning approaches, and you can see how they have taken over computer vision.
    • We also see advances in machine translation, capturing semantics, etc. You give examples of the correct behavior, then try to automate the process of finding the solutions.
  • Learning to act: playing games
    • Go – The system looked at the game board and learned to map it to the actions it could take, instead of considering all of the possible actions up front. It did this by watching human experts play. However, it could probably do better by watching a computer play (DeepMind AlphaGo).

Professor Jaakkola provides four main reasons for the recent advances made in machine learning, as follows: 

  1. The accumulation of huge amounts of data
  2. Advances in computational power
  3. The growing complexity of models
    • Large models are easier to train
  4. The new possibilities created by deep learning architectures
    • Flexible neural “lego pieces”
    • Common vector representation
      • Allows you to take an image and generate a word or sentence, or take a word or sentence and generate an image, because both have been mapped into vectors that are cross-referenced. This allows transfer from one domain to another in a very simple way.
    • Recurrent neural networks take known information and new information to derive a new vector representation, which you then evaluate on the classification task you want to use it for.
    • Google Translate works in this way.
    • It takes a lot of data to get examples of behavior.
    • Amount of data: What data do I have before trying to solve a problem? What additional data do I need?
    • Types of machine learning:
      • Unsupervised learning is a type of machine learning algorithm used to draw inferences from datasets consisting of input data without labeled responses. The most common unsupervised learning method is cluster analysis, which is used for exploratory data analysis to find hidden patterns or grouping in data.
      • Semi-supervised learning is a class of supervised learning tasks and techniques that also make use of unlabeled data for training – typically a small amount of labeled data with a large amount of unlabeled data. Semi-supervised learning falls between unsupervised learning (without any labeled training data) and supervised learning (with completely labeled training data). Many machine-learning researchers have found that unlabeled data, when used in conjunction with a small amount of labeled data, can produce considerable improvement in learning accuracy.
      • Active learning is a special case of semi-supervised machine learning in which a learning algorithm is able to interactively query the user (or some other information source) to obtain the desired outputs at new data points.[1] [2] In statistics literature it is sometimes also called optimal experimental design. [3] There are situations in which unlabeled data is abundant but manually labeling is expensive. In such a scenario, learning algorithms can actively query the user/teacher for labels. This type of iterative supervised learning is called active learning.
      • Transfer learning or inductive transfer is a research problem in machine learning that focuses on storing knowledge gained while solving one problem and applying it to a different but related problem.[1] For example, knowledge gained while learning to recognize cars could apply when trying to recognize trucks. This area of research bears some relation to the long history of psychological literature on transfer of learning, although formal ties between the two fields are limited.
    • Interpretable modeling
      • Deep learning models are fairly opaque, so we need to make them interpretable: highlight the rationale behind their solutions and predictions. This will help us guide them.
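
To make the cluster analysis mentioned under unsupervised learning more concrete, here is a minimal k-means sketch. It is an illustration only: the customer data (visits per month, average spend) is invented, and the deterministic choice of starting centroids is an assumption made to keep the example reproducible. Note that no labels are supplied; the groups emerge from the data alone.

```python
def kmeans(points, k, iterations=10):
    """Plain k-means: alternate between assigning points and moving centroids."""
    # Assumption: start from k evenly spaced points so the sketch is deterministic.
    step = max(1, len(points) // k)
    centroids = [points[i * step] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: each centroid moves to the mean of its members.
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, clusters


# Hypothetical unlabeled data: (visits per month, average spend) per customer.
customers = [(1, 20), (2, 25), (1, 30), (9, 200), (10, 220), (8, 210)]
centroids, clusters = kmeans(customers, k=2)
for centroid, members in zip(centroids, clusters):
    print(centroid, members)
```

In practice the features would usually be rescaled first so that one attribute (here, spend) does not dominate the distance calculation.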

Professor Jaakkola explains, at a high level, the difference between shallow learning and deep learning architecture, and talks about neural networks and convolutional neural networks.

Shallow learning

  • In image recognition, for example, this means feeding all of the pixels directly into a simple classification model to see whether it can determine what the image shows, based on all available pixels. Pixels -> cat or human
    • “C” or “D”

Deep learning architecture

  • Layers of processing, features, and combinations of features: the data is broken down into discrete parts to understand them, and the parts are then brought together to produce the prediction. Pixels -> arm -> hands -> fingers -> eyes -> edges -> whiskers -> etc. -> human or cat
    • “Dog” or “Car”

Convolutional neural networks

  • Detectors for each area of an image… spatial analysis
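
The idea of a detector applied to each area of an image can be sketched as a small filter that slides across the pixels. This is a simplified illustration, not the lecture's own example: the 4x4 image and the hand-written vertical-edge kernel are assumptions, and a real convolutional network learns its kernels from data and stacks many such layers. As in most deep learning libraries, the operation below is technically a cross-correlation (the kernel is not flipped).

```python
def convolve2d(image, kernel):
    """Slide a small kernel over the image and record its response at each position."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # The detector only sees a kh x kw patch of pixels at a time.
            response = sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            row.append(response)
        output.append(row)
    return output


# Toy grayscale image: dark (0) on the left, bright (1) on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hypothetical hand-written detector for a dark-to-bright vertical edge.
kernel = [
    [-1, 1],
    [-1, 1],
]
feature_map = convolve2d(image, kernel)
for row in feature_map:
    print(row)
```

The resulting feature map is largest where the patch contains the edge and zero elsewhere, which is the "spatial analysis" the notes refer to: the same detector reports where in the image its pattern occurs.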

The two professors discuss what is currently both easy and difficult to do with machine learning algorithms. Professor Jaakkola poses three questions to help you decide whether machine learning is possible as an application to solve a problem. 

  • Can you formulate it as a machine learning problem?
    • You can use supervised learning to predict many things. Provide training data and expected results, and this can be used to find answers in target data sets.
  • Do I already have tools available to address that formulation?
    • Classifying images
    • Interpreting natural language
    • Business decisions (stock picking)
    • etc.
  • Is it likely to work well as a solution to the problem?
    • Even with a formulation and tools, you may still be unable to get high-confidence results.
    • Example: planetary motion can be chaotic…

They talk about the differences between supervised, unsupervised, and reinforcement learning, as well as the amount of training data or examples that machines need to make predictions.

Supervised learning

  • Labeled data with target outcomes.

Unsupervised learning

  • Trying to understand, based on observation, how things work and what the regular structures are. Then, when you get feedback, it increases the learning.

Reinforcement learning

  • Used to learn how to act: as the state of the world changes, you need to act differently.

Amount of training data

  • Supervised: Do you have historical data, actions, and outcomes? Featurize the data (the attributes being tracked).
  • You can use unsupervised learning if you don’t have a lot of data, just to start.
  • If a feature representation already exists, then very little data (even one example) can allow for a prediction, but that representation itself depends on a lot of data having been used to train the system.
  • Starting from scratch – Tabula rasa refers to the epistemological idea that individuals are born without built-in mental content and that therefore all knowledge comes from experience or perception.
    • If it’s easy (high correlation), it could take a few hundred examples.
    • If it’s hard (low correlation), it could take many more.
    • Exactly how much data is needed depends on the complexity of the problem formulation, and can be assessed by a data scientist.

They also discuss why machine learning systems currently can’t properly explain how they came up with an answer, and whether this will change in the future.

  • Active area of research. Interpretability. How and why a decision was made.
  • No commercially available solutions at this time. Possibly in the next year or two. Need will drive this… Think of medical fields.

Lastly, he explains how formulation is key to understanding machine learning.

  • Formulation is the key. That is: in this case, this is what I want, and here are illustrations of what I want, to then run through ML.
  • What are the inputs, and what are the outputs you are looking for?
  • Recognize machine learning problems all over the place.

Three requirements for using machine learning:

  1. The problem can be formulated as a machine learning problem. Formulation is the key. That is: in this case, this is what I want, and here are illustrations of what I want, to then run through ML.
  2. There is much relevant data available that could be used by machine learning algorithms.
  3. The system has enough regularity in it that there are patterns to be learned.

Unit 2: Business applications of machine learning [± 2 hours 30 minutes]

An executive’s guide to machine learning:

In an article from McKinsey & Company, machine learning is explored from an executive’s perspective. The article discusses how organizations are using machine learning for insights, the importance of strategy in getting started with machine learning, and the role of senior executives in leading such initiatives.


  • Perceiving large amounts of data from sensors in the world, and learning to recognize what is there.
  • Recognizing movement, sounds, temperatures, light, vibrations, faces, retinas, fingerprints, scenes for autonomous cars…
  • Key points and strategy
    • Applications like Shoegazer could be used to reduce marketing costs (customers point the app at a pair of shoes and click to buy a pair for themselves) or to support the development of new features for shoes (customers use the app to show features they like). This kind of machine learning can be used not only for images, but also for sounds (for example, Shazam is a mobile phone app that can identify songs).

Image analysis

  • Key points and strategy
    • People play a role in improving the accuracy of the scene analysis by identifying the objects that the machines have flagged as unknown, and then feed those images into the training set for the machine, so that future versions become smarter. Mobileye’s customers use the company’s AI technology to differentiate their products.


  • Predicting what will happen: fraud, disease, mechanical failure, crop yields.
  • These models can be customized by firm and by customer base, to fit the specific situation.
  • Key points and strategy
    • As Professor Lo explained, machine learning algorithms examine different parameters from those which traditional credit-scoring models examine. The insights gained from the machine learning techniques provide banks with a new lens that enables banks to predict consumer delinquency with a higher, finer-grained accuracy. Using this type of machine learning application, banks can pursue a cost reduction strategy that reduces their losses from non-payment.
    • As this example shows, AXA uses machine learning to predict which customers are most likely to cause accidents that would cost AXA US$10,000 or more. Such predictions support a focus-based strategy, because they enable AXA to write policies only for lower-risk customers.
    • PayPal uses machine learning to support a low-cost strategy of being efficient in customer service while avoiding fraud and customer inconvenience. Specifically, PayPal uses machine learning to determine whether a visitor to the site is a trustworthy customer. If the user seems suspicious, the system will ask for additional verification. The combination of neural networks, vast quantities of data, and deep learning has greatly improved fraud detection, but the machines cannot do it alone; people need to decide which data is relevant for the machine to use.

Personalizing product offerings

  • Key points and strategy
    • Pursuing a strategy of differentiation, Netflix combines machine learning with human curation to tailor its offerings precisely to the tastes of each individual customer.
    • Like Netflix, Stitch Fix uses machine learning to pursue its strategy of differentiation, by personalizing its product offerings for each customer. Furthermore, human stylists and the company’s algorithms work hand in hand. The algorithms augment the human stylists’ productivity by doing tedious tasks such as matching client measurements to different brands and products. The stylists meanwhile read the personal notes that customers have sent and analyze their Pinterest boards to determine each customer’s nuances of style and taste. Stitch Fix’s approach illustrates three lessons about how to combine human expertise with AI systems. 
      1. It’s important to keep humans in the business-process loop; machines can’t do it alone.
      2. Companies can use machines to supercharge the productivity and effectiveness of workers in unprecedented ways.
      3. Various machine-learning techniques should be combined to effectively identify insights and foster innovation.

Improving product performance using better predictions

  • Prof. Randall Davis… Digital Cognition Technologies (DCTclock)
  • Analyze the product and the process, to screen people for cognitive problems.
  • Benefits
    • Reliability – test always done the same way from person to person.
    • Detailed and informative measure – accuracy of screener
    • Early indications of cognitive problems 
  • You can do the same in other aspects of business: know it before it breaks. This requires:
    • Good sensors
    • Good human interpreters of sensors
    • Good machine learning tools
  • It’s not just about the technology; it’s an interplay between humans and computers.
  • Humans: development of the initial system, and then dealing with exceptions.
  • Analyze pen strokes for classification, then take in geometric and temporal data for assessment, distinguishing infirm from normal.
    • Sketch understanding
    • Recognize the most indicative features of the data gathered.
  • Buyer considerations for AI applications
    • Does it work? Show me tests and performance; show me failures.
    • Do you have the right kind of expertise in-house to develop, improve, and maintain it?
    • Keep to the “no magic” principle.
  • Key points and strategy
    • As Professors Malone and Davis discussed, the machine learning aspect of the DCTclock can be generalized to many different applications of a screening test to provide early indication of a potential problem. In the case of the DCTclock, the machine looks for patterns that indicate cognitive impairment. The machine was trained using an expert’s knowledge (a neuroscientist) about the features that are early indicators of potential problems. The machine learned to find the patterns that indicate the early onset of impairment. Human and computer work together in that the machine can suggest which patients a clinician may want to watch or follow up with, and the clinician then provides the diagnosis and treatment therapy. The same process applies to the early screening of any kind of problem, such as a factory machine or a jet engine. Computers can alert people to an engine that may need repair, for example. The people then use their expertise to pinpoint the cause of the upcoming problem and take preventative action. Systems like these can support a differentiation strategy by allowing for more accurate predictions about future problems.
  • Professor Alex Pentland from MIT talks about honest signals and how they can be used to interpret human intention and emotion. He talks about the “second language” that people use, consisting of body language and signaling behavior, as a result of some of our basic neural processes. Professor Pentland uses the example of Cogito’s software, which has been used to analyze conversations between call center agents and clients in real time. Computers and people work together, as the machine learning system interprets whether the conversation is going well or not, and then provides feedback to the agent to adjust their behavior for a more engaging and effective conversation. 
    • Listening to the non-linguistic aspect of a conversation to make call center reps more effective.
    • Think about the signals we cannot see or hear when we focus just on words.
    • Cogito…slow down, stop talking, redirect…
      • increase customer engagement
      • more effective communication
      • less conflict
      • lower call center turnover
    • Key points and strategy
      • Professor Pentland’s research has found that “unspoken” language – the nonlinguistic signaling of interest, attention, dominance, and so on – accounts for 40–50% of the outcome of a conversation. People tend to focus consciously on words, so it is more difficult for them to tap into these signals, but computers can help them out. Professor Pentland used supervised machine learning in creating Cogito to “listen” to these unspoken features and predict how the conversation between a customer service representative and a customer is going. Cogito’s software supports a differentiation strategy by facilitating higher-quality customer service interactions by letting representatives know if the customer is paying attention or is getting angry, for example.

Module Artifacts:

  • MIT AI M2U1 Video 1 Transcript.pdf
  • MIT AI M2U1 Video 2 Transcript.pdf
  • MIT AI M2U1 Video 3 Transcript.pdf
  • MIT AI M2U1 Video 4 Transcript.pdf
  • MIT AI M2U2 Casebook Video 1 Transcript.pdf
  • MIT AI M2U2 Casebook Video 2 Transcript.pdf
  • MIT AI M2U2 Casebook Video 3 Transcript.pdf
  • MIT AI M2U2 Casebook Video 4 Transcript.pdf
  • MIT AI_M2 U2 Casebook.pdf