Using Machine Learning to Improve UX

PredictionIO and Drupal
Mug shot of Chris Z

@aczietlow

Drupal Lead Engineer, Mindgrub
Mug shot of Colin C

@ccrampton

Software Engineer, Mindgrub

Mindgrub

A technical agency that also happens to specialize in Drupal

Mindgrub office and culture ping pong table

How'd we get here?

1997 - IBM's Deep Blue defeats Garry Kasparov at chess

IBM deep blue chess victor

2011 - IBM's Watson crushes it on Jeopardy!

IBM watson winning jeopardy

2012 - Google's X lab builds an ML algorithm that learns to identify cats in YouTube videos

Cat gif

2014 - Sergey Brin unveils Google's self-driving car prototype

Google's self driving car

Machine Learning

Tweet about machine learning

Machine Learning

What it isn't

  • Magic
  • AI
  • A replacement for anyone's job
  • Skynet ... yet
skynet

Machine Learning

What it is

  • Predictive analysis
  • Data modeling
  • Uses existing algorithms to discover patterns
  • It learns

Machine Learning

How it learns

Supervised
All data is labeled, and the algorithm learns to predict the label (outcome) of new data.
Unsupervised
Data is not labeled, and the algorithm learns to find structure in the data.
Semi-supervised
Some of the data is labeled, but most is not.
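To make the labeled-versus-unlabeled distinction concrete, here is a toy sketch in Scala (the language of the engine code later in this deck). It is not PredictionIO code. The data and the names `trainCentroids`, `nearestCentroid`, and `kMeans1D` are hypothetical. The supervised half averages labeled 1-D points into one centroid per label and predicts the nearest label for a new point. The unsupervised half runs k-means on the same points with the labels dropped and finds the two groups on its own.

```scala
object LearningStyles {
  // Supervised: every training point has a label. "Training" here just
  // averages the points that share a label into one centroid per label.
  def trainCentroids(labeled: Seq[(Double, String)]): Map[String, Double] =
    labeled.groupBy(_._2).map { case (label, pts) =>
      label -> pts.map(_._1).sum / pts.size
    }

  // Prediction: the label whose centroid is closest to x.
  def nearestCentroid(centroids: Map[String, Double], x: Double): String =
    centroids.minBy { case (_, c) => math.abs(x - c) }._1

  // Unsupervised: no labels. k-means alternates two steps, starting from `init`:
  // assign each point to its closest center, then move each center to the
  // mean of its assigned points (an empty cluster keeps its old center).
  def kMeans1D(points: Seq[Double], init: Seq[Double], iterations: Int): Seq[Double] =
    (1 to iterations).foldLeft(init) { (centers, _) =>
      val clusters = points.groupBy(p => centers.indices.minBy(i => math.abs(p - centers(i))))
      centers.indices.map(i => clusters.get(i).map(ps => ps.sum / ps.size).getOrElse(centers(i)))
    }
}

val labeled = Seq(1.0 -> "small", 2.0 -> "small", 9.0 -> "large", 11.0 -> "large")
println(LearningStyles.nearestCentroid(LearningStyles.trainCentroids(labeled), 3.0)) // small
println(LearningStyles.kMeans1D(labeled.map(_._1), init = Seq(0.0, 5.0), iterations = 5))
```

With these points, both halves end up with centers at 1.5 and 10.0. The difference is that k-means had to discover which points belong together without being told.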

Supervised

Unsupervised

Machine Learning

Simpsons: with great power comes great responsibility

PredictionIO

  • Open source (Apache License 2.0)
  • Has a library of algorithms (engines) available
  • Easy to wire into web apps
  • Has an awesome animal mascot
PredictionIO Frog logo

PredictionIO

  • Communicates with your web apps via REST
  • Event Server
  • Data storage
  • Machine learning engines

PredictionIO: Event Server

  • Stores events in Apache HBase, MySQL, or PostgreSQL
  • Passes data on to ML engines
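Apps feed the Event Server over plain HTTP. The sketch below assumes the Event Server is running on its default port, 7070, and that you have an app access key (as printed by `pio app new`). `EventServerClient`, `viewEventJson`, and the "view" event name are illustrative choices, not required names. Using the JDK 11+ `java.net.http` client, it builds a `POST /events.json?accessKey=...` request but does not send it.

```scala
import java.net.URI
import java.net.URLEncoder
import java.net.http.HttpRequest
import java.nio.charset.StandardCharsets

object EventServerClient {
  // Assumption: Event Server on its default port 7070 on this machine.
  val baseUrl = "http://localhost:7070"

  // JSON for a hypothetical "user viewed item" event. It is hand-built to stay
  // dependency-free, so ids must not contain quotes or backslashes.
  def viewEventJson(userId: String, itemId: String): String =
    s"""{"event":"view","entityType":"user","entityId":"$userId","targetEntityType":"item","targetEntityId":"$itemId"}"""

  // Builds (but does not send) POST /events.json?accessKey=<key> with the event as the body.
  def viewEventRequest(accessKey: String, userId: String, itemId: String): HttpRequest = {
    val key = URLEncoder.encode(accessKey, StandardCharsets.UTF_8)
    HttpRequest.newBuilder(URI.create(s"$baseUrl/events.json?accessKey=$key"))
      .header("Content-Type", "application/json")
      .POST(HttpRequest.BodyPublishers.ofString(viewEventJson(userId, itemId)))
      .build()
  }
}
```

To send the request, call `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`. It should return `201 Created` with the new event's ID if the Event Server accepts the event. A real app would use a JSON library so that IDs containing quotes are escaped correctly.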

PredictionIO: Engine

  • The 'brain' of the stack
  • Contains one (or more) algorithms that train models from your data
  • Can be queried to make predictions
  • More data === better
PredictionIO Event Server
experimental
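Once an engine is trained and deployed, it answers queries over HTTP. This sketch assumes `pio deploy` is serving on its default port, 8000, and that the engine expects a query shaped like the Recommendation template's query: `user`, plus `num` for the number of items wanted. Other templates expect other fields. `EngineClient` and its methods are hypothetical names. Like the event example, it only builds the `POST /queries.json` request and does not send it.

```scala
import java.net.URI
import java.net.http.HttpRequest

object EngineClient {
  // Assumption: engine deployed with `pio deploy` on its default port 8000.
  val baseUrl = "http://localhost:8000"

  // Query body assumed from the Recommendation template: a user id and a result count.
  // Check your template's Query class for the fields it actually expects.
  def recommendQueryJson(userId: String, num: Int): String =
    s"""{"user":"$userId","num":$num}"""

  // Builds (but does not send) POST /queries.json with the query as a JSON body.
  def recommendRequest(userId: String, num: Int): HttpRequest =
    HttpRequest.newBuilder(URI.create(s"$baseUrl/queries.json"))
      .header("Content-Type", "application/json")
      .POST(HttpRequest.BodyPublishers.ofString(recommendQueryJson(userId, num)))
      .build()
}
```

The engine replies with JSON. For recommendation templates, this is typically an `itemScores` array of `item`/`score` pairs.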

Demo #1

Product Recommendations

Initial setup

  • Have PredictionIO & dependencies installed
    • We have a self-contained Docker image
    • Works beautifully with Docksal
  • Install the engine templates you want
  • PredictionIO documentation is your friend

Build the engine

    pio build --verbose
    
    	[INFO] [Engine$] Using command '/PredictionIO-0.12.0-incubating/sbt/sbt' at /engines/similar-text to build.
    	[INFO] [Engine$] If the path above is incorrect, this process will fail.
    	[INFO] [Engine$] Uber JAR disabled. Making sure lib/pio-assembly-0.12.0-incubating.jar is absent.
    	[INFO] [Engine$] Going to run: /PredictionIO-0.12.0-incubating/sbt/sbt  package assemblyPackageDependency in /engines/similar-text
    	[INFO] [Engine$] [info] Loading project definition from /engines/similar-text/project
    	[INFO] [Engine$] [info] Set current project to pio-template-text-similarity (in build file:/engines/similar-text/)
    	[INFO] [Engine$] [success] Total time: 2 s, completed Aug 18, 2018 3:30:58 PM
    	[INFO] [Engine$] [info] Including from cache: scala-library-2.11.8.jar
    	[INFO] [Engine$] [info] Checking every *.class/*.jar file's SHA-1.
    	[INFO] [Engine$] [info] Merging files...
    	[INFO] [Engine$] [warn] Merging 'META-INF/MANIFEST.MF' with strategy 'discard'
    	[INFO] [Engine$] [warn] Strategy 'discard' was applied to a file
    	[INFO] [Engine$] [info] Assembly up to date: /engines/similar-text/target/scala-2.11/pio-template-text-similarity-assembly-0.1-SNAPSHOT-deps.jar
    	[INFO] [Engine$] [success] Total time: 12 s, completed Aug 18, 2018 3:31:09 PM
    	[INFO] [Engine$] Compilation finished successfully.
    	[INFO] [Engine$] Looking for an engine...
    	[INFO] [Engine$] Found pio-template-text-similarity_2.11-0.1-SNAPSHOT.jar
    	[INFO] [Engine$] Found pio-template-text-similarity-assembly-0.1-SNAPSHOT-deps.jar
    	[INFO] [Engine$] Build finished successfully.
    	[INFO] [Pio$] Your engine is ready for training.
    						

Train the engine

    pio train
    
    	[INFO] [CoreWorkflow$] Inserting persistent model
    	[INFO] [CoreWorkflow$] Updating engine instance
    	[INFO] [CoreWorkflow$] Training completed successfully.
    						

Demo

For when the live demo doesn't work

    
    	{
    	  "itemScores":[
    		{"item":"i0","score":0.7071067811865475},
    		{"item":"i1","score":0.7071067811865475},
    		{"item":"i2","score":0.5773502691896258},
    		{"item":"i3","score":0.5773502691896258}
    	  ]
    	}
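The scores in this fallback response look like cosine similarities. 0.7071... is 1/√2 and 0.5773... is 1/√3, which are the values cosine similarity gives for simple 0/1 vectors that share one feature. That is only an observation about the numbers, not a claim about what the demo engine computed internally. Below is a minimal Scala sketch of cosine similarity, using hypothetical item vectors.

```scala
object Cosine {
  // Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1] for non-zero vectors.
  // Returns 0.0 when either vector is all zeros (the ratio is undefined there).
  def cosine(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "vectors must have the same length")
    val dot = a.indices.map(i => a(i) * b(i)).sum
    val norms = math.sqrt(a.map(x => x * x).sum) * math.sqrt(b.map(x => x * x).sum)
    if (norms == 0.0) 0.0 else dot / norms
  }
}

// Hypothetical 0/1 feature vectors, e.g. "was this item viewed by user k".
val target = Array(1.0, 0.0, 0.0)
println(Cosine.cosine(target, Array(1.0, 1.0, 0.0))) // about 0.7071, i.e. 1/sqrt(2)
println(Cosine.cosine(target, Array(1.0, 1.0, 1.0))) // about 0.5774, i.e. 1/sqrt(3)
```

Cosine similarity ignores vector length and compares only direction. This is why it is a common choice for ranking "similar" items, whose raw counts can differ widely.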
    						

Prediction Engine

You can build your own

    
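      import java.text.Normalizer
      import scala.util.Try
      import org.apache.spark.mllib.feature.Word2VecModel
      import org.apache.spark.mllib.linalg.{Vector, Vectors}

      // Adds n into m element by element (mutating m) and returns m;
      // handy for accumulating word vectors into a running sum.
      def sumArray(m: Array[Double], n: Array[Double]): Array[Double] = {
        for (i <- m.indices) m(i) += n(i)
        m
      }

      // Divides every element of m by divisor (mutating m) and returns m,
      // e.g. to turn a summed vector into an average.
      def divArray(m: Array[Double], divisor: Double): Array[Double] = {
        for (i <- m.indices) m(i) /= divisor
        m
      }

      // Returns the Word2Vec embedding of w. transform throws for words outside
      // the model's vocabulary, so those fall back to a zero vector of size s.
      def wordToVector(w: String, m: Word2VecModel, s: Int): Vector =
        Try(m.transform(w)).getOrElse(Vectors.zeros(s))

      // Lowercases a line and strips accents: NFKD splits "é" into "e" plus a
      // combining mark, and the regex then removes the combining marks.
      def normalizet(line: String): String =
        Normalizer.normalize(line, Normalizer.Form.NFKD)
          .replaceAll("\\p{InCombiningDiacriticalMarks}+", "")
          .toLowerCase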
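Judging by their names, the helpers above are the pieces needed to turn a sentence into a single vector: look up each word, add the vectors together, and divide by the word count. That reading is an inference from the code, not something the slide states. Below is a Spark-free sketch of that averaging step. A hypothetical two-word embedding table (`toyEmbeddings`) stands in for a trained `Word2VecModel`.

```scala
object DocVector {
  // Hypothetical toy embeddings standing in for a trained Word2VecModel.
  val dim = 2
  val toyEmbeddings: Map[String, Array[Double]] = Map(
    "red"  -> Array(1.0, 0.0),
    "shoe" -> Array(0.0, 1.0)
  )

  // Like wordToVector: words missing from the table become a zero vector.
  def lookup(word: String): Array[Double] =
    toyEmbeddings.getOrElse(word, Array.fill(dim)(0.0))

  // Lowercase and split on whitespace, add the word vectors into a fresh
  // accumulator (the sumArray step), then divide by the word count (divArray).
  def docVector(text: String): Array[Double] = {
    val words = text.toLowerCase.split("\\s+").filter(_.nonEmpty)
    val sum = Array.fill(dim)(0.0)
    for (w <- words) {
      val v = lookup(w)
      for (i <- sum.indices) sum(i) += v(i)
    }
    if (words.isEmpty) sum else sum.map(_ / words.length)
  }
}
```

The accumulator is a fresh array on purpose. Summing into an array returned by the lookup table, as an in-place `sumArray` would if handed one, would silently corrupt the embeddings.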
    
    					

Prediction Engine

Proudly invented elsewhere

To the library!

  • Recommendations
    • The Universal Recommender - Correlated Cross-Occurrence (CCO) algorithm
    • Recommendation - Collaborative filtering
    • E-Commerce Recommendation - fork of the MLlib ALS algorithm for e-commerce
    • E-Commerce Recommendation (Java) - Like before but now with more Java!
    • Similar Product - recommends similar items based on users' actions
    • Product Ranking
    • Music Recommendations
    • Complementary Purchase
    • Viewed This Bought That
    • Frequent Pattern Mining - FP-Growth algorithm
    • Similar Product with Rating
  • Classification
    • Classification - Naive Bayes algorithm
    • Lead Scoring - likelihood that a user will convert
    • Text Classification - uses the OpenNLP library for text vectorization
    • Churn Prediction - user attrition rates across cell phone carriers
    • Classification Deeplearning4j - uses the Deeplearning4j library
    • Document Classification
    • Circuit End Use Classification - predicts circuit energy consumption
    • GBRT_Classification
    • classifier-kafka-streaming-template
    • Sentiment Analysis
    • Classification template for Iris
  • Regression
    • Survival Regression - profiling customers who have a higher survival rate
    • Deep Learning Energy Forecasting
    • Electric Load Forecasting
    • Linear Regression BFGS
    • Regression template for Boston House Prices
  • Natural Language Processing
    • Text similarity engine
    • Topic Labelling with Wikipedia - auto-tagging
    • Text Classification
    • OpenNLP Sentiment Analysis Template
    • Recursive Neural Networks (Sentiment Analysis)
  • Clustering
    • Topic Model (LDA)
    • KMeans-Clustering-Template - KMeans algorithm
  • Similarity
    • Content Based SVD Item Similarity Engine
    • Similar Product with Rating
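Several templates in the recommendation group, such as Complementary Purchase and Viewed This Bought That, are built around items that show up together. The sketch below is not any template's implementation. It is a hypothetical, in-memory illustration of the simplest version of that idea: count co-occurring item pairs across baskets, then rank an item's partners by count. `CoOccurrence`, `pairCounts`, and `alsoBought` are made-up names. Real templates add statistical scoring and run on Spark.

```scala
object CoOccurrence {
  // Counts how often each ordered pair (a, b) of distinct items appears in the same basket.
  def pairCounts(baskets: Seq[Set[String]]): Map[(String, String), Int] =
    baskets
      .flatMap(b => for (x <- b.toSeq; y <- b.toSeq if x != y) yield (x, y))
      .groupBy(identity)
      .map { case (pair, hits) => pair -> hits.size }

  // The n items most often seen with `item`, highest count first; ties sorted by name.
  def alsoBought(counts: Map[(String, String), Int], item: String, n: Int): Seq[String] =
    counts.toSeq
      .collect { case ((a, b), c) if a == item => (b, c) }
      .sortBy { case (b, c) => (-c, b) }
      .take(n)
      .map(_._1)
}

// Hypothetical order history: each Set is one checkout.
val orders = Seq(Set("tent", "stove"), Set("tent", "stove", "lantern"), Set("tent", "lantern"), Set("tent", "stove"))
println(CoOccurrence.alsoBought(CoOccurrence.pairCounts(orders), "tent", 2)) // stove first, then lantern
```

Raw counts favor items that are popular overall. This is one reason production recommenders score co-occurrences statistically instead of simply counting them.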

Demo #2 (Mindgrub.com)

Stay tuned

Releasing official PredictionIO Docker images

Drupal Commerce demo store

Drupal module(s)? Maybe

Resources

fin project stop