Damien SOUKHAVONG


MSc in Auditing, Management Accounting, and Information Systems

"If you fail at feature engineering, it is because you did not try enough."

Data Science
Example Projects


Laurae Education: Data Science / Artificial Intelligence Training
Ubisoft: Intern training, without seeing data, without being present physically

Bosch Production Line Performance: Reduce manufacturing failures
Draper Satellite Image Chronology: Can you put order to space and time?

Laurae Education
Data Science / Artificial Intelligence Training

As the years passed and my "clients" wanted to pay me for my services, I started my own company.
Unofficially, I had already been teaching since 2010.


So far, I have trained:

  • over 2,000 people for over 10,000 cumulative hours over the last 6 years
  • in reputable classes (ENOES, SKEMA Business School...), one-to-one, or remotely
  • in many domains (Data Science, IT, Taxation, Business, Accounting...)
  • for diverse backgrounds (Design, Arts, Astrophysics, Media, Business, Finance...)
  • for any academic level (Professionals, PhD / Doctorate, Master, Bachelor...)
  • for many purposes (High Frequency Trading, Research, Artificial Intelligence, Poker, Auditing, Fraud Management...)

Key features:

  • Small scale training
  • High demand
  • Handle diverse backgrounds
  • Adapt to the purpose

Key achievements:

  • Very happy customers (95%+) who want to pay for a free service
  • Sustainability of teaching (6 years, over 3,000 clients and counting)
  • Increasingly data science-centric (over 1,000 trained in the last 2 years)
  • Sharing and gaining knowledge from atypical domains (High Frequency Trading, Poker...)

Key skills used:

  • Teaching / Training
  • Interpersonal Communication
  • Time Management
  • Flexibility
  • Adaptability
  • Self-Confidence

Ubisoft
Intern training, without seeing data, without being present physically

In 2015, Ubisoft hired an intern to predict the sales of its future games from the sales of previous games. The difficulty was the need for a correct workflow to get from data to predictions within a 2-3 month project.


I was contacted to help the intern (with no objection from Ubisoft's management team), and I guided the intern from analyzing the data to building a model with an appropriate validation scheme.


The major challenge was not teaching an intern basic Data Science skills: it was working remotely while also being unable to see any data. Therefore, I had to rely on my own judgement and ask precise, understandable questions to assess whether a quantitative or qualitative value reported by the intern was an expected result or a wrong one.


The Business Intelligence Manager and the management at Ubisoft Germany were impressed by the presentation the intern gave at the end of the project, and had never seen anything like it.

Key features:

  • Remote working challenge
  • Inability to see the data
  • Training an intern on Data Science
  • Relying on the intern's interpretations to assess whether expectations were met or not

Key achievements:

  • The Business Intelligence Manager and the management of Ubisoft Germany were impressed
  • Successfully trained the intern remotely
  • Guided the intern to produce excellent Data Science work
  • The presentation was so good that the intern was invited, along with the school program manager, to present to Ubisoft's top management (worldwide)

Key skills used:

  • R
  • Teaching / Training
  • Knowledge Transfer
  • Data Science
  • Machine Learning
  • Model Validation
  • Statistical Testing / Inference
  • Presentation
  • Self-Confidence

Bosch Production Line Performance
Reduce manufacturing failures

A good chocolate soufflé is decadent, delicious, and delicate. But, it is a challenge to prepare. When you pull a disappointingly deflated dessert out of the oven, you instinctively retrace your steps to identify at what point you went wrong. Bosch, one of the world's leading manufacturing companies, has an imperative to ensure that the recipes for the production of its advanced mechanical components are of the highest quality and safety standards. Part of doing so is closely monitoring its parts as they progress through the manufacturing processes.


Because Bosch records data at every step along its assembly lines, they have the ability to apply advanced analytics to improve these manufacturing processes. However, the intricacies of the data and complexities of the production line pose problems for current methods.


In this competition, Bosch's challenge is to predict internal failures using thousands of measurements and tests made for each component along the assembly line. This would enable Bosch to bring quality products at lower costs to the end user.

Key features:

  • Extremely unbalanced Binary Classification (approximately 1:200)
  • Big data set (500GB+ in memory in Python when fully loaded, about 10 billion elements)
  • Very slow modeling / data mining speed along with unreliable performance metric

Key achievements:

  • 8th (top 1%), gold medal
  • Project-based Data Science work with a multicultural team of 8 people (from all over the world) who joined exclusively to learn from me
  • Laborious screening of candidates for my team (too many candidates)
  • Managed a team of 8 to work towards success
  • Earned Master tier at Kaggle in Data Science competitions / discussions, which many people had been asking of me for months
  • Created the first R wrapper package for LightGBM (and the fastest at release, compared with Python)

Key skills used:

  • R / Python / SQL / JMP / SPSS
  • Cross-validation
  • Team Management
  • Project Management
  • Teaching / Training
  • Knowledge Transfer
  • Reporting
  • Gradient Boosted Trees
  • Random Forests
  • Neural Networks

Draper Satellite Image Chronology
Can you put order to space and time?

Imagine a world where we can use satellite images to help find better access to clean water, prevent poaching of wildlife, predict storms more efficiently, optimize traffic patterns more readily, and inform human behaviors to mitigate the spread of disease.


Thanks to a marked increase of satellites in orbit, we will be able to capture images – and the information contained within – of nearly every place on Earth, every day by 2017. However, our ability to analyze datasets of these images has not advanced as quickly. Changes from day to day in images of the same location are subtle, can be hard to detect, and are difficult to understand in terms of their significance.


In this competition, Draper provides a unique dataset of images taken at the same locations over 5 days. The challenge is to predict the chronological order of the photos taken at each location. Accurately doing so could uncover approaches that have a global impact on commerce, science, and humanitarian works.

Key features:

  • Image-based ordering (ranking)
  • Any wrong image order (even if correct in the right order) provides the inverse score
  • Extremely small data set (70 sets of 5 pictures for training)
  • Very large pictures (3200x2400), prone to overfitting
  • Hand labeling authorized due to the difficulty of the task

Key achievements:

  • 2nd (top 8%) in only 2 hours
  • Showcased on Kaggle's Data Science blog (Interview Draper / Kaggle)
  • Image-based feature extraction
  • Transformation of Ranking to Regression task to counteract overfitting

Key skills used:

  • R / Python / ImageJ / Tableau
  • Cross-validation
  • Image Registration
  • Image Masking
  • Image Feature Extraction
  • Gradient Boosted Trees
  • Deep Learning (Neural Networks)
  • Transformation of Ranking to Regression task to avoid overfitting

Educational Background

- SKEMA Business School (2013-2016), top 6-9 business school in France, 26th worldwide:
   - MSc Auditing, Management Accounting, Information Systems, obtained with Distinctions
   - Thesis Major (1st) using data science, data visualization, statistics, and machine learning

- ENOES (2010-2013), the top accounting school in France for decades: double degree & certification:
   - Diplôme de Comptabilité et de Gestion (Accounting and Management Degree)
   - Diplôme Supérieur d'Audit et de Comptabilité (Auditing, Finance, and Accounting Higher Degree), obtained with Distinctions
   - Certification: Payroll Management Certification

SKEMA Business School (13-16)

Graduate degree: MSc Auditing, Management Accounting, Information Systems

Although not directly related to data science, several curriculum components were (in)directly linked to data science:
  • Business Intelligence / SQL
  • Statistics for Finance
  • Data Analysis / VBA / Interactive Reporting
  • Thesis


My thesis (Reporting visuals in both diagnosing and forecasting and its impact on decision-making in small businesses) covers, in 172 pages, several data science and related themes, such as:

  • Data Visualization / Business Intelligence
  • Machine Learning
  • Statistics
  • Artificial Intelligence


    A non-exhaustive list of technical elements used is below:
            • Research Design: blind trial model, randomized trial model, crossover trial model, controlled trial model, convenience purposive nonrandom sampling, partial counter-balancing within-subject design, cross-validation
            • Data Visualization/Statistical values: Descriptive Statistics, Pearson / Spearman Correlation Coefficient, Kendall tau-b, Gamma / Chi-Square / Inertia / Communality statistic, Kruskal's stress type-1, Shepard Diagram, Lebert's test-values, Correlation / Similarity / Dissimilarity Matrices, Q-Q plots, Scree plot, Correlation chart / map, (Standardized) Regression residuals
            • Statistical Tests: Monte-Carlo / .632 Bootstrapping simulation, Mann-Whitney U-test, Fisher's exact test, Friedman test, Breusch-Pagan test, White's test, Shapiro-Wilk test, Anderson-Darling test, Lilliefors test, Jarque-Bera test, Kaiser-Meyer-Olkin measure of sampling adequacy, Bartlett's sphericity test, Wilks' Lambda test, Likelihood ratio
            • Machine Learning: Linear regression, Clustering via Scaling Expectation-Maximization, Decision Trees, Chi-Square Automatic Interaction Detector, Principal Component Analysis / Factor Analysis, Multidimensional Scaling, Correspondence Analysis / Multiple Correspondence Analysis, Agglomerative Hierarchical Clustering

            Key achievements:

            • Thesis Major (1st)
            • Degree with Distinctions
            • Delegate / Sub-delegate
            • Requested to teach with the Data Analysis / Interactive reporting teacher
            • Finished the 2-day SAP task alone, 1 day early, without any extensive experience with SAP

            ENOES (10-13)

            Double degree & Certification:
            - Diplôme de Comptabilité et de Gestion (Accounting and Management Degree)
            - Diplôme Supérieur d'Audit et de Comptabilité (Auditing, Finance, and Accounting Higher Degree), obtained with Distinctions
            - Certification: Payroll Management Certification

            Courses received but also taught back.

            Although not directly related to data science, I held a position at ENOES (the leading accounting school in France) that allowed me to teach small classes (while I was a student) and large classes (once I was no longer a student).


            As ENOES is a non-profit organization, the courses taught are also free of charge (except for external students).

            Small classes could include:

            • Financial Mathematics
            • Corporate Finance
            • Information Systems
            • Accounting Management
            • Law / Taxation
            • Statistics

            Large classes could include:

            • Financial Mathematics
            • Information Systems
            • Accounting Management
            • Mathematics for Management
            • Machine Learning for Auditing/Accounting Management 
            • Data Science for Auditing/Accounting Management

            Key achievements:

            • Over 50 mini-classes taught throughout 2010-2013
            • 6 large classes taught throughout 2014-2016
            • 8th prize in France in Auditing, Finance, and Accounting (Francis Lefebvre)
            • Class major 6 times over 3 years (Honors)
            • Class Delegate

            Key skills:

            • Training / Teaching
            • Knowledge transfer
            • Domain knowledge
            • Small scale understandable Data Science / Machine Learning

            Work Experience

            Running my own training business: Laurae Education
            Cofounder of UPECS, the leading online non-profit in Science/Calculators
            Sysadmin / Developer / Trainer at SYPRO Formation
            Reporter / Web Editor at meltygroup
            Assistant SEO Project Manager at Meilleur Mobile

            Laurae Education

            My own business: providing Data Science courses, Information Systems services, Consulting, and Examination.
            Possibility to pursue an M.Ed in Digital Learning and Leading at Lamar University.

            I created this personal company to receive money from clients who wanted to pay me for my services, which I usually provided for free. The services ran from 2010, and the company has been closed since 2016.


            At its core, it is a Certiport examination center, providing examination facilities for any candidate who wishes to take an exam. It can provide examination seats for Microsoft (Office, Educator, Technology), Adobe (Associate, Expert), Autodesk (User), IC3, and Intuit Quickbooks. In addition, I was trusted enough to run my own mobile examination center, which means I can allow a candidate to take an exam from anywhere, as long as the candidate meets the minimum standards to take an exam.


            I also provide technical assistance and support for specific Certiport examination centers.


            In addition to the exams, I teach courses and provide trainings in many domains, such as:


            • Data Science / Artificial Intelligence / Statistics
            • Information Systems
            • French Law
            • Business Intelligence / Reporting
            • Research
            • Mathematics
            • Finance
            • Creativity / Design


            Nearly all my courses are free of charge and last one week, except for one-to-one teaching, which can last from a single hour to several days. I have taught over 2,000 students.

            I teach in training centers, but sometimes also directly in leading schools (accounting / business).

            • Created my own company
            • Trusted to run a mobile examination center
            • Teaching courses in diverse domains
            • Teaching courses in leading schools
            • Ability to provide certifications
            • Ability to provide a pre-examination for pursuing a Graduate Degree in Education in the USA
            Moreover, I have an examination my candidates can take in order to pursue a Graduate Degree in the USA (M.Ed in Digital Learning and Leading) at Lamar University.

            UPECS (Union pour la Promotion des Enseignements et Carrières Scientifiques) 

            The non-profit I cofounded, which is now the leader in its Scientific/Calculators area of expertise.
            Handling the Data Analysis and Budgeting of the organization.
            Sponsored & Funded by Texas Instruments and thousands of individuals.

            I cofounded this non-profit organization with four other members to help students pave their way in any scientific (or non-scientific) domain after secondary school. It is now the leading organization in France, providing the most insightful knowledge about Texas Instruments / Casio / Hewlett-Packard calculators.


            Before the organization was created, I earned the rare spot of Moderator within weeks, and was quickly promoted to Global Moderator (after a complete re-organization) before obtaining the Administrator role.


            We provide a free cloud service, which allows members to create pre-formatted files / programs for their calculators. We also provide an open discussion space (forum / chat) where people can discuss anything.


            We also provide legal support to students when examiners overstep their limits. I handle these cases whenever I can.

            • Quickly earned a rare moderator spot
            • Co-founded with four other members scattered across France
            • First community in France in the scientific/calculator space
            • Providing cloud services for free
            • Providing legal support if needed (free)
            • Supporting students through simple-to-use programs, examination corrections, and technical to very technical articles (for those interested in pushing their calculators very far, for instance using ASM / assembly language)
            • Doing the Data Analysis / Accounting / Budgeting to ensure funds are sufficient to cover the recurring costs.
            This non-profit organization is similar (at a smaller scale in general Education) to another non-profit I have been a member of, Perspectiv' Paris, where I was the Manager of Tutoring Sessions (except it was restricted to Paris, and in any domain).

            SYPRO Formation

            The Sysadmin / Developer / Trainer who managed the company.
            90% IT human time savings and 80% IT fixed costs reduction included.

            This is where I worked, and where I also dedicated some of my free time. It is also the reason I left meltygroup, even though that meant giving up interviewing Ronaldo, Taylor Swift, and other worldwide stars, or running my own chronicle.


            A simple training organization, where I rebuilt all the IT from scratch.


            • Enhanced all IT processes to recover over 90% of the time lost on repetitive tasks
            • Purchased and set up the Server as a Deployment Server (Windows Deployment Services)
            • Reduced IT investment costs by over 80%
            • Trained the Manager on how to reduce IT fixed costs
            • Developed the corporate website while fixing Accounting issues related to it


            In addition, I was a trainer who delivered a 100% success rate at examinations:

            • Microsoft Office Specialist
            • Adobe Certified Associate
            • PCIE


            I hold the examination speed record at PCIE and Microsoft Office Specialist in France, finishing examinations in 5 to 20 minutes, while they are designed to be tight for most candidates and to last at least 40-45 minutes.


            I also managed the company for 2 weeks, to replace the Manager.

            meltygroup

            Reporter / Web Editor about IT, Devices, Fashion, Design and Architecture. Better get prepared to be a specialist.
            "Swag" glasses included on recruitment.

            If you want social interaction with other reporters and stars, this is the place to be.


            I wrote articles in four domains where I have/had extensive knowledge:


            • IT: computers, Intel CPUs...
            • Devices: phones, tablets...
            • Fashion: clothes, dresses...
            • Design / Architecture: buildings, art...


            If you can convince with data, you will get not only time but also opportunities.


            This is a top notch place to work, if you can work there. You will travel often. Wishing to see your manga/anime conventions? Then go ahead.


            I was given my own chronicle to run, which was unfortunately cancelled as I went to SYPRO Formation, another great company with insane learning opportunities. I did not stay long at melty, even though I enjoyed my time there: it is a great place to work.


            Before leaving, I was asked if I wanted to take a developer job at meltygroup, as I was excelling at troubleshooting and providing remedies for their reporting backend. And that was after only 2 weeks: a developer job offer along with a chronicle offer.


            I audaciously asked a reporter to request an interview with Ronaldo. Not only did he get the interview with Ronaldo scheduled, but travel, accommodation, and external fees were also set up and paid by melty, with the airplane tickets booked 7 days ahead.

            Meilleur Mobile

            Assistant SEO Project Manager. A mix of SEO and data analysis optimization.

            Daily reports are a great way to start a workday, as they let you target exactly what you must fix when you see your weekly results. But what if the process to generate them is slow?


            I sped up the reporting process: working 200% faster in the morning means you can dedicate more time to less repetitive tasks:


            • Consolidation of databases
            • Joining multiple databases
            • Cleaning data
            • Analyzing observations meaningfully
            • Not being misled by outliers when assessing significance


            I also took some initiatives: providing bug fixes to the website, spotting quantitative errors by eye in thousands of dynamic pages, providing a new framework for comparing phones...

            Achievements

            Kaggle Data Science: Master tier in Competitions and in Discussions
            Adobe Education: HEC Paris & Speaker at Adobe UK
            Business Game - SKEMA 2013: 3rd sub-universe, 1st creativity
            Certifications: over 40 earned
            Kaggle: Details

            Kaggle Data Science

            The Home of Data Science.

            The scores speak for themselves (as of 21/11/2016):
            - 2nd worldwide in Discussions (Master)
            - Competition Master in less than 9 months in a very competitive environment

            I provide insight whenever I write, and was even recommended by the #1 in Discussions, Walter Reade (aka inversion).

            Adobe Education

            Certificates: Adobe Education Trainer
            Work: Creative Instructor
            Continued education: 2x 75 hours

            Although not directly related to data science, creativity plays a large role when working on processes and attempting to engineer features. I took an active part in the Adobe Education Train the Trainer events, which now occur twice a year. It is a 75-hour training for educators/trainers to learn how to train other educators for creativity.


            I successfully passed all the examinations in the 2014 and 2016 cohorts. I also made a book from the 2014 (391 pages) and Winter 2015 (361 pages) courses (copyrighted by Adobe). They are available only on demand.


            During the 2014 course, I was spotted by the Worldwide Education Program Manager (Melissa Jones) at Adobe, who was impressed by my book, and I was given the opportunity to visit an entrepreneurship class at HEC Paris.


            I was also spotted by Phil Badham (Educational and Creative Technologies Consultancy), an Education Leader and popular trainer known for his "Outstanding Teaching" courses and interventions in recent years. He offered me a speaker slot at future Adobe events in the United Kingdom.


            My final work in 2014 was also highlighted at Adobe Education as an exemplar.


            Key achievements:

            • Spotted by the Worldwide Education Program Manager and offered a class visit at HEC Paris
            • Spotted by a famous UK Education Leader and offered to be a speaker at a future Adobe event
            • Final work highlighted at Adobe Education as an exemplar
            • Earned the Adobe Education Trainer credential twice
            • Worked for Adobe Education as an Instructor

            Key skills used:

            • Expertise in adult learning theory and best practices for training on Adobe products
            • High level of experience with the Adobe Education Exchange and Adobe Connect
            • Strong written and oral communication skills
            • Ability to represent Adobe Education professionally and conscientiously throughout the course

            Business Game – SKEMA Business School 2013

            Although not directly related to data science, this business game required team management, decision-making in an uncertain environment, and forecasting/prediction. With over 400 variables to take into account and some 40 required inputs per cycle, it is not supposed to be an easy task. It also requires exploiting inside-out knowledge of company management (worker happiness, quality…).

            Roles undertaken:

            • Chief Executive Officer
            • Finance Director
            • Production Assistant

            Key achievements:

            • 3rd best team in the mini-universe played
            • Best team overall in creativity, across all campuses
            • When allotted 2 hours of decision-making, the automated analysis I set up could make a safe decision with appropriate insights in under 2 minutes (even if it is only a decision assistant)
            • Individually praised as impressive by several business game supervisors for the quality of the decision-making assistant tool
            • Reverse engineered the predictions with an absolute approximation error of 5% (due to fluctuations from the other teams' decisions)

            Key skills used:

            • Team Management
            • Data Analysis
            • Multivariate Forecasting
            • Reverse Engineering
            • Creativity

            Certifications

            I hold so many certifications (Desktop software, Server, IT Management, Innovation, Education) that it is hard to gather a full list in one place. The most important ones are listed here, sorted by type.


            For proof, please use the following link with the following credentials for Microsoft certifications: 

            Logins: 1140043 / LauraeEdu


            Certifications not appearing in the list are available only on demand.

            Desktop / Management Software Related


            Microsoft – Information Office Worker – Office Desktop


            Master:

            • Microsoft Office 2013 Master Specialist
            • Microsoft Office 2010 Master Specialist


            Word:

            • Microsoft Word 2016 Expert
            • Microsoft Word 2013
            • Microsoft Word 2010 Expert


            Excel:

            • Microsoft Excel 2016
            • Microsoft Excel 2013 Expert
            • Microsoft Excel 2010


            PowerPoint:

            • Microsoft PowerPoint 2016
            • Microsoft PowerPoint 2013
            • Microsoft PowerPoint 2010


            Outlook:

            • Microsoft Outlook 2013
            • Microsoft Outlook 2010


            Adobe:

            • Adobe Certified Associate (Visual Communication)



            Technical IT Related


            Windows Desktop:

            • Windows 8
            • Windows 7
            • Windows 7, Configuring
            • Windows 7 and Office 2010, Deployment
            • Windows 7, Enterprise Desktop Support Technician
            • Windows 7, Enterprise Desktop Administrator


            Windows Server:

            • Solutions Expert: Desktop Infrastructure
            • Server Virtualization with Windows Server Hyper-V and System Center
            • Windows Server 2012
            • Virtualization Administrator on Windows Server 2008 R2 
            • Windows Server 2008 R2, Server Virtualization
            • Windows Server 2008 R2, Desktop Virtualization
            • Networking Fundamentals


            Office 365:

            • Office 365
            • Administering Office 365 for Small Business


            SQL Server:

            • SQL Server 2012/2014


            Dynamics CRM:

            • Microsoft Dynamics CRM 2013 Applications
            • Microsoft Dynamics CRM 2013 Deployment
            • Microsoft Dynamics Customization and Configuration in CRM 2013


            Miscellaneous:

            • Microsoft Certified Professional

            Management / Education Related


            IT Management:

            • Managing Projects and Portfolios with Project Server 2013
            • Managing Projects with Microsoft Project 2013
            • Designing, Assessing, and Optimizing Software Asset Management (SAM)
            • Volume Licensing Specialist, Large Organizations
            • Volume Licensing Specialist, Small and Medium Organizations


            Education-related:

            • Microsoft Certified Trainer (2014-2016 cycles)
            • Microsoft Innovative Educator Trainer
            • Microsoft Innovative Educator
            • Microsoft Certified Educator (UNESCO)
            • Adobe Education Trainer
            • Adobe Certified Educator 
            • Teaching with Technology (TwT)

            Non-Certification Related

            USA Graduate (Master) Degree entrance pre-exam (M.Ed in Digital Learning and Leading)


            I am allowed to have my trainees (in Education) take an examination written jointly by Microsoft & UNESCO, which opens the door to a Master's Degree in the USA (Lamar University), provided they have enough ECTS credits to transfer to an educational curriculum. The examination is based on the UNESCO ICT Competency Framework for Teachers, and is very difficult for those who have never taught in a real classroom or worked as teachers/trainers at any time in their life.


            It requires the following sets of skills:

            • Education Policy 
            • Curriculum and Assessment
            • Pedagogy
            • ICT / Technology Tools
            • Organization and Administration
            • Professional Development


            A second examination, administered by Lamar University, allows candidates to earn up to 18 credit hours before officially becoming a student in the M.Ed in Digital Learning and Leading programme.


            Books


            I wrote several books for the examinations of the Accounting and Management Degree:

            • UE1 Law Introduction (112p)
            • UE2 Corporate Law (102p)
            • UE4 Taxation (123p)
            • UE5 Economics (39+50p)
            • UE7 Management (181+180p)
            • UE8 Information Systems (484p)
            • UE11 Accounting Management (111p)
            • UE11 Mathematics for Management (72p)


            Software and Hobbies

            I am able to use many statistical / data visualization software packages, such as:

            • R / Microsoft R Open / Markdown
            • Python
            • H2O Flow
            • Weka
            • Amelia / Amelia II
            • Tableau
            • SPSS
            • JMP
            • Stata
            • SAS
            • GraphPad Prism
            • XLSTAT
            • Excel Data Mining for SQL Server
            • SQL Server Data Tools for Visual Studio
            • Oracle Crystal Ball
            • RExcel
            • Power BI


            I am also able to create and run virtualized environments. This includes:

            • Windows Server (Desktop Deployment Servers)
            • Hyper-V
            • Virtualbox
            • VMWare
            • Bluestacks (Android)
            • … both on Windows & Linux

            In addition, I do a bit of (graphic) design, e-Learning, and storytelling, along with photography, all of which are passions of mine:

            • Adobe Photoshop (graphic design) 
            • Adobe Captivate (e-Learning)
            • Adobe Presenter (e-Learning / storytelling)
            • iSpring Suite (e-Learning)
            • DAZ Studio (graphic design / 3D)
            • onOne Perfect Photo Suite (graphic design)
            • Nik Software (graphic design)
            • SimpleDiagrams (storytelling)
            • Microsoft Word
            • Microsoft Excel / VBA
            • Microsoft PowerPoint
            • Microsoft OneNote
            • Microsoft Visio
            • Microsoft Project

            Kaggle: Details

            Santander Customer Satisfaction: Which customers are happy customers?
            Predicting Red Hat Business Value: Classify customer potential
            TalkingData Mobile User Demographics: Get to know millions of mobile device users
            Expedia Hotel Recommendations: Which hotel type will an Expedia customer book?
            BNP Paribas Cardif Claims Management: Can you accelerate the claims management process?
            Home Depot Product Search Relevance: Predict the relevance of search results

            Santander Customer Satisfaction
            Which customers are happy customers?

            From frontline support teams to C-suites, customer satisfaction is a key measure of success. Unhappy customers do not stick around. What's more, unhappy customers rarely voice their dissatisfaction before leaving.


            Santander Bank is asking for help to identify dissatisfied customers early in their relationship. Doing so would allow Santander to take proactive steps to improve a customer's happiness before it is too late.


            In this competition, you will work with hundreds of anonymized features to predict if a customer is satisfied or dissatisfied with their banking experience.

            Key features:

            • Heavily unbalanced binary classification (1:20)
            • Anonymized features
            • Small data set (approximately 100k observations for training)
            • Heavily noisy data set (extremely easy to overfit)

            Key achievements:

            • 82nd (top 2%), in a competition where the previous 1st place dropped to 2670th
            • Teamwork (team of 3 people learning from me)
            • One of ten top-100 participants who remained in the top 100 after the overfitting was revealed
            • Theorized a new model ensembling technique that proves useful in unbalanced scenarios
            • Theorized two supervised models: Outlying Univariate Continuous Association Rule Finder, and Outlying Bivariate Linear Continuous Association Rule Finder (via Mahalanobis distance)

            Key skills used:

            • R / Python / JMP / SPSS
            • Cross-validation
            • Deep overfitting awareness
            • Subsampling Bagging
            • Powering Ensembling
            • Gradient Boosted Trees
            • Neural Networks
            • Random Forests
            • k-Nearest Neighbors
            • Application of Regression to Classification
            • Team Management
            • Automated reporting

            Predicting Red Hat Business Value
            Classify customer potential

            Like most companies, Red Hat is able to gather a great deal of information over time about the behavior of individuals who interact with them. They are in search of better methods of using this behavioral data to predict which individuals they should approach—and even when and how to approach them.


            In this competition, the challenge is to create a classification algorithm that accurately identifies which customers have the most potential business value for Red Hat based on their characteristics and activities.


            With an improved prediction model in place, Red Hat will be able to more efficiently prioritize resources to generate more business and better serve their customers.

            Key features:

            • Binary classification
            • 50,000+ Sparse Categorical structural-based classification for 2 million 
            • Time-series dependence

            Key achievements:

            • 87th (top 4%)
            • Teamwork (team of 3 people)
            • Forked the sparsity package in R to make it work with the current version of R, Rcpp, and Rtools for Windows (SVMLight format input/output, around 100-1,000,000x faster than most similar packages)

            Key skills used:

            • R / Python / JMP / SPSS
            • Cross-validation
            • Time-Series Analysis
            • Sparse matrices
            • Linear regression on extremely large sparse matrices (2,000,000 x 1,000,000)
            • Global Optimization by Differential Evolution
            • L-BFGS-B optimization (Limited Memory Broyden–Fletcher–Goldfarb–Shanno Bounded algorithm)
            • Automated Reporting
            • Gradient Boosted Trees
            • Neural Networks
            • k-Nearest Neighbors
            • Factorization Machines

            TalkingData Mobile User Demographics
            Get to know millions of mobile device users

            Nothing is more comforting than being greeted by your favorite drink just as you walk through the door of the corner café. While a thoughtful barista knows you take a macchiato every Wednesday morning at 8:15, it is much more difficult in a digital space for your preferred brands to personalize your experience.


            TalkingData, China’s largest third-party mobile data platform, understands that everyday choices and behaviors paint a picture of who we are and what we value. Currently, TalkingData is seeking to leverage behavioral data from more than 70% of the 500 million mobile devices active daily in China to help its clients better understand and interact with their audiences.


            In this competition, the challenge is to build a model predicting users’ demographic characteristics based on their app usage, geolocation, and mobile device properties. Doing so will help millions of developers and brand advertisers around the world pursue data-driven marketing efforts, which are relevant to their users and catered to their preferences.

            Key features:

            • Hierarchical 6x2-class classification (total: 12 classes)
            • Many small data sets (up to 1,000,000 observations) with extremely high cardinality
            • Data cleaning is mandatory, many useless variables
            • Textual (non-native language), numeric, and categorical features
            • 64-bit integers tripping most statistical programming languages (R, Python, etc.)
            • Extremely difficult task (the best model is only 10% better than a random number generator)
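
            As a minimal sketch of why 64-bit integers cause trouble: when a tool stores identifiers as double-precision floats, integers above 2^53 can no longer be told apart. This illustrates the usual failure mode under that assumption, not the exact problem met in this competition, and the identifier values below are hypothetical.

```python
# Hypothetical 64-bit identifiers (not taken from the competition data).
id_a = 2**53        # 9007199254740992
id_b = 2**53 + 1    # 9007199254740993

# Python integers have arbitrary precision, so the two ids stay distinct.
ids_differ_as_int = id_a != id_b

# A double has a 53-bit significand: both ids round to the same float.
ids_collide_as_float = float(id_a) == float(id_b)

# Keeping identifiers as strings is one common way to avoid the collision.
ids_differ_as_str = str(id_a) != str(id_b)

print(ids_differ_as_int, ids_collide_as_float, ids_differ_as_str)
```

            Reading such identifiers as text (or as a dedicated 64-bit integer type, where the tool offers one) keeps joins and aggregations on them exact.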

            Key achievements:

            • 83rd (top 5%) vs Turi's (Apple) benchmark model being 1475th
            • Working with high cardinality categorical features
            • Text-based feature extraction

            Key skills used:

            • R / Python / SQL / JMP / SPSS
            • Cross-validation
            • Data Aggregation
            • k-means Clustering
            • Sparse matrices
            • Gradient Boosted Trees
            • Neural Networks
            • Generalized Linear Models

            Expedia Hotel Recommendations
            Which hotel type will an Expedia customer book?

            Planning your dream vacation, or even a weekend escape, can be an overwhelming affair. With hundreds, even thousands, of hotels to choose from at every destination, it is difficult to know which will suit your personal preferences. Should you go with an old standby with those pillow mints you like, or risk a new hotel with a trendy pool bar?


            Expedia wants to take the proverbial rabbit hole out of hotel search by providing personalized hotel recommendations to their users. This is no small task for a site with hundreds of millions of visitors every month!


            Currently, Expedia uses search parameters to adjust their hotel recommendations, but there are not enough customer-specific data to personalize them for each user. In this competition, Expedia is challenging competitors to contextualize customer data and predict the likelihood a user will stay at 100 different hotel groups.


            The data in this competition is a random selection from Expedia and is not representative of the overall statistics.

            Key features:

            • 100-class non-static classification (classes changing over time)
            • Big data set (a simple gradient boosted tree on all data would require over 3TB of RAM)
            • Large time-series dependence

            Key achievements:

            • 122nd (top 7%) vs Expedia's benchmark model being 1794th
            • Time-series feature extraction
            • Machine Learning on big data using 16GB RAM only

            Key skills used:

            • R / Python / JMP / SPSS
            • Cross-validation
            • Clustering and Time-scoring
            • Autoencoder
            • Gradient Descent optimization
            • Subsampling 
            • Sparse matrices
            • Factorization Machines
            • L-BFGS optimization (Limited Memory Broyden–Fletcher–Goldfarb–Shanno algorithm)
            • Batched Multilayer Perceptrons with Softmax Activation


            BNP Paribas Cardif Claims Management
            Can you accelerate BNP Paribas Cardif's claims management process?

            As a global specialist in personal insurance, BNP Paribas Cardif serves 90 million clients in 36 countries across Europe, Asia, and Latin America.


            In a world shaped by the emergence of new uses and lifestyles, everything is going faster and faster. When facing unexpected events, customers expect their insurer to support them as soon as possible. However, claims management may require different levels of checks before a claim can be approved and a payment can be made. With the new practices and behaviors generated by the digital economy, this process needs to adapt, with the help of data science, to meet the new needs and expectations of customers.


            In this challenge, BNP Paribas Cardif is providing an anonymized database with two categories of claims:

            • Claims for which approval could be accelerated leading to faster payments
            • Claims for which additional information is required before approval

            Key features:

            • Binary classification
            • Anonymized features
            • Small data set (approximately 100k observations for training)

            Key achievements:

            • 112th (top 4%) vs BNP Paribas' benchmark model being 2456th
            • First appearance in a public data science competition
            • Quickly became one of the most respected people
            • Solo work

            Key skills used:

            • R / Python / JMP / SPSS
            • Automation of credit scoring and reporting
            • Gradient Boosted Trees
            • Random Forests
            • Extremely Randomized Trees
            • Tableplots
            • Sparse matrices
            • t-Stochastic Neighbor Embedding
            • Partial dependence
            • Feature Selection via Feature Importance of models

            Home Depot Product Search Relevance
            Predict the relevance of search results on homedepot.com

            Shoppers rely on Home Depot’s product authority to find and buy the latest products and to get timely solutions to their home improvement needs. From installing a new ceiling fan to remodeling an entire kitchen, with the click of a mouse or tap of the screen, customers expect the correct results to their queries – quickly. Speed, accuracy, and delivering a frictionless customer experience are essential.


            In this competition, Home Depot is asking for help to improve their customers' shopping experience by developing a model that can accurately predict the relevance of search results.


            Search relevancy is an implicit measure Home Depot uses to gauge how quickly they can get customers to the right products. Currently, human raters evaluate the impact of potential changes to their search algorithms, which is a slow and subjective process. By removing or minimizing human input in search relevance evaluation, Home Depot hopes to increase the number of iterations their team can perform on the current search algorithms.

            Key features:

            • Contextual Text-based regression
            • Small data set (approximately 100k observations for training)
            • Models must generalize to unknown contexts (easy to overfit)
            • Contaminated test set to discourage cheating

            Key achievements:

            • 213th (top 11%) vs Home Depot's benchmark model being 1680th
            • Textual-based feature extraction/engineering
            • Context-based feature extraction/engineering

            Key skills used:

            • R / Python / JMP / SPSS
            • Cross-validation
            • Stemming
            • Segmentation
            • Term Frequency – Inverse Document Frequency (TF-IDF)
            • Singular Value Decomposition (SVD) 
            • Sparse matrices
            • Gradient Boosted Trees
            • Random Forests
            • Neural Networks
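
            As a rough illustration of the TF-IDF weighting listed above, the sketch below computes the unsmoothed textbook variant (term frequency times log(N / document frequency)) in plain Python. It is not the pipeline used in the competition: the product titles and the function name tf_idf are hypothetical, and libraries usually apply different smoothing and normalization.

```python
import math
from collections import Counter

# Hypothetical product titles; not taken from the competition data.
documents = [
    "ceiling fan with light",
    "kitchen ceiling light",
    "garden hose",
]


def tf_idf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


scores = tf_idf(documents)
for doc, doc_scores in zip(documents, scores):
    print(doc, doc_scores)
```

            Terms that appear in few documents (such as "fan") receive a higher weight than terms shared by several documents (such as "ceiling"), which is what makes the weights useful as features for a relevance model.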