What we will learn

Data Science Primer

The source of this summary The first link

Data Science Primer

Data Science

Bird’s Eye View

Machine learning

Machine learning is the study of computer algorithms that can improve automatically through experience and by the use of data. It is seen as a part of artificial intelligence.But Machine learning is a comprehensive approach to solving problems. individual algorithms are only one piece of machine learning

Key Terminology

Model - a set of patterns learned from data.
Algorithm - a specific ML process used to train a model.
Training data - the dataset from which the algorithm learns the model.
Test data - a new dataset for reliably evaluating model performance.
Features - Variables (columns) in the dataset used to train the model.
Target variable - A specific variable you’re trying to predict.
Observations - Data points (rows) in the dataset.

Exploratory Analysis

Which is just fancy-talk for “getting to know” your data.Doing so upfront will make the rest of the project much smoother,exploratory analysis for machine learning should be quick, efficient, and decisive, not long and drawn out

You’ll gain valuable hints for Data Cleaning (which can make or break your models).
You’ll think of ideas for Feature Engineering (which can take your models from good to great).
You’ll get a “feel” for the dataset, which will help you communicate results and deliver greater impact

Plot Numerical Distributions

At this point, you should start making notes about potential fixes you’d like to make. If something looks out of place, such as a potential outlier in one of your features, now’s a good time to ask the client/key stakeholder, or to dig a bit deeper.

Data Cleaning

If you have a clean dataset, even simple algorithms can learn impressive insights from it. proper data cleaning can make or break your project. Professional data scientists usually spend a very large portion of their time on this step.

Handle Missing Data you cannot simply ignore missing values in your dataset.

the 2 most commonly recommended ways of dealing with missing data

- Dropping observations that have missing values

- Imputing the missing values based on other observations

Feature Engineering

Feature engineering is about creating new input features from your existing ones.You can isolate and highlight key information, which helps your algorithms “focus” on what’s important.You can bring in your own domain expertise.nce you understand the “vocabulary” of feature engineering, you can bring in other people’s domain expertise

Algorithm Selection

In applied machine learning, individual algorithms should be swapped in and out depending on which performs best for the problem and the dataset. Therefore, we will focus on intuition and practical benefits over math and theory

Regularization in Machine Learning

Regularization is a technique used to prevent overfitting by artificially penalizing model coefficients.It can discourage large coefficients (by dampening them).It can also remove features entirely (by setting their coefficients to 0), The “strength” of the penalty is tunable.

Model Training

Split Dataset

If you evaluate your model on the same data you used to train it, your model could be very overfit and you wouldn’t even know! A model should be judged on its ability to predict new, unseen data.