Data Science Certification Course Syllabus

The Data Science Certification Course at Sha Data University offers comprehensive, hands-on training in key technologies such as R, Python, and Machine Learning.

Through direct interaction with industry practitioners, practical labs, and live projects, this program equips you with the skills needed to excel in Data Science and Machine Learning.

Our curriculum is designed by certified experts to cover the essential concepts of Data Science, helping you become proficient in skills that organizations worldwide look for.

Course Curriculum:

Module 1: Introduction to Data Science


– Selecting rows and observations
– Rounding numbers
– Selecting columns and fields
– Merging datasets
– Data aggregation techniques
– Data munging methodologies

Module 2: Introduction to Python/R


– Overview of Python
– Reasons for choosing Python
– Python installation procedures
– Introduction to Python IDEs
– Overview of Jupyter Notebook
– Installation of Python IDLE for Windows and Linux
– Writing your first Python program: “Hello World”

Module 3: Python/R Basics


– Basic data types in Python
– Working with lists
– Slicing techniques
– Conditional statements (if)
– Loop structures
– Understanding dictionaries
– Utilizing tuples
– Creating and using functions
– Introduction to arrays
– Selection by position and labels

Module 4: Python Packages


– Introduction to Pandas
– Overview of NumPy
– Introduction to Scikit-Learn
– Utilizing Matplotlib for data visualization

Module 5: Importing Data
– Reading data from CSV files
– Saving and loading Python data objects
– Writing data to CSV files
– Generating datasets and visualizing data
– Creating simple plots using Matplotlib
– Exploring data distributions with scatter plots
– Constructing histograms with Pygal for data analysis
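
As an illustration of the CSV reading and writing topics in this module, here is a minimal sketch using only Python's standard library. The sample rows and column names are hypothetical, and an in-memory buffer stands in for a file on disk.

```python
import csv
import io

# Hypothetical sample data; the column names are illustrative only.
rows = [
    {"name": "Ada", "score": "91.5"},
    {"name": "Linus", "score": "78.0"},
]

# Write the rows as CSV text (the buffer stands in for an open file).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["name", "score"])
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()

# Read the CSV text back into a list of dictionaries.
loaded = list(csv.DictReader(io.StringIO(csv_text)))

# CSV values are read as strings, so convert numeric fields explicitly.
scores = [float(row["score"]) for row in loaded]
```

With a real file, the same reader and writer objects would be given a handle from `open(path, newline="")` instead of the buffer.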

Module 6: Data Manipulation


– Selecting and filtering rows and observations
– Rounding numerical values
– Selecting and merging columns/fields
– Implementing data aggregation techniques
– Advanced data munging practices
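
The operations listed in this module can be sketched in plain Python before moving to a library such as Pandas. The records, field names, and lookup table below are hypothetical and serve only to show each step.

```python
from collections import defaultdict

# Hypothetical records; field names are illustrative only.
sales = [
    {"region": "north", "rep_id": 1, "amount": 120.457},
    {"region": "south", "rep_id": 2, "amount": 80.111},
    {"region": "north", "rep_id": 2, "amount": 45.5},
]
reps = {1: "Ada", 2: "Linus"}  # lookup table to merge on rep_id

# Select rows: keep only observations from the "north" region.
north = [row for row in sales if row["region"] == "north"]

# Select columns and round: keep two fields, rounding amount to 2 decimals.
trimmed = [{"rep_id": r["rep_id"], "amount": round(r["amount"], 2)} for r in sales]

# Merge: attach the rep name from the lookup table (a left join on rep_id).
merged = [{**r, "rep": reps.get(r["rep_id"])} for r in sales]

# Aggregate: total amount per region.
totals = defaultdict(float)
for r in sales:
    totals[r["region"]] += r["amount"]
```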

Module 7: Fundamentals of Statistics


– Understanding central tendency and distribution shape (mean, median, mode, skewness, normal distribution)
– Basics of probability (definitions, types, odds ratio)
– Exploring measures of spread (deviation from the mean, variance, standard deviation)
– Bias-variance tradeoff (underfitting and overfitting)
– Introduction to distance metrics (Euclidean and Manhattan distances)
– Outlier analysis (definitions, interquartile range, box plots, scatter plots, Cook’s distance)
– Addressing missing values (central imputation, KNN imputation) and dummification of categorical variables
– Correlation analysis
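
Several of the statistics topics above can be tried out with Python's standard library alone. The following is a minimal sketch on a hypothetical sample. It uses population variance and the default quartile method of `statistics.quantiles`; other quartile conventions give slightly different cut points.

```python
import math
import statistics

data = [2, 4, 4, 5, 7, 9, 30]  # hypothetical sample with one extreme value

# Central tendency.
mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)

# Spread: population variance and standard deviation.
variance = statistics.pvariance(data)
std_dev = statistics.pstdev(data)

# Interquartile-range rule: flag points beyond 1.5 * IQR from the quartiles.
q1, _, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1
outliers = [x for x in data if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]

def euclidean(p, q):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    """Sum of absolute coordinate differences between two points."""
    return sum(abs(a - b) for a, b in zip(p, q))
```

On this sample the mean is pulled above the median by the single large value, which is the kind of effect the outlier analysis topic examines.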

Module 8: Error Metrics


– Classification metrics (confusion matrix, precision, recall, specificity, F1 score)
– Regression metrics (mean squared error, root mean squared error, mean absolute percentage error)
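
The metrics named in this module follow directly from their definitions. Below is a minimal sketch, assuming binary labels where 1 marks the positive class; the function names are hypothetical, and the code assumes no denominator is zero (for example, no true value of zero when computing the percentage error).

```python
import math

def classification_metrics(y_true, y_pred):
    """Confusion-matrix counts and derived metrics for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return {"tp": tp, "tn": tn, "fp": fp, "fn": fn, "precision": precision,
            "recall": recall, "specificity": specificity, "f1": f1}

def regression_metrics(y_true, y_pred):
    """Mean squared error, its square root, and mean absolute percentage error."""
    n = len(y_true)
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    rmse = math.sqrt(mse)
    mape = 100 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n
    return {"mse": mse, "rmse": rmse, "mape": mape}
```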

Machine Learning Modules:

Module 1: Supervised Learning


– Linear Regression (linear equations, slope, intercept, R-squared value)
– Logistic Regression (odds ratio, probabilities, ROC curve)
– Bias-variance tradeoff concepts
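
To make the linear regression terms concrete, here is a minimal sketch of ordinary least squares for a single predictor, returning the slope, intercept, and R-squared value. The function name is hypothetical, and the code assumes the x values are not all identical and the y values are not all identical.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # R-squared: share of the variation in y explained by the fitted line.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1 - ss_res / ss_tot
    return slope, intercept, r_squared
```

For example, `fit_line([1, 2, 3, 4], [3, 5, 7, 9])` recovers the line y = 1 + 2x, since the points lie exactly on it.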

Module 2: Unsupervised Learning


– K-Means Clustering
– K-Means++ Algorithm
– Hierarchical Clustering

Module 3: Support Vector Machines (SVM)


– Understanding support vectors and hyperplanes
– Separating hyperplanes (lines) in two-dimensional space

Module 4: SVM Kernels


– Linear kernel
– Radial basis function kernel
– Polynomial kernel
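
The three kernels above are similarity functions on pairs of points. The sketch below writes each in its common textbook form; the parameter names (`gamma`, `coef0`, `degree`) and default values are assumptions chosen for illustration, not settings prescribed by the course.

```python
import math

def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))

def linear_kernel(x, y):
    # K(x, y) = x . y
    return dot(x, y)

def polynomial_kernel(x, y, degree=3, gamma=1.0, coef0=1.0):
    # K(x, y) = (gamma * x . y + coef0) ** degree
    return (gamma * dot(x, y) + coef0) ** degree

def rbf_kernel(x, y, gamma=0.5):
    # K(x, y) = exp(-gamma * ||x - y||^2)
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)
```

The radial basis function kernel equals 1 for identical points and decays toward 0 as the points move apart, which is what lets an SVM draw non-linear boundaries.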

Module 5: Additional Machine Learning Algorithms


– K-Nearest Neighbors (K-NN)
– Naïve Bayes Classifier
– Decision Trees
– Random Forest Algorithms
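
Of the algorithms in this module, K-Nearest Neighbors is short enough to sketch in full. The version below is a minimal illustration using Euclidean distance and a majority vote; the function name and training data are hypothetical, and it requires Python 3.8 or later for `math.dist`.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest points."""
    # Pair each training point's distance to the query with its label.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    # Keep the labels of the k closest points and return the most common one.
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical two-cluster training set.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
labels = ["a", "a", "a", "b", "b", "b"]
```

Calling `knn_predict(points, labels, (2, 2))` returns the label of the cluster closest to the query point.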