7 Steps To Build Your Machine Learning Model with Python

Payoda Technology Inc
4 min read · Mar 29, 2024

Machine learning (ML) is a field of artificial intelligence that lets computers learn from data and improve their decisions over time without being explicitly programmed. By finding patterns in large amounts of data, ML algorithms can perform tasks such as image and speech recognition, user experience personalization, and prediction. By the end of this tutorial, you’ll have a solid foundation in creating and evaluating machine learning models, and you’ll be ready to explore more advanced concepts in the field.

Prerequisites to Build Your ML Model with Python

Before diving into building a machine learning model, make sure you have the following prerequisites:

1. Python Installed: Ensure you have Python installed on your system.

2. Python Libraries: This tutorial uses several Python libraries, including NumPy, pandas, scikit-learn, seaborn, Matplotlib, nltk, and gensim. You can install them using pip:

pip install numpy pandas scikit-learn seaborn matplotlib nltk gensim

3. Jupyter Notebook: Jupyter Notebook is a great interactive environment for data analysis and machine learning.

Step 1: Importing Libraries

Open your Python environment (Jupyter Notebook or any Python IDE) and start by importing the necessary libraries:

import warnings

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split, GridSearchCV, cross_val_score, KFold
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix

# Suppress library warnings to keep the notebook output readable
warnings.filterwarnings("ignore")
# Jupyter magic: render plots inline (remove this line outside a notebook)
%matplotlib inline

Step 2: Loading and Exploring Data

For this tutorial, let’s use a simple example: predicting customer satisfaction from support tickets. The dataset is available on Kaggle; download it and load it into a pandas DataFrame:

data = pd.read_csv("/content/customer_support_ticket_train_set.csv")

Step 3: Exploratory Data Analysis (EDA)

Explore the data by checking the first few rows, data types, and summary statistics:

data.head(): displays the first five records of the table.

data.info(): displays each column's name, data type, non-null count, and memory usage.

data.describe(): displays summary statistics (count, mean, standard deviation, quartiles) for the numeric columns.
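To make these calls concrete, here is a minimal sketch on a small, made-up DataFrame. The rows and values are hypothetical and only loosely mirror the support-ticket dataset; the missing-value check with isna() is an extra step that is often useful at this stage.

```python
import pandas as pd

# Hypothetical toy data standing in for the support-ticket dataset
toy = pd.DataFrame({
    "Ticket Status": ["Open", "Closed", "Closed", "Open", "Closed", "Closed"],
    "Customer Age": [34, 51, 27, 45, 38, 62],
    "Customer Satisfaction Rating": [None, 4.0, 2.0, None, 5.0, 3.0],
})

first_rows = toy.head()      # first five records
toy.info()                   # column names, non-null counts, and dtypes
summary = toy.describe()     # summary statistics for the numeric columns
missing = toy.isna().sum()   # number of missing values per column

print(first_rows)
print(summary)
print(missing)
```

In this toy data, open tickets have no rating yet, which is why the rating column reports missing values.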

Explore the data using graphs and charts such as pie charts, bar charts, and heat maps. Here is an example of a bar chart that shows the count of each customer satisfaction rating, split by customer gender.

# Count plot: number of tickets per satisfaction rating, grouped by gender
sns.countplot(data=data, x='Customer Satisfaction Rating', hue='Customer Gender')
plt.title('Count of customer ratings by gender', loc='left', pad=10, size=15)
plt.show()

Step 4: Feature Engineering

Feature engineering is the process of turning raw data into useful features that improve a machine learning model’s performance. It involves selecting, transforming, or creating variables that capture the most important information in the data.

# Calculate the time, in hours, between the first response and the resolution of each closed ticket

data['First Response Time'] = pd.to_datetime(data['First Response Time'])
data['Time to Resolution'] = pd.to_datetime(data['Time to Resolution'])
data['time difference hours'] = abs((data['Time to Resolution'] - data['First Response Time']).dt.total_seconds() / 3600)
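The same calculation can be checked on a few hand-made rows. This is a minimal sketch with hypothetical timestamps, not values from the Kaggle file. It also shows what happens when a ticket has no resolution time yet: the result for that row is NaN.

```python
import pandas as pd

# Hypothetical timestamps; the last ticket has not been resolved yet
toy = pd.DataFrame({
    "First Response Time": ["2023-06-01 12:00:00", "2023-06-01 16:45:00", "2023-06-01 11:15:00"],
    "Time to Resolution": ["2023-06-01 18:30:00", "2023-06-01 14:45:00", None],
})

# Parse the text columns into datetime values (missing entries become NaT)
for col in ["First Response Time", "Time to Resolution"]:
    toy[col] = pd.to_datetime(toy[col])

# Timedelta -> seconds -> absolute value -> hours
delta = toy["Time to Resolution"] - toy["First Response Time"]
toy["time difference hours"] = delta.dt.total_seconds().abs() / 3600

print(toy["time difference hours"])
```

The absolute value keeps the feature non-negative even when the two timestamps appear in reverse order, as in the second row.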

Step 5: Encoding

Label encoding is a preprocessing technique that converts categorical variables into numerical form so that machine learning algorithms can work with them. Each distinct category value is assigned an integer, starting from 0.

from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
status = le.fit_transform(data['Ticket Status'].values)
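As a small illustration of what the encoder produces, the sketch below encodes a hypothetical list of ticket statuses. LabelEncoder sorts the distinct values and numbers them in that order, and the mapping can be reversed with inverse_transform.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical ticket statuses, used only for illustration
statuses = ["Open", "Closed", "Pending Customer Response", "Closed"]

le = LabelEncoder()
codes = le.fit_transform(statuses)

print(le.classes_)   # distinct categories in sorted order
print(codes)         # integer code for each input value

# Map the integer codes back to the original labels
decoded = le.inverse_transform(codes)
print(decoded)
```

Keep in mind that the integers imply an order that the categories may not have. scikit-learn documents LabelEncoder for target labels and provides OneHotEncoder and OrdinalEncoder for input features.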

Step 6: Model Selection and Model Building

Model selection means evaluating candidate machine learning algorithms and choosing the one that best fits the data and solves the problem, often judged by metrics such as accuracy and computational efficiency. Model building means training the selected algorithm on the dataset, tuning its parameters, and assessing its output. For labelled data, there are many established supervised learning models, such as SVM and Linear Regression; for this use case, a Random Forest Classifier is chosen.

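from sklearn.ensemble import RandomForestClassifier

# The *_w2v_combined_array feature arrays and the label arrays are assumed to
# have been prepared beforehand; their construction is not shown in this tutorial.
RF_model = RandomForestClassifier(n_estimators=100, random_state=42)
RF_model.fit(X_w2v_combined_array, y_array)
# Check accuracy on the validation set
predictions = RF_model.predict(x_val_w2v_combined_array)
accuracy = accuracy_score(y_val, predictions)
print("Validation Accuracy for Random Forest Model:", accuracy)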
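Because the feature arrays above are not built in this tutorial, here is a self-contained sketch of the same workflow on synthetic data. The dataset from make_classification, the 60/20/20 split, and the variable names are all assumptions for illustration and are not part of the original pipeline.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in for the real feature matrix and labels
X, y = make_classification(n_samples=500, n_features=10, n_informative=5, random_state=42)

# Split into 60% train, 20% validation, 20% test
X_train, X_temp, y_train, y_temp = train_test_split(
    X, y, test_size=0.4, random_state=42, stratify=y
)
X_val, X_test, y_val, y_test = train_test_split(
    X_temp, y_temp, test_size=0.5, random_state=42, stratify=y_temp
)

# Train on the training set and score on the validation set
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
val_accuracy = accuracy_score(y_val, model.predict(X_val))
print("Validation accuracy:", val_accuracy)

# 5-fold cross-validation on the training set gives a steadier estimate
cv_scores = cross_val_score(
    RandomForestClassifier(n_estimators=100, random_state=42), X_train, y_train, cv=5
)
print("Mean cross-validation accuracy:", cv_scores.mean())
```

Keeping the test set untouched until the final step means the reported test accuracy reflects data the model has never influenced.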

Step 7: Model Prediction

Model prediction involves using a trained machine learning model to estimate the outcomes for new, unseen data based on the patterns it learned during training. This step translates input features into meaningful predictions or classifications.

predictions = RF_model.predict(X_test_w2v_combined_array)
accuracy = accuracy_score(y_test, predictions)
print("Test Accuracy for Random Forest Model:", accuracy)
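Accuracy alone can hide how a model behaves on each class. Step 1 also imports precision, recall, F1, and the confusion matrix, so here is a minimal sketch of those metrics on a small pair of hand-written label lists. The labels are hypothetical and not taken from the model above.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)

# Hypothetical true and predicted labels for a binary problem
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]

accuracy = accuracy_score(y_true, y_pred)      # 4 of 6 correct
precision = precision_score(y_true, y_pred)    # 3 true positives out of 4 predicted positives
recall = recall_score(y_true, y_pred)          # 3 true positives out of 4 actual positives
f1 = f1_score(y_true, y_pred)                  # harmonic mean of precision and recall
cm = confusion_matrix(y_true, y_pred)          # rows: true class, columns: predicted class

print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("F1:", f1)
print(cm)
```

For a multi-class target such as a 1 to 5 satisfaction rating, precision_score, recall_score, and f1_score need an average argument (for example average="macro").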

Final Thoughts

Now, you’ve built your machine-learning model in Python. This tutorial covers the fundamental steps of data preprocessing, model building, evaluation, and prediction. As you continue your machine-learning journey, you can explore more complex algorithms, larger datasets, and advanced techniques to tackle real-world problems. Keep learning and experimenting with data to unleash the full potential of machine learning.

Authored by: Jayakkavin E