ML on Docker

What is ML (Machine Learning)?
Machine Learning is a branch of AI (Artificial Intelligence) in which a machine learns a mathematical function from data instead of being explicitly programmed with rules. ML is a broad field with a very wide range of applications. DL (Deep Learning) is a subfield of ML that uses multi-layer neural networks. To train a model using ML we need two things:
1. A real dataset that can be used as historical data from which the model will learn.
2. Predefined libraries, modules, and methods that developers have written for ML work.
In this blog, I am going to use Simple Linear Regression to train the model. Simple linear regression is used for problems that have only one feature (input variable) to learn from. We will train the model in Jupyter and later move it to RedHat.
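To make the idea of simple linear regression concrete, here is a minimal sketch of what "fitting a line to one feature" means. It is written in plain Python rather than with the library used in this blog, and the numbers are made-up illustrative values, not rows from SalaryData.csv. The function names are my own.

```python
# Minimal sketch of simple linear regression: fit y = intercept + slope * x
# by ordinary least squares. The data below is made up for illustration
# and is NOT taken from SalaryData.csv.

def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Evaluate the fitted line at x."""
    return intercept + slope * x


# Hypothetical toy data: years of experience and salary.
years = [1.0, 2.0, 3.0, 4.0, 5.0]
salary = [30000.0, 40000.0, 50000.0, 60000.0, 70000.0]

slope, intercept = fit_simple_linear_regression(years, salary)
print("slope:", slope, "intercept:", intercept)
print("prediction for 6 years:", predict(slope, intercept, 6.0))
```

A library such as scikit-learn does the same kind of fit for you; the sketch only shows the arithmetic behind it.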
What is Docker?
Docker is a tool that provides containerisation. Applications often have to be tested on several platforms, so launching a fresh OS environment for testing is very common. Installing and booting a new OS the traditional way can take an hour or more, whereas Docker can launch a container from an OS image within seconds. A container is not a full virtual machine: it shares the host's kernel, which is why it starts so quickly. This saves time and money, which is why many companies have adopted Docker.
What am I going to show in this blog?
1. I have a dataset called “SalaryData.csv”. Using this dataset and Simple Linear Regression, I am going to train a model in a Jupyter notebook.
2. Then I will launch a Docker container using the CentOS image.
3. I will move the saved model, the code, and the SalaryData.csv file to the RedHat system.
4. Then I will copy those files to the Docker container.
5. Then I will install Python3 and the required libraries in this Docker container.
6. Then I will write a small piece of Python code to get the prediction.
1. The trained model is on GitHub, and the dataset is available there too.
Link:- https://github.com/anupam54/MachineLearning
2. Here I have shown that Docker is installed, and then launched a container named MachineLearning from the centos:latest image.
The "systemctl status docker" command tells you whether Docker is running.
I used the "docker run -it --name MachineLearning centos:latest" command to launch the Docker container. Note that the image name must be written in lowercase with no space before the tag.


3. The .csv file, the saved trained model, and the training code have all been transferred from Windows to RedHat using WinSCP. WinSCP needs the IP address of your RedHat system before it can transfer the data.

There is one more option: put the code on GitHub, clone the repository on the RedHat base OS, and then copy it to the Docker container.
"git clone https://github.com/anupam54/MachineLearning.git"

4. Here I copied all the required files to the Docker container named MachineLearning, and in the next screenshot I used the saved trained model, named "Salary.pk1", in Python code.
The command to copy a file from RedHat to the root directory (/) of the Docker container is as follows:
"docker cp file_name docker_container_name:/"
NOTE: This command must be run on RedHat, not inside the Docker container.
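If there are several files to copy, the same docker cp command can be generated for each of them. The following is only a sketch: the helper name docker_cp_command is my own, and the file list assumes the two file names mentioned in this blog. It prints the commands rather than running them, so it is safe to try anywhere.

```python
import shlex


def docker_cp_command(file_name, container_name, dest="/"):
    """Build the argument list for: docker cp <file_name> <container_name>:<dest>"""
    return ["docker", "cp", file_name, f"{container_name}:{dest}"]


# File names mentioned in this blog; adjust the list to match your own files.
files = ["SalaryData.csv", "Salary.pk1"]

commands = [docker_cp_command(name, "MachineLearning") for name in files]
for cmd in commands:
    # shlex.join renders the list as a shell-safe command line.
    print(shlex.join(cmd))
```

To actually run a command from Python on the RedHat host, you could pass one of these lists to subprocess.run(cmd, check=True).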

Here, I load the saved model, take the years of experience from the user as input, and return the predicted salary.
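The actual code is in the screenshot and on GitHub; the following is only a rough sketch of what such a prediction script might look like. It assumes the model was saved with joblib and is a scikit-learn style object with a predict() method. The function names predict_salary and main are my own.

```python
def predict_salary(model, years_of_experience):
    """Ask a fitted regression model for the salary at the given experience."""
    # scikit-learn style models expect 2-D input: one row, one feature.
    prediction = model.predict([[years_of_experience]])
    value = prediction[0]
    # Depending on how the model was trained, each prediction may itself
    # be a one-element sequence; unwrap it if so.
    if hasattr(value, "__len__"):
        value = value[0]
    return float(value)


def main():
    # Assumption: the model file was written with joblib.dump().
    import joblib

    model = joblib.load("Salary.pk1")
    years = float(input("Enter years of experience: "))
    print("Predicted salary:", predict_salary(model, years))
```

Inside the container, call main() at the end of the script to load the model and prompt for input.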

5. Here I installed Python, the pandas library, and the sklearn library. I also tried to install the joblib library, but pip reported that joblib was already installed (most likely because scikit-learn installs it as a dependency). When I installed Python, pip3 was installed along with it.
Here I used "yum install python3" to install Python in the Docker container.

Here I used "pip3 install pandas" to install the pandas library, which helps you load and work with the dataset.

To install the sklearn module, use "pip3 install scikit-learn". The alternative "pip3 install sklearn" relies on a deprecated alias package on PyPI and may no longer work.
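After the installs, a quick way to confirm that everything is in place is to ask Python whether it can find each module. This is a small sketch; the helper name missing_modules is my own, and the module list is simply the libraries used in this blog.

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that Python cannot find."""
    return [
        name for name in module_names
        if importlib.util.find_spec(name) is None
    ]


# Import names of the libraries installed in this step.
required = ["pandas", "sklearn", "joblib"]

missing = missing_modules(required)
if missing:
    print("Still missing:", ", ".join(missing))
else:
    print("All required libraries are installed.")
```

Note that the import name is sklearn even though the package is installed as scikit-learn.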

6. In this section I gave an input of 1.1 years of experience and got a predicted salary of 36187.15875227. The prediction is an estimate rather than an exact value, but the model's accuracy is more than 90%.
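For a regression model, the "accuracy" that scikit-learn reports through its score() method is the R² value (1.0 means a perfect fit). Assuming that is the figure meant here, this is a small sketch of how R² is computed. The numbers are made up for illustration and are not results from this model.

```python
def r2_score(actual, predicted):
    """Coefficient of determination: 1 - (residual error / total variation)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical values, NOT taken from SalaryData.csv or the trained model.
actual = [30000.0, 40000.0, 50000.0, 60000.0]
predicted = [31000.0, 39000.0, 51000.0, 59000.0]

print("R^2:", r2_score(actual, predicted))
```

A value above 0.9 means the fitted line explains more than 90% of the variation in salary.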

I hope this blog was informative for you. If you find any mistakes in it, please let me know. THANKS 😊