Hi everyone! Welcome back to my blog!
Weeks 3 and 4 of my senior project have been quite eventful. I know that my blog posts (including this one) have been quite technical and long (sorry!). If you’re confused at all, feel free to ask any questions you have in the comments below.
In my last blog post, I explained the framework I’d built for my project, which included four endpoints: /train, /models, /predict, and /test. I’ve changed this architecture quite a bit. As shown in the code below, now I only have one endpoint: /models. If the user adds a number after /models (for example, /models/3), that number is treated as the modelID.
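To make the idea of an optional model ID concrete, here is a minimal sketch in plain Python of how a path like /models or /models/3 could be split into the endpoint and the ID. The function name parse_models_path and the regular expression are hypothetical and only for illustration; this is not the code from app.py.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: "/models", optionally followed by "/<digits>".
MODELS_PATH = re.compile(r"^/models(?:/(\d+))?/?$")


def parse_models_path(path: str) -> Tuple[bool, Optional[int]]:
    """Return (matched, model_id); model_id is None for the bare /models path."""
    match = MODELS_PATH.match(path)
    if match is None:
        return (False, None)
    raw_id = match.group(1)
    return (True, int(raw_id) if raw_id is not None else None)


if __name__ == "__main__":
    for example in ("/models", "/models/3", "/train"):
        print(example, parse_models_path(example))
```

Under this sketch, /models and /models/3 both match, while one of the old endpoints such as /train does not.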
In this blog post, I will discuss the following:
- HTTP Requests
- Code Analysis
- Docker Compose
- Terminal and Virtual Environments
- Next Steps
Another significant difference is that previously I only accepted HTTP GET and POST requests. Now I accept GET, POST, PUT, PATCH, and DELETE requests. But what do these HTTP requests mean?
HTTP stands for Hypertext Transfer Protocol, and it allows clients and servers to talk. When you enter the word “hi” into Google, you are essentially sending a GET request to Google, and Google then returns information related to the word “hi.”
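1. GET: the most common request, used to retrieve data
2. POST: sends data to the server, typically to create something new
3. PUT: sends data to the server and replaces (overwrites) the existing data
4. PATCH: sends data to the server and partially updates the existing data (for example, by adding to it)
5. DELETE: deletes the specified data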
A more detailed explanation can be found here: https://www.w3schools.com/tags/ref_httpmethods.asp
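The difference between PUT and PATCH is easiest to see on a small example. Below is a rough sketch that applies each method to an in-memory dictionary instead of a real server. The function apply_method and the example fields are made up for illustration, and a real server would normally choose the key for POST itself.

```python
from typing import Any, Dict, Optional


def apply_method(store: Dict[int, Dict[str, Any]], method: str, key: int,
                 body: Optional[Dict[str, Any]] = None) -> Optional[Dict[str, Any]]:
    """Apply one HTTP-style method to an in-memory store and return the resource."""
    body = body or {}
    if method == "GET":
        return store.get(key)            # read only, nothing changes
    if method in ("POST", "PUT"):
        store[key] = dict(body)          # create (POST) or fully replace (PUT)
        return store[key]
    if method == "PATCH":
        store[key].update(body)          # change only the fields that were sent
        return store[key]                # (raises KeyError if the key is missing)
    if method == "DELETE":
        return store.pop(key, None)      # remove the resource
    raise ValueError(f"unsupported method: {method}")


if __name__ == "__main__":
    store: Dict[int, Dict[str, Any]] = {}
    apply_method(store, "POST", 1, {"name": "a", "k": 5})
    apply_method(store, "PATCH", 1, {"k": 10})   # "name" is kept
    print(store)
    apply_method(store, "PUT", 1, {"name": "b"}) # "k" is gone
    print(store)
```

After the PATCH the resource still has both fields, while after the PUT only the fields sent in that request remain.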
If we take a look at my code above, for the /models endpoint, I accept GET and POST requests. The GET request returns a list of trained models, and the POST request is used to train new models. If the user sends a POST request to “http://localhost:5000/models?actions=train,test”, a new model will be trained and, since the actions parameter includes test as well, its average k-fold cross-validation accuracy will be returned to the user. However, a POST request sent to “http://localhost:5000/models?actions=predict,test” would return an error, as it’s impossible to classify data (predict) or evaluate a model (test) if the model hasn’t been trained yet.
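Here is a small sketch of how the actions parameter could be pulled out of the URL and checked, using only Python's standard library. The helper names parse_actions and validate_post_actions are hypothetical, and the rule "a POST to /models must include train" is my simplified reading of the behavior described above, not the exact logic in app.py.

```python
from typing import List
from urllib.parse import parse_qs, urlparse


def parse_actions(url: str) -> List[str]:
    """Extract the comma-separated `actions` query parameter from a URL."""
    query = parse_qs(urlparse(url).query)
    raw = query.get("actions", [""])[0]
    return [action.strip() for action in raw.split(",") if action.strip()]


def validate_post_actions(actions: List[str]) -> bool:
    """Assumed rule: a POST to /models creates a model, so `train` is required."""
    return "train" in actions


if __name__ == "__main__":
    for url in ("http://localhost:5000/models?actions=train,test",
                "http://localhost:5000/models?actions=predict,test"):
        actions = parse_actions(url)
        print(actions, "valid" if validate_post_actions(actions) else "error")
```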
Each time a POST request is sent, my code assigns the newly trained model to the lowest possible unassigned model ID using the generate_model_id() function. Not letting the client assign the model ID ensures that each model ID is unique. Hence, POST requests are not accepted for the /models/<modelID> endpoint.
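For illustration, one way to find the lowest unassigned ID looks like the sketch below. This is not the actual generate_model_id() from app.py: the real function takes no arguments, whereas this version takes the IDs already in use as a parameter, and starting the count at 1 is an assumption.

```python
from typing import Iterable


def generate_model_id(used_ids: Iterable[int], start: int = 1) -> int:
    """Return the smallest integer >= start that is not already in use."""
    taken = set(used_ids)
    candidate = start
    while candidate in taken:
        candidate += 1
    return candidate


if __name__ == "__main__":
    print(generate_model_id([]))         # no models yet
    print(generate_model_id([1, 2, 4]))  # the gap left by a deleted model is reused
```

Because the server picks the ID, a gap left by a deleted model gets filled before any higher number is handed out.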
Taking a look at my code, for the /models/<modelID> endpoint, I accept GET, PUT, PATCH, and DELETE requests. The GET request returns information about the model specified by the modelID. The PUT request replaces an existing model (specified by the modelID provided by the user) with a new model, trained on data provided by the client in the request (for more information about the JSON payload, see my last blog post). The PATCH request re-trains an existing model with newly added data, and the DELETE request deletes the specified model.
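The rules for which methods each endpoint accepts can be summarized as a small lookup table. The sketch below is a simplified stand-in for what a web framework does automatically. The names ALLOWED_METHODS and check_method are hypothetical, and the returned numbers are standard HTTP status codes (404 for an unknown endpoint, 405 for a method that is not allowed).

```python
# Methods accepted by each endpoint, as described in this post.
ALLOWED_METHODS = {
    "/models": {"GET", "POST"},
    "/models/<modelID>": {"GET", "PUT", "PATCH", "DELETE"},
}


def check_method(endpoint: str, method: str) -> int:
    """Return 404 for an unknown endpoint, 405 for a disallowed method, else 200."""
    allowed = ALLOWED_METHODS.get(endpoint)
    if allowed is None:
        return 404
    if method.upper() not in allowed:
        return 405
    return 200


if __name__ == "__main__":
    print(check_method("/models", "POST"))
    print(check_method("/models/<modelID>", "POST"))
    print(check_method("/train", "GET"))
```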
A video of me running my program can be found here.
As you can see in the video, Postman (a platform used to send HTTP requests) displays the HTTP status code that my server returns, which depends on whether the request is valid. A status code in the 200s signals to the user that the request was successful, while a status code in the 400s tells the user that a client error has occurred. Some possible client errors: 400 – Bad Request (user sent an invalid payload), 404 – Not Found (the endpoint does not exist, ex: http://google.com/hi), 405 – Method Not Allowed (ex: POST request sent to /models/<modelID>). If a status code in the 500s is returned, the request looked valid, but an error occurred on the server (501 – Not Implemented, 502 – Bad Gateway, 503 – Service Unavailable, 504 – Gateway Timeout, etc.).
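Python's standard library already knows the official names of these status codes, so the grouping above can be sketched in a few lines. The function describe_status is a made-up helper for illustration; HTTPStatus comes from the built-in http module.

```python
from http import HTTPStatus


def describe_status(code: int) -> str:
    """Return the code, its standard phrase, and the broad category it falls in."""
    status = HTTPStatus(code)  # raises ValueError for codes the module doesn't know
    if 200 <= code < 300:
        category = "success"
    elif 400 <= code < 500:
        category = "client error"
    elif 500 <= code < 600:
        category = "server error"
    else:
        category = "other"
    return f"{code} {status.phrase} ({category})"


if __name__ == "__main__":
    for code in (200, 400, 404, 405, 503):
        print(describe_status(code))
```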
Another thing you will notice in the video is that I ran my program using one single command: docker-compose up. But before, I had to run two long commands to get my program to run:
1. docker build -t classifier --build-arg upload_dir=usr/src/Classifier/uploaded_files
2. docker run -d -p 5000:5000 -v /Users/ethanhsiao/Documents/Classifier/app.py:/usr/src/Classifier/app.py classifier service
Docker Compose is an incredibly useful tool, especially for multi-container environments. So, how does docker-compose work? When the command “docker-compose up” is run, Docker Compose reads the YAML file docker-compose.yml and starts the services defined in it. This file is shown below on the left.
The fifth line of docker-compose.yml tells Docker Compose to build my image from the Dockerfile, which contains the build instructions. My Dockerfile can be found above on the right. As you can see in the Dockerfile, I set the working directory for my application, copy requirements.txt (a text file listing all the packages my program needs to run) to that working directory, and install those requirements. I then copy app.py to the working directory and run it using CMD.
Looking back at docker-compose.yml, I expose port 5000 in line 13, and I also create two Docker volumes. The first volume makes sure that the program app.py is the same inside and outside of the container, and the second volume ensures the data saved in the folder “container_data” in the container is the same as the data inside the “local_data” folder on the local host machine. These volumes help tremendously, as now I don’t have to rebuild the image every time I change code in the Python file “app.py,” and data files saved in the container (files with raw data, modelID_dict.json, modelid_info.txt, and pickle files containing models) automatically show up on my local host machine. Hence, with just the one command “docker-compose up,” my application is up and running!
Terminal and Virtual Environments
Below is the output in the terminal of my application running for the first time. As you can see, it executes the steps in the Dockerfile, creates a Docker network, “classifier_web_1,” and then attaches my container to that bridge network. For now, I only have one container, but if I had more, they could use this bridge network to communicate with each other. Then, as you can see in the terminal on the bottom right, the Flask app is launched!
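The next time I run “docker-compose up,” it will start much faster, because Docker reuses the image it already built instead of building it again (and when I do rebuild, Docker’s layer cache lets it skip the steps that haven’t changed). Note that pip’s --no-cache-dir flag, which I did not specify in my Dockerfile, only controls pip’s own download cache and is separate from Docker’s layer cache.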
I am running “docker-compose up” inside venv2, which is a virtual environment. Virtual environments are great because they keep a project’s Python dependencies isolated from the rest of the system, so the packages I install for this project can’t conflict with other projects. Anyone who wants the same setup can create their own virtual environment and install everything listed in requirements.txt. My venv2 folder is shown below.
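If you are ever unsure whether your code is running inside a virtual environment, a quick check like the sketch below can tell you. The function name in_virtual_environment is made up for this example. The check works for environments created with Python's built-in venv module, where sys.prefix points at the environment while sys.base_prefix points at the base Python installation.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the interpreter is running inside a venv-style environment."""
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtual_environment())
    print("sys.prefix:", sys.prefix)
```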
If you would like to play around with my code, it can be found here.
Now that I’ve finished revising my project’s framework, I’m ready to get started on my humor detection model. I will first implement a rule-based model, and then I will implement a machine learning model. After comparing the accuracy of each method, I will choose the model with the highest performance as my end product (I suspect it will be a combination of both, but we’ll see!).
Thank you so much for reading! Stay tuned for next week’s post!