BTCUSD Offline Batch Scoring Predictions

An Offline End-to-End Machine Learning Service for Predicting BTCUSD Hourly Prices.

Let's connect on LinkedIn 🤗

Welcome!

This is an end-to-end machine learning solution that predicts BTCUSD prices on an hourly basis through batch scoring.

How does it Work?

Simply put, the whole architecture relies on a monolithic local setup that uses Python, Apache Airflow and MariaDB.

The core of the project lies in the 3-pipeline approach:

• The ETL Pipeline - AKA Feature Pipeline

Every hour, the ETL pipeline (scheduled by Airflow) is triggered and fetches the previous data point. We need the closing price of the hour, that is, the last price before the new hour ticks in. The fetched data point is then transformed and pushed to a local MariaDB database.
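The "previous data point" logic can be illustrated with a minimal sketch. It assumes the pipeline fires shortly after the top of the hour and that the raw candle is a dict with `time` (Unix seconds at the candle's open) and `close` keys. The function names, the cron string and the candle shape are hypothetical and not taken from the repository.

```python
from datetime import datetime, timedelta, timezone

# Standard 5-field cron for "minute 0 of every hour" (assumed trigger time).
ETL_CRON = "0 * * * *"


def previous_closed_hour(now: datetime) -> tuple[datetime, datetime]:
    """Return the [start, end) bounds of the last fully closed hour."""
    end = now.replace(minute=0, second=0, microsecond=0)
    return end - timedelta(hours=1), end


def to_row(candle: dict, now: datetime) -> dict:
    """Turn a raw candle (hypothetical shape) into a row for the features table."""
    start, _ = previous_closed_hour(now)
    opened_at = datetime.fromtimestamp(candle["time"], tz=timezone.utc)
    if opened_at != start:
        # Refuse to store a candle that is not the hour that just closed.
        raise ValueError(f"expected the candle opening at {start}, got {opened_at}")
    return {"hour_start": start.isoformat(), "close": float(candle["close"])}


# Example run five seconds after 13:00 UTC, with made-up price data.
run_time = datetime(2024, 1, 1, 13, 0, 5, tzinfo=timezone.utc)
raw = {"time": 1704110400, "close": "42500.5"}  # 2024-01-01 12:00 UTC
row = to_row(raw, run_time)
```

Checking the candle's timestamp against the expected hour guards against storing a candle that is still open, or a stale one, when the upstream source lags.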

• The Prediction Pipeline - AKA Inference Pipeline

Also every hour, the prediction pipeline is triggered and predicts the next incoming data point with a LightGBM model. The prediction is then pushed to a separate table in the local MariaDB database.
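As a rough sketch of the inference step, the snippet below builds a single feature row from the most recent closing prices and asks a model for the next value. The lag-based feature layout is an assumption, since the README does not say which features the LightGBM model uses, and `LastValueModel` is only a stand-in so the example runs without a trained model. Any object with a scikit-learn-style `predict` method could be passed instead.

```python
from typing import Sequence


def lag_features(closes: Sequence[float], n_lags: int) -> list[float]:
    """Most recent n_lags closes, newest first (assumed feature layout)."""
    if n_lags < 1 or len(closes) < n_lags:
        raise ValueError("need at least n_lags >= 1 closing prices")
    return list(reversed(closes[-n_lags:]))


def predict_next_close(model, closes: Sequence[float], n_lags: int = 3) -> float:
    """Predict the next hourly close from a single feature row."""
    features = [lag_features(closes, n_lags)]  # shape: 1 sample x n_lags features
    return float(model.predict(features)[0])


class LastValueModel:
    """Stand-in for a trained model: repeats the newest close."""

    def predict(self, X):
        return [row[0] for row in X]


closes = [42000.0, 42100.0, 42050.0, 42200.0]  # made-up hourly closes
prediction = predict_next_close(LastValueModel(), closes)
```

Keeping feature construction in its own function lets the training and inference pipelines share it, which helps avoid a mismatch between the features used at training time and at prediction time.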

• The ReTraining Pipeline - AKA Training Pipeline

Every Sunday at 23:30, the retraining pipeline is triggered and trains a new model. The new model is stored in a model store alongside all the models previously trained by the pipeline. Lastly, the pipeline computes an error measurement for every model in the store, picks the best one according to that metric, and pushes it to production.
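The selection step described above could look roughly like the following. Mean absolute error is used here only as an assumption, because the README does not name the error measurement. The model store is represented by a plain dict, and `ConstantModel` is a hypothetical stand-in for the stored models.

```python
# Standard 5-field cron for 23:30 every Sunday (0 = Sunday).
RETRAIN_CRON = "30 23 * * 0"


def mean_absolute_error(y_true, y_pred) -> float:
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def pick_champion(store: dict, X, y) -> tuple[str, float]:
    """Score every stored model on the same data and return the best one."""
    scores = {
        name: mean_absolute_error(y, model.predict(X))
        for name, model in store.items()
    }
    best = min(scores, key=scores.get)  # lower error is better
    return best, scores[best]


class ConstantModel:
    """Stand-in for a stored model: always predicts the same value."""

    def __init__(self, value: float):
        self.value = value

    def predict(self, X):
        return [self.value for _ in X]


# Made-up store and evaluation data.
store = {
    "model_week_01": ConstantModel(100.0),
    "model_week_02": ConstantModel(103.0),
}
X_eval = [[0.0], [0.0], [0.0]]
y_eval = [102.0, 104.0, 103.0]
champion, champion_mae = pick_champion(store, X_eval, y_eval)
```

Scoring all models on the same evaluation set makes the comparison fair. An older model can stay in production if the newly trained one does not beat it.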

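FlowCharts

The flowcharts for the three pipelines are stored in the repository as ETLPipeline.png, InferencePipeline.png and TrainingPipeline.png.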

Moving Online

This system has been designed with the sole purpose of being an offline batch scoring service. However, it can also be deployed online, and here are the best options for doing so:

• Triple communicating container architecture (one container for Airflow, one for MariaDB, one for Python). This way, development can be more scalable and efficient.
• Deploy the architecture as-is to a server and configure MariaDB to accept external connections. The infrastructure keeps running locally on that server, but the transformed data and predictions can be accessed from anywhere, and we can eventually build up from there (e.g. a community-deployed Streamlit dashboard that fetches data directly from the online MariaDB).
• Deploy the architecture as-is to a server and build an API with FastAPI for serving both transformed data and predictions. Having an API endpoint to fetch data and predictions is both the safer choice in terms of security and the most scalable and efficient one, as long as the server (with the whole infrastructure) is up and running.
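For the API option, the part that matters most for safety is the read-only query layer that an endpoint would wrap. The sketch below shows one possible shape for it. To stay runnable without a database server, it uses Python's built-in `sqlite3` as a stand-in for MariaDB. The table name, column names and `MAX_LIMIT` are hypothetical, and the placeholder style (`?` here) depends on the database driver actually used.

```python
import sqlite3

MAX_LIMIT = 500  # assumed cap so a single request cannot dump the whole table


def latest_predictions(conn, limit: int = 24) -> list[dict]:
    """Return the newest predictions, newest first, as JSON-friendly dicts."""
    limit = max(1, min(int(limit), MAX_LIMIT))  # clamp untrusted input
    cur = conn.execute(
        "SELECT hour_start, predicted_close FROM predictions "
        "ORDER BY hour_start DESC LIMIT ?",
        (limit,),  # parameterized: the value is never spliced into the SQL text
    )
    return [{"hour_start": h, "predicted_close": p} for h, p in cur.fetchall()]


# In-memory stand-in database with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE predictions (hour_start TEXT PRIMARY KEY, predicted_close REAL)"
)
conn.executemany(
    "INSERT INTO predictions VALUES (?, ?)",
    [("2024-01-01T12:00:00", 42500.0), ("2024-01-01T13:00:00", 42610.0)],
)
rows = latest_predictions(conn, limit=1)
```

A FastAPI route handler would simply return the list produced by `latest_predictions`. Clients then never get direct database credentials, which is the security advantage this option has over exposing MariaDB itself.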


KazuhiraBenedictMiller/BTCPricePrediction
