August 15, 2024

Real Estate Price Prediction Using Regression Models

Project Summary:

This project focuses on predicting real estate prices in the USA using various regression models. The dataset, sourced from Kaggle and enriched with additional neighborhood data, was analyzed and preprocessed using Python and Databricks. The goal was to develop predictive models that estimate property prices based on features like property size, location, number of rooms, and other relevant variables. Several regression techniques, including Linear, Lasso, Ridge, Polynomial, ElasticNet, RandomForest, and GradientBoosting, were implemented and evaluated. The RandomForest Regression model emerged as the best performer, providing the most accurate price predictions.
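
The comparison described above can be sketched roughly as follows. This is a minimal illustration with scikit-learn on synthetic data, not the project's actual code: the feature names (`size_sqft`, `rooms`, `location_score`), the price formula, and every hyperparameter are assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler


def make_synthetic_housing(n_rows=400, seed=0):
    """Build a hypothetical housing table; stands in for the real dataset."""
    rng = np.random.default_rng(seed)
    size_sqft = rng.uniform(500, 4000, n_rows)
    rooms = rng.integers(1, 8, n_rows)
    location_score = rng.uniform(0, 10, n_rows)
    noise = rng.normal(0, 30_000, n_rows)
    price = 150 * size_sqft + 10_000 * rooms + 20_000 * location_score + noise
    return np.column_stack([size_sqft, rooms, location_score]), price


def compare_models(X, y, seed=0):
    """Fit each regressor on the same split and return its test R^2."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )
    models = {
        "Linear": make_pipeline(StandardScaler(), LinearRegression()),
        "Lasso": make_pipeline(StandardScaler(), Lasso(alpha=1.0, max_iter=10_000)),
        "Ridge": make_pipeline(StandardScaler(), Ridge(alpha=1.0)),
        "ElasticNet": make_pipeline(
            StandardScaler(), ElasticNet(alpha=1.0, max_iter=10_000)
        ),
        "Polynomial": make_pipeline(
            StandardScaler(), PolynomialFeatures(degree=2), LinearRegression()
        ),
        "RandomForest": RandomForestRegressor(n_estimators=100, random_state=seed),
        "GradientBoosting": GradientBoostingRegressor(random_state=seed),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = r2_score(y_test, model.predict(X_test))
    return scores


if __name__ == "__main__":
    X, y = make_synthetic_housing()
    for name, score in sorted(compare_models(X, y).items(), key=lambda kv: -kv[1]):
        print(f"{name:>16}: R^2 = {score:.3f}")
```

Which model ranks first depends on the data. On this synthetic, nearly linear example the ranking may differ from the result reported for the real dataset.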

GitHub Link: To see this project on GitHub, please click the GitHub Repository button below.

August 13, 2024

Home Sales Data Analysis with Apache Spark

Project Summary:

This project involves analyzing home sales data using Apache Spark's PySpark library to calculate average home prices based on various criteria, such as the number of bedrooms, bathrooms, and other features. Key aspects of the analysis include executing queries to derive insights on pricing trends, caching data for performance improvements, and storing data in Parquet format for re-analysis. The project emphasizes performance comparison between cached and uncached data and demonstrates how Apache Spark can handle large-scale data processing effectively, supporting decision-making based on historical sales data.

GitHub Link: To see this project on GitHub, please click the GitHub Repository button below.

July 29, 2024

Charity Application Success Prediction

Project Summary:

This project uses a neural network model to predict the likelihood of charity applications being approved based on historical data. Key features like application type, affiliation, use case, and requested amounts were preprocessed and fed into the model, which was trained and optimized through various iterations. The final model, with multiple hidden layers and fine-tuned parameters, achieved a validation accuracy of 74.18%. Although the model shows promise, further improvements are necessary to increase the accuracy. This analysis provides valuable insights for charities looking to improve their application success rates.
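
The preprocessing and training steps can be illustrated with a small sketch. Note the substitution: this uses scikit-learn's `MLPClassifier` as a lightweight stand-in for whatever deep learning framework the project used, and the data is synthetic. The category labels (`T3`, `T4`, `T5`), the layer sizes, and the approval rule are all assumptions for the example.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def make_synthetic_applications(n_rows=600, seed=0):
    """Generate hypothetical applications: a category, an amount, a label."""
    rng = np.random.default_rng(seed)
    app_type = rng.choice(["T3", "T4", "T5"], size=(n_rows, 1))
    amount = rng.lognormal(mean=9.0, sigma=1.0, size=(n_rows, 1))
    logit = 1.5 * (app_type[:, 0] == "T3") - 0.8 * (np.log(amount[:, 0]) - 9.0)
    prob = 1.0 / (1.0 + np.exp(-logit))
    approved = (rng.uniform(size=n_rows) < prob).astype(int)
    return app_type, amount, approved


def train_approval_model(app_type, amount, approved, seed=0):
    """One-hot encode, scale, and fit a network with two hidden layers."""
    idx_train, idx_test = train_test_split(
        np.arange(len(approved)), test_size=0.25, random_state=seed, stratify=approved
    )
    # Fit the preprocessing on the training rows only to avoid leakage.
    encoder = OneHotEncoder(handle_unknown="ignore").fit(app_type[idx_train])
    scaler = StandardScaler().fit(np.log1p(amount[idx_train]))

    def featurize(idx):
        onehot = encoder.transform(app_type[idx]).toarray()
        scaled = scaler.transform(np.log1p(amount[idx]))
        return np.hstack([onehot, scaled])

    model = MLPClassifier(hidden_layer_sizes=(16, 8), max_iter=500, random_state=seed)
    model.fit(featurize(idx_train), approved[idx_train])
    validation_accuracy = model.score(featurize(idx_test), approved[idx_test])
    return model, validation_accuracy


if __name__ == "__main__":
    data = make_synthetic_applications()
    _, accuracy = train_approval_model(*data)
    print(f"validation accuracy: {accuracy:.4f}")
```

The accuracy printed here reflects only the synthetic data and is unrelated to the 74.18% reported for the real model.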

GitHub Link: To see this project on GitHub, please click the GitHub Repository button below.

July 22, 2024

Loan Risk Prediction Model

Project Summary:

This project showcases the development of a logistic regression model for credit risk classification. With an accuracy of 99.22%, the model performed well in identifying both healthy and high-risk loans. Given the strong precision and recall scores, especially for healthy loans, the model is suitable for practical applications in credit risk prediction. However, further model tuning or exploring more complex algorithms could enhance performance in reducing false positives and false negatives.
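
Since overall accuracy can look strong when healthy loans dominate the data, it helps to inspect per-class precision and recall alongside the confusion matrix. The sketch below shows one way to do that with scikit-learn. The data is synthetic, and the 95/5 class split, the feature count, and the class names are assumptions rather than properties of the real loan dataset.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split


def evaluate_loan_classifier(seed=1):
    """Fit logistic regression on imbalanced synthetic data and report metrics."""
    # Hypothetical data: class 0 = healthy loan, class 1 = high-risk loan.
    X, y = make_classification(
        n_samples=2000,
        n_features=7,
        n_informative=4,
        weights=[0.95, 0.05],
        random_state=seed,
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, stratify=y, random_state=seed
    )
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    y_pred = model.predict(X_test)

    # Rows are true classes, columns are predicted classes.
    matrix = confusion_matrix(y_test, y_pred)
    report = classification_report(
        y_test, y_pred, target_names=["healthy", "high_risk"], zero_division=0
    )
    return matrix, report


if __name__ == "__main__":
    matrix, report = evaluate_loan_classifier()
    print(matrix)
    print(report)
```

The off-diagonal cells of the matrix are the false positives and false negatives that further tuning would aim to reduce.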

GitHub Link: To see this project on GitHub, please click the GitHub Repository button below.

July 18, 2024

Cryptocurrency Clustering Analysis

Project Summary:

This project focuses on clustering cryptocurrencies based on their market performance data. Using K-Means clustering and Principal Component Analysis (PCA), the goal was to identify patterns and group similar cryptocurrencies. Reducing dimensionality with PCA simplified the clustering process while retaining a high share of the variance in the original dataset.
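
A minimal sketch of the scale, reduce, then cluster pipeline is given below, using scikit-learn. The input is random data standing in for price-change features, and the choices of three components and four clusters are assumptions for illustration, not the values used in the project.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def cluster_with_pca(features, n_components=3, n_clusters=4, seed=0):
    """Standardize, project onto principal components, then run K-Means."""
    scaled = StandardScaler().fit_transform(features)
    pca = PCA(n_components=n_components)
    reduced = pca.fit_transform(scaled)
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = kmeans.fit_predict(reduced)
    # Share of the original variance kept by the retained components.
    retained_variance = float(pca.explained_variance_ratio_.sum())
    return labels, retained_variance


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical table: 40 coins x 7 price-change percentage columns.
    market_features = rng.normal(size=(40, 7))
    labels, retained = cluster_with_pca(market_features)
    print(f"retained variance: {retained:.2%}")
    print(f"cluster sizes: {np.bincount(labels)}")
```

In practice the number of clusters would typically be chosen by comparing inertia across several values of `n_clusters` (the elbow method) rather than fixed in advance.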

GitHub Link: To see this project on GitHub, please click the GitHub Repository button below.