How I Cracked the Top 100 in the Kaggle House Prices Competition

Kaggle is a vibrant online community for data science and machine learning, providing a platform for learning, sharing, and competition. It’s an invaluable resource for individuals interested in these fields, regardless of their level of experience. The Kaggle House Prices – Advanced Regression Techniques Competition, in particular, is an excellent starting point for anyone who … Read more

Python Time Series Forecast – A Guided Example on Bitcoin Price Data

A time series is essentially tabular data with the special feature of a time index. The common forecast task is ‘knowing the past (and sometimes the present), predict the future’. This task, taken as a principle, reveals itself in several ways: in how to interpret your problem, in feature engineering, and in which … Read more

How to Calculate z-scores in Python?

Z-scores can be used to compare data measured on different scales and to normalize data for machine learning algorithms. 💡 Note: There are different methods to calculate the z-score. The quickest and easiest one is: scipy.stats.zscore(). What is the z-score? The z-score is used for normalization or standardization to make differently scaled … Read more
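To make the idea concrete, here is a minimal sketch of the z-score calculation using only the Python standard library. The helper name `z_scores` and the sample data are assumptions made for illustration. The sketch uses the population standard deviation, which is also what scipy.stats.zscore() does by default (ddof=0).

```python
from statistics import fmean, pstdev


def z_scores(values):
    """Return (value - mean) / population standard deviation for each value.

    Hypothetical helper for illustration; it raises ZeroDivisionError
    if all values are identical (standard deviation of zero).
    """
    mu = fmean(values)
    sigma = pstdev(values, mu)  # population std, i.e. ddof=0
    return [(v - mu) / sigma for v in values]


data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample: mean 5, std 2
scores = z_scores(data)
print(scores)
```

After the transformation the scores have mean 0 and standard deviation 1, which is what makes differently scaled features comparable.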

How to Develop LARS Regression Models in Python?

What is LARS regression? Regression is the analysis of how a variable (the outcome variable) depends on the evolution of other variables (explanatory variables). In regression, we look for the function that can be used to predict the value of a variable Y by knowing the … Read more

How to Install Scikit-Learn on PyCharm?

Scikit-Learn, often abbreviated as sklearn, is a popular machine learning library for Python. Problem Formulation: Given a PyCharm project, how do you install the Scikit-Learn library in your project, within a virtual environment or globally? Here’s a solution that typically works: Open File > Settings > Project from the PyCharm menu. Select your current project. Click … Read more

Logistic Regression in Python Scikit-Learn

Logistic regression is a popular algorithm for classification problems (despite its name indicating that it is a “regression” algorithm). It is one of the most important algorithms in the machine learning space. Linear Regression Background Let’s review linear regression. Given the training data, we compute a line that fits this training data so that … Read more

Random Forest Classifier with sklearn

Does your model’s prediction accuracy suck but you need to meet the deadline at all costs? Try the quick and dirty “meta-learning” approach called ensemble learning. In this article, you’ll learn about a specific ensemble learning technique called random forests that combines the predictions (or classifications) of multiple decision trees. In many cases, it … Read more

SVM sklearn: Python Support Vector Machines Made Simple

Support Vector Machines (SVM) have gained huge popularity in recent years. The reason is their robust classification performance – even in high-dimensional spaces: SVMs even work if there are more dimensions (features) than data items. This is unusual for classification algorithms because of the curse of dimensionality – with increasing dimensionality, data becomes extremely sparse … Read more

Python Scikit-Learn Decision Tree [Video + Blog]

Decision Trees are powerful and intuitive tools in your machine learning toolbelt. Decision trees are human-readable – in contrast to most other machine learning techniques. You can easily train a decision tree and show it to your supervisors who do not need to know anything about machine learning in order to understand how your model … Read more

Neural Networks with SKLearn MLPRegressor

Neural networks have gained massive popularity in recent years. This is a result not only of improved algorithms and learning techniques in the field but also of accelerated hardware performance and the rise of general-purpose GPU (GPGPU) technology. In this article, you’ll learn about the Multi-Layer Perceptron (MLP) which is one … Read more

[Fixed] Unknown label type: ‘continuous’ in sklearn LogisticRegression

Summary: Use SKLearn’s LogisticRegression model for classification problems only. The Y variable must be a category (e.g., binary [0,1]), not continuous (e.g., float numbers 3.4, 7.9). If the Y variable is continuous, the potential fixes are as follows. Re-examine the data. Try to encode the continuous Y variable into categories (e.g., use SKLearn’s LabelEncoder preprocessor). Re-examine … Read more
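One way to turn a continuous Y variable into categories is to bin it with explicit thresholds. The following is a minimal standard-library sketch of that idea, not the method from the article: the helper `to_categories`, the sample values, and the thresholds are all assumptions chosen for illustration.

```python
from bisect import bisect_right


def to_categories(values, thresholds):
    """Map each continuous value to the index of the bin it falls into.

    Hypothetical helper: `thresholds` must be sorted in ascending order.
    A value equal to a threshold goes into the higher bin.
    """
    return [bisect_right(thresholds, v) for v in values]


y_continuous = [3.4, 7.9, 1.2, 5.0]  # made-up float targets
thresholds = [2.5, 5.0]              # assumed cut points: low / medium / high
y_categorical = to_categories(y_continuous, thresholds)
print(y_categorical)
```

The resulting integer labels are discrete classes, which is the kind of target a classifier expects. Whether binning makes sense depends on the problem; if the target is genuinely continuous, a regression model may be the better fit.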

Sklearn fit() vs transform() vs fit_transform() – What’s the Difference?

Scikit-learn has a library of transformers to preprocess a data set. These transformers clean, generate, reduce or expand the feature representation of the data set. They provide the fit(), transform() and fit_transform() methods. The fit() method identifies and learns the model parameters from a training data set. For example, standard deviation and mean for … Read more
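The difference between the three methods can be sketched with a tiny stand-in class. This is not scikit-learn's implementation: `SimpleScaler` is a hypothetical, standard-library-only class that copies the fit/transform pattern for a single feature, and the data is made up.

```python
from statistics import fmean, pstdev


class SimpleScaler:
    """Hypothetical single-feature scaler that mimics the transformer pattern."""

    def fit(self, values):
        # Learn the parameters (mean and population std) from the data.
        self.mean_ = fmean(values)
        self.scale_ = pstdev(values, self.mean_)
        return self

    def transform(self, values):
        # Apply the parameters learned in fit(); nothing is re-learned here.
        return [(v - self.mean_) / self.scale_ for v in values]

    def fit_transform(self, values):
        # Convenience: fit on the data, then transform the same data.
        return self.fit(values).transform(values)


train = [1.0, 2.0, 3.0]  # made-up training feature
test = [2.0, 4.0]        # made-up test feature

scaler = SimpleScaler()
train_scaled = scaler.fit_transform(train)  # learn from train, scale train
test_scaled = scaler.transform(test)        # reuse train statistics on test
print(train_scaled, test_scaled)
```

The design point is that the test data is scaled with the mean and standard deviation learned from the training data. Calling fit_transform() on the test set would re-learn the parameters from the test data instead.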