If you’ve used the sklearn library in your own code, you may have noticed that many attribute names end with a trailing underscore. Here’s an example for the k-means algorithm:
```python
## Dependencies
from sklearn.cluster import KMeans
import numpy as np

## Data (Work (h) / Salary ($))
X = np.array([[35, 7000], [45, 6900], [70, 7100],
              [20, 2000], [25, 2200], [15, 1800]])

## One-liner
kmeans = KMeans(n_clusters=2).fit(X)

## Result & puzzle
cc = kmeans.cluster_centers_
print(cc)
'''
[[  50. 7000.]
 [  20. 2000.]]
'''
```
In the second-to-last line of code, we used the `kmeans` attribute `cluster_centers_`. Why doesn’t the sklearn library use the attribute name `cluster_centers`?
‘The short answer is, the trailing underscore (`kmeans.cluster_centers_`) in class attributes is a scikit-learn convention to denote “estimated” or “fitted” attributes.’ (source)
So the underscore simply indicates that the attribute was estimated from the data.
The sklearn documentation is very clear about this:
‘Attributes that have been estimated from the data must always have a name ending with trailing underscore, for example the coefficients of some regression estimator would be stored in a `coef_` attribute after `fit` has been called.’
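To make the convention concrete, here is a minimal sketch of a toy estimator that follows it. The class `MeanCenterer` and its `round_digits` parameter are hypothetical names made up for this illustration and are not part of sklearn. The point is only that user-chosen settings carry no underscore, while values computed in `fit` do:

```python
class MeanCenterer:
    """Toy estimator (hypothetical, not part of sklearn) that follows the naming convention."""

    def __init__(self, round_digits=2):
        # Set by the user in the initializer: no trailing underscore.
        self.round_digits = round_digits

    def fit(self, values):
        # Estimated from the data: trailing underscore.
        self.mean_ = round(sum(values) / len(values), self.round_digits)
        return self

    def transform(self, values):
        # Uses the fitted attribute, so it only works after fit().
        return [v - self.mean_ for v in values]


centerer = MeanCenterer()
print(hasattr(centerer, "mean_"))  # False: nothing has been estimated yet

centerer.fit([20, 25, 15])
print(centerer.mean_)              # 20.0
```

Reading the class top to bottom, you can tell at a glance which attributes exist right after construction and which appear only once the model has seen data.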
This is useful because you know immediately that these attributes were set during the learning phase of the algorithm (and not, for example, in the initializer). You can therefore spot an untrained model by trying to access an attribute with a trailing underscore:
```python
## Dependencies
from sklearn.cluster import KMeans
import numpy as np

## Data (Work (h) / Salary ($))
X = np.array([[35, 7000], [45, 6900], [70, 7100],
              [20, 2000], [25, 2200], [15, 1800]])

## One-liner (note: fit() is not called)
kmeans = KMeans(n_clusters=2)
cc = kmeans.cluster_centers_
print(cc)

## Output
# Traceback (most recent call last):
#   File "C:\Users\xcent\Desktop\code.py", line 13, in <module>
#     cc = kmeans.cluster_centers_
# AttributeError: 'KMeans' object has no attribute 'cluster_centers_'
```
You can see that without calling the `fit()` method, there is no `cluster_centers_` attribute yet. Instead, it is created dynamically when `fit()` is executed.
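If you’d rather not rely on catching an `AttributeError`, you can turn the naming convention into a simple check. The sketch below is one possible approach, not sklearn’s own implementation. The helper names `fitted_attributes` and `looks_fitted` are made up for this example, and a `SimpleNamespace` object stands in for a real estimator so that the snippet runs without sklearn installed:

```python
from types import SimpleNamespace


def fitted_attributes(estimator):
    """Return instance attribute names that end with an underscore
    and do not start with a double underscore."""
    return sorted(
        name for name in vars(estimator)
        if name.endswith("_") and not name.startswith("__")
    )


def looks_fitted(estimator):
    """Heuristic: an estimator counts as fitted if it has at least one such attribute."""
    return bool(fitted_attributes(estimator))


# Stand-in for an unfitted model: only a constructor parameter is set.
model = SimpleNamespace(n_clusters=2)
print(looks_fitted(model))       # False

# Simulate what fit() would do: create attributes with trailing underscores.
model.cluster_centers_ = [[50.0, 7000.0], [20.0, 2000.0]]
model.labels_ = [0, 0, 0, 1, 1, 1]
print(fitted_attributes(model))  # ['cluster_centers_', 'labels_']
```

In real code you would normally use the helper that sklearn provides for this purpose, `sklearn.utils.validation.check_is_fitted`, which raises a `NotFittedError` when it is given an unfitted estimator.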