Why Does the Scikit-learn Library Use a Trailing Underscore Convention for Attribute Names?

If you’ve used the sklearn library in your own code, you may have noticed that many attribute names end with a trailing underscore. Here’s an example for the k-means algorithm:

## Dependencies
from sklearn.cluster import KMeans
import numpy as np


## Data (Work (h) / Salary ($))
X = np.array([[35, 7000], [45, 6900], [70, 7100],  
              [20, 2000], [25, 2200], [15, 1800]])


## One-liner
kmeans = KMeans(n_clusters=2).fit(X)  


## Result & puzzle
cc = kmeans.cluster_centers_
print(cc)
'''
[[  50. 7000.]
 [  20. 2000.]]
'''

In the second-to-last line of code, we used the kmeans attribute cluster_centers_. Why doesn’t the sklearn library simply name the attribute cluster_centers?

‘The short answer is, the trailing underscore (kmeans.cluster_centers_) in class attributes is a scikit-learn convention to denote “estimated” or “fitted” attributes.’ (source)

So the underscore simply indicates that the attribute was estimated from the data.

The sklearn documentation is very clear about this:

‘Attributes that have been estimated from the data must always have a name ending with trailing underscore, for example the coefficients of some regression estimator would be stored in a coef_ attribute after fit has been called.’
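To see the coef_ example from the documentation in action, here is a minimal sketch. The toy data (points on the line y = 2x + 1) and the variable names are assumptions made up for illustration; they are not taken from the sklearn docs.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up toy data following y = 2x + 1 (an assumption for illustration)
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])

model = LinearRegression()

# Hyperparameters set in the initializer have no trailing underscore
print(model.fit_intercept)      # True (the default)

# Before fit(), the estimated attribute does not exist yet
print(hasattr(model, "coef_"))  # False

model.fit(X, y)

# After fit(), the values estimated from the data are available
print(hasattr(model, "coef_"))  # True
print(model.coef_, model.intercept_)
```

The names without an underscore (fit_intercept) describe how you configured the model, while the names with an underscore (coef_, intercept_) describe what the model learned.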

This is useful because you immediately know that these attributes were set during the learning phase of the algorithm, when fit() ran, and not in the initializer. It also means you can easily spot a model that has not been trained by trying to access one of its attributes with a trailing underscore:

## Dependencies
from sklearn.cluster import KMeans
import numpy as np


## Data (Work (h) / Salary ($))
X = np.array([[35, 7000], [45, 6900], [70, 7100],  
              [20, 2000], [25, 2200], [15, 1800]])

## Model is created but never fitted
kmeans = KMeans(n_clusters=2)

cc = kmeans.cluster_centers_
print(cc)
'''
Traceback (most recent call last):
  File "C:\Users\xcent\Desktop\code.py", line 13, in <module>
    cc = kmeans.cluster_centers_
AttributeError: 'KMeans' object has no attribute 'cluster_centers_'
'''

You can see that without calling the fit() method, there is no cluster_centers_ attribute yet. Instead, it’s created dynamically when fit() is executed.
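If you’d rather not trigger an AttributeError to find out whether a model was trained, sklearn ships a utility for this check. The following is a sketch, assuming a reasonably recent sklearn version in which check_is_fitted can be called with just the estimator. The is_fitted helper is a hypothetical name introduced here for illustration; it is not part of sklearn.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.exceptions import NotFittedError
from sklearn.utils.validation import check_is_fitted

## Data (Work (h) / Salary ($))
X = np.array([[35, 7000], [45, 6900], [70, 7100],
              [20, 2000], [25, 2200], [15, 1800]])


def is_fitted(estimator):
    """Hypothetical helper: True if sklearn considers the estimator fitted."""
    try:
        # Raises NotFittedError if the estimator has not been fitted
        check_is_fitted(estimator)
        return True
    except NotFittedError:
        return False


kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)

before = is_fitted(kmeans)   # no fit() call yet
kmeans.fit(X)
after = is_fitted(kmeans)    # cluster_centers_ etc. now exist

print(before, after)
'''
False True
'''
```

For a quick check of a single attribute, hasattr(kmeans, "cluster_centers_") works as well.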
