Introduction

Experiment tracking is a vital part of modern machine learning: without it, comparing results and choosing the best parameters for our models would be difficult. Here, I will show the minimum needed to track experiments with Weights & Biases (I will refer to it as WandB).

So what is WandB?

WandB is a machine learning platform that keeps track of your hyperparameters, system metrics, and predictions so you can compare models. It also lets you build ML workflows, version datasets and models, optimize hyperparameters, and monitor models in production. It has several competitors, and MLflow is one of the best known. Personally, however, I found WandB easier to set up and run than MLflow. It also has a free tier, which is more than enough for personal needs.

First of all, you need to register an account. After that, go to User Settings; under API keys you will find the key you need to interact with the service.

Then, you need to install WandB itself locally with:

pip install wandb

Now, launch your favourite IDE (I would suggest jupyter-lab/jupyter notebook for prototyping) and connect to your dashboard with the following commands:

import wandb

wandb.login(key="YOUR API KEY HERE")
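If you would rather not paste the key into a notebook, one option is to read it from the WANDB_API_KEY environment variable, which WandB also recognises on its own. The sketch below is only an illustration: the helper names resolve_api_key and login_from_env are hypothetical and not part of the wandb package.

```python
import os


def resolve_api_key(env=None):
    """Return the key stored in WANDB_API_KEY, or None if it is unset or blank."""
    env = os.environ if env is None else env
    key = env.get("WANDB_API_KEY", "").strip()
    return key or None


def login_from_env():
    """Log in with the key from the environment, else fall back to the prompt."""
    import wandb  # imported lazily so resolve_api_key works without wandb installed

    key = resolve_api_key()
    if key is None:
        return wandb.login()  # interactive prompt, as in the article
    return wandb.login(key=key)
```

Calling login_from_env() at the top of a notebook then replaces the hard-coded key.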

There are five key commands for working with WandB:
1. wandb.login() - authorisation in the system
2. wandb.init() - new experiment initialization
3. wandb.log() - logging metrics
4. wandb.log_artifact() - logging artifacts
5. wandb.finish() - finishing experiment
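Before moving to a real model, here is a minimal sketch of how init, log, and finish fit together. It assumes a made-up project name ("wandb-demo") and a simulated loss instead of real training, and it uses mode="offline" so that nothing is uploaded; login and log_artifact are left out to keep the sketch short.

```python
def simulated_losses(steps):
    """Stand-in for a real training loop: a loss that shrinks at every step."""
    return [1.0 / (step + 1) for step in range(steps)]


def run_demo(steps=5):
    """Start a run, log one metric per step, and close the run."""
    import wandb  # imported lazily; only needed when the demo is actually run

    # "wandb-demo" is a placeholder project name; offline mode keeps data local.
    wandb.init(project="wandb-demo", mode="offline", config={"steps": steps})
    for loss in simulated_losses(steps):
        wandb.log({"loss": loss})  # each call records one step
    wandb.finish()
```

Running run_demo() creates a local offline run with five logged loss values.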

We will look at all of these commands using an XGBoost example.

First, we’ll import libraries.

import numpy as np
import wandb
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold
from xgboost import XGBClassifier
from xgboost.callback import EarlyStopping

wandb.login()

Then, we will write the configuration for our model.

Config = dict(
    n_splits=5,
    random_seed=42,

    # params for the model
    objective="binary:logistic",
    tree_method="hist",
    n_estimators=200,
    early_stopping=20,

    # regularization
    max_depth=5,
    max_delta_step=17,
    colsample_bytree=0.632831510106799,
    colsample_bylevel=0.6390056763292044,
    eta=0.487396497096089,
    min_child_weight=1,
    gamma=0.25490782392352085,
    reg_lambda=59.960195187994934,
    reg_alpha=8.529168659942826,
    scale_factor=4.71,
)

Next comes the training code with cross-validation.

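clfs = []          # fitted classifiers, one per fold
scores = []        # per-fold AUC on the held-out fold
scores_eval = []   # per-fold AUC on the separate validation set

# X (a pandas DataFrame of features) and y (labels) are assumed to be loaded already.
#X = X_train.drop(cols2drop, axis=1)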

wandb.init(
    project="Project Name",
    config=Config,
    group="xgboost",
    job_type="train",
    name="Training with parameters suggested with optuna with additional feats",
)

skf = StratifiedKFold(n_splits=Config["n_splits"], shuffle=True, random_state=Config["random_seed"])

for train_index, test_index in skf.split(X, y):
    X_train, X_test = X.iloc[train_index], X.iloc[test_index]
    y_train, y_test = y.iloc[train_index], y.iloc[test_index]
    
    es = EarlyStopping(
        rounds=Config["early_stopping"],
        min_delta=1e-3,
        save_best=True,
        maximize=True,
        data_name="validation_0",
        metric_name="auc",
    )
    
    clf = XGBClassifier(
        tree_method=Config["tree_method"],
        n_estimators=Config["n_estimators"],
        max_depth=Config["max_depth"],
        scale_pos_weight=Config["scale_factor"],
        max_delta_step=Config["max_delta_step"],
        colsample_bytree=Config["colsample_bytree"],
        colsample_bylevel=Config["colsample_bylevel"],
        learning_rate=Config["eta"],
        min_child_weight=Config["min_child_weight"],
        gamma=Config["gamma"],
        reg_lambda=Config["reg_lambda"],
        reg_alpha=Config["reg_alpha"],
        enable_categorical=True,
        objective=Config["objective"],
        eval_metric="auc",
        random_state=Config["random_seed"],  # the scikit-learn wrapper expects random_state
        callbacks=[es],
    )

    clfs.append(clf)
    clf.fit(X_train, y_train, eval_set=[(X_test, y_test)], verbose=10)

    # X_valid / y_valid are assumed to be a separate hold-out set defined earlier.
    # AUC needs scores rather than hard labels, so use the positive-class probability.
    preds = clf.predict_proba(X_valid)[:, 1]
    auc_valid = roc_auc_score(y_valid, preds)
    wandb.log({"Valid AUC": auc_valid})
    scores_eval.append(auc_valid)
    print(f"Valid AUC {auc_valid}")

    # "validation_0" is the held-out fold passed via eval_set, so despite its name
    # "Train AUC" is the AUC on that fold, averaged over all boosting rounds.
    fold_auc = float(np.mean(clf.evals_result()["validation_0"]["auc"], dtype="float16"))
    wandb.log({"Train AUC": fold_auc})
    scores.append(fold_auc)


# Conservative summary: mean minus one standard deviation across folds.
mean_score = np.mean(scores, dtype="float16") - np.std(scores, dtype="float16")

print("mean AUC score --------->", mean_score)
print(f"mean valid AUC score {np.mean(scores_eval, dtype='float16') - np.std(scores_eval, dtype='float16')}")

# Note: "Mean AUC valid" is logged as the plain mean, without the std penalty.
wandb.log({"Mean AUC": mean_score, "Mean AUC valid": np.mean(scores_eval)})


# clf is the classifier from the last fold; that is the model saved here.
clf.save_model("xgb_classificator.json")

artifact = wandb.Artifact(name='best_XGBoost', type='model')
artifact.add_file('xgb_classificator.json')
wandb.log_artifact(artifact)

wandb.finish()

With the wandb.init() method we define the parameters of our WandB run. Namely:
1. project - the project name
2. config - the model config
3. group - the group of models the run belongs to
4. job_type - a label for the kind of run, for example training or inference
5. name - the experiment name

Tip

I suggest giving your experiment a meaningful name, because that makes it easier to find the right one among other experiments.
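One way to keep names meaningful is to build them from the config itself. The sketch below is only an illustration: build_run_name and start_run are hypothetical helpers, and the naming pattern is an assumption rather than a WandB convention. The wandb.init() arguments are the same ones used in the training code.

```python
def build_run_name(model, max_depth, eta, note=""):
    """Compose a short, searchable run name from the main hyperparameters."""
    name = f"{model}-depth{max_depth}-eta{eta:.3f}"
    return f"{name}-{note}" if note else name


def start_run(config, note=""):
    """Start a WandB run whose name is derived from the config."""
    import wandb  # imported lazily; build_run_name works without wandb installed

    return wandb.init(
        project="Project Name",
        config=config,
        group="xgboost",
        job_type="train",
        name=build_run_name("xgboost", config["max_depth"], config["eta"], note),
    )
```

With the config from this article, start_run(Config, "optuna") would name the run xgboost-depth5-eta0.487-optuna.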

With wandb.log() you can track any metric. Pass the metrics as a Python dictionary (dict) that maps metric names to values.
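As an illustration of that dictionary format, the sketch below aggregates per-fold scores into a single dict and logs it in one call. The helper names and the "Std AUC" and "Mean AUC - Std" keys are hypothetical; it uses the standard library instead of NumPy, with pstdev matching the default behaviour of np.std.

```python
from statistics import mean, pstdev


def summarize_folds(scores):
    """Turn per-fold AUC values into a dict that wandb.log() accepts."""
    avg = mean(scores)
    spread = pstdev(scores)  # population std, same as np.std with default ddof
    return {"Mean AUC": avg, "Std AUC": spread, "Mean AUC - Std": avg - spread}


def log_summary(scores):
    """Log the summary to the active run (wandb.init() must have been called)."""
    import wandb  # imported lazily; summarize_folds works without wandb installed

    wandb.log(summarize_folds(scores))
```

Logging several related values in one dict keeps them on the same step in the dashboard.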

WandB artifacts are a way to save and version your input/output data and models.

Note

1. Create an empty artifact with wandb.Artifact()
2. Add your model file or other files with artifact.add_file()
3. Call wandb.log_artifact() to save your files.
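The three steps above can be wrapped into a small helper. This is a sketch under assumptions: log_model_artifact is a hypothetical name, the default artifact name is taken from the training code, and the existence check is an added convenience rather than something WandB requires.

```python
import os


def log_model_artifact(path, name="best_XGBoost", artifact_type="model"):
    """Create an artifact, attach one file, and log it to the active run."""
    # Fail early with a clear message if the model file was never saved.
    if not os.path.isfile(path):
        raise FileNotFoundError(f"model file not found: {path}")

    import wandb  # imported lazily; needs an active run started with wandb.init()

    artifact = wandb.Artifact(name=name, type=artifact_type)  # step 1
    artifact.add_file(path)                                   # step 2
    wandb.log_artifact(artifact)                              # step 3
    return artifact
```

In the training code, log_model_artifact("xgb_classificator.json") would go right after clf.save_model() and before wandb.finish().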

Finally, finish logging with wandb.finish().

I hope this short article helps you start using Weights & Biases. For more information about artifacts, refer to this Colab notebook. See also the official documentation and examples.