graphdoc.train.doc_quality_trainer module

class graphdoc.train.doc_quality_trainer.DocQualityTrainer(prompt: DocQualityPrompt, optimizer_type: str, optimizer_kwargs: Dict[str, Any], mlflow_model_name: str, mlflow_experiment_name: str, mlflow_tracking_uri: str, trainset: List[Example], evalset: List[Example])

Bases: SinglePromptTrainer

__init__(prompt: DocQualityPrompt, optimizer_type: str, optimizer_kwargs: Dict[str, Any], mlflow_model_name: str, mlflow_experiment_name: str, mlflow_tracking_uri: str, trainset: List[Example], evalset: List[Example])

Initialize the DocQualityTrainer. This is the base class for implementing a trainer for a DocQualityPrompt.

Parameters:

prompt (DocQualityPrompt) – The prompt to train.
optimizer_type (str) – The type of optimizer to use.
optimizer_kwargs (Dict[str, Any]) – The keyword arguments for the optimizer.
mlflow_model_name (str) – The name of the model in MLflow.
mlflow_experiment_name (str) – The name of the experiment in MLflow.
mlflow_tracking_uri (str) – The URI of the MLflow tracking server.
trainset (List[dspy.Example]) – The training set.
evalset (List[dspy.Example]) – The evaluation set.
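
As a minimal sketch of how the constructor arguments fit together, the helper below collects them into a dictionary that could be unpacked into DocQualityTrainer(**kwargs). The helper name build_trainer_kwargs and every concrete value (optimizer name, optimizer options, MLflow names, tracking URI) are placeholders invented for illustration, not values taken from graphdoc.

```python
from typing import Any, Dict, List


def build_trainer_kwargs(
    prompt: Any, trainset: List[Any], evalset: List[Any]
) -> Dict[str, Any]:
    """Collect the constructor arguments listed above (hypothetical helper)."""
    return {
        "prompt": prompt,                              # a DocQualityPrompt instance
        "optimizer_type": "example_optimizer",         # placeholder value
        "optimizer_kwargs": {},                        # placeholder: options for the optimizer
        "mlflow_model_name": "example_model",          # placeholder value
        "mlflow_experiment_name": "example_experiment",  # placeholder value
        "mlflow_tracking_uri": "http://localhost:5000",  # placeholder URI
        "trainset": trainset,                          # List[dspy.Example]
        "evalset": evalset,                            # List[dspy.Example]
    }


# Assumed usage, with graphdoc installed and real objects in hand:
# trainer = DocQualityTrainer(**build_trainer_kwargs(prompt, trainset, evalset))
```

Keeping the arguments in one dictionary makes it easy to load them from a configuration file, though passing them directly as keyword arguments works just as well.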

evaluation_metrics(base_evaluation, optimized_evaluation)

Log evaluation metrics to MLflow. Both the overall scores and the per-category scores are logged; the per-category scores are logged as a CSV file.

Parameters:

base_evaluation (Any) – The evaluation metrics of the base model.
optimized_evaluation (Any) – The evaluation metrics of the optimized model.
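
The structure of the evaluation objects is not documented here, so the following is only a sketch of the per-category CSV step. It assumes, hypothetically, that each evaluation can be reduced to a mapping from category name to score; the function name per_category_csv and the column names are invented for illustration.

```python
import csv
import io
from typing import Dict


def per_category_csv(base: Dict[str, float], optimized: Dict[str, float]) -> str:
    """Render base and optimized per-category scores as CSV text (hypothetical helper)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["category", "base_score", "optimized_score"])
    # Include every category seen in either evaluation; a missing score is left blank.
    for category in sorted(set(base) | set(optimized)):
        writer.writerow([category, base.get(category, ""), optimized.get(category, "")])
    return buffer.getvalue()
```

In a real run the resulting text could be written to a file and attached to the MLflow run as an artifact.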

evaluate_training(base_model, optimized_model) → Tuple[Dict[str, Any], Dict[str, Any]]

Evaluate the training of the model by comparing the base and optimized models.

Parameters:

base_model (Any) – The base model.
optimized_model (Any) – The optimized model.

train(load_model_args: Dict[str, Any] | None = None, save_model: bool = True)

Train the model. If load_model_args is provided, the base model is loaded from MLflow. Otherwise, the DocQualityPrompt passed to the constructor is used as the base model.

Parameters:

load_model_args (Dict[str, Any] | None) – The arguments used to load the model. Defaults to None.
save_model (bool) – Whether to save the model. Defaults to True.
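
The choice of base model described for train can be sketched as follows. This illustrates the documented behavior only and is not graphdoc's implementation: select_base_model is a hypothetical name, and loader stands in for whatever routine actually fetches a model from MLflow.

```python
from typing import Any, Callable, Dict, Optional


def select_base_model(
    prompt: Any,
    load_model_args: Optional[Dict[str, Any]] = None,
    loader: Optional[Callable[..., Any]] = None,
) -> Any:
    """Return the loaded model if load_model_args is given, else the prompt itself."""
    if load_model_args is not None:
        if loader is None:
            raise ValueError("a loader is required when load_model_args is given")
        # Assumption: the arguments are forwarded to the loader as keyword arguments.
        return loader(**load_model_args)
    # No load arguments: fall back to the prompt supplied at construction time.
    return prompt
```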