graphdoc.config module

graphdoc.config.lm_from_dict(lm_config: dict)[source]

Load a language model from a dictionary of parameters.

Parameters:

lm_config (dict) – Dictionary containing language model parameters.

graphdoc.config.lm_from_yaml(yaml_path: str | Path)[source]

Load a language model from a YAML file.

Parameters:

yaml_path (Union[str, Path]) – Path to the YAML file.

graphdoc.config.dspy_lm_from_dict(lm_config: dict)[source]

Load a language model from a dictionary of parameters and set it as the default dspy language model.

Parameters:

lm_config (dict) – Dictionary containing language model parameters.

graphdoc.config.dspy_lm_from_yaml(yaml_path: str | Path)[source]

Load a language model from a YAML file and set it as the default dspy language model.

Parameters:

yaml_path (Union[str, Path]) – Path to the YAML file.

graphdoc.config.mlflow_data_helper_from_dict(mlflow_config: dict) MlflowDataHelper[source]

Load an MLflow data helper from a dictionary of parameters.

The following keys are expected:

  • mlflow_tracking_uri

  • mlflow_tracking_username (optional)

  • mlflow_tracking_password (optional)

{
    "mlflow_tracking_uri": "http://localhost:5000",
    "mlflow_tracking_username": "admin",
    "mlflow_tracking_password": "password"
}
Parameters:

mlflow_config (dict) – Dictionary containing MLflow parameters.

Returns:

An MlflowDataHelper object.

Return type:

MlflowDataHelper
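Since the username and password keys are optional, a loader along these lines typically falls back to `dict.get`. The following is an illustrative sketch with a hypothetical `MlflowSettings` stand-in, not the actual graphdoc implementation:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class MlflowSettings:
    """Illustrative stand-in for the settings an MlflowDataHelper needs."""
    tracking_uri: str
    username: Optional[str] = None
    password: Optional[str] = None

def mlflow_settings_from_dict(mlflow_config: dict) -> MlflowSettings:
    # mlflow_tracking_uri is required; the credentials are optional
    return MlflowSettings(
        tracking_uri=mlflow_config["mlflow_tracking_uri"],
        username=mlflow_config.get("mlflow_tracking_username"),
        password=mlflow_config.get("mlflow_tracking_password"),
    )

settings = mlflow_settings_from_dict({"mlflow_tracking_uri": "http://localhost:5000"})
```

A missing `mlflow_tracking_uri` raises a `KeyError`, while absent credentials simply resolve to `None`.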

graphdoc.config.mlflow_data_helper_from_yaml(yaml_path: str | Path) MlflowDataHelper[source]

Load a mlflow data helper from a YAML file.

Parameters:

yaml_path (Union[str, Path]) – Path to the YAML file.

mlflow:
    mlflow_tracking_uri:      !env MLFLOW_TRACKING_URI      # The tracking URI for MLflow
    mlflow_tracking_username: !env MLFLOW_TRACKING_USERNAME # The username for the MLflow tracking server
    mlflow_tracking_password: !env MLFLOW_TRACKING_PASSWORD # The password for the MLflow tracking server
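The `!env` tags above are resolved against environment variables when the YAML is loaded. That resolution step can be sketched in plain Python; `resolve_env` is a simplified, hypothetical stand-in for the actual YAML tag handling:

```python
import os

def resolve_env(config: dict) -> dict:
    """Replace '!env NAME' string values with the value of $NAME."""
    resolved = {}
    for key, value in config.items():
        if isinstance(value, str) and value.startswith("!env "):
            resolved[key] = os.environ.get(value.split(" ", 1)[1])
        else:
            resolved[key] = value
    return resolved

os.environ["MLFLOW_TRACKING_URI"] = "http://localhost:5000"
mlflow_config = resolve_env({"mlflow_tracking_uri": "!env MLFLOW_TRACKING_URI"})
```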
graphdoc.config.trainset_from_dict(trainset_dict: dict) List[Example][source]

Load a trainset from a dictionary of parameters.

{
    "hf_api_key": "<HF_DATASET_KEY>",           # Must be a valid Hugging Face
                                                # API key (with permission to
                                                # access graphdoc)
                                                # TODO: we may make
                                                # this public in the future
    "load_from_hf": false,                      # Whether to load the dataset
                                                # from Hugging Face
    "load_from_local": true,                    # Whether to load the dataset
                                                # from a local directory
    "load_local_specific_category": false,      # Whether to load all categories
                                                # or a specific category
    "local_specific_category": "perfect",       # The specific category
                                                # (if load_from_local is true)
    "local_parse_objects": true,                # Whether to parse the objects
                                                # in the dataset
                                                # (if load_from_local is true)
    "split_for_eval": true,                     # Whether to split the dataset
                                                # into trainset and evalset
    "trainset_size": 1000,                      # The size of the trainset
    "evalset_ratio": 0.1,                       # The proportionate size of the evalset
    "data_helper_type": "quality"               # Type of data helper to use
                                                # (quality, generation)
}
Parameters:

trainset_dict (dict) – Dictionary containing trainset parameters.

Returns:

A trainset.

Return type:

List[dspy.Example]

graphdoc.config.trainset_from_yaml(yaml_path: str | Path) List[Example][source]

Load a trainset from a YAML file.

data:
    hf_api_key: !env HF_DATASET_KEY         # Must be a valid Hugging Face API key
                                            # (with permission to access graphdoc)
                                            # TODO: we may make this public
    load_from_hf: false                     # Load the dataset from Hugging Face
    load_from_local: true                   # Load the dataset from a local directory
    load_local_specific_category: false     # Load all categories or a specific category
                                            # (if load_from_local is true)
    local_specific_category: perfect        # Which category to load from the dataset
                                            # (if load_from_local is true)
    local_parse_objects: true               # Whether to parse the objects
                                            # in the dataset
                                            # (if load_from_local is true)
    split_for_eval: true                    # Whether to split the dataset
                                            # into trainset and evalset
    trainset_size: 1000                     # The size of the trainset
    evalset_ratio: 0.1                      # The proportionate size of the evalset
    data_helper_type: quality               # Type of data helper to use
                                            # (quality, generation)
Parameters:

yaml_path (Union[str, Path]) – Path to the YAML file.

Returns:

A trainset.

Return type:

List[dspy.Example]

graphdoc.config.split_trainset(trainset: List[Example], evalset_ratio: float, seed: int = 42) tuple[List[Example], List[Example]][source]

Split a trainset into a trainset and evalset.

Parameters:
  • trainset (List[dspy.Example]) – The trainset to split.

  • evalset_ratio (float) – The proportionate size of the evalset.

  • seed (int) – The seed for the random split. Defaults to 42.

Returns:

A tuple of trainset and evalset.

Return type:

tuple[List[dspy.Example], List[dspy.Example]]
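The split amounts to a seeded shuffle followed by a ratio cut. A minimal stand-alone sketch of the same idea, using plain lists in place of dspy.Example objects:

```python
import random
from typing import List, Tuple, TypeVar

T = TypeVar("T")

def split_trainset(
    trainset: List[T], evalset_ratio: float, seed: int = 42
) -> Tuple[List[T], List[T]]:
    shuffled = trainset[:]                   # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)    # seeded shuffle -> reproducible split
    n_eval = int(len(shuffled) * evalset_ratio)
    return shuffled[n_eval:], shuffled[:n_eval]

train, evalset = split_trainset(list(range(10)), evalset_ratio=0.2)
```

Because the shuffle is driven by `random.Random(seed)`, the same seed always yields the same train/eval partition.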

graphdoc.config.trainset_and_evalset_from_yaml(yaml_path: str | Path) tuple[List[Example], List[Example]][source]

Load a trainset and evalset from a YAML file.

data:
    hf_api_key: !env HF_DATASET_KEY         # Must be a valid Hugging Face API key
                                            # (with permission to access graphdoc)
                                            # TODO: we may make this public
    load_from_hf: false                     # Load the dataset from Hugging Face
    load_from_local: true                   # Load the dataset from a local directory
    load_local_specific_category: false     # Load all categories or a specific category
                                            # (if load_from_local is true)
    local_specific_category: perfect        # Which category to load from the dataset
                                            # (if load_from_local is true)
    local_parse_objects: true               # Whether to parse the objects
                                            # in the dataset
                                            # (if load_from_local is true)
    split_for_eval: true                    # Whether to split the dataset
                                            # into trainset and evalset
    trainset_size: 1000                     # The size of the trainset
    evalset_ratio: 0.1                      # The proportionate size of the evalset
    data_helper_type: quality               # Type of data helper to use
                                            # (quality, generation)
    seed: 42                                # The seed for the random number generator
Parameters:

yaml_path (Union[str, Path]) – Path to the YAML file.

Returns:

A tuple of trainset and evalset.

Return type:

tuple[List[dspy.Example], List[dspy.Example]]

graphdoc.config.single_prompt_from_dict(prompt_dict: dict, prompt_metric: str | SinglePrompt, mlflow_dict: dict | None = None) SinglePrompt[source]

Load a single prompt from a dictionary of parameters.

{
    "prompt": "doc_quality",             # Which prompt signature to use
    "class": "SchemaDocQualityPrompt",   # Must be a child of SinglePrompt
    "type": "predict",                   # Must be one of predict, generate
    "metric": "rating",                  # The metric to use for evaluation
    "load_from_mlflow": false,           # Whether to load the prompt from MLflow
    "model_uri": null,                   # The URI of the model in MLflow
    "model_name": null,                  # The name of the model in MLflow
    "model_version": null                # The version of the model in MLflow
}
Parameters:
  • prompt_dict (dict) – Dictionary containing prompt parameters.

  • prompt_metric (Union[str, SinglePrompt]) – The prompt to use for the metric.

  • mlflow_dict (Optional[dict]) – Dictionary containing MLflow parameters.

Returns:

A SinglePrompt object.

Return type:

SinglePrompt
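The `class` key is resolved from a string to an actual prompt class. A registry-based sketch of that mapping (the stub classes and the `PROMPT_CLASSES` registry are hypothetical; graphdoc's own mapping may differ):

```python
class SinglePrompt:                      # stub base class for illustration
    pass

class DocQualityPrompt(SinglePrompt):
    pass

class DocGeneratorPrompt(SinglePrompt):
    pass

# map the "class" string from the config onto the concrete prompt class
PROMPT_CLASSES = {
    "DocQualityPrompt": DocQualityPrompt,
    "DocGeneratorPrompt": DocGeneratorPrompt,
}

def prompt_class_from_config(prompt_dict: dict) -> type:
    name = prompt_dict["class"]
    try:
        return PROMPT_CLASSES[name]
    except KeyError:
        raise ValueError(f"unknown prompt class: {name!r}") from None

cls = prompt_class_from_config({"class": "DocQualityPrompt"})
```

A registry like this keeps the config file declarative while failing loudly on a typo in the `class` field.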

graphdoc.config.single_prompt_from_yaml(yaml_path: str | Path) SinglePrompt[source]

Load a single prompt from a YAML file.

prompt:
    prompt: base_doc_gen        # Which prompt signature to use
    class: DocGeneratorPrompt   # Must be a child of SinglePrompt
                                # (we will use an enum to map this)
    type: chain_of_thought      # The type of prompt to use
                                # (predict, chain_of_thought)
    metric: rating              # The type of metric to use
                                # (rating, category)
    load_from_mlflow: false     # Whether to load the prompt
                                # from an MLflow URI
    model_uri: null             # The URI of the model in MLflow
    model_name: null            # The name of the model in MLflow
    model_version: null         # The version of the model in MLflow
    prompt_metric: true         # Whether another prompt is used
                                # to calculate the metric
                                # (in which case we must also load that prompt)

prompt_metric:
    prompt: doc_quality         # The prompt to use to calculate
                                # the metric
    class: DocQualityPrompt     # The class of the prompt to use
                                # to calculate the metric
    type: predict               # The type of prompt to use
                                # to calculate the metric
    metric: rating              # The metric to use to calculate
                                # the metric
    load_from_mlflow: false     # Whether to load the prompt
                                # from an MLflow URI
Parameters:

yaml_path (Union[str, Path]) – Path to the YAML file.

Returns:

A SinglePrompt object.

Return type:

SinglePrompt

graphdoc.config.single_trainer_from_dict(trainer_dict: dict, prompt: SinglePrompt, trainset: List[Example] | None = None, evalset: List[Example] | None = None) SinglePromptTrainer[source]

Load a single trainer from a dictionary of parameters.

{
    "mlflow": {
        "mlflow_tracking_uri": "http://localhost:5000",
        "mlflow_tracking_username": "admin",
        "mlflow_tracking_password": "password"
    },
    "trainer": {
        "class": "DocQualityTrainer",
        "mlflow_model_name": "doc_quality_model",
        "mlflow_experiment_name": "doc_quality_experiment"
    },
    "optimizer": {
        "optimizer_type": "miprov2",
        "auto": "light",
        "max_labeled_demos": 2,
        "max_bootstrapped_demos": 4,
        "num_trials": 2,
        "minibatch": true
    }
}
Parameters:
  • trainer_dict (dict) – Dictionary containing trainer parameters.

  • prompt (SinglePrompt) – The prompt to use for this trainer.

  • trainset (Optional[List[dspy.Example]]) – The trainset to train on.

  • evalset (Optional[List[dspy.Example]]) – The evalset to evaluate on.

Returns:

A SinglePromptTrainer object.

Return type:

SinglePromptTrainer

graphdoc.config.single_trainer_from_yaml(yaml_path: str | Path) SinglePromptTrainer[source]

Load a single prompt trainer from a YAML file.

data:
    hf_api_key: !env HF_DATASET_KEY         # Must be a valid Hugging Face API key
                                            # (with permission to access graphdoc)
                                            # TODO: we may make this public
    load_from_hf: false                     # Load the dataset from Hugging Face
    load_from_local: true                   # Load the dataset from a local directory
    load_local_specific_category: false     # Load all categories or a specific category
                                            # (if load_from_local is true)
    local_specific_category: perfect        # Which category to load from the dataset
                                            # (if load_from_local is true)
    local_parse_objects: true               # Whether to parse the objects
                                            # in the dataset
                                            # (if load_from_local is true)
    split_for_eval: true                    # Whether to split the dataset
                                            # into trainset and evalset
    trainset_size: 1000                     # The size of the trainset
    evalset_ratio: 0.1                      # The proportionate size of the evalset

prompt:
    prompt: base_doc_gen                    # Which prompt signature to use
    class: DocGeneratorPrompt               # Must be a child of SinglePrompt
                                            # (we will use an enum to map this)
    type: chain_of_thought                  # The type of prompt to use
                                            # (predict, chain_of_thought)
    metric: rating                          # The type of metric to use
                                            # (rating, category)
    load_from_mlflow: false                 # Whether to load the prompt from an MLflow URI
    model_uri: null                         # The URI of the model in MLflow
    model_name: null                        # The name of the model in MLflow
    model_version: null                     # The version of the model in MLflow
    prompt_metric: true                     # Whether another prompt is used
                                            # to calculate the metric
Parameters:

yaml_path (Union[str, Path]) – Path to the YAML file.

Returns:

A SinglePromptTrainer object.

Return type:

SinglePromptTrainer

graphdoc.config.doc_generator_module_from_dict(module_dict: dict, prompt: DocGeneratorPrompt | SinglePrompt) DocGeneratorModule[source]

Load a single doc generator module from a dictionary of parameters.

{
    "retry": true,
    "retry_limit": 1,
    "rating_threshold": 3,
    "fill_empty_descriptions": true
}
Parameters:
  • module_dict (dict) – Dictionary containing module parameters.

  • prompt (DocGeneratorPrompt) – The prompt to use for this module.

Returns:

A DocGeneratorModule object.

Return type:

DocGeneratorModule
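The retry, retry_limit, and rating_threshold parameters interact roughly as follows. This is an illustrative sketch with stub generate/rate callables; the real module's control flow may differ:

```python
from typing import Callable

def generate_with_retry(
    generate: Callable[[str], str],
    rate: Callable[[str], int],
    schema: str,
    retry: bool = True,
    retry_limit: int = 1,
    rating_threshold: int = 3,
) -> str:
    # one initial attempt, plus up to retry_limit retries when retry is enabled
    max_attempts = 1 + (retry_limit if retry else 0)
    for _ in range(max_attempts):
        result = generate(schema)
        if rate(result) >= rating_threshold:
            return result              # quality check passed
    return result                      # retries exhausted: return the last attempt

ratings = iter([2, 4])                 # first attempt fails the check, second passes
out = generate_with_retry(
    lambda s: s + " # documented",
    lambda r: next(ratings),
    "type Query { id: ID }",
)
```

With `retry=False` the quality check still runs once, but a failing rating is simply returned as-is.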

graphdoc.config.doc_generator_module_from_yaml(yaml_path: str | Path) DocGeneratorModule[source]

Load a doc generator module from a YAML file.

prompt:
    prompt: base_doc_gen            # Which prompt signature to use
    class: DocGeneratorPrompt       # Must be a child of SinglePrompt
                                    # (we will use an enum to map this)
    type: chain_of_thought          # The type of prompt to use
                                    # (predict, chain_of_thought)
    metric: rating                  # The type of metric to use
                                    # (rating, category)
    load_from_mlflow: false         # Whether to load the prompt
                                    # from an MLflow URI
    model_uri: null                 # The URI of the model in MLflow
    model_name: null                # The name of the model in MLflow
    model_version: null             # The version of the model in MLflow
    prompt_metric: true             # Whether another prompt is used
                                    # to calculate the metric
                                    # (in which case we must load that prompt)

prompt_metric:
    prompt: doc_quality             # The prompt to use to calculate the metric
    class: DocQualityPrompt         # The class of the prompt to use
                                    # to calculate the metric
    type: predict                   # The type of prompt to use
                                    # to calculate the metric
    metric: rating                  # The metric to use to calculate the metric
    load_from_mlflow: false         # Whether to load the prompt
                                    # from an MLflow URI
    model_uri: null                 # The URI of the model in MLflow
    model_name: null                # The name of the model in MLflow
    model_version: null             # The version of the model in MLflow

module:
    retry: true                     # Whether to retry the generation
                                    # if the quality check fails
    retry_limit: 1                  # The maximum number of retries
    rating_threshold: 3             # The rating threshold for the quality check
    fill_empty_descriptions: true   # Whether to fill empty descriptions with
                                    # generated documentation
Parameters:

yaml_path (Union[str, Path]) – Path to the YAML file.

Returns:

A DocGeneratorModule object.

Return type:

DocGeneratorModule

graphdoc.config.doc_generator_eval_from_yaml(yaml_path: str | Path) DocGeneratorEvaluator[source]

Load a doc generator evaluator from a YAML file.

mlflow:
    mlflow_tracking_uri:      !env MLFLOW_TRACKING_URI      # The tracking URI for MLflow
    mlflow_tracking_username: !env MLFLOW_TRACKING_USERNAME # The username for the MLflow tracking server
    mlflow_tracking_password: !env MLFLOW_TRACKING_PASSWORD # The password for the MLflow tracking server

prompt:
    prompt: base_doc_gen                                  # Which prompt signature to use
    class: DocGeneratorPrompt                             # Must be a child of SinglePrompt (we will use an enum to map this)
    type: chain_of_thought                                # The type of prompt to use (predict, chain_of_thought)
    metric: rating                                        # The type of metric to use (rating, category)
    load_from_mlflow: false                               # Whether to load the prompt from an MLflow URI
    model_uri: null                                       # The URI of the model in MLflow
    model_name: null                                      # The name of the model in MLflow
    model_version: null                                   # The version of the model in MLflow
    prompt_metric: true                                   # Whether another prompt is used to calculate the metric (in which case we must also load that prompt)

prompt_metric:
    prompt: doc_quality                                   # The prompt to use to calculate the metric
    class: DocQualityPrompt                               # The class of the prompt to use to calculate the metric
    type: predict                                         # The type of prompt to use to calculate the metric
    metric: rating                                        # The metric to use to calculate the metric
    load_from_mlflow: false                               # Whether to load the prompt from an MLflow URI
    model_uri: null                                       # The URI of the model in MLflow
    model_name: null                                      # The name of the model in MLflow
    model_version: null                                   # The version of the model in MLflow

module:
    retry: true                                           # Whether to retry the generation if the quality check fails
    retry_limit: 1                                        # The maximum number of retries
    rating_threshold: 3                                   # The rating threshold for the quality check
    fill_empty_descriptions: true                         # Whether to fill the empty descriptions in the schema

eval:
    mlflow_experiment_name: doc_generator_eval            # The name of the experiment in MLflow
    generator_prediction_field: documented_schema         # The field in the generator prediction to use
    evaluator_prediction_field: rating                    # The field in the evaluator prediction to use
    readable_value: 25
Parameters:

yaml_path (Union[str, Path]) – Path to the YAML file.

Returns:

A DocGeneratorEvaluator object.

Return type:

DocGeneratorEvaluator