graphdoc.config module
- graphdoc.config.lm_from_dict(lm_config: dict)[source]
Load a language model from a dictionary of parameters.
- Parameters:
lm_config (dict) – Dictionary containing language model parameters.
- graphdoc.config.lm_from_yaml(yaml_path: str | Path)[source]
Load a language model from a YAML file.
- Parameters:
yaml_path (Union[str, Path]) – Path to the YAML file containing language model parameters.
- graphdoc.config.dspy_lm_from_dict(lm_config: dict)[source]
Load a language model from a dictionary of parameters. Set the dspy language model.
- Parameters:
lm_config (dict) – Dictionary containing language model parameters.
- graphdoc.config.dspy_lm_from_yaml(yaml_path: str | Path)[source]
Load a language model from a YAML file. Set the dspy language model.
- Parameters:
yaml_path (Union[str, Path]) – Path to the YAML file containing language model parameters.
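All four loaders accept the same shape of configuration dict. The dict-to-constructor pattern they imply can be sketched as follows (`LanguageModel` is a hypothetical stand-in for illustration, not the class graphdoc actually constructs):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LanguageModel:
    # Hypothetical stand-in for the model object the loaders return.
    model: str
    api_key: Optional[str] = None
    cache: bool = True


def lm_from_dict(lm_config: dict) -> LanguageModel:
    # Unpack the config dict directly into the constructor's keyword arguments.
    return LanguageModel(**lm_config)


lm = lm_from_dict({"model": "openai/gpt-4o-mini", "cache": False})
```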
- graphdoc.config.mlflow_data_helper_from_dict(mlflow_config: dict) MlflowDataHelper [source]
Load an MLflow data helper from a dictionary of parameters.
The following keys are expected:
- mlflow_tracking_uri
- mlflow_tracking_username (optional)
- mlflow_tracking_password (optional)

```json
{
    "mlflow_tracking_uri": "http://localhost:5000",
    "mlflow_tracking_username": "admin",
    "mlflow_tracking_password": "password"
}
```
- Parameters:
mlflow_config (dict) – Dictionary containing MLflow parameters.
- Returns:
A MlflowDataHelper object.
- Return type:
MlflowDataHelper
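Since two of the three keys are optional, a caller only needs to guarantee the tracking URI is present. A minimal sketch of checking the dict before use (`validate_mlflow_config` is a hypothetical helper, not part of graphdoc):

```python
def validate_mlflow_config(mlflow_config: dict) -> dict:
    """Check a dict shaped like the mlflow_data_helper_from_dict input.

    The tracking URI is required; the credentials default to None.
    """
    if "mlflow_tracking_uri" not in mlflow_config:
        raise KeyError("mlflow_tracking_uri is required")
    return {
        "uri": mlflow_config["mlflow_tracking_uri"],
        "username": mlflow_config.get("mlflow_tracking_username"),
        "password": mlflow_config.get("mlflow_tracking_password"),
    }


cfg = validate_mlflow_config({"mlflow_tracking_uri": "http://localhost:5000"})
```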
- graphdoc.config.mlflow_data_helper_from_yaml(yaml_path: str | Path) MlflowDataHelper [source]
Load an MLflow data helper from a YAML file.
- Parameters:
yaml_path (Union[str, Path]) – Path to the YAML file.
```yaml
mlflow:
  mlflow_tracking_uri: !env MLFLOW_TRACKING_URI            # The tracking URI for MLflow
  mlflow_tracking_username: !env MLFLOW_TRACKING_USERNAME  # The username for the MLflow tracking server
  mlflow_tracking_password: !env MLFLOW_TRACKING_PASSWORD  # The password for the MLflow tracking server
```
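The `!env` tags suggest that values are resolved from environment variables when the YAML is loaded. A minimal sketch of how such a tag can be handled with PyYAML (the constructor function and loader choice are assumptions for illustration, not graphdoc's actual implementation):

```python
import os

import yaml


def env_constructor(loader, node):
    # Resolve a scalar tagged !env to the value of that environment variable.
    var_name = loader.construct_scalar(node)
    return os.environ.get(var_name)


# Register the tag on SafeLoader, which yaml.safe_load uses.
yaml.SafeLoader.add_constructor("!env", env_constructor)

os.environ["MLFLOW_TRACKING_URI"] = "http://localhost:5000"
config = yaml.safe_load("mlflow:\n  mlflow_tracking_uri: !env MLFLOW_TRACKING_URI\n")
print(config["mlflow"]["mlflow_tracking_uri"])  # http://localhost:5000
```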
- graphdoc.config.trainset_from_dict(trainset_dict: dict) List[Example] [source]
Load a trainset from a dictionary of parameters.
{ "hf_api_key": !env HF_DATASET_KEY, # Must be a valid Hugging # Face API key # (with permission to # access graphdoc) # TODO: we may make # this public in the future "load_from_hf": false, # Whether to load the dataset # from Hugging Face "load_from_local": true, # Whether to load the dataset # from a local directory "load_local_specific_category": false, # Whether to load all categories # or a specific category "local_specific_category": perfect, # The specific category # (if load_from_local is true) "local_parse_objects": true, # Whether to parse the objects # in the dataset # (if load_from_local is true) "split_for_eval": true, # Whether to split the dataset # into trainset and evalset "trainset_size": 1000, # The size of the trainset "evalset_ratio": 0.1, # The proportionate size of evalset "data_helper_type": "quality" # Type of data helper to use # (quality, generation) }
- Parameters:
trainset_dict (dict) – Dictionary containing trainset parameters.
- Returns:
A trainset.
- Return type:
List[dspy.Example]
- graphdoc.config.trainset_from_yaml(yaml_path: str | Path) List[Example] [source]
Load a trainset from a YAML file.
```yaml
data:
  hf_api_key: !env HF_DATASET_KEY      # Must be a valid Hugging Face API key
                                       # (with permission to access graphdoc)
                                       # TODO: we may make this public
  load_from_hf: false                  # Load the dataset from Hugging Face
  load_from_local: true                # Load the dataset from a local directory
  load_local_specific_category: false  # Load all categories or a specific category
                                       # (if load_from_local is true)
  local_specific_category: perfect     # Which category to load from the dataset
                                       # (if load_from_local is true)
  local_parse_objects: true            # Whether to parse the objects in the dataset
                                       # (if load_from_local is true)
  split_for_eval: true                 # Whether to split the dataset into trainset and evalset
  trainset_size: 1000                  # The size of the trainset
  evalset_ratio: 0.1                   # The proportionate size of the evalset
  data_helper_type: quality            # Type of data helper to use (quality, generation)
```
- Parameters:
yaml_path (Union[str, Path]) – Path to the YAML file.
- Returns:
A trainset.
- Return type:
List[dspy.Example]
- graphdoc.config.split_trainset(trainset: List[Example], evalset_ratio: float, seed: int = 42) tuple[List[Example], List[Example]] [source]
Split a trainset into a trainset and evalset.
- Parameters:
trainset (List[dspy.Example]) – The trainset to split.
evalset_ratio (float) – The proportionate size of the evalset.
seed (int) – The seed for the random number generator. Defaults to 42.
- Returns:
A tuple of (trainset, evalset).
- Return type:
tuple[List[dspy.Example], List[dspy.Example]]
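A seeded shuffle followed by a slice is one way such a split can work; a minimal sketch under that assumption (graphdoc's actual split logic may differ in ordering details):

```python
import random
from typing import List, Tuple


def split_examples(trainset: List, evalset_ratio: float, seed: int = 42) -> Tuple[List, List]:
    # Shuffle a copy deterministically, then carve the evalset off the front.
    rng = random.Random(seed)
    shuffled = trainset[:]
    rng.shuffle(shuffled)
    eval_size = int(len(shuffled) * evalset_ratio)
    return shuffled[eval_size:], shuffled[:eval_size]


train, evalset = split_examples(list(range(100)), evalset_ratio=0.1)
```

With the same seed the split is reproducible, which is why the signature exposes `seed` at all.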
- graphdoc.config.trainset_and_evalset_from_yaml(yaml_path: str | Path) tuple[List[Example], List[Example]] [source]
Load a trainset and evalset from a YAML file.
```yaml
data:
  hf_api_key: !env HF_DATASET_KEY      # Must be a valid Hugging Face API key
                                       # (with permission to access graphdoc)
                                       # TODO: we may make this public
  load_from_hf: false                  # Load the dataset from Hugging Face
  load_from_local: true                # Load the dataset from a local directory
  load_local_specific_category: false  # Load all categories or a specific category
                                       # (if load_from_local is true)
  local_specific_category: perfect     # Which category to load from the dataset
                                       # (if load_from_local is true)
  local_parse_objects: true            # Whether to parse the objects in the dataset
                                       # (if load_from_local is true)
  split_for_eval: true                 # Whether to split the dataset into trainset and evalset
  trainset_size: 1000                  # The size of the trainset
  evalset_ratio: 0.1                   # The proportionate size of the evalset
  data_helper_type: quality            # Type of data helper to use (quality, generation)
  seed: 42                             # The seed for the random number generator
```
- Parameters:
yaml_path (Union[str, Path]) – Path to the YAML file.
- Returns:
A tuple of (trainset, evalset).
- Return type:
tuple[List[dspy.Example], List[dspy.Example]]
- graphdoc.config.single_prompt_from_dict(prompt_dict: dict, prompt_metric: str | SinglePrompt, mlflow_dict: dict | None = None) SinglePrompt [source]
Load a single prompt from a dictionary of parameters.
{ "prompt": "doc_quality", # Which prompt signature to use "class": "SchemaDocQualityPrompt", # Must be a child of SinglePrompt "type": "predict", # Must be one of predict, generate "metric": "rating", # The metric to use for evaluation "load_from_mlflow": false, # Whether to load the prompt from MLflow "model_uri": null, # The tracking URI for MLflow "model_name": null, # The name of the model in MLflow "model_version": null # The version of the model in MLflow }
- graphdoc.config.single_prompt_from_yaml(yaml_path: str | Path) SinglePrompt [source]
Load a single prompt from a YAML file.
```yaml
prompt:
  prompt: base_doc_gen         # Which prompt signature to use
  class: DocGeneratorPrompt    # Must be a child of SinglePrompt
                               # (we will use an enum to map this)
  type: chain_of_thought       # The type of prompt to use
                               # (predict, chain_of_thought)
  metric: rating               # The type of metric to use (rating, category)
  load_from_mlflow: false      # Whether to load the prompt from an MLflow URI
  model_uri: null              # The tracking URI for MLflow
  model_name: null             # The name of the model in MLflow
  model_version: null          # The version of the model in MLflow
  prompt_metric: true          # Whether another prompt is used to calculate the metric
                               # (in which case we must also load that prompt)

prompt_metric:
  prompt: doc_quality          # The prompt to use to calculate the metric
  class: DocQualityPrompt      # The class of the prompt used to calculate the metric
  type: predict                # The type of prompt used to calculate the metric
  metric: rating               # The metric used to calculate the metric
  load_from_mlflow: false      # Whether to load the prompt from an MLflow URI
```
- Parameters:
yaml_path (Union[str, Path]) – Path to the YAML file.
- Returns:
A SinglePrompt object.
- Return type:
SinglePrompt
- graphdoc.config.single_trainer_from_dict(trainer_dict: dict, prompt: SinglePrompt, trainset: List[Example] | None = None, evalset: List[Example] | None = None) SinglePromptTrainer [source]
Load a single trainer from a dictionary of parameters.
{ "mlflow": { "mlflow_tracking_uri": "http://localhost:5000", "mlflow_tracking_username": "admin", "mlflow_tracking_password": "password", }, "trainer": { "class": "DocQualityTrainer", "mlflow_model_name": "doc_quality_model", "mlflow_experiment_name": "doc_quality_experiment", }, "optimizer": { "optimizer_type": "miprov2", "auto": "light", "max_labeled_demos": 2, "max_bootstrapped_demos": 4, "num_trials": 2, "minibatch": true }, }
- Parameters:
trainer_dict (dict) – Dictionary containing trainer parameters.
prompt (SinglePrompt) – The prompt to use for this trainer.
- Returns:
A SinglePromptTrainer object.
- Return type:
SinglePromptTrainer
- graphdoc.config.single_trainer_from_yaml(yaml_path: str | Path) SinglePromptTrainer [source]
Load a single prompt trainer from a YAML file.
```yaml
trainer:
  hf_api_key: !env HF_DATASET_KEY      # Must be a valid Hugging Face API key
                                       # (with permission to access graphdoc)
                                       # TODO: we may make this public
  load_from_hf: false                  # Load the dataset from Hugging Face
  load_from_local: true                # Load the dataset from a local directory
  load_local_specific_category: false  # Load all categories or a specific category
                                       # (if load_from_local is true)
  local_specific_category: perfect     # Which category to load from the dataset
                                       # (if load_from_local is true)
  local_parse_objects: true            # Whether to parse the objects in the dataset
                                       # (if load_from_local is true)
  split_for_eval: true                 # Whether to split the dataset into trainset and evalset
  trainset_size: 1000                  # The size of the trainset
  evalset_ratio: 0.1                   # The proportionate size of the evalset

prompt:
  prompt: base_doc_gen                 # Which prompt signature to use
  class: DocGeneratorPrompt            # Must be a child of SinglePrompt
                                       # (we will use an enum to map this)
  type: chain_of_thought               # The type of prompt to use
                                       # (predict, chain_of_thought)
  metric: rating                       # The type of metric to use (rating, category)
  load_from_mlflow: false              # Load the prompt from an MLflow URI
  model_uri: null                      # The tracking URI for MLflow
  model_name: null                     # The name of the model in MLflow
  model_version: null                  # The version of the model in MLflow
  prompt_metric: true                  # Whether another prompt is used to calculate the metric
```
- Parameters:
yaml_path (Union[str, Path]) – Path to the YAML file.
- Returns:
A SinglePromptTrainer object.
- Return type:
SinglePromptTrainer
- graphdoc.config.doc_generator_module_from_dict(module_dict: dict, prompt: DocGeneratorPrompt | SinglePrompt) DocGeneratorModule [source]
Load a single doc generator module from a dictionary of parameters.
{ "retry": true, "retry_limit": 1, "rating_threshold": 3, "fill_empty_descriptions": true }
- Parameters:
module_dict (dict) – Dictionary containing module parameters.
prompt (DocGeneratorPrompt) – The prompt to use for this module.
- Returns:
A DocGeneratorModule object.
- Return type:
DocGeneratorModule
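The `retry`, `retry_limit`, and `rating_threshold` parameters imply a generate-then-rate loop; a minimal sketch of that semantics (`generate_with_retry`, `generate`, and `rate` are hypothetical stand-ins, not graphdoc APIs, and the module's actual internals may differ):

```python
def generate_with_retry(generate, rate, retry=True, retry_limit=1, rating_threshold=3):
    """Regenerate until the quality rating meets the threshold or retries run out.

    `generate` produces a draft; `rate` scores it, standing in for the
    quality-check prompt.
    """
    result = generate()
    attempts = 0
    while retry and rate(result) < rating_threshold and attempts < retry_limit:
        attempts += 1
        result = generate()
    return result


# First draft rates 2 (below the default threshold of 3), so the single
# allowed retry runs and the second draft (rating 4) is returned.
drafts = iter([2, 4])
final = generate_with_retry(lambda: next(drafts), rate=lambda r: r)
```

When the limit is exhausted the last draft is returned as-is, matching the idea that `retry_limit` caps cost rather than guaranteeing quality.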
- graphdoc.config.doc_generator_module_from_yaml(yaml_path: str | Path) DocGeneratorModule [source]
Load a doc generator module from a YAML file.
```yaml
prompt:
  prompt: base_doc_gen           # Which prompt signature to use
  class: DocGeneratorPrompt      # Must be a child of SinglePrompt
                                 # (we will use an enum to map this)
  type: chain_of_thought         # The type of prompt to use
                                 # (predict, chain_of_thought)
  metric: rating                 # The type of metric to use (rating, category)
  load_from_mlflow: false        # Whether to load the prompt from an MLflow URI
  model_uri: null                # The tracking URI for MLflow
  model_name: null               # The name of the model in MLflow
  model_version: null            # The version of the model in MLflow
  prompt_metric: true            # Whether another prompt is used to calculate the metric
                                 # (in which case we must load that prompt)

prompt_metric:
  prompt: doc_quality            # The prompt to use to calculate the metric
  class: DocQualityPrompt        # The class of the prompt used to calculate the metric
  type: predict                  # The type of prompt used to calculate the metric
  metric: rating                 # The metric used to calculate the metric
  load_from_mlflow: false        # Whether to load the prompt from an MLflow URI
  model_uri: null                # The tracking URI for MLflow
  model_name: null               # The name of the model in MLflow
  model_version: null            # The version of the model in MLflow

module:
  retry: true                    # Whether to retry the generation if the quality check fails
  retry_limit: 1                 # The maximum number of retries
  rating_threshold: 3            # The rating threshold for the quality check
  fill_empty_descriptions: true  # Whether to fill empty descriptions with
                                 # generated documentation
```
- Parameters:
yaml_path (Union[str, Path]) – Path to the YAML file.
- Returns:
A DocGeneratorModule object.
- Return type:
DocGeneratorModule
- graphdoc.config.doc_generator_eval_from_yaml(yaml_path: str | Path) DocGeneratorEvaluator [source]
Load a doc generator evaluator from a YAML file.
```yaml
mlflow:
  mlflow_tracking_uri: !env MLFLOW_TRACKING_URI            # The tracking URI for MLflow
  mlflow_tracking_username: !env MLFLOW_TRACKING_USERNAME  # The username for the MLflow tracking server
  mlflow_tracking_password: !env MLFLOW_TRACKING_PASSWORD  # The password for the MLflow tracking server

prompt:
  prompt: base_doc_gen           # Which prompt signature to use
  class: DocGeneratorPrompt      # Must be a child of SinglePrompt (we will use an enum to map this)
  type: chain_of_thought         # The type of prompt to use (predict, chain_of_thought)
  metric: rating                 # The type of metric to use (rating, category)
  load_from_mlflow: false        # Whether to load the prompt from an MLflow URI
  model_uri: null                # The tracking URI for MLflow
  model_name: null               # The name of the model in MLflow
  model_version: null            # The version of the model in MLflow
  prompt_metric: true            # Whether another prompt is used to calculate the metric
                                 # (in which case we must also load that prompt)

prompt_metric:
  prompt: doc_quality            # The prompt to use to calculate the metric
  class: DocQualityPrompt        # The class of the prompt used to calculate the metric
  type: predict                  # The type of prompt used to calculate the metric
  metric: rating                 # The metric used to calculate the metric
  load_from_mlflow: false        # Whether to load the prompt from an MLflow URI
  model_uri: null                # The tracking URI for MLflow
  model_name: null               # The name of the model in MLflow
  model_version: null            # The version of the model in MLflow

module:
  retry: true                    # Whether to retry the generation if the quality check fails
  retry_limit: 1                 # The maximum number of retries
  rating_threshold: 3            # The rating threshold for the quality check
  fill_empty_descriptions: true  # Whether to fill the empty descriptions in the schema

eval:
  mlflow_experiment_name: doc_generator_eval     # The name of the experiment in MLflow
  generator_prediction_field: documented_schema  # The field in the generator prediction to use
  evaluator_prediction_field: rating             # The field in the evaluator prediction to use
  readable_value: 25
```
- Parameters:
yaml_path (Union[str, Path]) – Path to the YAML file.
- Returns:
A DocGeneratorEvaluator object.
- Return type:
DocGeneratorEvaluator