graphdoc.prompts.single_prompt module

class graphdoc.prompts.single_prompt.SinglePrompt(prompt: Signature | SignatureMeta, prompt_type: Literal['predict', 'chain_of_thought'] | Callable, prompt_metric: Any)[source]

Bases: ABC

__init__(prompt: Signature | SignatureMeta, prompt_type: Literal['predict', 'chain_of_thought'] | Callable, prompt_metric: Any) None[source]

Initialize a single prompt.

Parameters:
  • prompt (dspy.Signature) – The prompt to use.

  • prompt_type (Union[Literal["predict", "chain_of_thought"], Callable]) – The type of prompt to use. Can be “predict” or “chain_of_thought”. Optionally, pass another dspy.Module.

  • prompt_metric (Any) – The metric to use. Marked as Any for flexibility (as metrics can be other prompts).
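
A minimal construction sketch. SinglePrompt is abstract, so only the signature and the prompt_type options are shown here; the field names (schema, rating) and the mapping comments are illustrative assumptions, not part of graphdoc's API:

   import dspy

   # A toy signature to pass as the `prompt` argument. The field names
   # (schema, rating) are illustrative only.
   class DocQuality(dspy.Signature):
       """Rate the quality of a schema's documentation."""

       schema: str = dspy.InputField()
       rating: int = dspy.OutputField()

   # `prompt_type` selects how the signature is executed (assumed mapping):
   #   "predict"          -> presumably wrapped in dspy.Predict
   #   "chain_of_thought" -> presumably wrapped in dspy.ChainOfThought
   #   a dspy.Module      -> used directly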

abstract evaluate_metric(example: Example, prediction: Prediction, trace=None) Any[source]

The metric used to evaluate the prompt.

Parameters:
  • example (dspy.Example) – The example to evaluate the metric on.

  • prediction (dspy.Prediction) – The prediction to evaluate the metric on.

  • trace (Any) – The DSPy trace, passed in by DSPy during compilation/optimization.
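
A sketch of a concrete override, assuming the toy DocQuality signature above with its illustrative rating output field; the other abstract methods are omitted for brevity:

   import dspy

   from graphdoc.prompts.single_prompt import SinglePrompt

   class ExactMatchPrompt(SinglePrompt):
       """Illustrative subclass; format_metric and compare_metrics not shown."""

       def evaluate_metric(
           self, example: dspy.Example, prediction: dspy.Prediction, trace=None
       ) -> float:
           # Exact-match score: 1.0 when the predicted rating equals the label.
           return float(example.rating == prediction.rating)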

abstract format_metric(examples: List[Example], overall_score: float, results: List, scores: List) Dict[str, Any][source]

Takes the results from evaluate_evalset and applies any formatting required by the metric type.

Parameters:
  • examples (List[dspy.Example]) – The examples to evaluate the metric on.

  • overall_score (float) – The overall score of the metric.

  • results (List) – The results from the evaluate_evalset.

  • scores (List) – The scores from the evaluate_evalset.
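
A sketch of what a format_metric implementation might return, written as a module-level function for brevity (in a real subclass it would be a method on SinglePrompt); the dictionary keys are illustrative, not a fixed graphdoc contract:

   from typing import Any, Dict, List

   import dspy

   def format_metric(
       examples: List[dspy.Example],
       overall_score: float,
       results: List,
       scores: List,
   ) -> Dict[str, Any]:
       # Package the raw evaluate_evalset outputs into a single dictionary.
       return {
           "overall_score": overall_score,
           "scores": scores,
           "results": results,
           "num_examples": len(examples),
       }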

abstract compare_metrics(base_metrics: Any, optimized_metrics: Any, comparison_value: str = 'overall_score') bool[source]

Compare the metrics of the base and optimized models. Returns True if the optimized model outperforms the base model.

Parameters:
  • base_metrics (Any) – The metrics of the base model.

  • optimized_metrics (Any) – The metrics of the optimized model.

  • comparison_value (str) – The metric key to compare on; determines which metric value is used to compare the models.

Returns:

True if the optimized model is better than the base model, False otherwise.

Return type:

bool
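
A sketch of a compare_metrics implementation, assuming both metric dictionaries follow the shape produced by the format_metric sketch above; treating ties as an improvement is a design choice made here, not graphdoc's:

   from typing import Any, Dict

   def compare_metrics(
       base_metrics: Dict[str, Any],
       optimized_metrics: Dict[str, Any],
       comparison_value: str = "overall_score",
   ) -> bool:
       # The optimized model wins when its chosen metric is at least as
       # high as the base model's (ties count as an improvement here).
       return optimized_metrics[comparison_value] >= base_metrics[comparison_value]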

evaluate_evalset(examples: List[Example], num_threads: int = 1, display_progress: bool = True, display_table: bool = True) Dict[str, Any][source]

Evaluate the prompt against a list of examples.

Parameters:
  • examples (List[dspy.Example]) – The examples to evaluate the prompt on.

  • num_threads (int) – The number of threads to use for evaluation.

  • display_progress (bool) – Whether to display the progress of the evaluation.

  • display_table (bool) – Whether to display a table of the evaluation results.

Returns:

A dictionary containing the overall score, results, and scores.

Return type:

Dict[str, Any]
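
An end-to-end usage sketch, assuming the ExactMatchPrompt from the earlier sketches has also been given format_metric and compare_metrics implementations; the model name, example data, and ratings are illustrative:

   import dspy

   # Configure a language model for DSPy (model name is illustrative).
   dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

   # Labeled examples; with_inputs marks which fields are model inputs.
   examples = [
       dspy.Example(schema="type Query { user(id: ID!): User }", rating=2).with_inputs("schema"),
       dspy.Example(schema="type User { id: ID! name: String }", rating=4).with_inputs("schema"),
   ]

   prompt = ExactMatchPrompt(
       prompt=DocQuality,               # toy signature from the first sketch
       prompt_type="chain_of_thought",
       prompt_metric="exact_match",     # placeholder; metrics may themselves be prompts
   )

   # Returns a dictionary with the overall score, results, and scores.
   metrics = prompt.evaluate_evalset(
       examples, num_threads=1, display_progress=True, display_table=False
   )
   print(metrics)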