graphdoc.prompts.schema_doc_quality module

class graphdoc.prompts.schema_doc_quality.DocQualitySignature(*, database_schema: str, category: Literal['perfect', 'almost perfect', 'poor but correct', 'incorrect'], rating: Literal[4, 3, 2, 1])[source]

Bases: Signature

You are a documentation quality evaluator specializing in GraphQL schemas. Your task is to assess the quality of documentation provided for a given database schema. Carefully analyze the schema's descriptions for clarity, accuracy, and completeness. Categorize the documentation into one of the following ratings based on your evaluation:

  • perfect (4): The documentation is comprehensive and leaves no room for ambiguity in understanding the schema and its database content.

  • almost perfect (3): The documentation is clear and mostly free of ambiguity, but there is potential for further improvement.

  • poor but correct (2): The documentation is correct but lacks detail, resulting in some ambiguity. It requires enhancement to be more informative.

  • incorrect (1): The documentation contains errors or misleading information, regardless of any correct segments present. Such inaccuracies necessitate an incorrect rating.

Provide step-by-step reasoning to support your evaluation, along with the appropriate category label and numerical rating.

database_schema: str
category: Literal['perfect', 'almost perfect', 'poor but correct', 'incorrect']
rating: Literal[4, 3, 2, 1]
model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.ConfigDict.
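
A minimal usage sketch (illustrative, not part of the graphdoc API itself): the signature can be driven directly with a DSPy predictor. The model name and the dspy.configure call below are assumptions, not requirements of graphdoc.

    import dspy

    from graphdoc.prompts.schema_doc_quality import DocQualitySignature

    # Illustrative model configuration; any DSPy-supported LM works here.
    dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

    schema = '''
    type Token @entity {
      " Name of the token, mirrored from the smart contract "
      name: String!
    }
    '''

    # dspy.Predict fills the signature's output fields (category and rating).
    evaluate = dspy.Predict(DocQualitySignature)
    result = evaluate(database_schema=schema)
    print(result.category, result.rating)  # e.g. "almost perfect", 3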

class graphdoc.prompts.schema_doc_quality.DocQualityDemonstrationSignature(*, database_schema: str, category: Literal['perfect', 'almost perfect', 'poor but correct', 'incorrect'], rating: Literal[4, 3, 2, 1])[source]

Bases: Signature

You are evaluating the output of an LLM program; expect hallucinations. Given a GraphQL schema, evaluate the quality of documentation for that schema and provide a category rating.

The categories are described as:

  • perfect (4): The documentation contains enough information so that the interpretation of the schema and its database content is completely free of ambiguity.

    perfect (4) example:

        type Domain @entity {
          " The namehash (id) of the parent name. References the Domain entity that is the parent of the current domain. Type: Domain "
          parent: Domain
        }

  • almost perfect (3): The documentation is almost perfect and free from ambiguity, but there is room for improvement.

    almost perfect (3) example:

        type Token @entity {
          " Name of the token, mirrored from the smart contract "
          name: String!
        }

  • poor but correct (2): The documentation is poor but correct and has room for improvement due to missing information. The documentation is not incorrect.

    poor but correct (2) example:

        type InterestRate @entity {
          "Description for column: id"
          id: ID!
        }

  • incorrect (1): The documentation is incorrect and contains inaccurate or misleading information. Any incorrect information automatically leads to an incorrect rating, even if some correct information is present.

    incorrect (1) example:

        type BridgeProtocol implements Protocol @entity {
          " Social Security Number of the protocol's main developer "
          id: Bytes!
        }

Output a number rating that corresponds to the categories described above.

database_schema: str
category: Literal['perfect', 'almost perfect', 'poor but correct', 'incorrect']
rating: Literal[4, 3, 2, 1]
model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.ConfigDict.
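
A similar sketch for the demonstration signature, here wrapped in dspy.ChainOfThought so the step-by-step reasoning is exposed as an output field (assumes the same illustrative DSPy configuration as in the previous sketch):

    import dspy

    from graphdoc.prompts.schema_doc_quality import DocQualityDemonstrationSignature

    schema = 'type InterestRate @entity { "Description for column: id" id: ID! }'

    # In recent DSPy versions, ChainOfThought adds a `reasoning` output field
    # alongside the signature's own category and rating fields.
    evaluate = dspy.ChainOfThought(DocQualityDemonstrationSignature)
    result = evaluate(database_schema=schema)
    print(result.reasoning)
    print(result.category, result.rating)  # e.g. "poor but correct", 2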

graphdoc.prompts.schema_doc_quality.doc_quality_factory(key: str | Signature | SignatureMeta) → Signature | SignatureMeta[source]

Factory function to return the correct signature based on the key. Currently only two keys are supported ("doc_quality" and "doc_quality_demo").

Parameters:

key (Union[str, dspy.Signature]) – The key to return the signature for.

Returns:

The signature for the given key.
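
Illustrative usage of the factory; both string keys and an existing signature are accepted per the type hints:

    from graphdoc.prompts.schema_doc_quality import (
        DocQualitySignature,
        doc_quality_factory,
    )

    sig = doc_quality_factory("doc_quality")            # resolves to DocQualitySignature
    demo_sig = doc_quality_factory("doc_quality_demo")  # resolves to DocQualityDemonstrationSignature

    # Passing a signature object is also allowed by the annotation; presumably
    # it is returned unchanged (an assumption, not verified here).
    same_sig = doc_quality_factory(DocQualitySignature)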

class graphdoc.prompts.schema_doc_quality.DocQualityPrompt(prompt: Literal['doc_quality', 'doc_quality_demo'] | Signature | SignatureMeta = 'doc_quality', prompt_type: Literal['predict', 'chain_of_thought'] | Callable = 'predict', prompt_metric: Literal['rating', 'category'] | Callable = 'rating')[source]

Bases: SinglePrompt

DocQualityPrompt class for evaluating documentation quality.

A single prompt that can be used to evaluate the quality of the documentation for a given schema. It subclasses SinglePrompt and implements its abstract methods.

__init__(prompt: Literal['doc_quality', 'doc_quality_demo'] | Signature | SignatureMeta = 'doc_quality', prompt_type: Literal['predict', 'chain_of_thought'] | Callable = 'predict', prompt_metric: Literal['rating', 'category'] | Callable = 'rating') → None[source]

Initialize the DocQualityPrompt.

Parameters:
  • prompt (Union[str, dspy.Signature]) – The prompt to use. Can either be a string that maps to a defined signature, as set in the doc_quality_factory, or a dspy.Signature.

  • prompt_type (Union[Literal["predict", "chain_of_thought"], Callable]) – The type of prompt to use.

  • prompt_metric (Union[Literal["rating", "category"], Callable]) – The metric to use. Can either be a string that maps to a built-in metric ("rating" or "category") or a custom callable. A custom callable must have the signature (example: dspy.Example, prediction: dspy.Prediction) -> bool. See the construction sketch following this parameter list.
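
A construction sketch covering the documented options. The lenient_rating_metric function below is a hypothetical example of the required (example, prediction) -> bool callable, not part of graphdoc:

    import dspy

    from graphdoc.prompts.schema_doc_quality import DocQualityPrompt

    # Default configuration: the "doc_quality" signature, a predict module,
    # and the built-in rating metric.
    prompt = DocQualityPrompt()

    # Explicit configuration using the demonstration signature,
    # chain-of-thought prompting, and the category metric.
    demo_prompt = DocQualityPrompt(
        prompt="doc_quality_demo",
        prompt_type="chain_of_thought",
        prompt_metric="category",
    )

    # Hypothetical custom metric: treat ratings within one point as correct.
    def lenient_rating_metric(example: dspy.Example, prediction: dspy.Prediction) -> bool:
        return abs(int(example.rating) - int(prediction.rating)) <= 1

    custom_prompt = DocQualityPrompt(prompt_metric=lenient_rating_metric)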

evaluate_metric(example: Example, prediction: Prediction, trace=None) → bool[source]

Evaluate the metric for the given example and prediction.

Parameters:
  • example (dspy.Example) – The example to evaluate the metric on.

  • prediction (dspy.Prediction) – The prediction to evaluate the metric on.

  • trace (Any) – Used for DSPy.

Returns:

The result of the evaluation: a boolean indicating whether the prediction is correct under the chosen metric.

Return type:

bool
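
An illustrative call, assuming the example and prediction carry the fields defined on the signatures above and that the built-in "rating" metric compares the rating fields:

    import dspy

    from graphdoc.prompts.schema_doc_quality import DocQualityPrompt

    example = dspy.Example(
        database_schema='type Token @entity { " Name of the token " name: String! }',
        category="poor but correct",
        rating=2,
    ).with_inputs("database_schema")

    prediction = dspy.Prediction(category="poor but correct", rating=2)

    prompt = DocQualityPrompt(prompt_metric="rating")
    prompt.evaluate_metric(example, prediction)  # expected True: the ratings match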

format_metric(examples: List[Example], overall_score: float, results: List, scores: List) → Dict[str, Any][source]

Formats evaluation metrics into a structured report containing:

  • Overall score across all categories
  • Percentage correct per category
  • Detailed results for each evaluation

Parameters:
  • examples (List[dspy.Example]) – The examples to evaluate the metric on.

  • overall_score (float) – The overall score across all categories.

  • results (List) – The results of the evaluation.

  • scores (List) – The scores of the evaluation.

Returns:

A dictionary containing the overall score, per-category scores, and details:

{"overall_score": 0, "per_category_scores": {}, "details": [], "results": []}

Return type:

Dict[str, Any]
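
A usage sketch. The evaluation inputs are assumed to come from a prior run of this prompt over a set of examples (e.g. via dspy.Evaluate); the numbers shown are illustrative:

    # `examples`, `results`, and `scores` are assumed to come from a prior
    # evaluation run; `prompt` is a DocQualityPrompt instance.
    report = prompt.format_metric(
        examples=examples,
        overall_score=0.75,  # illustrative value
        results=results,
        scores=scores,
    )

    print(report["overall_score"])
    print(report["per_category_scores"])  # e.g. {"perfect": 0.8, "poor but correct": 0.5}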

compare_metrics(base_metrics: Any, optimized_metrics: Any, comparison_value: str = 'overall_score') → bool[source]

Compare the metrics of the base and optimized models. Returns true if the optimized model is better than the base model.

Parameters:
  • base_metrics (Any) – The metrics of the base model.

  • optimized_metrics (Any) – The metrics of the optimized model.

  • comparison_value (str) – The key in the metrics dictionary to compare on (defaults to "overall_score").

Returns:

True if the optimized model is better than the base model.

Return type:

bool
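
A minimal sketch with illustrative metric dictionaries shaped like the report returned by format_metric:

    from graphdoc.prompts.schema_doc_quality import DocQualityPrompt

    base_metrics = {"overall_score": 0.62, "per_category_scores": {}, "details": [], "results": []}
    optimized_metrics = {"overall_score": 0.71, "per_category_scores": {}, "details": [], "results": []}

    prompt = DocQualityPrompt()
    # Expected True, assuming a higher overall_score counts as better.
    prompt.compare_metrics(base_metrics, optimized_metrics)
    prompt.compare_metrics(base_metrics, optimized_metrics, comparison_value="overall_score")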