graphdoc.data.schema module

class graphdoc.data.schema.SchemaCategory(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: str, Enum

Schema quality categories enumeration.

PERFECT = 'perfect'
ALMOST_PERFECT = 'almost perfect'
POOR_BUT_CORRECT = 'poor but correct'
INCORRECT = 'incorrect'
BLANK = 'blank'
classmethod from_str(value: str) → SchemaCategory | None
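The class below is a minimal, self-contained sketch of a str-backed Enum with a from_str lookup, written to illustrate how such a classmethod can work. It assumes from_str lowercases and strips the input before matching it against member values, and returns None when nothing matches; graphdoc's actual matching rules may differ.

```python
from enum import Enum
from typing import Optional


class SchemaCategory(str, Enum):
    """Sketch mirroring the documented members; not graphdoc's source."""

    PERFECT = "perfect"
    ALMOST_PERFECT = "almost perfect"
    POOR_BUT_CORRECT = "poor but correct"
    INCORRECT = "incorrect"
    BLANK = "blank"

    @classmethod
    def from_str(cls, value: str) -> Optional["SchemaCategory"]:
        # Assumption: matching ignores case and surrounding whitespace.
        normalized = value.strip().lower()
        for member in cls:
            if member.value == normalized:
                return member
        # Assumption: unknown strings yield None rather than raising.
        return None
```

Because the members subclass str, a member also compares equal to its raw value, so SchemaCategory.PERFECT == "perfect" holds.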
class graphdoc.data.schema.SchemaRating(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: str, Enum

Schema quality ratings enumeration.

FOUR = '4'
THREE = '3'
TWO = '2'
ONE = '1'
ZERO = '0'
classmethod from_value(value: str | int) → SchemaRating | None
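Since from_value accepts either a str or an int, one plausible approach is to normalize the input to its string form and use the standard Enum value lookup. The sketch below assumes exactly that, and assumes an unrecognized value returns None (as the return annotation allows) instead of raising; it is an illustration, not graphdoc's implementation.

```python
from enum import Enum
from typing import Optional, Union


class SchemaRating(str, Enum):
    """Sketch mirroring the documented members; not graphdoc's source."""

    FOUR = "4"
    THREE = "3"
    TWO = "2"
    ONE = "1"
    ZERO = "0"

    @classmethod
    def from_value(cls, value: Union[str, int]) -> Optional["SchemaRating"]:
        # Assumption: ints and numeric strings are compared via their string form.
        key = str(value).strip()
        try:
            return cls(key)  # standard Enum lookup by value
        except ValueError:
            # Assumption: out-of-range ratings map to None.
            return None
```

Storing the ratings as strings keeps them consistent with the other str-backed enums, at the cost of requiring this conversion for integer input.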
class graphdoc.data.schema.SchemaCategoryRatingMapping

Bases: object

Mapping between schema categories and ratings.

static get_rating(category: SchemaCategory) → SchemaRating

Get the corresponding rating for a given schema category.

Parameters:

category – The schema category

Returns:

The corresponding rating

static get_category(rating: SchemaRating) → SchemaCategory

Get the corresponding category for a given schema rating.

Parameters:

rating – The schema rating

Returns:

The corresponding category
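A two-way mapping like this is commonly backed by one lookup table plus its inverse. The sketch below assumes the categories pair with the ratings in declaration order (PERFECT with FOUR down to BLANK with ZERO); that pairing is inferred from the member lists above and is not confirmed by this page.

```python
from enum import Enum


class SchemaCategory(str, Enum):
    PERFECT = "perfect"
    ALMOST_PERFECT = "almost perfect"
    POOR_BUT_CORRECT = "poor but correct"
    INCORRECT = "incorrect"
    BLANK = "blank"


class SchemaRating(str, Enum):
    FOUR = "4"
    THREE = "3"
    TWO = "2"
    ONE = "1"
    ZERO = "0"


# Assumption: the best category pairs with the highest rating, in declaration order.
_CATEGORY_TO_RATING = {
    SchemaCategory.PERFECT: SchemaRating.FOUR,
    SchemaCategory.ALMOST_PERFECT: SchemaRating.THREE,
    SchemaCategory.POOR_BUT_CORRECT: SchemaRating.TWO,
    SchemaCategory.INCORRECT: SchemaRating.ONE,
    SchemaCategory.BLANK: SchemaRating.ZERO,
}
# Derive the reverse table so the two directions can never disagree.
_RATING_TO_CATEGORY = {r: c for c, r in _CATEGORY_TO_RATING.items()}


class SchemaCategoryRatingMapping:
    @staticmethod
    def get_rating(category: SchemaCategory) -> SchemaRating:
        return _CATEGORY_TO_RATING[category]

    @staticmethod
    def get_category(rating: SchemaRating) -> SchemaCategory:
        return _RATING_TO_CATEGORY[rating]
```

Deriving one table from the other keeps get_rating and get_category exact inverses of each other.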

class graphdoc.data.schema.SchemaType(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: str, Enum

Schema type enumeration.

FULL_SCHEMA = 'full schema'
TABLE_SCHEMA = 'table schema'
ENUM_SCHEMA = 'enum schema'
classmethod from_str(value: str) → SchemaType | None
class graphdoc.data.schema.SchemaCategoryPath(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: str, Enum

Maps schema categories to their folder names.

PERFECT = 'perfect'
ALMOST_PERFECT = 'almost_perfect'
POOR_BUT_CORRECT = 'poor_but_correct'
INCORRECT = 'incorrect'
BLANK = 'blank'
classmethod get_path(category: SchemaCategory, folder_path: str | Path) → Path | None

Get the folder path for a given schema category within the given base folder.

Parameters:

  • category – The schema category

  • folder_path – The base folder path

Returns:

The corresponding folder path, or None
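The member values of SchemaCategoryPath replace the spaces in SchemaCategory values with underscores, which suggests get_path pairs members by name and joins the folder name onto the base path. The sketch below assumes exactly that; it does not check whether the folder exists on disk, and the real method may.

```python
from enum import Enum
from pathlib import Path
from typing import Optional, Union


class SchemaCategory(str, Enum):
    PERFECT = "perfect"
    ALMOST_PERFECT = "almost perfect"
    POOR_BUT_CORRECT = "poor but correct"
    INCORRECT = "incorrect"
    BLANK = "blank"


class SchemaCategoryPath(str, Enum):
    PERFECT = "perfect"
    ALMOST_PERFECT = "almost_perfect"
    POOR_BUT_CORRECT = "poor_but_correct"
    INCORRECT = "incorrect"
    BLANK = "blank"

    @classmethod
    def get_path(
        cls, category: SchemaCategory, folder_path: Union[str, Path]
    ) -> Optional[Path]:
        # Assumption: members are paired by name (SchemaCategory.X -> SchemaCategoryPath.X).
        folder = cls.__members__.get(category.name)
        if folder is None:
            return None
        # Assumption: each category folder is a direct child of folder_path.
        return Path(folder_path) / folder.value
```

For example, get_path(SchemaCategory.ALMOST_PERFECT, "data/schemas") yields Path("data/schemas/almost_perfect") under these assumptions.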

class graphdoc.data.schema.SchemaObject(key: str, category: Enum | None = None, rating: Enum | None = None, schema_name: str | None = None, schema_type: Enum | None = None, schema_str: str | None = None, schema_ast: Node | None = None)

Bases: object

Schema object containing schema data and metadata.

key: str
category: Enum | None = None
rating: Enum | None = None
schema_name: str | None = None
schema_type: Enum | None = None
schema_str: str | None = None
schema_ast: Node | None = None
classmethod from_dict(data: dict, category_enum: Type[Enum] = SchemaCategory, rating_enum: Type[Enum] = SchemaRating, type_enum: Type[Enum] = SchemaType) → SchemaObject

Create a SchemaObject from a dictionary, with validation.

Parameters:
  • data – The data dictionary

  • category_enum – Custom Enum class for categories

  • rating_enum – Custom Enum class for ratings

  • type_enum – Custom Enum class for schema types
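The enum parameters let callers substitute their own category, rating, and type vocabularies. The sketch below shows one way such a factory can work, assuming "validation" means requiring key and converting each enum field through the supplied Enum class, so that unknown values raise ValueError. The helper _to_enum is hypothetical, and schema_ast is omitted to keep the example free of a parser dependency.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Optional, Type


class SchemaCategory(str, Enum):
    PERFECT = "perfect"
    ALMOST_PERFECT = "almost perfect"
    POOR_BUT_CORRECT = "poor but correct"
    INCORRECT = "incorrect"
    BLANK = "blank"


class SchemaRating(str, Enum):
    FOUR = "4"
    THREE = "3"
    TWO = "2"
    ONE = "1"
    ZERO = "0"


class SchemaType(str, Enum):
    FULL_SCHEMA = "full schema"
    TABLE_SCHEMA = "table schema"
    ENUM_SCHEMA = "enum schema"


def _to_enum(enum_cls: Type[Enum], raw: Any) -> Optional[Enum]:
    # Hypothetical helper: missing fields stay None; unknown values raise ValueError.
    return None if raw is None else enum_cls(raw)


@dataclass
class SchemaObject:
    key: str
    category: Optional[Enum] = None
    rating: Optional[Enum] = None
    schema_name: Optional[str] = None
    schema_type: Optional[Enum] = None
    schema_str: Optional[str] = None
    # schema_ast is left out of this sketch.

    @classmethod
    def from_dict(
        cls,
        data: dict,
        category_enum: Type[Enum] = SchemaCategory,
        rating_enum: Type[Enum] = SchemaRating,
        type_enum: Type[Enum] = SchemaType,
    ) -> "SchemaObject":
        # Assumption: key is the only required field.
        if "key" not in data:
            raise ValueError("SchemaObject requires a 'key' field")
        return cls(
            key=data["key"],
            category=_to_enum(category_enum, data.get("category")),
            rating=_to_enum(rating_enum, data.get("rating")),
            schema_name=data.get("schema_name"),
            schema_type=_to_enum(type_enum, data.get("schema_type")),
            schema_str=data.get("schema_str"),
        )
```

Passing, say, category_enum=MyCategories would make from_dict accept that enum's values instead of the SchemaCategory ones.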

to_dict() → dict

Convert the SchemaObject to a dictionary, excluding the key field.

Returns:

Dictionary representation of the SchemaObject without the key

Return type:

dict
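Excluding a single field is easy to express by iterating over the dataclass fields. The sketch below uses a trimmed stand-in for SchemaObject with only three fields. It also assumes enum members are stored by their .value, which this page does not state.

```python
from dataclasses import dataclass, fields
from enum import Enum
from typing import Optional


class SchemaRating(str, Enum):
    FOUR = "4"
    THREE = "3"
    TWO = "2"
    ONE = "1"
    ZERO = "0"


@dataclass
class SchemaObject:
    # Trimmed stand-in with three of the documented fields.
    key: str
    rating: Optional[Enum] = None
    schema_name: Optional[str] = None

    def to_dict(self) -> dict:
        result = {}
        for f in fields(self):
            if f.name == "key":
                continue  # the key is excluded, as documented
            value = getattr(self, f.name)
            # Assumption: enum members are serialized by value.
            result[f.name] = value.value if isinstance(value, Enum) else value
        return result
```

Iterating over fields() means any field added to the dataclass later is picked up without editing to_dict.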

static _hf_schema_object_columns() → Features

Return the columns for the graph_doc dataset, based on the SchemaObject fields.

Returns:

The columns for the graph_doc dataset

Return type:

Features

to_dataset() → Dataset

Convert the SchemaObject to a Hugging Face Dataset.

Returns:

The Hugging Face Dataset

Return type:

Dataset

graphdoc.data.schema.schema_objects_to_dataset(schema_objects: List[SchemaObject]) → Dataset

Convert a list of SchemaObjects to a Hugging Face Dataset.

Parameters:

schema_objects – The list of SchemaObjects

Returns:

The Hugging Face Dataset
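A Hugging Face Dataset is column-oriented, so building one from a list of objects usually starts by pivoting the rows into a dict of lists. The sketch below shows only that pivot step, using a trimmed stand-in SchemaObject and a hypothetical helper schema_objects_to_columns. It assumes the rows come from to_dict(), so key is left out. The resulting dict could then be passed to datasets.Dataset.from_dict, optionally with features taken from _hf_schema_object_columns(). That final call is not shown, which keeps the sketch free of the datasets dependency.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SchemaObject:
    # Trimmed stand-in: plain strings instead of enums, no schema_ast.
    key: str
    category: Optional[str] = None
    rating: Optional[str] = None
    schema_str: Optional[str] = None

    def to_dict(self) -> dict:
        # Mirrors the documented to_dict(): every field except key.
        return {
            "category": self.category,
            "rating": self.rating,
            "schema_str": self.schema_str,
        }


def schema_objects_to_columns(schema_objects: List[SchemaObject]) -> Dict[str, list]:
    """Hypothetical helper: pivot row dicts into one list per column."""
    columns: Dict[str, list] = {}
    for obj in schema_objects:
        for name, value in obj.to_dict().items():
            columns.setdefault(name, []).append(value)
    return columns
```

Every object here yields the same keys from to_dict(), so all column lists end up the same length, which Dataset.from_dict requires.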