graphdoc.data.schema module
- class graphdoc.data.schema.SchemaCategory(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)
Schema quality categories enumeration.
- PERFECT = 'perfect'
- ALMOST_PERFECT = 'almost perfect'
- POOR_BUT_CORRECT = 'poor but correct'
- INCORRECT = 'incorrect'
- BLANK = 'blank'
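As a minimal sketch of how the members above behave, the following stand-in redefines the enumeration locally with the documented values. It is not the library class itself, and it assumes a plain `Enum` base, since the base classes are not shown here.

```python
from enum import Enum


# Local stand-in mirroring the documented members (assumption: plain Enum base).
class SchemaCategory(Enum):
    PERFECT = "perfect"
    ALMOST_PERFECT = "almost perfect"
    POOR_BUT_CORRECT = "poor but correct"
    INCORRECT = "incorrect"
    BLANK = "blank"


# Look up a member by its string value (standard Enum behaviour).
category = SchemaCategory("almost perfect")
print(category.name)

# Values that match no member raise ValueError, which callers may want to handle.
try:
    category_or_none = SchemaCategory("unknown")
except ValueError:
    category_or_none = None
```

Note that the values contain spaces, so a lookup with `"almost_perfect"` would fail in this sketch.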
- class graphdoc.data.schema.SchemaRating(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)
Schema quality ratings enumeration.
- FOUR = '4'
- THREE = '3'
- TWO = '2'
- ONE = '1'
- ZERO = '0'
- class graphdoc.data.schema.SchemaCategoryRatingMapping
Bases: object
Mapping between schema categories and ratings.
- static get_rating(category: SchemaCategory) -> SchemaRating
Get the corresponding rating for a given schema category.
- Parameters:
category – The schema category
- Returns:
The corresponding rating
- static get_category(rating: SchemaRating) -> SchemaCategory
Get the corresponding category for a given schema rating.
- Parameters:
rating – The schema rating
- Returns:
The corresponding category
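The two lookups can be sketched as a pair of dictionaries. The pairing used below (PERFECT with FOUR down to BLANK with ZERO) is an assumption inferred from the declaration order of the two enumerations; this page does not state the actual pairing. The enums and the `CategoryRatingMappingSketch` class are local stand-ins, not the library's code.

```python
from enum import Enum


# Local stand-ins mirroring the documented members.
class SchemaCategory(Enum):
    PERFECT = "perfect"
    ALMOST_PERFECT = "almost perfect"
    POOR_BUT_CORRECT = "poor but correct"
    INCORRECT = "incorrect"
    BLANK = "blank"


class SchemaRating(Enum):
    FOUR = "4"
    THREE = "3"
    TWO = "2"
    ONE = "1"
    ZERO = "0"


# ASSUMPTION: categories and ratings pair up in declaration order.
_CATEGORY_TO_RATING = dict(zip(SchemaCategory, SchemaRating))
_RATING_TO_CATEGORY = {rating: cat for cat, rating in _CATEGORY_TO_RATING.items()}


class CategoryRatingMappingSketch:
    """Hypothetical stand-in for SchemaCategoryRatingMapping."""

    @staticmethod
    def get_rating(category: SchemaCategory) -> SchemaRating:
        return _CATEGORY_TO_RATING[category]

    @staticmethod
    def get_category(rating: SchemaRating) -> SchemaCategory:
        return _RATING_TO_CATEGORY[rating]


rating = CategoryRatingMappingSketch.get_rating(SchemaCategory.PERFECT)
round_trip = CategoryRatingMappingSketch.get_category(rating)
```

Keeping both directions in precomputed dictionaries makes each lookup a single dictionary access and keeps the two methods consistent with each other.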
- class graphdoc.data.schema.SchemaType(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)
Schema type enumeration.
- FULL_SCHEMA = 'full schema'
- TABLE_SCHEMA = 'table schema'
- ENUM_SCHEMA = 'enum schema'
- class graphdoc.data.schema.SchemaCategoryPath(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)
Maps schema categories to their folder names.
- PERFECT = 'perfect'
- ALMOST_PERFECT = 'almost_perfect'
- POOR_BUT_CORRECT = 'poor_but_correct'
- INCORRECT = 'incorrect'
- BLANK = 'blank'
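Because SchemaCategoryPath shares member names with SchemaCategory but uses underscores in its values, a folder can be resolved by member name. The sketch below uses local stand-in enums; the helper `category_folder` and the `data` root directory are hypothetical names introduced for illustration.

```python
from enum import Enum
from pathlib import Path


# Local stand-ins mirroring the documented members.
class SchemaCategory(Enum):
    PERFECT = "perfect"
    ALMOST_PERFECT = "almost perfect"
    POOR_BUT_CORRECT = "poor but correct"
    INCORRECT = "incorrect"
    BLANK = "blank"


class SchemaCategoryPath(Enum):
    PERFECT = "perfect"
    ALMOST_PERFECT = "almost_perfect"
    POOR_BUT_CORRECT = "poor_but_correct"
    INCORRECT = "incorrect"
    BLANK = "blank"


def category_folder(root: str, category: SchemaCategory) -> Path:
    """Hypothetical helper: resolve the folder for a category under root."""
    # Both enums share member names, so the lookup goes through the name.
    return Path(root) / SchemaCategoryPath[category.name].value


folder = category_folder("data", SchemaCategory.ALMOST_PERFECT)
```

Looking up by name rather than by value matters here because the category value `'almost perfect'` and the folder name `'almost_perfect'` differ.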
- class graphdoc.data.schema.SchemaObject(key: str, category: Enum | None = None, rating: Enum | None = None, schema_name: str | None = None, schema_type: Enum | None = None, schema_str: str | None = None, schema_ast: Node | None = None)
Bases: object
Schema object containing schema data and metadata.
- key: str
- schema_ast: Node | None = None
- classmethod from_dict(data: dict, category_enum: Type[Enum] = SchemaCategory, rating_enum: Type[Enum] = SchemaRating, type_enum: Type[Enum] = SchemaType) -> SchemaObject
Create SchemaObject from dictionary with validation.
- Parameters:
data – The data dictionary
category_enum – Custom Enum class for categories
rating_enum – Custom Enum class for ratings
type_enum – Custom Enum class for schema types
- to_dict() -> dict
Convert the SchemaObject to a dictionary, excluding the key field.
- Returns:
Dictionary representation of the SchemaObject without the key
- Return type:
dict
- static _hf_schema_object_columns() -> Features
Return the columns for the graph_doc dataset, based on the SchemaObject fields.
- Returns:
The columns for the graph_doc dataset
- Return type:
Features
- to_dataset() -> Dataset
Convert the SchemaObject to a Hugging Face Dataset.
- Returns:
The Hugging Face Dataset
- Return type:
Dataset
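The from_dict and to_dict round trip described above can be sketched with a simplified stand-in dataclass. Several things here are assumptions rather than documented behaviour: that the dictionary stores enum members as their string values, that `data` carries a `"key"` entry, and that "validation" means rejecting unknown enum values with a ValueError. `SchemaObjectSketch` is a hypothetical name and covers only a subset of the fields.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Type


# Local stand-in with a subset of the documented members.
class SchemaCategory(Enum):
    PERFECT = "perfect"
    BLANK = "blank"


@dataclass
class SchemaObjectSketch:
    """Hypothetical, simplified stand-in for SchemaObject."""

    key: str
    category: Optional[Enum] = None
    schema_name: Optional[str] = None
    schema_str: Optional[str] = None

    @classmethod
    def from_dict(
        cls, data: dict, category_enum: Type[Enum] = SchemaCategory
    ) -> "SchemaObjectSketch":
        # ASSUMPTION: the key travels inside the dictionary.
        if "key" not in data:
            raise ValueError("data must contain a 'key' entry")
        raw = data.get("category")
        # Enum lookup by value raises ValueError for unknown categories.
        category = category_enum(raw) if raw is not None else None
        return cls(
            key=data["key"],
            category=category,
            schema_name=data.get("schema_name"),
            schema_str=data.get("schema_str"),
        )

    def to_dict(self) -> dict:
        # The key is deliberately left out, as the to_dict description states.
        return {
            "category": self.category.value if self.category is not None else None,
            "schema_name": self.schema_name,
            "schema_str": self.schema_str,
        }


obj = SchemaObjectSketch.from_dict(
    {"key": "users", "category": "perfect", "schema_str": "type User { id: ID! }"}
)
as_dict = obj.to_dict()
```

Passing a custom `category_enum` lets a caller substitute a different set of categories without changing the conversion logic, which is the role the `category_enum`, `rating_enum`, and `type_enum` parameters appear to play in the documented signature.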