API Reference
UIMA CAS processing library in Python.
- class cassis.Cas(typesystem=None, lenient=False, sofa_string=None, sofa_mime=None, document_language=None)[source]
A CAS object is a container for the text under analysis (the sofa) and its annotations.
- add(annotation, keep_id=True)[source]
Adds an annotation to this Cas.
- Parameters:
annotation (FeatureStructure) – The annotation to add.
keep_id (Optional[bool]) – Keep the XMI id of annotation if true, else generate a new one.
- add_all(annotations)[source]
Adds several annotations at once to this CAS.
- Parameters:
annotations (Iterable[FeatureStructure]) – An iterable of annotations to add.
- add_annotation(annotation, keep_id=True)[source]
Adds an annotation to this Cas.
- Parameters:
annotation (FeatureStructure) – The annotation to add.
keep_id (Optional[bool]) – Keep the XMI id of annotation if true, else generate a new one.
Deprecated: Use add() instead.
- add_annotations(annotations)[source]
Adds several annotations at once to this CAS.
- Parameters:
annotations (Iterable[FeatureStructure]) – An iterable of annotations to add.
Deprecated: Use add_all() instead.
- create_view(name, xmiID=None, sofaNum=None)[source]
Create a view and its underlying Sofa (subject of analysis).
- Parameters:
name (str) – The name of the view. This is the same as the associated Sofa name.
xmiID (Optional[int]) – If specified, use this XMI id instead of generating a new one.
sofaNum (Optional[int]) – If specified, use this sofaNum instead of generating a new one.
- Return type:
Cas
- Returns:
The newly created view.
- Raises:
ValueError – If a view with the given name already exists.
- property document_language: str
The document language contains the language code for the document.
Returns: The document language.
- get_covered_text(annotation)[source]
Gets the text that is covered by annotation.
- Parameters:
annotation (FeatureStructure) – The annotation whose covered text is to be retrieved.
- Return type:
str
- Returns:
The text covered by annotation.
Deprecated: Use annotation.get_covered_text() instead.
- get_document_annotation()[source]
Get the DocumentAnnotation feature structure associated with this CAS view. If none exists, one is created.
- Return type:
FeatureStructure
- Returns:
The DocumentAnnotation associated with this CAS view.
- get_sofa()[source]
Get the Sofa feature structure associated with this CAS view.
- Return type:
Sofa
- Returns:
The sofa associated with this CAS view.
- get_view(name)[source]
Gets an existing view.
- Parameters:
name (str) – The name of the view. This is the same as the associated Sofa name.
- Return type:
Cas
- Returns:
The view corresponding to name.
- remove(annotation)[source]
Removes an annotation from the index. Raises an exception if the annotation is not present.
- Parameters:
annotation (FeatureStructure) – The annotation to remove.
- remove_annotation(annotation)[source]
Removes an annotation from the index. Raises an exception if the annotation is not present.
- Parameters:
annotation (FeatureStructure) – The annotation to remove.
Deprecated: Use remove() instead.
- select(type_)[source]
Finds all annotations of type type_.
- Parameters:
type_ – The type, or the name of the type, whose annotation instances are to be found.
- Return type:
List[FeatureStructure]
- Returns:
A list of all feature structures of type type_.
- select_all()[source]
Finds all feature structures in this Cas.
- Return type:
List[FeatureStructure]
- Returns:
A list of all annotations in this Cas.
- select_covered(type_, covering_annotation)[source]
Returns a list of covered annotations.
Only annotations that are fully covered by covering_annotation are returned; annotations that merely overlap it are ignored.
- Parameters:
type_ – The type, or the name of the type, whose annotation instances are to be found.
covering_annotation (FeatureStructure) – The annotation that covers the annotations to be found.
- Return type:
List[FeatureStructure]
- Returns:
A list of covered annotations.
- select_covering(type_, covered_annotation)[source]
Returns a list of annotations that cover the given annotation.
This can potentially be slow. Only annotations that fully cover covered_annotation are returned; annotations that merely overlap it are ignored.
- Parameters:
type_ – The type, or the name of the type, whose annotation instances are to be found.
covered_annotation (FeatureStructure) – The annotation that is covered.
- Return type:
List[FeatureStructure]
- Returns:
A list of covering annotations.
- property sofa_array: str
The sofa byte array references a uima.cas.ByteArray feature structure
Returns: The sofa data byte array.
- property sofa_mime: str
The sofa mime contains the MIME type of the document text.
Returns: The sofa MIME type.
- property sofa_string: str
The sofa string contains the document text.
Returns: The sofa string.
- property sofa_uri: str
The sofa URI references external sofa data.
Returns: The sofa URI.
- property sofas: List[Sofa]
Finds all sofas that this CAS manages
- Returns:
The list of all sofas belonging to this CAS
- to_json(path=None, pretty_print=False, ensure_ascii=False, type_system_mode=TypeSystemMode.FULL)[source]
Creates a JSON representation of this CAS.
- Parameters:
path (Union[str, Path, None]) – File path. If None is provided, the result is returned as a string.
pretty_print (bool) – True if the resulting JSON should be pretty-printed, else False.
ensure_ascii – Whether to escape non-ASCII Unicode characters or not.
type_system_mode (TypeSystemMode) – Whether to serialize the full type system (FULL), only the types used (MINIMAL), or no type system information at all (NONE).
- Return type:
Optional[str]
- Returns:
If path is None, the JSON representation of this CAS is returned as a string.
- to_xmi(path=None, pretty_print=False)[source]
Creates an XMI representation of this CAS.
- Parameters:
path (Union[str, Path, None]) – File path. If None is provided, the result is returned as a string.
pretty_print (bool) – True if the resulting XML should be pretty-printed, else False.
- Return type:
Optional[str]
- Returns:
If path is None, the XMI representation of this CAS is returned as a string.
- typecheck()[source]
Checks whether all feature structures in this CAS are type sound.
For more information, see cassis.TypeSystem.typecheck.
- Return type:
List[TypeCheckError]
- Returns:
List of type errors found, empty list if no errors were found.
- property typesystem: TypeSystem
- class cassis.Sofa(type, sofaNum, xmiID, sofaID, sofaString=None, mimeType=None, sofaURI=None, sofaArray=None, offset_converter=NOTHING)[source]
Each CAS has one or more Subjects of Analysis (Sofa).
- mimeType
The mime type of sofaString
- Type:
str
- sofaArray
The sofa data byte array
- Type:
str
- sofaID
The name of the sofa, i.e. the sofa ID
- Type:
str
- sofaNum
The sofaNum
- Type:
int
- property sofaString: str
- sofaURI
The sofa URI, which references remote sofa data
- Type:
str
- type
The type
- Type:
Type
- xmiID
The XMI id
- Type:
int
- class cassis.TypeSystem(add_document_annotation_type=True)[source]
- add_feature(type_, name, rangeTypeName, elementType=None, description=None, multipleReferencesAllowed=None)[source]
Adds a feature to the given type.
- Parameters:
type_ – The type to which the feature will be added.
name (str) – The name of the new feature.
rangeTypeName (str) – The feature’s rangeTypeName specifies the type of value that the feature can take.
elementType (str) – The elementType of a feature is optional, and applies only when the rangeTypeName is uima.cas.FSArray or uima.cas.FSList. The elementType specifies what type of value can be assigned as an element of the array or list.
description (str) – The description of the new feature.
multipleReferencesAllowed (bool) – Setting this to true indicates that the array or list may be shared, so changes to it may affect other objects in the CAS.
- Raises:
Exception – If a feature with name name already exists in type_.
Deprecated: Use create_feature() instead.
- contains_type(typename)[source]
Checks whether this type system contains a type with name typename.
- Parameters:
typename (str) – The name of the type whose existence is to be checked.
- Returns:
True if a type with typename exists, else False.
- create_feature(domainType, name, rangeType, elementType=None, description=None, multipleReferencesAllowed=None)[source]
Adds a feature to the given type.
- Parameters:
domainType (Union[Type, str]) – The type to which the feature will be added.
name (str) – The name of the new feature.
rangeType (Union[Type, str]) – The rangeType specifies the type of value that the feature can take.
elementType (Union[Type, str]) – The elementType of a feature is optional, and applies only when the rangeType is uima.cas.FSArray or uima.cas.FSList. The elementType specifies what type of value can be assigned as an element of the array or list.
description (str) – The description of the new feature.
multipleReferencesAllowed (bool) – Setting this to true indicates that the array or list may be shared, so changes to it may affect other objects in the CAS.
- Raises:
Exception – If a feature with name name already exists in domainType.
- Return type:
Feature
- create_type(name, supertypeName='uima.tcas.Annotation', description=None)[source]
Creates a new type and returns it.
- Parameters:
name (str) – The name of the new type.
supertypeName (str) – The name of the new type’s supertype. Defaults to uima.tcas.Annotation.
description (str) – The description of the new type.
- Return type:
Type
- Returns:
The newly created type
- get_type(type_name)[source]
Finds a type by name in this type system.
- Parameters:
type_name – The name of the type to retrieve.
- Return type:
Type
- Returns:
The type with name type_name.
- Raises:
Exception – If no type with name type_name could be found.
- get_types(built_in=False)[source]
Returns all types of this type system. By default, the built-in types are excluded.
- Parameters:
built_in (bool) – Also include the built-in types.
- Return type:
Iterator[Type]
- is_array(type_)[source]
Checks if the type identified by type_ is an array.
- Parameters:
type_ – Type to query for (Type or name as string).
- Return type:
bool
- Returns:
True if the type identified by type_ is an array type, else False.
- is_collection(type_, feature)[source]
Checks if the given feature for the type identified by type_ is a collection, e.g. a list or an array.
- Parameters:
type_ – The type to which the feature belongs (Type or name as string).
feature (Feature) – The feature to query for.
- Return type:
bool
- Returns:
True if the given feature is a collection type, else False.
- is_list(type_)[source]
Checks if the type identified by type_ is a list.
- Parameters:
type_ – Type to query for (Type or name as string).
- Return type:
bool
- Returns:
True if the type identified by type_ is a list type, else False.
- is_primitive(type_)[source]
Checks if the type identified by type_ is a primitive type.
- Parameters:
type_ – Type to query for (Type or name as string).
- Return type:
bool
- Returns:
True if the type identified by type_ is a primitive type, else False.
- is_primitive_array(type_)[source]
Checks if the type identified by type_ is a primitive array, i.e. an array of primitives.
- Parameters:
type_ – Type to query for (Type or name as string).
- Return type:
bool
- Returns:
True if the type identified by type_ is a primitive array type, else False.
- is_primitive_collection(type_)[source]
Checks if the type identified by type_ is a primitive collection, i.e. a list or an array of primitives.
- Parameters:
type_ – Type to query for (Type or name as string).
- Return type:
bool
- Returns:
True if the type identified by type_ is a primitive collection type, else False.
- is_primitive_list(type_)[source]
Checks if the type identified by type_ is a primitive list, i.e. a list of primitives.
- Parameters:
type_ – Type to query for (Type or name as string).
- Return type:
bool
- Returns:
True if the type identified by type_ is a primitive list type, else False.
- subsumes(parent, child)[source]
Determines whether the type child is a subtype of parent.
- Parameters:
parent – Parent type (Type or name as string).
child – Child type (Type or name as string).
- Return type:
bool
- Returns:
True if parent subsumes child else False
- to_xml(path=None)[source]
Creates an XML representation of this type system.
- Parameters:
path (Union[str, Path, None]) – File path or file-like object. If None is provided, the result is returned as a string.
- Return type:
Optional[str]
- Returns:
If path is None, then the XML representation of this type system is returned as a string.
- class cassis.View(sofa)[source]
A view into a CAS contains a subset of feature structures and annotations.
- get_all_annotations()[source]
Gets all the annotations in this view.
- Return type:
List[FeatureStructure]
- Returns:
A list of all annotations in this view.
- remove_annotation_from_index(annotation)[source]
Removes an annotation from the index. Raises an exception if the annotation is not present.
- Parameters:
annotation (FeatureStructure) – The annotation to remove.
- property type_index: Dict[str, SortedKeyList]
Returns an index mapping type names to annotations of this type.
- Returns:
A dictionary mapping type names to annotations of this type.
- cassis.cas_to_comparable_text(cas, out=None, seeds=None, mark_indexed=True, covered_text=True, exclude_types=None)[source]
- Return type:
Optional[str]
- cassis.load_cas_from_json(source, typesystem=None, lenient=False, merge_typesystem=True)[source]
Loads a CAS from a JSON source.
- Parameters:
source (Union[IO, str]) – The JSON source. If source is a string, then it is assumed to be a JSON string. If source is a file-like object, then the data is read from it.
typesystem (TypeSystem) – The type system that belongs to this CAS. If None, an empty type system is provided.
lenient (bool) – If True, unknown Types will be ignored. If False, unknown Types will cause an exception. The default is False.
- Return type:
Cas
- Returns:
The deserialized CAS
- cassis.load_cas_from_xmi(source, typesystem=None, lenient=False, trusted=False)[source]
Loads a CAS from an XMI source.
- Parameters:
source (Union[IO, Path, str]) – The XML source. If source is a string, then it is assumed to be an XML string. If source is a file-like object, then the data is read from it. If source is a Path, then load the file at the given location.
typesystem (TypeSystem) – The type system that belongs to this CAS. If None, an empty type system is provided.
lenient (bool) – If True, unknown Types will be ignored. If False, unknown Types will cause an exception. The default is False.
trusted (bool) – If True, disables checks like XML parser security restrictions.
- Return type:
Cas
- Returns:
The deserialized CAS
- cassis.load_typesystem(source)[source]
Loads a type system from an XML source.
- Parameters:
source (Union[IO, str, Path]) – The XML source. If source is a string, then it is assumed to be an XML string. If source is a file-like object, then the data is read from it. If source is a Path, then load the file at the given location.
- Return type:
TypeSystem
- Returns:
The deserialized type system
- cassis.merge_typesystems(*typesystems)[source]
Merges several type systems into one.
If a type is defined in two source type systems, then the features of all of these type definitions are joined together in the target type system. The exact rules are outlined in https://uima.apache.org/d/uimaj-2.10.4/references.html#ugr.ref.cas.typemerging .
- Parameters:
*typesystems (TypeSystem) – The type systems to merge.
- Return type:
TypeSystem
- Returns:
A new type system that is the result of merging all of the type systems together.
- class cassis.typesystem.Type(name, supertype, description=None, typesystem=None, children=NOTHING, features=NOTHING, inherited_features=NOTHING, constructor=None, cached_all_features=None)[source]
Describes types in a type system.
Instances of this class should not be created by hand, instead the type system’s create_type should be used.
- __call__(**kwargs)[source]
Creates a feature structure of this type.
When called with keyword arguments whose keys are the feature names and values are the respective feature values, then a new feature structure instance is created.
- Return type:
FeatureStructure
- Returns:
A new feature structure instance of this type.
- description: str
Description of this type
- name: str
Type name of this type