API Reference

UIMA CAS processing library in Python.

class cassis.Cas(typesystem=None, lenient=False, sofa_string=None, sofa_mime=None, document_language=None)[source]

A CAS object is a container for text (sofa) and annotations

add(annotation, keep_id=True)[source]

Adds an annotation to this Cas.

Parameters:
  • annotation (FeatureStructure) – The annotation to add.

  • keep_id (Optional[bool]) – Keep the XMI id of annotation if true, else generate a new one.

add_all(annotations)[source]

Adds several annotations at once to this CAS.

Parameters:

annotations (Iterable[FeatureStructure]) – An iterable of annotations to add.

add_annotation(annotation, keep_id=True)[source]

Adds an annotation to this Cas.

Parameters:
  • annotation (FeatureStructure) – The annotation to add.

  • keep_id (Optional[bool]) – Keep the XMI id of annotation if true, else generate a new one.

Deprecated: Use add() instead.

add_annotations(annotations)[source]

Adds several annotations at once to this CAS.

Parameters:

annotations (Iterable[FeatureStructure]) – An iterable of annotations to add.

Deprecated: Use add_all() instead.

create_view(name, xmiID=None, sofaNum=None)[source]

Create a view and its underlying Sofa (subject of analysis).

Parameters:
  • name (str) – The name of the view. This is the same as the associated Sofa name.

  • xmiID (Optional[int]) – If specified, use this XMI id instead of generating a new one.

  • sofaNum (Optional[int]) – If specified, use this sofaNum instead of generating a new one.

Return type:

Cas

Returns:

The newly created view.

Raises:

ValueError – If a view with name already exists.

property document_language: str

The document language contains the language code for the document.

Returns: The document language.

get_covered_text(annotation)[source]

Gets the text that is covered by annotation.

Parameters:

annotation (FeatureStructure) – The annotation whose covered text is to be retrieved.

Return type:

str

Returns:

The text covered by annotation

Deprecated: Use annotation.get_covered_text() instead.

get_document_annotation()[source]

Get the DocumentAnnotation feature structure associated with this CAS view. If none exists, one is created.

Return type:

FeatureStructure

Returns:

The DocumentAnnotation associated with this CAS view.

get_sofa()[source]

Get the Sofa feature structure associated with this CAS view.

Return type:

Sofa

Returns:

The sofa associated with this CAS view.

get_view(name)[source]

Gets an existing view.

Parameters:

name (str) – The name of the view. This is the same as the associated Sofa name.

Return type:

Cas

Returns:

The view corresponding to name

remove(annotation)[source]

Removes an annotation from an index. Raises an exception if the annotation was not present.

Parameters:

annotation (FeatureStructure) – The annotation to remove.

remove_annotation(annotation)[source]

Removes an annotation from an index. Raises an exception if the annotation was not present.

Parameters:

annotation (FeatureStructure) – The annotation to remove.

Deprecated: Use remove() instead.

select(type_)[source]

Finds all annotations of type type_.

Parameters:

type_ – The type, or the name of the type, whose annotation instances are to be found

Return type:

List[FeatureStructure]

Returns:

A list of all feature structures of type type_

select_all()[source]

Finds all feature structures in this Cas

Return type:

List[FeatureStructure]

Returns:

A list of all annotations in this Cas

select_covered(type_, covering_annotation)[source]

Returns all annotations of the given type that are covered by covering_annotation.

Only annotations that are fully covered are returned; annotations that merely overlap are ignored.

Parameters:
  • type_ – The type, or the name of the type, whose annotation instances are to be found

  • covering_annotation (FeatureStructure) – The annotation which covers

Return type:

List[FeatureStructure]

Returns:

A list of covered annotations

select_covering(type_, covered_annotation)[source]

Returns all annotations of the given type that cover the given annotation. This can potentially be slow.

Only annotations that fully cover the given annotation are returned; annotations that merely overlap are ignored.

Parameters:
  • type_ – The type, or the name of the type, whose annotation instances are to be found

  • covered_annotation (FeatureStructure) – The annotation which is covered

Return type:

List[FeatureStructure]

Returns:

A list of covering annotations

property sofa_array: str

The sofa byte array references a uima.cas.ByteArray feature structure

Returns: The sofa data byte array.

property sofa_mime: str

The sofa mime contains the MIME type of the document text.

Returns: The sofa MIME type.

property sofa_string: str

The sofa string contains the document text.

Returns: The sofa string.

property sofa_uri: str

The sofa URI references external sofa data.

Returns: The sofa URI.

property sofas: List[Sofa]

Finds all sofas that this CAS manages

Returns:

The list of all sofas belonging to this CAS

to_json(path=None, pretty_print=False, ensure_ascii=False, type_system_mode=TypeSystemMode.FULL)[source]

Creates a JSON representation of this CAS.

Parameters:
  • path (Union[str, Path, None]) – File path. If None is provided, the result is returned as a string

  • pretty_print (bool) – True if the resulting JSON should be pretty-printed, else False

  • ensure_ascii – Whether to escape non-ASCII Unicode characters or not

  • type_system_mode (TypeSystemMode) – Whether to serialize the full type system (FULL), only the types used (MINIMAL), or no type system information at all (NONE)

Return type:

Optional[str]

Returns:

If path is None, then the JSON representation of this CAS is returned as a string

to_xmi(path=None, pretty_print=False)[source]

Creates an XMI representation of this CAS.

Parameters:
  • path (Union[str, Path, None]) – File path. If None is provided, the result is returned as a string

  • pretty_print (bool) – True if the resulting XML should be pretty-printed, else False

Return type:

Optional[str]

Returns:

If path is None, then the XMI representation of this CAS is returned as a string

typecheck()[source]

Checks whether all feature structures in this CAS are type sound.

For more information, see cassis.TypeSystem.typecheck.

Return type:

List[TypeCheckError]

Returns:

List of type errors found, empty list if no errors were found.

property typesystem: TypeSystem
property views: List[View]

Finds all views that this CAS manages.

Returns:

The list of all views belonging to this CAS.

class cassis.Sofa(type, sofaNum, xmiID, sofaID, sofaString=None, mimeType=None, sofaURI=None, sofaArray=None, offset_converter=NOTHING)[source]

Each CAS has one or more Subjects of Analysis (Sofa).

mimeType

The mime type of sofaString

Type:

str

sofaArray

The sofa data byte array

Type:

str

sofaID

The name of the sofa, i.e. the sofa ID

Type:

str

sofaNum

The sofaNum

Type:

int

property sofaString: str
sofaURI

The sofa URI, it references remote sofa data

Type:

str

type

The type

Type:

Type

xmiID

The XMI id

Type:

int

class cassis.TypeSystem(add_document_annotation_type=True)[source]
add_feature(type_, name, rangeTypeName, elementType=None, description=None, multipleReferencesAllowed=None)[source]

Adds a feature to the given type.

Parameters:
  • type_ – The type to which the feature will be added

  • name (str) – The name of the new feature

  • rangeTypeName (str) – The feature’s rangeTypeName specifies the type of value that the feature can take.

  • elementType (str) – The elementType of a feature is optional, and applies only when the rangeTypeName is uima.cas.FSArray or uima.cas.FSList. The elementType specifies what type of value can be assigned as an element of the array or list.

  • description (str) – The description of the new feature

  • multipleReferencesAllowed (bool) – Setting this to true indicates that the array or list may be shared, so changes to it may affect other objects in the CAS.

Raises:

Exception – If a feature with name name already exists in type_.

Deprecated: Use create_feature() instead.

contains_type(typename)[source]

Checks whether this type system contains a type with name typename.

Parameters:

typename (str) – The name of type whose existence is to be checked.

Returns:

True if a type with typename exists, else False.

create_feature(domainType, name, rangeType, elementType=None, description=None, multipleReferencesAllowed=None)[source]

Adds a feature to the given type.

Parameters:
  • domainType (Union[Type, str]) – The type to which the feature will be added

  • name (str) – The name of the new feature

  • rangeType (Union[Type, str]) – The feature’s rangeType specifies the type of value that the feature can take.

  • elementType (Union[Type, str]) – The elementType of a feature is optional, and applies only when the rangeType is uima.cas.FSArray or uima.cas.FSList. The elementType specifies what type of value can be assigned as an element of the array or list.

  • description (str) – The description of the new feature

  • multipleReferencesAllowed (bool) – Setting this to true indicates that the array or list may be shared, so changes to it may affect other objects in the CAS.

Raises:

Exception – If a feature with name name already exists in domainType.

Return type:

Feature

create_type(name, supertypeName='uima.tcas.Annotation', description=None)[source]

Creates a new type and returns it.

Parameters:
  • name (str) – The name of the new type

  • supertypeName (str) – The name of the new type’s supertype. Defaults to uima.tcas.Annotation

  • description (str) – The description of the new type

Return type:

Type

Returns:

The newly created type

get_type(type_name)[source]

Finds a type by name in the type system of this CAS.

Parameters:

type_name – The name of the type to retrieve

Return type:

Type

Returns:

The type with name type_name

Raises:

Exception – If no type with name type_name could be found.

get_types(built_in=False)[source]

Returns all types of this type system. Normally, this excludes the built-in types

Parameters:

built_in (bool) – Also include the built-in types

Return type:

Iterator[Type]

is_array(type_)[source]

Checks if the type identified by type_ is an array.

Parameters:

type_ – Type to query for (Type or name as string)

Return type:

bool

Returns:

True if the type identified by type_ is an array type, else False

is_collection(type_, feature)[source]

Checks if the given feature for the type identified by type_ is a collection, e.g. list or array.

Parameters:
  • type_ – The type to which the feature belongs (Type or name as string)

  • feature (Feature) – The feature to query for.

Return type:

bool

Returns:

Returns True if the given feature is a collection type, else False

is_instance_of(type_, parent)[source]
Return type:

bool

is_list(type_)[source]

Checks if the type identified by type_ is a list.

Parameters:

type_ – Type to query for (Type or name as string)

Return type:

bool

Returns:

True if the type identified by type_ is a list type, else False

is_primitive(type_)[source]

Checks if the type identified by type_ is a primitive type.

Parameters:

type_ – Type to query for (Type or name as string)

Return type:

bool

Returns:

True if the type identified by type_ is a primitive type, else False

is_primitive_array(type_)[source]

Checks if the type identified by type_ is a primitive array, i.e. an array of primitives.

Parameters:

type_ – Type to query for (Type or name as string)

Return type:

bool

Returns:

True if the type identified by type_ is a primitive array type, else False

is_primitive_collection(type_)[source]

Checks if the type identified by type_ is a primitive collection, i.e. a list or array of primitives.

Parameters:

type_ – Type to query for (Type or name as string)

Return type:

bool

Returns:

True if the type identified by type_ is a primitive collection type, else False

is_primitive_list(type_)[source]

Checks if the type identified by type_ is a primitive list, i.e. a list of primitives.

Parameters:

type_ – Type to query for (Type or name as string)

Return type:

bool

Returns:

True if the type identified by type_ is a primitive list type, else False

subsumes(parent, child)[source]

Determines if the type child is a child of parent.

Parameters:
  • parent – Parent type (Type or name as string)

  • child – Child type (Type or name as string)

Return type:

bool

Returns:

True if parent subsumes child else False

to_xml(path=None)[source]

Creates an XML representation of this type system.

Parameters:

path (Union[str, Path, None]) – File path or file-like object. If None is provided, the result is returned as a string.

Return type:

Optional[str]

Returns:

If path is None, then the XML representation of this type system is returned as a string.

transitive_closure(seed_types, built_in=False)[source]
Return type:

Set[Type]

typecheck(fs)[source]

Checks whether a feature structure is type sound.

Currently only checks uima.cas.FSArray.

Parameters:

fs (FeatureStructure) – The feature structure to type check.

Return type:

List[TypeCheckError]

Returns:

List of type errors found, empty list if no errors were found.

class cassis.View(sofa)[source]

A view into a CAS contains a subset of feature structures and annotations.

add_annotation_to_index(annotation)[source]
get_all_annotations()[source]

Gets all the annotations in this view.

Return type:

List[FeatureStructure]

Returns:

A list of all annotations in this view.

remove_annotation_from_index(annotation)[source]

Removes an annotation from an index. Raises an exception if the annotation was not present.

Parameters:

annotation (FeatureStructure) – The annotation to remove.

property type_index: Dict[str, SortedKeyList]

Returns an index mapping type names to annotations of this type.

Returns:

A dictionary mapping type names to annotations of this type.

cassis.cas_to_comparable_text(cas, out=None, seeds=None, mark_indexed=True, covered_text=True, exclude_types=None)[source]
Return type:

Optional[str]

cassis.load_cas_from_json(source, typesystem=None, lenient=False, merge_typesystem=True)[source]

Loads a CAS from a JSON source.

Parameters:
  • source (Union[IO, str]) – The JSON source. If source is a string, then it is assumed to be a JSON string. If source is a file-like object, then the data is read from it.

  • typesystem (TypeSystem) – The type system that belongs to this CAS. If None, an empty type system is provided.

  • lenient (bool) – If True, unknown Types will be ignored. If False, unknown Types will cause an exception. The default is False.

Return type:

Cas

Returns:

The deserialized CAS

cassis.load_cas_from_xmi(source, typesystem=None, lenient=False, trusted=False)[source]

Loads a CAS from an XMI source.

Parameters:
  • source (Union[IO, Path, str]) – The XML source. If source is a string, then it is assumed to be an XML string. If source is a file-like object, then the data is read from it. If source is a Path, then load the file at the given location.

  • typesystem (TypeSystem) – The type system that belongs to this CAS. If None, an empty type system is provided.

  • lenient (bool) – If True, unknown Types will be ignored. If False, unknown Types will cause an exception. The default is False.

  • trusted (bool) – If True, disables checks like XML parser security restrictions.

Return type:

Cas

Returns:

The deserialized CAS

cassis.load_dkpro_core_typesystem()[source]
Return type:

TypeSystem

cassis.load_typesystem(source)[source]

Loads a type system from an XML source.

Parameters:

source (Union[IO, str, Path]) – The XML source. If source is a string, then it is assumed to be an XML string. If source is a file-like object, then the data is read from it. If source is a Path, then load the file at the given location.

Return type:

TypeSystem

Returns:

The deserialized type system

cassis.merge_typesystems(*typesystems)[source]

Merges several type systems into one.

If a type is defined in two source type systems, then the features of these types are joined together in the target type system. The exact rules are outlined in https://uima.apache.org/d/uimaj-2.10.4/references.html#ugr.ref.cas.typemerging .

Parameters:

*typesystems (TypeSystem) – The type systems to merge

Return type:

TypeSystem

Returns:

A new type system that is the result of merging all of the type systems together.

class cassis.typesystem.Type(name, supertype, description=None, typesystem=None, children=NOTHING, features=NOTHING, inherited_features=NOTHING, constructor=None, cached_all_features=None)[source]

Describes types in a type system.

Instances of this class should not be created by hand; instead, the type system’s create_type should be used.

__call__(**kwargs)[source]

Creates a feature structure of this type.

When called with keyword arguments whose keys are the feature names and values are the respective feature values, then a new feature structure instance is created.

Return type:

FeatureStructure

Returns:

A new feature structure instance of this type.

description: str

Description of this type

name: str

Type name of this type

class cassis.typesystem.Feature(name, domainType, rangeType, description=None, elementType=None, multipleReferencesAllowed=None, has_reserved_name=False)[source]

A feature defines one attribute of a feature structure