BatchElements and Tasks

The components of CHEESE process data as BatchElement objects. Components pass these to one another inside Task objects, each of which essentially serves as a container for a BatchElement.

class cheese.data.BatchElement(client_id: int = -1, trip: int = 0, trip_start: str = 'client', trip_max: int = 1, error: bool = False, start_time: float = -1.0, end_time: float = -1.0)

Abstract base class for BatchElements. Stores all kinds of data passed around CHEESE.

Parameters
  • client_id (int) – The ID of the last client that touched this data

  • trip (int) – How many targets have touched/accessed this data so far

  • trip_start (str) – First target for data (“client” or “model”) after it is queued by pipeline. Defaults to “client”.

  • trip_max (int) – How many targets can touch/access this data before it goes back to pipeline to be posted

  • error (bool) – A flag the frontend can set to mark the data as erroneous (e.g. if it couldn’t be labelled properly). Errors are not handled by default, so it is recommended that you account for them in Pipeline.post()

  • start_time (float) – Timestamp for when data was first given to a client

  • end_time (float) – Timestamp for when data was sent back to pipeline

early_finish()

Calling this sets trip to trip_max, resulting in data immediately being sent to pipeline.

total_time()

Returns the time taken for this data to be processed, in seconds. Returns -1 if either timestamp is unset.
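The behaviour of these two methods can be illustrated with a small stand-in class. This is a sketch rather than the CHEESE source: SketchElement is a hypothetical dataclass that mirrors only some of the documented fields, and treating -1 as the "unset" sentinel is an assumption based on the default values in the signature.

```python
from dataclasses import dataclass


@dataclass
class SketchElement:
    """Hypothetical stand-in mirroring some documented BatchElement fields."""
    trip: int = 0
    trip_max: int = 1
    start_time: float = -1.0
    end_time: float = -1.0

    def early_finish(self) -> None:
        # Jump straight to the final trip so the data returns to the pipeline.
        self.trip = self.trip_max

    def total_time(self) -> float:
        # Assumption: a timestamp of -1 means "unset", matching the defaults.
        if self.start_time == -1.0 or self.end_time == -1.0:
            return -1.0
        return self.end_time - self.start_time


elem = SketchElement(trip_max=3, start_time=100.0, end_time=112.5)
elem.early_finish()            # trip is now equal to trip_max
elapsed = elem.total_time()    # difference between the two timestamps
unset = SketchElement().total_time()  # both timestamps left at their defaults
```

In real use you would subclass cheese.data.BatchElement instead of defining a stand-in; the sketch only shows the documented contract of the two methods.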

To see how the trip attribute of BatchElement works, consider the following cases. Suppose we want a user to label data read from a dataset and then write the labels to a new dataset. A trip_max of 1 results in the data visiting the user and then immediately going back to the pipeline. Now suppose instead we want the user to look at the data and prompt a generation from a generative model, then label the generation along with the original data. We’d set trip_max to 3, since the data visits the user, then the model, then the user again. Once trip reaches trip_max, the data is sent back to the pipeline. With a trip_max of 2, the data is sent back to the pipeline from the model.
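The routing described here can be traced with a short simulation. It is a sketch under stated assumptions, not CHEESE's actual scheduling code: visit_order is a hypothetical helper, and it assumes that targets strictly alternate between "client" and "model" starting from trip_start, with each visit incrementing trip by one.

```python
def visit_order(trip_start: str = "client", trip_max: int = 1) -> list:
    """Return the targets visited before the data goes back to the pipeline.

    Assumption: targets alternate between "client" and "model", starting
    from trip_start, and every visit increments trip by one.
    """
    other = {"client": "model", "model": "client"}
    visited = []
    target = trip_start
    trip = 0
    while trip < trip_max:
        visited.append(target)   # this target touches the data
        trip += 1                # one more trip completed
        target = other[target]   # hand over to the other kind of target
    return visited


label_only = visit_order("client", 1)
prompt_then_label = visit_order("client", 3)
ends_at_model = visit_order("client", 2)
```

The last element of each list is the target that hands the data back to the pipeline, which is why a trip_max of 2 ends at the model.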

class cheese.tasks.Task(data: Optional[cheese.data.BatchElement] = None, client_id: int = -1, terminate: bool = False)

Tasks are used to communicate between the components of CHEESE.

Parameters
  • data (BatchElement) – The data contained in the task

  • client_id (int) – The ID of the client that is meant to receive this task

  • terminate (bool) – A flag to tell the client to terminate
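As a rough illustration of how these three fields fit together, the sketch below uses a hypothetical SketchTask dataclass with the same fields as Task. The describe helper is also hypothetical, and the order in which it checks the fields (terminate first, then data) is an assumption, not documented CHEESE behaviour.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class SketchTask:
    """Hypothetical stand-in with the same three fields as Task."""
    data: Optional[Any] = None
    client_id: int = -1
    terminate: bool = False


def describe(task: SketchTask) -> str:
    # Assumption: a receiver checks the terminate flag before looking at data.
    if task.terminate:
        return f"client {task.client_id}: terminate"
    if task.data is None:
        return f"client {task.client_id}: empty task"
    return f"client {task.client_id}: process {task.data!r}"


work = SketchTask(data={"text": "example"}, client_id=7)  # carries data
stop = SketchTask(client_id=7, terminate=True)            # carries no data
```

A task that only signals termination does not need data, which is consistent with data being Optional in the signature above.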