GreatAI reference#
Core#
GreatAI#
Bases: Generic[T, V]
Wrapper for a prediction function providing the implementation of SE4ML best practices.
Provides caching (with argument freezing), a TracingContext during execution, the scaffolding of HTTP endpoints using FastAPI and a dashboard using Dash.
IMPORTANT: when a request is served from cache, no new trace is created. Thus, the same trace can be returned multiple times. If this is undesirable, turn off caching using `configure(prediction_cache_size=0)`.
Supports wrapping async and synchronous functions while also maintaining correct typing.
Attributes:
Name | Type | Description |
---|---|---|
`app` | `FastAPI` | FastAPI instance wrapping the scaffolded endpoints and the Dash app. |
`version` | `str` | SemVer derived from the app's version and the model names and versions registered through `use_model`. |
Source code in great_ai/deploy/great_ai.py
create(func) staticmethod#
Decorate a function by wrapping it in a GreatAI instance.
The function can be typed, synchronous or async. If it has unwrapped parameters (parameters not affected by a @parameter or @use_model decorator), those will be automatically wrapped.
The return value is replaced by a `Trace` (or `Awaitable[Trace]`), while the original return value is available under the `.output` property.
For configuration options, see great_ai.configure.
Examples:
>>> @GreatAI.create
... def my_function(a: int) -> int:
...     return a + 2
>>> my_function(3)
Trace[int]...
>>> my_function('3').output
Traceback (most recent call last):
...
typeguard.TypeCheckError: str is not an instance of int
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`func` | `Union[Callable[..., Awaitable[V]], Callable[..., V]]` | The prediction function that needs to be decorated. | *required* |
Returns:
Type | Description |
---|---|
`Union[GreatAI[Awaitable[Trace[V]], V], GreatAI[Trace[V], V]]` | A GreatAI instance wrapping `func`. |
Source code in great_ai/deploy/great_ai.py
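To make the behaviour concrete, here is a minimal, hedged sketch (the function and module names are illustrative; serving with uvicorn is an assumption, any ASGI server works):

```python
from great_ai import GreatAI

@GreatAI.create
def greet(name: str) -> str:
    return "Hello " + name

# The wrapped function returns a Trace; the original return value
# is available under .output.
print(greet("world").output)  # Hello world

# The .app attribute is a FastAPI instance, so any ASGI server can host it,
# e.g.: import uvicorn; uvicorn.run(greet.app)
```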
process_batch(batch, *, concurrency=None, unpack_arguments=False, do_not_persist_traces=False)#
Map the wrapped function over a list of input values (`batch`).
A wrapper over `parallel_map` providing type-safety and a progress bar through `tqdm`.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`batch` | `Sequence` | A list of arguments for the original (wrapped) function. If the function expects multiple arguments, provide a list of tuples and set `unpack_arguments=True`. | *required* |
`concurrency` | `Optional[int]` | Number of processes to start. Don't set it much higher than the number of available CPU cores. | `None` |
`unpack_arguments` | `bool` | Expect a list of tuples and unpack the tuples before giving them to the wrapped function. | `False` |
`do_not_persist_traces` | `bool` | Don't save the traces in the database. Useful for evaluations run as part of the CI. | `False` |
Source code in great_ai/deploy/great_ai.py
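A hedged usage sketch (the function is illustrative): mapping a two-argument function over a batch of tuples.

```python
from great_ai import GreatAI

@GreatAI.create
def add(left: int, right: int) -> int:
    return left + right

if __name__ == "__main__":  # guard: the mapping may spawn worker processes
    traces = add.process_batch(
        [(1, 2), (3, 4), (5, 6)],
        concurrency=2,           # keep close to the number of CPU cores
        unpack_arguments=True,   # each tuple becomes positional arguments
        do_not_persist_traces=True,
    )
    print([trace.output for trace in traces])  # [3, 7, 11]
```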
configure(*, version='0.0.1', log_level=DEBUG, seed=42, tracing_database_factory=None, large_file_implementation=None, should_log_exception_stack=None, prediction_cache_size=512, disable_se4ml_banner=False, dashboard_table_size=50, route_config=RouteConfig())#
Set the global configuration used by the great-ai library.
You must call `configure` before calling (or decorating with) any other great-ai function.
If `tracing_database_factory` or `large_file_implementation` is not specified, their default value is determined based on which TracingDatabase and LargeFile have been configured (e.g. `LargeFileS3.configure_credentials_from_file('s3.ini')`), or whether there is any file named s3.ini or mongo.ini in the working directory.
Examples:
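A minimal sketch of a typical call; the concrete values are illustrative assumptions rather than recommendations:

```python
import logging

from great_ai import configure

configure(
    version="1.2.0",
    log_level=logging.INFO,
    seed=42,
    prediction_cache_size=0,  # disable the LRU cache: every request creates a new trace
    disable_se4ml_banner=True,
)
```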
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`version` | `Union[int, str]` | The version of your application (using SemVer is recommended). | `'0.0.1'` |
`log_level` | `int` | Set the default logging level of the great-ai logger. | `DEBUG` |
`seed` | `int` | Set the seed of the random number generators for reproducibility. | `42` |
`tracing_database_factory` | `Optional[Type[TracingDatabaseDriver]]` | Specify a different TracingDatabaseDriver than the one already configured. | `None` |
`large_file_implementation` | `Optional[Type[LargeFileBase]]` | Specify a different LargeFile than the one already configured. | `None` |
`should_log_exception_stack` | `Optional[bool]` | Log the stack traces of unhandled exceptions. | `None` |
`prediction_cache_size` | `int` | Size of the LRU cache applied over the prediction functions. | `512` |
`disable_se4ml_banner` | `bool` | Turn off the warning about the importance of SE4ML best practices. | `False` |
`dashboard_table_size` | `int` | Number of rows to display in the dashboard's table. | `50` |
`route_config` | `RouteConfig` | Enable or disable specific HTTP API endpoints. | `RouteConfig()` |
Source code in great_ai/context.py
save_model(model, key, *, keep_last_n=None)#
Save (and optionally serialise) a model for later use by `use_model`.
The `model` can be a Path or string representing a path, in which case the local file/folder is read and saved using the current LargeFile implementation. In case `model` is an object, it is serialised using `dill` before uploading it.
Examples:
>>> save_model(3, 'my_number')
'my_number:...'
>>> @use_model('my_number')
... def my_function(a, model):
...     return a + model
>>> my_function(4)
7
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`model` | `Union[Path, str, object]` | The object or path to be uploaded. | *required* |
`key` | `str` | The model's name. | *required* |
`keep_last_n` | `Optional[int]` | If specified, remove old models and only keep the latest n. Directly passed to LargeFile. | `None` |
Returns: The key and version of the saved model separated by a colon. Example: "key:version"
Source code in great_ai/models/save_model.py
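A hedged sketch of the two kinds of accepted input (the names and paths are hypothetical):

```python
from great_ai import save_model

# An in-memory object is serialised with dill before being uploaded.
save_model({"weights": [0.1, 0.2]}, key="my-model", keep_last_n=3)

# A str (or Path) is treated as a path: the local file/folder is uploaded.
save_model("models/my-model-directory", key="my-model")
```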
use_model(key, *, version='latest', model_kwarg_name='model')#
Inject a model into a function.
Load a model specified by `key` and `version` using the currently active LargeFile implementation. If it's a single object, it is deserialised using `dill`. If it's a directory of files, a `pathlib.Path` instance is given.
By default, the function's `model` parameter is replaced by the loaded model. This can be customised by changing `model_kwarg_name`. Multiple models can be loaded by decorating the same function with `use_model` multiple times.
Examples:
>>> from great_ai import save_model
>>> save_model(3, 'my_number')
'my_number:...'
>>> @use_model('my_number')
... def my_function(a, model):
...     return a + model
>>> my_function(4)
7
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`key` | `str` | The model's name as stored by the LargeFile implementation. | *required* |
`version` | `Union[int, Literal['latest']]` | The model's version as stored by the LargeFile implementation. | `'latest'` |
`model_kwarg_name` | `str` | The parameter to use for injecting the loaded model. | `'model'` |
Returns: A decorator for model injection.
Source code in great_ai/models/use_model.py
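A hedged sketch of injecting multiple models by stacking the decorator (the model names and their scikit-learn-style methods are assumptions):

```python
from great_ai import use_model

@use_model("my-embedder", model_kwarg_name="embedder")
@use_model("my-classifier", version=2, model_kwarg_name="classifier")
def predict(text: str, embedder, classifier):
    # Both keyword arguments are filled in by the decorators;
    # callers only provide `text`.
    return classifier.predict(embedder.transform([text]))
```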
parameter(parameter_name, *, validate=lambda _: True, disable_logging=False)#
Control the validation and logging of function parameters.
Basically, a parameter decorator. Unfortunately, Python does not have that concept; thus, it's implemented as a function decorator that expects the name of the to-be-decorated parameter.
Examples:
>>> @parameter('a')
... def my_function(a: int):
...     return a + 2
>>> my_function(4)
6
>>> my_function('3')
Traceback (most recent call last):
...
typeguard.TypeCheckError: str is not an instance of int
>>> @parameter('positive_a', validate=lambda v: v > 0)
... def my_function(positive_a: int):
...     return positive_a + 2
>>> my_function(-1)
Traceback (most recent call last):
...
great_ai.errors.argument_validation_error.ArgumentValidationError: ...
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`parameter_name` | `str` | Name of the parameter to consider. | *required* |
`validate` | `Callable[[Any], bool]` | Optional validator to run against the concrete argument. ArgumentValidationError is thrown when the return value is False. | `lambda _: True` |
`disable_logging` | `bool` | Do not save the value in any active TracingContext. | `False` |
Returns: A decorator for argument validation.
Source code in great_ai/parameters/parameter.py
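A hedged sketch combining a custom validator with disabled logging for a sensitive value (the validator is an illustrative assumption):

```python
from great_ai import parameter

@parameter("email", validate=lambda v: "@" in v, disable_logging=True)
def register(email: str) -> str:
    # The argument is validated, but its value is never stored
    # in the active TracingContext.
    return "registered"
```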
log_metric(argument_name, value, disable_logging=False)#
Log a key-value pair (`argument_name`, `value`) that is persisted inside the trace. The name of the function from which it is called is also stored.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`argument_name` | `str` | The key for storing the value. | *required* |
`value` | `Any` | Value to log. Must be JSON-serialisable. | *required* |
`disable_logging` | `bool` | If True, only persist the value in the trace but don't show it in the console. | `False` |
Source code in great_ai/parameters/log_metric.py
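A hedged sketch (the metric name and score are illustrative): logging a value from inside a wrapped prediction function so that it is persisted on the trace.

```python
from great_ai import GreatAI, log_metric

@GreatAI.create
def classify(text: str) -> str:
    confidence = 0.87  # stand-in for a real model score
    log_metric("confidence", confidence)  # persisted inside the current trace
    return "positive" if confidence >= 0.5 else "negative"
```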
Remote calls#
call_remote_great_ai(base_uri, data, retry_count=4, timeout_in_seconds=300, model_class=None)#
Communicate with a GreatAI object through an HTTP request.
Wrapper over `call_remote_great_ai_async` making it synchronous. For more info, see `call_remote_great_ai_async`.
Source code in great_ai/remote/call_remote_great_ai.py
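A hedged sketch of a synchronous remote call (the address and payload are illustrative; the payload keys must match the remote function's parameters):

```python
from great_ai import call_remote_great_ai

trace = call_remote_great_ai(
    "http://localhost:6060",
    {"text": "An example input."},
)
print(trace.output)
```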
call_remote_great_ai_async(base_uri, data, retry_count=4, timeout_in_seconds=300, model_class=None) async#
Communicate with a GreatAI object through an HTTP request.
Send a POST request using httpx to implement a remote call. Error-handling and retries are provided by httpx.
The return value is inflated into a Trace. If `model_class` is specified, the original output is deserialised.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`base_uri` | `str` | Address of the remote instance, for example: `'http://localhost:6060'`. | *required* |
`data` | `Mapping[str, Any]` | The input sent as JSON to the remote instance. | *required* |
`retry_count` | `int` | Retry on any HTTP communication failure. | `4` |
`timeout_in_seconds` | `Optional[int]` | Overall permissible max length of the request. | `300` |
`model_class` | `Optional[Type[T]]` | A subtype of BaseModel to be used for deserialising the output. | `None` |
Source code in great_ai/remote/call_remote_great_ai_async.py
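The same call in its native asynchronous form, again as an illustrative sketch:

```python
import asyncio

from great_ai import call_remote_great_ai_async

async def main() -> None:
    trace = await call_remote_great_ai_async(
        "http://localhost:6060",
        {"text": "An example input."},
        retry_count=4,
        timeout_in_seconds=300,
    )
    print(trace.output)

asyncio.run(main())
```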
Ground-truth#
add_ground_truth(inputs, expected_outputs, *, tags=[], train_split_ratio=1, test_split_ratio=0, validation_split_ratio=0)#
Add training data (with optional train-test splitting).
Add and tag data-points, wrapping them into traces. The `inputs` are available via the `.input` property, while the `expected_outputs` are available under both the `.output` and `.feedback` properties.
All generated traces are tagged with `ground_truth` by default. Additional tags can also be provided. Using the `split_ratio` arguments, tags can be given randomly with a user-defined distribution. Only the ratio of the splits matters; they don't have to add up to 1.
Examples:
>>> add_ground_truth(
...     [1, 2, 3],
...     ['odd', 'even', 'odd'],
...     tags='my_tag',
...     train_split_ratio=1,
...     test_split_ratio=1,
...     validation_split_ratio=0.5,
... )
>>> add_ground_truth(
...     [1, 2],
...     ['odd', 'even', 'odd'],
...     tags='my_tag',
...     train_split_ratio=1,
...     test_split_ratio=1,
...     validation_split_ratio=0.5,
... )
Traceback (most recent call last):
...
AssertionError: The length of the inputs and expected_outputs must be equal
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`inputs` | `Iterable[Any]` | The inputs. (X in scikit-learn) | *required* |
`expected_outputs` | `Iterable[T]` | The ground-truth values corresponding to the inputs. (y in scikit-learn) | *required* |
`tags` | `Union[List[str], str]` | A single tag or a list of tags to append to each generated trace's tags. | `[]` |
`train_split_ratio` | `float` | The probability-weight of giving each trace the `train` tag. | `1` |
`test_split_ratio` | `float` | The probability-weight of giving each trace the `test` tag. | `0` |
`validation_split_ratio` | `float` | The probability-weight of giving each trace the `validation` tag. | `0` |
Source code in great_ai/tracing/add_ground_truth.py
query_ground_truth(conjunctive_tags=[], *, since=None, until=None, return_max_count=None)#
Return training samples.
Combines, filters, and returns data-points that have been either added by `add_ground_truth` or were the result of a prediction whose trace later received feedback through the REST API's `/traces/{trace_id}/feedback` endpoint (end-to-end feedback).
Filtering can be used to only return points matching all given tags (or the single given tag) and by time of creation.
Examples:
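A hedged sketch (the tags and dates are illustrative): fetching at most 100 traces from the train split created after a given date.

```python
from datetime import datetime

from great_ai import query_ground_truth

traces = query_ground_truth(
    ["ground_truth", "train"],  # traces must carry every listed tag
    since=datetime(2022, 1, 1),
    return_max_count=100,
)
for trace in traces:
    print(trace.input, trace.feedback)
```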
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`conjunctive_tags` | `Union[List[str], str]` | Single tag or a list of tags which the returned traces have to match. The relationship between the tags is conjunctive (AND). | `[]` |
`since` | `Optional[datetime]` | Only return traces created after the given timestamp. | `None` |
`until` | `Optional[datetime]` | Only return traces created before the given timestamp. | `None` |
`return_max_count` | `Optional[int]` | Return at most this many traces (take/limit). | `None` |
Source code in great_ai/tracing/query_ground_truth.py
delete_ground_truth(conjunctive_tags=[], *, since=None, until=None)#
Delete traces matching the given criteria.
Takes the same arguments as `query_ground_truth` but instead of returning the matching traces, it simply deletes them. You can rely on the efficiency of its implementation.
Examples:
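An illustrative sketch mirroring the query example above:

```python
from great_ai import delete_ground_truth

# Delete every trace tagged with both `ground_truth` and `test`.
delete_ground_truth(["ground_truth", "test"])
```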
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`conjunctive_tags` | `Union[List[str], str]` | Single tag or a list of tags which the deleted traces have to match. The relationship between the tags is conjunctive (AND). | `[]` |
`since` | `Optional[datetime]` | Only delete traces created after the given timestamp. | `None` |
`until` | `Optional[datetime]` | Only delete traces created before the given timestamp. | `None` |
Source code in great_ai/tracing/delete_ground_truth.py
Tracing databases#
TracingDatabaseDriver#
Bases: ABC
Interface expected from a database to be used for storing traces.
Source code in great_ai/persistence/tracing_database_driver.py
MongoDbDriver#
Bases: TracingDatabaseDriver
TracingDatabaseDriver implementation using MongoDB as a backend.
A production-ready database driver suitable for efficiently handling semi-structured data.
Check out MongoDB Atlas for a hosted MongoDB solution.
Source code in great_ai/persistence/mongodb_driver.py
configure_credentials(*, mongo_connection_string, mongo_database, **_) classmethod#
Configure the connection details for MongoDB.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`mongo_connection_string` | `str` | For example: `'mongodb://my_user:my_pass@localhost:27017'`. | *required* |
`mongo_database` | `str` | Name of the database to use. If it doesn't exist, it is created and initialised. | *required* |
Source code in great_ai/persistence/mongodb_driver.py
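A hedged sketch of wiring MongoDB in as the tracing database (the import path and the credentials are assumptions):

```python
from great_ai import configure
from great_ai.persistence.mongodb_driver import MongoDbDriver  # assumed import path

MongoDbDriver.configure_credentials(
    mongo_connection_string="mongodb://my_user:my_pass@localhost:27017",
    mongo_database="my_database",
)

# configure() would pick the driver up by default once its credentials are
# set, but it can also be passed explicitly:
configure(tracing_database_factory=MongoDbDriver)
```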
ParallelTinyDbDriver#
Bases: TracingDatabaseDriver
TracingDatabaseDriver with TinyDB as a backend.
Saves the database as JSON in a single file. Inserting is highly inefficient; not advised for production use.
A multiprocessing lock protects the database file to avoid parallelisation issues.
Source code in great_ai/persistence/parallel_tinydb_driver.py