11. The Dataloop Query Language - DQL¶
Using the Dataloop Query Language (DQL), you can navigate through massive amounts of data.
You can filter, sort, and update your metadata with it.
11.1. Filters¶
Using filters, you can filter items and get a generator of the filtered items. The filters entity is used to build such filters.
11.1.1. Filters - Field & Value¶
Filter your items or annotations using the fields in the JSON that represents their data within our system. Access your item/annotation JSON using to_json().
11.1.1.1. Field¶
Field refers to the attributes you filter by.
For example, “dir” would be used if you wish to filter items by their folder/directory.
11.1.1.2. Value¶
Value refers to the input by which you want to filter. For example, “/new_folder” can be the directory/folder name where the items you wish to filter are located.
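To make the field/value pairing concrete, here is a minimal local sketch. It is not part of the SDK: `get_field` is a hypothetical helper that resolves a dotted field name such as `metadata.system.mimetype` against item JSON (a trimmed copy of the example item listed under Item Filtering Fields). It assumes that dots in a field name denote nesting in the JSON, as the field names used throughout this chapter suggest.

```python
# Hypothetical helper for illustration only; it is not part of dtlpy.
def get_field(item_json, field):
    """Resolve a dotted field name against nested item JSON."""
    current = item_json
    for key in field.split('.'):
        if not isinstance(current, dict) or key not in current:
            return None  # the field is absent from this item
        current = current[key]
    return current


# Trimmed item JSON, as returned by to_json()
item_json = {
    "dir": "/new_folder",
    "filename": "/new_folder/optional.jpg",
    "metadata": {"system": {"mimetype": "image/jpeg", "size": 3290035}},
}

# A filter pairs a field with the value it is compared against
in_folder = get_field(item_json, "dir") == "/new_folder"
mimetype = get_field(item_json, "metadata.system.mimetype")
print(in_folder, mimetype)
```

The same field names are what you pass as `field` to `filters.add`, with the compared value passed as `values`.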
11.1.2. Sort - Field & Value¶
11.1.2.1. Field¶
Field refers to the field you sort your items/annotations list by. For example, if you sort by filename, you will get the item list sorted in alphabetical order by filename. See the full list of the available fields here.
11.1.2.2. Value¶
Value refers to the sort direction of the list: either ascending or descending.
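The effect of the sort field and direction can be pictured with a small local sketch. Plain dictionaries stand in for item JSON here, and the sorting is done with Python's `sorted` rather than by the platform, so this only illustrates the expected ordering.

```python
# Local illustration only: plain dictionaries stand in for item JSON.
items = [
    {"filename": "/b.jpg", "createdAt": "2020-08-30T08:17:08.000Z"},
    {"filename": "/a.jpg", "createdAt": "2020-08-29T12:30:00.000Z"},
    {"filename": "/c.jpg", "createdAt": "2020-09-01T10:00:00.000Z"},
]

# Field 'filename', ascending direction
ascending_by_name = sorted(items, key=lambda item: item["filename"])

# Field 'createdAt', descending direction (newest first).
# ISO 8601 timestamps in the same format sort correctly as strings.
descending_by_date = sorted(items, key=lambda item: item["createdAt"], reverse=True)

names_ascending = [item["filename"] for item in ascending_by_name]
names_newest_first = [item["filename"] for item in descending_by_date]
print(names_ascending)
print(names_newest_first)
```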
11.2. Filter Items¶
Filter items by the item’s JSON fields. In this example, you will get all annotated items in a dataset sorted by the filename.
See all of the items iterator options on the Iterator of Items page.
import dtlpy as dl
# Get project and dataset
project = dl.projects.get(project_name='project_name')
dataset = project.datasets.get(dataset_name='dataset_name')
# Create filters instance
filters = dl.Filters()
# Filter only annotated items
filters.add(field='annotated', values=True)
# optional - return results sorted by ascending file name
filters.sort_by(field="filename")
# Get filtered items list in a page object
pages = dataset.items.list(filters=filters)
# Count the items
print('Number of items in dataset: {}'.format(pages.items_count))
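Beyond counting, you will usually want to walk over the filtered items. The sketch below assumes that the page object returned by `dataset.items.list` can be iterated page by page, and that each page yields item entities with a `filename` attribute, as described on the Iterator of Items page. `collect_filenames` is a hypothetical helper, and the stand-in pages exist only so that the sketch runs without a platform connection.

```python
from types import SimpleNamespace


def collect_filenames(pages):
    """Return the filename of every item across all pages."""
    filenames = []
    for page in pages:      # each page is an iterable of items
        for item in page:
            filenames.append(item.filename)
    return filenames


# Stand-in for the object returned by dataset.items.list(filters=filters)
fake_pages = [
    [SimpleNamespace(filename="/a.jpg"), SimpleNamespace(filename="/b.jpg")],
    [SimpleNamespace(filename="/c.jpg")],
]

all_filenames = collect_filenames(fake_pages)
print(all_filenames)
```

With a real connection, you would call `collect_filenames(pages)` on the `pages` object from the example above.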
11.3. Filter Items by the Items’ Annotations¶
Use add_join to filter items by the JSON fields of their annotations. For example, filter only items with ‘box’ annotations.
See all of the items iterator options on the Iterator of Items page.
filters = dl.Filters()
# Filter all approved items
filters.add(field='metadata.system.annotationStatus', values="approved")
# AND filter items by their annotation - only items with 'box' annotations
# Meaning you will get approved items with 'box' annotations
filters.add_join(field='type', values='box')
# optional - return results sorted by descending creation date
filters.sort_by(field='createdAt', value=dl.FILTERS_ORDERBY_DIRECTION_DESCENDING)
# Get filtered items list in a page object
pages = dataset.items.list(filters=filters)
# Count the items
print('Number of items in dataset: {}'.format(pages.items_count))
11.4. Filters Method - “Or” and “And”¶
For more advanced filter operators, visit the Advanced SDK Filters page.
11.4.1. And¶
If you wish to combine filters with the “and” logical operator, specify it as the method of each filter that should be checked with “and”.
filters = dl.Filters()
# filters with and
filters.add(field='annotated', values=True, method=dl.FiltersMethod.AND)
filters.add(field='metadata.user.is_automated', values=True,
            method=dl.FiltersMethod.AND)
# optional - return results sorted by ascending file name
filters.sort_by(field='name')
# Get filtered items list in a page object
pages = dataset.items.list(filters=filters)
# Count the items
print('Number of items in dataset: {}'.format(pages.items_count))
11.4.2. Or¶
If you wish to combine filters with the “or” logical operator, specify it as the method of each filter that should be checked with “or”. In this example, you will get a list of items that are in either the “/folderName1” or the “/folderName2” directory.
filters = dl.Filters()
# filters with or
filters.add(field='dir', values='/folderName1', method=dl.FiltersMethod.OR)
filters.add(field='dir', values='/folderName2',
            method=dl.FiltersMethod.OR)
# optional - return results sorted by descending directory name
filters.sort_by(field='dir', value=dl.FILTERS_ORDERBY_DIRECTION_DESCENDING)
# Get filtered items list in a page object
pages = dataset.items.list(filters=filters)
# Count the items
print('Number of items in dataset: {}'.format(pages.items_count))
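The difference between the two methods can be modelled locally. This is only a sketch of the logical semantics, not of how the platform evaluates a query: `matches_and` and `matches_or` are hypothetical helpers, and flat dictionaries stand in for item JSON.

```python
# Local model of the "and" / "or" semantics; not part of dtlpy.
items = [
    {"name": "1.jpg", "dir": "/folderName1", "annotated": True},
    {"name": "2.jpg", "dir": "/folderName2", "annotated": False},
    {"name": "3.jpg", "dir": "/other", "annotated": True},
]


def matches_and(item, conditions):
    """True only if every (field, value) condition holds."""
    return all(item.get(field) == value for field, value in conditions)


def matches_or(item, conditions):
    """True if at least one (field, value) condition holds."""
    return any(item.get(field) == value for field, value in conditions)


and_conditions = [("dir", "/folderName1"), ("annotated", True)]
or_conditions = [("dir", "/folderName1"), ("dir", "/folderName2")]

and_names = [item["name"] for item in items if matches_and(item, and_conditions)]
or_names = [item["name"] for item in items if matches_or(item, or_conditions)]
print(and_names)
print(or_names)
```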
11.5. Update User Metadata of Filtered Items¶
Update Filtered Items - The ‘update_values’ argument must be a dictionary, and it only updates user metadata. Learn more about user metadata here (https://github.com/dataloop-ai/dtlpy-documentation/blob/main/tutorials/data_management/working_with_metadata/chapter.md/). In this example, you will add or update user metadata (the field “BlackDogs” with the value True) on items in a specific folder, ‘dogs’, that have an annotation with the attribute ‘black’.
filters = dl.Filters()
# For example - filter only items in a specific folder - like 'dogs'
filters.add(field='dir', values='/dogs')
# For example - filter items by their annotation - only items with 'black' attribute
filters.add_join(field='attributes', values='black')
# Add the field BlackDogs with the value True to all filtered items
# This field will be added to the user metadata
# Create the update order
update_values = {'BlackDogs': True}
# update
pages = dataset.items.update(filters=filters, update_values=update_values)
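To picture what this update does to an item, the sketch below applies the same `update_values` dictionary to a local copy of item JSON. It assumes the update behaves like a key-by-key merge into `metadata.user` that leaves system metadata untouched; `apply_user_metadata_update` is a hypothetical helper, not an SDK function.

```python
import copy


def apply_user_metadata_update(item_json, update_values):
    """Return a copy of item_json with update_values merged into metadata.user."""
    updated = copy.deepcopy(item_json)
    user_metadata = updated.setdefault("metadata", {}).setdefault("user", {})
    user_metadata.update(update_values)
    return updated


# Trimmed item JSON without any user metadata yet
item_json = {"dir": "/dogs", "metadata": {"system": {"size": 3290035}}}

updated_json = apply_user_metadata_update(item_json, {"BlackDogs": True})
print(updated_json["metadata"])
```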
11.6. Delete Filtered Items¶
In this example, you will delete items that were created on 30/8/2020 at 8:17 AM.
filters = dl.Filters()
# For example - filter only items created at a specific date and time
filters.add(field='createdAt', values="2020-08-30T08:17:08.000Z")
dataset.items.delete(filters=filters)
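Because deletion cannot easily be undone, it can help to check how many items a filter matches before deleting. The following is a minimal sketch of such a guard, using only the `dataset.items.list` and `dataset.items.delete` calls shown in this chapter. `delete_filtered` and its `max_expected` threshold are hypothetical, and the stand-in objects exist only so that the sketch runs without a platform connection.

```python
from types import SimpleNamespace


def delete_filtered(dataset, filters, max_expected):
    """Delete the filtered items only if the match count is within max_expected."""
    matched = dataset.items.list(filters=filters).items_count
    if matched > max_expected:
        raise ValueError(
            'Filter matches {} items, more than the expected {}'.format(matched, max_expected))
    dataset.items.delete(filters=filters)
    return matched


class FakeItems:
    """Stand-in for dataset.items so the sketch runs offline."""

    def __init__(self, count):
        self.count = count
        self.deleted = False

    def list(self, filters=None):
        return SimpleNamespace(items_count=self.count)

    def delete(self, filters=None):
        self.deleted = True


fake_dataset = SimpleNamespace(items=FakeItems(count=3))
deleted_count = delete_filtered(fake_dataset, filters=None, max_expected=10)
print('Deleted {} items'.format(deleted_count))
```

With a real connection, you would pass the actual `dataset` and `filters` objects instead of the stand-ins.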
11.7. Item Filtering Fields¶
11.7.1. More Filter Options¶
{
    "id": "5f4b60848ced1d50c3df114a",
    "datasetId": "5f4b603d9825b9f191bbd3b3",
    "createdAt": "2020-08-30T08:17:08.000Z",
    "dir": "/new_folder",
    "filename": "/new_folder/optional.jpg",
    "type": "file",
    "hidden": false,
    "metadata": {
        "system": {
            "originalname": "file",
            "size": 3290035,
            "encoding": "7bit",
            "mimetype": "image/jpeg",
            "annotationStatus": [
                "completed"
            ],
            "refs": [
                {
                    "type": "task",
                    "id": "5f4b61f8f81ab6238c331bd2"
                },
                {
                    "type": "assignment",
                    "id": "5f4b61f8f81ab60508331bd3"
                }
            ],
            "executionLogs": {
                "image-metadata-extractor": {
                    "default_module": {
                        "run": {
                            "5f4b60841b892d82eaa2d95b": {
                                "progress": 100,
                                "status": "success"
                            }
                        }
                    }
                }
            },
            "exif": {},
            "height": 2734,
            "width": 4096,
            "statusLog": [
                {
                    "status": "completed",
                    "timestamp": "2020-08-30T14:54:17.014Z",
                    "creator": "user@dataloop.ai",
                    "action": "created"
                }
            ],
            "isBinary": true
        }
    },
    "name": "optional.jpg",
    "url": "https://gate.dataloop.ai/api/v1/items/5f4b60848ced1d50c3df114a",
    "dataset": "https://gate.dataloop.ai/api/v1/datasets/5f4b603d9825b9f191bbd3b3",
    "annotationsCount": 18,
    "annotated": "discarded",
    "stream": "https://gate.dataloop.ai/api/v1/items/5f4b60848ced1d50c3df114a/stream",
    "thumbnail": "https://gate.dataloop.ai/api/v1/items/5f4b60848ced1d50c3df114a/thumbnail",
    "annotations": "https://gate.dataloop.ai/api/v1/items/5f4b60848ced1d50c3df114a/annotations"
}
11.8. Full Examples¶
11.8.1. How to filter items by their annotations label?¶
filters = dl.Filters()
filters.add_join(field='label', values='your_label_value')
pages = dataset.items.list(filters=filters)
# Count the items
print('Number of filtered items in dataset: {}'.format(pages.items_count))
11.8.2. How to filter items by completed and approved status?¶
filters = dl.Filters()
filters.add(field='metadata.system.annotationStatus', values=["completed", "approved"])
pages = dataset.items.list(filters=filters)
# Count the items
print('Number of items in dataset: {}'.format(pages.items_count))
11.8.3. How to filter items by completed status (including items that are approved as well)?¶
filters = dl.Filters()
# Filter by annotation status
filters.add(field='metadata.system.annotationStatus', values="completed")
pages = dataset.items.list(filters=filters)
# Count the items
print('Number of items in dataset: {}'.format(pages.items_count))
11.8.4. How to filter items by only completed status?¶
filters = dl.Filters()
filters.add(field='metadata.system.annotationStatus', values=["completed"])
pages = dataset.items.list(filters=filters)
# Count the items
print('Number of items in dataset: {}'.format(pages.items_count))
11.8.5. How to filter unassigned items?¶
filters = dl.Filters()
filters.add(field='metadata.system.refs', values=[])
pages = dataset.items.list(filters=filters)
# Count the items
print('Number of items in dataset: {}'.format(pages.items_count))
11.8.6. How to filter items by a specific folder?¶
filters = dl.Filters()
filters.add(field='dir', values="/folderName")
pages = dataset.items.list(filters=filters)
# Count the items
print('Number of items in dataset: {}'.format(pages.items_count))
11.8.7. Get all items named foo.bar¶
filters = dl.Filters()
filters.add(field='name', values='foo.bar.*')
# Get filtered item list in a page object
pages = dataset.items.list(filters=filters)
# Count the items
print('Number of filtered items in dataset: {}'.format(pages.items_count))
11.8.8. Sort files of size 0-5 MB by name, in ascending order¶
filters = dl.Filters()
filters.add(field='metadata.system.size', values='0', operator='gt')
filters.add(field='metadata.system.size', values='5242880', operator='lt')
filters.sort_by(field='filename', value=dl.FILTERS_ORDERBY_DIRECTION_ASCENDING)
# Get filtered item list in a page object
pages = dataset.items.list(filters=filters)
# Count the items
print('Number of filtered items in dataset: {}'.format(pages.items_count))
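The upper bound 5242880 in this example is 5 MB expressed in bytes. The short sketch below reproduces that arithmetic. It assumes that `metadata.system.size` is stored in bytes and that 1 MB means 1024 * 1024 bytes, which is consistent with the value used above. `mb_to_bytes` is a hypothetical helper.

```python
BYTES_PER_MB = 1024 * 1024


def mb_to_bytes(megabytes):
    """Convert megabytes (1 MB = 1024 * 1024 bytes) to a whole number of bytes."""
    return int(megabytes * BYTES_PER_MB)


upper_bound = mb_to_bytes(5)
print(upper_bound)  # 5242880

# The example item on this page has size 3290035, so it falls inside the range
example_size = 3290035
in_range = 0 < example_size < upper_bound
print(in_range)
```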
11.8.9. Sort with multiple fields: Sort Items by labels ascending and createdAt descending¶
filters = dl.Filters()
# set annotation resource
filters.resource = dl.FiltersResource.ANNOTATION
# return results sorted by ascending label, then by descending creation date
filters.sort_by(field='label', value=dl.FILTERS_ORDERBY_DIRECTION_ASCENDING)
filters.sort_by(field='createdAt', value=dl.FILTERS_ORDERBY_DIRECTION_DESCENDING)
# Get filtered item list in a page object
pages = dataset.items.list(filters=filters)
# Count the items
print('Number of filtered items in dataset: {}'.format(pages.items_count))
11.9. Advanced Filtering Operators¶
Explore advanced filtering options on this page.
11.10. Response to DQL Query¶
A typical response to a DQL query will look like the following:
{
    "totalItemsCount": number,
    "items": Array,
    "totalPagesCount": number,
    "hasNextPage": boolean,
}
# A possible result:
{
    "totalItemsCount": 2,
    "totalPagesCount": 1,
    "hasNextPage": false,
    "items": [
        {
            "id": "5d0783852dbc15306a59ef6c",
            "createdAt": "2019-06-18T23:29:15.775Z",
            "filename": "/5546670769_8df950c6b6.jpg",
            "type": "file"
            // ...
        },
        {
            "id": "5d0783852dbc15306a59ef6d",
            "createdAt": "2019-06-19T23:29:15.775Z",
            "filename": "/5551018983_3ce908ac98.jpg",
            "type": "file"
            // ...
        }
    ]
}
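If you receive such a response as raw JSON text, it can be read with the standard `json` module. The sketch below parses a copy of the result above (with the elided fields left out so that it is valid JSON) and pulls out the paging information and the filenames. It is a local illustration only and does not call the platform.

```python
import json

# Copy of the example result, limited to the fields shown above
response_text = '''
{
    "totalItemsCount": 2,
    "totalPagesCount": 1,
    "hasNextPage": false,
    "items": [
        {"id": "5d0783852dbc15306a59ef6c",
         "createdAt": "2019-06-18T23:29:15.775Z",
         "filename": "/5546670769_8df950c6b6.jpg",
         "type": "file"},
        {"id": "5d0783852dbc15306a59ef6d",
         "createdAt": "2019-06-19T23:29:15.775Z",
         "filename": "/5551018983_3ce908ac98.jpg",
         "type": "file"}
    ]
}
'''

response = json.loads(response_text)
filenames = [item["filename"] for item in response["items"]]

print('Items: {}, pages: {}'.format(response["totalItemsCount"], response["totalPagesCount"]))
print('More pages to fetch: {}'.format(response["hasNextPage"]))
print(filenames)
```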
11.11. Using Custom DQL Filter¶
If you have a DQL JSON copied from the platform, you can create an SDK Filter directly from it using the “custom_filter” parameter:
filters = dl.Filters(custom_filter={"$and": [{"hidden": False},
                                             {"type": "file"},
                                             {"annotated": True}]})
pages = dataset.items.list(filters=filters)
print('Number of filtered items in dataset: {}'.format(pages.items_count))
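When you assemble such a dictionary in code rather than copying it from the platform, a small helper keeps the structure consistent. The sketch below builds the same `$and` structure as the example above from a mapping of fields to values. `build_and_filter` is a hypothetical helper, not an SDK function, and it covers only simple equality conditions.

```python
def build_and_filter(conditions):
    """Build a DQL-style {"$and": [...]} dictionary from field/value pairs."""
    return {"$and": [{field: value} for field, value in conditions.items()]}


custom_filter = build_and_filter({
    "hidden": False,
    "type": "file",
    "annotated": True,
})
print(custom_filter)
```

The resulting dictionary can then be passed as `dl.Filters(custom_filter=custom_filter)`, exactly like the literal in the example above.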