Data Catalog

This module lets you consult and enrich the metadata obtained from the organization's systems. To do so, the catalog provides search, filtering and browsing options. The module also connects with both the glossary of concepts and the data quality glossary so that you can navigate between the different modules.

When accessing the Catalog we will obtain a list of the available systems, showing for each one of them the volume of structures that have been imported as well as the count by type of structures when hovering over the number.

Systems can be grouped in order to have an organised view and you can collapse and expand each group.

You may have additional views of the catalog where structures are grouped by a criterion other than system, if your admin has set them up in your installation. These alternative views will be displayed as additional menu options in the Catalog module.

From this point on we will have several options to find the structure that we want to consult.

  • Discover your system metadata: clicking on a system opens a tree-like navigation that you can browse to find your metadata. There is also a search box that lets you filter the list of current structures to easily find the one you are looking for.

  • Filtering: You will be able to filter using information discovered from the data source and additional information documented in your installation using the data catalog templates. In order to filter by technical metadata fields, these need to be configured by an admin user in the structure types screen. Some filter examples: structure type, last changed date, linked to business concept, with/without extra information (notes or functional metadata).

  • Search: We will be able to search inside the complete metadata, including any manual fields, structure path and description.

  • Save filters: Users will have the option to save their most used filters. To do so, the user needs to give a name to the combination of filters, which will then be offered as an additional quick filter every time they enter the data catalog.

Structures

When viewing the detail of a structure we can see the following information:

  • Domain: the data domain(s) the structure belongs to.

  • System: this is the metadata repository where the information is retrieved from.

  • Imported: date when the structure was included in the catalog for the first time.

  • Updated: date when the structure information was last modified in the catalog.

The detail of structures and fields makes a series of tabs available to us. Each tab is shown or hidden depending on the case.

  • Fields: Those structures that are composed of fields will have a tab that shows them. If it is a structure that does not have fields, this tab will not be displayed.

  • Notes: If the type of structure has an associated template (see the corresponding administration chapter), a tab will be displayed to manage the notes information. Notes are manually managed metadata for a data structure. A workflow is included in case this information needs to be reviewed before approval.

  • Grants: Displays the permissions granted to different users on the corresponding structure.

To populate grants, a local integration needs to be included.

  • Profiling: It will be shown if we have profiling information for the children fields of the structure or data distribution information for the structure we are viewing.

  • Quality: The list of implementations / quality rules in which the structure we are consulting is being validated will be displayed.

  • Linkage: This tab is always shown. It displays existing links and gives authorized users the option to create new links.

  • Versions: The different versions available for the structure will be displayed, with the option to browse and consult them.

  • Audit: Details of all manual changes that have been made to the structure are displayed.

Domain modification

When metadata are loaded into Truedat, a domain is assigned by the connector as specified in the data source. In the data catalog you will be able to reassign the domain of a specific data structure or assign it to more than one domain. This is especially important if you are planning to manage different permissions to data structures depending on the data domain they are assigned to. When changing the domain of a structure you will have the option to automatically change the domain of all its children.

A bulk update is also possible. From the main screen of the Data Catalog, users with the permission to 'Manage Structure Domains' will have the option to update the domain(s) of several structures at once by uploading a csv file. The csv file must have the following format:

Column headers: external_id, domain_external_ids

external_id: the external id of the structure whose domain(s) are to be updated

domain_external_ids: the external id of the new domain. If the structure is to have more than one domain, they must be included as a list separated by pipes (‘|’)
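As an illustration of the format above, the following Python sketch builds such a file in memory. It is not part of Truedat: the function name, the second structure id and the domain ids are hypothetical, and the comma delimiter is an assumption, since the text only states that the file is a csv.

```python
import csv
import io


def build_domain_update_csv(assignments):
    """Return csv text with the headers external_id and domain_external_ids.

    `assignments` maps a structure external id to a list of domain external ids.
    """
    buffer = io.StringIO()
    # Assumption: comma-delimited. Check the delimiter your installation expects.
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["external_id", "domain_external_ids"])
    for external_id, domain_ids in assignments.items():
        # Several domains for one structure are joined with pipes.
        writer.writerow([external_id, "|".join(domain_ids)])
    return buffer.getvalue()


# Hypothetical structures and domains, for illustration only.
example = build_domain_update_csv({
    "glue:/all_glue/default/table_without_columns": ["sales", "finance"],
    "glue:/all_glue/default/other_table": ["marketing"],
})
print(example)
```

Each structure takes one row, so a structure that belongs to several domains still appears only once, with all of its domains in the second column.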

Fields

This tab shows a list of the fields of the structure, if it has any. For each field, it shows the name of the field, links to business concepts, a link to the traceability of the field and other metadata columns that may have been loaded when obtaining the information from the system.

From this field listing screen we can perform the following actions:

  • Navigate to the detail of a field by clicking on the field name.

  • Navigate to the traceability of a field if you have traceability for that field available.

  • Navigate to the concept associated with our field, if there is any (Terms column).

Notes

Notes are used to record extra metadata information in addition to the metadata obtained automatically by the connectors. In order to use this functionality you will have to define a template and assign it to the corresponding structure type.

The creation of notes can go through an approval workflow. There are some permissions linked to this functionality which allow users to complete the different steps of the process in order to publish a note. If you do not need an approval workflow, you can simply set up a permission to publish notes directly and give it to the roles that are allowed to modify notes.

Users will be able to perform different actions on structure notes depending on their permissions and on the status of the note in the workflow.

Viewing pending notes

Users with permission to manage notes (edit, send for approval, publish, reject or un-reject) can view a list of notes pending an action from them. You can filter by status, system and domain. From the list you can navigate to the structure by clicking on it and carry out the relevant action.

When reviewing the note, you will be able to easily identify the changes from previous versions as these will be highlighted in a different colour.

Notes download

You will be able to download a file with all the notes information for the structures filtered in your search. The file is downloaded in the same format that is required for uploading, so you can use it to modify notes information in bulk before uploading it again.

Bulk load of notes

Users with permission to create notes can bulk upload this information from a csv file. In order to do that, a template linked to the data structure type to which you want to add that information must have been set up previously in the Data Structures Type menu option.

The file to be uploaded needs to be a text file with fields separated by semicolon. The first column needs to be "external_id" and the rest of columns correspond to the fields defined in the corresponding template in no particular order.

You can get the external ids corresponding to your structures by downloading the list of structures from the data catalog screen. You can also download the existing notes using the option "Download editable structures metadata", modify this file by adding additional notes and upload it again.

File example:

external_id;extended_description;gdpr
glue:/all_glue/default/table_without_columns;This an example table without any columns;Yes

When creating the file to be uploaded you need to pay attention to the following points:

  • Column headers must match a field in the template of the structure to be updated. Columns that do not match will be ignored.

  • If you include a field which has a list of predefined values in the template, the values you enter for that field in the file must belong to that list. Otherwise the process will fail.

You will not be allowed to upload note information for notes which are currently "pending approval".
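Before uploading, it can help to check a notes file against the points above. The following Python sketch is one way to do so under stated assumptions: `TEMPLATE_FIELDS`, the allowed values for `gdpr`, the `owner` column and the function name are all hypothetical, and the real template lives in your installation. It is a local pre-check, not the validation Truedat itself performs.

```python
import csv
import io

# Hypothetical template: field name -> allowed values (None means free text).
TEMPLATE_FIELDS = {
    "extended_description": None,
    "gdpr": {"Yes", "No"},
}


def check_notes_file(text, template_fields):
    """Return (ignored_columns, errors) for a semicolon-separated notes file."""
    reader = csv.reader(io.StringIO(text), delimiter=";")
    header = next(reader)
    errors = []
    if not header or header[0] != "external_id":
        errors.append('first column must be "external_id"')
    # Columns that match no template field would be ignored by the load.
    ignored = [name for name in header[1:] if name not in template_fields]
    for line_number, row in enumerate(reader, start=2):
        for name, value in zip(header[1:], row[1:]):
            allowed = template_fields.get(name)
            # Assumption: empty values are accepted for predefined-value fields.
            if allowed is not None and value and value not in allowed:
                errors.append(f"line {line_number}: {value!r} is not valid for {name}")
    return ignored, errors


sample = (
    "external_id;extended_description;gdpr;owner\n"
    "glue:/all_glue/default/table_without_columns;An example table;Maybe;data team\n"
)
ignored_columns, problems = check_notes_file(sample, TEMPLATE_FIELDS)
print(ignored_columns)
print(problems)
```

In this sample the `owner` column is reported as ignored because it is not in the hypothetical template, and the value `Maybe` is reported as an error because it is not one of the predefined values.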

When loading the information you will get different options based on your permissions level:

  • If you can create notes you will get the option "Upload notes". This will create a new version of the notes in draft status.

  • If you have permission to publish notes from draft status you will get the option to "Upload and publish". This will create a new version of the note and publish it directly without having to go through the approval process.

Once you click on the upload button, the process will start and you can see the progress in the menu option Data Catalog > My loads.

Bulk update of notes

Bulk update of structure notes is possible, but only for admin roles. See Bulk Updates in the Administration section for more information.

Profiling

The profiling of a structure helps us to better understand our data: what the structure contains and how it can be used to obtain information within the organization. It also gives us a first indication of quality problems in our data. In order to have profiling information loaded, the corresponding integration tasks must have been carried out to obtain the profiling data from the systems where it is needed.

Two types of profiling display are currently available:

Outlined Summary of Fields of a Structure

The list of fields in the structure that have profiling information and the following information will be displayed, if available:

  • % unique values

  • % null values

  • Lowest data

  • Highest data

  • Mode (most common value)

  • If there is a distribution of values, an icon will be displayed that will link to the outlined field.
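To make the measures in this summary concrete, here is a small Python sketch that computes them for one field. It only illustrates what the figures mean and is not how the profiling connector calculates them. In particular, expressing "% unique values" as distinct non-null values over all rows is an assumption, and the function name is hypothetical.

```python
from collections import Counter


def profile_field(values):
    """Summarise one field; None stands for a null value."""
    total = len(values)
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)
    return {
        # Assumption: both percentages are taken over all rows, nulls included.
        "pct_unique": 100 * len(counts) / total if total else 0.0,
        "pct_null": 100 * (total - len(non_null)) / total if total else 0.0,
        "lowest": min(non_null) if non_null else None,
        "highest": max(non_null) if non_null else None,
        "mode": counts.most_common(1)[0][0] if counts else None,
    }


summary = profile_field([3, 7, 7, None, 1, 7, 3, None])
print(summary)
```

For these eight sample values there are three distinct non-null values and two nulls, so the sketch reports 37.5% unique, 25% null, a lowest value of 1, a highest value of 7 and a mode of 7.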

Distribution of values for a field

If information on the distribution of values is available, the profiling tab will be displayed, containing a graph with the different values found for the field.

Value patterns

If pattern information has been loaded using the profiling connector, we can also consult the distribution of patterns, which helps us to better understand our data.

This feature is not available for all data sources.

Lineage and Impact

If traceability information is available for the selected structure, lineage and impact tabs will be displayed where both diagrams can be viewed quickly. We can also navigate through the graph as explained in the Data Lineage section.

Lineage: Shows the origin of the data. What information are we using to generate the data?

Impact: Shows the other data structures in which the data we are consulting is being used. Where would it impact if we make a change in this data?

Grants

Using integration pieces we will be able to document in Truedat the users that have access to certain structures. This will allow a user to see which structures they have been granted access to.

Users with permission to view grants of a structure will get a tab in which all the grants that apply to this structure will be displayed. This will include all grants given directly to this structure or inherited from any ancestor.

Quality

A list is shown with the quality implementations that reference the structure we are consulting. For each implementation, the rule it belongs to is also displayed, and you can navigate to both the quality rule and the implementation. The last quality result will also be displayed if it is available.

Linkage

This tab will allow us to consult and manage the links that are available for a structure / field.

Versions

If several versions have been loaded in the application, this tab will allow us to navigate between them and to consult the fields that made up the structure in previous versions.

When consulting previous versions, the fields of the structure that have been deleted or modified with respect to the current version will be marked.

Audit

The data structure audit shows all the manual actions carried out on a structure. We can view the changes that have been made to its additional information as well as the people who made those changes.

Structure confidentiality

Users with the appropriate permissions have the option of marking a structure as confidential. A structure that is marked confidential will only be visible to people who have permission to manage confidential structures and view confidential structures in the domain in which the structure is located. If you do not want to use this functionality, simply do not activate the permission in any of the defined roles.

Structure tagging

If your administrator has created structure tags and permissions have been assigned to you, you will be able to link a tag to a structure and include any comments on why this tag has been assigned to the structure.

A tag can be assigned to all the structures underneath by ticking the box "Inherited by children".

Once a tag has been linked to a structure, any user with permission to see the given structure will also be able to see the tag, and its description and comments by hovering over the tag.

Sharing structures

You will be able to share your structures with others using the corresponding action in the upper right corner of the structure information. This will produce a notification and an email for the users specified in the form.

You will need to include SMTP server configuration in your installation in order to receive notifications via email.

Catalog information export

After carrying out any search or filtering within the data catalog, we will have the option of exporting the metadata information contained in the catalog in csv format, so that these data can be processed in a third-party application such as Excel.