Blog

What is TCSV?

November 07, 2013 posted by Phil

Tags: Techie, Company, Beliefs, Big Data, TCSV

What is TCSV? – The Science

In short, TCSV is our Content Agnostic Master Ontology (CAMO).

Content is the part of data that explicitly refers to or represents an entity or the properties of an entity. An entity is, very simply, something that exists by itself, whether concrete or abstract (Wikipedia). The easiest entities to conceptualise are people, products and companies, but colours and feelings also count.

Agnostic, as related to data, means not tied to a specific set of truths about specific entities. For a CAMO data model, content agnosticism refers to how data is structured and stored: a content agnostic data structure uses no element of content as an explicit feature of itself. For example, a table would not use any element of content as a column heading.
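To make the distinction concrete, here is a minimal Python sketch contrasting a content-specific row, whose column headings (`product`, `price`) are themselves content, with a content-agnostic layout whose only fixed fields are generic (attribute, value). The field names and the `to_content_agnostic` helper are hypothetical and chosen only for illustration; this is not DataShaka's implementation.

```python
from __future__ import annotations

# Content-specific: the column headings "product" and "price" are content.
content_specific_row = {"product": "Nutrigum", "price": 1.99}


def to_content_agnostic(row: dict) -> list[tuple[str, object]]:
    """Turn a content-specific row into generic (attribute, value) pairs.

    The resulting structure has only two generic fields, so a new kind of
    content (e.g. "flavour") adds rows rather than new columns.
    """
    return [(attribute, value) for attribute, value in row.items()]


agnostic_rows = to_content_agnostic(content_specific_row)
print(agnostic_rows)  # [('product', 'Nutrigum'), ('price', 1.99)]
```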

Master could equally be described as ‘single’, as in the article below about types of ontology. However, we use the term master because TCSV represents an ontology for all data whose values change over time. When used to hold data, a master ontology creates a conceptual master set of all data.

In information science, ontologies are described simply as “formal models of representation with explicitly defined concepts and named relationships linking them” that “are used to address the issue of semantic heterogeneity in data sources” (Wikipedia).

What is TCSV? – The TCSV Ontology

The letters of TCSV stand for Time, Context, Signal and Value. TCSV is used to represent values that change over time; these values describe entities and their features. Time is the timestamp of the entity's activity. Context is a variably sized set of textual key-value pairs (Context and Context Type). Signal and Value form a single key-value pair: a textual name and a numeric value. Together, Context, Signal and Value represent a data space of entities and their describing factors, tracked over Time. Context and Signal are comparable to the dimensions and metrics of traditional relational star schemas.
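A single TCSV observation could be modelled roughly as below. This is a minimal sketch, assuming Context is held as a mapping from Context Type to Context; the class name `TCSVPoint`, the field names, the "Followers" signal and its value are all hypothetical illustrations, not DataShaka's actual storage format.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class TCSVPoint:
    """One observation: the numeric Value of a named Signal, at a Time, in a Context."""

    time: datetime           # timestamp of the entity's activity
    context: dict[str, str]  # Context Type -> Context; any number of pairs
    signal: str              # textual name, comparable to a metric
    value: float             # numeric value of the signal


# Hypothetical example: a follower count for a Twitter handle on one day.
point = TCSVPoint(
    time=datetime(2013, 11, 7, tzinfo=timezone.utc),
    context={"Twitter Handle": "@NgUK"},
    signal="Followers",
    value=1200.0,
)
```

Note that nothing in the class names a particular entity: a product, a brand or a feeling would all be stored in exactly the same four fields.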

Because TCSV is content agnostic, the entities represented are not specified upfront. Entity features described in the heterogeneous source schemas are brought together into one unified data set, which is then queried to populate the conceptual entities needed to answer a given question. This means that the schema of the source data and the schema of the questions are completely decoupled. Source data schemas and formats can change and the same questions can still be asked. Similarly, questions can change without impacting the underlying data structure.
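The decoupling can be sketched in a few lines of Python. The two sources, their column names, the small per-source adapters and the `entity_view` query are hypothetical; the sketch only assumes that every source is mapped into the same Time, Context, Signal, Value shape before any question is asked.

```python
from __future__ import annotations

from collections import defaultdict
from typing import NamedTuple


class Point(NamedTuple):
    time: str                # ISO date string, kept simple for the sketch
    context: dict[str, str]  # Context Type -> Context
    signal: str
    value: float


# Two hypothetical sources with different schemas.
sales_rows = [{"date": "2013-11-01", "brand": "Nutrigum", "units": 50}]
web_rows = [{"day": "2013-11-01", "site_brand": "Nutrigum", "visits": 300}]

# Each source gets its own small adapter into the shared TCSV shape.
points = [
    Point(r["date"], {"Brand": r["brand"]}, "Units Sold", float(r["units"]))
    for r in sales_rows
] + [
    Point(r["day"], {"Brand": r["site_brand"]}, "Visits", float(r["visits"]))
    for r in web_rows
]


def entity_view(points: list[Point], context_type: str) -> dict[str, dict[str, float]]:
    """Populate a conceptual entity (e.g. a Brand) by summing each Signal per Context."""
    view: dict[str, dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for p in points:
        if context_type in p.context:
            view[p.context[context_type]][p.signal] += p.value
    return {entity: dict(signals) for entity, signals in view.items()}


print(entity_view(points, "Brand"))
# {'Nutrigum': {'Units Sold': 50.0, 'Visits': 300.0}}
```

If the web source later renamed `site_brand`, only its adapter would change; `entity_view` and any question built on it would be untouched.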

Data sources can supply new data into the TCSV ontological space and can also supply enrichments that describe new relationships within the data, increasing its connectedness and making questions about certain entities easier to answer. An enrichment can, for example, unify heterogeneous naming conventions between sources: Twitter data may contain the handle @NgUK, which may represent the brand “Nutrigum” in the UK. An enrichment can add the new Contexts of Brand and Country to that data.
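The @NgUK example can be expressed as a lookup from one known Context to additional Contexts. This is a minimal sketch under stated assumptions: the `ENRICHMENTS` table, the "Twitter Handle" Context Type, the "Mentions" signal and the `enrich` function are hypothetical, and a real enrichment would itself arrive as data from a source rather than being hard-coded.

```python
from __future__ import annotations

from typing import NamedTuple


class Point(NamedTuple):
    time: str
    context: dict[str, str]  # Context Type -> Context
    signal: str
    value: float


# Hypothetical enrichment: a known (Context Type, Context) pair maps to extra Contexts.
ENRICHMENTS: dict[tuple[str, str], dict[str, str]] = {
    ("Twitter Handle", "@NgUK"): {"Brand": "Nutrigum", "Country": "UK"},
}


def enrich(point: Point) -> Point:
    """Return a copy of the point with any matching enrichment Contexts added."""
    extra: dict[str, str] = {}
    for context_type, context in point.context.items():
        extra.update(ENRICHMENTS.get((context_type, context), {}))
    # Existing Context pairs win over enrichment pairs if both define the same type.
    return point._replace(context={**extra, **point.context})


tweet = Point("2013-11-07", {"Twitter Handle": "@NgUK"}, "Mentions", 12.0)
print(enrich(tweet).context)
# {'Brand': 'Nutrigum', 'Country': 'UK', 'Twitter Handle': '@NgUK'}
```

Once enriched, the tweet data can be queried by Brand or Country alongside any other source that already uses those Contexts.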

Why TCSV? – The business use case

Traditional data integration is hard, complex and slow. Even the most straightforward ETL workflow has at least 6 design points, all of which are caused by the ‘schema first’ approach required by highly structured relational databases. This highly structured approach leads to complex change management, which slows down access to new data and reduces overall business agility.
The root cause of this reduced agility is the inability to handle changes in content and question requirements quickly.

Taking a content agnostic approach decouples data acquisition, validation and storage from each other and from the questions asked of the data.
For a business, taking a content agnostic master ontology approach to data through TCSV increases agility and future-proofing, and decreases both time-to-value and the barriers to trying new data sources. It is an approach to data that no company today should do without.

We are proud of what we have achieved in creating TCSV and are keen to develop it further. We truly believe our ontology is the best way around the heterogeneity problem. If you have any questions or comments, please do not hesitate to get in touch with me directly at phil@datashaka.com.