Because there are so many options, ambiguity is common in the data warehousing field. More choices do not automatically lead to better or more efficient solutions. Sometimes things already go wrong with a basic term. In IT, primary data, also called first-line or hot data, is data that is recent or heavily used, while secondary data, you guessed it, is data that is older, less frequently accessed, or unstructured.
Within the world of research, the same term "primary data" has a different meaning: it is the raw data that a researcher collects first-hand for a study, for example through an interview, a survey, or observation.
You will understand that this difference in terminology does not help mutual understanding between business (research) and IT.
Why do we actually make backups? A backup is a copy of the data on a data carrier or in an application, kept so that the data can be restored if it becomes corrupted.
By making backups, we want to limit data loss to an acceptable and predetermined level. This level is called the RPO (recovery point objective) and indicates the maximum allowable amount of data loss, expressed as a period of time. So if you can afford to lose up to 4 hours of work, you should back up your data securely at least every 4 hours.
In short, the RPO determines which storage structure is used for backups. To make a good and secure backup, it is recommended to store copies on several different media and in several locations. These media can include flash, disk, tape, and/or cloud storage. In addition, there is the RTO (recovery time objective): the maximum amount of time that may pass before you are operational again after a problem or disaster.
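To make the two targets concrete, here is a minimal Python sketch that checks a backup schedule against an RPO and a recovery against an RTO. The function names, the schedule, and the timestamps are hypothetical and serve only as illustration; a real setup would read these values from its backup tooling.

```python
from datetime import datetime, timedelta


def worst_case_data_loss(backup_times, now):
    """Return the longest period not covered by a backup.

    Changes made just before the next backup (or just before `now`)
    would be lost if a failure occurred at that moment.
    """
    if not backup_times:
        raise ValueError("at least one backup is required")
    points = sorted(backup_times) + [now]
    gaps = [later - earlier for earlier, later in zip(points, points[1:])]
    return max(gaps)


def meets_rpo(backup_times, now, rpo):
    """True if no gap between backups exceeds the RPO."""
    return worst_case_data_loss(backup_times, now) <= rpo


def meets_rto(outage_start, service_restored, rto):
    """True if the service was operational again within the RTO."""
    return service_restored - outage_start <= rto


# Hypothetical schedule: a backup every 4 hours (00:00, 04:00, 08:00, 12:00).
start = datetime(2024, 1, 1, 0, 0)
backups = [start + timedelta(hours=4 * i) for i in range(4)]
now = datetime(2024, 1, 1, 15, 0)

print(meets_rpo(backups, now, rpo=timedelta(hours=4)))  # True
print(meets_rpo(backups, now, rpo=timedelta(hours=2)))  # False

# Hypothetical outage: down at 15:00, operational again at 16:30.
print(meets_rto(datetime(2024, 1, 1, 15, 0),
                datetime(2024, 1, 1, 16, 30),
                rto=timedelta(hours=2)))  # True
```

The sketch only measures time between backups; it says nothing about whether a backup can actually be restored, which has to be tested separately.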
Next, briefly, the storage of secondary data.
The name says it all: this is second-line data. It is data that no longer has to be kept in the primary environment because of its age, because it is rarely referenced but still has to be retained, or because it consists of non-essential items such as raw research data and algorithms. Keeping it makes reinventing the wheel unnecessary.
Secondary data environments in particular are growing fastest in relative terms. This is driven by the accelerating digitization of our society; the vast majority of this data is unstructured and partly machine-generated (M2M, phones, telecoms). Health care, which faces several growth factors at once because care is becoming more efficient but also more complex, is seeing its share of unstructured data grow: scans, medical images, and large numbers of often scripted treatment plans that are also shared among practitioners and supplemented. The same growth can be seen in the film and broadcast industry. Image formats have grown up to 8K, and because the pixel count scales with both width and height, an 8K frame (7680 × 4320) holds 16 times as many pixels as a Full HD frame (1920 × 1080), with raw data volumes rising accordingly. In short, we are talking about explosive growth of unstructured data.
It is therefore crucial to store primary and secondary data separately. With the arrival of various cloud technologies, data management systems, and object storage, efficiency, security, and manageability have increased dramatically.
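As an illustration of separating the two kinds of data, the following Python sketch assigns items to a primary or secondary tier based only on how recently they were accessed. The 30-day threshold, the file names, and the dates are assumptions made up for this example; real policies usually also weigh access frequency, structure, and retention rules.

```python
from datetime import datetime, timedelta

# Assumption: data touched within the last 30 days counts as primary (hot).
HOT_WINDOW = timedelta(days=30)


def classify(last_accessed, now, hot_window=HOT_WINDOW):
    """Return 'primary' for recently used data, otherwise 'secondary'."""
    return "primary" if now - last_accessed <= hot_window else "secondary"


# Hypothetical inventory: name -> time of last access.
now = datetime(2024, 6, 1)
files = {
    "orders_current.db": datetime(2024, 5, 30),
    "mri_scan_2019.dcm": datetime(2019, 3, 12),
    "survey_raw_2021.csv": datetime(2021, 11, 2),
}

tiers = {name: classify(ts, now) for name, ts in files.items()}
for name, tier in tiers.items():
    print(f"{name}: {tier}")
```

With these sample dates, only `orders_current.db` lands in the primary tier; the other two would be candidates for cheaper secondary storage.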
The use of metadata in particular offers great opportunities: findability and scientific usability are far better than with traditional storage systems.
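The idea behind metadata-driven findability can be sketched in a few lines of Python. This is not the API of any particular object store; the `StoredObject` class, the `find` helper, and the sample catalog are hypothetical and only show how descriptive key-value pairs let you locate objects without knowing their path.

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    """An object key plus free-form descriptive metadata."""
    key: str
    metadata: dict = field(default_factory=dict)


def find(objects, **criteria):
    """Return the objects whose metadata matches every given key-value pair."""
    return [
        obj for obj in objects
        if all(obj.metadata.get(k) == v for k, v in criteria.items())
    ]


# Hypothetical catalog of unstructured data.
catalog = [
    StoredObject("scans/0001.dcm", {"type": "mri", "department": "radiology", "year": 2022}),
    StoredObject("scans/0002.dcm", {"type": "ct", "department": "radiology", "year": 2023}),
    StoredObject("video/reel_8k.mov", {"type": "video", "resolution": "8K", "year": 2023}),
]

hits = find(catalog, department="radiology", year=2023)
print([obj.key for obj in hits])  # ['scans/0002.dcm']
```

In a traditional file system the same question would require either a strict folder convention or opening each file; here the description travels with the object.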
NoSQL now covers a wide range of database management systems that differ significantly from the classic relational database management system. These systems do not always rely on static database schemas, and most scale horizontally rather than vertically, as is typical of SQL databases.
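Both properties, flexible schemas and horizontal scaling, can be illustrated with a small Python sketch. It uses plain dictionaries as stand-ins for documents and a simple hash-modulo rule to spread them over several nodes; the sample documents, the function names, and the shard count are assumptions for illustration and do not reflect how any specific NoSQL product works internally.

```python
import hashlib

# Documents in one collection do not need to share the same fields.
documents = [
    {"_id": "p1", "kind": "patient", "name": "A. Jansen", "scans": ["0001.dcm"]},
    {"_id": "v1", "kind": "video", "resolution": "8K", "duration_s": 5400},
    {"_id": "s1", "kind": "sensor", "readings": [20.1, 20.4]},
]


def shard_for(doc_id, shard_count):
    """Map a document id to a shard number using a stable hash."""
    digest = hashlib.sha256(doc_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


def distribute(docs, shard_count):
    """Spread documents over `shard_count` shards (nodes)."""
    shards = {i: [] for i in range(shard_count)}
    for doc in docs:
        shards[shard_for(doc["_id"], shard_count)].append(doc)
    return shards


shards = distribute(documents, shard_count=3)
for shard_id, docs in shards.items():
    print(shard_id, [doc["_id"] for doc in docs])
```

Scaling horizontally then means adding shards rather than buying a bigger server. Note that with a plain modulo rule, changing the shard count moves most documents to a different shard, which is why production systems often use more refined schemes such as consistent hashing.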
If you want to read more about unstructured data, see this article: https://dutchitchannel.nl/707415/alle-data-in-een-platform.html
By: Harold Koenders, data warehousing expert