Google Datastore Namespaces

Google Datastore is one of Google's publicly available, scalable NoSQL solutions. The other is BigTable, which you can run yourself on a swarm of Compute Engine virtual machines. The Datastore is built on top of BigTable. So with BigTable you get the raw performance, but you miss some goodies, like transactions and the SQL-like query language available with the Datastore.

The Datastore terminology got me confused. They introduce "Dataset" and "Namespace", which are not immediately clear and caused me some issues until I figured them out. That is why I started this post. The welcome page of the Datastore project defines the terminology as follows:
Concept                         Datastore   Relational database
Category of object              Kind        Table
One object                      Entity      Row
Individual data for an object   Property    Field
Unique ID for an object         Key         Primary key
But what the Dataset means is not defined in the documentation. Through trial and error I found out that the Dataset is the same as your Project ID, which can be found on the Home page of the Google Developers Console:
In this case the project ID is: bionic-mercury-89314
So in theory, by using different Datasets you can access the Datastore data of different projects, provided you have the appropriate permissions. I am at a loss why the "Dataset" concept is introduced at all: it is exactly the same as the Project and only adds confusion. All Datastore API endpoints require you to specify the Dataset, which, remember, is your Project ID.
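To see where the Dataset shows up in practice, here is a sketch of building a raw REST call URL. The v1beta2 URL shape below is what the API used at the time of writing; treat the exact version segment as an assumption and check the current docs:

```javascript
// Build a Datastore REST endpoint URL for a given Dataset and method.
// Note: the "datasets/{datasetId}" path segment takes your Project ID.
function datastoreUrl(datasetId, method) {
  return 'https://www.googleapis.com/datastore/v1beta2/datasets/' +
         datasetId + '/' + method;
}

console.log(datastoreUrl('bionic-mercury-89314', 'runQuery'));
```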

The "Namespace" is actually a cool way to logically partition your unstructured data. Every Dataset (i.e. project) has a default empty Namespace. So the relationship is as follows:
Every Project corresponds to exactly one Dataset, which holds one or more Namespaces.
Each Namespace has one or more Kinds (similar to tables in an RDBMS).
Each Kind is a collection of Entities, and each Entity can have the same or a different set of properties.

Under the covers the Dataset, the Namespace, and the Key are concatenated into the BigTable keys! All Google Cloud applications share one big BigTable for the Entities and several more for the indexes. That is, when you create your new Google Project you do not get a separate BigTable installed for your project; rather, you can access only the data for your Dataset, i.e. only for your Project.

The keys of all your entities have as prefix:
  1. The Dataset or, as already said many times, the Project ID.

    In this way you have your own little portion of the huge BigTable, where your data is stored close together. You, and only you, have access to this portion. So even though all Google customers store their data in one BigTable, there is no messing around with other people's data.

    Since BigTable stores the data sorted by key, and since all your entities start with the Project ID prefix, your data is "dense": stored closely together, one entity after another, on consecutive BigTable Tablet servers.
  2. The optional Namespace.

    Useful for keeping apart data from different sources, or for development/testing/staging/production environments, or however else you wish to partition your data.
  3. The optional parents and the key itself.
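The prefixing described above can be pictured with a small sketch. The real BigTable row-key encoding is internal to Google, so the separator and layout below are purely illustrative:

```javascript
// Illustrative only: compose a conceptual BigTable row key from the
// Dataset, the (possibly empty) Namespace, and the key path
// (optional parents first, the entity's own [kind, id] pair last).
function conceptualRowKey(dataset, namespace, path) {
  return [dataset, namespace || '']
    .concat(path.map(function (pair) { return pair.join(':'); }))
    .join('/');
}

var key = conceptualRowKey('bionic-mercury-89314', 'Staging',
                           [['BlogPost', 42]]);
console.log(key); // bionic-mercury-89314/Staging/BlogPost:42
```

Every key of a project shares the same leading Dataset component, which is why a project's entities end up sorted next to each other.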

var gcloud = require('gcloud');
// Select the Dataset based on the Project ID;
// the keyFilename is used for authorisation.
var dataset = gcloud.datastore.dataset({
  projectId: 'Your-Project',
  keyFilename: '/path/to/keyfile.json'
});

var blogPostData = {
  title: 'How to make the perfect homemade pasta',
  author: 'Andrew Chilton',
  isDraft: true
};

var blogPostKey = dataset.key('BlogPost');

dataset.save({
  key: blogPostKey,
  data: blogPostData
}, function (err) {
  // ... handle the error
});

The example creates an Entity about a blog post in the "BlogPost" Kind and the default (i.e. the empty) Namespace. If we want to insert the same Entity into a "Staging" Namespace, then:
var gcloud = require('gcloud');
var dataset = gcloud.datastore.dataset({
  projectId: 'Your-Project',
  keyFilename: '/path/to/keyfile.json',
  namespace: 'Staging'
});
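Since the Namespace is just an option on the dataset handle, a handy pattern is to derive it from the deployment environment, so development, staging, and production data never mix inside one Dataset. A minimal sketch (the function name and the environment map are my own, not part of gcloud):

```javascript
// Map a deployment environment to a Datastore Namespace.
// The default Namespace is the empty string.
function namespaceForEnv(env) {
  var map = { production: '', staging: 'Staging', development: 'Dev' };
  return map.hasOwnProperty(env) ? map[env] : 'Dev';
}

// The options object then goes to gcloud.datastore.dataset(options),
// exactly as in the snippets above.
var options = {
  projectId: 'Your-Project',
  keyFilename: '/path/to/keyfile.json',
  namespace: namespaceForEnv(process.env.NODE_ENV || 'development')
};

console.log(options.namespace);
```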


OK, good that we cleared up the Dataset/Namespace concepts. Any questions? Let me know.
