In this post we will not talk about the Cassandra data model. There are many blogs on it, and one of the best descriptions is by Animesh Kumar. Oh yup, Animesh sir is my college (ISM) senior.
Poor approach:
Serialize the object into JSON.
Store this JSON as a string in Cassandra.
When you need the object back, read the JSON string and deserialize it.
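The steps above can be sketched in Python roughly like this. It is only a minimal sketch: a plain dict stands in for Cassandra, and the names `store`, `save_as_blob`, `load_from_blob` and the row key `user:nijju` are made up for illustration. They are not part of any Cassandra driver.

```python
import json

# Hypothetical stand-in for Cassandra: row key -> {column name -> value}.
store = {}


def save_as_blob(row_key, obj):
    """Serialize the whole object to JSON and keep it in a single column."""
    store[row_key] = {"data": json.dumps(obj)}


def load_from_blob(row_key):
    """Read the JSON string back and deserialize it into an object."""
    return json.loads(store[row_key]["data"])


user = {"name": "nijju", "country": "IN", "email": "niraj.nijju@gmail.com"}
save_as_blob("user:nijju", user)
restored = load_from_blob("user:nijju")
```

Whatever the object looks like, the store only ever sees one opaque string per object.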
The above approach is fine and works well, but it has its own disadvantages. Say we have an object (as JSON):
{
  "name": "nijju",
  "country": "IN",
  "email": "niraj.nijju@gmail.com"
}
and we store it in a single column in Cassandra (as a string).
When we need to update one field, say country, we have to read the whole JSON string, deserialize it, change the object's attribute, serialize the resulting object back into JSON, and store that JSON at the same key.
In Cassandra a write is generally much faster than a read, so doing a read before every write is not a good approach.
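To make the cost of that update visible, here is a small sketch that counts operations. `FakeColumnStore` is a hypothetical in-memory stand-in for Cassandra (not a real driver class), and `update_field_in_blob` is just an illustrative helper name.

```python
import json


class FakeColumnStore:
    """Hypothetical in-memory stand-in for Cassandra that counts reads and writes."""

    def __init__(self):
        self.rows = {}
        self.reads = 0
        self.writes = 0

    def write(self, row_key, column, value):
        self.rows.setdefault(row_key, {})[column] = value
        self.writes += 1

    def read(self, row_key, column):
        self.reads += 1
        return self.rows[row_key][column]


def update_field_in_blob(store, row_key, field, value):
    """Change one field of an object stored as a JSON string."""
    obj = json.loads(store.read(row_key, "data"))   # read + deserialize everything
    obj[field] = value                              # change a single attribute
    store.write(row_key, "data", json.dumps(obj))   # serialize + rewrite everything


store = FakeColumnStore()
user = {"name": "nijju", "country": "IN", "email": "niraj.nijju@gmail.com"}
store.write("user:nijju", "data", json.dumps(user))   # initial write

update_field_in_blob(store, "user:nijju", "country", "US")
print(store.reads, store.writes)  # prints "1 2": the update alone cost 1 read and 1 write
```

The read is the part we would like to avoid: changing one small field still pulls the whole string back first.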
Even if your application's data is mostly write-once, reading a single field still forces you to read the whole object.
In conclusion, we can get better performance by storing each field in a separate column. To update a field, we only need to rewrite the corresponding column.
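Here is a rough sketch of the column-per-field idea, again with a hypothetical in-memory `FakeColumnStore` that counts operations instead of a real Cassandra client. The helper name `save_as_columns` is also an assumption for illustration.

```python
class FakeColumnStore:
    """Hypothetical in-memory stand-in for Cassandra that counts reads and writes."""

    def __init__(self):
        self.rows = {}
        self.reads = 0
        self.writes = 0

    def write(self, row_key, column, value):
        self.rows.setdefault(row_key, {})[column] = value
        self.writes += 1

    def read(self, row_key, column):
        self.reads += 1
        return self.rows[row_key][column]


def save_as_columns(store, row_key, obj):
    """Store every field of the object in its own column of the same row."""
    for field, value in obj.items():
        store.write(row_key, field, value)


store = FakeColumnStore()
user = {"name": "nijju", "country": "IN", "email": "niraj.nijju@gmail.com"}
save_as_columns(store, "user:nijju", user)        # three column writes

reads_before_update = store.reads
store.write("user:nijju", "country", "US")        # update: one write, no read
reads_during_update = store.reads - reads_before_update

email = store.read("user:nijju", "email")         # read only the field we need
```

Here the update touches just the `country` column and needs no read first, and fetching the email does not drag the other fields along.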
It totally depends on our read behavior: rows end up on effectively random disks, so reading from many different rows at the same time will be slow.
And if rows grow too long, all of their columns are still stored together on the same node, so we may run into scalability challenges.
Let me know your thoughts.