Posts

Showing posts from 2015

Document Modeling Basics

Developers who are new to NoSQL often ask how to get started with document modeling. This article does not aim to answer every document modeling question. It is more of a starting point.

Flexible Schema

I am personally not a big fan of the term 'schema free'. My opinion is that if we talk about structured data, then we also talk about how to structure that data. (BTW: Couchbase also allows you to store unstructured data as binaries. Semi-structured data is supported as well, e.g. by embedding base64 encoded strings into JSON documents.) Couchbase Server does not enforce a specific schema on the database side, which gives you more flexibility. Some documents might have a specific property, others might not. You don't have to specify upfront that a property might be there and then set it to NULL if it is not. So what you have is a flexible schema (or better, data model), whereby the application is implicitly prov
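
As a small illustration of the flexible schema idea in the excerpt above, the following sketch reads an optional property from two JSON documents of the same kind. The document contents and the helper name `email_of` are made up for this example; plain Python dictionaries stand in for documents fetched from a bucket.

```python
import json

# Two hypothetical customer documents; only the first one has an "email" property.
customer_a = json.loads('{"type": "customer", "name": "Alice", "email": "alice@example.com"}')
customer_b = json.loads('{"type": "customer", "name": "Bob"}')


def email_of(doc):
    """Return the optional "email" property, or None if the document lacks it."""
    return doc.get("email")


for doc in (customer_a, customer_b):
    print(doc["name"], email_of(doc))
```

The point is that the application, not the database, decides how to treat a missing property.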

Using a Key-Value Store for Full Text Indexing and Search

Couchbase Server is a multi-purpose database system. One of its purposes is to serve as a simple key-value store. A key-value store allows you to store and retrieve any value by its key. Such a value can be a JSON document (Couchbase allows you to index and query such JSON documents, so another purpose is that of a document database), a small binary, or a full text index entry. This article explains why such a key-value store can also be used for full text indexing. Let's first explain how full text indexing works in general. A full text index is a so-called inverted index. The table below shows how the following sentences would be indexed: 'Tim is sitting next to Bob' and 'Jim is sitting next to Bob'. The word 'Tim' exists only in the first sentence, and there is exactly one occurrence of it.

Term | Count | Reference
------------------------
Tim | 1 | #1
is | 2 | #1, #2
sitting | 2 | #1, #2
next | 2
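
The inverted index described above can be sketched in a few lines. This is only an illustration, not the implementation from the article: an in-memory dictionary stands in for the key-value store (term as key, index entry as value), tokenization is a plain whitespace split, and the function name `build_inverted_index` is made up for this example.

```python
from collections import defaultdict


def build_inverted_index(sentences):
    """Map each term to its occurrence count and the ids of the sentences containing it."""
    index = defaultdict(lambda: {"count": 0, "refs": []})
    for doc_id, sentence in enumerate(sentences, start=1):
        for term in sentence.split():
            entry = index[term]
            entry["count"] += 1
            # Record each sentence id only once per term.
            if doc_id not in entry["refs"]:
                entry["refs"].append(doc_id)
    return dict(index)


sentences = ["Tim is sitting next to Bob", "Jim is sitting next to Bob"]
index = build_inverted_index(sentences)
print(index["Tim"])  # {'count': 1, 'refs': [1]}
print(index["is"])   # {'count': 2, 'refs': [1, 2]}
```

In a real key-value store each term would be one key, and looking up a search term becomes a single get by key.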

Understanding Couchbase's Document Expiration

I recently had to investigate Couchbase's T(ime)T(o)L(ive) feature a bit because I was asked how to react externally to document expirations via DCP. The TTL feature allows you to specify, when creating a document, when it should be deleted automatically. Here are two example use cases:

Caching: The items in a cache should only be valid for a specific period of time. The TTL feature allows you to invalidate items in the cache automatically. Let's assume that you are a retailer caching some product details. The prices of some products might change daily (or even more often), so it makes sense to ensure that the product price gets updated automatically once a day. If the item has expired in the cache, it is fetched again from the original source system. The newly fetched item then has the updated price information.

Session management: Couchbase is often used as a session cache/store. A session is simply realized as a key-value pair whereby the k
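
The caching use case above can be sketched as follows. This is a minimal in-memory stand-in, not the Couchbase API: the class `TTLCache`, the helper `get_product`, and the `fetch_from_source` callback are all hypothetical names, and expiry is checked lazily on read rather than by the server.

```python
import time


class TTLCache:
    """Tiny in-memory cache whose items expire after a given number of seconds."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock   # injectable so that expiry can be simulated
        self._items = {}      # key -> (value, expires_at)

    def set(self, key, value, ttl_seconds):
        self._items[key] = (value, self._clock() + ttl_seconds)

    def get(self, key):
        item = self._items.get(key)
        if item is None:
            return None
        value, expires_at = item
        if self._clock() >= expires_at:
            # The item has expired: drop it and report a miss.
            del self._items[key]
            return None
        return value


def get_product(cache, product_id, fetch_from_source, ttl_seconds=86400):
    """Return the cached product, refetching from the source system after expiry."""
    product = cache.get(product_id)
    if product is None:
        product = fetch_from_source(product_id)
        cache.set(product_id, product, ttl_seconds)
    return product
```

With a TTL of one day (86400 seconds), a price change in the source system becomes visible at the latest one day after the item was cached.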

Document Versioning in Couchbase

Couchbase Server does not support document revisions out of the box, but it is quite simple to implement them on the application side. This article describes ways to do this. The following topics are covered: handling concurrent access, relevant attributes, one document per version, embedded revision tree, and combined approaches.

Handling concurrent access

In the context of versioning, multiple users/threads create new versions (possibly at nearly the same time). So I think it makes sense to shed some light on concurrent access before we talk about versioning approaches. You will most probably need to combine concurrency handling with versioning. Couchbase supports 2 ways of handling concurrent access to the same document.

C(ompare) A(nd) S(wap): This is the optimistic approach. Each document has a built-in property, the CAS value. The CAS value changes as soon as somebody updates the document. So the idea is to implement something like the following on th
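
The optimistic CAS approach mentioned above is commonly implemented as a read, modify, replace loop that retries when the CAS value has changed in the meantime. The sketch below is not the code from the article and does not use the Couchbase SDK: `InMemoryStore`, `CasMismatch`, and `update_with_retry` are hypothetical names, and the store simply increments a counter as its CAS value.

```python
class CasMismatch(Exception):
    """Raised when a document was changed by someone else since it was read."""


class InMemoryStore:
    """Stand-in for a document store that tracks a CAS value per document."""

    def __init__(self):
        self._docs = {}  # key -> (value, cas)

    def insert(self, key, value):
        self._docs[key] = (value, 1)

    def get(self, key):
        return self._docs[key]  # (value, cas)

    def replace(self, key, value, cas):
        _, current_cas = self._docs[key]
        if cas != current_cas:
            raise CasMismatch(key)
        # Every successful update changes the CAS value.
        self._docs[key] = (value, current_cas + 1)


def update_with_retry(store, key, modify, max_attempts=5):
    """Apply modify() to the document, retrying if a concurrent update is detected."""
    for _ in range(max_attempts):
        value, cas = store.get(key)
        try:
            store.replace(key, modify(value), cas)
            return True
        except CasMismatch:
            continue  # someone else was faster: re-read and try again
    return False
```

A caller would pass a function that derives the new document from the current one, for example one that increments a version counter.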

How many Buckets?

In Couchbase the equivalent of a database is called a bucket. A bucket is basically a data container that is split into 1024 partitions, the so-called vBuckets. Each partition is stored as a Couchstore file on disk, and the partitioning is also reflected in memory. Each configured replica adds another 1024 replica vBuckets. So if you have a bucket with 1 replica configured, then this bucket has 1024 active vBuckets and 1024 replica vBuckets. The vBuckets are spread across the cluster nodes. So in a 2-node cluster each node would have 512 active vBuckets. In an n-node cluster each node would by default have roughly 1024/n active vBuckets. A frequently asked question is 'How many Buckets should I create?'. There are multiple aspects to take into account here: the max. possible number of Buckets, logical data separation, load separation, physical data separation, and multi-tenancy.

Max. possible number of Buckets

Before we start talking about why it might make sense to use mult
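
The vBucket arithmetic above can be checked with a short sketch. It assumes that active vBuckets are spread as evenly as possible across the nodes, which is how the 'roughly 1024/n' statement is read here; the function names are made up for this example.

```python
NUM_VBUCKETS = 1024  # partitions per bucket, as stated above


def vbucket_totals(replicas):
    """Return (active, replica) vBucket totals for one bucket."""
    return NUM_VBUCKETS, NUM_VBUCKETS * replicas


def active_per_node(nodes):
    """Spread the active vBuckets as evenly as possible across the nodes."""
    base, remainder = divmod(NUM_VBUCKETS, nodes)
    return [base + 1 if i < remainder else base for i in range(nodes)]


print(vbucket_totals(1))   # (1024, 1024)
print(active_per_node(2))  # [512, 512]
print(active_per_node(3))  # [342, 341, 341]
```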

Using the Rexster 2.7 Graph Server with Couchbase Server

One of my side projects at Couchbase is a Graph API on top of it. This article explains how you can use the Rexster Graph Server with this implementation of the Blueprints 2.x API. A forked version of Rexster is available here: https://github.com/dmaier-couchbase/rexster . The modifications are quite simple. I disabled the enforcer plug-in in the main pom.xml file. The 'rexster-server/pom.xml' file contains additional dependencies.

<dependency>
  <groupId>com.couchbase.graph</groupId>
  <artifactId>couchbase-blueprints</artifactId>
  <version>1.0-SNAPSHOT</version>
</dependency>
<dependency>
  <groupId>com.couchbase.client</groupId>
  <artifactId>java-client</artifactId>
  <version>2.1.3</version>
</dependency>

The file 'rexster-server/config/rexster.xml' was also edited in order to use the Couchbase instance.

<graph> <graph-enabled>