There are mainly three types of data: (1) binary (image, video, audio, ...), (2) text (articles, ...), and (3) structured or semi-structured records (addresses, ...). Generally, types 1 and 2 are stored in a (distributed) file system and are queried or processed through text/binary search and batch processing. Type 3 is stored in a (distributed) RDBMS or NoSQL data store and is queried or processed through a specific query language or expression and online APIs. Structured records are much easier to process than binary or text data. However, how to store structured records efficiently is a common problem. Here are a couple of general principles we should adopt:

  • If possible, always define your data with a good model/structure, whether it is for SQL or NoSQL.
  • Theoretically, an RDBMS is much more efficient than a NoSQL data store at processing and analyzing structured data with SQL.
  • Don't distribute your data until it exceeds billions of records or terabytes in size, or can no longer be handled by a single machine. There is little reason to distribute tens of GB or millions of records.

  • Refer to the comments about SQL/NoSQL from Adam D'Angelo at Quora.
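
As a rough illustration of the first three principles, here is a minimal sketch that models address records explicitly and analyzes them with plain SQL on a single machine. It uses Python's built-in sqlite3 module only as a stand-in for an RDBMS. The Address fields, the table and index names, and the helpers build_store and count_by_city are all hypothetical, chosen for this example rather than taken from any particular system.

```python
import sqlite3
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class Address:
    """Hypothetical record model: every field is named and typed up front."""
    street: str
    city: str
    postal_code: str
    country: str


def build_store(addresses):
    """Create an in-memory, single-machine store and load the records."""
    conn = sqlite3.connect(":memory:")
    # The table schema mirrors the Address model above.
    conn.execute(
        """
        CREATE TABLE address (
            id          INTEGER PRIMARY KEY,
            street      TEXT NOT NULL,
            city        TEXT NOT NULL,
            postal_code TEXT NOT NULL,
            country     TEXT NOT NULL
        )
        """
    )
    # An index on a frequently filtered or grouped column.
    conn.execute("CREATE INDEX idx_address_city ON address (city)")
    conn.executemany(
        "INSERT INTO address (street, city, postal_code, country) "
        "VALUES (?, ?, ?, ?)",
        [astuple(a) for a in addresses],
    )
    conn.commit()
    return conn


def count_by_city(conn):
    """Aggregate with SQL instead of scanning records in application code."""
    rows = conn.execute(
        "SELECT city, COUNT(*) FROM address GROUP BY city ORDER BY city"
    )
    return dict(rows.fetchall())


if __name__ == "__main__":
    store = build_store(
        [
            Address("1 Main St", "Springfield", "12345", "US"),
            Address("2 Oak Ave", "Springfield", "12345", "US"),
            Address("3 Rue A", "Paris", "75001", "FR"),
        ]
    )
    print(count_by_city(store))
```

Because the structure is defined once in the model and the schema, the aggregation is a single declarative query. The same model could later be mapped onto a distributed store if the data ever outgrows one machine.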