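Q5. If we have more threads than partitions in a Kafka consumer, how can we model it efficiently?
Ans. Within a single consumer group, each partition is assigned to at most one consumer, so any threads beyond the partition count remain idle. To keep them busy we would have to use multiple consumer groups. Note that every group receives all messages of the topic independently, so the groups must coordinate to avoid processing the same message twice, and a more sophisticated algorithm is required if the order of consumption has to be preserved.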
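The idle-thread effect described in the answer above can be illustrated with a small simulation. This is only a sketch: the class `PartitionAssignmentSketch`, its methods, and the thread names are hypothetical, and the round-robin rule is a simplification rather than Kafka's actual assignor. It assumes one group, 3 partitions, and 5 consumer threads.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified illustration only: this is NOT Kafka's real partition assignor.
class PartitionAssignmentSketch {

    // Distributes partition ids 0..partitionCount-1 round-robin over the
    // consumers of ONE group, so each partition gets exactly one owner.
    static Map<String, List<Integer>> assign(int partitionCount, List<String> consumers) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String consumer : consumers) {
            assignment.put(consumer, new ArrayList<>());
        }
        if (consumers.isEmpty()) {
            return assignment;
        }
        for (int p = 0; p < partitionCount; p++) {
            String owner = consumers.get(p % consumers.size());
            assignment.get(owner).add(p);
        }
        return assignment;
    }

    // Counts consumers that received no partition at all.
    static long idleConsumers(Map<String, List<Integer>> assignment) {
        return assignment.values().stream().filter(List::isEmpty).count();
    }

    public static void main(String[] args) {
        List<String> threads = List.of("t0", "t1", "t2", "t3", "t4");
        Map<String, List<Integer>> result = assign(3, threads);
        System.out.println(result);
        System.out.println("idle: " + idleConsumers(result));
    }
}
```

With 3 partitions and 5 threads, two of the threads end up with an empty partition list, which is why adding threads to the same group beyond the partition count does not increase throughput.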
Ans. Apache Cassandra is a free, open-source, distributed NoSQL database management system designed to handle large amounts of data while providing high availability with no single point of failure.
Q7. What is the difference between a Java SE Map and the Apache Commons MultiMap? How can we implement functionality similar to a multimap using a Java SE Map?
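As a starting point for this question: a `java.util.Map` associates each key with at most one value, so a second `put` for the same key replaces the first, whereas a multimap keeps several values per key. One common way to approximate that with Java SE alone is a map whose values are lists. The class `SimpleMultiMap` below is a hypothetical helper written for illustration, not part of Java SE or Apache Commons, and it covers only `put` and `get`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, not part of Java SE or Apache Commons.
class SimpleMultiMap<K, V> {
    private final Map<K, List<V>> backing = new HashMap<>();

    // Appends the value to the key's list, creating the list on first use.
    void put(K key, V value) {
        backing.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    // Returns a read-only view of the key's values, or an empty list if none.
    List<V> get(K key) {
        return Collections.unmodifiableList(
                backing.getOrDefault(key, Collections.emptyList()));
    }

    public static void main(String[] args) {
        SimpleMultiMap<String, String> topics = new SimpleMultiMap<>();
        topics.put("java", "collections");
        topics.put("java", "streams");
        topics.put("kafka", "consumers");
        System.out.println(topics.get("java"));
        System.out.println(topics.get("hadoop"));
    }
}
```

Using `computeIfAbsent` avoids the explicit "check for null, then create the list" step; swapping `ArrayList` for `HashSet` would give set semantics if duplicate values per key are not wanted.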
Hadoop 1.x
It supports only the MapReduce (MR) processing model.
It has limited scalability: up to about 4,000 nodes per cluster.
It has a single NameNode to manage the entire namespace.
It has a single point of failure (SPOF).
It works on the concept of slots; a slot can run either a Map task or a Reduce task.
MR has to do both data processing and cluster resource management.
Hadoop 2.x
It supports MR as well as other distributed computing models such as Spark and Hama.
It has better scalability: up to about 10,000 nodes per cluster.
It works on the concept of containers; a container can run generic tasks.
It has multiple NameNode servers that manage multiple namespaces (HDFS Federation).
YARN does cluster resource management, while processing is done by separate processing models.