More than 3000 questions in repository. There are more than 900 unanswered questions. Click here and help us by providing the answer. Have a video suggestion. Click Correct / Improve and please let us know.
Ans. Kafka is a distributed, partitioned, replicated commit log service. It provides the functionality of a messaging system through messages being written to logs.
Help us improve. Please let us know the company, where you were asked this question :
Q6. Have you used Kafka in your project? If yes, for what?
Ans. We used Kafka as a replacement for a JMS message queue to get better throughput. Because the order of message consumption didn't matter to us, a simple multithreaded Java client was enough.
a. A Sequence File contains a binary encoding of an arbitrary number of homogeneous Writable objects. b. A Sequence File contains a binary encoding of an arbitrary number of key-value pairs. Each key must be of the same type. Each value must be of the same type. c. A Sequence File contains a binary encoding of an arbitrary number of heterogeneous Writable objects. d. A Sequence File contains a binary encoding of an arbitrary number of WritableComparable objects, in sorted order.
Ans. A Sequence File contains a binary encoding of an arbitrary number of key-value pairs. Each key must be of the same type. Each value must be of the same type.
a. One key and one value b. Multiple keys and multiple associated values c. Multiple keys and one associated value with each d. One key and associated values.
Ans. One key and associated values.
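The question text for this answer is not shown here. Assuming it refers to the input of a reduce call, the idea can be sketched in plain Python. This is a toy model, not the Hadoop API: `run_reduce_phase` and `reduce_word_count` are hypothetical names, and the grouping step only imitates what the framework's shuffle does.

```python
from collections import defaultdict


def reduce_word_count(key, values):
    """One reduce call: a single key plus all values emitted for that key."""
    return key, sum(values)


def run_reduce_phase(mapped_pairs):
    # Group the (key, value) pairs by key, imitating the shuffle step.
    grouped = defaultdict(list)
    for key, value in mapped_pairs:
        grouped[key].append(value)
    # Call the reduce function once per distinct key.
    return dict(reduce_word_count(k, vs) for k, vs in sorted(grouped.items()))


mapped = [("kafka", 1), ("hadoop", 1), ("kafka", 1)]
result = run_reduce_phase(mapped)
print(result)
```

Each call to `reduce_word_count` sees exactly one key together with the list of values collected for it, which matches option d.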
Q10. Which of the following is the implementation language of the MapReduce framework?
a. Big Data b. Hadoop c. Java d. C++
Ans. Java
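Q11. Can we have multiple threads consuming the message stream from a single partition?
Ans. Yes, by having multiple consumer groups. Within a single consumer group, a partition is assigned to at most one consumer, so threads that read the same partition must belong to different groups. Each group then receives every message in that partition.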
Q13. If we have more threads than partitions in a Kafka consumer, how can we model it efficiently?
Ans. We will have to use multiple consumer groups in that case, because with a single consumer group the extra threads remain idle. If we also have to ensure the order of consumption, a more sophisticated algorithm could be required across the groups.
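The idle-thread effect can be illustrated with a small simulation. This is a simplified round-robin model written for illustration, not Kafka's actual assignor; `assign_partitions` and the thread names are hypothetical.

```python
def assign_partitions(partitions, consumers):
    """Toy round-robin assignment inside ONE consumer group.

    Every partition gets exactly one owner within the group.
    """
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


partitions = [0, 1, 2]
threads = ["t1", "t2", "t3", "t4", "t5"]

# One group with five threads: only three of them can own a partition.
single_group = assign_partitions(partitions, threads)
idle = [t for t, owned in single_group.items() if not owned]
print("idle in a single group:", idle)

# Two groups: every thread owns at least one partition.
group_a = assign_partitions(partitions, threads[:3])
group_b = assign_partitions(partitions, threads[3:])
print("group A:", group_a)
print("group B:", group_b)
```

In this model both groups cover all three partitions, so every message is delivered once per group. That duplication is why extra coordination may be needed when the processing must happen once or in order.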
Ans. Hadoop is an open-source framework, written in Java and maintained by the Apache Software Foundation. It is used to write applications that process vast amounts of data. Processing happens in parallel on large clusters that can contain thousands of computers, and it is done in a reliable and fault-tolerant manner.
Ans. One common use is predictive analytics on large volumes of current or past data. For example, using recent medical data (diagnoses and procedures), one can identify patterns of diseases, or the procedures that eventually have to be applied after a certain diagnosis. This analysis might help in predicting the diseases a patient is likely to develop.
Another use could be to identify the future spending patterns of a population by analyzing its past and current habits.
Ans. Combiners are used to increase the efficiency of a MapReduce program. A combiner aggregates the intermediate output of each individual mapper locally, which reduces the amount of data that has to be transferred across the network to the reducers.
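A minimal word-count sketch in plain Python shows the effect. It does not use the Hadoop API; `map_words`, `combine` and `reduce_counts` are hypothetical helpers, and each input string stands in for the data seen by one mapper.

```python
from collections import Counter


def map_words(line):
    """Map step: emit (word, 1) for every word in the line."""
    return [(word, 1) for word in line.split()]


def combine(pairs):
    """Combiner: aggregate one mapper's output locally before the shuffle."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return list(totals.items())


def reduce_counts(all_pairs):
    """Reduce step: sum the counts per word across all mappers."""
    totals = Counter()
    for word, count in all_pairs:
        totals[word] += count
    return dict(totals)


mapper_inputs = ["a b a a", "b b a"]
raw = [map_words(line) for line in mapper_inputs]
combined = [combine(pairs) for pairs in raw]

# Number of records that would have to travel to the reducers.
records_without_combiner = sum(len(pairs) for pairs in raw)
records_with_combiner = sum(len(pairs) for pairs in combined)

final = reduce_counts(pair for pairs in combined for pair in pairs)
print(records_without_combiner, records_with_combiner, final)
```

The final counts are the same with or without the combiner, because summation is associative and commutative. A combiner is only safe for operations with that property.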
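Ans. Load the file in chunks and then process each chunk. If we need to do analytics, we can compute partial results for each chunk and then combine the partial results in a second pass.
For example, suppose we need to average all the marks in the file. We can split the file into 5 chunks and compute the average for each chunk, then combine the 5 chunk averages into the final average. If the chunks differ in size, each chunk average must be weighted by the number of marks in that chunk (equivalently, keep the sum and the count per chunk), because a plain average of the averages is only correct for equal-sized chunks.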
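The chunked average can be sketched as follows. This version carries a running sum and count per chunk instead of averaging the chunk averages, so the result stays exact even when the chunks differ in size. The list `marks` stands in for a file that would be read piece by piece, and the function names are hypothetical.

```python
def chunked(values, chunk_size):
    """Yield consecutive slices of at most chunk_size items."""
    for start in range(0, len(values), chunk_size):
        yield values[start:start + chunk_size]


def average_in_chunks(values, chunk_size):
    """Average computed from per-chunk partial results (sum and count)."""
    total, count = 0, 0
    for chunk in chunked(values, chunk_size):
        total += sum(chunk)   # partial sum for this chunk
        count += len(chunk)   # number of items in this chunk
    return total / count if count else 0.0


marks = [70, 80, 90, 60, 100, 50, 40]
overall = average_in_chunks(marks, 3)

# Plain average of the chunk averages, shown for comparison.
chunk_averages = [sum(c) / len(c) for c in chunked(marks, 3)]
naive = sum(chunk_averages) / len(chunk_averages)
print(overall, naive)
```

With a chunk size of 3 the last chunk holds a single mark, so the plain average of averages gives that mark too much weight and differs from the true average.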
Ans. Yes, it's a project by Confluent that provides a built-in mechanism for streaming records from big data sources to Apache Kafka message queues and vice versa. It provides a variety of source and sink connectors to achieve this.
Ans. Source connectors read data from an external source system and bring it in, whereas sink connectors deliver data to a destination system.