How do we use HiveKa?
- Make sure you have a Kafka cluster and a Kafka topic
- Write Avro byte-arrays to the Kafka topic.
This can be done from your own app, or you can try our demo producer:
java -classpath hive-kafka-1.0-SNAPSHOT.jar org.apache.hadoop.hive.kafka.demoproducer.FakeTweetProducer tweet 10 hivekafka-1:9092
Note that you'll also need Avro and Kafka libs on the classpath.
The parameters are: topic name, number of messages to produce, and the Kafka broker URI.
- Put tweet.avsc from our github in HDFS
- Create Hive table:
create external table tweets (username string, text string, timestamp bigint)
stored by 'org.apache.hadoop.hive.kafka.KafkaStorageHandler'
tblproperties('kafka.topic'='test',
'kafka.service.uri'='hivekafka-1.ent.cloudera.com:9092',
'kafka.whitelist.topics'='tweet',
'kafka.avro.schema.file'='/tmp/tweet.avsc');
- And query!
select username,count(*) from tweets group by username;
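If you produce from your own app instead of the demo producer, the message value Hive expects is the Avro binary encoding of each record. As a rough illustration (field names taken from the table definition above; the actual tweet.avsc may order or name fields differently, and a real producer would use Avro's `GenericDatumWriter`/`BinaryEncoder` rather than hand-rolling bytes), here is what one encoded tweet looks like — per the Avro spec, a record is its fields in schema order, a string is a zigzag-varint length followed by UTF-8 bytes, and a long is a zigzag varint:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Illustrative sketch only: hand-encodes a record shaped like
// {username: string, text: string, timestamp: long} in Avro binary form.
public class TweetEncoder {

    // Avro encodes longs as zigzag-encoded base-128 varints.
    static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);            // zigzag encoding
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80)); // continuation bit set
            z >>>= 7;
        }
        out.write((int) z);
    }

    // Avro string = length (as a long) followed by UTF-8 bytes.
    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeLong(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    // A record is simply its fields, encoded in schema order.
    public static byte[] encodeTweet(String username, String text, long timestamp) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, username);
        writeString(out, text);
        writeLong(out, timestamp);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // This byte[] is what you would hand to a Kafka producer as the
        // message value for the topic Hive reads from.
        byte[] msg = encodeTweet("alice", "hi", 42L);
        System.out.println(msg.length + " bytes");
    }
}
```

For example, `encodeTweet("alice", "hi", 42L)` produces 10 bytes: `0x0A` (length 5, zigzag-encoded), the UTF-8 of "alice", `0x04`, the UTF-8 of "hi", then `0x54` (42, zigzag-encoded).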
Find the code on github: https://github.com/HiveKa/HiveKa
Interactive Testing
We also packaged a utility that converts data from standard input to Avro messages, so they can be queried in Hive.
To use this:
- java -classpath hive-kafka-1.0-SNAPSHOT.jar org.apache.hadoop.hive.kafka.demoproducer.AvroConsoleProducer console hivekafka-1:9092
Note that you'll also need Avro and Kafka libs on the classpath.
The parameters are: topic name and the Kafka broker URI.
- Put Console.avsc from our repository on HDFS
- Create table:
create external table console (message string)
stored by 'org.apache.hadoop.hive.kafka.KafkaStorageHandler'
tblproperties('kafka.service.uri'='hivekafka-1.ent.cloudera.com:9092',
'kafka.whitelist.topics'='console',
'kafka.avro.schema.file'='/tmp/console.avsc');
- And query!
select * from console;
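The console flow above can be sketched as follows — a hypothetical stand-in for what AvroConsoleProducer presumably does (this is not its actual source): read newline-delimited standard input and turn each line into the Avro binary encoding of a record with a single string field, which is then sent to Kafka (the send itself is elided here):

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

// Hypothetical sketch: each stdin line becomes an Avro-encoded
// single-field record, e.g. {message: string}.
public class ConsoleSketch {

    // Avro string = zigzag-varint length followed by UTF-8 bytes; a
    // one-field record is just that one encoded field.
    static byte[] encodeMessage(String line) {
        byte[] utf8 = line.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long z = (long) utf8.length << 1;         // zigzag of a non-negative length
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80)); // continuation bit set
            z >>>= 7;
        }
        out.write((int) z);
        out.write(utf8, 0, utf8.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        Scanner in = new Scanner(System.in, "UTF-8");
        while (in.hasNextLine()) {
            byte[] value = encodeMessage(in.nextLine());
            // A real producer would now do something like:
            // producer.send(new ProducerRecord<>("console", value));
            System.out.println(value.length + " bytes queued");
        }
    }
}
```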