In this blog post, we will see how to make Apache Spark communicate with Apache Cassandra. Tools used: Apache Spark 2.1.1 and Apache Cassandra 2.2.10. Part 1: Adding some data to Cassandra. Connect to cqlsh and add some sample data as shown below: cqlsh> CREATE KEYSPACE bigdataclassmumbai WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}; cqlsh> […]
Category: Apache Spark
Integrating Apache Spark with MongoDB
In this tutorial, we will learn how to integrate Apache Spark with MongoDB. We will use spark-shell to interact with MongoDB, reading from and writing to the database. Tools used: Apache Spark 2.1.0 and MongoDB 3.4.2. Prerequisites: I am assuming you have downloaded and extracted Apache Spark and […]
Integrating Apache Spark with Apache Hive
Apache Hive is a widely used, in-demand component of the Hadoop ecosystem for data analysis. Its simplicity lies in using familiar SQL-like syntax for data crunching, cleansing, and analysis. Tools used: Apache Hadoop 2.7.3, Apache Spark 1.6.3, and Apache Hive 1.2.2. Steps: Step 1: Ensure the Hadoop and Spark services are up and running. Step 2: […]
Scala Integration in Eclipse
The first thing I hear from participants in most of my Apache Spark training sessions is how to practice Scala coding for Apache Spark on a laptop with 4 GB of RAM. My usual answer is to use Eclipse with a Scala plugin. In this section of the blog, we will learn how to set up the […]