Integrating Apache Spark with MongoDB

In this tutorial, we will learn how to integrate Apache Spark with MongoDB. We will use spark-shell to interact with the database, reading from and writing to MongoDB. Tools used: Apache Spark 2.1.0, MongoDB 3.4.2. Prerequisites: I am assuming you have downloaded and extracted Apache Spark and […]

Understanding Sqoop Eval Command

In this blog, we will look at Sqoop’s eval command. It lets a user run DDL and DML queries against a database and preview the results in the console. We will walk through two evaluations: a select query eval and an insert query eval. Assumptions: I am assuming that you are using MySQL […]

Learning Hive-HBase Integration

Many people get frustrated when working with the native HBase commands for data interaction. Don’t worry: there are two more techniques for interacting with HBase data, namely Hive-HBase integration and Apache Phoenix. In this blog, we will see how to perform Hive-HBase integration. Step 1: […]

HDFS copyFromLocal v/s put Command

“What’s the difference between the copyFromLocal and put commands in the HDFS CLI?” A very common interview question, isn’t it? Let’s figure out the notable difference between put and copyFromLocal. Both commands have one objective: to load data into HDFS. Let’s demonstrate the functionality now. Variation 1: Loading data from the local file system and storing the same […]

HDFS Storage Balancer Part 1

In this tutorial, we will learn how to use the HDFS Storage Balancer effectively. We will also go through the combinations of options that can be applied to the Hadoop-Balancer command. HDFS stores data using the ‘Write Once’ paradigm, in which only appends are allowed. In production, there is a scenario where there might […]
