EDUCBA

EDUCBA

MENUMENU
  • Free Tutorials
  • Free Courses
  • Certification Courses
  • 360+ Courses All in One Bundle
  • Login

Hive Commands

By Priya PedamkarPriya Pedamkar

Home » Data Science » Data Science Tutorials » Hive Tutorial » Hive Commands

Hive Commands

Introduction to Hive Commands

Hive command is a data warehouse infrastructure tool that sits on top Hadoop to summarize Big data. It processes structured data. It makes data querying and analyzing easier. Hive command is also called as “schema on reading;” It doesn’t verify data when it is loaded, verification happens only when a query is issued. This property of Hive makes it fast for initial loading. It’s like copying or simply moving a file without putting any constraints or checks. It was first developed by Facebook. Apache Software Foundation took it up later and developed it further.

All in One Data Science Bundle (360+ Courses, 50+ projects)
360+ Online Courses | 50+ projects | 1500+ Hours | Verifiable Certificates | Lifetime Access
4.7 (77,953 ratings)
View Course

Components of Hive Commands

The components 0f hive command is as shown below:

Start Your Free Data Science Course

Hadoop, Data Science, Statistics & others

Components of Hive

Sources images: https://www.developer.com/

Features of Hive Commands

Here are a few of the features discussed below:

  1. Hive stores are raw and processed dataset in Hadoop.
  2. It is designed for Online Transaction Processing (OLTP). OLTP is the systems that facilitate high volume data in very less time with no reliance on the single server.
  3. It is fast, scalable and reliable.
  4. The SQL type querying language provided here is called HiveQL or HQL. This makes ETL tasks and other analysis easier.

Hive Properties

Sources images:- Google

There are few limitations of Hive command as well, which are listed below:

  1. Hive doesn’t support subqueries.
  2. Hive surely supports over-writing, but unfortunately, it doesn’t support deletion and updates.
  3. Hive is not designed for OLTP, but it is used for it.

To enter the Hive’s interactive shell:

$HIVE_HOME/bin/hive

Basic Hive Commands

The basic commands are as explained below.

1. Create: This will create the new database in the Hive.

Basic Hive Commands 1

2. Drop: The drop will remove a table from Hive

3. Alter: Alter command will help you rename the table or table columns.

For example:

ALTER TABLE employee RENAME TO employee1;

4. Show: show command will show all the databases residing in the Hive.

Basic Hive Commands 2

5. Describe: Describe command will help you with the information about the schema of the table.

Intermediate Hive Commands

Hive divides a table into variously related partitions based on columns. Using these partitions, it gets easier to query data. These partitions further get divided into buckets, to run query efficiently on to data.

In other words, buckets distribute data into the set of clusters by calculating the hash code of key mentioned in the query.

1. Adding Partition

Adding partition can be accomplished by altering the table. Say you have table “EMP”, with fields such as Id, Name, Salary, Dept, Designation, and yoj.

ALTER TABLE employee
ADD PARTITION (year=’2012’)
location '/2012/part2012';

2. Renaming Partition

ALTER TABLE employee PARTITION (year=’1203’)
RENAME TO PARTITION (Yoj=’1203’);

3. Drop Partition

ALTER TABLE employee DROP [IF EXISTS] PARTITION (year=’1203’);

4. Relational Operators

Relational operators consist of a certain set of operators, that helps in fetching relevant information.

For example: Say your “EMP” table look like this:

Relational Operators

Let’s execute a Hive query which will fetch us the employee whose salary is greater than 30000.

SELECT * FROM EMP WHERE Salary>=40000;

5. Arithmetic Operators

These are operators which help in help in executing arithmetic operations on the operands, and in turn, always return number types.

For example: To add two number such as 22 & 33

SELECT 22+33 ADD FROM temp;

6. Logical Operator

These operators are to execute logical operations, which in return, always return True/False.

SELECT * FROM EMP WHERE Salary>40000 && Dept=TP;

Advanced Hive Commands

The advanced commands are as explained below.

1. View

View concept in Hive is similar like in SQL. The view can be created at the time of executing a SELECT statement.

Example:

CREATE VIEW EMP_30000 AS
SELECT * FROM EMP
WHERE salary>30000;

Popular Course in this category
Hadoop Training Program (20 Courses, 14+ Projects, 4 Quizzes)
  20 Online Courses |  14 Hands-on Projects|  135+ Hours|  Verifiable Certificate of Completion
4.5
Course Price

View Course

Hive Training (2 Courses, 5+ Projects)

4.9

All in One Data Science Bundle (360+ Courses, 50+ projects)

4.8


2. Loading Data into Table

Load data local inpath 'https://cdn.educba.com/home/hduser/Desktop/AllStates.csv' into table States;

Here “States” is the already created table in Hive.

https://www.tutorialspoint.com/hive/

Hive has some built-in functions which help you in fetching your result in a better way.

Like round, floor, BIGINT etc.

3. Join

Join clause can help in joining two tables based on the same column name.

Example:

SELECT c.ID, c.NAME, c.AGE, o.AMOUNT
FROM CUSTOMERS c JOIN ORDERS o
ON (c.ID = o.CUSTOMER_ID);

All kinds of joins are supported by Hive: Left outer join, right outer join, full outer join.

Tips and Tricks

Hive makes data processing that easy, straightforward and extensible, that user pay less attention towards optimizing the Hive queries. But paying attention towards a few things while writing Hive query, will surely bring great success in managing the workload and saving money. Below are a few tips regarding that:

1. Partitions & Buckets

It is a big data tool, which can query on large datasets. However, writing the query without understanding the domain can bring great partitions in Hive.

If the user is aware of the dataset, then relevant and highly used columns could be grouped into the same partition. This will help in running the query faster and inefficient way.

Ultimately the no. of mapper and I/O operations will also be reduced.

Partitioning

Fig: Partitioning

Sources images: Google image

Bucketing

Fig: Bucketing

Sources images:- Google image

2. Parallel Execution

It runs the query in multiple stages. In some cases these stages may depend on other stages, a hence can’t get started, once the previous stage is completed. However, independent tasks can run parallelly to save overall run time. To enable the parallel run in Hive:

set hive.exec.parallel=true;

Hence, this will enhance the cluster utilization.

3. Block Sampling

Sampling data from a table will allow exploration of queries on data.

Despite bucking, we rather want to sample dataset more randomly. Block sampling comes with various powerful syntax, which helps in sampling the data in a various way.

Sampling can be used for finding approx. info from dataset like the average distance between origin and destination.

Querying 1% of big data will give near to the perfect answer. Exploration gets way easier & effective.

Conclusion

Hive is a higher-level abstraction on top of HDFS, which provides flexible query language. It helps in querying and processing data in an easier way.

It can be clubbed with other Big data elements, to harness its functionality in a full-fledged way.

Recommended Articles

This has been a guide to Hive Commands. Here we have discussed components, features, basic, advanced as well as intermediate of Hive Commands along with tips and tricks. You may also look at the following article to learn more –

  1. Hive Interview Questions
  2. Hive VS Hue – Top 6 Useful Comparisons
  3. Tableau Commands
  4. Adobe Photoshop Commands
  5. Using ORDER BY Function in Hive
  6. Download and Install Hive Step by Step
  7. Bucketing in Hive | Examples | Features

Hadoop Training Program (20 Courses, 14+ Projects)

20 Online Courses

14 Hands-on Projects

135+ Hours

Verifiable Certificate of Completion

Lifetime Access

4 Quizzes with Solutions

Learn More


0 Shares
Share
Tweet
Share
Primary Sidebar
Hive Tutorial
  • Basics
    • Hive JDBC Driver
    • What is a Hive
    • Hive Architecture
    • Hive Installation
    • How To Install Hive
    • Hive Versions
    • Hive Commands
    • Hive Data Types
    • Hive Built-in Functions
    • Hive Function
    • Hive String Functions
    • Date Functions in Hive
    • Hive Table
    • Hive Drop Table
    • Hive Show Tables
    • Hive Group By
    • Hive Order By
    • Hive Cluster By
    • Joins in Hive
    • Hive Inner Join
    • Map Join in Hive
    • Hive nvl
    • Hive UDF
    • Dynamic Partitioning in Hive
    • HiveQL
    • HiveQL Queries
    • HiveQL Group By
    • Partitioning in Hive
    • Bucketing in Hive
    • Views in Hive
    • Indexes in Hive
    • External Table in Hive
    • Hive TimeStamp
    • Hive Database
    • Hive Interview Questions
    • Hive insert into

Related Courses

Hive Certification Course

Hadoop Course Training

All in One Data Science Course

Footer
About Us
  • Blog
  • Who is EDUCBA?
  • Sign Up
  • Live Classes
  • Corporate Training
  • Certificate from Top Institutions
  • Contact Us
  • Verifiable Certificate
  • Reviews
  • Terms and Conditions
  • Privacy Policy
  •  
Apps
  • iPhone & iPad
  • Android
Resources
  • Free Courses
  • Database Management
  • Machine Learning
  • All Tutorials
Certification Courses
  • All Courses
  • Data Science Course - All in One Bundle
  • Machine Learning Course
  • Hadoop Certification Training
  • Cloud Computing Training Course
  • R Programming Course
  • AWS Training Course
  • SAS Training Course

© 2022 - EDUCBA. ALL RIGHTS RESERVED. THE CERTIFICATION NAMES ARE THE TRADEMARKS OF THEIR RESPECTIVE OWNERS.

EDUCBA

*Please provide your correct email id. Login details for this Free course will be emailed to you

By signing up, you agree to our Terms of Use and Privacy Policy.

Let’s Get Started

By signing up, you agree to our Terms of Use and Privacy Policy.

EDUCBA
Free Data Science Course

Hadoop, Data Science, Statistics & others

*Please provide your correct email id. Login details for this Free course will be emailed to you

By signing up, you agree to our Terms of Use and Privacy Policy.

EDUCBA
Free Data Science Course

SPSS, Data visualization with Python, Matplotlib Library, Seaborn Package

*Please provide your correct email id. Login details for this Free course will be emailed to you

By signing up, you agree to our Terms of Use and Privacy Policy.

EDUCBA Login

Forgot Password?

By signing up, you agree to our Terms of Use and Privacy Policy.

This website or its third-party tools use cookies, which are necessary to its functioning and required to achieve the purposes illustrated in the cookie policy. By closing this banner, scrolling this page, clicking a link or continuing to browse otherwise, you agree to our Privacy Policy

Loading . . .
Quiz
Question:

Answer:

Quiz Result
Total QuestionsCorrect AnswersWrong AnswersPercentage

Explore 1000+ varieties of Mock tests View more

Special Offer - Hadoop Training Program (20 Courses, 14+ Projects) Learn More