EDUCBA

Hadoop Admin Interview Questions

By Priya Pedamkar


Introduction To Hadoop Admin Interview Questions And Answers

So you have finally found your dream job as a Hadoop administrator, but you are wondering how to crack the Hadoop Admin interview and what the probable questions could be. Every interview is different, and the scope of each job is different too. Keeping this in mind, we have compiled the most common Hadoop Admin interview questions and answers to help you succeed in your interview.


The following Hadoop Admin interview questions will help you crack an interview for a Hadoop administration role.


Hadoop Admin Interview Questions & Answers

Below are some useful Hadoop Admin interview questions and answers.

1. What is rack awareness, and why is it necessary?

Answer:
Rack awareness means the NameNode knows which rack each DataNode belongs to, and HDFS uses that knowledge to decide where to place block replicas. A rack holds multiple servers, and a cluster can span multiple racks. Say a Hadoop cluster is set up with 12 nodes: there could be 3 racks with 4 servers each, all connected so that the 12 nodes form one cluster. When deciding on the rack count, an important point to consider is the replication factor. Suppose 100 GB of data flows in every day with a replication factor of 3; then 300 GB of data has to be stored on the cluster every day. It is better to replicate the data across racks: with the default placement policy, one replica is stored on one rack and the other two on a different rack, so even if a node or an entire rack goes down, a replica is still available on another rack. Rack awareness also reduces network traffic between racks: the default policy cuts inter-rack write traffic, and reads are served from the replica closest to the client.
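The default placement policy described above can be sketched in Python. This is a simplified model, not the NameNode's actual implementation: it assumes a replication factor of exactly 3 and ignores the load, free-space and storage-type checks the real NameNode applies. The topology dictionary and node names are hypothetical; in a real cluster the rack of each DataNode usually comes from a topology script configured with net.topology.script.file.name in core-site.xml.

```python
import random

def place_three_replicas(topology, writer, rng):
    """Choose 3 DataNodes for one block (simplified default HDFS policy).

    topology maps a rack id to the DataNodes on that rack. Assumes at least
    two racks and at least two DataNodes on every rack.
    """
    rack_of = {node: rack for rack, nodes in topology.items() for node in nodes}
    # Replica 1: the writer's own node if the client runs on a DataNode,
    # otherwise a randomly chosen node.
    first = writer if writer in rack_of else rng.choice(sorted(rack_of))
    # Replica 2: a node on a different (remote) rack.
    remote_rack = rng.choice([r for r in topology if r != rack_of[first]])
    second = rng.choice(topology[remote_rack])
    # Replica 3: a different node on the same remote rack as replica 2.
    third = rng.choice([n for n in topology[remote_rack] if n != second])
    return [first, second, third]

# Hypothetical 12-node cluster from the example: 3 racks x 4 DataNodes.
topology = {f"/rack{r}": [f"dn{r}{i}" for i in range(1, 5)] for r in range(1, 4)}
replicas = place_three_replicas(topology, writer="dn11", rng=random.Random(42))
print(replicas)  # dn11 first, then two nodes that share one other rack
```

Whichever rack ends up holding replicas 2 and 3, losing any single rack still leaves at least one replica of the block.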

2. What is the default block size, and how is it defined?

Answer:
128 MB (in Hadoop 2.x and later). It is set by the dfs.blocksize property in hdfs-site.xml, and it can be customized depending on the volume of the data and the level of access. Say 100 GB of data flows in every day, and the data is split into blocks and stored across the cluster. How many blocks will it produce? 800 blocks (100 × 1024 / 128, where multiplying by 1024 converts GB to MB). There are two ways to customize the block size:

  1. Per command, with the generic -D option, for example hadoop fs -D dfs.blocksize=134217728 -put <local file> <hdfs folder> (the value is in bytes; 134217728 bytes = 128 MB).
  2. Cluster-wide, by setting the dfs.blocksize property in hdfs-site.xml to the size in bytes (suffixes such as 128m or 1g are also accepted). The new size applies only to files written after the change.

If you change the default size to 512 MB because the data volume is huge, the number of blocks generated will be 200 (100 × 1024 / 512).
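The block arithmetic above can be checked with a few lines of Python. The sketch treats the day's 100 GB as one continuous stream; in practice HDFS splits each file separately, so many small files produce more blocks than this estimate (each file needs at least one block, although a partly filled block only occupies as much disk space as its data).

```python
MIB = 1024 * 1024  # bytes in one MB (binary)
GIB = 1024 * MIB   # bytes in one GB (binary)

def block_count(data_bytes, block_size_bytes):
    """Blocks needed for data_bytes; the final block may be only partly full."""
    return (data_bytes + block_size_bytes - 1) // block_size_bytes  # ceiling division

daily = 100 * GIB
print(128 * MIB)                      # 134217728, the byte value for dfs.blocksize
print(block_count(daily, 128 * MIB))  # 800
print(block_count(daily, 512 * MIB))  # 200
print(3 * daily // GIB)               # 300 GB stored per day with replication factor 3
```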

3. How do you get a report on the HDFS file system, covering disk availability and the number of active nodes?

Answer:
Command: sudo -u hdfs hdfs dfsadmin -report

It displays the following information:

  1. Configured Capacity – Total capacity available to HDFS across all DataNodes.
  2. Present Capacity – The space actually available to HDFS, i.e. DFS Used plus DFS Remaining (configured capacity minus the space taken by non-HDFS data on the DataNode disks).
  3. DFS Remaining – The storage space still available to HDFS for storing more files.
  4. DFS Used – The storage space HDFS has already used.
  5. DFS Used% – DFS Used as a percentage of Present Capacity.
  6. Under replicated blocks – Number of blocks that have fewer replicas than their replication factor.
  7. Blocks with corrupt replicas – Number of blocks with at least one corrupt replica.
  8. Missing blocks – Number of blocks for which no replica is available.
  9. Missing blocks (with replication factor 1) – Missing blocks that had only a single replica.
  10. Live/Dead datanodes – Number of live and dead DataNodes, followed by per-node details.
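To script a check on these numbers, the summary can be parsed from the command's text output. This Python sketch assumes the common layout in which each capacity line starts with a raw byte count followed by a human-readable value in parentheses; the SAMPLE text is an abridged, made-up report rather than real output, and the exact layout may differ between Hadoop versions.

```python
import re

# Summary fields whose value starts with a raw byte count, e.g. "DFS Used: 300000 (292.97 KB)".
BYTE_FIELDS = ("Configured Capacity", "Present Capacity", "DFS Remaining", "DFS Used")

def parse_report_summary(text):
    """Extract cluster byte counts and live/dead DataNode counts from dfsadmin -report text."""
    summary = {}
    for line in text.splitlines():
        line = line.strip()
        for field in BYTE_FIELDS:
            # The ":" right after the name keeps "DFS Used%:" and "Non DFS Used:" out.
            m = re.match(re.escape(field) + r":\s*(\d+)", line)
            if m:
                # setdefault keeps the cluster-wide value; per-node sections repeat these names.
                summary.setdefault(field, int(m.group(1)))
        m = re.match(r"(Live|Dead) datanodes \((\d+)\):", line)
        if m:
            summary[m.group(1).lower() + "_datanodes"] = int(m.group(2))
    return summary

def dfs_used_percent(summary):
    """DFS Used as a percentage of Present Capacity (DFS Used + DFS Remaining)."""
    present = summary["DFS Used"] + summary["DFS Remaining"]
    return 100.0 * summary["DFS Used"] / present if present else 0.0

SAMPLE = """\
Configured Capacity: 1000000 (976.56 KB)
Present Capacity: 900000 (878.91 KB)
DFS Remaining: 600000 (585.94 KB)
DFS Used: 300000 (292.97 KB)
DFS Used%: 33.33%
Live datanodes (3):
Name: 10.0.0.1:9866 (dn1)
Configured Capacity: 400000 (390.63 KB)
DFS Used: 120000 (117.19 KB)
Non DFS Used: 40000 (39.06 KB)
Dead datanodes (0):
"""

summary = parse_report_summary(SAMPLE)
print(summary["live_datanodes"], round(dfs_used_percent(summary), 2))  # 3 33.33
```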

4. What is the Hadoop balancer, and why is it necessary?

Answer:
Data is not always distributed across the DataNodes in the right proportion, so their utilization can become unbalanced: one node might be over-utilized while another is under-utilized. This commonly happens after new, empty nodes are added to the cluster. Processing then concentrates on the heavily used nodes, which slows jobs down. To solve this, the Hadoop balancer (hdfs balancer) redistributes the data: whenever it runs, it moves blocks from over-utilized DataNodes to under-utilized ones until each node's utilization is within a threshold (10 percent by default, configurable with -threshold) of the cluster's average utilization.
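The balancer's notion of "balanced" can be sketched as follows: a node is over-utilized when its utilization exceeds the cluster average by more than the threshold, and under-utilized when it falls short of the average by more than the threshold. This Python sketch models only that classification step; the real balancer also plans and throttles the block moves (per-DataNode bandwidth is capped by dfs.datanode.balance.bandwidthPerSec) and respects the rack placement rules. Node names and sizes are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class DataNodeUsage:
    name: str
    used: float      # space used by HDFS, any consistent unit
    capacity: float  # total capacity, same unit as used

    @property
    def utilization(self):
        return 100.0 * self.used / self.capacity

def classify_nodes(nodes, threshold=10.0):
    """Return (average %, over-utilized names, under-utilized names).

    A node counts as balanced when its utilization is within `threshold`
    percentage points of the average utilization of the whole cluster.
    """
    average = 100.0 * sum(n.used for n in nodes) / sum(n.capacity for n in nodes)
    over = [n.name for n in nodes if n.utilization > average + threshold]
    under = [n.name for n in nodes if n.utilization < average - threshold]
    return average, over, under

# Hypothetical three-node cluster, sizes in GB.
nodes = [
    DataNodeUsage("dn1", used=90, capacity=100),
    DataNodeUsage("dn2", used=50, capacity=100),
    DataNodeUsage("dn3", used=10, capacity=100),
]
print(classify_nodes(nodes))  # (50.0, ['dn1'], ['dn3'])
```

Running the balancer on this cluster would move blocks from dn1 toward dn3 until both fall within 10 points of the 50 percent average.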

5. What is the difference between Cloudera Manager and Ambari?

Answer:

  • Distribution – Cloudera Manager is the administration tool for Cloudera; Ambari is the administration tool for Hortonworks.
  • Functionality – Both monitor and manage the entire cluster and report its usage and any issues.
  • Licensing – Cloudera Manager comes with Cloudera's paid service; Ambari is open source.

6. What are the main actions performed by the Hadoop admin?

Answer:
  • Monitor cluster health – Several web UIs have to be watched while processes run (Job History Server, YARN ResourceManager, and Cloudera Manager or Ambari, depending on the distribution).
  • Turn on security – SSL or Kerberos.
  • Tune performance – for example, run the Hadoop balancer.
  • Add new DataNodes as needed – Infrastructure changes and configuration.
  • Optionally, turn on the MapReduce Job History Server – Restarting services can sometimes help free up cached memory; do this when no processes are running on the cluster.
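Part of the health monitoring can be scripted. The YARN ResourceManager exposes cluster metrics through its REST API at /ws/v1/cluster/metrics, and the returned clusterMetrics object includes counters such as activeNodes, lostNodes and unhealthyNodes. The Python sketch below flags those counters; the host name is hypothetical, 8088 is the default ResourceManager web port, and the choice of which counters count as a problem is an assumption for illustration.

```python
import json
from urllib.request import urlopen

def fetch_cluster_metrics(rm_url):
    """Fetch YARN cluster metrics from the ResourceManager REST API.

    rm_url is the ResourceManager web address, e.g. "http://rm-host:8088" (hypothetical host).
    """
    with urlopen(f"{rm_url}/ws/v1/cluster/metrics", timeout=10) as response:
        return json.load(response)["clusterMetrics"]

def health_issues(metrics):
    """Return human-readable warnings for a clusterMetrics dictionary."""
    issues = []
    if metrics.get("activeNodes", 0) == 0:
        issues.append("no active NodeManagers")
    for key in ("lostNodes", "unhealthyNodes"):
        if metrics.get(key, 0) > 0:
            issues.append(f"{key} = {metrics[key]}")
    return issues

# Offline example with hypothetical values; on a cluster, pass fetch_cluster_metrics(...) instead.
sample = {"activeNodes": 11, "lostNodes": 1, "unhealthyNodes": 0, "appsRunning": 4}
print(health_issues(sample))  # ['lostNodes = 1']
```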

7. What is Kerberos?

Answer:
Kerberos is a network authentication protocol. In a secured Hadoop cluster, every service and user must authenticate through Kerberos before it can communicate with other services or run a job, and enabling it is recommended. Because Hadoop is a distributed system, the nodes are connected and constantly pass information over the network, so identities must be verified securely. With Kerberos, passwords are never sent across the network; instead, they are used to derive encryption keys, and the client and server exchange messages (tickets) encrypted with those keys. In simple terms, Kerberos lets nodes and users prove their identity to each other securely by means of encryption.

Configuration in core-site.xml:
hadoop.security.authentication = kerberos (the default value, simple, means no authentication)
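To confirm which mode a cluster is configured for, an admin can read the property from core-site.xml. The Python sketch below parses the standard configuration/property/name/value layout of Hadoop site files. It is a simplified check that ignores Hadoop's default files, variable substitution and final flags, and the CORE_SITE snippet is only an example; a working Kerberos setup also needs principals, keytabs and service-specific settings.

```python
import xml.etree.ElementTree as ET

def read_hadoop_conf(xml_text):
    """Return {name: value} for each <property> in a Hadoop *-site.xml file."""
    conf = {}
    for prop in ET.fromstring(xml_text).iter("property"):
        name = prop.findtext("name")
        if name:
            conf[name.strip()] = (prop.findtext("value") or "").strip()
    return conf

def kerberos_enabled(conf):
    """True if hadoop.security.authentication is "kerberos" (Hadoop's default is "simple")."""
    return conf.get("hadoop.security.authentication", "simple").lower() == "kerberos"

CORE_SITE = """<configuration>
  <property>
    <name>hadoop.security.authentication</name>
    <value>kerberos</value>
  </property>
</configuration>
"""

print(kerberos_enabled(read_hadoop_conf(CORE_SITE)))  # True
```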

8. What are the most important HDFS commands?

Answer:

  • hdfs dfs -ls <hdfs path> – List the files in an HDFS directory.
  • hdfs dfs -put <local file> <hdfs folder> – Copy a file from the local file system to HDFS.
  • hdfs dfs -chmod 777 <hdfs file> – Give read, write and execute permission on the file to the owner, group and others.
  • hdfs dfs -get <hdfs folder/file> <local filesystem> – Copy a file from HDFS to the local file system.
  • hdfs dfs -cat <hdfs file> – Display the contents of a file in HDFS.
  • hdfs dfs -rm <hdfs file> – Remove a file from HDFS. If trash is enabled (fs.trash.interval greater than 0), the file is moved to the user's trash directory instead, much like the Recycle Bin in Windows.
  • hdfs dfs -rm -skipTrash <hdfs file> – Remove the file permanently, bypassing the trash.
  • hdfs dfs -touchz <hdfs file> – Create an empty (zero-length) file in HDFS.
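The octal argument to -chmod follows the POSIX convention: the three digits cover owner, group and others, and each digit is the sum of read (4), write (2) and execute (1), so 777 grants all three permissions to everyone. A small Python sketch of the conversion to the nine permission characters shown by hdfs dfs -ls (after the leading file-type character) is below. Note that HDFS has no executable files, so the x bit only matters on directories, where it is needed to access their children.

```python
def octal_to_rwx(mode):
    """Convert a 3-digit octal mode such as "777" to its rwx form, e.g. "rwxrwxrwx"."""
    if len(mode) != 3 or any(c not in "01234567" for c in mode):
        raise ValueError(f"expected three octal digits, got {mode!r}")
    symbols = []
    for digit in mode:  # owner, group, others
        bits = int(digit)
        symbols.append("r" if bits & 4 else "-")
        symbols.append("w" if bits & 2 else "-")
        symbols.append("x" if bits & 1 else "-")
    return "".join(symbols)

for mode in ("777", "755", "644"):
    print(mode, octal_to_rwx(mode))
# 777 rwxrwxrwx
# 755 rwxr-xr-x
# 644 rw-r--r--
```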

9. How do you check the logs of a Hadoop job submitted to the cluster, and how do you terminate a job that is already running?

Answer:
yarn logs -applicationId <application_id> – Prints the logs of the application's containers, including the ApplicationMaster's container, identified by the application ID. This helps you monitor the job's running status and read its log output. For finished applications, the logs are available this way when log aggregation is enabled (yarn.log-aggregation-enable=true).

yarn application -kill <application_id> – If a job that is running on the cluster needs to be terminated, pass its application ID to the kill command.
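Both commands take an application ID of the form application_<clusterTimestamp>_<sequence>, where the timestamp is the ResourceManager's start time. A small Python helper that validates the ID before building the command can stop a typo from reaching a kill command. The helper names are hypothetical and the sample ID is made up; passing the argument list to subprocess.run avoids shell quoting problems.

```python
import re
import subprocess

# YARN application IDs look like application_<clusterTimestamp>_<sequence>.
APP_ID = re.compile(r"application_(\d+)_(\d+)")

def parse_app_id(app_id):
    """Return (cluster_timestamp, sequence) or raise ValueError for a malformed ID."""
    m = APP_ID.fullmatch(app_id)
    if m is None:
        raise ValueError(f"not a YARN application ID: {app_id!r}")
    return int(m.group(1)), int(m.group(2))

def yarn_logs_cmd(app_id):
    parse_app_id(app_id)  # validate before building the command
    return ["yarn", "logs", "-applicationId", app_id]

def yarn_kill_cmd(app_id):
    parse_app_id(app_id)
    return ["yarn", "application", "-kill", app_id]

def run(cmd):
    """Run a yarn command on a host with the Hadoop client installed; raises on failure."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout

print(yarn_kill_cmd("application_1700000000000_0042"))
# ['yarn', 'application', '-kill', 'application_1700000000000_0042']
```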

This has been a guide to Hadoop Admin interview questions and answers. We have listed nine of the most useful questions so that job seekers can crack the interview with ease.

© 2022 - EDUCBA. ALL RIGHTS RESERVED. THE CERTIFICATION NAMES ARE THE TRADEMARKS OF THEIR RESPECTIVE OWNERS.
