How to Install Hadoop on Ubuntu 24.04: Step-by-Step Guide

Thinking about processing large datasets on Ubuntu 24.04, but not sure where to start? You don’t need expensive hardware or a complex cluster to begin. If your goal is to store and analyze massive amounts of structured or unstructured data, Apache Hadoop offers a practical and cost-effective solution. It distributes both storage and processing using a simple programming model, making it ideal for big data workloads even on modest infrastructure.
This guide walks you through how to install and run Hadoop on Ubuntu 24.04. You’ll set it up in pseudo-distributed mode on a single machine so you can learn the fundamentals without managing multiple servers.
Whether you’re using a local desktop or a Cherry Servers instance, you’ll go through each command and configuration step needed to get Hadoop running in pseudo-distributed mode.
#What is Hadoop?
Apache Hadoop is an open-source framework designed to store and process large datasets, commonly called "Big Data", across clusters of computers. Instead of using one large computer to store and process the data, Hadoop allows clustering multiple computers to analyze massive datasets in parallel more quickly. Think of it as a powerful toolkit that allows many machines to work together to solve data problems too big for a single server.
At its core, Hadoop gives you:
- HDFS (Hadoop Distributed File System): This is Hadoop's storage solution. It cleverly breaks down massive files into smaller pieces and spreads them across multiple machines, ensuring your data is safe and accessible even if a server goes down.
- YARN (Yet Another Resource Negotiator): YARN manages all the resources in your Hadoop cluster, like CPU and memory, and schedules all the tasks you throw at it.
- MapReduce: MapReduce is Hadoop’s original processing engine. It works by breaking a large data processing job into smaller tasks, mapping those tasks across different nodes for parallel execution, and then reducing the results into a final output (see the short sketch after this list).
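To make this concrete, here is a minimal sketch of what running a MapReduce job looks like once Hadoop is installed; you'll be able to try it after completing the steps below. The input and output paths are illustrative, and the jar version should match your install (the word-count example jar ships with the Hadoop binary distribution):
# Put a local text file into HDFS (paths are examples)
hdfs dfs -mkdir -p /user/hadoop/input
hdfs dfs -put /etc/hostname /user/hadoop/input
# Run the bundled word-count example; adjust the jar version to match your install
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.4.1.jar wordcount /user/hadoop/input /user/hadoop/output
# Inspect the reduced result
hdfs dfs -cat /user/hadoop/output/part-r-00000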
#What is Hadoop used for?
Hadoop is the backbone for countless Big Data applications, including:
- Large-scale data analysis: When you need to crunch numbers from massive datasets.
- Building data warehouses and data lakes: Essential for storing and organizing vast amounts of information for future use.
- Log analysis: Understanding what's happening with your web servers or applications by sifting through mountains of log files.
- Recommendation engines: Think Netflix or Amazon suggesting what you might like next.
- Machine learning tasks: Providing the infrastructure for training complex AI models on huge datasets.
In short, if you're dealing with a lot of data and need a scalable, fault-tolerant way to store and process it, Hadoop is your go-to solution. It lets you extract valuable insights from information that would otherwise be unmanageable.
Deploy and scale your projects with Cherry Servers' cost-effective dedicated or virtual servers. Enjoy seamless scaling, pay-as-you-go pricing, and 24/7 expert support—all within a hassle-free cloud environment.
#Prerequisites
This guide includes a hands-on demonstration. To follow along with setting up Apache Hadoop on Ubuntu 24.04 in pseudo-distributed mode, ensure you have:
- Ubuntu 24.04 server with at least 2 CPU cores, 4GB RAM, and 20GB of free disk space.
- Java Development Kit (JDK) installed (OpenJDK 11 is recommended).
- openssh-server and openssh-client installed and the SSH service running.
#How to install Hadoop on Ubuntu 24.04 (Pseudo-Distributed Mode)
In pseudo-distributed mode, all Hadoop daemons (NameNode, DataNode, ResourceManager, NodeManager) run on a single machine. This setup is configured to mimic a multi-node cluster, allowing you to test HDFS operations and run MapReduce or YARN applications as if you had a small cluster. It's the ideal starting point for learning Hadoop.
#Step 1: Update system packages on Ubuntu 24.04
First, SSH into your server if you’re working remotely. Then, update your system’s package list to ensure it’s ready for the installation process. Open a terminal and run:
sudo apt update && sudo apt upgrade -y
Output
Hit:1 http://archive.ubuntu.com/ubuntu noble InRelease
...
Fetched 34.0 MB in 14s (2364 kB/s)
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
6 packages can be upgraded. Run 'apt list --upgradable' to see them.
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Calculating upgrade... Done
The following NEW packages will be installed:
linux-headers-6.8.0-59 linux-headers-6.8.0-59-generic linux-image-6.8.0-59-generic
linux-modules-6.8.0-59-generic linux-tools-6.8.0-59 linux-tools-6.8.0-59-generic
The following packages will be upgraded:
linux-headers-generic linux-headers-virtual linux-image-virtual linux-libc-dev
linux-tools-common linux-virtual
6 upgraded, 6 newly installed, 0 to remove and 0 not upgraded.
...
Fetched 80.1 MB in 2s (38.5 MB/s)
Selecting previously unselected package linux-headers-6.8.0-59.
...
Setting up linux-virtual (6.8.0-59.61) ...
Processing triggers for man-db (2.12.0-4build2) ...
Processing triggers for linux-image-6.8.0-59-generic (6.8.0-59.61) ...
...
Generating grub configuration file ...
...
done
Scanning processes...
Scanning linux images...
Pending kernel upgrade!
Running kernel version:
6.8.0-58-generic
Diagnostics:
The currently running kernel version is not the expected kernel version 6.8.0-59-generic.
Restarting the system to load the new kernel will not be handled automatically, so you should
consider rebooting.
No services need to be restarted.
...
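The sample output above reports a pending kernel upgrade. If you see a similar notice, consider rebooting before continuing so the server boots into the new kernel, then reconnect (the username and IP below are placeholders for your own login details):
sudo reboot
# After the server comes back up, reconnect, for example:
ssh your_user@your_server_ip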
#Step 2: Install OpenJDK for Hadoop
Hadoop is built on Java, so you need a Java Development Kit (JDK) installed. Install OpenJDK 11 by executing the following command:
sudo apt install openjdk-11-jdk -y
Output
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following additional packages will be installed:
... (a long list of libraries and other necessary files)
Suggested packages:
...
Recommended packages:
...
The following NEW packages will be installed:
... (many dependencies) ... openjdk-11-jdk ... (more dependencies)
0 upgraded, 92 newly installed, 0 to remove and 0 not upgraded.
Need to get 184 MB of archives.
After this operation, 553 MB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu noble/main amd64 alsa-topology-conf all 1.2.5.1-2 [15.5 kB]
... (many more Get lines) ...
Fetched 184 MB in 8s (22.9 MB/s)
Extracting templates from packages: 100%
Selecting previously unselected package alsa-topology-conf.
(Reading database ...)
Preparing to unpack .../alsa-topology-conf_1.2.5.1-2_all.deb ...
Unpacking alsa-topology-conf (1.2.5.1-2) ...
... (many more unpacking steps) ...
Setting up libgraphite2-3:amd64 (1.3.14-2build1) ...
Setting up ... (many more setting up steps) ...
Setting up openjdk-11-jdk:amd64 (11.0.27+6~us1-0ubuntu1~24.04) ...
Processing triggers for ...
...
Once that completes, check the version to confirm the installation:
java --version
You should see something similar to:
Output
openjdk 11.0.27 2025-04-15
OpenJDK Runtime Environment (build 11.0.27+6-post-Ubuntu-0ubuntu124.04)
OpenJDK 64-Bit Server VM (build 11.0.27+6-post-Ubuntu-0ubuntu124.04, mixed mode, sharing)
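If the machine happens to have more than one Java version installed, it can help to confirm which one the java command resolves to before continuing. This step is optional and only relevant when multiple JDKs coexist:
# List the installed alternatives and optionally switch the default
sudo update-alternatives --config java
# Print the resolved installation path (handy later when setting JAVA_HOME)
readlink -f $(which java)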
#Step 3: Create a dedicated user for Hadoop
Next, create a dedicated user for Hadoop. This helps isolate permissions and makes management easier. Create a user named hadoop (you can use something else if you prefer):
sudo adduser hadoop
You'll be prompted to set a password and provide some optional user information.
Output
info: Adding user `hadoop' ...
info: Selecting UID/GID from range 1000 to 59999 ...
info: Adding new group `hadoop' (1000) ...
info: Adding new user `hadoop' (1000) with group `hadoop (1000)' ...
info: Creating home directory `/home/hadoop' ...
info: Copying files from `/etc/skel' ...
New password:
Retype new password:
passwd: password updated successfully
Changing the user information for hadoop
Enter the new value, or press ENTER for the default
Full Name []:
Room Number []:
Work Phone []:
Home Phone []:
Other []:
Is the information correct? [Y/n] Y
info: Adding new user `hadoop' to supplemental / extra groups `users' ...
info: Adding user `hadoop' to group `users' ...
Give the user sudo privileges:
sudo usermod -aG sudo hadoop
Now switch to the hadoop user using:
su - hadoop
Enter the password for the user when prompted. Your prompt should now reflect that you are logged in as hadoop.
Output
To run a command as administrator (user "root"), use "sudo <command>".
See "man sudo_root" for details.
hadoop@demo:~$
#Step 4: Configure passwordless SSH
Next, you need to set up passwordless SSH. Hadoop uses SSH to manage its services across nodes (or, in pseudo-distributed mode, to connect to localhost on a single node).
Ubuntu servers usually come with the OpenSSH client and server installed. You can confirm this with the following commands:
ssh -V
Output
OpenSSH_9.6p1 Ubuntu-3ubuntu13.11, OpenSSL 3.0.13 30 Jan 2024
systemctl status ssh
Output
● ssh.service - OpenBSD Secure Shell server
Loaded: loaded (/usr/lib/systemd/system/ssh.service; disabled; preset: enabled)
Active: active (running) since Wed 2025-05-14 04:42:10 EEST; 31min ago
TriggeredBy: ● ssh.socket
Docs: man:sshd(8)
man:sshd_config(5)
Main PID: 1076 (sshd)
Tasks: 1 (limit: 4655)
Memory: 4.1M (peak: 5.3M)
CPU: 154ms
CGroup: /system.slice/ssh.service
└─1076 "sshd: /usr/sbin/sshd -D [listener] 0 of 10-100 startups"
May 14 04:42:10 demo systemd[1]: Starting ssh.service - OpenBSD Secure Shell server...
May 14 04:42:10 demo sshd[1076]: Server listening on :: port 22.
May 14 04:42:10 demo systemd[1]: Started ssh.service - OpenBSD Secure Shell server.
May 14 04:42:10 demo sshd[1079]: Connection closed by 188.214.133.131 port 37150
May 14 04:52:19 demo sshd[1125]: Accepted password for root from 102.90.80.130 port 2768 ssh2
May 14 04:52:19 demo sshd[1125]: pam_unix(sshd:session): session opened for user root(uid=0) by>
May 14 05:06:19 demo sshd[7904]: Accepted password for root from 102.90.80.130 port 2636 ssh2
May 14 05:06:19 demo sshd[7904]: pam_unix(sshd:session): session opened for user root(uid=0) by>
If the service isn't active or isn't found, install the packages and enable the service so it starts now and on every boot:
sudo apt install openssh-server openssh-client -y
sudo systemctl enable --now ssh
Now, generate an SSH keypair:
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
Output
Generating public/private rsa key pair.
Created directory '/home/hadoop/.ssh'.
Your identification has been saved in /home/hadoop/.ssh/id_rsa
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub
The key fingerprint is:
SHA256:kin50YdBqah1j/RG6KeGAF6zKIppkxEOU14vcBiuEuo hadoop@demo
The key's randomart image is:
+---[RSA 3072]----+
| .o .. |
| .+ o .. |
|.o.+ o o. |
|=+.o+.*+.o |
|Bo+o====S . |
|+=o. ooo=. |
|+Eo. ..+ |
|++ . o |
|. . . |
+----[SHA256]-----+
You need to add the generated public key to the hadoop user's list of authorized keys so it can log into itself (localhost) without a password. Do this using:
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
Then set the necessary permissions for the file to make sure only the owner of the file can read and write to it:
chmod 600 ~/.ssh/authorized_keys
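SSH is also strict about the permissions on the ~/.ssh directory itself. If the passwordless login in the next step still prompts for a password, tightening the directory permissions is a common fix:
chmod 700 ~/.ssh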
Now verify that the hadoop user can SSH to localhost without being prompted for a password:
ssh localhost
The first time you connect, you might see a message like "The authenticity of host 'localhost (127.0.0.1)' can't be established. ED25519 key fingerprint is SHA256:xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx. Are you sure you want to continue connecting (yes/no/[fingerprint])?" Type yes
and press Enter. You should then be logged in without being asked for a password.
Output
The authenticity of host 'localhost (127.0.0.1)' can't be established.
ED25519 key fingerprint is SHA256:ImVq/XOAViXspPZg1grGdz0E1Q8u1OZ9Cdsk45HAuWY.
This key is not known by any other names.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added 'localhost' (ED25519) to the list of known hosts.
Welcome to Ubuntu 24.04.2 LTS (GNU/Linux 6.8.0-58-generic x86_64)
* Documentation: https://help.ubuntu.com
* Management: https://landscape.canonical.com
* Support: https://ubuntu.com/pro
System information as of Mon May 12 20:47:26 EEST 2025
System load: 0.0 Processes: 137
Usage of /: 3.3% of 76.45GB Users logged in: 1
Memory usage: 8% IPv4 address for eth0: 5.199.161.101
Swap usage: 0%
Expanded Security Maintenance for Applications is not enabled.
0 updates can be applied immediately.
Enable ESM Apps to receive additional future security updates.
See https://ubuntu.com/esm or run: sudo pro status
*** System restart required ***
Last login: Mon May 12 20:29:21 2025 from 105.113.10.163
Once you've confirmed the passwordless login, exit the SSH session to return to your original prompt:
exit
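If you'd like a non-interactive way to confirm that key-based login works, you can tell SSH to fail rather than fall back to a password prompt. This is just an optional convenience check:
# BatchMode disables password prompts, so this fails loudly if the key isn't used
ssh -o BatchMode=yes localhost 'echo passwordless SSH works'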
#Step 5: Download and extract Hadoop
Now, still as the hadoop user, you'll download and extract Apache Hadoop.
Open a web browser and go to the official Apache Hadoop Releases page. Look for the latest stable binary release and copy its address. Then, download it using wget:
cd ~
wget https://dlcdn.apache.org/hadoop/common/hadoop-3.4.1/hadoop-3.4.1.tar.gz
Output
--2025-05-12 20:55:06-- https://dlcdn.apache.org/hadoop/common/hadoop-3.4.1/hadoop-3.4.1.tar.gz
Resolving dlcdn.apache.org (dlcdn.apache.org)... 151.101.2.132, 2a04:4e42::644
Connecting to dlcdn.apache.org (dlcdn.apache.org)|151.101.2.132|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 974002355 (929M) [application/x-gzip]
Saving to: ‘hadoop-3.4.1.tar.gz’
hadoop-3.4.1.tar.gz 100%[===============================>] 928.88M 72.4MB/s in 12s
2025-05-12 20:55:26 (80.4 MB/s) - ‘hadoop-3.4.1.tar.gz’ saved [974002355/974002355]
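Before extracting, it's worth verifying the download against the SHA-512 checksum published alongside the release (the .sha512 URL below assumes the mirror follows the usual naming convention). Compare the two hashes; they should match exactly:
wget https://dlcdn.apache.org/hadoop/common/hadoop-3.4.1/hadoop-3.4.1.tar.gz.sha512
# Compute the local hash and compare it with the published one
sha512sum hadoop-3.4.1.tar.gz
cat hadoop-3.4.1.tar.gz.sha512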
Once the download is complete, extract the downloaded file:
tar -xzf hadoop-*.tar.gz
It's common practice to place Hadoop in a standard location like /usr/local/. Move the extracted folder there and rename it to hadoop for simplicity:
sudo mv hadoop-3.4.1 /usr/local/hadoop
Now ensure the hadoop user owns the /usr/local/hadoop directory and its contents:
sudo chown -R hadoop:hadoop /usr/local/hadoop
#Step 6: Configure Hadoop environment variables
You need to set several environment variables so that your system and Hadoop can locate necessary files and configurations.
Open the .bashrc file for editing:
nano ~/.bashrc
Add the following lines to the end of the file. These variables tell your system where your Hadoop installation resides and are used by Hadoop scripts.
# Hadoop Environment Variables
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
Next, set the JAVA_HOME environment variable so that Hadoop and other applications can find your Java installation. Still in the same ~/.bashrc file, add the following lines:
export JAVA_HOME=$(dirname $(dirname $(readlink -f $(which java))))
export PATH=$PATH:$JAVA_HOME/bin
Save the changes and exit. Then source the ~/.bashrc file to apply the changes:
source ~/.bashrc
Verify that the JAVA_HOME environment variable has been set correctly:
echo $JAVA_HOME
Output
/usr/lib/jvm/java-11-openjdk-amd64
You can verify your Hadoop installation using:
hadoop version
Output
Hadoop 3.4.1
Source code repository https://github.com/apache/hadoop.git -r 4d7825309348956336b8f06a08322b78422849b1
Compiled by mthakur on 2024-10-09T14:57Z
Compiled on platform linux-x86_64
Compiled with protoc 3.23.4
From source with checksum 7292fe9dba5e2e44e3a9f763fce3e680
This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-3.4.1.jar
Verify that HADOOP_HOME is set correctly:
echo $HADOOP_HOME
Output
/usr/local/hadoop
Next, ensure that Hadoop itself knows the location of your Java installation by explicitly setting JAVA_HOME in Hadoop's configuration. Open the hadoop-env.sh file for editing:
nano $HADOOP_HOME/etc/hadoop/hadoop-env.sh
Look for the line that starts with # export JAVA_HOME=. Uncomment it (remove the #) and set it to your JAVA_HOME path:
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
Save and close the file (Ctrl+X, then Y, then Enter).
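To confirm the change without reopening the editor, you can simply print any active JAVA_HOME line from the file:
grep -E '^export JAVA_HOME' $HADOOP_HOME/etc/hadoop/hadoop-env.sh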
#Step 7: Configure Hadoop XML Files for Pseudo-Distributed Mode
In this step, you'll configure the core XML files located in $HADOOP_HOME/etc/hadoop/. These files dictate how Hadoop functions in pseudo-distributed mode.
Start by creating the directories Hadoop will use for HDFS data storage:
mkdir -p $HADOOP_HOME/hdfs/namenode
mkdir -p $HADOOP_HOME/hdfs/datanode
Then set the ownership of these directories and their contents to the hadoop user:
sudo chown -R hadoop:hadoop $HADOOP_HOME/hdfs
Now, let's edit the XML configuration files.
- core-site.xml
Open the core-site.xml file:
nano $HADOOP_HOME/etc/hadoop/core-site.xml
Replace the empty <configuration></configuration> tags with the following:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
    <description>The default file system URI</description>
  </property>
</configuration>
hdfs://localhost:9000 tells Hadoop to use HDFS running on localhost at port 9000.
- hdfs-site.xml
Open the hdfs-site.xml file with the Nano editor:
nano $HADOOP_HOME/etc/hadoop/hdfs-site.xml
Add the following between the <configuration> tags:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>Default block replication.</description>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>file:///usr/local/hadoop/hdfs/namenode</value>
    <description>Path on the local filesystem where the NameNode stores the namespace and transaction logs.</description>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>file:///usr/local/hadoop/hdfs/datanode</value>
    <description>Path on the local filesystem where the DataNode stores its blocks.</description>
  </property>
</configuration>
This configuration sets up a single-replica HDFS environment using the data directories you created earlier.
- mapred-site.xml
Open the mapred-site.xml file for editing:
nano $HADOOP_HOME/etc/hadoop/mapred-site.xml
Replace the empty <configuration></configuration> tags with:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
    <description>The runtime framework for MapReduce. Can be local, classic or yarn.</description>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.env</name>
    <value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
  </property>
  <property>
    <name>mapreduce.map.env</name>
    <value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
  </property>
  <property>
    <name>mapreduce.reduce.env</name>
    <value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
  </property>
</configuration>
This enables MapReduce to run on YARN and sets the appropriate environment variables.
- yarn-site.xml
Finally, edit the yarn-site.xml file:
nano $HADOOP_HOME/etc/hadoop/yarn-site.xml
Add the following between the <configuration> tags:
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
    <description>Auxiliary services required by the NodeManager.</description>
  </property>
</configuration>
This enables the shuffle service, which is necessary for running MapReduce jobs on YARN.
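As an optional sanity check, you can ask Hadoop to read back a couple of the values you just configured. The hdfs getconf utility parses the same XML files the daemons will use, so typos show up immediately:
hdfs getconf -confKey fs.defaultFS
# Expected: hdfs://localhost:9000
hdfs getconf -confKey dfs.replication
# Expected: 1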
#Step 8: Format the HDFS NameNode
Before HDFS can be used, the NameNode must be formatted. This step initializes the HDFS file system and sets up the directory structure defined in your hdfs-site.xml.
Still as the hadoop user, run:
hdfs namenode -format
Only run this command once during the initial setup. Reformatting will erase all data stored in HDFS.
This will output several log messages. Toward the end, look for confirmation lines like: INFO common.Storage: Storage directory /usr/local/hadoop/hdfs/namenode has been successfully formatted. And also: INFO namenode.NameNode: SHUTDOWN_MSG: Shutting down NameNode at ...
Output
WARNING: /usr/local/hadoop/logs does not exist. Creating.
2025-05-13 01:56:40,856 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = demo/127.0.1.1
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 3.4.1
STARTUP_MSG: classpath = /usr/local/hadoop/etc/hadoop:/usr/local/hadoop/share/hadoop/common/lib/curator-framework-5.2.0.jar:... (truncated for brevity) ...
STARTUP_MSG: build = https://github.com/apache/hadoop.git -r 4d7825309348956336b8f06a08322b78422849b1; compiled by 'mthakur' on 2024-10-09T14:57Z
STARTUP_MSG: java = 11.0.27
************************************************************/
2025-05-13 01:56:40,873 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
2025-05-13 01:56:41,072 INFO namenode.NameNode: createNameNode [-format]
2025-05-13 01:56:42,003 INFO namenode.NameNode: Formatting using clusterid: CID-8e15e115-84e9-4400-8014-68ad4b72a38f
2025-05-13 01:56:42,080 INFO namenode.FSEditLog: Edit logging is async:true
2025-05-13 01:56:42,142 INFO namenode.FSNamesystem: KeyProvider: null
2025-05-13 01:56:42,145 INFO namenode.FSNamesystem: fsLock is fair: true
2025-05-13 01:56:42,182 INFO namenode.FSNamesystem: fsOwner = hadoop (auth:SIMPLE)
2025-05-13 01:56:42,183 INFO namenode.FSNamesystem: isPermissionEnabled = true
... (configuration details and GSet info omitted for brevity) ...
2025-05-13 01:56:42,963 INFO namenode.FSImage: Allocated new BlockPoolId: BP-42078627-127.0.1.1-1747090602953
2025-05-13 01:56:42,999 INFO common.Storage: Storage directory /usr/local/hadoop/hdfs/namenode has been successfully formatted.
2025-05-13 01:56:43,056 INFO namenode.FSImageFormatProtobuf: Saving image file /usr/local/hadoop/hdfs/namenode/current/fsimage.ckpt_0000000000000000000 using no compression
2025-05-13 01:56:43,263 INFO namenode.FSImageFormatProtobuf: Image file /usr/local/hadoop/hdfs/namenode/current/fsimage.ckpt_0000000000000000000 of size 401 bytes saved in 0 seconds .
2025-05-13 01:56:43,342 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at demo/127.0.1.1
************************************************************/
#Step 9: Start Hadoop services
With the configuration complete, it’s time to start the core Hadoop services. Hadoop provides handy scripts to start both HDFS and YARN services.
Run the following script to start the NameNode, DataNode, and SecondaryNameNode:
start-dfs.sh
Output
Starting namenodes on [localhost]
Starting datanodes
Starting secondary namenodes [demo]
demo: Warning: Permanently added 'demo' (ED25519) to the list of known hosts.
Next, start the ResourceManager and NodeManager:
start-yarn.sh
Output
Starting resourcemanager
Starting nodemanagers
After running both scripts, your Hadoop services should be up and running.
#Step 10: Verify that Hadoop daemons are running
To confirm that all Hadoop services started successfully, use the jps command. This tool lists all Java processes currently running on the system.
jps
You should see output similar to this (Process IDs will vary):
Output
9300 NameNode
10373 Jps
9435 DataNode
10029 NodeManager
9885 ResourceManager
9647 SecondaryNameNode
Seeing these processes confirms that Hadoop is running correctly.
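For a more detailed check than jps, you can query the daemons directly with standard Hadoop utilities; the exact figures in the output will depend on your machine:
# Summarize HDFS capacity and the single live DataNode
hdfs dfsadmin -report
# List the NodeManagers registered with YARN
yarn node -list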
#Step 11: Access Web UIs
Hadoop provides built-in web interfaces to help you monitor the health and activity of your cluster. You can view real-time status, nodes, file system usage, and running applications.
Open a web browser and navigate to the Hadoop web interfaces:
- If you’re running Hadoop locally on your own machine, use:
  - HDFS NameNode UI: http://localhost:9870
  - YARN ResourceManager UI: http://localhost:8088
- If Hadoop is running on a remote server (like a Cherry Servers instance), replace localhost with your server’s public IP address:
  - HDFS NameNode UI: http://<server-ip>:9870
  - YARN ResourceManager UI: http://<server-ip>:8088
Once you open these URLs, you should see dashboards showing cluster health, node status, and running applications (initially empty if no jobs have been submitted yet).
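If the server's firewall blocks ports 9870 and 8088, or you'd rather not expose them publicly, an SSH tunnel from your local machine is a simple alternative (replace the user and IP with your own details):
# Forward both web UIs to your local machine
ssh -L 9870:localhost:9870 -L 8088:localhost:8088 hadoop@<server-ip>
# Then browse to http://localhost:9870 and http://localhost:8088 locally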
#Step 12: Stop Hadoop services
When you're done, you can stop the Hadoop services.
Stop the YARN daemons by running:
stop-yarn.sh
You should see something like:
Output
Stopping nodemanagers
Stopping resourcemanager
Stop the HDFS daemons using:
stop-dfs.sh
Expected output:
Output
Stopping namenodes on [localhost]
Stopping datanodes
Stopping secondary namenodes [demo]
Run jps again to confirm that all Hadoop processes have been terminated:
jps
You should only see the Jps process in the output, like this:
Output
18704 Jps
#Conclusion
You’ve now successfully installed and configured Apache Hadoop in pseudo-distributed mode on your Ubuntu 24.04 server. From installing dependencies and configuring environment variables to starting the core services and verifying everything with the web UIs, you now have a functional mini-cluster running on a single machine. This setup is ideal for learning how Hadoop works under the hood and for running smaller-scale MapReduce jobs without the need for multiple servers.
Now that you’re familiar with the core components, why not try setting up a multi-node Hadoop cluster? This will give you hands-on experience with real-world distributed data processing and node-to-node communication.