How to Install Hadoop on Ubuntu 24.04: Step-by-Step Guide

Thinking about processing large datasets on Ubuntu 24.04, but not sure where to start? You don’t need expensive hardware or a complex cluster to begin. If your goal is to store and analyze massive amounts of structured or unstructured data, Apache Hadoop offers a practical and cost-effective solution. It distributes both storage and processing using a simple programming model, making it ideal for big data workloads even on modest infrastructure.
This guide walks you through how to install and run Hadoop on Ubuntu 24.04. You’ll set it up in pseudo-distributed mode on a single machine so you can learn the fundamentals without managing multiple servers.
Whether you’re using a local desktop or a Cherry Servers instance, you’ll go through each command and configuration step needed to get Hadoop running in pseudo-distributed mode.
#What is Hadoop?
Apache Hadoop is an open-source framework designed to store and process large datasets, commonly called "Big Data", across clusters of computers. Instead of using one large computer to store and process the data, Hadoop allows clustering multiple computers to analyze massive datasets in parallel more quickly. Think of it as a powerful toolkit that allows many machines to work together to solve data problems too big for a single server.
At its core, Hadoop gives you:
- HDFS (Hadoop Distributed File System): This is Hadoop's storage solution. It cleverly breaks down massive files into smaller pieces and spreads them across multiple machines, ensuring your data is safe and accessible even if a server goes down.
- YARN (Yet Another Resource Negotiator): YARN manages all the resources in your Hadoop cluster, like CPU and memory, and schedules all the tasks you throw at it.
- MapReduce: MapReduce is Hadoop’s original processing engine. It works by breaking a large data processing job into smaller tasks, mapping those tasks across different nodes for parallel execution, and then reducing the results into a final output (see the short sketch after this list).
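To make this concrete, here is a minimal sketch of what running a MapReduce job looks like once Hadoop is installed; you'll be able to try it after completing the steps below. The input and output paths are illustrative, and the jar version should match your install (the word-count example jar ships with the Hadoop binary distribution):
# Put a local text file into HDFS (paths are examples)
hdfs dfs -mkdir -p /user/hadoop/input
hdfs dfs -put /etc/hostname /user/hadoop/input
# Run the bundled word-count example; adjust the jar version to match your install
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.4.1.jar wordcount /user/hadoop/input /user/hadoop/output
# Inspect the reduced result
hdfs dfs -cat /user/hadoop/output/part-r-00000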
#What is Hadoop used for?
Hadoop is the backbone for countless Big Data applications, including:
- Large-scale data analysis: When you need to crunch numbers from massive datasets.
- Building data warehouses and data lakes: Essential for storing and organizing vast amounts of information for future use.
- Log analysis: Understanding what's happening with your web servers or applications by sifting through mountains of log files.
- Recommendation engines: Think Netflix or Amazon suggesting what you might like next.
- Machine learning tasks: Providing the infrastructure for training complex AI models on huge datasets.
In short, if you're dealing with a lot of data and need a scalable, fault-tolerant way to store and process it, Hadoop is your go-to solution. It lets you extract valuable insights from information that would otherwise be unmanageable.
Deploy and scale your projects with Cherry Servers' cost-effective dedicated or virtual servers. Enjoy seamless scaling, pay-as-you-go pricing, and 24/7 expert support—all within a hassle-free cloud environment.
#Prerequisites
This guide includes a hands-on demonstration. To follow along with setting up Apache Hadoop on Ubuntu 24.04 in pseudo-distributed mode, ensure you have:
- Ubuntu 24.04 server with at least 2 CPU cores, 4GB RAM, and 20GB of free disk space.
- Java Development Kit (JDK) installed (OpenJDK 11 is recommended).
- openssh-server and openssh-client installed and the SSH service running.
#How to install Hadoop on Ubuntu 24.04 (Pseudo-Distributed Mode)
In pseudo-distributed mode, all Hadoop daemons (NameNode, DataNode, ResourceManager, NodeManager) run on a single machine. This setup is configured to mimic a multi-node cluster, allowing you to test HDFS operations and run MapReduce or YARN applications as if you had a small cluster. It's the ideal starting point for learning Hadoop.
#Step 1: Update system packages on Ubuntu 24.04
First, SSH into your server if you’re working remotely. Then, update your system’s package list to ensure it’s ready for the installation process. Open a terminal and run:
sudo apt update && sudo apt upgrade -y
Output
Hit:1 http://archive.ubuntu.com/ubuntu noble InRelease
...
Fetched 34.0 MB in 14s (2364 kB/s)
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
6 packages can be upgraded. Run 'apt list --upgradable' to see them.
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Calculating upgrade... Done
The following NEW packages will be installed:
linux-headers-6.8.0-59 linux-headers-6.8.0-59-generic linux-image-6.8.0-59-generic
linux-modules-6.8.0-59-generic linux-tools-6.8.0-59 linux-tools-6.8.0-59-generic
The following packages will be upgraded:
linux-headers-generic linux-headers-virtual linux-image-virtual linux-libc-dev
linux-tools-common linux-virtual
6 upgraded, 6 newly installed, 0 to remove and 0 not upgraded.
...
Fetched 80.1 MB in 2s (38.5 MB/s)
Selecting previously unselected package linux-headers-6.8.0-59.
...
Setting up linux-virtual (6.8.0-59.61) ...
Processing triggers for man-db (2.12.0-4build2) ...
Processing triggers for linux-image-6.8.0-59-generic (6.8.0-59.61) ...
...
Generating grub configuration file ...
...
done
Scanning processes...
Scanning linux images...
Pending kernel upgrade!
Running kernel version:
6.8.0-58-generic
Diagnostics:
The currently running kernel version is not the expected kernel version 6.8.0-59-generic.
Restarting the system to load the new kernel will not be handled automatically, so you should
consider rebooting.
No services need to be restarted.
...
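The sample output above reports a pending kernel upgrade. If you see a similar notice, consider rebooting before continuing so the server boots into the new kernel, then reconnect (the username and IP below are placeholders for your own login details):
sudo reboot
# After the server comes back up, reconnect, for example:
ssh your_user@your_server_ip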
#Step 2: Install OpenJDK for Hadoop
Hadoop is built on Java, so you need a Java Development Kit (JDK) installed. Install OpenJDK 11 by executing the following command:
sudo apt install openjdk-11-jdk -y
Output
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following additional packages will be installed:
... (a long list of libraries and other necessary files)
Suggested packages:
...
Recommended packages:
...
The following NEW packages will be installed:
... (many dependencies) ... openjdk-11-jdk ... (more dependencies)
0 upgraded, 92 newly installed, 0 to remove and 0 not upgraded.
Need to get 184 MB of archives.
After this operation, 553 MB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu noble/main amd64 alsa-topology-conf all 1.2.5.1-2 [15.5 kB]
... (many more Get lines) ...
Fetched 184 MB in 8s (22.9 MB/s)
Extracting templates from packages: 100%
Selecting previously unselected package alsa-topology-conf.
(Reading database ...)
Preparing to unpack .../alsa-topology-conf_1.2.5.1-2_all.deb ...
Unpacking alsa-topology-conf (1.2.5.1-2) ...
... (many more unpacking steps) ...
Setting up libgraphite2-3:amd64 (1.3.14-2build1) ...
Setting up ... (many more setting up steps) ...
Setting up openjdk-11-jdk:amd64 (11.0.27+6~us1-0ubuntu1~24.04) ...
Processing triggers for ...
...
Once that completes, check the version to confirm the installation:
java --version
You should see something similar to:
Output
openjdk 11.0.27 2025-04-15
OpenJDK Runtime Environment (build 11.0.27+6-post-Ubuntu-0ubuntu124.04)
OpenJDK 64-Bit Server VM (build 11.0.27+6-post-Ubuntu-0ubuntu124.04, mixed mode, sharing)
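If the machine happens to have more than one Java version installed, it can help to confirm which one the java command resolves to before continuing. This step is optional and only relevant when multiple JDKs coexist:
# List the installed alternatives and optionally switch the default
sudo update-alternatives --config java
# Print the resolved installation path (handy later when setting JAVA_HOME)
readlink -f $(which java)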
#Step 3: Create a dedicated user for Hadoop
Next, create a dedicated user for Hadoop. This helps isolate permissions and makes management easier. Create a user named hadoop (you can use something else if you prefer):
sudo adduser hadoop
You'll be prompted to set a password and provide some optional user information.
Output
info: Adding user `hadoop' ...
info: Selecting UID/GID from range 1000 to 59999 ...
info: Adding new group `hadoop' (1000) ...
info: Adding new user `hadoop' (1000) with group `hadoop (1000)' ...
info: Creating home directory `/home/hadoop' ...
info: Copying files from `/etc/skel' ...
New password:
Retype new password:
passwd: password updated successfully
Changing the user information for hadoop
Enter the new value, or press ENTER for the default
Full Name []:
Room Number []:
Work Phone []:
Home Phone []:
Other []:
Is the information correct? [Y/n] Y
info: Adding new user `hadoop' to supplemental / extra groups `users' ...
info: Adding user `hadoop' to group `users' ...
Give the user sudo privileges:
sudo usermod -aG sudo hadoop
Now switch to the hadoop user using:
su - hadoop
Enter the password for the user when prompted. Your prompt should now reflect that you are logged in as hadoop.
Output
To run a command as administrator (user "root"), use "sudo <command>".
See "man sudo_root" for details.
hadoop@demo:~$
#Step 4: Configure passwordless SSH
Next, you need to set up passwordless SSH. Hadoop uses SSH to manage its services across nodes (or, in pseudo-distributed mode, to connect to localhost on a single node).
Ubuntu servers usually come with the OpenSSH client and server installed. You can confirm this with the following commands:
ssh -V
Output
OpenSSH_9.6p1 Ubuntu-3ubuntu13.11, OpenSSL 3.0.13 30 Jan 2024
systemctl status ssh
Output
● ssh.service - OpenBSD Secure Shell server
Loaded: loaded (/usr/lib/systemd/system/ssh.service; disabled; preset: enabled)
Active: active (running) since Wed 2025-05-14 04:42:10 EEST; 31min ago
TriggeredBy: ● ssh.socket
Docs: man:sshd(8)
man:sshd_config(5)
Main PID: 1076 (sshd)
Tasks: 1 (limit: 4655)
Memory: 4.1M (peak: 5.3M)
CPU: 154ms
CGroup: /system.slice/ssh.service
└─1076 "sshd: /usr/sbin/sshd -D [listener] 0 of 10-100 startups"
May 14 04:42:10 demo systemd[1]: Starting ssh.service - OpenBSD Secure Shell server...
May 14 04:42:10 demo sshd[1076]: Server listening on :: port 22.
May 14 04:42:10 demo systemd[1]: Started ssh.service - OpenBSD Secure Shell server.
May 14 04:42:10 demo sshd[1079]: Connection closed by 188.214.133.131 port 37150
May 14 04:52:19 demo sshd[1125]: Accepted password for root from 102.90.80.130 port 2768 ssh2
May 14 04:52:19 demo sshd[1125]: pam_unix(sshd:session): session opened for user root(uid=0) by>
May 14 05:06:19 demo sshd[7904]: Accepted password for root from 102.90.80.130 port 2636 ssh2
May 14 05:06:19 demo sshd[7904]: pam_unix(sshd:session): session opened for user root(uid=0) by>
If the service isn't active or isn't found, install the packages and enable the service so it starts now and on every boot:
sudo apt install openssh-server openssh-client -y
sudo systemctl enable --now ssh
Now, generate an SSH keypair:
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
Output
Generating public/private rsa key pair.
Created directory '/home/hadoop/.ssh'.
Your identification has been saved in /home/hadoop/.ssh/id_rsa
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub
The key fingerprint is:
SHA256:kin50YdBqah1j/RG6KeGAF6zKIppkxEOU14vcBiuEuo hadoop@demo
The key's randomart image is:
+---[RSA 3072]----+
| .o .. |
| .+ o .. |
|.o.+ o o. |
|=+.o+.*+.o |
|Bo+o====S . |
|+=o. ooo=. |
|+Eo. ..+ |
|++ . o |
|. . . |
+----[SHA256]-----+
You need to add the generated public key to the hadoop user's list of authorized keys so it can log into itself (localhost) without a password. Do this using:
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
Then set the necessary permissions for the file to make sure only the owner of the file can read and write to it:
chmod 600 ~/.ssh/authorized_keys
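SSH is also strict about the permissions on the ~/.ssh directory itself. If the passwordless login in the next step still prompts for a password, tightening the directory permissions is a common fix:
chmod 700 ~/.ssh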
Now verify that the hadoop user can SSH to localhost without being prompted for a password:
ssh localhost
The first time you connect, you might see a message like "The authenticity of host 'localhost (127.0.0.1)' can't be established. ED25519 key fingerprint is SHA256:xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx. Are you sure you want to continue connecting (yes/no/[fingerprint])?" Type yes
and press Enter. You should then be logged in without being asked for a password.
Output
The authenticity of host 'localhost (127.0.0.1)' can't be established.
ED25519 key fingerprint is SHA256:ImVq/XOAViXspPZg1grGdz0E1Q8u1OZ9Cdsk45HAuWY.
This key is not known by any other names.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added 'localhost' (ED25519) to the list of known hosts.
Welcome to Ubuntu 24.04.2 LTS (GNU/Linux 6.8.0-58-generic x86_64)
* Documentation: https://help.ubuntu.com
* Management: https://landscape.canonical.com
* Support: https://ubuntu.com/pro
System information as of Mon May 12 20:47:26 EEST 2025
System load: 0.0 Processes: 137
Usage of /: 3.3% of 76.45GB Users logged in: 1
Memory usage: 8% IPv4 address for eth0: 5.199.161.101
Swap usage: 0%
Expanded Security Maintenance for Applications is not enabled.
0 updates can be applied immediately.
Enable ESM Apps to receive additional future security updates.
See https://ubuntu.com/esm or run: sudo pro status
*** System restart required ***
Last login: Mon May 12 20:29:21 2025 from 105.113.10.163
Once you've confirmed the passwordless login, exit the SSH session to return to your original prompt:
exit
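If you'd like a non-interactive way to confirm that key-based login works, you can tell SSH to fail rather than fall back to a password prompt. This is just an optional convenience check:
# BatchMode disables password prompts, so this fails loudly if the key isn't used
ssh -o BatchMode=yes localhost 'echo passwordless SSH works'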
#Step 5: Download and extract Hadoop
Now, still as the hadoop user, you'll download and extract Apache Hadoop.
Open a web browser and go to the official Apache Hadoop Releases page. Look for the latest stable binary release and copy its address. Then, download it using wget:
cd ~
wget https://dlcdn.apache.org/hadoop/common/hadoop-3.4.1/hadoop-3.4.1.tar.gz
Output
--2025-05-12 20:55:06-- https://dlcdn.apache.org/hadoop/common/hadoop-3.4.1/hadoop-3.4.1.tar.gz
Resolving dlcdn.apache.org (dlcdn.apache.org)... 151.101.2.132, 2a04:4e42::644
Connecting to dlcdn.apache.org (dlcdn.apache.org)|151.101.2.132|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 974002355 (929M) [application/x-gzip]
Saving to: ‘hadoop-3.4.1.tar.gz’
hadoop-3.4.1.tar.gz 100%[===============================>] 928.88M 72.4MB/s in 12s
2025-05-12 20:55:26 (80.4 MB/s) - ‘hadoop-3.4.1.tar.gz’ saved [974002355/974002355]
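Before extracting, it's worth verifying the download against the SHA-512 checksum published alongside the release (the .sha512 URL below assumes the mirror follows the usual naming convention). Compare the two hashes; they should match exactly:
wget https://dlcdn.apache.org/hadoop/common/hadoop-3.4.1/hadoop-3.4.1.tar.gz.sha512
# Compute the local hash and compare it with the published one
sha512sum hadoop-3.4.1.tar.gz
cat hadoop-3.4.1.tar.gz.sha512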
Once the download is complete, extract the downloaded file:
tar -xzf hadoop-*.tar.gz
It's common practice to place Hadoop in a standard location like /usr/local/. Move the extracted folder there and rename it to hadoop for simplicity:
sudo mv hadoop-3.4.1 /usr/local/hadoop
Now ensure the hadoop user owns the /usr/local/hadoop directory and its contents:
sudo chown -R hadoop:hadoop /usr/local/hadoop
#Step 6: Configure Hadoop environment variables
You need to set several environment variables so that your system and Hadoop can locate necessary files and configurations.
Open the .bashrc file for editing:
nano ~/.bashrc
Add the following lines to the end of the file. These variables tell your system where your Hadoop installation resides and are used by Hadoop scripts.
# Hadoop Environment Variables
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
Next, set the JAVA_HOME environment variable so that Hadoop and other applications can find your Java installation. Still in the same ~/.bashrc file, add the following lines:
export JAVA_HOME=$(dirname $(dirname $(readlink -f $(which java))))
export PATH=$PATH:$JAVA_HOME/bin
Save the changes and exit. Then source the ~/.bashrc file to apply the changes:
source ~/.bashrc
Verify that the JAVA_HOME environment variable has been set correctly:
echo $JAVA_HOME
Output
/usr/lib/jvm/java-11-openjdk-amd64
You can verify your Hadoop installation using:
hadoop version
Output
Hadoop 3.4.1
Source code repository https://github.com/apache/hadoop.git -r 4d7825309348956336b8f06a08322b78422849b1
Compiled by mthakur on 2024-10-09T14:57Z
Compiled on platform linux-x86_64
Compiled with protoc 3.23.4
From source with checksum 7292fe9dba5e2e44e3a9f763fce3e680
This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-3.4.1.jar
Verify that HADOOP_HOME is set correctly:
echo $HADOOP_HOME
Output
/usr/local/hadoop
Next, ensure that Hadoop itself knows the location of your Java installation by explicitly setting JAVA_HOME in Hadoop's configuration. Open the hadoop-env.sh file for editing:
nano $HADOOP_HOME/etc/hadoop/hadoop-env.sh
Look for the line that starts with # export JAVA_HOME=. Uncomment it (remove the #) and set it to your JAVA_HOME path:
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
Save and close the file (Ctrl+X, then Y, then Enter).
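To confirm the change without reopening the editor, you can simply print any active JAVA_HOME line from the file:
grep -E '^export JAVA_HOME' $HADOOP_HOME/etc/hadoop/hadoop-env.sh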
#Step 7: Configure Hadoop XML Files for Pseudo-Distributed Mode
In this step, you'll configure the core XML files located in $HADOOP_HOME/etc/hadoop/. These files dictate how Hadoop functions in pseudo-distributed mode.
Start by creating the directories Hadoop will use for HDFS data storage:
mkdir -p $HADOOP_HOME/hdfs/namenode
mkdir -p $HADOOP_HOME/hdfs/datanode
Then set the ownership of these directories and their contents to the hadoop user:
sudo chown -R hadoop:hadoop $HADOOP_HOME/hdfs
Now, let's edit the XML configuration files.
- core-site.xml
Open the core-site.xml file:
nano $HADOOP_HOME/etc/hadoop/core-site.xml
Replace the empty <configuration></configuration> tags with the following:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
    <description>The default file system URI</description>
  </property>
</configuration>
hdfs://localhost:9000 tells Hadoop to use HDFS running on localhost at port 9000.
- hdfs-site.xml
Open the hdfs-site.xml file with the Nano editor:
nano $HADOOP_HOME/etc/hadoop/hdfs-site.xml
Add the following between the <configuration> tags:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>Default block replication.</description>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>file:///usr/local/hadoop/hdfs/namenode</value>
    <description>Path on the local filesystem where the NameNode stores the namespace and transaction logs.</description>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>file:///usr/local/hadoop/hdfs/datanode</value>
    <description>Path on the local filesystem where the DataNode stores its blocks.</description>
  </property>
</configuration>
This configuration sets up a single-replica HDFS environment using the data directories you created earlier.
- mapred-site.xml
Open the mapred-site.xml file for editing:
nano $HADOOP_HOME/etc/hadoop/mapred-site.xml
Replace the empty <configuration></configuration> tags with:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
    <description>The runtime framework for MapReduce. Can be local, classic or yarn.</description>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.env</name>
    <value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
  </property>
  <property>
    <name>mapreduce.map.env</name>
    <value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
  </property>
  <property>
    <name>mapreduce.reduce.env</name>
    <value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
  </property>
</configuration>
This enables MapReduce to run on YARN and sets the appropriate environment variables.
- yarn-site.xml
Finally, edit the yarn-site.xml file:
nano $HADOOP_HOME/etc/hadoop/yarn-site.xml
Add the following between the <configuration> tags:
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
    <description>Auxiliary services required by the NodeManager.</description>
  </property>
</configuration>
This enables the shuffle service, which is necessary for running MapReduce jobs on YARN.
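As an optional sanity check, you can ask Hadoop to read back a couple of the values you just configured. The hdfs getconf utility parses the same XML files the daemons will use, so typos show up immediately:
hdfs getconf -confKey fs.defaultFS
# Expected: hdfs://localhost:9000
hdfs getconf -confKey dfs.replication
# Expected: 1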
#Step 8: Format the HDFS NameNode
Before HDFS can be used, the NameNode must be formatted. This step initializes the HDFS file system and sets up the directory structure defined in your hdfs-site.xml.
Still as the hadoop user, run:
hdfs namenode -format
Only run this command once during the initial setup. Reformatting will erase all data stored in HDFS.
This will output several log messages. Toward the end, look for confirmation lines like: INFO common.Storage: Storage directory /usr/local/hadoop/hdfs/namenode has been successfully formatted. And also: INFO namenode.NameNode: SHUTDOWN_MSG: Shutting down NameNode at ...
Output
WARNING: /usr/local/hadoop/logs does not exist. Creating.
2025-05-13 01:56:40,856 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = demo/127.0.1.1
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 3.4.1
STARTUP_MSG: classpath = /usr/local/hadoop/etc/hadoop:/usr/local/hadoop/share/hadoop/common/lib/curator-framework-5.2.0.jar:... (truncated for brevity) ...
STARTUP_MSG: build = https://github.com/apache/hadoop.git -r 4d7825309348956336b8f06a08322b78422849b1; compiled by 'mthakur' on 2024-10-09T14:57Z
STARTUP_MSG: java = 11.0.27
************************************************************/
2025-05-13 01:56:40,873 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
2025-05-13 01:56:41,072 INFO namenode.NameNode: createNameNode [-format]
2025-05-13 01:56:42,003 INFO namenode.NameNode: Formatting using clusterid: CID-8e15e115-84e9-4400-8014-68ad4b72a38f
2025-05-13 01:56:42,080 INFO namenode.FSEditLog: Edit logging is async:true
2025-05-13 01:56:42,142 INFO namenode.FSNamesystem: KeyProvider: null
2025-05-13 01:56:42,145 INFO namenode.FSNamesystem: fsLock is fair: true
2025-05-13 01:56:42,182 INFO namenode.FSNamesystem: fsOwner = hadoop (auth:SIMPLE)
2025-05-13 01:56:42,183 INFO namenode.FSNamesystem: isPermissionEnabled = true
... (configuration details and GSet info omitted for brevity) ...
2025-05-13 01:56:42,963 INFO namenode.FSImage: Allocated new BlockPoolId: BP-42078627-127.0.1.1-1747090602953
2025-05-13 01:56:42,999 INFO common.Storage: Storage directory /usr/local/hadoop/hdfs/namenode has been successfully formatted.
2025-05-13 01:56:43,056 INFO namenode.FSImageFormatProtobuf: Saving image file /usr/local/hadoop/hdfs/namenode/current/fsimage.ckpt_0000000000000000000 using no compression
2025-05-13 01:56:43,263 INFO namenode.FSImageFormatProtobuf: Image file /usr/local/hadoop/hdfs/namenode/current/fsimage.ckpt_0000000000000000000 of size 401 bytes saved in 0 seconds .
2025-05-13 01:56:43,342 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at demo/127.0.1.1
************************************************************/
#Step 9: Start Hadoop services
With the configuration complete, it’s time to start the core Hadoop services. Hadoop provides handy scripts to start both HDFS and YARN services.
Run the following script to start the NameNode, DataNode, and SecondaryNameNode:
start-dfs.sh
Output
Starting namenodes on [localhost]
Starting datanodes
Starting secondary namenodes [demo]
demo: Warning: Permanently added 'demo' (ED25519) to the list of known hosts.
Next, start the ResourceManager and NodeManager:
start-yarn.sh
Output
Starting resourcemanager
Starting nodemanagers
After running both scripts, your Hadoop services should be up and running.
#Step 10: Verify that Hadoop daemons are running
To confirm that all Hadoop services started successfully, use the jps command. This tool lists all Java processes currently running on the system.
jps
You should see output similar to this (Process IDs will vary):
Output
9300 NameNode
10373 Jps
9435 DataNode
10029 NodeManager
9885 ResourceManager
9647 SecondaryNameNode
Seeing these processes confirms that Hadoop is running correctly.
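For a more detailed check than jps, you can query the daemons directly with standard Hadoop utilities; the exact figures in the output will depend on your machine:
# Summarize HDFS capacity and the single live DataNode
hdfs dfsadmin -report
# List the NodeManagers registered with YARN
yarn node -list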
#Step 11: Access Web UIs
Hadoop provides built-in web interfaces to help you monitor the health and activity of your cluster. You can view real-time status, nodes, file system usage, and running applications.
Open a web browser and navigate to the Hadoop web interfaces:
- If you’re running Hadoop locally on your own machine, use:
  - HDFS NameNode UI: http://localhost:9870
  - YARN ResourceManager UI: http://localhost:8088
- If Hadoop is running on a remote server (like a Cherry Servers instance), replace localhost with your server’s public IP address:
  - HDFS NameNode UI: http://<server-ip>:9870
  - YARN ResourceManager UI: http://<server-ip>:8088
Once you open these URLs, you should see dashboards showing cluster health, node status, and running applications (initially empty if no jobs have been submitted yet).
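If the server's firewall blocks ports 9870 and 8088, or you'd rather not expose them publicly, an SSH tunnel from your local machine is a simple alternative (replace the user and IP with your own details):
# Forward both web UIs to your local machine
ssh -L 9870:localhost:9870 -L 8088:localhost:8088 hadoop@<server-ip>
# Then browse to http://localhost:9870 and http://localhost:8088 locally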
#Step 12: Stop Hadoop services
When you're done, you can stop the Hadoop services.
Stop the YARN daemons by running:
stop-yarn.sh
You should see something like:
Output
Stopping nodemanagers
Stopping resourcemanager
Stop the HDFS daemons using:
stop-dfs.sh
Expected output:
Output
Stopping namenodes on [localhost]
Stopping datanodes
Stopping secondary namenodes [demo]
Run jps again to confirm that all Hadoop processes have been terminated:
jps
You should only see the Jps process in the output, like this:
Output
18704 Jps
#Conclusion
You’ve now successfully installed and configured Apache Hadoop in pseudo-distributed mode on your Ubuntu 24.04 server. From installing dependencies and configuring environment variables to starting the core services and verifying everything with the web UIs, you now have a functional mini-cluster running on a single machine. This setup is ideal for learning how Hadoop works under the hood and for running smaller-scale MapReduce jobs without the need for multiple servers.
Now that you’re familiar with the core components, why not try setting up a multi-node Hadoop cluster? This will give you hands-on experience with real-world distributed data processing and node-to-node communication.