How to Install Hadoop on Ubuntu 24.04: Step-by-Step Guide

Published on Aug 4, 2025 Updated on Aug 4, 2025

Thinking about processing large datasets on Ubuntu 24.04, but not sure where to start? You don’t need expensive hardware or a complex cluster to begin. If your goal is to store and analyze massive amounts of structured or unstructured data, Apache Hadoop offers a practical and cost-effective solution. It distributes both storage and processing using a simple programming model, making it ideal for big data workloads even on modest infrastructure.

This guide walks you through how to install and run Hadoop on Ubuntu 24.04. You’ll set it up in pseudo-distributed mode on a single machine so you can learn the fundamentals without managing multiple servers.

Whether you’re using a local desktop or a Cherry Servers instance, you’ll walk through each command and configuration step needed to set up Hadoop in pseudo-distributed mode.

#What is Hadoop?

Apache Hadoop is an open-source framework designed to store and process massive datasets, commonly called "Big Data", across clusters of computers. Instead of relying on a single large computer to store and process the data, Hadoop clusters multiple machines so they can analyze massive datasets in parallel. Think of it as a powerful toolkit that allows many machines to work together to solve data problems too big for a single server.

At its core, Hadoop gives you:

  • HDFS (Hadoop Distributed File System): This is Hadoop's storage solution. It cleverly breaks down massive files into smaller pieces and spreads them across multiple machines, ensuring your data is safe and accessible even if a server goes down.

  • YARN (Yet Another Resource Negotiator): YARN manages all the resources in your Hadoop cluster, like CPU and memory, and schedules all the tasks you throw at it.

  • MapReduce: MapReduce is Hadoop’s original processing engine. It works by breaking a large data processing job into smaller tasks, mapping those tasks across different nodes for parallel execution, and then reducing the results into a final output.
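
To get a feel for the map-shuffle-reduce pattern before installing anything, here is a rough single-machine analogy using plain Unix tools (sample.txt is just a placeholder text file, not part of Hadoop): the first stage "maps" each word onto its own line, sorting "shuffles" identical words next to each other, and counting "reduces" each group to a total. Hadoop's value is running these same phases in parallel across many machines.

Command Line
# map: one word per line; shuffle: sort groups identical words; reduce: count each group
tr -s '[:space:]' '\n' < sample.txt | sort | uniq -c | sort -rn | head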

#What is Hadoop used for?

Hadoop is the backbone for countless Big Data applications, including:

  • Large-scale data analysis: When you need to crunch numbers from massive datasets.

  • Building data warehouses and data lakes: Essential for storing and organizing vast amounts of information for future use.

  • Log analysis: Understanding what's happening with your web servers or applications by sifting through mountains of log files.

  • Recommendation engines: Think Netflix or Amazon suggesting what you might like next.

  • Machine learning tasks: Providing the infrastructure for training complex AI models on huge datasets.

In short, if you're dealing with a lot of data and need a scalable, fault-tolerant way to store and process it, Hadoop is your go-to solution. It lets you extract valuable insights from information that would otherwise be unmanageable.

#Prerequisites

This guide includes a hands-on demonstration. To follow along with setting up Apache Hadoop on Ubuntu 24.04 in pseudo-distributed mode, ensure you have:

  • An Ubuntu 24.04 machine, either a local desktop or a remote server such as a Cherry Servers instance.

  • A user account with sudo privileges and terminal (or SSH) access.

  • Enough free disk space for the roughly 1 GB Hadoop download and its extracted files.

#How to install Hadoop on Ubuntu 24.04 (Pseudo-Distributed Mode)

In pseudo-distributed mode, all Hadoop daemons (NameNode, DataNode, ResourceManager, NodeManager) run on a single machine. This setup is configured to mimic a multi-node cluster, allowing you to test HDFS operations and run MapReduce or YARN applications as if you had a small cluster. It's the ideal starting point for learning Hadoop.

#Step 1: Update system packages on Ubuntu 24.04

First, SSH into your server if you’re working remotely. Then, update your system’s package list to ensure it’s ready for the installation process. Open a terminal and run:

Command Line
sudo apt update && sudo apt upgrade -y
Output
Hit:1 http://archive.ubuntu.com/ubuntu noble InRelease
...
Fetched 34.0 MB in 14s (2364 kB/s)
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
6 packages can be upgraded. Run 'apt list --upgradable' to see them.
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Calculating upgrade... Done
The following NEW packages will be installed:
  linux-headers-6.8.0-59 linux-headers-6.8.0-59-generic linux-image-6.8.0-59-generic
  linux-modules-6.8.0-59-generic linux-tools-6.8.0-59 linux-tools-6.8.0-59-generic
The following packages will be upgraded:
  linux-headers-generic linux-headers-virtual linux-image-virtual linux-libc-dev
  linux-tools-common linux-virtual
6 upgraded, 6 newly installed, 0 to remove and 0 not upgraded.
...
Fetched 80.1 MB in 2s (38.5 MB/s)
Selecting previously unselected package linux-headers-6.8.0-59.
...
Setting up linux-virtual (6.8.0-59.61) ...
Processing triggers for man-db (2.12.0-4build2) ...
Processing triggers for linux-image-6.8.0-59-generic (6.8.0-59.61) ...
...
Generating grub configuration file ...
...
done
Scanning processes...
Scanning linux images...

Pending kernel upgrade!
Running kernel version:
  6.8.0-58-generic
Diagnostics:
  The currently running kernel version is not the expected kernel version 6.8.0-59-generic.

Restarting the system to load the new kernel will not be handled automatically, so you should
consider rebooting.

No services need to be restarted.
...

#Step 2: Install OpenJDK for Hadoop

Hadoop is built on Java, so you need a Java Development Kit (JDK) installed. Install OpenJDK 11 by executing the following command:

Command Line
sudo apt install openjdk-11-jdk -y
Output
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following additional packages will be installed:
  ... (a long list of libraries and other necessary files)
Suggested packages:
  ...
Recommended packages:
  ...
The following NEW packages will be installed:
  ... (many dependencies) ... openjdk-11-jdk ... (more dependencies)
0 upgraded, 92 newly installed, 0 to remove and 0 not upgraded.
Need to get 184 MB of archives.
After this operation, 553 MB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu noble/main amd64 alsa-topology-conf all 1.2.5.1-2 [15.5 kB]
... (many more Get lines) ...
Fetched 184 MB in 8s (22.9 MB/s)
Extracting templates from packages: 100%
Selecting previously unselected package alsa-topology-conf.
(Reading database ...)
Preparing to unpack .../alsa-topology-conf_1.2.5.1-2_all.deb ...
Unpacking alsa-topology-conf (1.2.5.1-2) ...
... (many more unpacking steps) ...
Setting up libgraphite2-3:amd64 (1.3.14-2build1) ...
Setting up ... (many more setting up steps) ...
Setting up openjdk-11-jdk:amd64 (11.0.27+6~us1-0ubuntu1~24.04) ...
Processing triggers for ...
...

Once that completes, check the version to confirm the installation:

Command Line
java --version

You should see something similar to:

Output
openjdk 11.0.27 2025-04-15
OpenJDK Runtime Environment (build 11.0.27+6-post-Ubuntu-0ubuntu124.04)
OpenJDK 64-Bit Server VM (build 11.0.27+6-post-Ubuntu-0ubuntu124.04, mixed mode, sharing)

#Step 3: Create a dedicated user for Hadoop

Next, create a dedicated user for Hadoop. This helps isolate permissions and makes management easier. Create a user named hadoop (you can use something else if you prefer):

Command Line
sudo adduser hadoop

You'll be prompted to set a password and provide some optional user information.

Output
info: Adding user `hadoop' ...
info: Selecting UID/GID from range 1000 to 59999 ...
info: Adding new group `hadoop' (1000) ...
info: Adding new user `hadoop' (1000) with group `hadoop (1000)' ...
info: Creating home directory `/home/hadoop' ...
info: Copying files from `/etc/skel' ...
New password:
Retype new password:
passwd: password updated successfully
Changing the user information for hadoop
Enter the new value, or press ENTER for the default
        Full Name []:
        Room Number []:
        Work Phone []:
        Home Phone []:
        Other []:
Is the information correct? [Y/n] Y
info: Adding new user `hadoop' to supplemental / extra groups `users' ...
info: Adding user `hadoop' to group `users' ...

Give the user sudo privileges:

Command Line
sudo usermod -aG sudo hadoop

Now switch to the hadoop user using:

Command Line
su - hadoop

Enter the password for the user when prompted. Your prompt should now reflect that you are logged in as hadoop.

Output
To run a command as administrator (user "root"), use "sudo <command>".
See "man sudo_root" for details.

hadoop@demo:~$

#Step 4: Configure passwordless SSH

Next, you need to set up passwordless SSH. Hadoop uses SSH to manage its services across nodes (or within a single node for pseudo-distributed mode, connecting to localhost).

Ubuntu servers usually come with the SSH client and server already installed. Confirm this with the following commands:

Command Line
ssh -V
Output
OpenSSH_9.6p1 Ubuntu-3ubuntu13.11, OpenSSL 3.0.13 30 Jan 2024
Command Line
systemctl status ssh
Output
● ssh.service - OpenBSD Secure Shell server
     Loaded: loaded (/usr/lib/systemd/system/ssh.service; disabled; preset: enabled)
     Active: active (running) since Wed 2025-05-14 04:42:10 EEST; 31min ago
TriggeredBy: ● ssh.socket
       Docs: man:sshd(8)
             man:sshd_config(5)
   Main PID: 1076 (sshd)
      Tasks: 1 (limit: 4655)
     Memory: 4.1M (peak: 5.3M)
        CPU: 154ms
     CGroup: /system.slice/ssh.service
             └─1076 "sshd: /usr/sbin/sshd -D [listener] 0 of 10-100 startups"

May 14 04:42:10 demo systemd[1]: Starting ssh.service - OpenBSD Secure Shell server...
May 14 04:42:10 demo sshd[1076]: Server listening on :: port 22.
May 14 04:42:10 demo systemd[1]: Started ssh.service - OpenBSD Secure Shell server.
May 14 04:42:10 demo sshd[1079]: Connection closed by 188.214.133.131 port 37150
May 14 04:52:19 demo sshd[1125]: Accepted password for root from 102.90.80.130 port 2768 ssh2
May 14 04:52:19 demo sshd[1125]: pam_unix(sshd:session): session opened for user root(uid=0) by>
May 14 05:06:19 demo sshd[7904]: Accepted password for root from 102.90.80.130 port 2636 ssh2
May 14 05:06:19 demo sshd[7904]: pam_unix(sshd:session): session opened for user root(uid=0) by>

If the service is not active or not found, install and enable it with the following commands:

Command Line
sudo apt install openssh-server openssh-client -y
sudo systemctl enable ssh

Now, generate an SSH keypair:

Command Line
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
Output
Generating public/private rsa key pair.
Created directory '/home/hadoop/.ssh'.
Your identification has been saved in /home/hadoop/.ssh/id_rsa
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub
The key fingerprint is:
SHA256:kin50YdBqah1j/RG6KeGAF6zKIppkxEOU14vcBiuEuo hadoop@demo
The key's randomart image is:
+---[RSA 3072]----+
|  .o    ..       |
| .+ o  ..        |
|.o.+ o o.        |
|=+.o+.*+.o       |
|Bo+o====S .      |
|+=o. ooo=.       |
|+Eo. ..+         |
|++  . o          |
|. .  .           |
+----[SHA256]-----+

You need to add the generated public key to the list of authorized keys for the user hadoop so it can log into itself (localhost) without a password. Do this using:

Command Line
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

Then set the necessary permissions for the file to make sure only the owner of the file can read and write to it:

Command Line
chmod 600 ~/.ssh/authorized_keys
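
Alternatively, ssh-copy-id can append the key and set the permissions in one step; it prompts for the hadoop user's password once and assumes password authentication to localhost is allowed:

Command Line
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@localhost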

Now verify that the hadoop user can SSH to localhost without being prompted for a password:

Command Line
ssh localhost

The first time you connect, you might see a message like "The authenticity of host 'localhost (127.0.0.1)' can't be established. ED25519 key fingerprint is SHA256:xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx. Are you sure you want to continue connecting (yes/no/[fingerprint])?" Type yes and press Enter. You should then be logged in without being asked for a password.

Output
The authenticity of host 'localhost (127.0.0.1)' can't be established.
ED25519 key fingerprint is SHA256:ImVq/XOAViXspPZg1grGdz0E1Q8u1OZ9Cdsk45HAuWY.
This key is not known by any other names.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added 'localhost' (ED25519) to the list of known hosts.
Welcome to Ubuntu 24.04.2 LTS (GNU/Linux 6.8.0-58-generic x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/pro

 System information as of Mon May 12 20:47:26 EEST 2025

  System load:  0.0               Processes:             137
  Usage of /:   3.3% of 76.45GB   Users logged in:       1
  Memory usage: 8%                IPv4 address for eth0: 5.199.161.101
  Swap usage:   0%


Expanded Security Maintenance for Applications is not enabled.

0 updates can be applied immediately.

Enable ESM Apps to receive additional future security updates.
See https://ubuntu.com/esm or run: sudo pro status

*** System restart required ***
Last login: Mon May 12 20:29:21 2025 from 105.113.10.163

Once you've confirmed the passwordless login, exit the SSH session to return to your original prompt:

Command Line
exit

#Step 5: Download and extract Hadoop

Now, still as the hadoop user, you'll download and extract Apache Hadoop.

Open a web browser and go to the official Apache Hadoop Releases page. Look for the latest stable binary release and copy its address. Then, download it using wget.

Command Line
cd ~
wget https://dlcdn.apache.org/hadoop/common/hadoop-3.4.1/hadoop-3.4.1.tar.gz
Output
--2025-05-12 20:55:06--  https://dlcdn.apache.org/hadoop/common/hadoop-3.4.1/hadoop-3.4.1.tar.gz
Resolving dlcdn.apache.org (dlcdn.apache.org)... 151.101.2.132, 2a04:4e42::644
Connecting to dlcdn.apache.org (dlcdn.apache.org)|151.101.2.132|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 974002355 (929M) [application/x-gzip]
Saving to: ‘hadoop-3.4.1.tar.gz’

hadoop-3.4.1.tar.gz     100%[===============================>] 928.88M  72.4MB/s    in 12s

2025-05-12 20:55:26 (80.4 MB/s) - ‘hadoop-3.4.1.tar.gz’ saved [974002355/974002355]

Once the download is complete, extract the downloaded file:

Command Line
tar -xzf hadoop-*.tar.gz

It's common practice to place Hadoop in a standard location like /usr/local/. Move the extracted folder there and rename it to hadoop for simplicity.

Command Line
sudo mv hadoop-3.4.1 /usr/local/hadoop

Now ensure hadoop owns the /usr/local/hadoop directory and its contents:

Command Line
sudo chown -R hadoop:hadoop /usr/local/hadoop

#Step 6: Configure Hadoop environment variables

You need to set several environment variables so that your system and Hadoop can locate necessary files and configurations.

Open the .bashrc file for editing:

Command Line
nano ~/.bashrc

Add the following lines to the end of the file. These variables tell your system where your Hadoop installation resides and are used by Hadoop scripts.

~/.bashrc
# Hadoop Environment Variables
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"

Next, you need to set the environment variables for Java so that other applications can find it. To do so, open the ~/.bashrc file again:

Command Line
nano ~/.bashrc

Add the following lines, which specify the JAVA_HOME environment variable:

~/.bashrc
export JAVA_HOME=$(dirname $(dirname $(readlink -f $(which java))))
export PATH=$PATH:$JAVA_HOME/bin

Save the changes and exit (Ctrl+X, then Y, then Enter). Then source the ~/.bashrc file to apply the changes.

Command Line
source ~/.bashrc

Verify that the JAVA_HOME environment variable has been set correctly:

Command Line
echo $JAVA_HOME
Output
/usr/lib/jvm/java-11-openjdk-amd64

You can verify your Hadoop installation using:

Command Line
hadoop version
Output
Hadoop 3.4.1
Source code repository https://github.com/apache/hadoop.git -r 4d7825309348956336b8f06a08322b78422849b1
Compiled by mthakur on 2024-10-09T14:57Z
Compiled on platform linux-x86_64
Compiled with protoc 3.23.4
From source with checksum 7292fe9dba5e2e44e3a9f763fce3e680
This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-3.4.1.jar

Verify that HADOOP_HOME is set correctly:

Command Line
echo $HADOOP_HOME
Output
/usr/local/hadoop

Next, ensure that Hadoop itself knows the location of your Java installation by explicitly setting JAVA_HOME in Hadoop's configuration. Open the hadoop-env.sh file for editing:

Command Line
nano $HADOOP_HOME/etc/hadoop/hadoop-env.sh

Look for a line that starts with # export JAVA_HOME=. Uncomment it (remove the #) and set it to your JAVA_HOME path:

hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64

Save and close the file (Ctrl+X, then Y, then Enter).

#Step 7: Configure Hadoop XML Files for Pseudo-Distributed Mode

In this step, you'll configure the core XML files located in $HADOOP_HOME/etc/hadoop/. These files dictate how Hadoop functions in pseudo-distributed mode.

Start by creating the directories Hadoop will use for HDFS data storage:

Command Line
mkdir -p $HADOOP_HOME/hdfs/namenode
mkdir -p $HADOOP_HOME/hdfs/datanode

Then set the ownership of the directory and its contents to the hadoop user.

Command Line
sudo chown -R hadoop:hadoop $HADOOP_HOME/hdfs

Now, let's edit the XML configuration files.

  • core-site.xml

    Open the core-site.xml file:

    Command Line
    nano $HADOOP_HOME/etc/hadoop/core-site.xml
    

    Replace the empty <configuration></configuration> tags with the following:

    core-site.xml
    <configuration>
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://localhost:9000</value>
            <description>The default file system URI</description>
        </property>
    </configuration>
    
    • hdfs://localhost:9000 tells Hadoop to use HDFS running on localhost at port 9000.
  • hdfs-site.xml

    Open the hdfs-site.xml file with the Nano editor:

    Command Line
    nano $HADOOP_HOME/etc/hadoop/hdfs-site.xml
    

    Add the following between the <configuration> tags:

    hdfs-site.xml
    <configuration>
        <property>
            <name>dfs.replication</name>
            <value>1</value>
            <description>Default block replication.</description>
        </property>
        <property>
            <name>dfs.name.dir</name>
            <value>file:///usr/local/hadoop/hdfs/namenode</value>
            <description>Path on the local filesystem where the NameNode stores the namespace and transaction logs.</description>
        </property>
        <property>
            <name>dfs.data.dir</name>
            <value>file:///usr/local/hadoop/hdfs/datanode</value>
            <description>Path on the local filesystem where the DataNode stores its blocks.</description>
        </property>
    </configuration>
    

    This configuration sets up a single-replica HDFS environment using the data directories you created earlier.

  • mapred-site.xml

    Open the mapred-site.xml file for edit:

    Command Line
    nano $HADOOP_HOME/etc/hadoop/mapred-site.xml
    

    Replace the empty <configuration></configuration> tags with:

    mapred-site.xml
    <configuration>
        <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
            <description>The runtime framework for MapReduce. Can be local, classic or yarn.</description>
        </property>
        <property>
            <name>yarn.app.mapreduce.am.env</name>
            <value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
        </property>
        <property>
            <name>mapreduce.map.env</name>
            <value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
        </property>
        <property>
            <name>mapreduce.reduce.env</name>
            <value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
        </property>
    </configuration>
    

    This enables MapReduce to run on YARN and sets the appropriate environment variables.

  • yarn-site.xml

    Finally, edit the yarn-site.xml file:

    Command Line
    nano $HADOOP_HOME/etc/hadoop/yarn-site.xml
    

    Add the following between the <configuration> tags:

    yarn-site.xml
    <configuration>
        <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
            <description>Auxiliary services required by the NodeManager.</description>
        </property>
    </configuration>
    

    This enables the shuffle service, which is necessary for running MapReduce jobs on YARN.
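
Optionally, you can sanity-check that Hadoop is picking up these edits before continuing. The hdfs getconf utility prints the effective value of a configuration key; if the files above were saved correctly, you should see hdfs://localhost:9000 and 1 respectively:

Command Line
hdfs getconf -confKey fs.defaultFS
hdfs getconf -confKey dfs.replication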

#Step 8: Format HDFS NameNode

Before HDFS can be used, the NameNode must be formatted. This step initializes the HDFS file system and sets up the directory structure defined in your hdfs-site.xml.

Still as hadoop, run:

Command Line
hdfs namenode -format

Only run this command once during the initial setup. Reformatting will erase all data stored in HDFS.

This will output several log messages. Toward the end, look for confirmation lines like: INFO common.Storage: Storage directory /usr/local/hadoop/hdfs/namenode has been successfully formatted. And also: INFO namenode.NameNode: SHUTDOWN_MSG: Shutting down NameNode at ...

Output
WARNING: /usr/local/hadoop/logs does not exist. Creating.
2025-05-13 01:56:40,856 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = demo/127.0.1.1
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 3.4.1
STARTUP_MSG:   classpath = /usr/local/hadoop/etc/hadoop:/usr/local/hadoop/share/hadoop/common/lib/curator-framework-5.2.0.jar:... (truncated for brevity) ...
STARTUP_MSG:   build = https://github.com/apache/hadoop.git -r 4d7825309348956336b8f06a08322b78422849b1; compiled by 'mthakur' on 2024-10-09T14:57Z
STARTUP_MSG:   java = 11.0.27
************************************************************/
2025-05-13 01:56:40,873 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
2025-05-13 01:56:41,072 INFO namenode.NameNode: createNameNode [-format]
2025-05-13 01:56:42,003 INFO namenode.NameNode: Formatting using clusterid: CID-8e15e115-84e9-4400-8014-68ad4b72a38f
2025-05-13 01:56:42,080 INFO namenode.FSEditLog: Edit logging is async:true
2025-05-13 01:56:42,142 INFO namenode.FSNamesystem: KeyProvider: null
2025-05-13 01:56:42,145 INFO namenode.FSNamesystem: fsLock is fair: true
2025-05-13 01:56:42,182 INFO namenode.FSNamesystem: fsOwner                = hadoop (auth:SIMPLE)
2025-05-13 01:56:42,183 INFO namenode.FSNamesystem: isPermissionEnabled    = true
... (configuration details and GSet info omitted for brevity) ...
2025-05-13 01:56:42,963 INFO namenode.FSImage: Allocated new BlockPoolId: BP-42078627-127.0.1.1-1747090602953
2025-05-13 01:56:42,999 INFO common.Storage: Storage directory /usr/local/hadoop/hdfs/namenode has been successfully formatted.
2025-05-13 01:56:43,056 INFO namenode.FSImageFormatProtobuf: Saving image file /usr/local/hadoop/hdfs/namenode/current/fsimage.ckpt_0000000000000000000 using no compression
2025-05-13 01:56:43,263 INFO namenode.FSImageFormatProtobuf: Image file /usr/local/hadoop/hdfs/namenode/current/fsimage.ckpt_0000000000000000000 of size 401 bytes saved in 0 seconds .
2025-05-13 01:56:43,342 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at demo/127.0.1.1
************************************************************/

#Step 9: Start Hadoop services

With the configuration complete, it’s time to start the core Hadoop services. Hadoop provides handy scripts to start both HDFS and YARN services.

Run the following script to start the NameNode, DataNode, and SecondaryNameNode:

Command Line
start-dfs.sh
Output
Starting namenodes on [localhost]
Starting datanodes
Starting secondary namenodes [demo]
demo: Warning: Permanently added 'demo' (ED25519) to the list of known hosts.

Next, start the ResourceManager and NodeManager:

Command Line
start-yarn.sh
Output
Starting resourcemanager
Starting nodemanagers

After running both scripts, your Hadoop services should be up and running.
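
If a daemon fails to start or doesn't show up in the next step, the per-daemon log files under $HADOOP_HOME/logs are the first place to look. For example, assuming the default log naming for the hadoop user, you could inspect the NameNode log like this:

Command Line
ls $HADOOP_HOME/logs/
tail -n 50 $HADOOP_HOME/logs/hadoop-hadoop-namenode-*.log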

#Step 10: Verify that Hadoop daemons are running

To confirm that all Hadoop services started successfully, use the jps command. This tool lists all Java processes currently running on the system.

Command Line
jps

You should see output similar to this (Process IDs will vary):

Output
9300 NameNode
10373 Jps
9435 DataNode
10029 NodeManager
9885 ResourceManager
9647 SecondaryNameNode

Seeing these processes confirms that Hadoop is running correctly.
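
As an optional smoke test, you can load a few files into HDFS and run the bundled wordcount example. This is a minimal sketch that assumes the examples jar is in its default location inside your Hadoop 3.4.1 installation:

Command Line
# create a home directory in HDFS and load some sample input
hdfs dfs -mkdir -p /user/hadoop/input
hdfs dfs -put $HADOOP_HOME/etc/hadoop/*.xml /user/hadoop/input
# run the bundled MapReduce wordcount example, then view part of the result
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.4.1.jar wordcount /user/hadoop/input /user/hadoop/output
hdfs dfs -cat /user/hadoop/output/part-r-00000 | head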

#Step 11: Access Web UIs

Hadoop provides built-in web interfaces to help you monitor the health and activity of your cluster. You can view real-time status, nodes, file system usage, and running applications.

Open a web browser and navigate to the Hadoop web interfaces:

  • If you’re running Hadoop locally on your own machine, use:

    • HDFS NameNode UI: http://localhost:9870

    • YARN ResourceManager UI: http://localhost:8088

  • If your Hadoop is running on a remote server (like a Cherry Servers instance), replace localhost with your server’s public IP address:

    • HDFS NameNode UI: http://<server-ip>:9870

    • YARN ResourceManager UI: http://<server-ip>:8088

Once you open these URLs, you should see dashboards showing cluster health, node status, and running applications (initially empty if no jobs have been submitted yet).

NameNode UI

ResourceManager UI
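
If you're on a headless server and can't open a browser right away, you can get a similar health summary from the command line: hdfs dfsadmin -report shows DataNode and capacity details, and yarn node -list shows the registered NodeManagers.

Command Line
hdfs dfsadmin -report
yarn node -list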

#Step 12: Stopping Hadoop services

When you're done, you can stop the Hadoop services.

Stop the YARN daemons by running:

Command Line
stop-yarn.sh

You should see something like:

Output
Stopping nodemanagers
Stopping resourcemanager

Stop the HDFS daemons using:

Command Line
stop-dfs.sh

Expected output:

Output
Stopping namenodes on [localhost]
Stopping datanodes
Stopping secondary namenodes [demo]

Run jps again to confirm that all Hadoop processes have been terminated:

Command Line
jps

You should only see the Jps process in the output, like this:

Output
18704 Jps

#Conclusion

You’ve now successfully installed and configured Apache Hadoop in pseudo-distributed mode on your Ubuntu 24.04 server. From installing dependencies and configuring environment variables to starting the core services and verifying everything with the web UIs, you now have a functional mini-cluster running on a single machine. This setup is ideal for learning how Hadoop works under the hood and for running smaller-scale MapReduce jobs without the need for multiple servers.

Now that you’re familiar with the core components, why not try setting up a multi-node Hadoop cluster? This will give you hands-on experience with real-world distributed data processing and node-to-node communication.
