Linux Commands for Data Engineering

Introduction

As a data engineer, you need to be proficient in Linux commands to manage data pipelines, handle files, and work with distributed systems. In most companies, data engineering applications run on cloud computing platforms such as AWS, Azure, and GCP. On these platforms, we typically work on machines that have Linux installed on them. Linux is widely preferred for such applications because it is lightweight, easy to install, and easy to maintain. These machines are accessed through a command line interface (CLI) in a terminal. Common tasks like creating a folder, copying a file, or deleting files and folders are all carried out through this CLI. This is where Linux commands come in.

Commands

For all these examples, I assume that my username is sushrut. Also, note that the command itself is whatever appears after the $ sign of the prompt (for example, after ~$). The prompt is not part of the command.

Clearing the Screen

This is a very common command. Whenever your terminal is full of text, which is typically the result of running a lot of commands, you can clear the terminal screen using the following:

~$ clear

Printing the current working directory

The following command prints the path of the current working directory:

~$ pwd

Here, pwd stands for “print working directory”.
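In scripts, pwd is often used to record where a job was started before it moves into other directories. Here is a minimal sketch, assuming a bash shell. The variable name start_dir is only an illustrative choice.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Save the current working directory printed by pwd.
# start_dir is a hypothetical variable name used for illustration.
start_dir="$(pwd)"
echo "Started in: $start_dir"

# bash also keeps the current working directory in the built-in PWD variable,
# so both forms refer to the same path.
if [ "$start_dir" = "$PWD" ]; then
  echo "pwd and \$PWD agree"
fi
```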

Listing all the content inside a directory

Suppose that when you run

~$ pwd

you get the following output:

/home/sushrut

This means that you are currently inside the directory titled sushrut, which is the home directory for this user. Now say this directory has the following directories and files inside it:

home/
├─ sushrut/
│  ├─ projects/
│  │  ├─ calculator_app/
│  │  │  ├─ app.py
│  │  │  ├─ src/
│  │  │  │  ├─ config.py
│  ├─ documents/
│  │  ├─ resume/
│  │  │  ├─ amazon.pdf
│  │  │  ├─ google.pdf
│  ├─ README.md

Now if you want to list all the content inside the sushrut directory, you can simply run the following command:

~$ ls

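The output of this command will look like the following. Note that ls sorts entries alphabetically, and the exact order depends on your locale settings:

documents  projects  README.md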

Notice that ls does not print nested content. It lists only the entries directly inside sushrut and does not list what is inside projects or documents.
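If you want to try these commands without touching your real files, you can recreate the example tree in a temporary directory. The sketch below assumes bash and the standard mktemp, mkdir, touch, and ls utilities. The name demo_home is a hypothetical stand-in for /home/sushrut, and the files are empty placeholders. The script also runs ls -R, which, unlike plain ls, does list nested content.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Create a throwaway directory to play the role of /home/sushrut.
# demo_home is a hypothetical name used only for this example.
demo_home="$(mktemp -d)"
echo "Demo home directory: $demo_home"

# Recreate the example tree. mkdir -p also creates missing parent directories.
mkdir -p "$demo_home/projects/calculator_app/src"
mkdir -p "$demo_home/documents/resume"

# touch creates empty placeholder files.
touch "$demo_home/projects/calculator_app/app.py"
touch "$demo_home/projects/calculator_app/src/config.py"
touch "$demo_home/documents/resume/amazon.pdf" "$demo_home/documents/resume/google.pdf"
touch "$demo_home/README.md"

cd "$demo_home"

# Plain ls: only the entries directly inside the current directory.
ls

# ls -R (recursive): also lists the contents of every subdirectory.
ls -R
```

The script prints the path of the temporary directory. You can cd into that directory to keep experimenting, then delete it when you are done.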

Changing the current working directory

Again, consider that you are inside the same directory titled sushrut with the same structure. Now if you want to change your working directory to calculator_app (in terms of Windows, think of it as going inside the calculator_app folder), you will simply run the following command:

~$ cd projects/calculator_app

Once you run this, you will start seeing projects/calculator_app after the ~ sign and before the $ sign in your terminal, i.e.,

~/projects/calculator_app$

Here, ~ is shorthand for the user's home directory, /home/sushrut (recall that the user is sushrut). So when nothing appeared between these two symbols, you were inside the home directory itself.
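The ~ in the prompt is the same shorthand that the shell understands in commands. When ~ begins a word, the shell replaces it with your home directory, which is also stored in the HOME environment variable. Here is a quick check, assuming bash. For the user sushrut in this example, both lines would print /home/sushrut.

```shell
#!/usr/bin/env bash
# The shell expands a leading ~ to the current user's home directory.
echo ~

# The same path is stored in the HOME environment variable.
echo "$HOME"
```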

Now if you list all the items in this directory using the following command:

~/projects/calculator_app$ ls

you will get the following output:

app.py  src

Now if you want to go to the parent of the current directory (i.e., to the projects directory), you can run the following command:

~/projects/calculator_app$ cd ..

This will take you to the projects directory, which will also be indicated in your terminal in the following way:

~/projects$

You can think of this as clicking the "Up" button in Windows File Explorer, which takes you to the folder one level above. If you had instead run the following command:

~/projects/calculator_app$ cd ~

you would have directly gone back to your home directory, i.e.,

~$
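The sequence of cd commands above can be replayed in a script that prints the working directory after each step. This is a minimal sketch, assuming bash. It builds a small hypothetical copy of the projects/calculator_app tree in a temporary directory so that your real files are not touched. At the end, it returns to your actual home directory.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Build a small stand-in for the example tree in a temporary directory.
# demo_home is a hypothetical name used only for this example.
demo_home="$(mktemp -d)"
mkdir -p "$demo_home/projects/calculator_app/src"
touch "$demo_home/projects/calculator_app/app.py"
cd "$demo_home"

# Relative path: resolved starting from the current directory.
cd projects/calculator_app
pwd   # ends in /projects/calculator_app
ls    # lists app.py and src

# .. refers to the parent directory, one level up.
cd ..
pwd   # ends in /projects

# ~ takes you back to your real home directory, wherever you are.
cd ~
pwd   # same path as $HOME
```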

Creating Directories

🚧 Work in Progress: This blog post is currently being written. Some sections are complete, while others are still under construction. Feel free to explore and check back later for updates!
