Linux Commands for Data Engineering
Introduction
As a data engineer, you need to be proficient with Linux commands to manage data pipelines, handle files, and work with distributed systems. In most companies, data engineering applications run on cloud computing platforms such as AWS, Azure, and GCP, and the machines provisioned on these platforms usually run Linux. Linux is widely preferred for such applications because it is lightweight, easy to install, and easy to maintain. You access these machines through a command-line interface (CLI) in a terminal, and common tasks such as creating a folder, copying a file, or deleting files and folders are all carried out through this CLI. This is where Linux commands come in.
Commands
In all of these examples, I assume that my username is sushrut. Also, note that the command is whatever appears after the ~$ prompt.
Clearing the Screen
This is a very common command. Whenever your terminal is full of text, which is typically the result of running a lot of commands, you can clear the terminal screen using the following:
~$ clear
Printing the current working directory
The following command prints the path of the current working directory:
~$ pwd
Here, pwd stands for “print working directory”.
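As a small sketch of how pwd behaves, the snippet below moves into a scratch directory and prints where it is. The scratch directory created with mktemp -d and the variable names scratch_dir and current_dir are assumptions made for illustration; they are not part of the example above.

```shell
# Create a throwaway directory so the sketch does not touch real files
# (scratch_dir is a hypothetical name used only here).
scratch_dir="$(mktemp -d)"

# Move into it and capture what pwd reports.
cd "$scratch_dir"
current_dir="$(pwd)"

# pwd always reports an absolute path, i.e. one that starts with /.
echo "$current_dir"
```

The exact path printed depends on where your system keeps temporary files, so it will differ from machine to machine.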
Listing all the content inside a directory
Suppose that when you run
~$ pwd
you get the following output:
/home/sushrut
This means that you are currently inside the directory named sushrut, which is the home directory for this user. Now suppose this directory contains the following directories and files:
home/
├─ sushrut/
│ ├─ projects/
│ │ ├─ calculator_app/
│ │ │ ├─ app.py
│ │ │ ├─ src/
│ │ │ │ ├─ config.py
│ ├─ documents/
│ │ ├─ resume/
│ │ │ ├─ amazon.pdf
│ │ │ ├─ google.pdf
│ ├─ README.md
To list the contents of the sushrut directory, run the following command:
~$ ls
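The output of this command will be the three top-level entries. Since ls sorts names alphabetically, in a typical English UTF-8 locale they appear in this order:
documents projects README.md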
Notice that ls lists only the immediate contents of the directory and does not show anything nested inside its subdirectories.
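To try this without touching your real home directory, the following sketch rebuilds the example tree under a scratch location and lists it. The name demo_home is a hypothetical stand-in for /home/sushrut, and mktemp, mkdir -p, and touch are used only as setup. The -R option of ls, which lists subdirectories recursively, is shown alongside for contrast.

```shell
# Build the example tree under a throwaway directory
# (demo_home stands in for /home/sushrut; it is an assumption of this sketch).
demo_home="$(mktemp -d)"
mkdir -p "$demo_home/projects/calculator_app/src" "$demo_home/documents/resume"
touch "$demo_home/projects/calculator_app/app.py" \
      "$demo_home/projects/calculator_app/src/config.py" \
      "$demo_home/documents/resume/amazon.pdf" \
      "$demo_home/documents/resume/google.pdf" \
      "$demo_home/README.md"

cd "$demo_home"

# Plain ls: only the top-level entries of the current directory.
top_level="$(ls)"
echo "$top_level"

# ls -R: the same listing, followed by every nested directory in turn.
recursive="$(ls -R)"
echo "$recursive"
```

When the output of ls is captured or piped like this, it prints one name per line instead of the columns you see in an interactive terminal.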
Changing the current working directory
Again, consider that you are inside the same sushrut directory with the same structure. Now if you want to change your working directory to calculator_app (in Windows terms, think of it as going into the calculator_app folder), run the following command:
~$ cd projects/calculator_app
Once you run this, your prompt will show projects/calculator_app between the ~ and the $, i.e.,
~/projects/calculator_app$
The ~ symbol is shorthand for the user’s home directory (here, /home/sushrut). So when there was nothing between ~ and $, you were inside the home directory itself.
Now if you list all the items in this directory using the following command:
~/projects/calculator_app$ ls
you will get the output as
app.py src
Now if you want to go to the parent directory of the one you are in (i.e., to the projects directory), you can run the following command:
~/projects/calculator_app$ cd ..
This will take you to the projects directory, which your prompt will also reflect:
~/projects$
You can think of this as going up one level in Windows File Explorer. If you had instead run the following command from inside calculator_app:
~/projects/calculator_app$ cd ~
you would have directly gone back to your home directory, i.e.,
~$
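The navigation steps above can be replayed in a scratch location with the sketch below. It makes two assumptions that are not in the original example: demo_home is a hypothetical stand-in for /home/sushrut, and HOME is temporarily pointed at it so that ~ resolves to the scratch directory rather than your real home directory.

```shell
# Recreate only the directories needed for navigation under a throwaway
# location (demo_home is a hypothetical stand-in for /home/sushrut).
demo_home="$(mktemp -d)"
mkdir -p "$demo_home/projects/calculator_app"

# Assumption for this sketch: point HOME at the scratch directory so that
# ~ expands to it instead of your real home directory.
real_home="${HOME:-}"
HOME="$demo_home"

cd ~/projects/calculator_app
echo "start:  $(pwd)"

# cd .. moves up exactly one level, to the parent directory.
cd ..
parent_dir="$(pwd)"
echo "parent: $parent_dir"

# cd ~ jumps straight to the home directory from anywhere.
cd projects/calculator_app
cd ~
home_dir="$(pwd)"
echo "home:   $home_dir"

# Restore the original HOME so the rest of the session is unaffected.
HOME="$real_home"
```

It is safer to run this as a script than to paste it into an interactive session, since it changes HOME for a few lines before restoring it.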
Creating Directories
🚧 Work in Progress: This blog post is currently being written. Some sections are complete, while others are still under construction. Feel free to explore and check back later for updates!