Here is how I launched a simple but effective data pipeline on AWS with just a micro EC2 instance and one tiny S3 bucket (though behind the scenes there is a fair amount of code and a few bash scripts 😎):
Create a bucket and launch an EC2 instance
0- Go to the S3 console and create a bucket, for example “files-1234”.
1- Launch a t2.micro EC2 instance running Ubuntu and name it ETLProject.
2- Select the instance, then click Actions > Security > Modify IAM role and attach a role that grants access to S3. If you only need to read from S3, read-only access is enough; since we will later transfer files from EC2 back to S3, it’s better to grant read-write (or full) S3 access.
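If you prefer to do this step from the CLI instead of the console, something like the following should work. ETLProjectRole and the instance ID are placeholders, and I’m assuming the role was created in the IAM console, so an instance profile with the same name already exists:
# attach the AWS-managed S3 full-access policy to the role
$ aws iam attach-role-policy --role-name ETLProjectRole --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess
# attach the role’s instance profile to the instance
$ aws ec2 associate-iam-instance-profile --instance-id i-0123456789abcdef0 --iam-instance-profile Name=ETLProjectRole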
Operations on our PC
3- Connect to it from your personal computer’s CLI by running a command like this. First, cd to the directory containing ETLProject.pem, then:
$ chmod 400 ETLProject.pem
$ ssh -i "ETLProject.pem" ubuntu@ec2-3-71-182-555.eu-east-1.compute.amazonaws.com
Note that the “ETLProject.pem” key pair file is created and downloaded when we launch the instance.
Install required tools
4- Now run these commands to install Python, pandas, and the AWS CLI:
$ sudo apt update
$ sudo apt install python3 python3-pandas
$ sudo snap install aws-cli --classic
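At this point I like to quickly check that everything is wired up, including the IAM role from step 2 (bucket name as in step 0):
$ python3 -c "import pandas; print(pandas.__version__)"
$ aws sts get-caller-identity
$ aws s3 ls s3://files-1234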
Copy files from our PC to AWS
5- From our computer’s CLI, which has already been configured with AWS credentials (via aws configure), run this command to copy our desired files and scripts to S3:
$ aws s3 cp <Fully Qualified Local filename> s3://files-1234
Or
$ for f in *.py; do aws s3 cp "$f" s3://files-1234; done
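An equivalent and slightly more convenient option is to let the AWS CLI do the filtering itself, assuming the scripts live in the current directory:
$ aws s3 sync . s3://files-1234 --exclude "*" --include "*.py"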
Set up cron jobs
6- Since I’m going to schedule a task, I go with cron jobs. So in the instance’s CLI I run:
$ sudo usermod -a -G crontab $(whoami)
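You can also double-check that the cron daemon is running and list the current jobs for your user (the list will be empty for now):
$ systemctl status cron
$ crontab -l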
Copy files from S3 to our EC2 instance
7- Then I copy the scripts that are in the S3 bucket to this instance by running:
$ aws s3 cp s3://files-1234/script.py .
$ aws s3 cp s3://files-1234/script.sh .
The above commands copy script.py and script.sh to the current directory on my instance.
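One extra step worth doing before scheduling: make the shell script executable, otherwise cron can’t run it directly (alternatively, prefix the cron entry with bash):
$ chmod +x script.sh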
Schedule your job
8- Now schedule your job like this:
$ crontab -e
and then
*/5 * * * * /home/ubuntu/SOME-DIRECTORY/script.sh
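For reference, a minimal script.sh could look something like the sketch below; the directory layout and log file name are just my assumptions, so adjust them to whatever script.py actually does:
#!/bin/bash
# move into the project directory so relative paths in script.py resolve correctly
cd /home/ubuntu/SOME-DIRECTORY || exit 1
# run the ETL step and append both stdout and stderr to a log file
/usr/bin/python3 script.py >> etl.log 2>&1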
That’s it. Now your script will do its job. After a day or a week you can check the results and copy the output back to S3 with:
$ aws s3 cp /home/ubuntu/SOME-DIRECTORY s3://files-1234 --recursive
Or you can write another bash script to copy the final results for you.
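Such an upload script could be as small as this (I’m assuming, hypothetically, that the results land in an output/ subdirectory):
#!/bin/bash
# sync the results directory to the bucket; only new or changed files are uploaded
aws s3 sync /home/ubuntu/SOME-DIRECTORY/output s3://files-1234/output
Make it executable with chmod +x upload.sh, then give it its own line in crontab -e, for example once a day at midnight:
0 0 * * * /home/ubuntu/SOME-DIRECTORY/upload.sh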