Robson Sampaio

Skills

ETL

Using Python scripts to extract, transform, and load data into a database

Data Modeling

Designing and building database infrastructure with widely adopted industry modeling techniques

Data Science

Building predictive and descriptive models to bring intelligence to decision-making

Technologies

The main technologies I work with:

Apache Airflow is a workflow platform for scheduling, monitoring, and managing complex data processes.

Python is a high-level, interpreted, general-purpose programming language, known for its simple syntax and readability.

Git is a widely used version control system for tracking changes in source code during development.

Docker is an open-source platform that enables the creation, deployment, and execution of applications in containers.

Ubuntu is an open-source operating system based on Linux, developed and maintained by Canonical.

AWS is a cloud computing platform offered by Amazon.
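Of the tools above, Airflow is the one that ties a pipeline together. It represents a workflow as a DAG (directed acyclic graph) of tasks, and a task runs only after its upstream tasks have finished. The sketch below illustrates that ordering idea in plain Python with the standard library's `graphlib`. It is not Airflow's API, and the task names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical ETL tasks mapped to the set of tasks that must finish first,
# in the spirit of an Airflow DAG. Plain Python, not Airflow's API.
dag = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "update_docs": {"load"},
}


def run_order(graph: dict[str, set[str]]) -> list[str]:
    """Return one valid execution order that respects every dependency."""
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    for task in run_order(dag):
        print(f"running {task}")
```

In a real Airflow deployment the scheduler also handles retries, schedules, and parallel execution of independent tasks; this sketch only shows the dependency ordering.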

My Projects

Streaming Unstructured Data Pipeline Architecture

This project implements a scalable data pipeline architecture that combines Apache Spark's processing capabilities with AWS services for data storage, cataloging, and analysis. The pipeline efficiently processes multiple data formats and supports various visualization options for data analysis.
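One core idea behind handling "multiple data formats" is normalizing each input into a common record schema before analysis. The sketch below shows that step with the Python standard library only, as a stand-in for the project's Spark jobs. The input payloads, field names (`id`, `source`, `value`), and the `normalize` helper are all hypothetical.

```python
import csv
import io
import json

# Hypothetical raw inputs in two formats. In the real pipeline these would be
# files in cloud storage processed by Spark, which this sketch does not use.
CSV_DATA = "id,source,value\n1,sensor_a,10.5\n2,sensor_b,7.25\n"
JSONL_DATA = (
    '{"id": 3, "source": "app_log", "value": 3.0}\n'
    '{"id": 4, "source": "app_log", "value": 1.5}\n'
)


def parse_csv(text: str) -> list[dict]:
    """Parse CSV rows into records with typed fields."""
    return [
        {"id": int(row["id"]), "source": row["source"], "value": float(row["value"])}
        for row in csv.DictReader(io.StringIO(text))
    ]


def parse_jsonl(text: str) -> list[dict]:
    """Parse JSON Lines (one JSON object per line), skipping blank lines."""
    records = []
    for line in text.splitlines():
        if line.strip():
            obj = json.loads(line)
            records.append(
                {"id": int(obj["id"]), "source": obj["source"], "value": float(obj["value"])}
            )
    return records


PARSERS = {"csv": parse_csv, "jsonl": parse_jsonl}


def normalize(inputs: list[tuple[str, str]]) -> list[dict]:
    """Dispatch each (format, payload) pair to its parser and merge the results."""
    records: list[dict] = []
    for fmt, payload in inputs:
        records.extend(PARSERS[fmt](payload))
    return records


if __name__ == "__main__":
    for record in normalize([("csv", CSV_DATA), ("jsonl", JSONL_DATA)]):
        print(record)
```

Adding a new format then means registering one more parser in `PARSERS`, while everything downstream keeps working on the same schema.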

ETL of Mobility Data from Prefeitura de Belo Horizonte

This ETL project was developed within 72 hours to extract, transform, and load urban mobility data provided by the Prefeitura de Belo Horizonte. It uses MinIO for data storage, PostgreSQL as the data warehouse, and Python with Pandas for the ETL processes, ensuring efficient integration and continuous database documentation. The infrastructure is managed with Docker, which makes it portable and easy to deploy. Additionally, pre-commit and MkDocs improve the quality and documentation of the code.
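The extract, transform, and load stages can be sketched as three small functions. To keep the example self-contained, it assumes an in-memory CSV string in place of a file stored in MinIO, the `csv` module in place of Pandas, and SQLite in place of PostgreSQL. The column names and sample rows are hypothetical, not the real Prefeitura data.

```python
import csv
import io
import sqlite3

# Hypothetical raw export. In the project this would be a file fetched from
# MinIO and processed with Pandas; column names here are illustrative only.
RAW_CSV = """line_code,trip_date,passengers
101,2024-01-15,1200
101,2024-01-16,
205,2024-01-15,860
"""


def extract(text: str) -> list[dict]:
    """Read raw CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows: list[dict]) -> list[tuple[str, str, int]]:
    """Drop rows with a missing passenger count and cast counts to int."""
    clean = []
    for row in rows:
        if not row["passengers"]:
            continue
        clean.append((row["line_code"], row["trip_date"], int(row["passengers"])))
    return clean


def load(conn: sqlite3.Connection, rows: list[tuple[str, str, int]]) -> int:
    """Create the target table if needed, insert the rows, and commit."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS trips "
        "(line_code TEXT, trip_date TEXT, passengers INTEGER)"
    )
    conn.executemany("INSERT INTO trips VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(load(conn, transform(extract(RAW_CSV))), "rows loaded")
```

Keeping each stage as a separate function makes it easy to test them individually or to schedule them as separate tasks in an orchestrator.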

Versioned PostgreSQL DB Management with Alembic and SQLAlchemy

This Python project creates a versioned PostgreSQL database using Alembic and SQLAlchemy. The goal is to manage database migrations efficiently by using Alembic to handle schema changes in a version-controlled manner. SQLAlchemy's ORM (Object-Relational Mapping) features are used to interact with the PostgreSQL database directly from Python. This setup ensures that schema modifications are tracked and applied consistently across different environments, making database management robust and scalable.

Soft Skills That Make a Difference in Data Engineering

Hey there! I've learned something very important: being good at technical skills isn't enough. You also need strong soft skills. They're the sugar in the coffee, the strawberry on the chocolate cake: they're what make your hard skills truly useful.

Hello, World!

My name is Robson, and I currently work as a Data Engineer. Previously, I worked as a machinist and a plastic mold toolmaker. I’ll share my journey and how I got here, making this post my "hello, world" for this blog (every programmer knows that if you don’t follow this tradition, you won’t have good luck).

Git and GitHub

Version control is one of the main pillars of programming. Saving your projects only on your computer is like shooting an arrow straight up and hoping it doesn't land on your own foot! You, Junior, need to champion this with all your might, so you don't become a Senior who leaves scripts crucial to the company in the "Documents" folder on their computer.

Challenges and Learnings of a Junior Data Engineer: A Real Journey

Hey everyone! Today I want to share something really honest about my life as a junior data engineer. You know that idea that everything is perfect in the first years of your career? Well, the reality is quite different.

Hello! I'm Robson.