Hey, I'm a Senior Data Engineer at TripAdvisor (ex-PepsiCo) with more than 7 years of working with data.
Completely self-taught: I started with a 6-month bootcamp (thought I was gonna be a Python developer, lol) and then picked up everything else on the go.
I have a community of over 80,000 data and AI enthusiasts ❤️
My content is mostly memes and educational videos (in a funny, relatable format) covering many trendy topics.
You can share your thoughts and doubts and ask for help, 'cause my people are amazing and always ready to help ❤️
My road was bumpy… I remember
2018: I'm staring at my laptop 💻
“What even is Apache Kafka?”
“Do I need to learn Java?”
“Hadoop is essential!? No wait, learn dbt!”
I felt like I was drowning in tutorials...
But you CAN become a Data Engineer:
- FASTER
- EASIER
- MENTORED
If you want to start your career as a Data Engineer:
Drop the 🔥 in the comments and I'll send you the details
➡️ Why is Spotify Wrapped such a smart*$$ at processing petabytes of data?
(My previous reel went completely viral (3 million views!) and you voted for a detailed explanation.)
💬 SPOILER: they cut their cloud costs by 50% with this!
Spotify Wrapped is a giant distributed ETL pipeline that uses a technique called the Sort Merge Bucket (SMB) join.
Spotify uses 3 main data sources for Wrapped:
- Streaming activity 🎧
- User metadata 🎶
- Streaming context ⏰
Tech stack: GCP, with Scala-based Dataflow pipelines and Avro files.
🧃 Here is the juice: these sources are converted to SMB format, meaning the data is bucketed and sorted by user_id.
SMB is a technique where:
1. Bucketing: Data is divided into smaller parts called buckets, usually by hashing the join column
2. Sorting: The data within each bucket is then sorted by that same key
3. Merging: When combining two datasets, like matching users with their listening history, SMB speeds things up because both datasets are **already bucketed and sorted**. The join becomes a simple merge of sorted streams, which is far cheaper than a traditional shuffle-based join (see the sketch below)
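To make that concrete, here's a minimal, framework-free Python sketch of the idea. Spotify does this at PB scale with Scio/Dataflow over Avro files; the bucket count, record shapes, and function names below are purely mine, for illustration:

```python
# A minimal, framework-free sketch of a Sort Merge Bucket join.
# Everything here (bucket count, record shapes, names) is illustrative.

NUM_BUCKETS = 4

def bucket_of(user_id: str) -> int:
    # 1. Bucketing: route each record to a bucket by hashing the join key.
    # (Production systems use a stable hash; Python's hash() is per-process.)
    return hash(user_id) % NUM_BUCKETS

def write_smb(records):
    # Partition records into buckets, then sort each bucket by user_id.
    buckets = [[] for _ in range(NUM_BUCKETS)]
    for rec in records:
        buckets[bucket_of(rec["user_id"])].append(rec)
    for b in buckets:
        b.sort(key=lambda r: r["user_id"])  # 2. Sorting
    return buckets

def merge_join(plays, meta):
    # 3. Merging: both sides are sorted, so one linear pass joins them.
    # Assumes user_id is unique on the metadata side.
    i = j = 0
    while i < len(plays) and j < len(meta):
        pk, mk = plays[i]["user_id"], meta[j]["user_id"]
        if pk == mk:
            yield {**plays[i], **meta[j]}
            i += 1          # many plays can match one metadata row
        elif pk < mk:
            i += 1
        else:
            j += 1

plays = write_smb([{"user_id": "u1", "track": "song_a"},
                   {"user_id": "u1", "track": "song_b"},
                   {"user_id": "u2", "track": "song_c"}])
meta = write_smb([{"user_id": "u1", "country": "SE"},
                  {"user_id": "u2", "country": "US"}])

for b in range(NUM_BUCKETS):
    # Bucket b of one dataset only ever meets bucket b of the other,
    # so every pair of bucket files joins independently: no shuffle.
    for row in merge_join(plays[b], meta[b]):
        print(row)
```

Two sorted lists, one linear pass per bucket. That's the whole trick.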
Small tweak here:
- The sortMergeTransform function is used to combine the 3 data sources, reading each one keyed by user_id.
- This allows Spotify to join roughly 1PB of data without using conventional shuffle or Bigtable.
😮‍💨 The rest is simple: smaller jobs aggregate a day's or week's worth of data for each user.
And then the weekly partitions are aggregated into one year's worth of data.
➡️ This ended up being a huge cost saving: they joined roughly 1PB of data in total without using a conventional shuffle or Bigtable!
🏷️ sql, data, spotify, big data, database, #dataengineering, gcp, google cloud, python programming
Anyone else feeling like this?
not burnt out. not thriving. just… somewhere in the middle, running on caffeine and vibes and the constant feeling that if i don't lock in RIGHT NOW i'm going to fall behind forever.
the AI stuff moves so fast that “catching up” is basically a full-time job on top of your actual full-time job.
I keep telling myself I'll touch grass after I ship this one thing.
that one thing keeps multiplying.
if you're in the same spiral: the gym guilt, the protein fixation, the claude-maxxing, the “I should really sleep” - just know it's not just you.
weāre all just trying to optimize our token usage and our lives at the same time.
Thoughts?
I finally read the actual paper. And here are my thoughts:
I've read “Attention Is All You Need”. Not a summary, not a YouTube explainer.
8 researchers at Google were trying to fix machine translation and got annoyed by a very specific problem:
-> Older models, called RNNs, had to read text the way a very slow person reads a book. One word at a time. Left to right. They could not skip ahead, could not look back efficiently, and the longer the sentence, the worse they got at remembering the beginning of it.
Imagine trying to understand a 200-word sentence but your brain erases what you read three seconds ago. That was the architecture powering state-of-the-art AI in 2016.
So the researchers removed recurrence entirely.
The Transformer they built lets every single word look at every other word at the same time. Think of it less like reading a book and more like spreading all the pages on a table and seeing the whole story at once. That mechanism is called self-attention, and it is the core of the paper.
Then they ran that process not once but 8 times in parallel, with each run learning different kinds of relationships. One head might learn grammar. Another might learn who “it” refers to in a sentence (it's usually NOT obvious). They called this multi-head attention.
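If you want to see the mechanics, here's a toy numpy sketch of scaled dot-product self-attention with 8 heads. The sizes and random weights are made up; in a real Transformer the projection matrices are learned:

```python
import numpy as np

# Toy scaled dot-product self-attention with 8 heads, sketched in numpy.
# A real Transformer LEARNS the projection matrices; here they're random.

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # Each word (row of X) builds a query, a key, and a value...
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    # ...and every word scores every other word AT ONCE: no left-to-right reading.
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # scaled dot products
    return softmax(scores) @ V                # weighted mix of everyone's values

rng = np.random.default_rng(0)
seq_len, d_model, n_heads = 5, 16, 8     # the paper uses 8 heads
d_head = d_model // n_heads              # each head works in a smaller space
X = rng.normal(size=(seq_len, d_model))  # 5 "words", already embedded

# Multi-head attention: run the same mechanism 8 times with different
# projections, so each head is free to learn a different relationship.
heads = []
for _ in range(n_heads):
    Wq, Wk, Wv = [rng.normal(size=(d_model, d_head)) for _ in range(3)]
    heads.append(self_attention(X, Wq, Wk, Wv))

out = np.concatenate(heads, axis=-1)  # concat heads (the paper adds one final linear layer)
print(out.shape)                      # (5, 16): one context-enriched vector per word
```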
And since the model no longer processes words in order, they had to tell it where each word sits in the sequence. They did that with positional encodings, basically injecting a signal built from sine and cosine waves into the data so the model knows word 1 from word 47.
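And the positional encoding is literally two trig functions. Here's the paper's formula sketched in numpy (the sizes are mine):

```python
import numpy as np

# The paper's sinusoidal positional encoding:
#   PE(pos, 2i)   = sin(pos / 10000^(2i / d_model))
#   PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model))

def positional_encoding(seq_len, d_model):
    pos = np.arange(seq_len)[:, None]       # positions 0 .. seq_len-1
    i = np.arange(0, d_model, 2)[None, :]   # even dimension indices
    angles = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)            # even dims get sine
    pe[:, 1::2] = np.cos(angles)            # odd dims get cosine
    return pe

# The signal is simply ADDED to the word embeddings, so word 1 and word 47
# carry different position signatures even though nothing reads in order.
pe = positional_encoding(seq_len=50, d_model=16)
print(pe[1][:4])    # position 1's signature...
print(pe[47][:4])   # ...looks nothing like position 47's
```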
The result? Trained in 12 hours on 8 GPUs. Beat every previous model on translation benchmarks. At a fraction of the cost.
It reads like eight very annoyed engineers optimizing a bottleneck on a Tuesday.
And yet. GPT, Claude, Gemini, every LLM you used this week, all running on the exact same core idea from that 11-page paper.
Insane, huh?
reduced my token baseline by 15% after this one-time setup
5 easy set-and-forget fixes for your Claude Code.
* this is the “Strange things I do to stay ahead of AI” series
Btw, I ran a before-and-after test and confirmed a 15% reduction in tokens.
Ep.1: 3 things that actually eat your tokens in Claude Code
* Welcome to my series “Strange things I do to stay ahead of AI” ->
Follow my journey of becoming someone AI can't replace