Hey, I'm a Senior Data Engineer at TripAdvisor (ex-PepsiCo) with more than 7 years of working with data.
Completely self-taught: I started with a 6-month bootcamp (thought I was gonna be a Python developer, lol) and then picked up everything else on the go.
I have a community of over 80,000 data and AI enthusiasts ❤️
My content is mostly memes and educational videos (in a funny, relatable format) covering many trendy topics.
You can share your thoughts and doubts and ask for help, 'cause my people are amazing and always ready to help ❤️
My road was bumpy… I remember
2018: I'm staring at my laptop 💻
“What even is Apache Kafka?”
“Do I need to learn Java?”
“Hadoop is essential!? No wait, learn dbt!”
I felt like I was drowning in tutorials...
But you CAN become a Data Engineer:
- FASTER
- EASIER
- MENTORED
If you want to start your career as a Data Engineer:
Drop the 🔥 in the comments and I'll send you the details
➡️ Why is Spotify Wrapped such a smart*$$ at processing petabytes of data?
(My previous reel went completely viral (3 million views!) and you voted for a detailed explanation.)
💬 SPOILER: they cut their cloud costs by 50% with this!
Spotify Wrapped is a giant distributed ETL pipeline that uses a technique called the Sort Merge Bucket (SMB) join.
Spotify uses 3 main data sources for Wrapped:
- Streaming activity 🎧
- User metadata 🎶
- Streaming context ⏰
Tech stack: GCP, with Scala-based Dataflow pipelines and Avro files.
🧃 Here is the juice: these sources are converted to SMB format, meaning the data is bucketed and sorted by user_id.
SMB is a technique where:
1. Bucketing: Data is divided into smaller parts called buckets, usually by hashing the join column
2. Sorting: The data within each bucket is then sorted by that same key
3. Merging: When combining two datasets, like matching users with their listening history, SMB speeds things up because both datasets are **already bucketed and sorted**. The join becomes a simple merge of sorted streams, which is far cheaper than a traditional shuffle-based join (see the sketch below)
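To make that concrete, here's a minimal, framework-free Python sketch of the idea. Spotify does this at PB scale with Scio/Dataflow over Avro files; the bucket count, record shapes, and function names below are purely mine, for illustration:

```python
# A minimal, framework-free sketch of a Sort Merge Bucket join.
# Everything here (bucket count, record shapes, names) is illustrative.

NUM_BUCKETS = 4

def bucket_of(user_id: str) -> int:
    # 1. Bucketing: route each record to a bucket by hashing the join key.
    # (Production systems use a stable hash; Python's hash() is per-process.)
    return hash(user_id) % NUM_BUCKETS

def write_smb(records):
    # Partition records into buckets, then sort each bucket by user_id.
    buckets = [[] for _ in range(NUM_BUCKETS)]
    for rec in records:
        buckets[bucket_of(rec["user_id"])].append(rec)
    for b in buckets:
        b.sort(key=lambda r: r["user_id"])  # 2. Sorting
    return buckets

def merge_join(plays, meta):
    # 3. Merging: both sides are sorted, so one linear pass joins them.
    # Assumes user_id is unique on the metadata side.
    i = j = 0
    while i < len(plays) and j < len(meta):
        pk, mk = plays[i]["user_id"], meta[j]["user_id"]
        if pk == mk:
            yield {**plays[i], **meta[j]}
            i += 1          # many plays can match one metadata row
        elif pk < mk:
            i += 1
        else:
            j += 1

plays = write_smb([{"user_id": "u1", "track": "song_a"},
                   {"user_id": "u1", "track": "song_b"},
                   {"user_id": "u2", "track": "song_c"}])
meta = write_smb([{"user_id": "u1", "country": "SE"},
                  {"user_id": "u2", "country": "US"}])

for b in range(NUM_BUCKETS):
    # Bucket b of one dataset only ever meets bucket b of the other,
    # so every pair of bucket files joins independently: no shuffle.
    for row in merge_join(plays[b], meta[b]):
        print(row)
```

Two sorted lists, one linear pass per bucket. That's the whole trick.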
Small tweak here:
- The sortMergeTransform function is used to combine the 3 data sources, reading each one keyed by user_id.
- This allows Spotify to join roughly 1PB of data without using conventional shuffle or Bigtable.
😮‍💨 The rest is simple: smaller jobs aggregate a day's or week's worth of data for each user.
And then the weekly partitions are aggregated into one year's worth of data.
➡️ This ended up being a huge cost saving: they joined roughly 1PB of data in total without using a conventional shuffle or Bigtable!
🏷️ sql, data, spotify, big data, database, #dataengineering, gcp, google cloud, python programming
Anyone else feeling like this?
not burnt out. not thriving. just… somewhere in the middle, running on caffeine and vibes and the constant feeling that if i don't lock in RIGHT NOW i'm going to fall behind forever.
the AI stuff moves so fast that “catching up” is basically a full-time job on top of your actual full-time job.
I keep telling myself I'll touch grass after I ship this one thing.
that one thing keeps multiplying.
if you're in the same spiral: the gym guilt, the protein fixation, the claude-maxxing, the “I should really sleep” - just know it's not just you.
weāre all just trying to optimize our token usage and our lives at the same time.
Thoughts?
I finally read the actual paper. And here are my thoughts:
I've read “Attention Is All You Need”. Not a summary, not a YouTube explainer.
8 researchers at Google were trying to fix machine translation and got annoyed by a very specific problem:
-> Older models, called RNNs, had to read text the way a very slow person reads a book. One word at a time. Left to right. They could not skip ahead, could not look back efficiently, and the longer the sentence, the worse they got at remembering the beginning of it.
Imagine trying to understand a 200-word sentence but your brain erases what you read three seconds ago. That was the architecture powering state-of-the-art AI in 2016.
So the researchers removed recurrence entirely.
The Transformer they built lets every single word look at every other word at the same time. Think of it less like reading a book and more like spreading all the pages on a table and seeing the whole story at once. That mechanism is called self-attention, and it is the core of the paper.
Then they ran that process not once but 8 times in parallel, with each run learning different kinds of relationships. One head might learn grammar. Another might learn who “it” refers to in a sentence (it's usually NOT obvious). They called this multi-head attention.
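If you want to see the mechanics, here's a toy numpy sketch of scaled dot-product self-attention with 8 heads. The sizes and random weights are made up; in a real Transformer the projection matrices are learned:

```python
import numpy as np

# Toy scaled dot-product self-attention with 8 heads, sketched in numpy.
# A real Transformer LEARNS the projection matrices; here they're random.

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # Each word (row of X) builds a query, a key, and a value...
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    # ...and every word scores every other word AT ONCE: no left-to-right reading.
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # scaled dot products
    return softmax(scores) @ V                # weighted mix of everyone's values

rng = np.random.default_rng(0)
seq_len, d_model, n_heads = 5, 16, 8     # the paper uses 8 heads
d_head = d_model // n_heads              # each head works in a smaller space
X = rng.normal(size=(seq_len, d_model))  # 5 "words", already embedded

# Multi-head attention: run the same mechanism 8 times with different
# projections, so each head is free to learn a different relationship.
heads = []
for _ in range(n_heads):
    Wq, Wk, Wv = [rng.normal(size=(d_model, d_head)) for _ in range(3)]
    heads.append(self_attention(X, Wq, Wk, Wv))

out = np.concatenate(heads, axis=-1)  # concat heads (the paper adds one final linear layer)
print(out.shape)                      # (5, 16): one context-enriched vector per word
```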
And since the model no longer processes words in order, they had to tell it where each word sits in the sequence. They did that with positional encodings, basically injecting a signal built from sine and cosine waves into the data so the model knows word 1 from word 47.
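And the positional encoding is literally two trig functions. Here's the paper's formula sketched in numpy (the sizes are mine):

```python
import numpy as np

# The paper's sinusoidal positional encoding:
#   PE(pos, 2i)   = sin(pos / 10000^(2i / d_model))
#   PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model))

def positional_encoding(seq_len, d_model):
    pos = np.arange(seq_len)[:, None]       # positions 0 .. seq_len-1
    i = np.arange(0, d_model, 2)[None, :]   # even dimension indices
    angles = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)            # even dims get sine
    pe[:, 1::2] = np.cos(angles)            # odd dims get cosine
    return pe

# The signal is simply ADDED to the word embeddings, so word 1 and word 47
# carry different position signatures even though nothing reads in order.
pe = positional_encoding(seq_len=50, d_model=16)
print(pe[1][:4])    # position 1's signature...
print(pe[47][:4])   # ...looks nothing like position 47's
```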
The result? Trained in 12 hours on 8 GPUs. Beat every previous model on translation benchmarks. At a fraction of the cost.
It reads like eight very annoyed engineers optimizing a bottleneck on a Tuesday.
And yet. GPT, Claude, Gemini, every LLM you used this week, all running on the exact same core idea from that 11-page paper.
Insane, huh?
reduced my token baseline by 15% after this one-time setup
5 easy set-and-forget fixes for your Claude Code.
* this is the “Strange things I do to stay ahead of AI” series
Btw, I ran a before-and-after test and confirmed a 15% reduction in tokens.
Ep.1: 3 things that actually eat your tokens in Claude Code
* Welcome to my series “Strange things I do to stay ahead of AI” ->
Follow my journey of becoming someone AI can't replace