Madhukar's Blog
Category: scala
Understanding Spark Connect API - Part 5: Dataframe Sharing Across Spark Sessions
Understanding Spark Connect API - Part 4: PySpark Example
Understanding Spark Connect API - Part 3: Scala API Example
Understanding Spark Connect API - Part 2: Introduction to Architecture
Understanding Spark Connect API - Part 1: Shortcomings of Spark Driver Architecture
Latest Java Features from a Scala Dev Perspective - Part 5: Java Streams
Latest Java Features from a Scala Dev Perspective - Part 4: Higher Order Functions
Latest Java Features from a Scala Dev Perspective - Part 3: Functional Interfaces
Latest Java Features from a Scala Dev Perspective - Part 2: Lambda Expressions
Latest Java Features from a Scala Dev Perspective - Part 1: Type Inference
Introduction to Spark 3.0 - Part 10 : Ignoring Data Locality in Spark
Data Source V2 API in Spark 3.0 - Part 6 : MySQL Source
Introduction to Spark 3.0 - Part 9 : Join Hints in Spark SQL
Introduction to Spark 3.0 - Part 8 : DataFrame Tail Function
Adaptive Query Execution in Spark 3.0 - Part 2 : Optimising Shuffle Partitions
Adaptive Query Execution in Spark 3.0 - Part 1 : Introduction
Spark Plugin Framework in 3.0 - Part 5: RPC Communication
Spark Plugin Framework in 3.0 - Part 4 : Custom Metrics
Spark Plugin Framework in 3.0 - Part 3 : Dynamic Stream Configuration using Driver Plugin
Introduction to Spark 3.0 - Part 7 : Dynamic Allocation Without External Shuffle Service
Spark Plugin Framework in 3.0 - Part 2 : Anatomy of the API
Spark Plugin Framework in 3.0 - Part 1: Introduction
Introduction to Spark 3.0 - Part 6 : Min and Max By Functions
Introduction to Spark 3.0 - Part 5 : Easier Debugging of Cached Data Frames
Introduction to Spark 3.0 - Part 4 : Handling Class Imbalance Using Weights
Data Source V2 API in Spark 3.0 - Part 5 : Anatomy of V2 Write API
Introduction to Spark 3.0 - Part 3 : Data Loading From Nested Folders
Introduction to Spark 3.0 - Part 2 : Multiple Column Feature Transformations in Spark ML
Introduction to Spark 3.0 - Part 1 : Multi Character Delimiter in CSV Source
Data Source V2 API in Spark 3.0 - Part 4 : In-Memory Data Source with Partitioning
Data Source V2 API in Spark 3.0 - Part 3 : In-Memory Data Source
Data Source V2 API in Spark 3.0 - Part 2 : Anatomy of V2 Read API
Data Source V2 API in Spark 3.0 - Part 1 : Motivation for New Abstractions
Scala Integration Testing with TestContainers Library
Writing Apache Spark Programs in JavaScript
Experiments with GraalVM - Part 5 : Passing Scala Object to JavaScript
Experiments with GraalVM - Part 4 : JavaScript Object to Case Class
Experiments with GraalVM - Part 3 : Invoke JS Functions from JVM
Experiments with GraalVM - Part 2 : Polyglot JavaScript Hello World
Experiments with GraalVM - Part 1 : Introduction
Scala Magnet Pattern
ClickHouse Clustering for Spark Developer
Data Modeling in Apache Spark - Part 2 : Working With Multiple Dates
Data Modeling in Apache Spark - Part 1 : Date Dimension
Dynamic Shuffle Partitions in Spark SQL
Scala Developer Journey into Rust - Part 7 : Type Classes
Scala Developer Journey into Rust - Part 6 : Traits
Multi Source Data Analysis using Spark and Tellius : Meetup Video
Migrating to Spark 2.4 Data Source API
Multiple Column Feature Transformations in Spark ML
Parallel Cross Validation in Spark
Scala Developer Journey into Rust - Part 5: Domain Models
Scala Developer Journey into Rust - Part 4: Algebraic Data Types
Scala Developer Journey into Rust - Part 3: Expression Based Language
Scala Developer Journey into Rust - Part 2 : Type Inference
Scala Developer Journey into Rust - Part 1 : Introduction
Spark on Kubernetes : Native Kubernetes Integration for Spark
Exploring Spark DataSource V2 - Part 8 : Transactional Writes
Exploring Spark DataSource V2 - Part 7 : Meetup Talk
Exploring Spark DataSource V2 - Part 6 : Anatomy of V2 Write API
Exploring Spark DataSource V2 - Part 5 : Filter Push
Exploratory Data Analysis in Spark with Jupyter
Exploring Spark DataSource V2 - Part 4 : In-Memory DataSource with Partitioning
Exploring Spark DataSource V2 - Part 3 : In-Memory DataSource
Exploring Spark DataSource V2 - Part 2 : Anatomy of V2 Read API
Exploring Spark Data Source V2 - Part 1 : Limitations of Data Source V1 API
Converting Spark ML Vector to Numpy Array
Introduction to Spark Structured Streaming - Part 15: Meetup Talk on Time and Window API
Class Imbalance in Credit Card Fraud Detection - Part 3 : Undersampling in Spark
Class Imbalance in Credit Card Fraud Detection - Part 2 : Undersampling in Python
Class Imbalance in Credit Card Fraud Detection - Part 1 : Understanding Effect on Model Accuracy
Analysing Kaggle Titanic Survival Data using Spark ML
Introduction to Spark Structured Streaming - Part 14 : Session Windows using Custom State
Introduction to Spark Structured Streaming - Part 13: Meetup Talk
Introduction to Spark Structured Streaming - Part 12 : Watermarks
Introduction to Spark Structured Streaming - Part 11 : Event Time
Introduction to Spark Structured Streaming - Part 10 : Ingestion Time
Introduction to Spark Structured Streaming - Part 9 : Processing Time Window
Introduction to Spark Structured Streaming - Part 8 : Time Abstraction
Introduction to Spark Structured Streaming - Part 7 : Checkpointing State
Introduction to Spark Structured Streaming - Part 6 : Stream Enrichment using Static Data Join
Introduction to Spark Structured Streaming - Part 5 : File Streams
Introduction to Spark Structured Streaming - Part 4 : Stateless Aggregations
Introduction to Spark Structured Streaming - Part 3 : Stateful WordCount
Introduction to Spark Structured Streaming - Part 2 : Source and Sinks
Introduction to Spark Structured Streaming - Part 1 : DataFrame Abstraction to Stream
Migrating to Spark 2.0 - Part 10 : Second Meetup Talk
Migrating to Spark 2.0 - Part 9 : Hive Integration
Migrating to Spark 2.0 - Part 8 : Catalog API
Migrating to Spark 2.0 - Part 7 : SubQueries
Migrating to Spark 2.0 - Part 6 : Spark ML Transformer API
Migrating to Spark 2.0 - Part 5 : Meetup Talk
Migrating to Spark 2.0 - Part 4 : Cross Joins
Migrating to Spark 2.0 - Part 3 : DataFrame to Dataset
Scalable Spark Deployment using Kubernetes - Part 9 : Service Update and Rollback
Scalable Spark Deployment using Kubernetes - Part 8 : Meetup Talk
Migrating to Spark 2.0 - Part 2 : Built-in CSV Connector
Migrating to Spark 2.0 - Part 1 : Scala Version and Dependencies
Scalable Spark Deployment using Kubernetes - Part 7 : Dynamic Scaling and Namespaces
Scalable Spark Deployment using Kubernetes - Part 6 : Building Spark 2.0 Two Node Cluster
Scalable Spark Deployment using Kubernetes - Part 5 : Building Spark 2.0 Docker Image
Scalable Spark Deployment using Kubernetes - Part 4 : Service Abstractions
Scalable Spark Deployment using Kubernetes - Part 3 : Kubernetes Abstractions
Scalable Spark Deployment using Kubernetes - Part 2 : Installing Kubernetes Locally using Minikube
Scalable Spark Deployment using Kubernetes - Part 1 : Introduction to Kubernetes
Statistical Data Exploration using Spark 2.0 - Part 3 : Outlier Detection using Quantiles
Statistical Data Exploration using Spark 2.0 - Part 2 : Shape of Data with Histograms
Statistical Data Exploration using Spark 2.0 - Part 1 : Five Number Summary
Introduction to Spark 2.0 - Part 7 : Meetup Talk on Spark 2.0 API
Evolution of Apache Spark : Journey of Spark in 1.x Series
Introduction to Spark 2.0 - Part 6 : Custom Optimizers in Spark SQL
Introduction to Spark 2.0 - Part 5 : Time Window in Spark SQL
Introduction to Spark 2.0 - Part 4 : Introduction to Catalog API
Introduction to Spark 2.0 - Part 3 : Porting Code from RDD API to Dataset API
Introduction to Spark 2.0 - Part 2 : Wordcount in Dataset API
Introduction to Spark 2.0 - Part 1 : Spark Session API
Introduction to Flink Streaming - Part 10 : Meetup Talk
Introduction to Flink Streaming - Part 9 : Event Time in Flink
Introduction to Flink Streaming - Part 8 : Understanding Time in Flink Streaming
Introduction to Flink Streaming - Part 7 : Implementing Session Windows using Custom Trigger
Introduction to Flink Streaming - Part 6 : Anatomy of Window API
Introduction to Flink Streaming - Part 5 : Window API in Flink
Introduction to Flink Streaming - Part 3 : Running Streaming Applications in Flink Local Mode
Introduction to Flink Streaming - Part 2 : Discretization of Stream using Window API
Introduction to Flink Streaming - Part 1 : WordCount
Introduction to Spark 2.0 : A Sneak Peek At Next Generation Spark
Interactive Scheduling using Azkaban - Part 1 : Setting up Solo Server
Building Distributed Systems from Scratch - Part 2 : Handling third party libraries
Building Distributed Systems from Scratch - Part 1
Akka HTTP testing
JSON in Akka HTTP
Akka HTTP Hello world
Introduction to Machine learning with Spark
Improving Mobile payments with Real time Spark
Anatomy of Data Frame API : Deep dive into Spark SQL Data Frame API
Anatomy of Data Source API : Deep dive into Spark SQL Data Source API
Structured data processing with Spark SQL - Meetup Video
Analysing CSV data in Spark : Introduction to Spark Data Source API - Part 2
Introduction to Spark Data Source API - Part 1
An Introduction to Spark Streaming - Meetup Video
Handling empty batches in Spark streaming
Anatomy of RDD : Deep dive into spark RDD abstraction - Meetup video
Extending Spark API
Pipe in Spark
A Simple Akka Remote example
Running scala programs on YARN
Implementing shuffle in Mesos
Distributing third party libraries in Mesos
sizeof operator for Java/Scala
Kryo disk serialization in Spark
Custom mesos executor in Scala
Mesos Hello world in Scala
Sbt on ubuntu
Google, it's time - We want Scala for Android
Converting Java collections to Scala