Madhukar's Blog

Category: scala

Rediscovering Implicits in Scala 3 - Part 3: Summoning Implicits

Rediscovering Implicits in Scala 3 - Part 2: Extension methods

Rediscovering Implicits in Scala 3 - Part 1: Implicit Parameters

Understanding Spark Connect API - Part 5: Dataframe Sharing Across Spark Sessions

Understanding Spark Connect API - Part 4: PySpark Example

Understanding Spark Connect API - Part 3: Scala API Example

Understanding Spark Connect API - Part 2: Introduction to Architecture

Understanding Spark Connect API - Part 1: Shortcomings of Spark Driver Architecture

Latest Java Features from a Scala Dev Perspective - Part 5: Java Streams

Latest Java Features from a Scala Dev Perspective - Part 4: Higher Order Functions

Latest Java Features from a Scala Dev Perspective - Part 3: Functional Interfaces

Latest Java Features from a Scala Dev Perspective - Part 2: Lambda Expressions

Latest Java Features from a Scala Dev Perspective - Part 1: Type Inference

Introduction to Spark 3.0 - Part 10 : Ignoring Data Locality in Spark

Data Source V2 API in Spark 3.0 - Part 6 : MySQL Source

Introduction to Spark 3.0 - Part 9 : Join Hints in Spark SQL

Introduction to Spark 3.0 - Part 8 : DataFrame Tail Function

Adaptive Query Execution in Spark 3.0 - Part 2 : Optimising Shuffle Partitions

Adaptive Query Execution in Spark 3.0 - Part 1 : Introduction

Spark Plugin Framework in 3.0 - Part 5: RPC Communication

Spark Plugin Framework in 3.0 - Part 4 : Custom Metrics

Spark Plugin Framework in 3.0 - Part 3 : Dynamic Stream Configuration using Driver Plugin

Introduction to Spark 3.0 - Part 7 : Dynamic Allocation Without External Shuffle Service

Spark Plugin Framework in 3.0 - Part 2 : Anatomy of the API

Spark Plugin Framework in 3.0 - Part 1: Introduction

Introduction to Spark 3.0 - Part 6 : Min and Max By Functions

Introduction to Spark 3.0 - Part 5 : Easier Debugging of Cached Data Frames

Introduction to Spark 3.0 - Part 4 : Handling Class Imbalance Using Weights

Data Source V2 API in Spark 3.0 - Part 5 : Anatomy of V2 Write API

Introduction to Spark 3.0 - Part 3 : Data Loading From Nested Folders

Introduction to Spark 3.0 - Part 2 : Multiple Column Feature Transformations in Spark ML

Introduction to Spark 3.0 - Part 1 : Multi Character Delimiter in CSV Source

Data Source V2 API in Spark 3.0 - Part 4 : In-Memory Data Source with Partitioning

Data Source V2 API in Spark 3.0 - Part 3 : In-Memory Data Source

Data Source V2 API in Spark 3.0 - Part 2 : Anatomy of V2 Read API

Data Source V2 API in Spark 3.0 - Part 1 : Motivation for New Abstractions

Scala Integration Testing with TestContainers Library

Writing Apache Spark Programs in JavaScript

Experiments with GraalVM - Part 5 : Passing Scala Object to JavaScript

Experiments with GraalVM - Part 4 : JavaScript Object to Case Class

Experiments with GraalVM - Part 3 : Invoke JS Functions from JVM

Experiments with GraalVM - Part 2 : Polyglot JavaScript Hello World

Experiments with GraalVM - Part 1 : Introduction

Scala Magnet Pattern

ClickHouse Clustering for Spark Developer

Data Modeling in Apache Spark - Part 2 : Working With Multiple Dates

Data Modeling in Apache Spark - Part 1 : Date Dimension

Dynamic Shuffle Partitions in Spark SQL

Scala Developer Journey into Rust - Part 7 : Type Classes

Scala Developer Journey into Rust - Part 6 : Traits

Multi Source Data Analysis using Spark and Tellius : Meetup Video

Migrating to Spark 2.4 Data Source API

Multiple Column Feature Transformations in Spark ML

Parallel Cross Validation in Spark

Scala Developer Journey into Rust - Part 5: Domain Models

Scala Developer Journey into Rust - Part 4: Algebraic Data Types

Scala Developer Journey into Rust - Part 3: Expression Based Language

Scala Developer Journey into Rust - Part 2 : Type Inference

Scala Developer Journey into Rust - Part 1 : Introduction

Spark on Kubernetes : Native Kubernetes Integration for Spark

Exploring Spark DataSource V2 - Part 8 : Transactional Writes

Exploring Spark DataSource V2 - Part 7 : Meetup Talk

Exploring Spark DataSource V2 - Part 6 : Anatomy of V2 Write API

Exploring Spark DataSource V2 - Part 5 : Filter Push

Exploratory Data Analysis in Spark with Jupyter

Exploring Spark DataSource V2 - Part 4 : In-Memory DataSource with Partitioning

Exploring Spark DataSource V2 - Part 3 : In-Memory DataSource

Exploring Spark DataSource V2 - Part 2 : Anatomy of V2 Read API

Exploring Spark Data Source V2 - Part 1 : Limitations of Data Source V1 API

Converting Spark ML Vector to Numpy Array

Introduction to Spark Structured Streaming - Part 15: Meetup Talk on Time and Window API

Class Imbalance in Credit Card Fraud Detection - Part 3 : Undersampling in Spark

Class Imbalance in Credit Card Fraud Detection - Part 2 : Undersampling in Python

Class Imbalance in Credit Card Fraud Detection - Part 1 : Understanding Effect on Model Accuracy

Analysing Kaggle Titanic Survival Data using Spark ML

Introduction to Spark Structured Streaming - Part 14 : Session Windows using Custom State

Introduction to Spark Structured Streaming - Part 13: Meetup Talk

Introduction to Spark Structured Streaming - Part 12 : Watermarks

Introduction to Spark Structured Streaming - Part 11 : Event Time

Introduction to Spark Structured Streaming - Part 10 : Ingestion Time

Introduction to Spark Structured Streaming - Part 9 : Processing Time Window

Introduction to Spark Structured Streaming - Part 8 : Time Abstraction

Introduction to Spark Structured Streaming - Part 7 : Checkpointing State

Introduction to Spark Structured Streaming - Part 6 : Stream Enrichment using Static Data Join

Introduction to Spark Structured Streaming - Part 5 : File Streams

Introduction to Spark Structured Streaming - Part 4 : Stateless Aggregations

Introduction to Spark Structured Streaming - Part 3 : Stateful WordCount

Introduction to Spark Structured Streaming - Part 2 : Source and Sinks

Introduction to Spark Structured Streaming - Part 1 : DataFrame Abstraction to Stream

Migrating to Spark 2.0 - Part 10 : Second Meetup Talk

Migrating to Spark 2.0 - Part 9 : Hive Integration

Migrating to Spark 2.0 - Part 8 : Catalog API

Migrating to Spark 2.0 - Part 7 : SubQueries

Migrating to Spark 2.0 - Part 6 : Spark ML Transformer API

Migrating to Spark 2.0 - Part 5 : Meetup Talk

Migrating to Spark 2.0 - Part 4 : Cross Joins

Migrating to Spark 2.0 - Part 3 : DataFrame to Dataset

Scalable Spark Deployment using Kubernetes - Part 9 : Service Update and Rollback

Scalable Spark Deployment using Kubernetes - Part 8 : Meetup Talk

Migrating to Spark 2.0 - Part 2 : Built-in CSV Connector

Migrating to Spark 2.0 - Part 1 : Scala Version and Dependencies

Scalable Spark Deployment using Kubernetes - Part 7 : Dynamic Scaling and Namespaces

Scalable Spark Deployment using Kubernetes - Part 6 : Building Spark 2.0 Two Node Cluster

Scalable Spark Deployment using Kubernetes - Part 5 : Building Spark 2.0 Docker Image

Scalable Spark Deployment using Kubernetes - Part 4 : Service Abstractions

Scalable Spark Deployment using Kubernetes - Part 3 : Kubernetes Abstractions

Scalable Spark Deployment using Kubernetes - Part 2 : Installing Kubernetes Locally using Minikube

Scalable Spark Deployment using Kubernetes - Part 1 : Introduction to Kubernetes

Statistical Data Exploration using Spark 2.0 - Part 3 : Outlier Detection using Quantiles

Statistical Data Exploration using Spark 2.0 - Part 2 : Shape of Data with Histograms

Statistical Data Exploration using Spark 2.0 - Part 1 : Five Number Summary

Introduction to Spark 2.0 - Part 7 : Meetup Talk on Spark 2.0 API

Evolution of Apache Spark : Journey of Spark in 1.x Series

Introduction to Spark 2.0 - Part 6 : Custom Optimizers in Spark SQL

Introduction to Spark 2.0 - Part 5 : Time Window in Spark SQL

Introduction to Spark 2.0 - Part 4 : Introduction to Catalog API

Introduction to Spark 2.0 - Part 3 : Porting Code from RDD API to Dataset API

Introduction to Spark 2.0 - Part 2 : Wordcount in Dataset API

Introduction to Spark 2.0 - Part 1 : Spark Session API

Introduction to Flink Streaming - Part 10 : Meetup Talk

Introduction to Flink Streaming - Part 9 : Event Time in Flink

Introduction to Flink Streaming - Part 8 : Understanding Time in Flink Streaming

Introduction to Flink Streaming - Part 7 : Implementing Session Windows using Custom Trigger

Introduction to Flink Streaming - Part 6 : Anatomy of Window API

Introduction to Flink Streaming - Part 5 : Window API in Flink

Introduction to Flink Streaming - Part 3 : Running Streaming Applications in Flink Local Mode

Introduction to Flink Streaming - Part 2 : Discretization of Stream using Window API

Introduction to Flink Streaming - Part 1 : WordCount

Introduction to Spark 2.0 : A Sneak Peek At Next Generation Spark

Interactive Scheduling using Azkaban - Part 1 : Setting up Solo Server

Building Distributed Systems from Scratch - Part 2 : Handling third party libraries

Building Distributed Systems from Scratch - Part 1

Akka HTTP testing

JSON in Akka HTTP

Akka HTTP Hello world

Introduction to Machine learning with Spark

Improving Mobile payments with Real time Spark

Anatomy of Data Frame API : Deep dive into Spark SQL Data Frame API

Anatomy of Data Source API : Deep dive into Spark SQL Data Source API

Structured data processing with Spark SQL - Meetup Video

Analysing CSV data in Spark : Introduction to Spark Data Source API - Part 2

Introduction to Spark Data Source API - Part 1

An Introduction to Spark Streaming - Meetup Video

Handling empty batches in Spark streaming

Anatomy of RDD : Deep dive into Spark RDD abstraction - Meetup video

Extending Spark API

Pipe in Spark

A Simple Akka Remote example

Running scala programs on YARN

Implementing shuffle in Mesos

Distributing third party libraries in Mesos

sizeof operator for Java/Scala

Kryo disk serialization in Spark

Custom mesos executor in Scala

Mesos Hello world in Scala

Sbt on Ubuntu

Google, it's time - We want Scala for Android

Converting Java collections to Scala