Sosa9/Apache-Pyspark-basic-tutorials

Apache-Pyspark-basic-tutorials

This repo covers the basic concepts of PySpark, the Python API for Apache Spark.

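  1. Create a new virtual environment before starting a PySpark session.
  2. Create the environment with "python -m venv myenv".
  3. Activate the virtual environment (source <path_location>/Scripts/activate on Windows with a Bash-style shell; on Linux or macOS the script is <path_location>/bin/activate).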

The basic concepts covered in these files are:

PART - 1

  • PySpark DataFrame
  • Reading the dataset
  • Checking the datatypes (schema)
  • Selecting columns
  • Checking summary statistics (describe)
  • Adding columns
  • Renaming columns

PART - 2

  • Dropping rows and columns
  • Various parameters of the drop functions
  • Handling missing values (mean, median, mode)

PART - 3

  • Filter operations

PART - 4

  • Group by and Aggregate functions
