Sosa9 / Apache-Pyspark-basic-tutorials Public


This repo covers the basic concepts of Apache PySpark.

README.md
demo.ipynb
test.csv
test2.csv
test3.csv
test4.csv


Apache-Pyspark-basic-tutorials

This repo covers the basic concepts of Apache PySpark.

Create a new virtual environment before starting the session.
Create the environment with "python -m venv myenv".
Activate it with "source <path_location>/Scripts/activate" on Windows (for example in Git Bash) or "source <path_location>/bin/activate" on Linux and macOS.
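
The setup steps above can also be sketched from Python itself, which makes it easy to see where the activate script ends up on each platform. This is a minimal sketch, not the repository's own setup: the temporary directory is an assumption made so the example does not touch the working tree, and `with_pip=False` is used only to keep it fast (the `python -m venv myenv` command installs pip by default).

```python
import os
import tempfile
import venv
from pathlib import Path

# Hypothetical location: a throwaway directory instead of ./myenv
env_dir = Path(tempfile.mkdtemp()) / "myenv"

# Roughly what "python -m venv myenv" does, minus the pip bootstrap
venv.EnvBuilder(with_pip=False).create(env_dir)

# The activate script lives in Scripts/ on Windows and bin/ elsewhere
scripts_dir = env_dir / ("Scripts" if os.name == "nt" else "bin")
activate = scripts_dir / "activate"
print(f"Activate with: source {activate}")
```

Once the environment is active, PySpark can be installed into it with "pip install pyspark".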

The basic concepts covered in these files are:

PART - 1

Pyspark Dataframe
Reading the dataset
Checking the datatypes (Schemas)
Selecting columns
Checking summary statistics (describe)
Adding columns
Renaming columns

PART - 2

Dropping rows and columns
Various parameters of the drop functions
Handling missing values (mean, median, mode)

PART - 3

Filter operations

PART - 4

Group by and Aggregate functions


Languages

Jupyter Notebook 100.0%