How to Install PySpark and Apache Spark on macOS

Here is an easy step-by-step guide to installing PySpark and Apache Spark on macOS.

Step 1: Get Homebrew

Homebrew makes installing applications and languages on macOS a lot easier. You can get Homebrew by following the instructions on its website. In short, you can install Homebrew from the terminal with this command:

/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

Step 2: Install xcode-select

Xcode is a large suite of software development tools and libraries from Apple. To install Java and Spark through the command line, you will probably need the Xcode command line tools, which xcode-select installs. Use the command below in your terminal:

xcode-select --install

You usually get a prompt asking you to confirm. Click “Install” to continue with the installation.
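
To see whether the tools are already installed before running the installer, a short Python sketch can call `xcode-select -p`, which prints the active developer directory when one is configured. The helper name `command_line_tools_path` is made up for this illustration.

```python
import shutil
import subprocess


def command_line_tools_path():
    """Return the active developer directory, or None if it cannot be determined."""
    # xcode-select only exists on macOS, so bail out early elsewhere.
    if shutil.which("xcode-select") is None:
        return None
    result = subprocess.run(
        ["xcode-select", "-p"],
        capture_output=True,
        text=True,
    )
    # A non-zero exit code means no developer directory is configured.
    if result.returncode != 0:
        return None
    return result.stdout.strip() or None


if __name__ == "__main__":
    path = command_line_tools_path()
    if path:
        print(f"Command line tools found at {path}")
    else:
        print("Command line tools not found; run: xcode-select --install")
```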

Step 3: DO NOT use Homebrew to install Java!

At the time of writing, the latest version of Java is Java 10, and Apache Spark does not officially support it. Homebrew installs the latest version of Java, which causes many issues.

To install Java 8, go to the official website: https://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html

Then, under “Java SE Development Kit 8u191”, choose the Mac OS X x64 download:

Mac OS X x64    245.92 MB    jdk-8u191-macosx-x64.dmg

Once the file is downloaded, open it and install Java locally.
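
Since the whole point of this step is to end up on Java 8, it can be worth confirming which version the `java` command actually resolves to. The sketch below parses the output of `java -version`. The function names are illustrative, and the parsing assumes the usual `version "..."` line, in which Java 8 and earlier report themselves as 1.x.

```python
import re
import shutil
import subprocess


def java_major_version(version_output):
    """Extract the major Java version from `java -version` output."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first = int(match.group(1))
    # Java 8 and earlier use the 1.x scheme (for example 1.8.0_191).
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))
    return first


def installed_java_major_version():
    """Return the major version of the `java` on PATH, or None if unknown."""
    if shutil.which("java") is None:
        return None
    # `java -version` writes its report to stderr, not stdout.
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return java_major_version(result.stderr)


if __name__ == "__main__":
    major = installed_java_major_version()
    if major == 8:
        print("Java 8 detected")
    else:
        print(f"Detected Java major version: {major}; this guide expects 8")
```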

Step 3 (continued): Use Homebrew to install Apache Spark

In your terminal, type:

brew install apache-spark

Homebrew will now download and install Apache Spark. This may take some time depending on your internet connection. You can check the version of Spark with the command below:

pyspark --version

The output should include a Spark banner showing the installed version number.
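
If a later script cannot locate Spark, it helps to know where Homebrew put the installation. The sketch below checks the `SPARK_HOME` environment variable first and then a couple of typical Homebrew locations. Those paths are assumptions about a default Homebrew setup (you can confirm yours with `brew --prefix apache-spark`), and `guess_spark_home` is a made-up helper name.

```python
import os

# Assumed default Homebrew locations; verify with `brew --prefix apache-spark`.
CANDIDATE_SPARK_HOMES = [
    "/usr/local/opt/apache-spark/libexec",     # Homebrew on Intel Macs
    "/opt/homebrew/opt/apache-spark/libexec",  # Homebrew on Apple silicon
]


def guess_spark_home(candidates=CANDIDATE_SPARK_HOMES):
    """Return SPARK_HOME if set, else the first candidate directory that exists."""
    env_value = os.environ.get("SPARK_HOME")
    if env_value:
        return env_value
    for path in candidates:
        if os.path.isdir(path):
            return path
    return None


if __name__ == "__main__":
    print("Spark home:", guess_spark_home())
```

The returned path, if any, can be passed as the first argument to `findspark.init()` when automatic detection does not work.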

Step 4: Install PySpark and FindSpark in Python

To use PySpark locally on your machine, you need to install findspark and pyspark. If you use Anaconda, use the commands below:

    # findspark, option 1:
    conda install -c conda-forge findspark
    # findspark, option 2:
    conda install -c conda-forge/label/gcc7 findspark
    # pyspark:
    conda install -c conda-forge pyspark

If you use regular Python, use pip instead:

    pip install findspark
    pip install pyspark
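
To confirm that both packages landed in the Python environment you are actually running, a minimal check with the standard library's `importlib` could look like this. The helper name `missing_packages` is illustrative.

```python
import importlib.util


def missing_packages(names=("findspark", "pyspark")):
    """Return the names that cannot be imported in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("findspark and pyspark are both importable")
```

Run it with the same interpreter you plan to use for Spark, since conda environments and the system Python keep separate package sets.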

Step 5: Your first code in Python

After the installation is complete, you can write your first Hello World script:

 

    import findspark

    # Locate the Spark installation and add it to sys.path before importing pyspark.
    findspark.init()

    from pyspark import SparkContext
    from pyspark.sql import SparkSession

    sc = SparkContext(appName="MyFirstApp")
    spark = SparkSession(sc)
    print("Hello World!")
    sc.stop()  # shut down Spark (SparkContext has stop(), not close())