
Connecting Python to SQL Server 2017: PySpark "No suitable driver" error when connecting from Spark 2.4 to MS SQL Server 2017 using Python...

PySpark: adding a driver to read from SQL Server

I am facing a problem while running a Spark job with Python, i.e. PySpark.

Please see the code snippet below:

from pyspark.sql import SparkSession
from os.path import abspath
from pyspark.sql.functions import max,min,sum,col
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("test").config("spark.driver.extraClassPath", "/usr/dt/mssql-jdbc-6.4.0.jre8.jar").getOrCreate()
spark.conf.set("spark.sql.execution.arrow.enabled", "true")
spark.conf.set("spark.sql.session.timeZone", "Etc/UTC")
warehouse_loc = abspath('spark-warehouse')

# Load data from MS SQL Server 2017
df = spark.read.format("jdbc").options(
    url="jdbc:sqlserver://;DATABASE=TransTrak_V_1.0;user=sa;password=m2m@ipcl1234",
    properties={"driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver"},
    dbtable="Current_Voltage",
).load()

When I run this code, I get the following error:

py4j.protocol.Py4JJavaError: An error occurred while calling o38.load.
: java.sql.SQLException: No suitable driver

The same code used to run fine. However, for some reason I had to reinstall CentOS 7 and then Python 3.6. I have set Python 3.6 as the default Python for Spark, i.e. when I start pyspark, the default Python is 3.6.

Just to mention, the system default Python is 2.7, and I am using CentOS 7.

What is going wrong here? Can anybody please help with this?
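One thing that may be worth checking in the snippet above: `.options()` takes flat keyword options, and `properties` is an argument of `spark.read.jdbc(...)`, not a JDBC source option. As far as I can tell, the dict passed as `properties` is just stringified, so the `driver` class name never reaches the JDBC source. A minimal sketch of passing `driver` as a top-level option follows; the helper names `build_jdbc_options` and `read_table` are hypothetical and only for illustration.

```python
# Sketch only: these helpers are hypothetical, not part of PySpark.
SQLSERVER_DRIVER = "com.microsoft.sqlserver.jdbc.SQLServerDriver"


def build_jdbc_options(url, table, driver=SQLSERVER_DRIVER):
    """Return flat string options; "driver" sits beside "url" and "dbtable"."""
    return {"url": url, "dbtable": table, "driver": driver}


def read_table(spark, url, table):
    """Load a table through the JDBC source using an existing SparkSession."""
    options = build_jdbc_options(url, table)
    return spark.read.format("jdbc").options(**options).load()
```

With an existing session, this would be called as `read_table(spark, url, "Current_Voltage")`, where `url` is the same connection string as in the question. The driver jar still has to be on the driver's classpath for the class to load.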


OK, so after a long search, it appears that Spark probably doesn't work properly with OpenJDK, i.e. java-1.8.0-openjdk-. When I check the default Java, I see the following:

openjdk version "1.8.0_131"
OpenJDK Runtime Environment (build 1.8.0_131-b12)
OpenJDK 64-Bit Server VM (build 25.131-b12, mixed mode)

I then tried to install Oracle JDK 8 from the official site, but ran into separate issues.

So, in a nutshell, I am not able to run Spark jobs as I could before.
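Since the same code worked before the OS reinstall, it may be worth ruling out a simpler cause before switching JDKs: the jar named in `spark.driver.extraClassPath` might no longer exist at that path. The sketch below is only a diagnostic under that assumption; the function name `missing_classpath_entries` is hypothetical.

```python
import os


def missing_classpath_entries(classpath):
    """Return classpath entries that do not exist on disk.

    Entries are split on os.pathsep (":" on Linux). Wildcard entries such as
    "/some/dir/*" are not expanded, so they would be reported as missing.
    """
    entries = [entry for entry in classpath.split(os.pathsep) if entry]
    return [entry for entry in entries if not os.path.exists(entry)]
```

For example, `missing_classpath_entries("/usr/dt/mssql-jdbc-6.4.0.jre8.jar")` returns an empty list if the jar is still in place, and a list containing that path if it is gone.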

