
Spark SQL Data Types with Examples


Spark SQL DataType class is the base class of all data types in Spark, defined in the package org.apache.spark.sql.types. Data types are primarily used when working on DataFrames. In this article, you will learn about the different data types and their utility methods, with Scala examples.

1. Spark SQL DataType – base class of all Data Types

All data types listed below are supported in Spark SQL, and DataType is the base class of all of them. Some types, such as IntegerType, DecimalType, and ByteType, are subclasses of NumericType, which is itself a subclass of DataType.

StringType
ShortType
ArrayType
IntegerType
MapType
LongType
StructType
FloatType
DateType
DoubleType
TimestampType
DecimalType
BooleanType
ByteType
CalendarIntervalType
HiveStringType
BinaryType
ObjectType
NumericType
NullType

1.1 DataType common methods

All Spark SQL data types extend the DataType class and provide implementations of the methods shown in this example.

val arrayType = ArrayType(StringType, true)
println("json() : " + arrayType.json) // JSON string of the data type
println("prettyJson() : " + arrayType.prettyJson) // JSON in pretty format
println("simpleString() : " + arrayType.simpleString) // readable simple string
println("sql() : " + arrayType.sql) // SQL format
println("typeName() : " + arrayType.typeName) // type name
println("catalogString() : " + arrayType.catalogString) // string used in the catalog
println("defaultSize() : " + arrayType.defaultSize) // default size in bytes

Yields below output.

json() : {"type":"array","elementType":"string","containsNull":true}
prettyJson() : {
  "type" : "array",
  "elementType" : "string",
  "containsNull" : true
}
simpleString() : array<string>
sql() : ARRAY<STRING>
typeName() : array
catalogString() : array<string>
defaultSize() : 20

Besides these, the DataType class has the following static methods.

1.2 DataType.fromJson()

If you have a JSON string and want to convert it to a DataType, use fromJson(). For example, you can convert a JSON schema string to a StructType.

val typeFromJson = DataType.fromJson(
  """{"type":"array",
    |"elementType":"string","containsNull":false}""".stripMargin)
println(typeFromJson.getClass)

val typeFromJson2 = DataType.fromJson("\"string\"")
println(typeFromJson2.getClass)

// This prints
// class org.apache.spark.sql.types.ArrayType
// class org.apache.spark.sql.types.StringType$

1.3 DataType.fromDDL()

Like loading a structure from a JSON string, we can also create one from a DDL string using fromDDL().

val ddlSchemaStr = "`fullName` STRUCT<`first`: STRING, `last`: STRING," +
  "`middle`: STRING>,`age` INT,`gender` STRING"
val ddlSchema = DataType.fromDDL(ddlSchemaStr)
println(ddlSchema.getClass)

// This prints
// class org.apache.spark.sql.types.StructType

1.4 DataType.canWrite()

canWrite() checks whether data of one type can safely be written to a target (read) type; Spark mainly uses it internally when validating writes against a table schema.

1.5 DataType.equalsStructurally()

equalsStructurally() returns true when two data types have the same structure (the same shape of types, ignoring field names); an optional ignoreNullability flag also ignores nullability differences.
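As a minimal sketch (assuming the spark-sql dependency is on the classpath, e.g. inside spark-shell), equalsStructurally() can be demonstrated with two array types that differ only in nullability:

```scala
import org.apache.spark.sql.types._

// Two array types that differ only in element nullability
val nullableArr    = ArrayType(IntegerType, containsNull = true)
val nonNullableArr = ArrayType(IntegerType, containsNull = false)

// Nullability is compared by default, so these are not structurally equal
println(DataType.equalsStructurally(nullableArr, nonNullableArr)) // false

// With ignoreNullability = true, only the shape is compared
println(DataType.equalsStructurally(nullableArr, nonNullableArr, ignoreNullability = true)) // true
```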

2. Use Spark SQL DataTypes class to get a type object

In order to get or create a specific data type, use the objects and factory methods provided by the org.apache.spark.sql.types.DataTypes class. For example, use the object DataTypes.StringType to get StringType, and the factory method DataTypes.createArrayType(StringType) to get an ArrayType of string.

// Below are some examples
val strType = DataTypes.StringType
val arrayType = DataTypes.createArrayType(StringType)
val structType = DataTypes.createStructType(
  Array(DataTypes.createStructField("fieldName", StringType, true)))

3. StringType

StringType “org.apache.spark.sql.types.StringType” is used to represent string values. To create a string type, use either DataTypes.StringType or the StringType object; both refer to the same StringType instance.

val strType = DataTypes.StringType
println("json : " + strType.json)
println("prettyJson : " + strType.prettyJson)
println("simpleString : " + strType.simpleString)
println("sql : " + strType.sql)
println("typeName : " + strType.typeName)
println("catalogString : " + strType.catalogString)
println("defaultSize : " + strType.defaultSize)

Outputs

json : "string"
prettyJson : "string"
simpleString : string
sql : STRING
typeName : string
catalogString : string
defaultSize : 20

4. ArrayType

Use ArrayType to represent arrays in a DataFrame; use either the factory method DataTypes.createArrayType() or the ArrayType() constructor to get an array type with a specific element type.

On an ArrayType object you can access all the methods defined in section 1.1; additionally, it provides containsNull(), elementType(), and productElement(), to name a few.

val arr = ArrayType(IntegerType, false)
val arrayType = DataTypes.createArrayType(StringType, true)
println("containsNull : " + arrayType.containsNull)
println("elementType : " + arrayType.elementType)
println("productElement : " + arrayType.productElement(0))

Yields below output.

containsNull : true
elementType : StringType
productElement : StringType

For more examples and usage, please refer to Using ArrayType on DataFrame.

5. MapType

Use MapType to represent maps of key-value pairs in a DataFrame; use either the factory method DataTypes.createMapType() or the MapType() constructor to get a map type with specific key and value types.

On a MapType object you can access all the methods defined in section 1.1; additionally, it provides keyType(), valueType(), valueContainsNull(), and productElement(), to name a few.

val mapType1 = MapType(StringType, IntegerType)
val mapType = DataTypes.createMapType(StringType, IntegerType)
println("keyType() : " + mapType.keyType)
println("valueType() : " + mapType.valueType)
println("valueContainsNull() : " + mapType.valueContainsNull)
println("productElement(1) : " + mapType.productElement(1))

Yields below output.

keyType() : StringType
valueType() : IntegerType
valueContainsNull() : true
productElement(1) : IntegerType

For more examples and usage, please refer to Using MapType on DataFrame.

6. DateType

Use DateType “org.apache.spark.sql.types.DateType” to represent dates in a DataFrame; use either DataTypes.DateType or the DateType object to get a date type.

On a DateType object you can access all the methods defined in section 1.1.
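As a small sketch (assuming spark-sql is on the classpath), the common methods from section 1.1 work on DateType like any other type:

```scala
import org.apache.spark.sql.types._

val dateType = DataTypes.DateType
println("json : " + dateType.json)               // "date"
println("sql : " + dateType.sql)                 // DATE
println("typeName : " + dateType.typeName)       // date
println("defaultSize : " + dateType.defaultSize) // 4 (a date is stored as an Int of days)
```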

7. TimestampType

Use TimestampType “org.apache.spark.sql.types.TimestampType” to represent timestamps in a DataFrame; use either DataTypes.TimestampType or the TimestampType object to get a timestamp type.

On a TimestampType object you can access all the methods defined in section 1.1.
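Likewise, a small sketch (assuming spark-sql on the classpath) of the common methods on TimestampType:

```scala
import org.apache.spark.sql.types._

val tsType = DataTypes.TimestampType
println("sql : " + tsType.sql)                 // TIMESTAMP
println("typeName : " + tsType.typeName)       // timestamp
println("defaultSize : " + tsType.defaultSize) // 8 (stored as a Long of microseconds)
```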

8. StructType

Use StructType “org.apache.spark.sql.types.StructType” to define the nested structure or schema of a DataFrame; use either DataTypes.createStructType() or the StructType() constructor to get a struct object.

A StructType object provides many functions, such as toDDL(), fields(), fieldNames(), and length(), to name a few.

// StructType
val structType = DataTypes.createStructType(
  Array(DataTypes.createStructField("fieldName", StringType, true)))

val simpleSchema = StructType(Array(
  StructField("name", StringType, true),
  StructField("id", IntegerType, true),
  StructField("gender", StringType, true),
  StructField("salary", DoubleType, true)
))

val anotherSchema = new StructType()
  .add("name", new StructType()
    .add("firstname", StringType)
    .add("lastname", StringType))
  .add("id", IntegerType)
  .add("salary", DoubleType)

For more examples and usage, please refer to StructType.

9. All other remaining Spark SQL Data Types

Similar to the types described above, for the rest of the data types use the appropriate method on the DataTypes class, or the data type's constructor, to create an object of the desired type. All the common methods described in section 1.1 are available on these types as well.
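As a quick sketch (assuming spark-sql on the classpath), here is how a few of the remaining types can be created through DataTypes, including a parameterized one:

```scala
import org.apache.spark.sql.types._

// Singleton objects exposed on DataTypes
val booleanType = DataTypes.BooleanType
val longType    = DataTypes.LongType
// A parameterized type built with a factory method: precision 10, scale 2
val decimalType = DataTypes.createDecimalType(10, 2)

println(booleanType.sql) // BOOLEAN
println(longType.sql)    // BIGINT
println(decimalType.sql) // DECIMAL(10,2)
```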

Conclusion

In this article, you have learned about the different Spark SQL data types, the DataType and DataTypes classes, and their methods, using Scala examples. I recommend referring to the DataType and DataTypes API documentation for more details.

Thanks for reading. If you like the article, please share it using the social links below. Any comments or suggestions are welcome in the comments section!
