A DataFrame loaded from a data source can be operated on as a normal RDD or registered as a temporary table and queried with SQL.
The programmatic method is more verbose, but it lets you build DataFrames when the columns and their types are not known until runtime.
The reflection-based method works well when you already know the schema while writing your Spark application.
To interoperate with RDDs, Spark SQL supports two methods for converting existing RDDs into DataFrames. To benefit from this feature, simply install Spark with Hive support. For ETL and business intelligence tools, Spark offers industry-standard ODBC and JDBC connectivity. Specific methods let you operate on the built-in data sources. The module lets you query structured data inside Spark programs using either SQL or the equivalent DataFrame API. For instance, to connect to Postgres from the Spark shell, you need to run the command shown below. HiveContext is packaged separately only to keep Hive's dependencies out of the default Spark build. Additionally, Spark SQL makes it easy to mix SQL queries with Spark programs.
