Getting Started With Hue on HDP

Step 1: Prerequisites

  1. Verify that you have a host that supports Hue:
    1. RHEL, CentOS, Oracle v5 or v6
    2. Windows (Vista, 7)
    3. Mac OS X (10.6 or later)

Note: Hue is not supported on Ubuntu.
  2. Verify that you have a browser that supports Hue:

Hue Browser Support

    • Linux (RHEL, CentOS, Oracle, SLES): Firefox (latest stable release), Google Chrome (latest stable release)
    • Windows (Vista, 7): Firefox (latest stable release), Google Chrome (latest stable release), Internet Explorer 9, Safari (latest stable release)
    • Mac OS X (10.6 or later): Firefox (latest stable release), Google Chrome (latest stable release), Safari (latest stable release)
  3. Verify that your environment matches the following technical details:

    • Distribution: Hortonworks Data Platform (HDP) 2.2
    • Cluster Manager: Apache Ambari 1.7
    • Environment: Amazon EC2/Google cloud
    • Operating System: Ubuntu 12.04 LTS (RHEL6/CentOS6 works fine as well)


Step 2: Install all the dependent packages before installing Hue. (A combined one-line install is sketched after the list.)

  1. If you are downloading Hue from GitHub, make sure the ‘git’ command is installed on your system. If not, you can install it with the following command on CentOS:

    Command: sudo yum install git

  2. Install ‘apache-maven’.

    First you need to add the apache-maven repository definition so that yum can find the package:

    Command: sudo wget https://repos.fedorapeople.org/repos/dchen/apache-maven/epel-apache-maven.repo -O /etc/yum.repos.d/epel-apache-maven.repo

    Now you can install apache-maven using the command below.

    Command: sudo yum install apache-maven

  3. Sometimes when you install a package with ‘sudo yum install <package name>’, yum reports an error like “package not found. Nothing to do.” In that case, you can look up the exact package name available for your system with the command below, then install using that name.

    Command: yum search <package name you want to search>
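
    For example, if you are unsure of the exact name of the tidy library package, a search will list the candidates (output will vary with the repositories enabled on your system):

    Command: yum search tidy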


  4. Install ‘gcc’.

    Command: sudo yum install gcc

  5. Install ‘krb5-devel’.

    Command: sudo yum install krb5-devel

  6. Install ‘libtidy’.

    Command: sudo yum install libtidy

  7. Install ‘libxml2-devel’.

    Command: sudo yum install libxml2-devel

  8. Install ‘mysql’.

    Command: sudo yum install mysql

  9. Install ‘mysql-devel’.

    Command: sudo yum install mysql-devel

  10. Install ‘openldap-devel’.

    Command: sudo yum install openldap-devel.x86_64

  11. Install ‘python-devel’.

    Command: sudo yum install python-devel.x86_64

  12. Install ‘python-simplejson’.

    Command: sudo yum install python-simplejson

  13. Install ‘sqlite-devel’.

    Command: sudo yum install sqlite-devel

  14. Install ‘libxslt-devel’.

    Command: sudo yum install libxslt-devel.x86_64
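
    If you prefer, all of these packages can be installed in a single pass. The command below simply combines the individual installs above; it assumes the apache-maven repository from step 2 has already been added:

    Command: sudo yum install git apache-maven gcc krb5-devel libtidy libxml2-devel mysql mysql-devel openldap-devel python-devel python-simplejson sqlite-devel libxslt-devel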



Step 3: Stop all services in Ambari.

If you are using an Ambari-managed cluster, use Ambari to update the service configurations (core-site.xml, mapred-site.xml, webhcat-site.xml, and oozie-site.xml). Do not edit the configuration files directly; use Ambari to start and stop the services.

First you need to stop all the services in Ambari so that you can modify the configuration files. Use the following steps to stop the services in Ambari.

  1. Log in to Ambari.

  2. Stop all the services.

  3. Make sure all the services are stopped.


Step 4: Configure HDP

  1. Modify the hdfs-site.xml file.

    On the NameNode, Secondary NameNode, and all the DataNodes, add the following properties to the $HADOOP_CONF_DIR/hdfs-site.xml file.

    where $HADOOP_CONF_DIR is the directory for storing the Hadoop configuration files. For example, /etc/hadoop/conf.

    <property>
      <name>dfs.webhdfs.enabled</name>
      <value>true</value>
    </property>
    <property>
      <name>dfs.support.append</name>
      <value>true</value>
    </property>


    Command: sudo vi /etc/hadoop/conf/hdfs-site.xml
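
    Once the cluster services are restarted (item 8 below), you can confirm that WebHDFS is answering with a quick REST call. This is a sanity-check sketch; it assumes the NameNode web UI is on its default port, 50070:

    Command: curl -i "http://localhost:50070/webhdfs/v1/?op=GETHOMEDIRECTORY&user.name=hue"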


  2. Modify the core-site.xml file.

    On the NameNode, Secondary NameNode, and all the DataNodes, add the following properties to the $HADOOP_CONF_DIR/core-site.xml file.

    where $HADOOP_CONF_DIR is the directory for storing the Hadoop configuration files. For example, /etc/hadoop/conf.

    <property>
      <name>hadoop.proxyuser.hue.hosts</name>
      <value>*</value>
    </property>
    <property>
      <name>hadoop.proxyuser.hue.groups</name>
      <value>*</value>
    </property>
    <property>
      <name>hadoop.proxyuser.hcat.groups</name>
      <value>*</value>
    </property>
    <property>
      <name>hadoop.proxyuser.hcat.hosts</name>
      <value>*</value>
    </property>

    Command: sudo vi /etc/hadoop/conf/core-site.xml

  3. [Optional] – Run HttpFS Server to provide Hue access to HDFS.

If you are using a remote Hue Server, you can run an HttpFS server to provide Hue access to HDFS. Add the following properties to the /etc/hadoop-httpfs/conf/httpfs-site.xml file:

<property>
  <name>httpfs.proxyuser.hue.hosts</name>
  <value>*</value>
</property>
<property>
  <name>httpfs.proxyuser.hue.groups</name>
  <value>*</value>
</property>
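
As with WebHDFS, you can verify that HttpFS is reachable once it has been restarted. This sketch assumes the HttpFS server runs on its default port, 14000:

Command: curl -i "http://localhost:14000/webhdfs/v1/?op=GETHOMEDIRECTORY&user.name=hue"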


  4. Modify the webhcat-site.xml file. On the WebHCat Server host, add the following properties to the $WEBHCAT_CONF_DIR/webhcat-site.xml file,

    where $WEBHCAT_CONF_DIR is the directory for storing WebHCat configuration files. For example, /etc/webhcat/conf.

    <property>
      <name>webhcat.proxyuser.hue.hosts</name>
      <value>*</value>
    </property>
    <property>
      <name>webhcat.proxyuser.hue.groups</name>
      <value>*</value>
    </property>

    Command: sudo vi $WEBHCAT_CONF_DIR/webhcat-site.xml
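
    To confirm that the WebHCat server is up after the change, you can query its status endpoint (a sketch assuming the default WebHCat port, 50111):

    Command: curl -i http://localhost:50111/templeton/v1/status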

  5. Modify the oozie-site.xml file. On the Oozie Server host, add the following properties to the $OOZIE_CONF_DIR/oozie-site.xml file, where $OOZIE_CONF_DIR is the directory for storing Oozie configuration files. For example, /etc/oozie/conf.

     <property>
        <name>oozie.service.ProxyUserService.proxyuser.hue.hosts</name>
        <value>*</value>
     </property>
     <property>
        <name>oozie.service.ProxyUserService.proxyuser.hue.groups</name>
        <value>*</value>
     </property>

Command: sudo vi $OOZIE_CONF_DIR/oozie-site.xml
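
After the Oozie server is restarted, you can confirm it is healthy with the Oozie client (assuming the client is installed and the server listens on its default port, 11000; a status of NORMAL means the server is up):

Command: oozie admin -oozie http://localhost:11000/oozie -status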

  6. [Optional] – If you are setting $HADOOP_CLASSPATH in your $HADOOP_CONF_DIR/hadoop-env.sh file, ensure that your settings preserve the user-specified options (where $HADOOP_CONF_DIR is the directory for storing the Hadoop configuration files, for example, /etc/hadoop/conf).

    For example, the following sample illustrates correct settings for $HADOOP_CLASSPATH:

# HADOOP_CLASSPATH=<your_additions>:$HADOOP_CLASSPATH

This setting allows certain Hue components to add to Hadoop’s CLASSPATH using the environment variable.
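
For instance, a concrete hadoop-env.sh line of this shape appends to, rather than overwrites, the existing value; the /opt/extra-jars path is purely a hypothetical example:

export HADOOP_CLASSPATH=/opt/extra-jars/*:$HADOOP_CLASSPATH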

  7. [Optional] – Enable job submission via both Hue and the command-line utility.

The hadoop.tmp.dir is used to unpack JAR files in bin/hadoop jar.

If users submit jobs through both Hue and the command-line interface, the two will contend for the hadoop.tmp.dir directory. By default, hadoop.tmp.dir is /tmp/hadoop-$USER_NAME.

To enable job submission via both Hue and command line utility, update the following property in the core-site.xml file:

<property>
  <name>hadoop.tmp.dir</name>
  <value>/tmp/hadoop-$USER_NAME$HUE_SUFFIX</value>
</property>

  8. Restart all the services in your cluster: in Ambari, go to Actions -> Start All to bring all the services back up.

Step 5: Install Hue from GitHub.

Command: git clone https://github.com/cloudera/hue.git

Step 6: Change the directory to hue.

Command: cd hue

Step 7: Build Hue by running the command below.

Command: make apps

Step 8: Configure Hue.


    The Hue configuration file can be found under hue/desktop/conf.

    The file name is: pseudo-distributed.ini

  1. Configure the Web Server.

Use the following instructions to configure the web server.

These configuration variables are under the [desktop] section of the hue.ini configuration file (pseudo-distributed.ini in this setup).

  1. Specify the Hue HTTP Address.

    Use the following options to change the IP address and port of the existing Web Server for Hue (by default, Spawning or CherryPy).

    # Webserver listens on this address and port
    http_host=0.0.0.0
    http_port=8000

The default setting is port 8000 on all configured IP addresses.

  2. Specify the Secret Key.

To ensure that your session cookies are secure, enter a series of random characters (30 to 60 characters is recommended) as shown below:

    secret_key=jFE93j;2[290-eiw.KEiwN2s3['d;/.q[eIW^y#e=+Iei*@Mn<qW5o
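
One convenient way to generate such a string is to pull random characters from /dev/urandom (a sketch using standard Linux shell tools):

    Command: tr -dc 'A-Za-z0-9' < /dev/urandom | head -c 45; echo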

  3. Configure authentication.

    By default, the first user who logs in to Hue can choose any username and is granted administrator privileges. This user can create other user and administrator accounts. User information is stored in the Django database, in the Django backend.

  4. Configure Hue for SSL.

    Install pyOpenSSL in order to configure Hue to serve over HTTPS. To install pyOpenSSL, from the root of your Hue installation path, complete the following instructions:

    1. Execute the following command on the Hue Server:

       Command: ./build/env/bin/easy_install pyOpenSSL

    2. Configure Hue to use your private key. Add the following to the hue.ini file:

       ssl_certificate=$PATH_To_CERTIFICATE
       ssl_private_key=$PATH_To_KEY

Ideally, you should have an appropriate key signed by a Certificate Authority. For test purposes, you can create a self-signed key using the openssl command on your system:


To create a key:

Command: openssl genrsa 1024 > host.key

To create a self-signed certificate:

Command: openssl req -new -x509 -nodes -sha1 -key host.key > host.cert
  2. Configure Hadoop.

    Edit the following configuration variables under the [hadoop] section in the hue/desktop/conf/pseudo-distributed.ini configuration file.

    1. Configure HDFS Cluster. Hue supports only one HDFS cluster currently.

    Ensure that you define the HDFS cluster under the [[[default]]] sub-section.

    Configure the following variables:


    [hadoop]
      [[hdfs_clusters]]
        [[[default]]]
        # This is equivalent to fs.defaultFS (fs.default.name) in Hadoop configuration.
        fs_defaultfs=hdfs://localhost:8020
        # Use WebHDFS/HttpFS to access HDFS data.
        # You can also set this to be the HttpFS URL.
        # The default value is the HTTP port on the NameNode.
        webhdfs_url=http://localhost:50070/webhdfs/v1
        # This is the home of your Hadoop HDFS installation. Defaults to $HADOOP_HDFS_HOME or to /usr/lib/hadoop.
        hadoop_hdfs_home=/usr/lib/hadoop
        # This is the HDFS Hadoop launcher script. Defaults to $HADOOP_BIN or /usr/bin/hadoop.
        hadoop_bin=/usr/bin/hadoop
        # This is the configuration directory of the HDFS. Defaults to $HADOOP_CONF_DIR or /etc/hadoop/conf.
        hadoop_conf_dir=/etc/hadoop/conf

    2. Configure the MapReduce Cluster. Currently, Hue supports only one MapReduce cluster.

    Ensure that you define the MapReduce cluster under the [[[default]]] sub-section.

    Configure the following variables:


    [hadoop]
      [[mapred_clusters]]
        [[[default]]]
        # The host running the JobTracker.
        # For a secure Hadoop cluster, this needs to be the FQDN of the JobTracker host.
        # The “host” portion must match the ‘mapred’ Kerberos principal full name.
        jobtracker_host=
        # The port for the JobTracker IPC service.
        jobtracker_port=8021
        # If Oozie is configured to talk with a MapReduce service, then set this to true.
        # Hue will be submitting jobs to this MapReduce cluster.
        submit_to=True
        # Home of your Hadoop MapReduce installation. Defaults to either $HADOOP_MR1_HOME or /usr/lib/hadoop-0.20-mapreduce.
        hadoop_mapred_home=/usr/lib/hadoop
        # MR1 Hadoop launcher script. Defaults to $HADOOP_BIN or /usr/bin/hadoop.
        hadoop_bin=/usr/bin/hadoop
        # Configuration directory of the MR1 service. Defaults to $HADOOP_CONF_DIR or /etc/hadoop/conf.
        hadoop_conf_dir=/etc/hadoop/conf

  3. [Optional] – Configure Beeswax.

In the [beeswax] section of the configuration file, you can specify the following:


    [beeswax]
    # Hostname or IP that the Beeswax Server should bind to.
    beeswax_server_host=localhost
    # Base directory of your Hive installation.
    hive_home_dir=/usr/lib/hive
    # Directory containing your hive-site.xml Hive configuration file.
    hive_conf_dir=/etc/hive/conf
    # Heap size (-Xmx) of the Beeswax Server.
    beeswax_server_heapsize=

  4. Configure JobDesigner and Oozie.

In the [liboozie] section of the configuration file, specify the following:


    [liboozie]
    # URL of the Oozie service as specified by the OOZIE_URL environment variable for Oozie.
    oozie_url=http://localhost:11000/oozie

  5. Configure UserAdmin.

In the [useradmin] section of the configuration file, specify the following:


    [useradmin]
    # Default group suggested when creating a user manually.
    # If the LdapBackend or PamBackend are configured for user authentication, new users will automatically be members of the default group.
    default_user_group=

  6. Validate your configuration.

For any invalid configurations, Hue displays a red alert icon on the top navigation bar.


To view the configuration of an existing Hue instance, either browse to http://myserver:8888/dump_config or use the About menu.

Step 9: Start the Hue server.

Command: build/env/bin/hue runserver
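
runserver starts Hue in the foreground, which is handy for a first test. For a longer-running instance, a source build also provides a supervisor script that keeps the Hue processes up (path relative to the hue directory):

Command: build/env/bin/supervisor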


Step 10: Open the browser to connect to the Hue web server.

Link: http://localhost:8000