20 Cards in this Set

  • Front
  • Back

Test disk I/O speed

Command to test disk I/O speed: hdparm -t


Speed should be 70 MB/sec or more. Anything less is an indication of a problem.
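
A minimal sketch of the check, assuming the data disk is /dev/sda (substitute each of your own data devices; actual throughput varies by hardware):

    # Time buffered (non-cached) sequential reads from the device; run as root
    hdparm -t /dev/sda
    # Repeat for every data disk; sustained rates well under 70 MB/sec suggest a problem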

OS parameters

Set vm.swappiness to 0 in /etc/sysctl.conf


Use an ext3/ext4 filesystem; ext4 is recommended


Increase the open-file ulimit (nofile) for the mapred and hdfs users to at least 32k, 64k recommended (/etc/security/limits.conf)


Disable IPv6


Disable SELinux


Install and configure the NTP daemon
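
A hedged sketch of the matching entries; the swappiness value, user names, and 64k limit follow the card, the exact lines are only illustrative:

    # /etc/sysctl.conf -- keep Hadoop daemons from being swapped out
    vm.swappiness = 0

    # /etc/security/limits.conf -- raise the open-file limit for the service users
    hdfs    -    nofile    65536
    mapred  -    nofile    65536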


Unix user accounts

The HDFS, MapReduce, and YARN services are usually run as separate users, named hdfs, mapred, and yarn, respectively.


All three users belong to the same hadoop group.
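
A hedged sketch of creating those accounts on a node; the group and user names come from the card, the useradd flags are ordinary Linux options:

    # Run as root: shared group plus one account per service
    groupadd hadoop
    useradd -g hadoop -m hdfs
    useradd -g hadoop -m mapred
    useradd -g hadoop -m yarn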

Formatting HDFS filesystem

The formatting process creates an empty filesystem by creating the storage directories and the initial versions of the namenode’s persistent data structures.


Datanodes are not involved in the initial formatting process, since the namenode manages all of the filesystem’s metadata, and datanodes can join or leave the cluster dynamically.
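
The formatting step itself is a single command, run once as the hdfs user before the cluster is first started:

    # Creates the empty storage directories and the namenode's initial metadata;
    # never rerun on a cluster that already holds data
    hdfs namenode -format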

start-dfs.sh

As hdfs user


Starts a namenode on each machine returned by executing hdfs getconf -namenodes


Starts a datanode on each machine listed in the slaves file


Starts a secondary namenode on each machine returned by executing hdfs getconf -secondarynamenodes
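
A hedged sketch of checking and running this by hand; hdfs getconf -namenodes is named above, the rest is an ordinary invocation:

    # As the hdfs user: list the hosts the script will start namenodes on
    hdfs getconf -namenodes
    # Then launch the HDFS daemons across the cluster
    start-dfs.sh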

start-yarn.sh

As yarn user




Starts a resource manager on the local machine




Starts a node manager on each machine listed in the slaves file
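
The YARN counterpart, normally run as the yarn user on the machine that should host the resource manager:

    # Start the resource manager locally and node managers on the slaves
    start-yarn.sh
    # jps (a standard JDK tool) confirms which daemons are running
    jps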

Environment variables



HADOOP_CLASSPATH hadoop-env.sh


HADOOP_HEAPSIZE hadoop-env.sh


JAVA_HOME hadoop-env.sh


HADOOP_NAMENODE_OPTS hadoop-env.sh


HADOOP_LOG_DIR hadoop-env.sh


HADOOP_IDENT_STRING hadoop-env.sh


HADOOP_SSH_OPTS hadoop-env.sh


YARN_RESOURCEMANAGER_HEAPSIZE yarn-env.sh
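
A hedged hadoop-env.sh fragment using a few of these variables; only the variable names come from the card, every value and path is a placeholder:

    # hadoop-env.sh (illustrative values)
    export JAVA_HOME=/usr/lib/jvm/java-8-openjdk     # JDK location: placeholder path
    export HADOOP_HEAPSIZE=1000                      # daemon heap size, in MB
    export HADOOP_LOG_DIR=/var/log/hadoop            # where daemon logs are written
    export HADOOP_NAMENODE_OPTS="-Xmx2g"             # extra JVM options for the namenode only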



Important HDFS daemon properties

fs.defaultFS core-site.xml


dfs.namenode.name.dir hdfs-site.xml


dfs.datanode.data.dir hdfs-site.xml


dfs.namenode.checkpoint.dir hdfs-site.xml
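
A hedged sketch of the first three (dfs.namenode.checkpoint.dir follows the same pattern for the secondary namenode); the property names are from the card, hostnames and directories are placeholders:

    <!-- core-site.xml -->
    <property>
      <name>fs.defaultFS</name>
      <value>hdfs://namenode-host:8020</value>          <!-- placeholder hostname -->
    </property>

    <!-- hdfs-site.xml -->
    <property>
      <name>dfs.namenode.name.dir</name>
      <value>/data/1/dfs/nn,/data/2/dfs/nn</value>      <!-- namenode metadata, ideally on separate disks -->
    </property>
    <property>
      <name>dfs.datanode.data.dir</name>
      <value>/data/1/dfs/dn,/data/2/dfs/dn</value>      <!-- datanode block storage -->
    </property>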



Important YARN daemon properties

yarn-site.xml


yarn.resourcemanager.hostname


yarn.resourcemanager.address (${yarn.resourcemanager.hostname}:8032)


yarn.nodemanager.local-dirs


yarn.nodemanager.aux-services


yarn.nodemanager.resource.memory-mb (8192)


yarn.nodemanager.resource.cpu-vcores (8)


yarn.nodemanager.vmem-pmem-ratio (2.1)
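
A hedged yarn-site.xml fragment; the property names and the 8192 MB figure follow the card, while the hostname, local dirs, and the mapreduce_shuffle aux-service value are illustrative:

    <property>
      <name>yarn.resourcemanager.hostname</name>
      <value>resourcemanager-host</value>                     <!-- placeholder hostname -->
    </property>
    <property>
      <name>yarn.nodemanager.local-dirs</name>
      <value>/data/1/yarn/local,/data/2/yarn/local</value>    <!-- intermediate data, spread across disks -->
    </property>
    <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce_shuffle</value>                        <!-- required for the MapReduce shuffle -->
    </property>
    <property>
      <name>yarn.nodemanager.resource.memory-mb</name>
      <value>8192</value>                                     <!-- memory available to containers on this node -->
    </property>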

MapReduce job memory/CPU properties

mapreduce.map.memory.mb (1024)


mapreduce.reduce.memory.mb (1024)


mapred.child.java.opts (-Xmx200m)


mapreduce.map.java.opts (-Xmx200m)


mapreduce.reduce.java.opts (-Xmx200m)




mapreduce.map.cpu.vcores (1)


mapreduce.reduce.cpu.vcores (1)
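
A hedged per-job override, assuming the driver uses ToolRunner so the generic -D options are honored; my-job.jar and MyDriver are placeholder names, and the -Xmx heaps are deliberately kept below the container sizes:

    hadoop jar my-job.jar MyDriver \
      -D mapreduce.map.memory.mb=2048 \
      -D mapreduce.map.java.opts=-Xmx1638m \
      -D mapreduce.reduce.memory.mb=4096 \
      -D mapreduce.reduce.java.opts=-Xmx3276m \
      input output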

Default RPC Ports

Namenode 8020


Datanode 50020


Job History 10020


Resource Manager 8032


Resource Manager Admin 8033


Resource Manager Scheduler 8030


Resource Manager resource tracker 8031


Node Manager 0 (an ephemeral port is chosen at startup)


Node Manager localizer 8040

Default HTTP ports

Namenode 50070


Secondary Namenode 50090


Datanode 50075


Job History 19888


MapReduce shuffle 13562


Resource Manager 8088


Node Manager 8042

Cluster membership

In hdfs-site.xml for datanodes


dfs.hosts (include filename)


dfs.hosts.exclude (exclude filename)




In yarn-site.xml for node managers


yarn.resourcemanager.nodes.include-path


yarn.resourcemanager.nodes.exclude-path
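
A hedged sketch of wiring these up; the property names are from the card, the include/exclude file paths are placeholders:

    <!-- hdfs-site.xml -->
    <property>
      <name>dfs.hosts</name>
      <value>/etc/hadoop/conf/dfs.include</value>      <!-- allowed datanodes, one hostname per line -->
    </property>
    <property>
      <name>dfs.hosts.exclude</name>
      <value>/etc/hadoop/conf/dfs.exclude</value>      <!-- datanodes to decommission -->
    </property>

    <!-- yarn-site.xml -->
    <property>
      <name>yarn.resourcemanager.nodes.include-path</name>
      <value>/etc/hadoop/conf/yarn.include</value>
    </property>
    <property>
      <name>yarn.resourcemanager.nodes.exclude-path</name>
      <value>/etc/hadoop/conf/yarn.exclude</value>
    </property>

After editing the include/exclude files, hdfs dfsadmin -refreshNodes and yarn rmadmin -refreshNodes make the running daemons reread them.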



I/O buffer size

Default 4 KB


Recommended 128 KB




Set the property in bytes in core-site.xml


io.file.buffer.size
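
A hedged core-site.xml entry; 131072 bytes is the 128 KB recommended above:

    <property>
      <name>io.file.buffer.size</name>
      <value>131072</value>   <!-- 128 KB read/write buffer -->
    </property>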



Block Size

Default 128 MB




Set the property in bytes in hdfs-site.xml


dfs.blocksize
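
A hedged hdfs-site.xml entry; 268435456 bytes (256 MB) is shown only as an example of raising the 128 MB default:

    <property>
      <name>dfs.blocksize</name>
      <value>268435456</value>   <!-- 256 MB blocks; default is 128 MB -->
    </property>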

Datanode reserve storage space

Set the property in bytes in hdfs-site.xml


dfs.datanode.du.reserved
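
A hedged hdfs-site.xml entry; the 10 GB figure is purely illustrative:

    <property>
      <name>dfs.datanode.du.reserved</name>
      <value>10737418240</value>   <!-- reserve 10 GB per volume for non-HDFS use -->
    </property>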

Trash

The Hadoop filesystem has a trash facility


fs.trash.interval, in minutes


Default value 0 (trash disabled)


User-level feature: a .Trash folder for every user


hadoop fs -expunge permanently deletes trash contents older than the retention threshold
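
A hedged core-site.xml entry enabling trash; 1440 minutes (one day) is an illustrative retention period:

    <property>
      <name>fs.trash.interval</name>
      <value>1440</value>   <!-- deleted files stay in .Trash for one day before being purged -->
    </property>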

Reduce Slow start

Default 0.05 (5%)

Setting mapreduce.job.reduce.slowstart.completedmaps to a higher value, such as 0.80 (80%), will help improve throughput.
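
A hedged mapred-site.xml entry matching the 0.80 example above:

    <property>
      <name>mapreduce.job.reduce.slowstart.completedmaps</name>
      <value>0.80</value>   <!-- start reducers only after 80% of map tasks have completed -->
    </property>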

Short circuit local read

Enable short-circuit local reads by setting dfs.client.read.shortcircuit to true.




The path is set using the property dfs.domain.socket.path, and must be a path that only the datanode user (typically hdfs) or root can create, such as /var/run/hadoop-hdfs/dn_socket.
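
A hedged hdfs-site.xml sketch; both property names and the socket path are taken from the card:

    <property>
      <name>dfs.client.read.shortcircuit</name>
      <value>true</value>
    </property>
    <property>
      <name>dfs.domain.socket.path</name>
      <value>/var/run/hadoop-hdfs/dn_socket</value>   <!-- only hdfs (the datanode user) or root may create this path -->
    </property>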

Configuration Precedence

Highest to Lowest




Code


CLI


Client


Slave


Cluster


Default




If a value in a configuration file is marked final, it overrides all others.
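
A hedged illustration of the final flag; the property and value are only examples:

    <!-- core-site.xml: per the note above, a value marked final wins over the other levels -->
    <property>
      <name>io.file.buffer.size</name>
      <value>131072</value>
      <final>true</final>
    </property>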