Hi,
Today I am going to share my experience of running a MapReduce wordcount program through Pydoop (Python) on Ubuntu 16.04.
I have been wrestling with Hadoop for about a year. I am now comfortable running HDFS and MapReduce, though I have yet to master them fully.
I approached Pydoop with a rather naive attitude. First I tried to install it from the Ubuntu terminal (my favourite OS) at the office, but in vain; I wasted almost the whole day there. I could not figure out how to set up Pydoop because the office system works behind a proxy :(
I came home and tried the same on my own PC (an HP laptop). I first ran the following command in the terminal, but to no avail...
sudo pip install pydoop
It failed with an error about "HADOOP_HOME". I gathered that Pydoop's setup tries to locate the Hadoop installation directory but cannot, because sudo strips environment variables such as HADOOP_HOME by default. I found a solution on some forum (I promise, I have forgotten where...) and changed the command as below.
sudo -E pip install pydoop
That "-E" did the trick: it tells sudo to preserve the caller's environment, so HADOOP_HOME stays visible during the install. Pydoop then installed properly on my laptop, as the output message confirmed.
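If you want to double-check the installation, a quick sanity check from Python looks something like this (a minimal sketch on my part, assuming HADOOP_HOME is exported and the single-node cluster is running):

import pydoop
print(pydoop.__version__)   # the installed Pydoop version

import pydoop.hdfs as hdfs
print(hdfs.ls("/"))         # lists the HDFS root if Hadoop is reachable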
Since I already had Hadoop properly installed in pseudo-distributed mode (a single-node deployment), I did the following steps in the terminal:
- I created a small text file (text.txt) containing some arbitrary text.
- Uploaded it to an HDFS directory.
- Created two folders, ../input and ../output, in that HDFS directory.
- Wrote a Pydoop script file for word counting.
- Executed the script.
hdfs dfs -ls /dir2
hdfs dfs -mkdir /dir2/input /dir2/output
hdfs dfs -copyFromLocal text.txt /dir2/input
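Incidentally, the same HDFS housekeeping can be done from Python through Pydoop's HDFS API. This is just a rough sketch of an equivalent, not what I actually ran:

import pydoop.hdfs as hdfs

hdfs.mkdir("/dir2/input")             # like hdfs dfs -mkdir
hdfs.mkdir("/dir2/output")
hdfs.put("text.txt", "/dir2/input")   # like hdfs dfs -copyFromLocal
print(hdfs.ls("/dir2/input"))         # confirm the upload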
Now I prepared the script file (script.py) with the following code.
def mapper(_, text, writer):
    # called for every input record (a line of text): emit a (word, "1") pair per word
    for word in text.split():
        writer.emit(word, "1")

def reducer(word, icounts, writer):
    # called once per distinct word: sum up all its intermediate counts
    writer.emit(word, sum(map(int, icounts)))
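Before going to the cluster, the logic can be dry-run in plain Python. The little harness below is my own sketch; FakeWriter merely mimics the writer object that pydoop script passes to the functions:

from collections import defaultdict

class FakeWriter:
    # collects (key, value) pairs the way Pydoop's writer would
    def __init__(self):
        self.pairs = []
    def emit(self, key, value):
        self.pairs.append((key, value))

w = FakeWriter()
for line in ["hello hadoop", "hello pydoop"]:   # two fake input records
    mapper(None, line, w)

groups = defaultdict(list)                      # shuffle: group values by key
for key, value in w.pairs:
    groups[key].append(value)

out = FakeWriter()
for word, counts in groups.items():
    reducer(word, counts, out)
print(out.pairs)   # e.g. [('hello', 2), ('hadoop', 1), ('pydoop', 1)]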
Went back to the terminal and executed the following command.
pydoop script script.py /dir2/input/text.txt /dir2/output/output_
Checked the results in the /dir2/output/output_ folder on HDFS.
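To peek at the results from Python instead of the shell, Pydoop's HDFS module can read them back. Another sketch; the part-file name below follows the usual Hadoop convention, so list the directory first to see the real names:

import pydoop.hdfs as hdfs

for name in hdfs.ls("/dir2/output/output_"):
    print(name)                                  # find the part files

with hdfs.open("/dir2/output/output_/part-00000", "rt") as f:
    print(f.read())                              # word<TAB>count lines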
BINGO!!! I am now a big data expert! Hey! Hey! Hey.