August 9
The following tutorial guides you through installing the Yahoo! LDA code on a freshly installed Ubuntu 11.04 server. Apart from the default installation, the only package assumed to have been selected from the installation menu is the OpenSSH server.
1) Install Java
    sudo apt-get install python-software-properties
    sudo add-apt-repository "deb http://archive.canonical.com/ lucid partner"
    sudo apt-get update
    sudo apt-get install sun-java6-jdk
The JDK can now be found in /usr/lib/jvm/java-6-sun, which is actually a symlink on Ubuntu.
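Since that path is a symlink, it can be handy to see where it really points before using it in the classpath steps further down. The following is only a sketch: resolve_jdk is a hypothetical helper name, and it assumes GNU readlink (with the -f option), as shipped with Ubuntu.

```shell
# Print the real directory behind a JDK path, following symlinks.
# Returns non-zero if the path is missing or has no bin/javac.
# (resolve_jdk is a hypothetical helper, not part of any package.)
resolve_jdk() {
    jdk_path=$1
    [ -e "$jdk_path" ] || { echo "not found: $jdk_path" >&2; return 1; }
    real_path=$(readlink -f "$jdk_path") || return 1
    [ -x "$real_path/bin/javac" ] || { echo "no javac under $real_path" >&2; return 1; }
    printf '%s\n' "$real_path"
}

# Example:
#   resolve_jdk /usr/lib/jvm/java-6-sun
```

If the function prints a directory, the JDK is in place; an error message means the install step above did not complete.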
2) Download source code
To continue with the installation, make a directory in your home folder (or wherever you like). This is the folder where the application will reside after the installation. I have an apps folder inside my home, so I created an LDA folder inside apps.
    mkdir ~/apps/LDA
    cd ~/apps/LDA
Download the source code from GitHub and extract it.
    wget https://github.com/shravanmn/Yahoo_LDA/tarball/master
    # extract code
    tar -xzf master
    cd shravanmn-Yahoo_LDA-*
    mv * ../
    cd ..
    rm -rf shravanmn-Yahoo_LDA-*
    rm master
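The extract, move and clean-up steps can also be folded into a single tar call. This is a sketch of an alternative, not what I originally ran: extract_flat is a hypothetical helper name, and it relies on tar's --strip-components option to drop the top-level shravanmn-Yahoo_LDA-* folder while unpacking.

```shell
# Unpack a gzipped tarball into a target directory, dropping the
# single top-level folder that wraps the sources.
# (extract_flat is a hypothetical helper name.)
extract_flat() {
    archive=$1
    target=$2
    mkdir -p "$target" || return 1
    tar -xzf "$archive" -C "$target" --strip-components=1
}

# Example, after the wget step:
#   extract_flat master ~/apps/LDA
#   rm master
```

One difference from the mv approach: "mv *" skips hidden files by default, while this variant unpacks them as well.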
3) Install system prerequisites
Install build-essential
    sudo apt-get install build-essential
Install emacs
    sudo apt-get install emacs
Install ant
    sudo apt-get install ant
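Before moving on, it may be worth confirming that the tools these packages provide are actually on the PATH. A minimal sketch, assuming the usual command names (gcc, g++ and make from build-essential, plus ant and emacs); missing_tools is a hypothetical helper name.

```shell
# Print every command from the argument list that is not on PATH.
# Prints nothing when all of them are available.
# (missing_tools is a hypothetical helper name.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
    return 0
}

# Example:
#   missing_tools gcc g++ make ant emacs
```

An empty result means everything was found; any name that is printed points to a package that still needs installing.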
4) Make
Run make inside the LDA directory we created (~/apps/LDA).
5) Install Ice
The only problem with make is Ice, which must be compiled separately.
Install system prerequisites.
Install libbz2-dev package.
    sudo apt-get install libbz2-dev
Install the XML parser.
    sudo apt-get install expat libexpat-dev
Also install libssl-dev.
    sudo apt-get install libssl-dev
Install Mono and the packages it needs (the mono-complete package).
    sudo apt-get install mono-complete
Finally, install the mandatory dev packages.
    sudo apt-get install python-dev ruby ruby-dev php5-dev
Download the third-party sources for Ice.
    wget http://www.zeroc.com/download/Ice/3.4/ThirdParty-Sources-3.4.2.zip
    sudo apt-get install unzip
    unzip ThirdParty-Sources-3.4.2.zip
    rm ThirdParty-Sources-3.4.2.zip
    cd ThirdParty-Sources-3.4.2
Build and install Berkeley DB.
    unzip db-4.8.30.NC.zip
    cd db-4.8.30.NC
    cd build_unix
    ../dist/configure --prefix=/usr/local/berkeleydb --enable-compat185 --enable-cxx --enable-debug_rop --enable-debug_wop --enable-java
    make
    sudo make install
Locate the db.jar file, which provides the extra Java classes. It should be here:
    /usr/local/berkeleydb/lib/db.jar
Copy the jar file into the JDK lib directory, which we will put on the classpath later.
    sudo cp /usr/local/berkeleydb/lib/db.jar /usr/lib/jvm/java-6-sun/lib
You will also need classes from the following packages, which are included in the third-party sources directory.
    cd ../../
    unzip jgoodies-common-1_2_0.zip
    cd jgoodies-common-1.2.0
    sudo cp jgoodies-common-1.2.0.jar /usr/lib/jvm/java-6-sun/lib
    cd ../
    unzip jgoodies-forms-1_4_1.zip
    cd jgoodies-forms-1.4.1
    sudo cp jgoodies-forms-1.4.1.jar /usr/lib/jvm/java-6-sun/lib
    cd ../
    unzip jgoodies-looks-2_4_1.zip
    cd jgoodies-looks-2.4.1/
    sudo cp jgoodies-looks-2.4.1.jar /usr/lib/jvm/java-6-sun/lib
    cd ../
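The three JGoodies packages follow the same pattern (the archive name uses underscores in the version, the extracted folder and jar use dots), so the steps above can be sketched as a loop. The helper names zip_name_for and copy_jars, and the SUDO variable, are hypothetical and only serve this illustration.

```shell
# Map an extracted folder name to its archive name,
# e.g. jgoodies-common-1.2.0 -> jgoodies-common-1_2_0.zip
# (zip_name_for is a hypothetical helper name.)
zip_name_for() {
    printf '%s.zip\n' "$(printf '%s' "$1" | tr . _)"
}

# Copy <name>/<name>.jar into the destination directory for each
# given name. Set SUDO=sudo (a hypothetical variable) when the
# destination is only writable by root.
copy_jars() {
    dest=$1
    shift
    for name in "$@"; do
        ${SUDO:-} cp "$name/$name.jar" "$dest/" || return 1
    done
}

# Example, run from the ThirdParty-Sources-3.4.2 directory:
#   for n in jgoodies-common-1.2.0 jgoodies-forms-1.4.1 jgoodies-looks-2.4.1; do
#       unzip "$(zip_name_for "$n")"
#   done
#   SUDO=sudo copy_jars /usr/lib/jvm/java-6-sun/lib \
#       jgoodies-common-1.2.0 jgoodies-forms-1.4.1 jgoodies-looks-2.4.1
```

The loop stops at the first jar that cannot be copied, which makes a missing or misnamed archive easier to spot than in a long run of individual commands.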
Some Slice tools in the build/Ice-3.4.1/cpp/src directory have to be compiled manually. These are slice2php, slice2cs, slice2freezej and slice2java.
    cd ~/apps/LDA/build/Ice-3.4.1/cpp/src
    cd slice2php/
    make
    cd ../
    cd slice2cs/
    make
    cd ../
    cd slice2freezej/
    make
    cd ../
    cd slice2java/
    make
    cd ../../../
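The same four builds can be written as a loop using make's -C option, which runs make in another directory without changing the current one. This is a sketch only: build_subdirs is a hypothetical helper name, and the MAKE override is an assumption added for convenience.

```shell
# Run make in each named subdirectory of a source tree, stopping at
# the first failure. MAKE may be set to use a different make binary.
# (build_subdirs is a hypothetical helper name.)
build_subdirs() {
    src_root=$1
    shift
    for dir in "$@"; do
        ${MAKE:-make} -C "$src_root/$dir" || return 1
    done
}

# Example:
#   build_subdirs ~/apps/LDA/build/Ice-3.4.1/cpp/src \
#       slice2php slice2cs slice2freezej slice2java
```

Note that, unlike the commands above, this leaves you in whatever directory you started from, so a cd into the right folder is still needed before the next step.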
Point your classpath to the jar files you copied.
    export CLASSPATH=/usr/lib/jvm/java-6-sun/lib/jgoodies-common-1.2.0.jar:/usr/lib/jvm/java-6-sun/lib/jgoodies-forms-1.4.1.jar:/usr/lib/jvm/java-6-sun/lib/db.jar:/usr/lib/jvm/java-6-sun/lib/jgoodies-looks-2.4.1.jar
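That export line is long and easy to mistype. As a rough sketch, the same value can be assembled from the directory and the list of jar names, with a check that each jar really exists; build_classpath is a hypothetical helper name.

```shell
# Join the given jar names, all located in one directory, into a
# colon-separated classpath. Fails if any of the jars is missing.
# (build_classpath is a hypothetical helper name.)
build_classpath() {
    lib_dir=$1
    shift
    cp_value=
    for jar in "$@"; do
        [ -f "$lib_dir/$jar" ] || { echo "missing: $lib_dir/$jar" >&2; return 1; }
        # Add a ':' separator only when cp_value is already non-empty.
        cp_value=${cp_value:+$cp_value:}$lib_dir/$jar
    done
    printf '%s\n' "$cp_value"
}

# Example:
#   CLASSPATH=$(build_classpath /usr/lib/jvm/java-6-sun/lib \
#       jgoodies-common-1.2.0.jar jgoodies-forms-1.4.1.jar \
#       db.jar jgoodies-looks-2.4.1.jar) && export CLASSPATH
```

Listing the jars explicitly, rather than globbing the whole lib directory, keeps the classpath limited to the four files named above.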
Now build Ice.
When the build finishes, copy all the files created in the lib folder to the system's lib folder.
    cd ../../
    sudo cp lib/* /usr/lib
This is all you need to run the Yahoo! LDA code on a single machine.
6) Test installation (Batch Mode)
Follow the example described in the documentation that accompanies the code (inside the docs folder). The commands are slightly altered to work for the single-node example.
Phase 1 – Tokenization and Formatting
    cd ut_out
    cp ../Tokenizer.java .
    javac Tokenizer.java
    cat ydir_1k.txt | java -classpath . Tokenizer | ../formatter
Phase 2 – Learning the topic mixtures
    ../learntopics --topics=100 --iter=500
If everything is OK, you will be able to see the word mixtures for each topic and the topic assignments.
Phase 3 – Testing
Go to ut_test directory.
Copy Tokenizer class.
    cp ../ut_out/Tokenizer.class .
Format test data.
    cat ydir_1k.tst.txt | java -classpath . Tokenizer | ../formatter --dumpfile=../ut_out/lda.dict.dump
Learn test topics.
    ../learntopics -test --dumpprefix=../ut_out/lda --topics=100
Output files are created inside ut_test.
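The testing commands above read files produced during training, such as ../ut_out/lda.dict.dump. A small guard run before Phase 3 can catch a missing or empty file early. This is only a sketch, and require_files is a hypothetical helper name.

```shell
# Succeed only if every given file exists and is non-empty;
# otherwise report the first offending path and fail.
# (require_files is a hypothetical helper name.)
require_files() {
    for f in "$@"; do
        [ -s "$f" ] || { echo "missing or empty: $f" >&2; return 1; }
    done
    return 0
}

# Example, from inside ut_test:
#   require_files ../ut_out/lda.dict.dump ../ut_out/Tokenizer.class
```

If the guard fails, rerun Phases 1 and 2 in ut_out before trying the test phase again.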