- Open Source Search with Lucene & Solr
- Lucene Near Realtime Search
- LinkedIn Search & Lucene
- Distributed Scoring For Lucene Implementation
- Twitter’s New Search Architecture
- Real time search: billions of queries per day
- Distributed lucene – Katta project
- Does iTunes use Lucene for search?
- Index Server Project Proposal
- NRS – Near Real time search with Lucene
- Solr’s distributed search
- Effective Smoothing for a Terabyte of Text
- Zoie – real time search and indexing based on Lucene
- Distributed Lucene index on top of Cassandra: Lucandra = Lucene + Cassandra
Okay, we have a text file with a list of URLs, and we want Firefox screenshots of these pages. We also need the screenshots at a normalized resolution (for example, all images as 300×400 thumbnails). First, install the Command Line Print Firefox add-on. Then create a simple script that loops over the URLs: it runs Firefox with the current URL, saves a screenshot, and closes Firefox (in my case via kill, which may be too brutal). It may look like this, where urls_list.txt holds one URL per line. After running the script you will have a *.png screenshot for every URL: 0.png for the first URL in urls_list.txt, 1.png for the second, and so on.
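```bash
#!/bin/bash
# Screenshot every URL in urls_list.txt into 0.png, 1.png, ...
id=0
# IFS= and -r keep each URL verbatim; the extra test also handles
# a last line that has no trailing newline.
while IFS= read -r line || [ -n "$line" ]
do
    # </dev/null stops firefox from swallowing the rest of the URL list.
    firefox -print "$line" -printmode png -printdelay 10 -printfile "${id}.png" </dev/null
    # Close every firefox process (brutal, but simple).
    pkill -9 firefox
    id=$((id + 1))
done < urls_list.txt
```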
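The naming scheme (the N-th URL becomes N.png, counting from 0) can be checked without starting Firefox at all. This is a minimal sketch that reuses the same read loop and only prints a tab-separated table of file name and URL; `list_screenshots` is a hypothetical helper name introduced here for illustration, not part of the add-on.

```bash
#!/bin/bash
# Hypothetical helper: print "N.png<TAB>URL" for every line of the URL list,
# using the same numbering as the screenshot script (first URL -> 0.png).
list_screenshots() {
    local id=0 url
    while IFS= read -r url || [ -n "$url" ]
    do
        printf '%s.png\t%s\n' "$id" "$url"
        id=$((id + 1))
    done < "$1"
}
```

For example, `list_screenshots urls_list.txt > index.tsv` keeps a record of which screenshot (and later which thumbnail) belongs to which page.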
Now that we have the screenshots (which in general all have different resolutions), we need to normalize them by creating a 300×400 thumbnail of every image. ImageMagick's convert helps!
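```bash
for f in *.png
do
    # Skip thumbnails produced by an earlier run.
    case "$f" in thumb_*) continue ;; esac
    # '!' forces exactly 300x400, ignoring the aspect ratio.
    convert "$f" -thumbnail '300x400!' "thumb_$f"
done
```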
Now we have many thumb_*.png files, all exactly 300×400. A small note: without the ! flag the geometry works differently. The image is resized proportionally (the aspect ratio is kept) so that it fits inside 300×400, and only one dimension, the one that reaches its limit first, ends up at the requested size.
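To see what the ! flag changes, here is a rough sketch of the size a plain `300x400` geometry would produce: the image is scaled by a single factor chosen so it fits inside the box. `fit_within` is a hypothetical helper written only for illustration, not part of ImageMagick, and ImageMagick's own rounding may differ by a pixel.

```bash
#!/bin/bash
# Hypothetical helper: size of a W x H image scaled proportionally
# to fit inside a BW x BH box (the behaviour of a geometry without '!').
fit_within() {
    local w=$1 h=$2 bw=$3 bh=$4
    if (( bw * h <= bh * w )); then
        # Width is the limiting side: width becomes bw, height is scaled (rounded).
        echo "${bw}x$(( (2 * h * bw + w) / (2 * w) ))"
    else
        # Height is the limiting side: height becomes bh, width is scaled (rounded).
        echo "$(( (2 * w * bh + h) / (2 * h) ))x${bh}"
    fi
}

fit_within 1024 768 300 400    # prints 300x225 (landscape screenshot)
fit_within 1280 3000 300 400   # prints 171x400 (tall full-page screenshot)
```

With `300x400!` the result is always 300x400, because both dimensions are forced independently. On real files, `identify -format '%wx%h' thumb_0.png` reports the size ImageMagick actually produced.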
- The Web as a graph by Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, Eli Upfal, Andrew S. Tomkins
- The Web as a graph: measurements, models, and methods
- Web Graph and PageRank algorithm by Danil Nemirovsky
- Parallel implementation of graph diameter algorithms
- Diameter of the World-Wide-Web
- Fast Computation of Empirically Tight Bounds for the Diameter of Massive Graphs by Clémence Magnien, Matthieu Latapy and Michel Habib
- Structural Analysis of the Web
- Small world phenomenon
- Fast Radius Plot and Diameter Computation for Terabyte Graphs (SDM 2010)
- Probabilistic Counting Algorithms for Database Applications by Philippe Flajolet
- Philippe Flajolet home page
- HADI: Fast Diameter Estimation and Mining in Massive Graphs with Hadoop
- U Kang's personal home page (one of the HADI authors)
- Data Mining with MapReduce: Graph and Tensor Algorithms with Applications
- PEGASUS: Mining Peta-Scale Graphs
The Federal Office for Information Security (Germany) has published a warning against the Google Chrome browser. Google has also changed some strange terms in Chrome's user agreement: Google no longer claims rights to what you do using Chrome. Anyway, a browser may give Google much more information about user behaviour than a toolbar does.