com.webscrap4j
Class WebScrap

java.lang.Object
  extended by com.webscrap4j.WebScrap

public class WebScrap
extends java.lang.Object

This class is used to scrape a website and extract data from its HTML quickly and easily.

Version:
1.0
Author:
Sawan Kumar

Constructor Summary
WebScrap()
           
 
Method Summary
 java.util.ArrayList<java.lang.String> getImageTagData(java.lang.String tag, java.lang.String attr)
           Extracts the value of a given attribute from every occurrence of a single HTML tag such as a (anchor) or img (image), e.g. <img src="imagelink" title="imagetitle" />.
 java.util.ArrayList<java.lang.String> getSingleHTMLScriptData(java.lang.String startTag, java.lang.String endTag)
           Extracts the data between a start tag and an end tag, such as div or span.
 java.util.ArrayList<java.lang.String> getSingleHTMLScriptData(java.lang.String startTag, java.lang.String endTag, java.lang.String middleTagOne, java.lang.String middleTagTwo)
           Extracts data from nested tags, such as the li items inside a ul.
 java.lang.String getSingleHTMLTagData(java.lang.String tagData)
           Extracts the content of an HTML tag whose opening and closing tags share the same name, such as title, b or strong.
 void setUrl(java.lang.String url)
           Sets the full URL of the website you want to scrape.
 void startWebScrap()
           Starts the scraping process. Call this method on a WebScrap object.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

WebScrap

public WebScrap()
Method Detail

setUrl

public void setUrl(java.lang.String url)
Sets the full URL of the website you want to scrape.

Parameters:
url - the full URL, including the http:// prefix (without quotes)
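Because the URL must carry an explicit http:// prefix, a caller may want to check it before calling setUrl(). The sketch below uses only java.net.URI; the class UrlCheck and the method isScrapableUrl are hypothetical and not part of webscrap4j. Accepting https:// as well is an assumption, since this page only mentions http://.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not part of webscrap4j: checks that a string is an
// absolute http(s) URL with a host before it is passed to setUrl().
final class UrlCheck {

    static boolean isScrapableUrl(String url) {
        try {
            URI uri = new URI(url);
            String scheme = uri.getScheme();
            // Require an explicit scheme; accepting https is an assumption.
            boolean httpScheme = "http".equalsIgnoreCase(scheme)
                    || "https".equalsIgnoreCase(scheme);
            return httpScheme && uri.getHost() != null;
        } catch (URISyntaxException e) {
            // Malformed input that java.net.URI cannot parse.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isScrapableUrl("http://example.com"));   // true
        System.out.println(isScrapableUrl("example.com"));          // false
        System.out.println(isScrapableUrl("'http://example.com'")); // false
    }
}
```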

startWebScrap

public void startWebScrap()
                   throws WebScrapException
Starts the scraping process. Call this method on a WebScrap object.

Throws:
WebScrapException
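This page does not say what startWebScrap() does internally. A common approach, and only an assumption here, is to download the page once so that the extraction methods can work on the HTML text. The sketch below does that with the JDK's java.net.http.HttpClient (Java 11+); the class PageFetcher and its methods are hypothetical, and it reports failures with IOException rather than WebScrapException.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical sketch (Java 11+), not the webscrap4j implementation:
// download the page once so later extraction calls can work on the HTML text.
final class PageFetcher {

    static HttpRequest buildRequest(String url) {
        // HttpRequest.newBuilder throws IllegalArgumentException for URIs
        // without an http or https scheme.
        return HttpRequest.newBuilder(URI.create(url)).GET().build();
    }

    static String fetchHtml(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL) // follow normal redirects
                .build();
        HttpResponse<String> response =
                client.send(buildRequest(url), HttpResponse.BodyHandlers.ofString());
        // Treat anything outside the 2xx range as a failure.
        if (response.statusCode() / 100 != 2) {
            throw new IOException("Unexpected HTTP status " + response.statusCode());
        }
        return response.body();
    }
}
```

Fetching once and reusing the body keeps every later extraction call offline, which is cheaper than downloading the page per method call.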

getSingleHTMLTagData

public java.lang.String getSingleHTMLTagData(java.lang.String tagData)
                                      throws WebScrapException
Extracts the content of an HTML tag whose opening and closing tags share the same name, such as title, b or strong.

Parameters:
tagData -
Returns:
String - the data between the opening and closing tag
Throws:
WebScrapException
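As an illustration of the idea, the sketch below returns the content of the first matching tag pair using java.util.regex. It is not the webscrap4j implementation: SingleTagExtractor and firstTagContent are hypothetical names, and passing the bare tag name (title rather than <title>) is an assumption, because the tagData parameter is not described. A regular expression is adequate for simple, well-formed markup but is not a full HTML parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not the webscrap4j implementation: returns the trimmed
// content of the first <tagName ...>...</tagName> pair, or null if none exists.
final class SingleTagExtractor {

    static String firstTagContent(String html, String tagName) {
        String name = Pattern.quote(tagName);
        // Opening tag: the name followed by '>' or by whitespace and attributes.
        // The body is matched lazily up to the first closing tag of the same name.
        Pattern pattern = Pattern.compile(
                "<" + name + "(?:\\s[^>]*)?>(.*?)</" + name + "\\s*>",
                Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
        Matcher matcher = pattern.matcher(html);
        return matcher.find() ? matcher.group(1).trim() : null;
    }

    public static void main(String[] args) {
        String html = "<html><head><title> My Page </title></head>"
                + "<body><b>bold</b> and <strong class=\"x\">strong</strong></body></html>";
        System.out.println(firstTagContent(html, "title"));  // My Page
        System.out.println(firstTagContent(html, "b"));      // bold
        System.out.println(firstTagContent(html, "strong")); // strong
    }
}
```

Note that searching for b does not match <body>, because the opening tag must be followed by '>' or whitespace.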

getSingleHTMLScriptData

public java.util.ArrayList<java.lang.String> getSingleHTMLScriptData(java.lang.String startTag,
                                                                     java.lang.String endTag)
                                                              throws WebScrapException
Extracts the data between a start tag and an end tag, such as div or span.

Parameters:
startTag - the starting tag
endTag - the ending tag
Returns:
ArrayList<String> - the extracted data
Throws:
WebScrapException
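One way to collect the text between every start/end tag pair is a left-to-right String.indexOf scan, sketched below. This is an assumption about the technique, not the library's code; BetweenTags and allBetween are hypothetical names. The tags are matched literally, so the start tag can include attributes, but a div nested inside another div is not handled.

```java
import java.util.ArrayList;

// Hypothetical sketch, not the webscrap4j implementation: collects the text
// between every literal startTag ... endTag pair, scanning left to right.
final class BetweenTags {

    static ArrayList<String> allBetween(String html, String startTag, String endTag) {
        if (startTag.isEmpty() || endTag.isEmpty()) {
            throw new IllegalArgumentException("startTag and endTag must be non-empty");
        }
        ArrayList<String> results = new ArrayList<>();
        int from = 0;
        while (true) {
            int start = html.indexOf(startTag, from);
            if (start < 0) {
                break; // no further start tag
            }
            int contentStart = start + startTag.length();
            int end = html.indexOf(endTag, contentStart);
            if (end < 0) {
                break; // start tag without a matching end tag
            }
            results.add(html.substring(contentStart, end).trim());
            from = end + endTag.length();
        }
        return results;
    }

    public static void main(String[] args) {
        String html = "<div class=\"price\">10</div><span>sale</span>"
                + "<div class=\"price\"> 20 </div>";
        System.out.println(allBetween(html, "<div class=\"price\">", "</div>")); // [10, 20]
        System.out.println(allBetween(html, "<span>", "</span>"));              // [sale]
    }
}
```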

getSingleHTMLScriptData

public java.util.ArrayList<java.lang.String> getSingleHTMLScriptData(java.lang.String startTag,
                                                                     java.lang.String endTag,
                                                                     java.lang.String middleTagOne,
                                                                     java.lang.String middleTagTwo)
                                                              throws WebScrapException
Extracts data from nested tags, such as the li items inside a ul, e.g. <ul> <li>Data 1</li> <li>Data 2</li> <li>Data 3</li> <li>Data 4</li> </ul>

Parameters:
startTag - the primary start tag, e.g. <ul>
endTag - the primary end tag, e.g. </ul>
middleTagOne - the secondary start tag, e.g. <li>
middleTagTwo - the secondary end tag, e.g. </li>
Returns:
ArrayList<String> - the extracted data
Throws:
WebScrapException
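The two-level extraction described above can be sketched as two literal scans: first find each primary block (for example <ul> ... </ul>), then collect every secondary pair (for example <li> ... </li>) inside it. This is an illustrative assumption, not the webscrap4j implementation; NestedTags and itemsInside are hypothetical names. Items outside any primary block are ignored.

```java
import java.util.ArrayList;

// Hypothetical sketch, not the webscrap4j implementation: finds each
// startTag ... endTag block (e.g. <ul> ... </ul>) and collects the text of every
// middleTagOne ... middleTagTwo pair (e.g. <li> ... </li>) inside it.
final class NestedTags {

    static ArrayList<String> itemsInside(String html, String startTag, String endTag,
                                         String middleTagOne, String middleTagTwo) {
        ArrayList<String> items = new ArrayList<>();
        for (String block : between(html, startTag, endTag)) {
            items.addAll(between(block, middleTagOne, middleTagTwo));
        }
        return items;
    }

    // Literal left-to-right scan for start ... end pairs; does not handle nesting
    // of the same tag (e.g. a <ul> inside another <ul>).
    private static ArrayList<String> between(String text, String start, String end) {
        if (start.isEmpty() || end.isEmpty()) {
            throw new IllegalArgumentException("tags must be non-empty");
        }
        ArrayList<String> found = new ArrayList<>();
        int from = 0;
        while (true) {
            int s = text.indexOf(start, from);
            if (s < 0) break;
            int contentStart = s + start.length();
            int e = text.indexOf(end, contentStart);
            if (e < 0) break;
            found.add(text.substring(contentStart, e).trim());
            from = e + end.length();
        }
        return found;
    }

    public static void main(String[] args) {
        String html = "<ul> <li>Data 1</li> <li>Data 2</li> <li>Data 3</li> <li>Data 4</li> </ul>";
        // Prints [Data 1, Data 2, Data 3, Data 4]
        System.out.println(itemsInside(html, "<ul>", "</ul>", "<li>", "</li>"));
    }
}
```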

getImageTagData

public java.util.ArrayList<java.lang.String> getImageTagData(java.lang.String tag,
                                                             java.lang.String attr)
                                                      throws WebScrapException
Extracts the value of a given attribute from every occurrence of a single HTML tag such as a (anchor) or img (image), e.g. <img src="imagelink" title="imagetitle" />.

Parameters:
tag - the tag name, such as img or a.
attr - the attribute name, such as href, src or title.
Returns:
The values of the given attribute from all matching HTML tags.
Throws:
WebScrapException
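A minimal regex-based sketch of attribute extraction follows, assuming the method returns one value per matching tag that carries the attribute. It is not the webscrap4j implementation; AttributeExtractor and attributeValues are hypothetical names. It handles double- and single-quoted values but not unquoted values or a '>' inside an attribute value.

```java
import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not the webscrap4j implementation: returns the value of
// attr for every <tag ...> element that has it, in document order.
final class AttributeExtractor {

    static ArrayList<String> attributeValues(String html, String tag, String attr) {
        // Opening (or self-closing) tag with exactly this name, e.g. <img ... />.
        Pattern tagPattern = Pattern.compile(
                "<" + Pattern.quote(tag) + "\\b[^>]*>", Pattern.CASE_INSENSITIVE);
        // attr="value" or attr='value', preceded by whitespace so that
        // e.g. "data-src" is not mistaken for "src".
        Pattern attrPattern = Pattern.compile(
                "\\s" + Pattern.quote(attr) + "\\s*=\\s*([\"'])(.*?)\\1",
                Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
        ArrayList<String> values = new ArrayList<>();
        Matcher tags = tagPattern.matcher(html);
        while (tags.find()) {
            Matcher value = attrPattern.matcher(tags.group());
            if (value.find()) {
                values.add(value.group(2)); // group 2 is the text inside the quotes
            }
        }
        return values;
    }

    public static void main(String[] args) {
        String html = "<p><img src=\"imagelink\" title=\"imagetitle\" /> <a href='/home'>Home</a>"
                + " <img data-src=\"lazy.png\" src=\"b.png\"></p>";
        System.out.println(attributeValues(html, "img", "src"));   // [imagelink, b.png]
        System.out.println(attributeValues(html, "a", "href"));    // [/home]
        System.out.println(attributeValues(html, "img", "title")); // [imagetitle]
    }
}
```

Tags without the requested attribute are skipped rather than contributing an empty string, which is a design choice of this sketch.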