Automated Data Mining in Python
Introduction to automated data mining in Python
In this article, we will learn how one can execute automated data mining in Python. Data mining is the process of extracting useful patterns and information from large datasets; it draws heavily on machine learning, which is itself a subfield of artificial intelligence. One application of data mining in Python is to search a text for a particular keyword and then give the words associated with it in the output window.
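As a toy illustration of that keyword application (the text, keyword, and regular expression here are made up for the example and are not part of the pipeline below), one can scan a string for the word that follows each occurrence of a keyword:

import re

text = "data mining extracts patterns and data cleaning removes noise"
keyword = "data"

# find the word immediately following each occurrence of the keyword
associated = re.findall(r"\b%s\s+(\w+)" % re.escape(keyword), text)
print associated  # -> ['mining', 'cleaning']

A trained model would rank such associations statistically; the regular expression only shows the input/output shape of the task.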
To build the application described above, one needs to train a model developed with a machine learning algorithm on a good amount of data. To automate the fetching of that data, an API is needed. One must first determine what kind of API one is dealing with, because APIs are generally either REST or SOAP. The basic code is built up step by step below.
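A quick way to tell the two styles apart (the endpoints below are placeholders, not real services): a REST API is consumed with plain HTTP requests, while a SOAP API publishes a WSDL document that a client library such as suds reads to discover the available operations:

import requests
from suds.client import Client

# REST: a plain HTTP call to a resource URL (placeholder endpoint)
rest_response = requests.get("http://example.com/api/items/1")

# SOAP: the client is generated from the service's WSDL description
soap_client = Client("http://example.com/service?wsdl")
print soap_client  # printing a suds client lists the operations the WSDL exposes

The service used later in this article is a SOAP one, which is why the code below builds a suds client from a WSDL URL.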
There are several libraries that users need to import to make a connection with the API. The first is the datetime module, which is useful for tracking the current date and time, including the time at which the data is fetched from the source.
Next come Client and the rest of the suds library, which are mainly responsible for creating the connection with the SOAP API. Then there is cStringIO, which is used to create an in-memory text object that later code can write log output to and read back from. BeautifulSoup is imported to parse the XML that the service returns. Note that cStringIO exists only in Python 2, so the listings in this article target Python 2 (on Python 3, io.StringIO serves the same purpose). The import code is shown here:
# importing the required modules
import logging, time, requests, re, suds_requests
from datetime import timedelta, date, datetime, tzinfo
from requests.auth import HTTPBasicAuth
from suds.client import Client
from suds.wsse import *
from suds import null, TypeNotFound
from cStringIO import StringIO
from bs4 import BeautifulSoup as Soup
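Since BeautifulSoup is only imported above, here is a minimal sketch of the role it plays later (the XML string is a made-up stand-in for a real service response):

# hypothetical raw XML, standing in for a SOAP response body
raw = "<result><service><id>42</id><name>example</name></service></result>"
doc = Soup(raw, "html.parser")
print doc.find("name").get_text()  # -> example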
In the next piece of code, StringIO creates an object named log_stream to which the logging output is written. The calls to basicConfig and getLogger configure logging by setting the severity level and by routing the suds transport and client logs into that stream. The code is shown below:
# creating an object for logging purposes
log_stream = StringIO()
logging.basicConfig(stream=log_stream, level=logging.INFO)
logging.getLogger('suds.transport').setLevel(logging.DEBUG)
logging.getLogger('suds.client').setLevel(logging.DEBUG)

# defining the link for the client-server connection
WSDL_URL = 'http://213.166.38.97:8080/SRIManagementWS/services/SRIManagementSOAP?wsdl'
After that, we define a username and password for the session created in the lines that follow. The session is what actually transfers the data, and attaching the credentials to it authenticates every request it makes. The code is shown below:
# creating a username and password for the session
user_name = 'username'
pass_word = 'pass_word'

# creation of the session and its authentication
session = requests.session()
session.auth = (user_name, pass_word)
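One thing the original listing leaves implicit is the creation of the client object used below. A minimal sketch, assuming the suds_requests import is meant to route the SOAP traffic through the authenticated session just created:

# building the suds client on top of the authenticated requests session
client = Client(WSDL_URL,
                transport=suds_requests.RequestsTransport(session))

With this transport in place, every SOAP call made through client reuses the session's HTTP Basic credentials.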
The function called addSecurityHeader attaches the username and password to the SOAP client as a WS-Security UsernameToken. On each call, the server verifies these credentials and returns an error message if they are wrong. The program for the same is as follows:
# function to attach the username and password before data fetching
def addSecurityHeader(client, user_name, pass_word):
    security = Security()
    # creating a token for the username and password
    userNameToken = UsernameToken(user_name, pass_word)
    security.tokens.append(userNameToken)
    client.set_options(wsse=security)

# calling the function
addSecurityHeader(client, user_name, pass_word)

val1 = "argument_1"
val2 = "argument_2"

# using a try/except block in case of an error message
try:
    client.service.GetServiceById(val1, val2)
except TypeNotFound as e:
    print e

logresults = log_stream.getvalue()
The complete program looks like this:
import logging, time, requests, re, suds_requests
from datetime import timedelta, date, datetime, tzinfo
from requests.auth import HTTPBasicAuth
from suds.client import Client
from suds.wsse import *
from suds import null, TypeNotFound
from cStringIO import StringIO
from bs4 import BeautifulSoup as Soup

# creating an object to configure the logging parameters
log_stream = StringIO()
logging.basicConfig(stream=log_stream, level=logging.INFO)
logging.getLogger('suds.transport').setLevel(logging.DEBUG)
logging.getLogger('suds.client').setLevel(logging.DEBUG)

WSDL_URL = 'http://213.166.38.97:8080/SRIManagementWS/services/SRIManagementSOAP?wsdl'

# creating a username and password for security reasons
user_name = 'username'
pass_word = 'password'
session = requests.session()
session.auth = (user_name, pass_word)

# building the SOAP client on top of the authenticated session
client = Client(WSDL_URL,
                transport=suds_requests.RequestsTransport(session))

# security function that attaches the credentials to the client
def addSecurityHeader(client, user_name, pass_word):
    security = Security()
    userNameToken = UsernameToken(user_name, pass_word)
    security.tokens.append(userNameToken)
    client.set_options(wsse=security)

addSecurityHeader(client, user_name, pass_word)

val1 = "argument_1"
val2 = "argument_2"

try:
    client.service.GetServiceById(val1, val2)
except TypeNotFound as e:
    print e

logresults = log_stream.getvalue()
The above program shows the basic libraries one needs to import, as well as how to attach the security token that authenticates the data fetched from the service. Essentially, this code makes a connection between the machine and an API. Once the API connection is in place, a database such as MySQL is needed to store the fetched data.
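As a minimal sketch of that last step (the MySQLdb driver, the connection details, and the table layout are assumptions for illustration, not part of the original listing):

import MySQLdb

# hypothetical connection details and table layout
conn = MySQLdb.connect(host="localhost", user="root",
                       passwd="secret", db="mining_results")
cursor = conn.cursor()
cursor.execute("CREATE TABLE IF NOT EXISTS service_logs "
               "(id INT PRIMARY KEY, payload TEXT)")

# store the log captured from the API session
cursor.execute("INSERT INTO service_logs (id, payload) VALUES (%s, %s)",
               (1, logresults))
conn.commit()
conn.close()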