Binning Python

Problem: binning data in Python

This problem is about binning in Python: splitting an array of values into bins and computing the mean of the values that fall into each bin. The main challenge is finding a more efficient way to compute these per-bin means than looping over the bins by hand. The starting code is shown below:

 

import numpy as np

def get_bin_mean(a, b_start, b_end):
    # Keep the values at or above the lower bin edge...
    ind_upper = np.nonzero(a >= b_start)[0]
    a_upper = a[ind_upper]
    # ...then keep only those strictly below the upper bin edge.
    a_range = a_upper[np.nonzero(a_upper < b_end)[0]]
    mean_val = np.mean(a_range)
    return mean_val


data = np.random.rand(100)
bins = np.linspace(0, 1, 10)
binned_data = []

# Walk over consecutive pairs of bin edges and collect the mean of each bin.
for n in range(0, len(bins)-1):
    b_start = bins[n]
    b_end = bins[n+1]
    binned_data.append(get_bin_mean(data, b_start, b_end))

print(binned_data)

 

The code above filters the array by hand for every bin instead of using a built-in binning function, which makes it more verbose than the approaches shown in the solution.


Solution

Instead of the approach above, we can use NumPy's digitize function, which returns the index of the bin each value belongs to, and then take the mean of the values sharing each index:

import numpy
data = numpy.random.random(100)
bins = numpy.linspace(0, 1, 10)
digitized = numpy.digitize(data, bins)
bin_means = [data[digitized == i].mean() for i in range(1, len(bins))]
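To make the bin indices concrete, here is a minimal sketch on hand-picked values. The sample values and edges are illustrative assumptions, not part of the original example. With the default right=False, an index i returned by digitize means bins[i-1] <= x < bins[i]. The sketch also reports empty bins as NaN, which the list comprehension above does not guard against.

```python
import numpy as np

# Hypothetical sample values and bin edges, chosen so the result is easy to check.
values = np.array([0.05, 0.30, 0.35, 0.80])
edges = np.array([0.0, 0.25, 0.5, 0.75, 1.0])

# For each value, digitize returns the 1-based index of the bin it falls in.
idx = np.digitize(values, edges)
print(idx)  # [1 2 2 4]

# Mean per bin; a bin with no members is reported as NaN instead of
# triggering NumPy's "mean of empty slice" warning.
bin_means = [values[idx == i].mean() if np.any(idx == i) else np.nan
             for i in range(1, len(edges))]
print(bin_means)
```

The third bin (0.5 to 0.75) receives no values here, so its mean is NaN.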

 

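For the same data, this gives the same bin means as the first approach. Another alternative is NumPy's histogram function: a histogram weighted by the data gives the sum of the values in each bin, and dividing it by the unweighted histogram (the count per bin) gives the mean.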

 

bin_means = (numpy.histogram(data, bins, weights=data)[0] /
             numpy.histogram(data, bins)[0])
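The histogram version can be cross-checked against the digitize version. The sketch below is one way to do that, assuming a seeded random generator (the seed is arbitrary) and treating empty bins as NaN so that the division never divides by zero.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
data = rng.random(100)          # values in [0, 1)
bins = np.linspace(0, 1, 10)

sums, _ = np.histogram(data, bins, weights=data)  # sum of the values in each bin
counts, _ = np.histogram(data, bins)              # number of values in each bin

# Divide only where a bin is non-empty; empty bins keep their NaN placeholder.
hist_means = np.full(len(counts), np.nan)
np.divide(sums, counts, out=hist_means, where=counts > 0)

# The digitize-based means, with the same NaN convention for empty bins.
digitized = np.digitize(data, bins)
dig_means = np.array([data[digitized == i].mean() if np.any(digitized == i) else np.nan
                      for i in range(1, len(bins))])

print(np.allclose(hist_means, dig_means, equal_nan=True))
```

One difference to keep in mind: histogram treats the last bin as closed on the right, while digitize places a value exactly equal to the last edge past the final bin. The data here never reaches 1.0, so the two agree.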

 

The same grouping can also be expressed with the third-party numpy_indexed package, which provides a group_by operation. The code is:

 

import numpy as np
import numpy_indexed as npi

print(npi.group_by(np.digitize(data, bins)).mean(data))

 

 

Another option is ufunc.at, which applies an operation in place at the supplied indices without buffering, so repeated indices accumulate correctly. We first use searchsorted to find the bin position of each data point. Then np.add.at increments the histogram by 1 at each of those positions. Note that the code below builds the count of values per bin rather than the means.

 

import numpy as np

np.random.seed(1)
data = np.random.random(100) * 100
bins = np.linspace(0, 100, 10)

histogram = np.zeros_like(bins)

bin_indexes = np.searchsorted(bins, data)
np.add.at(histogram, bin_indexes, 1)
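The same idea can be extended from counts to means by accumulating the values themselves alongside the counts. This is a sketch under a few assumptions of my own rather than the original author's code: it passes side="right" to searchsorted so the bins match digitize's default convention, allocates one extra slot so any index is safe, and reports empty bins as NaN.

```python
import numpy as np

np.random.seed(1)
data = np.random.random(100) * 100
bins = np.linspace(0, 100, 10)

# side="right" mirrors np.digitize's default: index i means bins[i-1] <= x < bins[i].
bin_indexes = np.searchsorted(bins, data, side="right")

# One slot per possible index (0 .. len(bins)), so out-of-range values cannot overflow.
sums = np.zeros(len(bins) + 1)
counts = np.zeros(len(bins) + 1)
np.add.at(sums, bin_indexes, data)  # accumulate the values landing in each bin
np.add.at(counts, bin_indexes, 1)   # accumulate how many values land in each bin

# Keep the 9 real bins (indexes 1 .. len(bins)-1) and divide where non-empty.
inner = slice(1, len(bins))
bin_means = np.full(len(bins) - 1, np.nan)
np.divide(sums[inner], counts[inner], out=bin_means, where=counts[inner] > 0)
print(bin_means)
```

A plain `sums[bin_indexes] += data` would not work here, because fancy-index assignment applies only one update per repeated index; np.add.at is what makes the accumulation correct.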

 

 
