douglib.core

Hello!

Created on Mon Aug 26 11:02:21 2013.

A library holding common subroutines and classes that I’ve created.

douglib.core._integrate(f, a, b, N=200)

Integrate function f from a to b using N iterations.

Parameters:
  • f (function) – The function to integrate. Must take a single numeric argument (or more if args 2 through n are optional).
  • a (float) – The lower limit of the integral.
  • b (float) – The upper limit of the integral.
  • N (int, optional) – The number of samples to use. Higher numbers yield more accurate values but cost more processing power and memory.
Returns:

area – The area under the function.

Return type:

float

See also

https://helloacm.com/how-to-compute-numerical-integration-in-numpy-python/

Examples

>>> _integrate(np.sin, 0, np.pi/2, 100)
1.0000102809119051
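
As an illustration of how N samples can turn into an area, here is a minimal sketch using the midpoint rule. The choice of the midpoint rule and the name midpoint_integrate are assumptions made for this example; the source does not state which rule _integrate uses.

```python
import math


def midpoint_integrate(f, a, b, n=200):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    width = (b - a) / n  # width of each of the n strips
    # Evaluate f at the middle of every strip and sum the strip areas.
    return width * sum(f(a + (i + 0.5) * width) for i in range(n))


print(midpoint_integrate(math.sin, 0, math.pi / 2, 100))
```

Raising n shrinks the strips, so the estimate approaches the exact area of 1 at the cost of more function evaluations.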
douglib.core.array_2d_to_str(array_2d, delim='')

Convert a 2D array to a spreadsheet string.

Parameters:
  • array_2d (list of lists) – The array to convert.
  • delim (str, optional) – The delimiter. Defaults to the empty string. Use ‘,’ to make a true CSV string.
Returns:

A csv-compatible string.

Return type:

str
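
A rough sketch of the conversion, assuming rows are separated by newlines and that no quoting or escaping is applied (neither detail is confirmed by the source; array_2d_to_str_sketch is a hypothetical name):

```python
def array_2d_to_str_sketch(array_2d, delim=""):
    """Join each row with delim and separate rows with newlines."""
    # str() lets rows hold numbers as well as strings.
    return "\n".join(delim.join(str(item) for item in row) for row in array_2d)


print(array_2d_to_str_sketch([[1, 2, 3], ["a", "b", "c"]], delim=","))
```

For data that may itself contain commas or quotes, the standard csv module is the safer route.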

douglib.core.binary_file_compare(file1, file2)

Compare two files byte-by-byte.

Parameters:
  • file1 (str) – The path to the master file
  • file2 (str) – The path to the 2nd file.
Returns:

failcode – A flag providing information on where the difference is located.

Return type:

int

Notes

Fail codes can be:

  • 0: files match
  • 1: different sizes
  • 2: different first or last byte
  • 3: different data in statistically significant random sample
  • 4: different data in full search

See significant_subsample() for more information on failcode 3.
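
The fail codes suggest a staged comparison that tries cheap checks first. The sketch below follows that idea under stated assumptions: compare_files_sketch is a hypothetical name, the random-sample stage (fail code 3) is left out, and the real function may order or implement the stages differently.

```python
import os


def compare_files_sketch(file1, file2, blocksize=65536):
    """Return 0 if the files match, else a fail code from the list above."""
    size = os.path.getsize(file1)
    if size != os.path.getsize(file2):
        return 1  # different sizes
    if size == 0:
        return 0  # two empty files match trivially
    with open(file1, "rb") as f1, open(file2, "rb") as f2:
        # Cheap check: compare the first and last byte before reading everything.
        for offset in (0, size - 1):
            f1.seek(offset)
            f2.seek(offset)
            if f1.read(1) != f2.read(1):
                return 2
        # The random-sample stage (fail code 3) is omitted in this sketch.
        f1.seek(0)
        f2.seek(0)
        while True:
            block1 = f1.read(blocksize)
            block2 = f2.read(blocksize)
            if block1 != block2:
                return 4  # difference found in the full search
            if not block1:
                return 0  # reached the end with no differences
```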

douglib.core.clip(x, min_max, clipval=None)

Clip the value x to x_min or x_max.

If clipval is defined, then returns those values instead. clipval must be a list or tuple of length 2.

Parameters:
  • x (numeric) – The value to clip
  • min_max (sequence of numerics, length 2) – The (minimum, maximum) value to return.
  • clipval (sequence of length 2, any type, optional) – The items to return when x is outside of (x_min, x_max). This sequence can be made up of any type.
Returns:

clipped

Return type:

any

Examples

>>> clip(10, (0, 1))
1
>>> clip(10, (0, 1), clipval=("Zero", "One"))
'One'
>>> clip(5.23, (3.24, 8.91))
5.23
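
The behaviour in these examples can be sketched as follows. This is an illustrative reimplementation (clip_sketch is a hypothetical name), not the library's source:

```python
def clip_sketch(x, min_max, clipval=None):
    """Clip x to min_max, optionally returning substitute values."""
    low, high = min_max
    # Without clipval, the limits themselves are returned.
    low_result, high_result = min_max if clipval is None else clipval
    if x < low:
        return low_result
    if x > high:
        return high_result
    return x


print(clip_sketch(10, (0, 1), clipval=("Zero", "One")))
```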
douglib.core.convert_rcd_xyd(rcd)

Convert a list of (a, b, data) to (b, a, data).

Simply swaps the first two items in each sublist. Also sorts the new list by x then y.

Parameters: rcd (list of tuples) – The data to convert.
Returns: A copy of rcd with sublist index 0 and 1 swapped, sorted.
Return type: list of tuples
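
A small sketch of the swap-and-sort step. It assumes the sort looks only at the two coordinates and never at the data item; convert_rcd_xyd_sketch is a hypothetical name.

```python
def convert_rcd_xyd_sketch(rcd):
    """Swap the first two items of each entry, then sort by the new order."""
    swapped = [(col, row, data) for row, col, data in rcd]
    # Sort on the two coordinates only, so data items never get compared.
    return sorted(swapped, key=lambda item: (item[0], item[1]))


print(convert_rcd_xyd_sketch([(0, 1, "a"), (1, 0, "b"), (0, 0, "c")]))
```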
douglib.core.frange(start, stop, step)

Generator that creates an arbitrary-stepsize range.

Creates a list generator that returns [start, start + step, start + step * 2, ..., stop)

Note that the interval is closed-open [). The stop value is not supposed to be part of the returned list generator.

Parameters:
  • start (numeric) – The number to start at
  • stop (numeric) – The number to end at
  • step (numeric) – The delta between points
Returns:

  • frange (generator) – A generator that returns the numbers in the range on demand.

Note

This function does not account for floating-point math errors. As a result, the last generated point may equal stop once it is rounded to the step precision. See Examples.

Examples

>>> list(frange(1.5, 6.5, 0.5))
[1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0]

Floating Point Error:

>>> list(frange(1.2, 1.8, 0.2))
[1.2, 1.4, 1.5999999999999999, 1.7999999999999998]
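
A generator of this kind can be sketched with repeated addition, which is also one way the rounding drift shown above can arise. This is an assumption about the approach, and the sketch only handles a positive step; frange_sketch is a hypothetical name.

```python
def frange_sketch(start, stop, step):
    """Yield start, start + step, ... while the value is below stop."""
    value = start
    while value < stop:
        yield value
        value += step  # repeated addition accumulates rounding error


print(list(frange_sketch(1.5, 6.5, 0.5)))
```

Computing each point as start + i * step instead of accumulating keeps the error from growing with the number of points, although individual points can still be inexact.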
douglib.core.from_engineering_notation(string)

Convert a number string with order-of-magnitude suffix to a float.

Parameters: string (string) – The string to convert.
Returns: number – The numerical equivalent of string.
Return type: float

Examples

>>> from_engineering_notation("1.23m")
0.00123
>>> from_engineering_notation("4.5k")
4500.0
>>> from_engineering_notation("-6.84u")
-6.84e-06
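
One way to do this conversion is to map the suffix to a power of ten and let float() parse the combined string. The suffix table below is an assumption; the set of suffixes the real function accepts is not listed in this documentation.

```python
# Assumed suffix table (hypothetical; not taken from the library source).
SI_EXPONENTS = {"p": -12, "n": -9, "u": -6, "m": -3, "k": 3, "M": 6, "G": 9, "T": 12}


def from_engineering_notation_sketch(string):
    """Convert a string such as '4.5k' to a float."""
    string = string.strip()
    if string and string[-1] in SI_EXPONENTS:
        # Let float() parse "<mantissa>e<exponent>" so the result is rounded once.
        return float("{}e{}".format(string[:-1], SI_EXPONENTS[string[-1]]))
    return float(string)  # no suffix: plain number


print(from_engineering_notation_sketch("4.5k"))
```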
douglib.core.hash_file(file_object, hasher, blocksize=65536)

Hash a file using a given hashing type.

Parameters:
  • file_object (io.IOBase object) – The stream to hash.
  • hasher (hashlib.HASH object) – The hasher to use.
  • blocksize (int, optional) – The block size to read from file_object.
Returns:

The hash digest of the stream.

Return type:

digest

Note

file_object must already be opened.

Hint

Examples of valid hashers are hashlib.md5(), hashlib.sha256(), etc.
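
Block-wise hashing keeps memory use bounded for large files. A minimal sketch follows; whether the real function returns digest() bytes or a hexdigest() string is not stated, so returning bytes is an assumption here, and hash_file_sketch is a hypothetical name.

```python
import hashlib
import io


def hash_file_sketch(file_object, hasher, blocksize=65536):
    """Feed file_object to hasher in blocks and return the digest bytes."""
    # iter() with a sentinel keeps calling read() until it returns b"" at end of stream.
    for block in iter(lambda: file_object.read(blocksize), b""):
        hasher.update(block)
    return hasher.digest()


stream = io.BytesIO(b"example data")
print(hash_file_sketch(stream, hashlib.sha256()).hex())
```

The stream must be opened in binary mode, since hashlib hashers only accept bytes-like input.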

douglib.core.interpolate_1d_array(array, x)

Emulate LabVIEW’s Interpolate 1D Array function.

Takes a fractional index value x and returns an interpolated Y value.

Parameters:
  • array (list) – A 1D list of numeric values.
  • x (numeric) – The fractional index to interpolate to.
Returns:

y – The interpolated value.

Return type:

float

Notes

This function only performs linear interpolation.

Note

Timing: O(1)
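
Linear interpolation at a fractional index can be sketched as below. Clamping x to the valid index range is an assumed edge behaviour, and interpolate_1d_array_sketch is a hypothetical name.

```python
import math


def interpolate_1d_array_sketch(array, x):
    """Linearly interpolate array at the fractional index x."""
    # Clamp to the valid index range (assumed edge behaviour).
    x = min(max(x, 0), len(array) - 1)
    lower = int(math.floor(x))
    upper = min(lower + 1, len(array) - 1)
    fraction = x - lower
    return array[lower] + fraction * (array[upper] - array[lower])


print(interpolate_1d_array_sketch([0, 10, 20], 1.5))
```

Only the two neighbouring elements are read, which is why the lookup cost does not depend on the array length.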

douglib.core.max_dist(center, size)

Calculate the distance to the farthest corner of a rectangle.

Assumes that the origin is at (0, 0).

If the rectangle’s center is in Q1, then the upper-right corner is the farthest away from the origin. If in Q2, then the upper-left corner is farthest away. Etc.

Returns the magnitude of the largest distance.

Used primarily for calculating if a die has any part outside of wafer’s edge exclusion.

Parameters:
  • center (tuple of length 2, numerics) – (x, y) tuple defining the rectangle’s center coordinates
  • size (tuple of length 2) – (x, y) tuple that defines the size of the rectangle.
Returns:

dist – The distance from the origin (0, 0) to the farthest corner of the rectangle.

Return type:

numeric

See also

max_dist_sqrd()

douglib.core.max_dist_sqrd(center, size)

Calculate the squared distance to the farthest corner of a rectangle.

Assumes that the origin is at (0, 0).

Does not take the square root of the distance, for the sake of speed.

If the rectangle’s center is in Q1, then the upper-right corner is the farthest away from the origin. If in Q2, then the upper-left corner is farthest away. Etc.

Returns the squared magnitude of the largest distance.

Used primarily for calculating if a die has any part outside of wafer’s edge exclusion.

Parameters:
  • center (tuple of length 2, numerics) – (x, y) tuple defining the rectangle’s center coordinates
  • size (tuple of length 2) – (x, y) tuple that defines the size of the rectangle.
Returns:

dist – The squared distance from the origin (0, 0) to the farthest corner of the rectangle.

Return type:

float

See also

max_dist()
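
The quadrant rule described for max_dist() and max_dist_sqrd() collapses into a single expression once absolute values are used. The sketch below assumes size holds the full side lengths of the rectangle; both function names are hypothetical.

```python
import math


def max_dist_sqrd_sketch(center, size):
    """Squared distance from (0, 0) to the farthest corner of the rectangle."""
    # The farthest corner lies half a side length beyond the center on each
    # axis, in the direction away from the origin, hence abs().
    far_x = abs(center[0]) + size[0] / 2
    far_y = abs(center[1]) + size[1] / 2
    return far_x ** 2 + far_y ** 2


def max_dist_sketch(center, size):
    """Distance from (0, 0) to the farthest corner of the rectangle."""
    return math.sqrt(max_dist_sqrd_sketch(center, size))


print(max_dist_sketch((2, 3), (2, 2)))
```

When the result is only compared against a limit such as an exclusion radius, comparing the squared distance with the squared limit gives the same answer without the square root.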

douglib.core.nearest_indicies(data, x)

Find the two array positions (indices) around x.

Parameters:
  • data (array-like) – A sequence of [x1, x2, ... xn] values
  • x (numeric) – The value to search for in data
Returns:

indices – The indices which surround the value x. See the Note below for more information.

Return type:

list

Examples

>>> nearest_indicies([1,4,6,8,10,15], 3)
[0, 1]
>>> nearest_indicies([1,4,6,8,10,15], 6)
[2]
>>> nearest_indicies([1,4,6,8,6,10], 7)     # only returns 1st match
[2, 3]

See also

pick_x_at_y()

Note

  • Timing: O(n)
  • If an exact match is found, returns a list of length 1 which contains the index of the element x. Otherwise, returns a list of length 2 containing the two indices that surround x.
  • If there are more than two possible locations, it only returns the first.
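
A linear scan that reproduces the examples above might look like this. Returning an empty list when nothing brackets x is an assumption, and nearest_indicies_sketch is a hypothetical name.

```python
def nearest_indicies_sketch(data, x):
    """Return [i] for an exact match, or [i, i + 1] for the first bracketing pair."""
    for index, value in enumerate(data):
        if value == x:
            return [index]  # exact match
        if index + 1 < len(data):
            nxt = data[index + 1]
            # Strict comparison: an exact hit on nxt is handled on the next pass.
            if min(value, nxt) < x < max(value, nxt):
                return [index, index + 1]
    return []  # assumed result when x lies outside the data


print(nearest_indicies_sketch([1, 4, 6, 8, 10, 15], 3))
```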
douglib.core.normal_cdf(x)

Return the probability for a z-score of x.

Parameters: x (float) – The z-score at which to evaluate the cumulative distribution function.
Returns: The probability that a value below x will occur.
Return type: float

References

https://en.wikipedia.org/wiki/Normal_distribution#Cumulative_distribution_function

Examples

>>> round(normal_cdf(1.96), 3)
0.975
>>> round(normal_cdf(1.6448536269514722), 3)
0.95
>>> round(normal_cdf(2.5758293035489004), 3)
0.995
>>> round(normal_cdf(0), 3)
0.5
>>> round(normal_cdf(-1), 3)
0.159

>>> # 68-95-99.7 rule
>>> round(normal_cdf(1) - normal_cdf(-1), 2)
0.68
>>> round(normal_cdf(2) - normal_cdf(-2), 2)
0.95
>>> round(normal_cdf(3) - normal_cdf(-3), 3)
0.997

>>> # The probit function should be the inverse of this
>>> round(probit(normal_cdf(1)), 2)
1.0
>>> round(probit(normal_cdf(2)), 2)
2.0
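
The standard normal CDF has a closed form in terms of the error function, which the standard library provides. The sketch below uses that identity; it is not necessarily how normal_cdf is written, and normal_cdf_sketch is a hypothetical name.

```python
import math


def normal_cdf_sketch(x):
    """Standard normal CDF: Phi(x) = (1 + erf(x / sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


print(round(normal_cdf_sketch(1.96), 3))
```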

douglib.core.pick_x_at_y(xy_array, y)

Manual linear interpolation at a POI.

Parameters:
  • xy_array (list) – A list in the format [(x1,y1), (x2,y2), ...]
  • y (numeric) – The y value to look for.
Returns:

x – The x value for the given y.

Return type:

numeric
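
Interpolating x at a given y amounts to finding the first segment that brackets y and solving the line equation for x. A sketch under assumed conventions (first match wins, ValueError when y is out of range; pick_x_at_y_sketch is a hypothetical name):

```python
def pick_x_at_y_sketch(xy_array, y):
    """Return the x at which the piecewise-linear curve first reaches y."""
    for (x1, y1), (x2, y2) in zip(xy_array, xy_array[1:]):
        if y1 == y:
            return x1  # exact hit on a data point
        if min(y1, y2) <= y <= max(y1, y2):
            # Linear interpolation between the two bracketing points.
            return x1 + (y - y1) * (x2 - x1) / (y2 - y1)
    raise ValueError("y is outside the range of the data")


print(pick_x_at_y_sketch([(0, 0), (10, 100)], 50))
```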

douglib.core.position(array, item)

Emulate Mathematica’s Position[] function as best as possible.

Only works on 1D arrays.

Parameters:
  • array (sequence) – The list of items to search through.
  • item (any) – The item to search for.
Returns:

indices – A generator for the index(es) of item in array. Returns an empty generator if item is not found.

Return type:

generator

Examples

>>> list(position([0, 1, 2, 3, 4], 2))
[2]
>>> list(position(["a", "B", "C", "d"], "d"))
[3]
>>> list(position(['1', '1', 'a', 15, 1], '1'))
[0, 1]

Note

Timing: O(1)
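
A generator expression over enumerate() is enough to reproduce the examples above. This is a sketch, not the library's source, and position_sketch is a hypothetical name.

```python
def position_sketch(array, item):
    """Return a generator of every index at which item occurs in array."""
    return (index for index, value in enumerate(array) if value == item)


print(list(position_sketch(["1", "1", "a", 15, 1], "1")))
```

Creating the generator is cheap, but consuming it fully still visits every element of array once.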

douglib.core.probit(p)

Return the probit function at probability p.

Parameters: p (float) – Probability that a value will be drawn from the returned range. Must be between 0 and 1 inclusive.
Returns: The value of the probit function at p.
Return type: float

Notes

This was shamelessly taken from the Scipy source code. I don’t want to deal with getting a scipy requirement working for this project and I only use this bit from it so... I figured I’d make it myself.

Examples

>>> round(probit(0.025), 2)
-1.96
>>> round(probit(0.975), 2)
1.96
>>> probit(0.5)
0.0
>>> round(probit(0.95), 12)
1.644853626951
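
On Python 3.8 and later, the standard library offers the same inverse CDF through statistics.NormalDist, which can serve as a cross-check for the values above. This is an alternative route, not the implementation described in the Notes; probit_sketch is a hypothetical name.

```python
from statistics import NormalDist


def probit_sketch(p):
    """Inverse of the standard normal CDF for 0 < p < 1."""
    # inv_cdf raises StatisticsError when p is exactly 0 or 1.
    return NormalDist().inv_cdf(p)


print(round(probit_sketch(0.975), 2))
```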
douglib.core.rc_to_radius(rc_coord, die_xy, center_rc)

Convert a die RC coordinate to a radius.

Parameters:
  • rc_coord (sequence of ints, length 2) – The (row, column) grid coordinate of the die.
  • die_xy (sequence of numerics, length 2) – The die (x, y) size. Typically in units of mm.
  • center_rc (sequence of numerics, length 2) – The grid (row, column) coordinate which defines the origin (center of the wafer).
Returns:

radius – The radius of the center of the die in question.

Return type:

float

douglib.core.rc_to_radius_sqrd(rc_coord, die_xy, center_rc)

Convert a die RC coordinate to a squared radius.

Returns the squared radius for the sake of speed.

Parameters:
  • rc_coord (sequence of ints, length 2) – The (row, column) grid coordinate of the die.
  • die_xy (sequence of numerics, length 2) – The die (x, y) size. Typically in units of mm.
  • center_rc (sequence of numerics, length 2) – The grid (row, column) coordinate which defines the origin (center of the wafer).
Returns:

radius – The squared radius of the center of the die in question.

Return type:

float

See also

rc_to_radius()
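
The conversion can be sketched by turning the grid offset from the center into physical distances and applying Pythagoras. Mapping columns to x and rows to y is an assumption here (the axis direction does not affect the radius); both function names are hypothetical.

```python
import math


def rc_to_radius_sqrd_sketch(rc_coord, die_xy, center_rc):
    """Squared distance from the wafer center to the die center."""
    row, col = rc_coord
    die_x, die_y = die_xy
    center_row, center_col = center_rc
    # Columns run along x and rows along y (assumed orientation).
    x = (col - center_col) * die_x
    y = (row - center_row) * die_y
    return x ** 2 + y ** 2


def rc_to_radius_sketch(rc_coord, die_xy, center_rc):
    """Distance from the wafer center to the die center."""
    return math.sqrt(rc_to_radius_sqrd_sketch(rc_coord, die_xy, center_rc))


print(rc_to_radius_sketch((0, 0), (3, 4), (1, 1)))
```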

douglib.core.rcd_to_2d_array(data, missing=0)

Convert an array of tuples to a 2D array (matrix-like).

Takes an array of tuples of (Row (y), column (x), data) and converts it to a 2D array where the element index is the row and column value.

Parameters:
  • data (list of tuples) – The data to convert, in the format [(row1, col1, d1), (row2, col2, d2), ...]
  • missing (any, optional) – The value to use for missing points.
Returns:

array – The matrix-like array.

Return type:

list

Example

>>> data = [[0, 0, 'a'], [0, 1, 'b'], [0, 2, 'c'],
...         [1, 0, 'd'], [1, 1, 'e'], [1, 2, 'f'],
...         [2, 0, 'g'], [2, 2, 'i'],
...         ]
>>> rcd_to_2d_array(data, 'X')
[['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'X', 'i']]

Warning

data must be sorted by Row (y) then by Column (x) values.
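
One way to build such an array is to pre-fill a grid with the missing value and then overwrite the known points. The sketch assumes zero-based, non-negative integer rows and columns; unlike the function documented here, it does not depend on the input order. rcd_to_2d_array_sketch is a hypothetical name.

```python
def rcd_to_2d_array_sketch(data, missing=0):
    """Build a row-major 2D list from (row, column, value) entries."""
    n_rows = max(row for row, _, _ in data) + 1
    n_cols = max(col for _, col, _ in data) + 1
    # Start with a grid of placeholders, then overwrite the known points.
    array = [[missing] * n_cols for _ in range(n_rows)]
    for row, col, value in data:
        array[row][col] = value
    return array


data = [[0, 0, "a"], [0, 1, "b"], [1, 1, "e"]]
print(rcd_to_2d_array_sketch(data, "X"))
```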

douglib.core.reedholm_die_to_rc(die_name)

Convert the Reedholm die name (“x27y54”) to a row-column tuple.

Parameters: die_name (str) – The die name to parse.
Returns: The (row, column) grid coordinate.
Return type: tuple
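
Parsing a name of this shape is a natural fit for a regular expression. The sketch assumes that the x number is the column and the y number is the row, which the source does not spell out; reedholm_die_to_rc_sketch is a hypothetical name.

```python
import re

# "x", an integer, "y", an integer, e.g. "x27y54".
_DIE_NAME = re.compile(r"x(-?\d+)y(-?\d+)$", re.IGNORECASE)


def reedholm_die_to_rc_sketch(die_name):
    """Parse a name such as 'x27y54' into a (row, column) tuple."""
    match = _DIE_NAME.match(die_name.strip())
    if match is None:
        raise ValueError("unrecognised die name: {!r}".format(die_name))
    x, y = int(match.group(1)), int(match.group(2))
    return (y, x)  # assumed mapping: row = y, column = x


print(reedholm_die_to_rc_sketch("x27y54"))
```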
douglib.core.rescale(x, orig_scale, new_scale=(0, 1))

Rescale x to run over a new range.

Rescales x (which lies on a scale from orig_min to orig_max) to run over a new range (new_min to new_max) such that x keeps its relative position on the new scale. If x is outside of orig_scale, then the result will be outside of new_scale.

Default new scale range is 0 to 1 inclusive.

Parameters:
  • x (numeric) – The value to rescale.
  • orig_scale (sequence of numerics, length 2) – The (min, max) value that x typically ranges over.
  • new_scale (sequence of numerics, length 2, optional) – The new (min, max) value that the rescaled x should reference
Returns:

result – The rescaled x value

Return type:

float

Examples

>>> rescale(5, (10, 20), (0, 1))
-0.5
>>> rescale(27, (0, 200), (0, 5))
0.675
>>> rescale(1.5, (0, 1), (0, 10))
15.0

See also

rescale_clip()

douglib.core.rescale_clip(x, orig_scale, new_scale=(0, 1))

Same as rescale(), but also clips the new data.

Any result that is below new_min or above new_max is returned as new_min or new_max, respectively.

Parameters:
  • x (numeric) – The value to rescale.
  • orig_scale (sequence of numerics, length 2) – The (min, max) value that x typically ranges over.
  • new_scale (sequence of numerics, length 2, optional) – The new (min, max) value that the rescaled x should reference
Returns:

result – The rescaled x value

Return type:

float

Examples

>>> rescale_clip(5, (10, 20), (0, 1))
0
>>> rescale_clip(15, (10, 20), (0, 1))
0.5
>>> rescale_clip(25, (10, 20), (0, 1))
1

See also

rescale()
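
Both rescale() and rescale_clip() come down to one linear map, with an optional clamp afterwards. A minimal sketch follows; the function names are hypothetical and the exact arithmetic order in the library may differ.

```python
def rescale_sketch(x, orig_scale, new_scale=(0, 1)):
    """Map x linearly from orig_scale onto new_scale."""
    orig_min, orig_max = orig_scale
    new_min, new_max = new_scale
    return new_min + (x - orig_min) * (new_max - new_min) / (orig_max - orig_min)


def rescale_clip_sketch(x, orig_scale, new_scale=(0, 1)):
    """Rescale x, then clamp the result to new_scale."""
    new_min, new_max = new_scale
    return min(max(rescale_sketch(x, orig_scale, new_scale), new_min), new_max)


print(rescale_sketch(5, (10, 20)), rescale_clip_sketch(5, (10, 20)))
```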

douglib.core.reservoir_sampling(array, num)

Randomly selects a number of elements from array.

Adapted from Wikipedia page on Reservoir Sampling: http://en.wikipedia.org/wiki/Reservoir_sampling

Parameters:
  • array (list) – The list of items to choose from
  • num (int) – The number of elements to choose from array
Returns:

list_subset – A random subset of array which is num items long.

Return type:

list

Note

Timing: O(n)
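
For reference, the classic single-pass form of reservoir sampling (often called Algorithm R) can be sketched as below. It is a generic version of the technique, not necessarily identical to this function; reservoir_sampling_sketch is a hypothetical name.

```python
import random


def reservoir_sampling_sketch(array, num):
    """Pick num items from array uniformly at random in a single pass."""
    reservoir = []
    for index, item in enumerate(array):
        if index < num:
            reservoir.append(item)  # fill the reservoir first
        else:
            # The item at position index replaces a random slot
            # with probability num / (index + 1).
            slot = random.randint(0, index)
            if slot < num:
                reservoir[slot] = item
    return reservoir


print(reservoir_sampling_sketch(list(range(100)), 5))
```

Because each item is examined once and only num items are kept, the approach also works on iterables whose length is not known in advance.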

douglib.core.round_to_multiple(x, y)

Round x to a multiple of y.

Parameters:
  • x (numeric) – The value to be rounded.
  • y (numeric) – The multiplier to round to.
Returns:

rounded – x rounded to the nearest multiple of y.

Return type:

numeric

Examples

>>> round_to_multiple(1.1234, 0.1)
1.1
>>> round_to_multiple(4.767, 0.3)
4.8
>>> round_to_multiple(1.1234, 0.32)
1.28
>>> round_to_multiple(-1.1234, 0.06)
-1.14
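
Rounding to a multiple can be sketched as dividing by y, rounding to the nearest integer, and multiplying back. This is an assumed approach: the result can carry small floating-point residue, and Python's round() sends exact ties to the even neighbour. round_to_multiple_sketch is a hypothetical name.

```python
def round_to_multiple_sketch(x, y):
    """Round x to the nearest multiple of y."""
    # round() picks the nearest whole number of steps; multiply back to scale.
    return round(x / y) * y


print(round_to_multiple_sketch(1.1234, 0.32))
```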
douglib.core.significant_sample_size(N, **kwargs)

Return the significant sample size.

The significant sample size is the sample size needed to provide a given z-score (or confidence interval) and margin of error from a population of size N and response distribution p. Assumes a normal distribution.

Parameters:
  • N (int) – The population size.
  • Z (float, optional [1.96]) – The Z-score for the desired confidence interval. If given, CI must not be given. Defaults to a confidence interval of 95%.
  • CI (float, optional [0.95]) – The desired confidence interval. Must be between 0 and 1 inclusive. If given, Z must not be given. Defaults to a Z-score of 1.96.
  • E (float, optional [0.02]) – The desired margin of error. Must be between 0 and 1 inclusive.
  • p (float, optional [0.5]) – Response distribution. This is what the expected response rate is. If you aren’t sure, use 0.5 as that results in the largest sample size. Must be between 0 and 1 inclusive.
Returns:

n – The number of samples needed.

Return type:

int

Examples

>>> significant_sample_size(1000)
706
>>> significant_sample_size(1000, Z=1.6448, E=0.05)
213
>>> significant_sample_size(1000, Z=1.6448, E=0.1)
63
>>> significant_sample_size(1000, Z=1.6448, E=0.1, p=0.3)
53
>>> significant_sample_size(10000)
1936
>>> significant_sample_size(1000, CI=0.95, E=0.02)
706
>>> significant_sample_size(1000, CI=0.96, E=0.02)
725
>>> significant_sample_size(1000, CI=0.95, E=0.03)
516

Notes

The sample size for the statistically significant random sample is given by:

\[n = \frac{N \times Z^2 \times p(1-p)} {(N-1) E^2 +(Z^2 \times p(1-p))}\]
  • n = sample size
  • N = population size
  • Z = z-score for a given confidence interval
  • E = margin of error
  • p = response distribution (the expected response rate)

Info from http://www.raosoft.com/samplesize.html which provides the following equations:

\[x = Z^2 \times p(1-p)\]
\[n = \frac{(N \times x)}{((N-1) \times E^2 + x)}\]
\[E^2 = \frac{(N - n) \times x}{n(N-1)}\]

Note that on the website: \(Z(c)^2\), where \(Z\) is a function of \(c\).

Typical Z-score / confidence interval values are:

  • Z = 1.6448536269514722 -> 90%
  • Z = 1.959963984540054 -> 95%
  • Z = 2.5758293035489004 -> 99%

These values can be calculated with z_score_from_confidence_interval().

For a given population size, the margin of error E typically has a much stronger effect on n than the confidence interval does.

Example:

>>> # Given a population of 1000, a CI of 95%, and a MoE of 2%:
>>> significant_sample_size(1000, CI=0.95, E=0.02)
706
>>> # a 1% change in CI means a 2.5% change in sample size:
>>> significant_sample_size(1000, CI=0.94, E=0.02)  # 1% change in CI
688
>>> # a 1% change in margin of error means a 27% change in sample size:
>>> significant_sample_size(1000, CI=0.95, E=0.03)  # 1% change in error
516
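
The formula in the Notes translates almost directly into code. The sketch below takes Z directly rather than a CI and truncates the result with int(), which is an assumption chosen because it agrees with the documented examples; sample_size_sketch is a hypothetical name.

```python
def sample_size_sketch(N, Z=1.96, E=0.02, p=0.5):
    """n = N * x / ((N - 1) * E**2 + x), with x = Z**2 * p * (1 - p)."""
    x = Z ** 2 * p * (1 - p)
    # Truncate toward zero (assumed rounding behaviour).
    return int(N * x / ((N - 1) * E ** 2 + x))


print(sample_size_sketch(1000))
```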
douglib.core.significant_subsample(array, CI=0.95, E=0.02, p=0.5)

Return a subarray that is a statistically significant sampling.

Assumes the original array is the entire population.

See docstring for the significant_sample_size function for more information.

Parameters:
  • array (sequence) – The array to create a subset of.
  • CI (float [0.95]) – The desired confidence interval. Must be between 0 and 1 inclusive.
  • E (float [0.02]) – The desired margin of error. Must be between 0 and 1 inclusive.
  • p (float [0.5]) – Response distribution. This is what the expected response rate is. If you aren’t sure, use 0.5 as that results in the largest sample size. Must be between 0 and 1 inclusive.
Returns:

subarray – A random subset of array that is N items long, where N is defined by the input parameters.

Return type:

sequence

Note

  • Timing: O(n)
douglib.core.sort_by_column(big_list, *args, **kwargs)

Sort a 2D list by columns defined by args.

Will sort by multiple columns if args is longer than 1 element.

Parameters:
  • big_list (list) – The list to sort.
  • *args (int) – The column(s) to sort by.
  • inplace (bool, optional [False]) – If True, the variable sent to big_list will be modified. If False, a copy of the list is made.
Returns:

sorted – A copy or reference to the sorted list.

Return type:

list

Notes

sort_by_column(A, 3, 1) will sort by the 4th column (index 3) and then by 2nd column (index 1).

sort_by_column(A, 1) is the same as sort_by_column(A, 1, inplace=False)

Sorting in place (inplace=True) means that the data for the variable that you entered (A) will be modified. inplace=False returns a copy of the 2D array and is the default.

Examples

>>> my_array = [[3 ,5], [2, 4], [1, 7]]
>>> sort_by_column(my_array, 1)     # sort by column 1 (2nd col) and copy
[[2, 4], [3, 5], [1, 7]]
>>> sort_by_column(my_array, 1, inplace=True)   # modifies my_array
>>> print(my_array)
[[2, 4], [3, 5], [1, 7]]
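
Multi-column sorting of this kind is commonly built on operator.itemgetter, which returns a tuple key when given several indices. In the sketch below, returning the list in both modes is an assumption, and sort_by_column_sketch is a hypothetical name.

```python
from operator import itemgetter


def sort_by_column_sketch(big_list, *args, inplace=False):
    """Sort rows by the column indices in args, in priority order."""
    key = itemgetter(*args)  # several indices give a tuple key
    if inplace:
        big_list.sort(key=key)  # modifies the caller's list
        return big_list
    return sorted(big_list, key=key)  # leaves the original untouched


rows = [[3, 5, "b"], [2, 5, "a"], [1, 7, "c"]]
print(sort_by_column_sketch(rows, 1, 2))
```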
douglib.core.threshold_1d_array(array, y)

Emulate LabVIEW’s Threshold 1D Array function.

Takes a Y value and returns a fractional index for that Y value. If the function is not monotonically increasing, it returns the first value found.

Parameters:
  • array (list) – A 1D list of numeric values.
  • y (numeric) – The value to search for.
Returns:

fractional_index – A fractional index representing the location of y.

Return type:

float

Note

Timing: O(n)
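
The fractional index can be found by scanning for the first rising segment that contains y and interpolating within it. The sketch below raises ValueError when no segment matches, which is an assumed behaviour; threshold_1d_array_sketch is a hypothetical name.

```python
def threshold_1d_array_sketch(array, y):
    """Return the fractional index at which array first reaches y."""
    for index in range(len(array) - 1):
        low, high = array[index], array[index + 1]
        if low == y:
            return float(index)  # exact hit on an element
        if low < y <= high:
            # Fraction of the way from element index to element index + 1.
            return index + (y - low) / (high - low)
    raise ValueError("y not found in array")


print(threshold_1d_array_sketch([0, 10, 20], 15))
```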

douglib.core.to_engineering_notation(number, num_digits=5)

Convert a float to string with an SI order-of-magnitude suffix.

Caution

This function can reduce significant digits.

Note

  • Only uses suffixes that are multiples of 3.
  • Always uses smaller of two options.
Parameters:
  • number (numeric) – The number to convert.
  • num_digits (int, optional) – The maximum number of digits to display in string.
Returns:

engr_string – An engineering-formatted string representation of number.

Return type:

string

Examples

>>> to_engineering_notation(123456)
'123.46k'
>>> to_engineering_notation(-0.003216)
'-3.216m'

Using num_digits:

>>> to_engineering_notation(1000036, 2)
'1M'
>>> to_engineering_notation(1000036, 6)
'1.00004M'
>>> to_engineering_notation(-0.003216, 1)
'-3m'
>>> to_engineering_notation(-0.003216, 3)
'-3.22m'
>>> to_engineering_notation(32165, 1)
'3e+01k'
>>> to_engineering_notation(32165, 2)
'32k'
>>> to_engineering_notation(32165, 3)
'32.2k'
>>> to_engineering_notation(32165, 4)
'32.16k'
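
Output of this shape is consistent with scaling the number by a power of 1000 and formatting the mantissa with the "g" presentation type. The sketch below takes that approach as an assumption; the suffix table and the name to_engineering_notation_sketch are hypothetical.

```python
import math

# Assumed suffix table covering pico through tera.
SI_SUFFIXES = {-12: "p", -9: "n", -6: "u", -3: "m", 0: "",
               3: "k", 6: "M", 9: "G", 12: "T"}


def to_engineering_notation_sketch(number, num_digits=5):
    """Format number with an SI suffix and at most num_digits significant digits."""
    if number == 0:
        return "0"
    # Largest multiple of 3 that does not exceed the number's decimal exponent.
    exponent = int(math.floor(math.log10(abs(number)) / 3)) * 3
    exponent = min(max(exponent, -12), 12)  # stay inside the suffix table
    mantissa = number / 10.0 ** exponent
    # The "g" format keeps at most num_digits significant digits.
    return "{:.{}g}{}".format(mantissa, num_digits, SI_SUFFIXES[exponent])


print(to_engineering_notation_sketch(123456))
```

The "g" format switches to exponent notation when the mantissa has more integer digits than num_digits allows, which is how a result such as '3e+01k' can appear.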
douglib.core.xyd_to_2d_array(data, missing=0)

Convert an array of tuples to a 2D array (matrix-like).

Takes an array of (x, y, data) tuples and converts it to a 2D array where the element index is the Y and X value.

Parameters:
  • data (list of tuples) – The data to convert, in the format [(x1, y1, d1), (x2, y2, d2), ...]
  • missing (any, optional) – The value to use for missing points.
Returns:

array – The matrix-like array.

Return type:

list

Example

>>> data = [[0, 0, 'a'], [0, 1, 'b'], [0, 2, 'c'],
...         [1, 0, 'd'], [1, 1, 'e'], [1, 2, 'f'],
...         [2, 0, 'g'], [2, 2, 'i'],
...         ]
>>> xyd_to_2d_array(data, 'X')
[['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'X', 'i']]

Warning

data must be sorted by X then by Y values.

douglib.core.z_score_from_confidence_interval(ci)

Return a Z-score for a given confidence interval.

Parameters: ci (float) – The confidence interval to use. Must be between 0 and 1 inclusive.
Returns: The z-score (the number of standard deviations from the mean) for a symmetric interval.
Return type: float

Examples

>>> round(z_score_from_confidence_interval(0.95), 12)
1.95996398454
>>> round(z_score_from_confidence_interval(0.90), 12)
1.644853626951
>>> round(z_score_from_confidence_interval(0.975), 12)
2.241402727605
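
For a symmetric interval, each tail holds (1 - ci) / 2 of the probability, so the z-score is the inverse normal CDF evaluated at 0.5 + ci / 2. The sketch below uses statistics.NormalDist (Python 3.8+) as a stand-in for probit(); z_score_sketch is a hypothetical name.

```python
from statistics import NormalDist


def z_score_sketch(ci):
    """Z-score for a symmetric confidence interval, 0 < ci < 1."""
    # A symmetric interval leaves (1 - ci) / 2 in each tail.
    return NormalDist().inv_cdf(0.5 + ci / 2)


print(round(z_score_sketch(0.95), 6))
```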