Source code for kalfeat.typing

# -*- coding: utf-8 -*-
#   author: KLaurent <etanoyau@gmail.com>
#   Licence:  GPL-3.0 


""" 
`kalfeat`_ Type variables 
=========================== 

.. |ERP| replace:: Electrical resistivity profiling 

.. _kalfeat: https://github.com/WEgeophysics/kalfeat/
.. _pandas DataFrame: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html
.. _Series: https://pandas.pydata.org/docs/reference/api/pandas.Series.html

Some customized type variables  need to be explained for easy understanding 
in the whole package. Indeed, customized type hints is used to define the 
type of arguments. 

**M**: Suppose to be the interger variable `IntVar` to denote the number of 
    rows in the ``Array``. 
    
**N**: Like the ``M``, *N* means the number of column in the ``Array``. It 
    is bound with  integer variable. 
    
**T**: Is known as generic type standing for `Any` type of variable. We keep 
    it unchanged. 

**U**: Unlike `T`, `U` stands for nothing. Use to sepcify the one dimentional 
    array. For instance:: 
        
        >>> import numpy as np 
        >>> array = np.arange(4).shape 
        ... (4, )
        
**S**: Indicates the `Shape` status. It is bound by `M`, `U`, `N`. 'U' stands
    for nothing for one dimensional array. While, the common shape expects 
    for one of two dimensional arrays, it is possible to extend array for 
    more than one dimensional. The class object :class:`AddShape` is 
    created to grand all the remaining value of integers shape. 
    
**D**: Stands for  dtype object. It is bound with  :class:`DType`.

**Array**: Defined for  one dimensional array and `DType` can be specify. For 
    instance, we generated two arrays (`arr1`and `arr2`) for different types:: 
        
        >>> import numpy as np
        >>> from kalfeat.typing import TypeVar, Array, DType
        >>> T = TypeVar ('T', float) 
        >>> A = TypeVar ('A', str, bytes )
        >>> arr1:Array[T, DType[T]] = np.arange(21) # dtype ='float'
        >>> arr2: Array[A, DType[A]] = arr1.astype ('str') # dtype ='str'
        
**NDArray**: Stands for multi-dimensional arrays i.e more than two. Here, the 
    difference between the one dimensional type variable ``Array`` is that 
    while the latter accepts the ``DType`` argument  as the second parameter. 
    It could be turn to the number of multidimentional rows including the 
    `Array as first argument and specify the DType as the second argument 
    like this:: 
        
        >>> import numpy as np 
        >>> from kalfeat.typing import TypeVar, Array, NDarray, DType 
        >>> T =TypeVar ('T', int)
        >>> U = TypeVar ('U')
        >>> multidarray = np.arange(7, 7).astype (np.int32)
        >>> def accept_multid(
                arrays: NDArray[Array[T, U], DType [T]]= multidarray
                ):
            ''' asserted with MyPy and work-fine.'''
                ...
                
**Sub**: Stands for subset. Indeed, the class is created to define the 
    conductive zone. It is a subset ``Sub`` of ``Array``. For example, we first 
    build an array secondly extract the conductive zone from |ERP| line.
    Finally, we checked the type hint to assert whether the extracted zone 
    is a subset of the whole |ERP| line. The demo is given below:: 
        
        >>> import numpy as np 
        >>> from kalfeat.typing import TypeVar, DType, Array , Sub
        >>> from kalfeat.tools.exmath import _define_conductive_zone
        >>> T= TypeVar ('T', float)
        >>> erp_array: Array[T, DType[T]] = np.random.randn (21) # whole line 
        >>> select_zone, _ = _define_conductive_zone (erp = erp_array , auto =True)
        >>> select_zone: Array[T, DType[T]]
        >>> def check_cz (select_zone: Sub[Array]): 
                ''' assert with MyPy and return ``True`` as it works fine. '''
                ... 
                
**SP**: Stands for Station positions. The unit of position may vary, however, 
    we keep for :mod:`kalfeat.method.electrical.ElectricalResistivityProfiling`
    the default unit in ``meters`` by starting at position 0. Typically,
    positions are recording according to the dipole length. For the example, 
    we can generated a position values for ``121 stations`` with dipole 
    length equals to ``50m`` i.e the length of the survey line is ``6 km``. 
    Here we go: 
        
        * Import required modules and generate the whole survey line::
            
            >>> import numpy as np 
            >>> from kalfeat.typing import TypeVar, DType, SP, Sub 
            >>> T =TypeVar ('T', bound =int)
            >>> surveyL:SP = np.arange(0, 50 *121 , 50.).astype (np.int32)
            ... (work fine with MyPy )
            
        * Let's verify whether the extract data from surveyL is also a subset 
            of station positions:
                
            -  We use the following fonction to to extract the specific
                part of whole survey line `surveyL`:: 
                    
                    >>> from kalfeat.tools.exmath import define_conductive_zone
                    >>> subpos,_ = define_conductive_zone (surveyL, s='S10') 
                    
            -  Now, we check the instance value `subpos` as subset array of 
                of `SP`. Note that the station 'S10' is included in the 
                extracted locations and is extented for seven points. For 
                further details, refer to `define_conductive_zone.__doc__`:: 
                
                    >>> def checksup_type (sp: Sub[SP[T, DType[T]]] = subpos ): 
                            ''' SP is an array of positions argument `sp`  
                            shoud be asserted as a subestof the whole line.'''
                            ... 
                    ... (test verified. subpos is a subset of `SP`) 
                    
**Series**: Stands for `pandas Series`_ object rather than using the specific 
    ``pandas.Series`` everywhere in the package. 
    
**DataFrame**: Likewise the ``Series`` generic type hint, it stands for 
    ``pandas DataFrame`_ object. It used to replace ``pandas.DataFrame`` object
    to identify the callable arguments in the whole packages. 
    Both can be instanciated as below:: 
        
        >>> import numpy as np 
        >>> import pandas pd 
        >>> from kalfeat.typing import TypeVar , Any, DType , Series, DataFrame
        >>> T  =TypeVar('T')
        >>> seriesStr = pd.Series ([f'obx{s}' for s in range(21)],
                                 name ='stringobj')
        >>> seriesFloat = pd.Series (np.arange(7).astype(np.float32),
                                 name =floatobj)
        >>> SERs = Series [DType[str]] # pass 
        >>> SERf =Series [DType [float]] # pass 
    
        ..
    
        >>> dfStr= pd.DataFrame {'ser1':seriesStr , 
                            'obj2': [f'none' for i in range (21)]}
        >>> dfFloat= pd.DataFrame {'ser1':seriesFloat , 
                            'obj2': np.linspace (3, 28 , 7)}
        >>> dfAny= pd.DataFrame {'ser1':seriesStr, 
                            'ser2':seriesFloat}
        >>> DFs  = DataFrame [SERs] | DataFrame [DType[str]]
        >>> DFf  = DataFrame [SERf] | DataFrame [DType[float]]
        >>> DFa =  DataFrame [Series[Any]] | DataFrame [DType[T]]
        
---

Additional definition for common arguments 
=========================================== 

To better construct a hugue API, an explanation of some argument is useful 
to let the user aware when meeting such argument in a callable function. 

**erp** : Stand for Electrical Resistivity Profiling. Typically, the type hint 
    for |ERP| is ``Array[float, DType [float]]`` or ``List[float]``. Its
    array is supposed to hold the apparent resistivy values  collected 
    during the survey. 
    
**p**: Typically mean position but by preference means station location
    positions. The type hint used to defined the `p` is ``
    ``Array[int, DType [int]]`` or ``List[int]``. Indeed, the position 
    supposed to be on integer array and the given values enven in float 
    should be casted to integers. 
     
**cz**: Stands for Conductive Zone. It is a subset of |ERP| so they share the 
    same type hint. However, for better demarcation, ``Sub`` is convenient to 
    use to avoid any confusion about the full |ERP| and the extracted  
    conductive as demontrated in the example above in ``Sub`` type hint
    definition.
        
"""

from typing import (
    TypeVar, 
    List,
    Tuple,
    Sequence, 
    Dict, 
    Iterable, 
    Callable, 
    Union, 
    Any , 
    Generic,
    Optional,
    Union,
    Type , 
    Mapping,
    Text,

)

T = TypeVar('T')
V = TypeVar('V')
K = TypeVar('K')
M =TypeVar ('M', bound= int ) 
N= TypeVar('N',  bound =int )
U= TypeVar('U')
D =TypeVar ('D', bound ='DType')
S = TypeVar('S', bound='Shape')

[docs]class AddShape (Generic [S]): 
    """ Suppose to be an extra bound to top the `Shape` for dimensional 
    more than two. 
    
    Example 
    ------- 
    >>> import numpy as np 
    >>> np.random.randn(7, 3, 3) 
    >>> def check_valid_type (
        array: NDArray [Array[float], Shape[M, AddShape[N]]]): 
        ... 
    
    """
[docs]class Shape (Generic[M, S], AddShape[S]): 
    """ Generic to construct a tuple shape for NDarray. `Shape` has is 
    written wait for two dimensional arrays with M-row and N-columns. However 
    for three dimensional,`Optional` Type could be: 
        
    :Example: 
        >>> import numpy as np 
        >>> # For 1D array 
        >>> np
        >>> np.random.rand(7)
        >>> def check_array1d( 
            array: Array[float, Shape[M, None]])
        >>> np.random.rand (7, 7).astype('>U12'):
        >>> def check_array2d_type (
            array: NDArray[Array[str], Shape [M, N], DType ['>U12']])
        
    """
    def __getitem__ (self, M, N) -> S: 
        """ Get the type of rown and type of columns 
        and return Tuple of ``M`` and ``N``. """
        ... 
    
[docs]class DType (Generic [T]): 
    """ DType can be Any Type so it holds 'T' type variable. """
    def __getitem__  (self, T) -> T: 
        """ Get Generic Type object and return Type Variable"""
        ...  
       
[docs]class Array(Generic[T, D]): 
    """ Arry Type here means the 1D array i.e singular column. """
    
    def __getitem__ (self, T) -> Union ['Array', T]: 
        """ Return Type of the given Type variable. """ 
        ... 
    
    
[docs]class NDArray(Array[T, DType [T]], Generic [T, D ]) :
    """ NDarray has ``M``rows, ``N`` -columns, `Shape` and `DType` object. 
    and Dtype. `Shape` is unbound for this class since it does not make since
    to sepecify more integers. However, `DType` seems useful to provide. 
    
    :Example: 
        >>> import numpy as np 
        >>> T= TypeVar (T, str , float) # Dtype here is gone to be "str" 
        >>> array = np.c_[np.arange(7), np.arange(7).astype ('str')]
        >>> def test_array (array: NDArray[T, DType [T]]):...
    """
    def __getitem__ (self,T ) -> T: 
        """ Return type variable. Truly the ``NDArray``"""
        ... 
    
[docs]class F (Generic [T]): 
    """ Generic class dedicated for functions, methods and class and 
    return the given types i.e callable object with arguments or `Any`. 
    
    :Example: 
        >>> import functools 
        >>> def decorator (appender ='get only the documention and pass.'):
                @functools.wraps(func):
                def wrapper(*args, **kwds)
                    func.__doc__ = appender + func.__doc__
                    return func (*args, **kwds) 
                return wrapper 
        >>> @decorator  # do_nothing = decorator (anyway)
            def anyway(*args, **kwds):
                ''' Im here to '''
                ...
        >>> def check_F(anyway:F): 
                pass 
    """
    def __getitem__ (self, item: Callable [...,T]
                     ) -> Union ['F', Callable[..., T], T, Any]:
        """ Accept any type of variable supposing to be a callable object 
        functions, methods or even classes and return the given type 
        object or another callable object  with its own or different specific 
        parameters or itself or Any."""
        return self 
    
[docs]class Sub (Generic [T]): 
    """ Return subset of whatever Array"""
    ... 
     
[docs]class SP(Generic [T, D]): 
    """ Station position arrays hold integer values of the survey location.
    Most likely, the station position is given according to the dipole length.
    Assume the dipole length is ``10 meters`` and survey is carried out on 
    21 stations. The station position array  should be an array of interger 
    values from 0. to 200 meters. as like:: 
        
        >>> import numpy as np 
        >>> positions: SP = np.arange(0, 21 * 10, 10.
                                     ).astype (np.int32) # integer values 
    """
    ... 
    
[docs]class Series (DType[T], Generic [T]): 
    """ To reference the pandas `Series`_ object. 
    
    .. _Series: https://pandas.pydata.org/docs/reference/api/pandas.Series.html
    
    :Example:
        >>> import numpy as np
        >>> import pandas as pd 
        >>> from kalfeat.typing import DType, Series  
        >>> ser = pd.Series (np.arange (21), name ='nothing')
        
    .. code: Python 
        
        def check_type (serObj:Series): 
            ''' pass anyway'''
            ... 
        check_type (seObj: Series[DType[str]]=ser ) 
    
    """
    def __getitem__ (self, item: T) -> 'Series': 
        """ Get the type variable of item T and return `Series`_ object."""
        return self 
    
    
       
[docs]class DataFrame (Series[T], Generic[T]): 
    """ Type hint variable to illutsrate the `pandas DataFrame`_ object. 
    
    .. _pandas DataFrame: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html
    .. _Series: https://pandas.pydata.org/docs/reference/api/pandas.Series.html
    
    Indeed, `pandas DataFrame`_ can be considered as an aggregation of `Series`_, 
    thus, the generic type hint variable is supposed to hold a `Series`_
    object. 
    
    :Example:
        
        >>> import numpy as np
        >>> import pandas as pd 
        >>> from kalfeat.typing import DType, DataFrame 
        
    .. code: Python 
         
        df =pd.DataFrame ({serie1: np.arange(7), 
                           serie2: np.linspace (0, 1000, 7), 
                           serie3: [f'0b{i} for i in range(7)]
                                    })
        def check_type (dfObj:DataFrame): 
            ... 
        ckeck_type (dfObj: DataFrame [DType [object]] =df)
    
    """
    
    def __getitem__(self, item: T)->'DataFrame':
        """ Get the type hint variable of `pandas DataFrame`_ and return the 
        object type variable."""
        
        return self     
    
if __name__=='__main__': 
    def test (array:Sub[SP[Array[int, DType[int]], DType [int]]]):... 
    def test2 (array:Sub[SP[Array, DType [int]]]):... 
    
    DFSTR  = DataFrame [Series[DType[str]]]
    DF = DataFrame [DType [object]]