May 21, 2017

Identifying fake phone numbers

The following script identifies "fake phone numbers" by computing a digit-repetition ratio. A ratio close to 0 means the phone number is a single digit repeated, such as 0000-0000-0000 (the ratio never reaches exactly 0, because at least one distinct digit is always present), and a higher ratio means a greater variety of digits.

The calculation is UniqueDigits / DigitCount: the number of distinct digits in the phone number divided by its total number of digits.
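As a worked example of the formula, take 0000-0000-0000 and 0123-4567-0089 (the second number is made up for illustration), counting only the 12 digits and ignoring the hyphens:

```latex
\begin{align*}
\text{ratio} &= \frac{\text{unique digits}}{\text{total digits}} \\
\texttt{0000-0000-0000}: \quad \text{ratio} &= \frac{1}{12} \approx 0.08 \\
\texttt{0123-4567-0089}: \quad \text{ratio} &= \frac{10}{12} \approx 0.83
\end{align*}
```

If the two hyphens were counted as part of the length, the denominators would be 14 instead of 12, but the ordering of the two numbers would not change.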

NOTE:
In this script the phone numbers are generated randomly and all have the same number of digits.



Script:
// Generate 99,999 random phone numbers in the form 0000-0000-0000.
tel_tmp:
LOAD area_code &'-'& tel_p1 &'-'& tel_p2 as telefono;
LOAD 
num(Round((NORMDIST(Rand(),Rand(),Rand()) * 1000)),'0000') as area_code,
num(Round((NORMDIST(Rand(),Rand(),Rand()) * 1000)),'0000') as tel_p1,
num(Round((NORMDIST(Rand(),Rand(),Rand()) * 1000)),'0000') as tel_p2 
AutoGenerate(99999);

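// Compute the ratio. The preceding LOADs below are evaluated bottom-up, starting from the Resident load.
tel: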
LOAD str_len, telefono, unq_str, 
if(str_len>=3,unq_str / str_len,0) as ratio_bad_phone;
// Add up the flags to get the number of distinct digits.
LOAD *, 
(flag_str_0+flag_str_1+flag_str_2+flag_str_3+flag_str_4+flag_str_5+flag_str_6+flag_str_7+flag_str_8+flag_str_9) as unq_str;
// Flag each digit that appears at least once.
LOAD *,
if(str_0>0,1,0) as flag_str_0,
if(str_1>0,1,0) as flag_str_1,
if(str_2>0,1,0) as flag_str_2,
if(str_3>0,1,0) as flag_str_3,
if(str_4>0,1,0) as flag_str_4,
if(str_5>0,1,0) as flag_str_5,
if(str_6>0,1,0) as flag_str_6,
if(str_7>0,1,0) as flag_str_7,
if(str_8>0,1,0) as flag_str_8,
if(str_9>0,1,0) as flag_str_9;
// Count how many times each digit appears.
LOAD *, 
SubStringCount(telefono,'0') as str_0,
SubStringCount(telefono,'1') as str_1,
SubStringCount(telefono,'2') as str_2,
SubStringCount(telefono,'3') as str_3,
SubStringCount(telefono,'4') as str_4,
SubStringCount(telefono,'5') as str_5,
SubStringCount(telefono,'6') as str_6,
SubStringCount(telefono,'7') as str_7,
SubStringCount(telefono,'8') as str_8,
SubStringCount(telefono,'9') as str_9;
// Count digits only, so the hyphens do not inflate the length.
LOAD telefono, Len(KeepChar(telefono,'0123456789')) as str_len
Resident tel_tmp;

DROP Table tel_tmp;

EXIT SCRIPT;
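
To act on the result, one option is to flag the numbers whose ratio falls at or below a cutoff. The sketch below is not part of the original script: the variable vMaxRatio, its value 0.25, the table tel_suspect and the field flag_bad_phone are all hypothetical, chosen only for illustration. It would have to run after the tel table is built and before the EXIT SCRIPT statement.

```qlik
// Hypothetical cutoff, not from the original post: tune it against real data.
// SET stores the literal text, so the decimal point survives dollar expansion.
SET vMaxRatio = 0.25;

// Flag phone numbers whose ratio is at or below the cutoff.
// The new table associates with tel through the telefono field.
tel_suspect:
LOAD DISTINCT
    telefono,
    1 as flag_bad_phone
Resident tel
Where ratio_bad_phone <= $(vMaxRatio);
```

With the 12-digit numbers generated here, a cutoff of 0.25 selects the numbers that contain three or fewer distinct digits.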

