# pip-code-to-doc

[pipableAi](https://www.linkedin.com/company/pipable.ai/about/)

[colab_notebook](https://colab.research.google.com/drive/17PyMU_3QN9LROy7x-jmaema0cuLRzBvc?usp=sharing)

## What have we built?
A 1.3-billion-parameter code-documentation model that outperforms most models at documenting code and at making your in-house libraries ready for LLM and RAG pipelines.
We have also open-sourced a [parsing library](https://github.com/PipableAI/pip-library-parser) for the same purpose; together, the library and the model can turn your codebase into a functional parse tree ready to be consumed by LLMs for executing complex tasks.
This is a further-trained version of pip-sql-1.3b.

## How we built it?
We used softmax cross-entropy and a modified form of policy gradient together with a Q loss, optimized in an EM (expectation-maximization) setup.
*(Figure: loss behaviour during training in the setup described above.)*
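
To make the objective concrete, here is a rough sketch of how such a combined loss could be written. The additive form and the mixing weights are our assumptions for illustration, not the exact recipe used:

```latex
% E-step: sample candidate documentation from the current model.
% M-step: update parameters against the combined objective below.
% The additive combination and the weights \lambda are assumed, not confirmed.
\mathcal{L}(\theta) \;=\; \mathcal{L}_{\mathrm{CE}}(\theta)
  \;+\; \lambda_{\mathrm{PG}}\,\mathcal{L}_{\mathrm{PG}}(\theta)
  \;+\; \lambda_{Q}\,\mathcal{L}_{Q}(\theta)
```

Here \( \mathcal{L}_{\mathrm{CE}} \) is the supervised token-level cross-entropy, \( \mathcal{L}_{\mathrm{PG}} \) the modified policy-gradient term, and \( \mathcal{L}_{Q} \) the Q loss; the weights \( \lambda_{\mathrm{PG}} \) and \( \lambda_{Q} \) are hypothetical placeholders.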

## License

The model is open source under the Apache 2.0 license.

## Usage

### Library use

```python
!pip3 install git+https://github.com/PipableAI/pip-library-parser
!pip3 install atlassian-python-api

from pip_library_parser import CodeToDocGenerator
from atlassian import Jira
import torch

torch.set_default_device("cuda")

# Instantiate the CodeToDocGenerator
generator = CodeToDocGenerator()

# Generate docstrings for the module's functions and methods
module = Jira
module_name = "atlassian.Jira"

docs = generator.generate_module_docs(module, module_name)
print(docs)
```

```python
from pip_library_parser import CodeToDocGenerator

# Instantiate the CodeToDocGenerator
generator = CodeToDocGenerator()

code_snippet = """
def example_function(x):
    return x * 2
"""

docstring = generator.generate_docstring_from_pip_model(code_snippet)
print("Generated Docstring:")
print(docstring)
```
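
The snippet API above is not limited to hand-written strings. As a small sketch (our addition, assuming `generate_docstring_from_pip_model` accepts any Python source string, as the snippet above suggests), you can pull the source of an existing function with the standard-library `inspect` module and document just that one function:

```python
import inspect
import json

from pip_library_parser import CodeToDocGenerator

generator = CodeToDocGenerator()

# json.dumps is implemented in pure Python, so inspect.getsource can
# retrieve its source; pass that source through the same snippet API.
source = inspect.getsource(json.dumps)
docstring = generator.generate_docstring_from_pip_model(source)
print(docstring)
```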

### Installation

```bash
# torch is required for the PyTorch example below
pip install torch transformers
```

### Prompt

```python
# `code` should hold the source of the function you want documented.
prompt = f"""<function_code>{code}</function_code>
<question>Give one line description of the python code above in natural language.</question>
<doc>"""
```

### PyTorch

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"
model = AutoModelForCausalLM.from_pretrained("PipableAI/pip-code-to-doc-1.3b").to(device)
tokenizer = AutoTokenizer.from_pretrained("PipableAI/pip-code-to-doc-1.3b")

prompt = f"""
<function_code>
def example_function(x):
    return x * 2
</function_code>
<question>Give one line description of the python code above in natural language.</question>
<doc>"""

# Move the inputs to the same device as the model before generating.
inputs = tokenizer(prompt, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=300)

# Keep only the text generated between the <doc> tags.
doc = tokenizer.decode(outputs[0], skip_special_tokens=True).split('<doc>')[-1].split('</doc>')[0]
print(doc)
```
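
If you need descriptions for many functions, the steps above are easy to fold into a helper. A minimal sketch of such a wrapper (the function name `describe` and the use of `inspect.getsource` are our additions; it reuses the `model` and `tokenizer` loaded above):

```python
import inspect

def describe(fn, model, tokenizer, device="cuda", max_new_tokens=300):
    """Generate a one-line natural-language description for a Python callable."""
    code = inspect.getsource(fn)
    prompt = f"""<function_code>{code}</function_code>
<question>Give one line description of the python code above in natural language.</question>
<doc>"""
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    text = tokenizer.decode(outputs[0], skip_special_tokens=True)
    # Keep only the text generated between the <doc> tags.
    return text.split('<doc>')[-1].split('</doc>')[0].strip()

# Example: describe a pure-Python standard-library function.
import json
print(describe(json.dumps, model, tokenizer))
```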

## Examples

### Prompt

```python
<function_code>
###########################
# Generate Analytical Model
###########################

##################################################
# func: get_np_array_transition_probability_matrix
##################################################
def get_np_array_transition_probability_matrix(int_num_states, np_array_A_matrix):
    print('np_array_A_matrix:')
    print(np_array_A_matrix)
    #####################################################
    # Perturb the adjacency matrix to avoid singularities
    #####################################################
    np_array_A_matrix += (np.full((int_num_states, int_num_states), float_eps) - (np.identity(int_num_states) * float_eps))
    print('np_array_A_matrix:')
    print(np_array_A_matrix)
    print('np_array_D_matrix:')
    np_array_D_matrix = np.diag(np.sum(np_array_A_matrix, axis=1))
    print(np_array_D_matrix)
    print('np_array_D_matrix_inv:')
    np_array_D_matrix_inv = np.linalg.inv(np_array_D_matrix)
    print(np_array_D_matrix_inv)
    print('\n\n')
    print('np_array_P_matrix:')
    np_array_P_matrix = np.dot(np_array_D_matrix_inv, np_array_A_matrix)
    print(np_array_P_matrix)
    print('np.sum(np_array_P_matrix, axis=1):')
    print(np.sum(np_array_P_matrix, axis=1))
    print('\n\n')
    return np_array_P_matrix

##################################################
# func: get_np_array_perron_frobenius_eigen_vector
##################################################
def get_np_array_perron_frobenius_matrix(int_num_states, np_array_P_matrix):
    np_array_perron_frobenius_matrix = np.linalg.matrix_power(np_array_P_matrix, 1000)
    np_array_perron_frobenius_vector = np_array_perron_frobenius_matrix[0, :]
    print('np_array_perron_frobenius_matrix:')
    print(np_array_perron_frobenius_matrix)
    print('np.sum(np_array_perron_frobenius_matrix, axis=1):')
    print(np.sum(np_array_perron_frobenius_matrix, axis=1))
    print('np.sum(np_array_perron_frobenius_matrix, axis=0):')
    print(np.sum(np_array_perron_frobenius_matrix, axis=0))
    print('np.sum(np_array_perron_frobenius_matrix, axis=0)/int_num_states:')
    print(np.sum(np_array_perron_frobenius_matrix, axis=0)/int_num_states)
    print('np.dot(np_array_perron_frobenius_vector, np_array_P_matrix):')
    print(np.dot(np_array_perron_frobenius_vector, np_array_P_matrix))
    print('np_array_perron_frobenius_vector:')
    print(np_array_perron_frobenius_vector)
    print('\n\n')
    return np_array_perron_frobenius_vector, np_array_perron_frobenius_matrix

#############################
# func: get_np_array_Z_matrix
#############################
def get_np_array_Z_matrix(int_num_states, np_array_P_matrix, np_array_perron_frobenius_matrix):
    np_array_Z_matrix = np.linalg.inv(np.identity(int_num_states) - np_array_P_matrix + np_array_perron_frobenius_matrix)
    print('np_array_Z_matrix:')
    print(np_array_Z_matrix)
    print('\n\n')
    return np_array_Z_matrix

#############################
# func: get_np_array_H_matrix
#############################
def get_np_array_H_matrix(int_num_states, np_array_Z_matrix, np_array_perron_frobenius_vector):
    np_array_H_matrix = np.zeros([int_num_states, int_num_states])
    for i in range(int_num_states):
        for j in range(int_num_states):
            np_array_H_matrix[i][j] = (np_array_Z_matrix[j][j] - np_array_Z_matrix[i][j])/np_array_perron_frobenius_vector[j]
    print('np_array_H_matrix:')
    print(np_array_H_matrix)
    print('\n\n')
    return np_array_H_matrix

###########
# func: run
###########
def run(np_array_A_matrix):
    int_num_states = len(np_array_A_matrix)
    np_array_P_matrix = get_np_array_transition_probability_matrix(int_num_states, np_array_A_matrix)
    np_array_perron_frobenius_vector, np_array_perron_frobenius_matrix = get_np_array_perron_frobenius_matrix(int_num_states, np_array_P_matrix)
    np_array_Z_matrix = get_np_array_Z_matrix(int_num_states, np_array_P_matrix, np_array_perron_frobenius_matrix)
    np_array_H_matrix = get_np_array_H_matrix(int_num_states, np_array_Z_matrix, np_array_perron_frobenius_vector)
    return np_array_H_matrix
</function_code>
<question>Give one line description of the python code above in natural language.</question>
<doc>
```

### Response

```txt
The given python code is a function that calculates the transition probability matrix, P, for a given adjacency matrix A, and then uses these matrices to calculate the Perron-Frobenius eigenvector and its inverse matrix Z, and finally, the H matrix which is the inverse of the Z matrix. The H matrix is then returned as the output of the function. The adjacency matrix A is a square matrix where each element at position (i, j) represents the probability of transitioning from state i to state j. The function first perturbs the adjacency matrix to avoid singularities, then calculates the transition probability matrix P, the Perron-Frobenius eigenvector and its inverse matrix Z, and finally, the H matrix. The H matrix is then returned as the output of the function.
```

### Team

Avi Kothari, Gyan Ranjan, Pratham Gupta, Ritvik Aryan Kalra, Soham Acharya