NumPy Sorting

by cuba6112

Sorting and searching algorithms including O(n) partitioning, binary search, and hierarchical multi-key sorting. Triggers: sort, argsort, partition, searchsorted, lexsort, nan sort order.

name: numpy-sorting
description: Sorting and searching algorithms including O(n) partitioning, binary search, and hierarchical multi-key sorting. Triggers: sort, argsort, partition, searchsorted, lexsort, nan sort order.

Overview

NumPy sorting provides efficient tools for ordering data. Beyond basic sorting, it includes partitioning for top-k selection and vectorized binary search for finding insertion points in sorted data.

When to Use

  • Finding the top $k$ largest or smallest elements in $O(N)$ time, without paying the $O(N \log N)$ cost of a full sort.
  • Ordering data based on multiple criteria (e.g., sort by Date, then by Price).
  • Mapping data into bins or ranges using binary search.
  • Handling datasets containing NaNs where sorting order is sensitive.
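
As a minimal sketch of the first use case, the k largest values can be selected with np.argpartition and only those k values sorted afterwards. The helper name top_k_largest and the sample scores are illustrative assumptions, not part of this skill's scripts.

```python
import numpy as np

def top_k_largest(values, k):
    """Return the k largest entries of a 1-D array in descending order.

    Hypothetical helper for illustration only.
    """
    values = np.asarray(values)
    if not 0 < k <= values.size:
        raise ValueError("k must be between 1 and the array size")
    # After argpartition, the last k positions hold the indices of the
    # k largest values, in no particular order.
    idx = np.argpartition(values, values.size - k)[-k:]
    # Sort only the k selected values, then reverse for descending order.
    return np.sort(values[idx])[::-1]

scores = np.array([12, 7, 31, 4, 25, 18])
print(top_k_largest(scores, 3))  # [31 25 18]
```

Only the final k values are fully sorted, so the extra cost over the partition step is small when k is much smaller than the array.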

Decision Tree

  1. Need the indices of the sorted order (not the values)?
    • Use np.argsort.
  2. Only need the $k$ smallest elements?
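    • Use np.partition(arr, k)[:k], with $k$ less than the array length. After partitioning, the element at index $k$ is in its final sorted position and every element before it is less than or equal to it; neither side is itself sorted.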
  3. Finding where to insert a value to keep order?
    • Use np.searchsorted(sorted_arr, value).
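
For the first branch, a small sketch of how the indices from np.argsort can reorder a companion array. The arrays names and ages are made-up sample data.

```python
import numpy as np

# Two parallel arrays (illustrative data): one to sort by, one to reorder.
names = np.array(["delta", "alpha", "charlie", "bravo"])
ages = np.array([41, 23, 35, 29])

# argsort returns the indices that would sort `ages`;
# kind="stable" keeps equal keys in their original relative order.
order = np.argsort(ages, kind="stable")

ages_sorted = ages[order]    # [23 29 35 41]
names_by_age = names[order]  # ['alpha' 'bravo' 'charlie' 'delta']
print(order)                 # [1 3 2 0]
```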

Workflows

  1. Efficiently Finding the Smallest K Elements

    • Identify an unsorted array.
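    • Call result = np.partition(arr, kth=k), where k must be smaller than len(arr).
    • Select the first k elements: result[:k]. These are the k smallest values, but in no guaranteed order; sort the slice if order matters.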
  2. Vectorized Lookup in Sorted Ranges

    • Ensure the target array 'A' is sorted.
    • Pass a list of values 'V' to np.searchsorted(A, V).
    • Use the returned indices to map values to specific bins or ranges.
  3. Indirect Multi-Key Sort

    • Define primary and secondary key arrays.
    • Use np.lexsort((secondary, primary)) to get the index array.
    • Apply the indices to the data to achieve the desired hierarchical sort order.
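
The three workflows above can be sketched as follows. All array contents are illustrative sample data, and the choice of side="right" for the bin lookup is one possible convention, not something the workflows prescribe.

```python
import numpy as np

# Workflow 1: k smallest elements without a full sort.
arr = np.array([9, 2, 7, 4, 1, 8, 3])
k = 3
result = np.partition(arr, kth=k)  # result[:k] holds the k smallest, unordered
smallest_k = np.sort(result[:k])   # sort only the k selected values
print(smallest_k)                  # [1 2 3]

# Workflow 2: vectorized lookup of values in sorted bin edges.
edges = np.array([0, 10, 20, 30])  # must already be sorted
values = np.array([5, 10, 19, 25, 30])
# side="right" makes each bin closed on the left: edges[i] <= v < edges[i + 1].
bin_index = np.searchsorted(edges, values, side="right") - 1
print(bin_index)                   # [0 1 1 2 3]

# Workflow 3: sort by date (primary key), then by price (secondary key).
dates = np.array([20240102, 20240101, 20240102, 20240101])
prices = np.array([30.0, 20.0, 10.0, 40.0])
order = np.lexsort((prices, dates))  # the LAST key is the primary one
print(dates[order])                # [20240101 20240101 20240102 20240102]
print(prices[order])               # [20. 40. 10. 30.]
```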

Non-Obvious Insights

  • NaN Position: In real-valued arrays, np.nan sorts after every other value, including np.inf, so NaNs always end up at the end of the sorted array.
  • Partition Performance: Partitioning along the last axis is generally faster and uses less memory than partitioning along any other axis, since other axes may require temporary copies of the data.
  • Lexsort Order: lexsort takes keys in reverse order of importance; the last key in the sequence is the primary sort key.
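
A short sketch of the NaN behaviour described above, using made-up data. It also shows one possible way (an assumption, not a NumPy built-in) to get a descending order that still keeps NaNs at the end, since simply reversing the sorted array moves them to the front.

```python
import numpy as np

data = np.array([3.0, np.nan, np.inf, -1.0, np.nan, 0.5])

# Ascending sort: NaNs come after everything else, including inf.
ascending = np.sort(data)
print(ascending)  # [-1.   0.5  3.   inf  nan  nan]

# Naive reversal puts the NaNs first, which is rarely what is wanted.
reversed_naive = ascending[::-1]

# One way to sort descending while keeping NaNs last:
# reverse only the non-NaN prefix and re-attach the NaN tail.
n_valid = ascending.size - np.isnan(ascending).sum()
descending = np.concatenate([ascending[:n_valid][::-1], ascending[n_valid:]])
print(descending)  # [ inf  3.   0.5 -1.   nan  nan]
```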

Evidence

  • "In the output array, all elements smaller than the k-th element are located to the left of this element and all equal or greater are located to its right." Source
  • "Binary search is used to find the required insertion points." Source

Scripts

  • scripts/numpy-sorting_tool.py: Implements top-k selection and hierarchical lexsort.
  • scripts/numpy-sorting_tool.js: Basic sort simulation.

Dependencies

  • numpy (Python)

Skill Information

Category:Creative
Last Updated:12/26/2025